I've been lightly banging the drum the last few years that a lot of programmers don't seem to understand how fast computers are, and often ship code that is just miserably slower than it needs to be, like the code in this article, because they simply don't realize that their code ought to be much, much faster. There are still a lot of very early-2000s ideas of how fast computers are floating around. I've wondered how much of it is the still-extensive use of dynamic scripting languages and programmers not understanding just how much performance you can throw away how quickly with those things. It isn't even just the slowdown you get from using one at all; it's really easy to pile on several layers of indirection without really noticing it. And in the end, the code seems to run "fast enough" and nobody involved really notices that what is running in 750ms really ought to run in something more like 200us.
I have a hard time using (pure) Python for any task where speed is even remotely a consideration. Not only is it slow even at the best of times, but so many of its features beg you to slow down even more without thinking about it.
I agree 100%. I wish every software engineer would spend at least a little time writing some programs in bare C and running them to get a feel for how fast a native executable can start up and run. It is breathtaking if you're used to running scripting languages and VMs.
Related anecdote: My blog used to be written using Jekyll with Pygments for syntax highlighting. As the number of posts increased, it got slower and slower. Eventually, it took about 20 seconds to refresh a simple text change in a single blog post.
I eventually decided to just write my own damn blog engine completely from scratch in Dart. Wrote my own template language, build graph, and syntax highlighter. By having a smart build system that knew which pages actually needed to be regenerated based on what data actually changed, I hoped to get very fast incremental rebuilds in the common case where only text inside a single post had changed.
Before I got the incremental rebuild system working, I worked on getting it to just do a full build of the entire blog: every post page, pages for each tag, date archives, and RSS support. I diffed it against the old blog to ensure it produced the same output.
Once I got that working... I realized I didn't even need to implement incremental rebuilds. It could build the entire blog and every single post from scratch in less than a second.
I don't know how people tolerate slow frameworks and build systems.
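For anyone who hasn't built one, the change-detection half of an incremental rebuild can be sketched roughly as below. This is only an illustration in Python, not the Dart engine described above: the names `content_hash`, `record_build` and `pages_to_rebuild` are hypothetical, and the sketch ignores dependencies between pages (tag pages, archives) that a real build graph would have to track.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable fingerprint of one source file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_build(sources: dict[str, str]) -> dict[str, str]:
    """Remember the hash of every source at the time of a build.

    `sources` maps a page name to its text (a hypothetical structure).
    """
    return {name: content_hash(text) for name, text in sources.items()}


def pages_to_rebuild(sources: dict[str, str], previous: dict[str, str]) -> list[str]:
    """Return the names of sources that are new or changed since `previous`."""
    return sorted(
        name
        for name, text in sources.items()
        if previous.get(name) != content_hash(text)
    )
```

If a full build already finishes in under a second, as in the anecdote, this bookkeeping buys nothing, which is the point being made.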
Yeah, I've written static site generators in Go and Rust among other languages (it's my go-to project for learning a new language). Neither needed incremental builds because they build instantly. The bottlenecks are I/O.
I've also worked in Python shops for the entirety of my career. There are a lot of Python programmers who don't have experience with and thus can't quite believe how much faster many other languages are (100X-1000X sounds fast in the abstract, but it's really, really fast). I've seen engineering months spent trying to get a CPU-bound endpoint to finish reliably in under 60s (yes, we tried all of the "rewrite the hot path in X" things), while a naive Go implementation completed in hundreds of milliseconds.
Starting a project in Python is a great way to paint yourself into a corner (unless you have 100% certainty that Python [and "rewrite hot path in X"] can handle every performance requirement your project will ever have). Yeah, 3.11 is going to get a bit faster, but other languages are 100-1000X faster--too little, too late.
Python is slow in many things like pure looping and arithmetic, even though there are workarounds to make that 1-10x slower rather than 100-1000X (eg. C-based implementations, including all the itertools stuff).
I am sometimes frustrated that I can't just loop over a string character by character and not get crappy performance, but the "problem" you (and me) are seeing in existing codebases is that Python is very inviting to beginners, and they are not frustrated with this because they don't know it :)
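To make the character-by-character complaint concrete, here is a small sketch comparing an explicit loop with a builtin that does its scanning in C. The vowel-counting task and the function names are made up for illustration; the actual ratio will depend on the interpreter version and the input, so measure rather than trust any particular multiplier.

```python
import timeit


def count_vowels_loop(text: str) -> int:
    """Count vowels by visiting every character from Python bytecode."""
    total = 0
    for ch in text:
        if ch in "aeiou":
            total += 1
    return total


def count_vowels_builtin(text: str) -> int:
    """Count vowels with str.count, whose inner scan runs in C in CPython."""
    return sum(text.count(vowel) for vowel in "aeiou")


def time_both(text: str, repeats: int = 5) -> tuple[float, float]:
    """Return (loop_seconds, builtin_seconds) for `repeats` runs of each."""
    loop_s = timeit.timeit(lambda: count_vowels_loop(text), number=repeats)
    builtin_s = timeit.timeit(lambda: count_vowels_builtin(text), number=repeats)
    return loop_s, builtin_s
```

Calling `time_both("some long text " * 100_000)` on your own machine shows how large the gap is in practice.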
But as you note, bottleneck is the I/O, and a program waiting for I/O in Python and I/O in C will wait the same time after the computation is done.
If you are writing software that can parallelize well independently (eg. web apps) and your memory pressure is not the most important thing, you simply run multiple Python processes to max out the CPU (this avoids the GIL unlike async Python). And you keep your dependencies low.
> even though there are workarounds to make that 1-10x slower rather than 100-1000X (eg. C-based implementations, including all the itertools stuff).
These only apply for specific problems, and very few applications are purely CSV parsing or purely matrix math operations. In the real world, you often spend more time marshaling your Python data to C than you save by doing your computation in C.
> But as you note, bottleneck is the I/O, and a program waiting for I/O in Python and I/O in C will wait the same time after the computation is done.
The bottleneck in a static site generator is I/O. The fact that Python, Ruby, etc based implementations take tens of seconds or more while Go and Rust finish instantly for an I/O bound problem is pretty damning.
> If you are writing software that can parallelize well independently (eg. web apps) and your memory pressure is not the most important thing, you simply run multiple Python processes to max out the CPU (this avoids the GIL unlike async Python).
The goal isn’t to saturate the CPU as much as it is to complete requests in a timely fashion. If it’s just some light translation between HTTP and database layers, Python is fine, but if you have to do anything computationally significant at all, it can range from “a huge pain” to “virtually impossible”. I gave the example earlier of a web service that was struggling to complete requests in even 60s (despite using Numpy under the hood where possible) while a naive Go implementation completed in hundreds of ms.
> The bottleneck in a static site generator is I/O. The fact that Python, Ruby, etc based implementations take tens of seconds or more while Go and Rust finish instantly for an I/O bound problem is pretty damning.
My point was that if this was the case, your Python code is probably suboptimal.
Sure, you are comparing against naive implementation as well, but if performance is a concern, don't do naive Python :)
> I gave the example earlier of a web service that was struggling to complete requests in even 60s (despite using Numpy under the hood where possible) while a naive Go implementation completed in hundreds of ms.
Yes, it's easy and sometimes even idiomatic to write non-performant Python code. Getting the most out of pure Python is hard and it means avoiding some common patterns.
Eg. simply using sqlalchemy ORM (to construct rich dynamic ORM objects) instead of sqlalchemy core (tuples) to get 100k+ rows from DB is 20x slower, and that's still 2x slower from pure psycopg (also tuples using basic types). There are plenty of examples like this in Python, unfortunately.
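The shape of that trade-off can be sketched with the standard library alone. To be clear about the assumptions: this uses `sqlite3` and a plain dataclass as stand-ins, not SQLAlchemy or psycopg, and it makes no attempt to reproduce the 20x and 2x figures above. It only shows the two styles side by side: returning the driver's tuples as-is versus building one Python object per row.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical row object, standing in for a rich ORM entity."""
    id: int
    name: str


def make_db(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory table with n_rows illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        ((i, f"user{i}") for i in range(n_rows)),
    )
    return conn


def fetch_tuples(conn: sqlite3.Connection) -> list[tuple]:
    """Return rows exactly as the driver produces them (plain tuples)."""
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()


def fetch_objects(conn: sqlite3.Connection) -> list[User]:
    """Build one Python object per row, which adds per-row interpreter work."""
    cursor = conn.execute("SELECT id, name FROM users ORDER BY id")
    return [User(*row) for row in cursor]
```

Timing both functions over 100k+ rows with `timeit` is a quick way to see how much the per-row object construction costs in your own setup.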
> My point was that if this was the case, your Python code is probably suboptimal.
> Sure, you are comparing against naive implementation as well, but if performance is a concern, don't do naive Python :)
I agree, and I'll go further: if performance could be a concern and you aren't certain that even optimized Python is up for the task, don't do Python. :)
I don't know how optimized these SSGs are, but given how frequently this complaint occurs and how popular they are, I would expect that someone would have tried to optimize them a bit. Even assuming naive implementations, tens of seconds versus tens of milliseconds for an I/O-bound task is pretty concerning.
> Yes, it's easy and sometimes even idiomatic to write non-performant Python code. Getting the most out of pure Python is hard and it means avoiding some common patterns.
It probably shouldn't be easy for someone to write non-performant Python code when they're trying desperately to write performant Python code. :)
> Getting the most out of pure Python is hard and it means avoiding some common patterns.
And even then, you're probably going to be coming in 10-100X slower than naive Go/Java/C#/etc unless your application happens to be a good candidate for C-extensions (e.g., matrix math) or if it really is I/O bound (a CRUD webapp). It honestly just seems better to avoid Python altogether than try to write Python without using "common patterns" (especially absent guidance about which patterns to avoid or how to avoid them).
> I agree 100%. I wish every software engineer would spend at least a little time writing some programs in bare C and running them to get a feel for how fast a native executable can start up and run. It is breathtaking if you're used to running scripting languages and VMs.
Conversely when 99.9% of the software you use in your daily life is blazing fast C / C++, having to do anything in other stacks is a complete exercise in frustration, it feels like going back a few decades in time
Conversely when 99.9% of the software you use in your daily life is user friendly Python, having to do anything in C/C++ is a complete exercise in frustration, it feels like going back a few decades in time
As a person who uses both languages for various needs, I disagree. Things which take minutes in optimized C++ will probably take days in Python, even if I use the "accelerated" libraries for matrix operations and other math I implement in C++.
Lastly, people think C++ is not user friendly. No, it certainly is. It requires care, yes, but a lot of things can be done in fewer lines than people expect.
I was a C++ dev in a past life and I have no particular fondness for Python (having used it for a couple of decades), and "friendliness" is a lot more than code golf. It's also "being able to understand all of the features you encounter and their interactions" as well as "sane, standard build tooling" and "good debugability" and many other things that C++ lacks (unless something has changed recently).
I delved into Python recently to work on some data science hobbies and a Chess program and it's frankly been fairly shit compared with other languages I use.
Typescript (by way of comparison with other non-low-level languages) just feels far more solid wrt type system, type safety, tooling etc. C# (which I've used for years) is faster by orders of magnitude and IMO safer/easier to maintain.
Python is a powerful yet beginner friendly language with a very gentle learning slope, but I would still take C++ tooling and debuggability any day over Python.
Nah man, I've spent way too much time trying to piece together libraries to turn core dumps into a useful stack trace. Similarly, as miserable as Python package management is, at least it has a package manager that works with virtually every project in the ecosystem. I actually really like writing C++, but there are certain obstacles that slow a developer down tremendously--I could forgive them if they were interesting obstacles (e.g., I can at least amuse myself pacifying Rust's borrow checker), but there's no joy in trying to cobble together a build system with CMake/etc or try to get debug information for a segfault.
You need to provide all of the libraries referenced by the core dump (at the specific versions and compiled with debug symbols) to get gdb to produce a useful backtrace. It's been a decade since I've done professional C++ development, so I'm a bit foggy on the particulars.
Glad to hear the 2022 C++ ecosystem is finally catching up on some regards, but how does it know which version of those dependencies to download, and how does it download closed source symbols?
Java and Go were both responses to how terrible C++ actually is. While there are footguns in python, java, and go, there are exponentially more in C++.
As a person who wrote Java and loved it (and I still love it), I understand where you're coming from, however all programming languages thrive in certain circumstances.
I'm no hater of any programming language, but a strong proponent of using the right one for the job at hand. I write a lot of Python these days, because I neither need the speed, nor have the time to write a small utility which will help a user with C++. Similarly, I'd rather use Java if I'm going to talk with bigger DBs, do CRUD, or develop bigger software which is going to be used in an enterprise or similar setting.
However, if I'm writing high performance software, I'll reach for C++ for the sheer speed and flexibility, despite all the possible foot guns and other not-so-enjoyable parts, because I can verify the absence of most foot-guns, and more importantly, it gets the job done the way it should be done.
I've seen a lot of bad C++ in my life, and have seen Java people write C++ like they would Java.
Writing good C++ is hard. People who think they can write good C++ are surprised to learn about certain footguns (static initialization before main, exception handling during destructors, etc).
I found this reference which I thought was a pretty good take on the C++ learning curve.
> I've seen a lot of bad C++ in my life, and have seen Java people write C++ like they would Java.
Ah, don't remind me Java people write C++ like they write Java, I've seen my fair share, thank you.
> Writing good C++ is hard.
I concur, however writing good Java is also hard. e.g. Swing has a fixed and correct initialization/build sequence, and Java self-corrects if you diverge, but you get a noticeable performance hit. Most developers miss the signs and don't fix these innocent looking mistakes.
I've learnt C++ first and Java later. I also tend to hit myself pretty hard during testing (incl. Valgrind memory sanity and Cachegrind hotpath checks), so I don't claim I write impeccable C++. Instead I assume I'm worse than average and try to find what's wrong vigorously and fix them ruthlessly.
The remark is rooted mostly in variable naming and code organization. I've seen a C++ codebase transferred to a Java developer, and he disregarded everything from the old codebase. He didn't refactor the old code, and the new additions were done Java style: CamelCase file/variable/function names, every class in its own file with ClassName.cpp files littered everywhere. It was a mess.
The code was math-heavy, and became completely unreadable and un-followable. He remarked "I'm a java developer, I do what I do, and as long as it works, I don't care".
That was really bad. It was a serious piece of code, in production.
The biggest weakness of C++ (and C) is non-localized behavior of bugs due to undefined behavior. Once you have undefined behavior, you can no longer reason about your program in a logically consistent way. A language like Python or Java has no undefined behavior so for example if you have an integer overflow, you can debug knowing that only data touched by that integer overflow is affected by the bug whereas in C++ your entire program is now potentially meaningless.
Memory write errors (sometimes induced by UB) in one place of the program can easily propagate and later fail in a very different location of the program, with absolutely zero diagnostics of why your variable suddenly had a value out of possible range.
This is why valgrind, asan and friends exist. They move the error diagnostic to the place where error actually happened.
If your C++ program exhibits undefined behaviour, the compiler is allowed to format your entire hard drive. Or encrypt it and display a "plz pay BTC" message. That's called a vulnerability. Real and meaningful security checks have been removed as "dead code" because of signed integer overflow (which is undefined behaviour by default).
If anything, I would guess the gross misunderstanding sprouted somewhere between the specs and the compiler writers. Originally, UB was mostly about bailing out when the underlying platform couldn't handle this particular case, or explicitly ignoring edge cases to simplify implementations. Now however it's also a performance thing, and if anything is marked as UB then it's fair game for the optimiser — even if it could easily be well defined, like signed integer overflow on 2's complement platforms.
> If your C++ program exhibits undefined behaviour, the compiler is allowed to format your entire hard drive. Or encrypt it and display a "plz pay BTC" message.
No, it isn't. That's a completely made up fabrication. And if you had a compiler that was going to do that, then what the standard says or if there's undefined behavior is obviously not relevant or significant in the slightest.
The majority of the UB optimization complaints are because the compiler couldn't tell that UB was happening. It didn't detect UB and then make an evil laugh and go insane. That's not how this works.
Compilers cannot detect UB and then do things in response within the rules of the standard. Rather, they are allowed to assume UB doesn't happen. That's it, that's all they do. They just behave as though your source has no UB at all. As far as the compiler is concerned, UB doesn't exist and can't happen.
When a compiler can detect that UB is happening it'll issue a warning. It never silently exploits it.
> Real and meaningful security checks have been removed as "dead code" because of signed integer overflow (which is undefined behaviour by default).
Real and meaningful security checks have been removed because the security check happened after the values were already used in specific ways, not because of UB. The values were already specified in the source code to be a particular thing via earlier usage. UB is just the shield for developers who wrote a bug to hide behind to avoid admitting they had a bug.
Use UBSAN next time.
> even if it could easily be well defined, like signed integer overflow on 2's complement platforms.
Signed integer overflow is defined behavior, that's not UB. Also platform specific behavior is something the standard doesn't define - that's why it was UB in the first place.
It is kinda ridiculous it took until C++20 for this change, though
> > UB allows the compiler to format/encrypt your entire hard drive.
> No, it isn't. That's a completely made up fabrication.
Ever heard of viruses exploiting buffer overflows to achieve arbitrary code execution? One cause of that can be a clever optimisation that noticed that the only way the check fails is when some UB is happening. Since UB "never happens", the check is dead code and can be removed. And if the compiler noticed after it got past error reporting, you may not even get a warning.
You still get the vulnerability, though.
> UB is just the shield for developers who wrote a bug to hide behind to avoid admitting they had a bug.
C is what it is, and we live with it. Still, it would be unreasonable to say that the amount of UB it harbours isn't absolutely ludicrous. It's like asking children to cross a poorly mapped minefield and blame them when they don't notice a subtle cue and blow themselves up.
Also, UBSan is not enough. I ran some of my code under ASan, MSan, and UBSan, and the TIS interpreter still found a couple of things. And I'm talking about pathologically straight-line code where once you test for all input sizes you have 100% code path coverage.
> Signed integer overflow is defined behavior, that's not UB.
The C99 standard explicitly states that left shift is undefined on negative integers, as well as signed integers when the result overflows. I had to get around that one personally by replacing x<<n by x*(1<<n) on carry propagation code.
> Also platform specific behavior is something the standard doesn't define - that's why it was UB in the first place.
One point I was making is, compiler writers didn't get that memo. They treat any UB as fair game for their optimisers. It doesn't matter that signed integer overflow was UB because of portability, it still "never happens".
> C is what it is, and we live with it. Still, it would be unreasonable to say that the amount of UB it harbours isn't absolutely ludicrous.
There's a lot of ludicrous stuff about C and I wouldn't recommend anyone use it for anything. Not when Rust and C++ exist.
But UB really isn't the scary bogeyman. There could probably stand to be an `as-is {}` block extension for security checks, but that's really about it.
Granted, C is underpowered and I would like namespaces and generics. But from a safety standpoint nowadays, C++ is just as bad. Not only is it monstrously complex, it still has all the pitfalls of C. C++ may have been "more strongly typed" back in the day, but now compiler warnings have made up for that small difference.
Granted, C++ can be noticeably safer if you go RAII pointer fest, but then you're essentially programming in Java with better code generation and a worse garbage collector.
---
There's also a reason to still write C today: its ubiquity. Makes it easier to deploy everywhere and to talk to other languages. It's mostly a library thing though, and the price in testing effort and bugs is steep.
Well, I'll check who gets rid of all undefined overflows first. 2's complement is nice and dandy, but if overflow is still undefined that doesn't buy me much.
I've written a whole bunch of all of those languages, and they each occupy a different order of magnitude of footguns. From fewest to most: Go (1X), Java (10X), Python (100X), and C++ (1000X).
Most of those aren’t “footguns” at all, but rather preferences (naming conventions, nominal vs structural subtyping) and many others are shared with Python (“magical behavior”, Go’s structural subtyping is strictly better for finding implementations than Python’s duck typing) or non-issues altogether (“the Go compiler won’t accept my invalid Go code”).
The “forget to check an error” one is valid, but rare (usually a function will return data and an error, and you can’t touch the data without handling the error)—moreover, once you use Go for a bit, you sort of expect errors by default (most things error). But yeah, a compilation failure would be better. Personally, the things that really chafe me are remembering to initialize maps, which is a rarer problem in Python because there’s no distinction between allocation and instantiating (at least not in practice). I do wish Go would ditch zero types and adopt sum types (use Option[T] where you need a nil-like type), but that ship has sailed.
I’ve operated services in both languages, and Python services would have tons of errors that Go wouldn’t have, including typos in identifiers, missing “await”s, “NoneType has no attribute ‘foo’”, etc but also considerably more serious issues like an async function accidentally making a sync call under the covers, blocking the event loop, causing health checks to fail, and ultimately bringing down the entire service (same deal with CPU intensive endpoints).
In Go, we would see the occasional nil pointer error, but again, Python has those too.
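The "sync call under the covers" failure is easy to reproduce in miniature. The sketch below is an assumption-laden toy, not code from any service mentioned here: `heartbeat` stands in for a health check, and the two hypothetical handlers differ only in whether they wait with `time.sleep` (blocking) or `asyncio.sleep` (cooperative).

```python
import asyncio
import time


async def heartbeat(ticks: list, interval: float) -> None:
    """Record a timestamp every `interval` seconds until cancelled."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def handler_blocking(duration: float) -> None:
    """An async function that hides a sync call: the loop is stuck meanwhile."""
    time.sleep(duration)


async def handler_cooperative(duration: float) -> None:
    """The same wait, but yielding control back to the event loop."""
    await asyncio.sleep(duration)


async def count_ticks(handler, duration: float = 0.2, interval: float = 0.01) -> int:
    """Run `handler` next to a heartbeat and count how often the heartbeat ran."""
    ticks: list = []
    task = asyncio.create_task(heartbeat(ticks, interval))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    await handler(duration)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(ticks)
```

Running `asyncio.run(count_ticks(handler_blocking))` and `asyncio.run(count_ticks(handler_cooperative))` side by side shows the heartbeat starving in the first case, which is the same mechanism that makes real health checks time out.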
I personally find C++ more friendly, just because of the formatting that python forces upon you.
But I do have to say that I never managed to really get into Python; it always just felt like too much of a hassle, so I always avoided it if possible.
The formatting python enforces is just "layout reflects control flow". It's really not any more difficult than that, and it's a lot better than allowing layout to lie about control flow.
To each their own, but Python's use of indenting for structure is why I never tried it. It just felt, to me, like it was solving one problem with another.
I think Go gets this right: it consistently uses braces for structure, but has an idiomatic reformatting tool that is applied automatically by most IDEs. This ensures that the format and indentation always perfectly matches the code structure, without needing to use invisible characters.
I didn't like it for years but then I kind of got into it for testing out machine learning and I found it kind of neat. My biggest gripe is no longer the syntax but the slowness, trying to do anything with even a soft performance requirement means having to figure out how to use a library that calls C to do it for you. Working with large amounts of data in native Python is noticeably slower than even NodeJS.
> Things which take minutes in optimized C++ will probably take days in Python, even if I use the "accelerated" libraries for matrix operations and other math
I’m gonna need an example because I do not believe this whatsoever.
I'd rather open the code and show what I'm talking about, however I can not.
Let's say I'm making a lot of numerical calculations which are fed from a lockless queue with atomic operations to any number of cores you want, where your performance is limited by the CPU cores' FPU performance and the memory bandwidth (in terms of both transfer speed and queries that bus can handle per second).
As I noted below, that code can complete 1.7 million complete evaluations per core, per second on older (2014 level) hardware, until your memory controller congests with all the requests. I need to run benchmarks on a newer set of hardware to get new numbers, however I seriously lack the time today to do so and provide you new numbers.
There are definitely operations you cannot speed up in Python as much as in other languages, unless you implement it in one of those other languages and interface it in Python.
That much is obvious from Python providing a bunch of C-based primitives in stdlib (otherwise they'd just be written in pure Python).
In many cases, you can make use of the existing primitives to get huge improvements even with pure Python, but you are not beating optimized C++ code (which almost has direct access to CPU vector operations as well).
Python's advantage is in speed of development, not in speed of execution. And I say that as a firm believer that the majority of the Python code in existence today could be much faster if only it were written with an understanding of Python's internal structures.
This is because numpy and friends are really good at matmul's.
As soon as you step out of the happy path and need to do any calculation that isn't at least n^2 work for every single python call you are looking at order of magnitude speed differences.
Years ago now (so I'm a bit fuzzy on the details) a friend asked me to help optimize some python code that took a few days to do one job. I got something like a 10x speedup using numpy, I got a further 100x speedup (on the entire program) by porting one small function from optimized numpy to completely naive rust (I'm sure c or c++ would have been similar). The bottleneck was something like generating a bunch of random numbers, where the distribution for each one depended on the previous numbers - which you just couldn't represent nicely in numpy.
What took 2 days now took 2 minutes, eyeballing the profiles I remember thinking you could almost certainly get down to 20 seconds by porting the rest to rust.
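For readers wondering why this kind of loop resists vectorization, here is a minimal sketch of the pattern. The update rule is invented for illustration and is not the simulation from the anecdote, and it uses the stdlib `random` module rather than numpy. The point is only the data dependency: draw i needs the result of draw i-1, so the loop cannot be handed to a library as one batched call.

```python
import random


def dependent_draws(n: int, seed: int = 0) -> list[float]:
    """Draw n values where each draw's spread depends on the previous value."""
    rng = random.Random(seed)
    values = []
    prev = 1.0
    for _ in range(n):
        # The distribution parameter is a function of the previous draw
        # (hypothetical rule), so the draws form a strictly sequential chain.
        sigma = 0.1 + min(abs(prev), 1.0)
        prev = rng.gauss(0.0, sigma)
        values.append(prev)
    return values
```

Every iteration pays interpreter overhead for the call and the loop step, which is why moving just this function into compiled code can change the total runtime so much.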
Have you tried porting the problem into postgres? Not all big data problems can be solved this way but I was surprised what a postgres database could do with 40 million rows of data.
I didn't, I don't think using a db really makes sense for this problem. The program was simulating a physical process to get two streams of timestamps from simulated single-photon detectors, and then running a somewhat-expensive analysis on the data (primarily a cross correlation).
There's nothing here for a DB to really help with, the data access patterns are both trivial and optimal. IIRC it was also more like a billion rows so I'd have some scaling questions (a big enough instance could certainly handle it, but the hardware actually being used was a cheap laptop).
Even if there was though - I would have been very hesitant to do so. The not-a-fulltime-programmer PhD student whose project this was really needed to be able to understand and modify the code. I was pretty hesitant to even introduce a second programming language.
That's definitely quite curious: I am sure pure Python could have been heavily optimized to reach 2 minutes as well, though. Random number generation in Python is C-based, so while the pseudo-random generators from Python's random module might be slow, it's not because of Python itself (https://docs.python.org/3/library/random.html is a different implementation from https://man7.org/linux/man-pages/man3/random.3.html).
Call overhead and loop overhead is pretty big in Python though. The way to work around that in Python is to use C-based "primitives", like the stuff from itertools and all the builtins for set/list/hash processing (thus avoiding the n^2 case in pure Python). And when memory is an issue (preallocating large data structures can be slow as well), iterators! (Eg. compare use of range() in newer Python with use of list(range())).
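A small sketch of the two points above, using only the standard library. The helper names are made up for illustration. `sys.getsizeof` reports the shallow size of the container only, which is enough to show that `range` does not materialize its elements while `list(range())` does.

```python
import itertools
import sys


def first_multiples(k: int, count: int) -> list[int]:
    """Take the first `count` multiples of k from an unbounded lazy stream."""
    return list(itertools.islice(itertools.count(k, k), count))


def lazy_vs_eager_size(n: int) -> tuple[int, int]:
    """Return the shallow sizes in bytes of range(n) and list(range(n))."""
    return sys.getsizeof(range(n)), sys.getsizeof(list(range(n)))
```

Both `itertools.count` and `itertools.islice` do their stepping in C in CPython, so the Python-level loop disappears from the hot path; whether that is enough for a given workload still has to be measured.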
I'm reasonably sure the PRNG being used in the python version came from numpy and was implemented in C (or other native code, not python). The problem was that the necessary control flow and varying parameters around it meant you had to call it once per value from python (and you had to generate a lot of values).
And if I recall correctly there was no allocation in the hot loop, with a single large array being initialized via numpy to store the values before hand. Certainly that's one of the first things I would think to fix.
I was strongly convinced at the time that there was no significant improvement left in python. With >99% of the time being spent in this one function, and no way to move the loop into native code given the primitives available from numpy. Admittedly I could have been wrong, and I'm not about to revisit the code now, since it has been years and it is no longer in use - so everything I'm saying is based off of years old memories.
Sure, numpy introduces its own set of restrictions. I was mostly referring to taking a different approach before turning to numpy, but it could very well be true.
In essence, doing what you did is the way to get performance out of Python when nothing else works.
> The problem was that the necessary control flow and varying parameters around it meant you had to call it once per value from python (and you had to generate a lot of values).
The code I've written and still working on is using Eigen, which TensorFlow also uses for its matrix operations, so, I'm not far off from these guys in terms of speed, if not ahead.
The code I've written can complete 1.7 million evaluations per core, per second, on older hardware, which is used to evaluate things up to 1e-6 accuracy, which is pretty neat for what I'm working on.
Because it is like saying you use a bash script to configure and launch a C++ application and saying it is a bash script. Python is not a high performance language; it isn't meant to be, and its strengths lie elsewhere. One of its great strengths is interop with C libs.
Your assertion was that numpy etc will be faster than something else despite being python:
> Try writing a matmul operation in C++ and profile it against the same thing done in Numpy/Pytorch/TensorFlow/Jax. You’ll be surprised.
No. When I write Tensorflow code I write Python. I don’t care what TF does under the hood just like I don’t care that Python itself might be implemented in C. Though I got to say TF is quite ugly and not a good example of Python’s user friendliness. But that’s another topic.
That's a known and widely publicised trait of Python.
In the early days, Python tutorial warned against adding to strings by doing "+" even though it works because that performed a new allocation and string copy.
What you were asked to do was use fast, optimized C-based primitives like "\n".join(list_of_strings) etc.
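The two styles being contrasted look like this. The function names are hypothetical. Both produce the same string; the difference is that repeated `+` may allocate and copy on every step, while `join` computes the result in one pass. Some CPython versions can extend a string in place in simple cases, so the gap may be smaller than the old tutorials suggest.

```python
def lines_with_plus(lines: list[str]) -> str:
    """Build the text by repeated concatenation (the pattern warned against)."""
    text = ""
    for i, line in enumerate(lines):
        if i:
            text += "\n"
        text += line
    return text


def lines_with_join(lines: list[str]) -> str:
    """Build the same text with str.join, which does the work in C."""
    return "\n".join(lines)
```

For a handful of strings the choice is irrelevant; it only starts to matter when the list is large or the code sits in a hot path.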
Basically, Python is an "ergonomic" language built in C. Saying how something is implemented in C at the lower level is pointless, because all of Python is.
Yes, doing loops over large data sets in Python is slow. Which is why it provides itertools (again, C-based functions) in stdlib.
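As a small illustration of that point (the function names are hypothetical), here is the same flattening job written as an explicit Python loop and with `itertools.chain.from_iterable`, which moves the inner iteration into the C-implemented itertools module. Whether it is actually faster for a given workload is something to measure, not assume.

```python
from itertools import chain


def flatten_loop(rows):
    """Flatten a list of lists with explicit Python-level loops."""
    out = []
    for row in rows:
        for item in row:
            out.append(item)
    return out


def flatten_itertools(rows):
    """Flatten the same data, letting itertools drive the iteration."""
    return list(chain.from_iterable(rows))


if __name__ == "__main__":
    rows = [[1, 2], [3], [], [4, 5, 6]]
    print(flatten_loop(rows))
    print(flatten_itertools(rows))
```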
And fortran. Which really doesn't matter that much as long as that doesn't leak to the users of numpy, and it doesn't really. The only issue is that it means if you're doing something that doesn't fit the APIs exposed by the native code (in a way where the hot loops are in native code) it's roughly as slow as normal python.
But it does for the argument of a language being fast, which is what we are talking about here. I don't think it is an appropriate argument to say "Python is fast, look at numpy", when the core pieces are written in C/Fortran. It is disingenuous, at least to me.
I like writing things in python. It honestly feels like cheating at times. Being able to reduce things down to a list comprehension feels like wizardry.
I like having things written in C/C++. Because like every deep magic, there's a cost associated with it.
> and at the end of the day LLVM compiles 30min and uses tens of GBs of RAM on average hardware
I mean, that's the initial build.
Here's my compile-edit-run cycle in https://ossia.io which is nearing 400kloc, with a free example of performance profiling, I haven't found anything like this whenever I had to profile python. It's not LLVM-sized of course, but it's not a small project either, maybe in the medium-low C++ project size: https://streamable.com/o8p22f ; pretty much a couple seconds at most from keystroke to result, for a complete DAW which links against Qt, FFMPEG, LLVM, Boost and a few others. Notice also how my IDE kindly informs me of memory leaks and other funsies.
Here's some additional tooling I'm developing - build times can be made as low as a few dozen milliseconds when one puts some work into making the correct API and using the tools correctly: https://www.youtube.com/watch?v=fMQvsqTDm3k
"10 compilers, IDEs, debuggers, package managers" what are you talking about? (Virtually) No one uses ten different tools to build one application. I don't even know of any C++-specific package managers, although I do know of language-specific package managers for... oh, right, most scripting languages. And an IDE includes a compiler and a debugger, that's what makes it an IDE instead of a text editor.
"and at the end of the day LLVM compiles 30min and uses tens of GBs of RAM on average hardware" sure, if you're compiling something enormous and bloated... I'm not sure why you think that's an argument against debloating?
>No one uses ten different tools to build one application.
I meant you have a lot of choices to make
Instead of having one strong standard which everyone uses, you have X of them which makes changing projects/companies harder, but for solid reason? I don't know.
>"and at the end of the day LLVM compiles 30min and uses tens of GBs of RAM on average hardware" sure, if you're compiling something enormous and bloated... I'm not sure why you think that's an argument against debloating?
I know that lines in repo aren't great way to compare those things, but
.NET Compiler Infrastructure:
20 587 028 lines of code in 17 440 files
LLVM:
45 673 398 lines of code in 116 784 files
The first one I built (restore+build) in 6mins and it used around 6-7GB of RAM
The second I'm not even trying because the last time I tried doing it on Windows it BSODed after using _whole_ ram (16GBs)
Compiling a large number of files on Windows is slow, no matter what language/compiler you use. It seems to be a problem with the program invocation, which takes "forever" on Windows. It's still fast for a human, but it's slow for a computer. Quite apt this comes up here ;-)
Source for claim: That's a problem we actually faced in the Windows CI at my old job. Our test suite invoked about 100k to 150k programs (our program plus a few 3rd party verification programs). In the Linux CI the whole thing ran reasonably fast, but the Windows CI took double as long. I don't recall the exact numbers, but if Windows incurs a 50ms overhead per program call you're looking at 1:20 (one hour twenty minutes) more runtime at 100k invocations.
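The back-of-the-envelope arithmetic above can be checked in a few lines. The 50 ms figure is the hypothetical per-call overhead from the comment, not a measurement, and `extra_runtime_seconds` is a name invented for this sketch.

```python
def extra_runtime_seconds(invocations, overhead_ms):
    """Total added wall time, in seconds, from a fixed per-process overhead."""
    return invocations * overhead_ms / 1000


if __name__ == "__main__":
    for invocations in (100_000, 150_000):
        seconds = extra_runtime_seconds(invocations, overhead_ms=50)
        print(f"{invocations} invocations: {seconds:.0f} s, about {seconds / 60:.0f} min")
```

At 100k invocations that comes to 5000 seconds, roughly 83 minutes, which matches the "about one hour twenty" estimate.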
Also, I'm pretty sure I've built LLVM with 16GB of memory. It took less than 10 minutes on an i7-2600. The number of files is a trade-off: you can combine a bunch of small files into one large file to reduce the build time. You can even write a tool that does that automatically on every compile (and keeps sane debug info). But now incremental builds take longer, because even if you change only one small file, the combined file needs to be rebuilt. That's a problem for virtually all compiled languages.
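A rough sketch of what such a combining tool could look like, under the assumption that it simply generates one translation unit that `#include`s the small source files (often called a unity or jumbo build). The function name and file names are hypothetical, and a real tool would also have to deal with name clashes between files.

```python
from pathlib import Path


def unity_source(sources):
    """Return the text of a single translation unit that includes every source file."""
    lines = ["// Generated file: do not edit."]
    # Sorting keeps the generated file stable from one build to the next.
    lines += [f'#include "{Path(src).as_posix()}"' for src in sorted(sources)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(unity_source(["parser.cpp", "lexer.cpp", "codegen.cpp"]), end="")
```

Because the originals are pulled in with `#include` instead of being pasted together, the compiler still sees the original file names and line numbers, which is one way to keep the debug info sane.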
I can only guess, I am neither a LLVM nor a MSVC dev.
1. Compile times: If you have one file with 7000 LOC and change one function in that file, the rebuild is slower than if you had 7 files with 1000 LOC each.
2. Maintainability: Instead of putting a lot of code into one file, you put the code in multiple files for better maintainability. IIRC LLVM was FOSS from the beginning, so making it easy for lots of people to make many small contributions is important. I guess .NET was conceived as being internal to MS, so fewer people overall, but newcomers were probably assigned to a team for onboarding and then contributed to the project as part of that team. In other words: at MS you can call up the person or team responsible for that 10000 LOC monstrosity; but if all you've got is a bunch of names with e-mail addresses pulled from the commit log, you might be in for a bad time.
3. Generated code: I don't know if either project commits generated code into the repository. That can skew these numbers as well.
4. Header files can be a wild card, as it depends on how they're written. Some people/projects just put the signatures in there and not too many details, others put whole essays as docs for each {class, method, function, global} in there, making them huge.
For the record, by your stats .NET has 1180 LOC per file and LLVM 391 on average. That doesn't say a lot, the median would probably be better, or even a percentile graph. Broken down by type (header/definition vs. implementation). You might find that the distribution is similar and a few large outliers skew it (especially generated code). Or when looking at more, big projects you might find that these two are outliers. I can't say anything definite, and from an engineering perspective I think neither is "suspicious" or even bad.
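The averages quoted above follow directly from the numbers in the parent comment, as this sketch shows. The second half uses made-up per-file counts (clearly not real data from either project) just to show how a single large generated file can pull the mean far away from the median.

```python
from statistics import mean, median

# Line and file counts as quoted in the parent comment.
PROJECTS = {
    ".NET Compiler Infrastructure": (20_587_028, 17_440),
    "LLVM": (45_673_398, 116_784),
}


def mean_loc_per_file(lines, files):
    """Average lines of code per file."""
    return lines / files


if __name__ == "__main__":
    for name, (lines, files) in PROJECTS.items():
        print(f"{name}: {mean_loc_per_file(lines, files):.0f} LOC per file")

    # Hypothetical per-file sizes: four ordinary files and one generated outlier.
    sizes = [100, 120, 150, 200, 10_000]
    print(f"mean {mean(sizes)}, median {median(sizes)}")
```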
My gut feeling says 700 would be a number I'd expect for a large project.
I assume the parent was talking about the fragmentation in the ecosystem (fair point, especially regarding package management landscape and build tooling), but it's unclear.
> Is performance inversely proportional to dev experience?
No. I feel there is great developer experience in many high performance languages: Java, C#, Rust, Go, etc.
In fact, for my personal tastes, I find these languages more ergonomic than many popular dynamic languages. Though I will admit that one thing that I find ergonomic is a language that lifts the performance headroom above my head so that I'm not constantly bumping my head on the ceiling.
TCC is a fast compiler. So fast that at one time, one could use it to boot Linux from source code! But there's a downside: the code it produces is slow. There's no optimization done. None. So the trade-off seems to be: compile fast but get a slow program, or compile slowly but get a fast program.
The trade-off is more of a gradient: e.g. PGO allows an instrumented binary to collect runtime statistics and then use those to optimize hot paths for future build cycles.
I wish product designers took performance into consideration when they designed applications. Engineers can optimize until their fingers fall off, but if the application isn't designed with efficiency in mind (and willing to make trade-offs in order to achieve that), we'll probably just end up right back in the same place.
And a product which is designed inefficiently where the engineer has figured out clever ways to get it to be more performant is most likely a product that is more complicated under the hood than it would be if performance were a design goal in the first place.
Rather than bare C something like C++, Rust, or even Haskell would be better. C isn't the fastest, especially not with normal code. C++ templates get a bad rep, but if you want to go fast they are extremely hard to beat.
Also those languages show you don't actually have to give up modern features or even that much convenience in order to get blazing fast speeds.
At all my recent jobs, I grow frustrated with how slow running a single unit test is locally on a codebase. We are talking 5+ seconds for even the most trivial of trivial unit tests (say, purely functional arithmetic unit test).
And this is even with dynamic languages like Python (you see pytest reporting how your unit test completed in 0.00s, and wall time is 7s).
And then I get grumpy if they don't let me go and fix it because I am the only one who is that annoyed with this :D
How on earth are you getting 5 seconds for simple tests? Simple tests should be running in 8ms, and those are my 2015 numbers that I've been too lazy to update.
Have you worked on a recent idiomatic development setup (dockerised local development, top level imports of everything and plenty of setup at the top level too, people unfamiliar with how to manage .pyc files so they simply disable them...)?
Common libraries like requests or sqlalchemy take 300-500ms to import (eg. try `time python3 -c 'import requests'` and contrast just `time python3 -c ''` which is python startup overhead).
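The same comparison can be scripted. This is a minimal sketch with invented helper names (`time_python`, `import_cost`); it uses the standard-library `json` module as a stand-in so it runs anywhere, and you would swap in `requests` or `sqlalchemy` if they are installed.

```python
import subprocess
import sys
import time


def time_python(code, runs=3):
    """Best-of-N wall time, in seconds, for a fresh interpreter running `code`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_cost(module, runs=3):
    """Approximate import cost: time with the import minus bare interpreter startup."""
    baseline = time_python("", runs)
    with_import = time_python(f"import {module}", runs)
    return with_import - baseline


if __name__ == "__main__":
    print(f"bare startup: {time_python('') * 1000:.0f} ms")
    print(f"import json:  {import_cost('json') * 1000:.0f} ms on top of startup")
```

For a per-module breakdown, CPython 3.7+ also has `python -X importtime -c 'import requests'`, which prints how long each imported module took.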
As I said, tests run in sub 10ms, but from issuing pytest to completion it's usually 5-15s.
Ah, I see, so the setup time is very slow. I don't work in python much but I've worked in a few other languages with slow startup, and amortization is your friend. It's hard though when you have a small module with 'only' 300 tests and your test is 6ms of code that works out to 40ms once setup and teardown are included. I haven't had many opportunities to have the "well maybe you should be making bigger modules" conversation but I am ready for that moment to arise.
This is usually the point at which I pull out a 'watch' implementation, since the 5 seconds it's going to take me to switch windows and hit 'up' the right number of times counts too, if we're comparing apples to apples.
That said, one of the last times I had a unit testing mentor, I walked into a project that ran 3800 tests in about 7 seconds, and then started poking around trying to figure out who was materially responsible. (He didn't know much more than me from an implementation standpoint, but boy was he good at selling people on test quality.) If that had been 20 seconds it would have still been lovely, but it wouldn't have grabbed my attention quite as much.
While I'll take a bite at this, I think it's also fair to point out how poorly portable C is. Can a mobile or web engineer quickly take some C code and use it in their stack somehow? I would guess not. While it's indeed an important lesson to see the speed of some of these 'close to the metal' languages, how practical they are to use is a different question.
There is a class of C code that can be made extremely portable: pure computations. This allows you to write self contained code with zero dependencies, and if you're willing to give up on SIMD you can stick to fully conforming C99.
It's not applicable for everything, but we do have some niches where it comes in handy: cryptographic libraries (I've written one), parsers and encoders of all kind, compilers…
For instance, can a mobile or web engineer quickly take TweetNaCl or Monocypher and use it in their stack? Yes. They may need to write some bindings themselves, but if they can run C code at all it's fairly trivial.
This was about my experience switching from webpack to ESBuild for Javascript. Why do incremental builds if rebuilding the whole thing takes just 2s (as opposed to 90+ with webpack).
They are. They've just chosen to spend all their speed gains on more optimization passes and static analysis, to produce ever faster outputs than to produce an output faster.
Their fundamental model is one translation unit at a time, while developers decided that writing all library code in headers is a good idea. Which makes them parse and DCE literally kilometers of mostly irrelevant code again and again. You’re not wrong, but it’s not the complete picture. C++ development is slow as a whole, and compilers/standards do nothing to fix that. It’s a kind of F1 engine in a tractor situation.
> while developers decided that writing all library code in headers is a good idea.
It wasn't developers who designed C++'s template model which requires generic code to be fully defined in header files.
Inheriting C's textual include file based "module" system and then bolting compile-time specialized generics is a choice the C++ committee made, not C++ users. It was probably the right choice given C++'s many very difficult constraints, but that's what directly leads to huge compile times, not dumb C++ users.
Please don't write programs in bare C. Use Go if you're looking for something very simple and fast-enough for most uses; it's even memory safe as long as you avoid shared-state concurrency.
Unqualified "fast enough" is pretty much exactly the problem being pointed out. Most developers have no idea what "fast" is, let alone "fast enough". If they were taught to benchmark with a lower-level language and see what adding different abstractions costs, that would help a ton.
I would personally suggest C++ though because there is such a huge amount of knowledge around performance and abstraction in that community - wonderful conference talks and blog posts to learn from.
Go comes from a different school of compiler design where the code generation is decent in most cases, but struggles with calculations and more specific patterns. Delphi is a similar compiler. Looking at benchmarks, the performance is only a few times worse than optimized C. That's on par with the most optimized JITed languages like Java, while being overall a much simpler compiler. I feel it is fair to say 'good enough' in this situation.
It's not an "unqualified" claim, Go really is fast enough compared to the likes of Python and Ruby. I'm not saying that rewriting a Go program in a faster language (C/C++/Rust) can't sometimes be effective, but that's due to special circumstances - it's not something that generalizes to any and all programs.
You've obviously been burned by null pointers (probably not just once). And you think they are a problem, and you're right. And you think they are a mistake, and you could be right about that, too.
But they're not the only problem. Writing async network servers can be a problem, too. Go helps a lot with that problem. If for your situation it helps more with that than it hurts with nulls, then it can be a rational choice.
And, don't assume that go must be a bad choice for all programmers, in all situations. It's not.
Nothing wrong with any of these languages, especially C. It's been around since the early 70s and is not going anywhere. There's a very good reason it (and to an extent C++) is still the default language for doing a lot of things: everyone understands it.
C and C++ both have excellent library support, perhaps the best interop of any language out there and platform support that cannot be beat.
That said, they're also challenging to use for the "average" (median) developer who'd end up creating code that is error-prone and would probably have memory leaks sooner or later.
Thus, unless you have a good reason (of which, admittedly, there are plenty) to use C or C++, something that holds your hand a bit more might be a reasonable choice for many people out there.
Go is a decent choice, because of a fairly shallow learning curve and not too much complexity, while having good library support and decent platform support.
Rust is a safer choice, but at the expense of needing to spend a non-insignificant amount of time learning the language, even though the compiler is pretty good at being helpful too.
> That said, they're also challenging to use for the "average" (median) developer who'd end up creating code that is error-prone and would probably have memory leaks sooner or later.
Many of the most highly credentialed, veteran C developers have said they can't write secure C code. Food for thought.
> Go is a decent choice, because of a fairly shallow learning curve and not too much complexity, while having good library support and decent platform support. Rust is a safer choice, but at the expense of needing to spend a non-insignificant amount of time learning the language, even though the compiler is pretty good at being helpful too.
Go doesn't have the strongest static guarantees, but it does provide a decent amount of static guarantees while also keeping the iteration cycle to a minimum. Languages like Rust have significantly longer iteration cycles, such that you can very likely ship sooner with Go at similar quality levels (time savings can go into catching bugs, including bugs which Rust's static analysis can't catch, such as race conditions). Moreover, I've had a few experiences where I got so in-the-weeds trying to pacify Rust's borrow-checker that I overlooked relatively straightforward bugs that I almost certainly would've caught in a less-tedious language. Sometimes static analysis can be distracting and, in that respect, harm quality (I don't think this is a big effect, but it's not something I've seen much discussion about).
There is unsecure code hidden in every project that uses any programming language ;)
I get what you're saying here, you're specifically talking about security vulnerabilities from memory related errors. I honestly wonder how many of these security vulnerabilities are truly issues that never would have come up in a more "secure" language like Java, or if the vulnerabilities would have just surfaced in a different manner.
In other words, we're constantly told C and C++ are unsafe languages that should never be used and blah blah blah. How much of this is because C has been around since the 1970s, so it's had a lot more time to rack up large apps with security vulnerabilities, whereas most of the newer languages recommended to replace C and C++ have only been around since the late 90s? In another 20 years will we be saying the same thing about Java that people say about C and C++? And will we be telling people to switch to the latest and greatest because Java is "unsafe"? Are these errors due to the language, or is it because we will always have attackers looking for vulnerabilities that will always exist because programmers are fallible and write buggy code?
> In another 20 years will we be saying the same thing about java that people say about C and C++? And will we be telling people to switch to the latest and greatest because Java is "unsafe"?
As long as the vulnerability types that cause trouble in language B are a superset of those that cause trouble in language C, it makes sense to recommend moving from B to C for safety reasons.
This is true even if there is a language A that is even worse and in the absence of language C, we recommended moving from A to B. Code written in A will be worse in expectation than code written in B than code written in C.
> I honestly wonder how many of these security vulnerabilities are truly issues that never would have come up in a more "secure" language like Java, or if the vulnerabilities would have just surfaced in a different manner.
Memory safety vulnerabilities basically boil down to the following causes: null pointer dereferences, use-after-free (/dangling stack pointers), uninitialized memory, array out-of-bounds, and type confusion. Now, strictly speaking, in a memory-safe language, you're guaranteed not to get uncontrollable behavior in any of these cases, but if the result is a thrown exception or panic or similar, your program is still crashing. And I think for your purposes, such a crash isn't meaningfully better than C's well-things-are-going-haywire.
That said, use-after-free and uninitialized memory vulnerabilities are completely impossible in a GC language--you're not going to even get a controlled crash. In a language like Rust or even C++ in some cases, these issues are effectively mitigated to the point where I'm able to trust that it's not the cause of anything I'm seeing. Null-pointer dereferences are not effectively mitigated against in Java, but in Rust (which has nullability as part of the type), it does end up being effectively mitigated. This does leave out-of-bounds and type confusion as two errors that are not effectively mitigated by even safe languages, although they might end up being safer in practice.
It depends on what you mean by mitigated. Java mitigates null pointers by deterministically raising an exception (as well as out of range situations), but indeed it doesn’t handle them at compile time (though the latter can’t even be solved in the general case, and only with dependent types)
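To make "deterministically raising an exception" concrete, here is a tiny sketch in Python, used only as a stand-in for a memory-safe language (the thread is about Java and Rust, and `describe_failure` is a name invented here). The out-of-range read and the null dereference both fail in a controlled, catchable way instead of touching arbitrary memory.

```python
def describe_failure(action):
    """Run `action` and report the exception type it raised, or "ok" if none."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    data = [1, 2, 3]
    nothing = None
    print(describe_failure(lambda: data[1]))          # in range
    print(describe_failure(lambda: data[10]))         # out of range
    print(describe_failure(lambda: nothing.upper()))  # "null" dereference
```

The program still fails, as the parent comment says, but it fails the same way every time, at the faulty access, which is the sense of "mitigated" being argued here.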
> There is unsecure code hidden in every project that uses any programming language ;)
Security isn't a binary :) Two insecure code bases can have different degrees of insecurity.
> I honestly wonder how many of these security vulnerabilities are truly issues that never would have come up in a more "secure" language like Java, or if the vulnerabilities would have just surfaced in a different manner.
I don't know how memory safety vulns could manifest differently in Java or Rust.
> In other words, we're constantly told C and C++ are unsafe languages that should never be used and blah blah blah. How much of this is because C has been around since the 1970s, so it's had a lot more time to rack up large apps with security vulnerabilities
That doesn't address the veteran C programmers who say they can't reliably write secure C code (that's new code, not 50 year old code).
> Are these errors due to the language, or is it because we will always have attackers looking for vulnerabilities that will always exist because programmers are fallible and write buggy code?
A memory safe language can't have memory safety vulnerabilities (of course, most "memory safe" languages have the ability to opt out of memory safety for certain small sections, and maybe 0.5% of code written in these languages is memory-unsafe, but that's still a whole lot less than the ~100% of C and C++ code).
Of course, there are other classes of errors that Java, Rust, Go, etc can't preclude with much more efficacy than C or C++, but eliminating entire classes of vulnerabilities is a pretty compelling reason to avoid C and C++ for a whole lot of code if one can help it (and increasingly one can help it).
First of all, you’re comparing “most PHP and JS programmers” with veteran C programmers, and secondly most PHP and JS programmers can write code which is secure against memory-based exploits.
It is easier to just pick an existing library and deal with security flaws, than trying to ramp up an ecosystem from scratch, unless one has the backing of a multinational pumping up development.
Yes. For some reason programming culture repeatedly fails to realise that if you want to group languages into two buckets by performance with one being "like C" and the other being "like Python" then all the languages you list (except maybe JS) belong in the "like C" bucket.
I mean, he just explained that after rewriting his program in Dart, it was fast enough? That's not really the point here.
On the other hand, I tried writing a Wren interpreter in Go and it was considerably slower than the C version. Even programming languages that are usually pretty fast aren't always fast, and interpreter inner loops are a weak spot for Go.
> I mean, he just explained that after rewriting his program in Dart, it was fast enough?
Yes, and that makes his C advocacy even less sensible. Dart is a perfectly fine language, even though it seems to be a bit underused compared to others.
I didn't advocate that anyone ship production code written in C.
I advocated that people write programs in C and run them to see how fast executables can startup and run.
(Dart isn't great for that because while its runtime performance is pretty fantastic, it does still take a hit on startup because it's a VM with a fairly large core library and runtime system.)
Spending "a little time writing some programs in C" is not the same as advocating that people write most of their code in C, or that you use it in production.
Maybe try reading Crafting Interpreters, half of which is in Java and half in C.
I upgraded a desktop machine the last time I visited my family. It was a Windows 7 computer that was at least 10 years old with 4GB of ram. They wanted to use it online for basic web browsing, so I thought I'd install Windows 10 for security reasons and drop in a modern SSD to upgrade the old 7200rpm drive to make it more snappy.
Well, it felt slower after the "upgrade". Clicking the start menu and opening something like the Downloads or Documents folder was basically instant before. Now, with Windows 10 and the new SSD there was a noticeable delay when opening and browsing folders.
It really made me wonder how it would be running something like Windows 98 and websites of the past on modern hardware.
I wonder if you'd have any more luck with that hardware putting Ubuntu Mate on it. For basic web browsing, it probably wouldn't matter much to your family whether it's running Windows or Linux.
Problem with Ubuntu is it doesn’t auto update and it’s very hard to get it to do that. Not sure it’s even possible to auto update major releases as well.
Every time I have installed Ubuntu for someone, I have come back years later and it’s still on the same version.
I am not sure about major release upgrades. But if you are on an LTS release, this should cover it for five years. And as much as I dislike snaps, they do auto updates too, so in 22.04 Firefox at least keeps up-to-date too.
Throw in more RAM and Windows 10 will likely feel snappier than Windows 7 did.
It's probable the old Windows 7 install was 32-bit while your fresh install of 10 would have defaulted to 64-bit. That combined with 10's naturally higher memory requirements means the system has less overhead to work with.
Recently I've seen new laptops being shipped with 4GB, possibly with a slightly lighter (but not fully debloated) version of 10 (Home? Starter? Edu?).
I'm not sure if this is because Windows memory usage is a lot more efficient now, or if the newer processors' performance can cancel out the RAM capacity bottleneck, or if PC4-25600 + NVMe pagefiles are simply fast enough, or if manufacturers are spreading supply thinly during the chip shortage. But it's certainly an ongoing trend.
Mother-in-law bought a machine with 4GB of RAM, which was fine before Windows 10. Now it spends all day doing page/sysfile swap from its mechanical hard drive. Basically unusable.
So here in my pocket is an 8GB stick of DDR3 sodimm for later.
32-bit PAE has been supported since Windows XP and initially allowed more than 4GB of RAM to be used, but driver issues made Microsoft put a soft cap at 4GB in this mode[0]. But 32-bit Win7 with PAE would surely have been able to use all of those 4GB fine.
Try Win-R and type "notepad", at a reasonably fast programmer's pace. It consistently loses "no" for me, sometimes more if it's feeling particularly slow.
This should involve absolutely zero disk reads or anything of the sort; it's a window that runs a command. And it used to work reliably in past years. It feels like keyboard input simply isn't buffered like it used to be. Calculator is even worse, as it loses input if you start typing the formula too soon. It used to be very easy to do casual calculations; now I have to wait for the computer.
In a similar vein I installed Ubuntu on an older laptop that had been running Windows 10. I was shocked at how fast it was compared to Windows 10, it was night and day.
This is part of it: many things are "fast enough" now that, where you used to have caches that would display things nearly instantly, you no longer have those, and it reads from disk each time it needs to show the folder, etc.
This is very visible in any app that no longer maintains "local state" but instead is just a web browser to some online state (think: Electron, teams, etc). Disconnect the web or slow it down and it all goes to hell.
That's interesting, I cloned a Win10 installation on a HDD to a sata SSD a year or two back and the speed difference was considerable. Especially something like Atom that took minutes to open before was ready to go in like 10 seconds afterwards.
Somewhere around IIRC Win8 Microsoft must have gotten really lax about minimizing disk access. Windows started being slow as molasses on an HDD, even for stuff like opening the start menu.
This hurts performance a ton on SSDs, too, it's just less noticeable. Something that should happen so fast you can hardly measure how long it takes, takes... just long enough to notice, which may amount to 100x as long as it should take, but 100x a small number is still pretty small.
Yeah the change from a 7200 HDD to an SSD for those 10 year old machines provides a very considerable improvement. It goes from "unusable" to "moderate" performance for general web browsing and business duties.
I'm talking about Windows 10 on 4G C2Q or Phenom/Phenom II machines - they aren't fast but they're very usable with a SSD and GPU in place.
You're comparing 10 to 10, so of course an SSD will only help in that situation.
But if any parts of 10 are sufficiently badly coded compared to 7, that will overcome the drive. And some parts definitely are, especially in the start menu code.
10 years of malware definition updates. 10 years of countless security additions. Every operation needs to be checked for correctness, memory safety, etc.
I hope one day latency in general will be "back to normal".
I still remember how fast console based computing, an old gameboy or a 90's macintosh would be - click a button and stuff would show up instantly.
There was a tactility present with computers that's gone today.
Today everything feels sluggish. Just writing this comment on my $3000 MacBook Pro, I can feel the latency, and sometimes there are even small pauses. A little when I write stuff, a lot when I drag windows.
Hopefully the focus on 100hz+ screens in tech in general will put more focus on latency from click to screen print - now when resolution and interface graphics in general are close to biological limits.
I'm on an M1 Air (cheapest base model), and I use it largely for writing (also dev but I get that that's not your question).
- For native M1 apps like Pages, Sublime, or Highland there's no lag at all. For example, with Highland 2 from double-clicking a file to editing it is less than a second and there's no lag during use even with a 49,000 word book manuscript open.
- For x86 apps like the not-quite-latest Office there's a couple of seconds at first launch (for that session) whilst Rosetta does its x86 translation work, but after that it launches without lag for the remainder of that session and it stays snappy in use (snappy for Word that is).
- Native VS Code goes from launch to editing in under two seconds and never lags, even with something like side-by-side Markdown preview going.
- If you're using Vellum for publishing it's about 1.5 seconds from double-clicking a file to editing it.
That's very good to hear. I've been looking at the MacBook Air too, because they're pretty much the kings when it comes to battery life for a handbag-sized laptop. I think the bigger MacBooks have slightly better battery life, but you can't really fit those in a smaller bag; you kind of need a backpack or a laptop-specific bag for them.
> I've been looking at MacBook Air also because they're pretty much the kings when it comes to battery life for a handbag sized laptop.
Battery life is, indeed, impressive.
Last night I spent around 5 hours doing C# dev in VS Mac, with multiple projects being built every few minutes, cross-platform binaries for Intel Mac, Windows, and Linux being produced every half hour or so, plus Highland 2, Word 2016, and Vellum. With all that it used 28% battery across that 5 hours (and never got warm). On full brightness too (for my sins).
I know the question isn't about dev, but writing uses less resources and gives even better battery life so 18 hours (for example) is definitely possible.
The only issue I have is the keyboard. Far better than the 'broken' ones of a few years ago but I really wish they'd go for thicker machines and increase the travel. I've just got rid of my last ThinkPad and it's the one thing I miss.
Oh, and there is no longer a hotkey to control the backlight brightness; it's automatic. Which genuinely works perfectly except that it doesn't come on for your very first sign in at boot-up, so entering your password then can be tricky without ambient light (though after that you can use the fingerprint reader). It's a really strange UX flaw. Not related to your question, I know, but you don't say whether you're already on a Mac or switching so I wanted to be honest about this as it is really annoying but rarely mentioned.
I have an M1 Air that I'm typing on right now and have not had any sluggishness concerns besides when switching between Spaces. Even that is more of a visual stutter than actual lag to the point where the animation takes longer than usual. This is the first thin & light computer I've owned whose performance I'm 100% happy with.
Weird. I don't use Spaces (this is the multiple desktops thing, right?) but I've just tried it and it's not laggy at all for me. I turn on the reduce motion thing, so it fades between them rather than swiping, but neither feel laggy.
(I'm on an M1 Air and I think the performance is great)
Most flagship Android phones are >60hz and have been for a few years. Flagship iPhones and iPads are >60hz. Very nearly every gaming laptop is >60hz. Many new TVs are >60hz with inputs to match.
My guess is that few people have stopped to compare them. I've never knowingly seen a 100+hz screen in person, so I stopped by a local store. Sure enough, I could tell that the motion was smoother. Bought 2. After using those, I can feel my older monitors that I'm using to write this are choppy.
But do you notice the smoothness on a day-to-day basis or have you, in a way, crippled yourself, because now the majority of monitors feel choppy to you?
Sounds a bit like the, 'Never meet your heroes', thingy.
I 100% notice it but interestingly it doesn’t affect me on my laptop/desktop much since I use a mouse and scrolling is already not smooth. While mobile has smooth scrolling and a lot more animations/swipes.
Do you think that besides gaming there is really any need to move to higher than 60Hz on desktops and laptops?
My phone (POCO X3 PRO) allowed me to turn on 120Hz but when I do I don't notice any change except if I really look at it, like scrolling up and down very quickly while looking behind the phone I notice a difference, but otherwise I don't notice it, so I just have it turned off, should give more battery life.
True, it's probably just bleeding edge, but I've noticed several flagship phones have 90Hz, and the new iPad Pros have up to 120Hz "smooth scrolling", so it seems something will be happening x years down the line.
For me, there is far more latency on typical operations, but far less waiting for longer intensive operations like opening a program/tab or saving a file (bloat aside, some are guilty here).
I'd also prefer the sluggishness gone if I had my choice between the two.
It's not only a matter of 750ms instead of 200us. I'm astonished every time I open some tool like Visual Studio, SAP Power Designer, or Libre Office that can stay for the better part of a minute on its loading screen.
What do those tools even do for that long? They can read enough data from the disk to overflow my computer's main memory a few times during it.
I heard optimization described this way: Sure, you think you need to tune the engine, but really, the first thing you need to do is get the clowns out of the car.
I remember a video of a guy running an old version of Visual C++ on an equally old version of Windows, in a VM on modern hardware, to try Windows development "the old way". It took about one frame to launch. One. Frame.
By the way, Apple isn't much better. Xcode takes around 15 seconds to launch on an M1 Max.
Not only does Visual Studio start up instantly in an older version of Windows running in a VM; debugger values update instantly there as well, something that Visual Studio can no longer do.
I really liked Win 2000 because of this feeling of speed. Most programs would simply "open" when you clicked their icon. There wouldn't be a loading screen. I remember getting frustrated because I could not look at the pretty splash screen that Excel had added because it would flash and disappear in milliseconds. And this was on hardware of that time.
Just based on memory, Visual C++ 6 was written using the good old Win32 API, which is just plain C code. Without access to the source code, I can assume that the object-oriented craze and XML fad had not corrupted that codebase. Superb software.
Visual C++ 7 was rewritten to use another SDK, likely based on .Net, and it was noticeably slower. The problem, as I see it, is people don't understand the cost of abstractions and intermediate layers, and add them gratuitously. This has been a trend ever since.
> Xcode takes around 15 seconds to launch on an M1 Max
Not really related to launch time but it’s hilarious how much faster Xcode is when working with Objective-C compared to Swift. I understand why, but it’s still jarring
Of video. Which probably was 30 fps. I mean, the splash screen just blinked for a barely noticeable split second before the main window appeared. You double click the shortcut, and it's already done launching before you realize anything. That's how fast modern computers are.
(actually, some things on the M1 are fast enough that I'm now getting annoyed at networking taking what feels like ages)
Why would you assume video is at 30fps? Geographic location? People not in the US (and a handful of other countries) would assume video framerate of 25fps.
Does the refresh rate of a computer monitor get referred to as frames? Usually, it's just the frequency, in units like 120Hz. Sorry for the conversation break, but I've just never heard app start-up times given with a framerate reference. It was just an unusual enough thing that I let my brain wander on it longer than necessary.
Oh ffs. First off, I'm not from the US. I've been there for less than a month combined. Secondly, if you do want to nitpick, at least do some research first. The video in question is 60 or 30 fps depending on the quality setting.
$ yt-dlp -F https://www.youtube.com/watch?v=j_4iTovYJtc
[youtube] j_4iTovYJtc: Downloading webpage
[youtube] j_4iTovYJtc: Downloading android player API JSON
[youtube] j_4iTovYJtc: Downloading player df5197e2
[info] Available formats for j_4iTovYJtc:
ID EXT RESOLUTION FPS │ FILESIZE TBR PROTO │ VCODEC VBR ACODEC ABR ASR MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
sb2 mhtml 48x27 │ mhtml │ images storyboard
sb1 mhtml 80x45 │ mhtml │ images storyboard
sb0 mhtml 160x90 │ mhtml │ images storyboard
139 m4a audio only │ 46.85MiB 48k https │ audio only mp4a.40.5 48k 22050Hz low, m4a_dash
249 webm audio only │ 49.06MiB 51k https │ audio only opus 51k 48000Hz low, webm_dash
250 webm audio only │ 63.84MiB 66k https │ audio only opus 66k 48000Hz low, webm_dash
140 m4a audio only │ 124.33MiB 129k https │ audio only mp4a.40.2 129k 44100Hz medium, m4a_dash
251 webm audio only │ 125.02MiB 130k https │ audio only opus 130k 48000Hz medium, webm_dash
17 3gp 176x144 8 │ 56.70MiB 59k https │ mp4v.20.3 59k mp4a.40.2 0k 22050Hz 144p
160 mp4 256x144 30 │ 37.86MiB 39k https │ avc1.4d400c 39k video only 144p, mp4_dash
278 webm 256x144 30 │ 42.59MiB 44k https │ vp9 44k video only 144p, webm_dash
133 mp4 426x240 30 │ 84.31MiB 87k https │ avc1.4d4015 87k video only 240p, mp4_dash
242 webm 426x240 30 │ 70.03MiB 72k https │ vp9 72k video only 240p, webm_dash
134 mp4 640x360 30 │ 167.27MiB 174k https │ avc1.4d401e 174k video only 360p, mp4_dash
18 mp4 640x360 30 │ 352.24MiB 366k https │ avc1.42001E 366k mp4a.40.2 0k 44100Hz 360p
243 webm 640x360 30 │ 134.68MiB 140k https │ vp9 140k video only 360p, webm_dash
135 mp4 854x480 30 │ 294.98MiB 307k https │ avc1.4d401f 307k video only 480p, mp4_dash
244 webm 854x480 30 │ 233.37MiB 243k https │ vp9 243k video only 480p, webm_dash
136 mp4 1280x720 30 │ 653.31MiB 680k https │ avc1.4d401f 680k video only 720p, mp4_dash
22 mp4 1280x720 30 │ ~795.07MiB 808k https │ avc1.64001F 808k mp4a.40.2 0k 44100Hz 720p
247 webm 1280x720 30 │ 548.72MiB 571k https │ vp9 571k video only 720p, webm_dash
298 mp4 1280x720 60 │ 817.18MiB 850k https │ avc1.4d4020 850k video only 720p60, mp4_dash
302 webm 1280x720 60 │ 651.39MiB 678k https │ vp9 678k video only 720p60, webm_dash
And the units? Hz and FPS are generally interchangeable but FPS is more often used as a measure of how fast something renders while Hz is more often used for monitor refresh rates (a holdover from CRTs I guess).
Exactly. Users are subsidizing the software provider with CPU cycles and employee time.
Assume it costs $800 for an engineer-day. Assume your software has 10,000 daily users and that the wasted time cost is 20 seconds (assume this is actual wasted time when an employee is actively waiting and not completing some other task). Assume the employees using the software earn on average 1/8 of what the engineer makes. It would take less than 4 days to make up for the employee's time. That $800 would save about $80,000 per year.
Obviously, this is a contrived example, but I think it's a conservative one. I'm overpaying the engineer (on average) and probably under-estimating time wasted and user cost.
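As a rough check on the arithmetic above, here is a minimal Python sketch. The 8-hour working day and the 250 working days per year are assumptions of this sketch, not figures from the comment, and `wasted_time_cost` is a hypothetical helper name. Under these assumptions the break-even point comes out well under the 4 days mentioned, and the yearly figure lands above the quoted $80,000, so the exact number depends heavily on how often the 20-second wait is assumed to occur.

```python
def wasted_time_cost(users, seconds_wasted, engineer_day_cost,
                     user_pay_ratio, hours_per_day=8.0):
    """Daily cost of users waiting, in the same currency as engineer_day_cost.

    hours_per_day is an assumption of this sketch (not stated in the thread).
    """
    engineer_hourly = engineer_day_cost / hours_per_day
    user_hourly = engineer_hourly * user_pay_ratio
    wasted_hours = users * seconds_wasted / 3600.0
    return wasted_hours * user_hourly


ENGINEER_DAY_COST = 800.0      # from the comment
WORK_DAYS_PER_YEAR = 250       # assumption, not from the comment

# 10,000 daily users, 20 seconds wasted each, users paid 1/8 of the engineer.
daily_cost = wasted_time_cost(10_000, 20, ENGINEER_DAY_COST, 1 / 8)

# Days of user waiting needed to pay back one engineer-day of optimisation.
break_even_days = ENGINEER_DAY_COST / daily_cost

# Saving over a working year if the 20-second wait disappears entirely.
yearly_saving = daily_cost * WORK_DAYS_PER_YEAR

print(f"daily cost of waiting: ${daily_cost:,.2f}")
print(f"break-even after {break_even_days:.2f} days")
print(f"yearly saving: ${yearly_saving:,.0f}")
```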
Servers are expensive, too. Humans waiting on servers to process something is even more expensive. No software runs in a vacuum; someone is waiting on it somewhere.
Adding more servers doesn't generally make things faster (latency). It only raises capacity (bandwidth). It does, however, generally cost quite a bit on development. Just about the only thing worse than designing a complex system is designing a complex distributed system.
If you don't want to take the advice of running the numbers that's up to you.
E.g. if end user latency is 10ms (and it's not voip or VR or something) then that's fast enough. Doesn't matter if it's optimizable to 10 us.
If this is code running on your million CPU farm 24/7, then yeah. But always run the numbers first.
Like I said, the vast majority of code optimization opportunities are not worth taking. Some are, but only after running the numbers.
On the flip side optimizing for human time is almost always worth it, be it end users or other developers.
But run the numbers for your company. How much does a CPU core cost per hour of its lifetime? Your developers cost maybe $100, but maybe $1000 in opportunity cost.
Depending on what you do a server may cost you as much as one day of developer opportunity time. And then you have the server for years. (Subject to electricity)
Latency and throughput may be better solved by adding machines.
> Like I said, the vast majority of code optimization opportunities are not worth taking. Some are, but only after running the numbers.
Casey Muratori said it best: there are 3 philosophies of optimisation. You're talking about the first: actual optimisation where you measure and decide what to tackle. It's rarely used, and with good reason.
The second philosophy however is very different: it's non-pessimisation. That is, avoid having the CPU do useless work all the time. That one should be applied on a fairly systematic basis, and it's not. To apply it in practice you need to have an idea of how much time your algorithm requires. Count how many bytes are processed, how many operations are made… this should give a nice upper bound on performance. If you're within an order of magnitude of this theoretical maximum, you're probably good. Otherwise you probably missed something.
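A minimal sketch of that "count the bytes" estimate, in Python. The bandwidth figure and the function names are assumptions made up for illustration; the real number depends on the machine and on whether the data comes from cache, RAM, or disk. The idea is only to compare a measured time against a crude theoretical minimum and flag anything more than an order of magnitude off.

```python
# Assumed sustained memory bandwidth for this sketch: 10 GB/s (hypothetical).
ASSUMED_BANDWIDTH_BYTES_PER_S = 10e9


def theoretical_min_seconds(bytes_processed, bandwidth_bytes_per_s):
    """Lower bound on runtime if the only cost were moving the bytes once."""
    return bytes_processed / bandwidth_bytes_per_s


def slowdown_factor(measured_seconds, bytes_processed, bandwidth_bytes_per_s):
    """How many times slower the measured run is than the crude lower bound."""
    return measured_seconds / theoretical_min_seconds(
        bytes_processed, bandwidth_bytes_per_s)


def probably_fine(measured_seconds, bytes_processed, bandwidth_bytes_per_s):
    """True when the run is within one order of magnitude of the bound."""
    return slowdown_factor(
        measured_seconds, bytes_processed, bandwidth_bytes_per_s) <= 10.0


# Example: a job that touches 100 MB of data.
job_bytes = 100e6
print(theoretical_min_seconds(job_bytes, ASSUMED_BANDWIDTH_BYTES_PER_S))
print(probably_fine(0.05, job_bytes, ASSUMED_BANDWIDTH_BYTES_PER_S))  # near the bound
print(probably_fine(0.75, job_bytes, ASSUMED_BANDWIDTH_BYTES_PER_S))  # far from it
```

This deliberately ignores compute cost and I/O; it is a sanity check for spotting gross waste, not a profiler.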
The third philosophy is fake optimisation: heuristics misapplied out of context. This one should never be used, but is more frequent than we care to admit.
> avoid having the CPU do useless work all the time
It's not worth an engineer spending 1h a year even investigating this, if it's less than 20 CPU cores doing useless work.
The break-even for putting someone full time on this is if you can expect them to save about forty thousand CPU cores.
YMMV. Maybe you're a bank who has to have everything under physical control, and you are out of DC floor space, power budget, or physical machines.
There are other cases too. Maybe something is inherently serial, and the freshness of a pipeline's output has business value. (e.g. weather predictions for tomorrow are useless the day after tomorrow)
But if you're saying that this second way of optimizing is that things should be fast for its own sake, then you are not adding maximum value to the business, or the mission.
Performance is an instrumental goal of an effort. It's not the ultimate goal, and should not be confused for it.
In the specific case of batch processing, I hear you. Machine time is extremely cheap compared to engineer time.
Then there are interactive programs. With a human potentially waiting on it. Someone whose time may be just as valuable as the engineer's time (morally that's 1/1, but even financially the difference is rarely more than a single order of magnitude). If you have as few as 100 users, shaving seconds off their work is quickly worth a good chunk of your time.
Machine time is cheap, but don't forget that user's time is not.
You should, however, not pessimize. People make cargo-cult architecture choices that bloat their codebase, make it less readable, and make it 100x slower.
Using actual numbers vetted by actual expenses in an actual company, if you can save 100 CPU cores by spending 3h a year keeping it optimized, then it is NOT worth it.
It is cheaper to burn CPU, even if you could spend one day a year making it max out one CPU core instead of 100.
It can be better for the business to cargo cult.
Not always. But you should remember that the point of the code is to solve a problem, at a low cost. Reducing complexity reduces engineer cost in the future and may also make things faster.
Put it this way: Would you hire someone at $300k doing nothing but optimizing your pipeline so that it takes one machine instead of one rack, or would you spend half that money (TCO over its lifetime) just buying a rack of machines?
If you wouldn't hire them to do it, then you shouldn't spend current engineers' time doing it.
I wasn't talking about optimization! I was talking about non-pessimization, which includes not prematurely abstracting/generalizing your code.
I've seen people making poor decisions at the outset, and having code philosophies that actively make new code 100x slower without any clear gain. Over-generalization, 100 classes and subclasses, everything is an overridden virtual method, dogmatic TDD (luckily, nobody followed that.)
The dogma was to make things more complicated and illegible, 'because SOLID'.
Run the lifetime cost of a CPU, and compare it to what you pay your engineers. It's shocking how much RAM and CPU you can get for the price of an hour of engineer time.
And that's not even all! Next time someone reads the code, if it's "clever" (but much much faster) then that's more human time spent.
And if it has a bug because it sacrificed some simplicity? That's human hours or days.
And that's not even all. There's the opportunity cost of that engineer. They cost $100 an hour. They could spend an hour optimizing $50 worth of computer resources, or they could implement 0.1% of a feature that unlocks a million dollar deal.
Then having them optimize is not just a $50 loss, it's a $900 opportunity cost.
But yeah, shipped software like shrinkwrapped or JS running on client browsers, that's just having someone else pay for it.
(which, for the company, has even less cost)
But on the server side: yes, in most cases it's cheaper to get another server than to make the software twice as fast.
Not always. But don't prematurely optimize. Run the numbers.
One thing where it really does matter is when it'll run on battery power. Performance equals battery time. You can't just buy another CPU for that.
Yeah, it doesn't have a simple answer that works for all cases.
Say you need to do some data processing from format A to B. There's already a maintained codebase for converting from A to C, C to D, and a service that converts individual elements from D to A. All steps require storing back onto disk.
For a one-time thing it'll be MUCH cheaper to do it the naive way reusing existing high level blocks, and going to lunch (or vacation), and let it run.
For a recurring thing, or a pipeline with latency requirements, maybe it's worth building a converter from A to B.
Or… it could be cheaper to just shard A and run it on 20 CPUs.
Let's say you have the expensive piles of abstraction, and they're creating huge waste. At my company one HOUR of engineer time costs about the same as 20 CPUs running for A YEAR.
This means that if you reduce CPU use by 20 cores, forever, then ROI takes a full year. Including debugging, productionizing, and maintenance you pretty much can't do anything in 1h.
Likely your A-to-B converter could take 1h of human time just in ongoing costs like release management.
And to your point about code readability: Sometimes the ugly solution (A-C-D-B) is the one with less code. If you needed the A->C, C->D, D->A components anyway, then writing an A->B converter is just more code, with its potential readability problems.
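To make the hour-versus-cores rule of thumb from this comment concrete, here is a small Python sketch. The ratio of 20 core-years per engineer hour is the figure quoted above and will differ between companies; the 2,000 working hours per year and both function names are assumptions of the sketch. With those inputs, one hour spent saving 20 cores pays back after a year, and a full-time person needs to save about 40,000 cores, which matches the break-even mentioned earlier in the thread.

```python
# Figure quoted in the comment: one engineer hour costs about as much as
# 20 CPU cores running for a year. Treat it as company-specific.
CORE_YEARS_PER_ENGINEER_HOUR = 20.0


def break_even_years(engineer_hours, cores_saved,
                     core_years_per_hour=CORE_YEARS_PER_ENGINEER_HOUR):
    """Years until permanently saving `cores_saved` repays the hours spent."""
    if cores_saved <= 0:
        raise ValueError("cores_saved must be positive")
    return engineer_hours * core_years_per_hour / cores_saved


def cores_to_justify(engineer_hours_per_year,
                     core_years_per_hour=CORE_YEARS_PER_ENGINEER_HOUR):
    """Cores that must stay saved each year to pay for ongoing engineer time."""
    return engineer_hours_per_year * core_years_per_hour


print(break_even_years(1, 20))    # one hour of work, 20 cores saved
print(break_even_years(8, 20))    # a full day of work, same saving
print(cores_to_justify(2000))     # assumed full-time year of 2,000 hours
```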
On the flip side of this: It's been a trend for a long time in web development to just add layers of frameworks and it's now "perfectly normal" for a website to take 10s to load. Like what the fuck, blogspot, how do you even get to the point where you realize you need a "loading" animation, and instead of fixing the problem you actually do add one.
Human lifetimes have been spent looking at just blogspot's cogs spinning.
We shouldn't let people obtain CS degrees until they've had to write at least one fairly-complex program on a platform with little enough RAM that the amount of code in the program starts to be something they have to optimize (because the program itself takes up space in memory, not just the data it uses, which is something we hopefully all know but rarely think about in practice on modern machines). Tens or low hundreds of KB of memory. Get 'em questioning every instruction and every memory allocation.
I'm only half-joking.
[EDIT] For extra lulz let them use a language with a bunch of fancy modern language features so they get a taste of what those cost, when they realize they can't afford to use some of them.
It's not far fetched. Microcontroller programming should not be seen as magic.
And microcontrollers will never get abundant capacity because smaller and more efficient means less battery, no matter the tech level.
So it's not like "everyone should know the history of the PDP-11" which I would disagree with.
During my schooling we built traffic lights and stuff on tiny machines, and even in VHDL, even though desktop machines were hundreds of MHz. They both have a place still.
Regarding chrome, browsers are basically operating systems nowadays. A standards compliant HTML5 parser is at the bare minimum millions of lines of code. Same for the renderer and Javascript engine.
That's true. I'm not saying a browser solves a small and simple problem. But on the other hand Chrome takes much more RAM than the operating system (including desktop environment).
Even after closing all tabs, since tabs (and extensions) are basically programs in this operating system.
Yeah, at 600MB/s, 50 seconds of loading is 15GB... So ok, it can't fill my RAM at HDD speeds, but no, none of those use anything near 15GB of memory at startup. (If they did, my question would be WTF are they doing with gigabytes of memory.) And well, loading from disk ought to be the bottleneck of any reasonable cache.
About pre-computing things (that's very likely the answer), the question is what things? Excluding Visual Studio, those are very plain GUI programs, that have a huge amount of options, but not anything near enough. And on the Visual Studio case, all the indexes and intelligence helpers are certainly cached to disk, as it's impossible to recalculate them at load time (the information just isn't there).
One thing those 3 have in common is that they have complete language emulation environments that are exposed to the user but are not related to their main function. Yet, language emulation environments start-up much faster than that, so they can only explain a small part of that time.
I work at a BigCorp that ships desktop software (but none of the above products) and network latency is (usually) pretty easy to extract out of the boot critical path. Blocking UI with network calls is a big no-no, and I expect any sizeable organization to have similar guidelines.
Work like in the OP's article is probably the most difficult - it's work that is necessary, cannot be deferred, but is still slow. So it requires an expert to dig into it.
Power Designer surely is phoning home but this isn't nearly slow enough to matter here. AFAIK Visual Studio phones in during the operation, and not on startup. Libre Office almost certainly isn't phoning anywhere.
I didn't include the slowest starting software that I know, Oracle SQL Developer, because it's clear that all the slowness is caused by phoning home, several times for some reason. But that's not the case for all of them.
EDIT: Or, maybe it's useful to put it another way. The slowest region in the world for me to ping is around Eastern Asia and Australia. Sometimes, I get around 1.5s round trip time for there. A minute has around 40 of those.
Network lag can be worked around with concurrent programming techniques--you don't even have to use a high-performance language to do it. The problem is that concurrent programming is far beyond what the typical Jira jockey can do--bosses would rather hire commodity drones who'll put up with Agile than put up with and pay for the kind of engineers who can write concurrent or parallel programs.
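A minimal sketch of what that looks like, in Python with `asyncio`. No real network is involved: `fake_request` is a hypothetical stand-in that just sleeps for the given latency, and the request names and latencies are made up. The point it illustrates is that waits issued concurrently overlap, so the total is roughly the slowest request rather than the sum of all of them.

```python
import asyncio
import time


async def fake_request(name, latency_s):
    """Hypothetical stand-in for a network call: just waits, then answers."""
    await asyncio.sleep(latency_s)
    return f"{name}: ok"


async def fetch_sequentially(requests):
    """Issue each request only after the previous one has finished."""
    return [await fake_request(name, latency) for name, latency in requests]


async def fetch_concurrently(requests):
    """Start all requests at once and wait for all of them together."""
    return await asyncio.gather(
        *(fake_request(name, latency) for name, latency in requests))


def timed(make_coroutine, requests):
    """Run one of the fetchers and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(make_coroutine(requests))
    return list(results), time.perf_counter() - start


# Made-up endpoints with 50 ms of simulated latency each.
REQUESTS = [("licence", 0.05), ("updates", 0.05),
            ("telemetry", 0.05), ("news", 0.05)]

seq_results, seq_elapsed = timed(fetch_sequentially, REQUESTS)
conc_results, conc_elapsed = timed(fetch_concurrently, REQUESTS)

print(f"sequential: {seq_elapsed:.3f}s, concurrent: {conc_elapsed:.3f}s")
```

Keeping such calls off the startup path entirely, as another comment suggests, is a separate fix; this only shows how unavoidable waits can overlap.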
I use Visual Studio on an air-gapped machine with no (active) network cards (so Windows / winsock2 knows there is nothing that can respond and any connection should error out immediately) and it still takes almost a minute.
At least VS is just kinda slow, maybe it's the XML parser :D
The answers in this subthread had me thinking more: I am using a company-provided Windows machine and a Linux virtual desktop for the same tasks. The difference in startup times for many applications is night and day. Probably due to virus scanning and MS OneDrive.
> And in the end, the code seems to run "fast enough" and nobody involved really notices that what is running in 750ms really ought to run in something more like 200us.
Nobody has created a language that is both thousands of times faster than Python and nearly as straightforward to learn and to use. The closest thing I know of might be Julia, but that has its own performance problems and is tied closely to its AI/ML niche. Even within that niche I'm certainly not going to get most data scientists to write their code in C or C++ (or heaven forbid Rust) to solve a performance impediment that they've generally been able to work around.
It's great that you've been able to switch to higher-performance languages, but not everyone can do that easily enough to make it worth doing.
The "iterate from notebook to production" process which is common everywhere but the largest data engineering groups rules out anything with manual memory management from becoming popular with data science work.
Some data scientists I know like (or even love) Scala, but that tends to blow up once it's handed over to the data engineers as Scala supports too many paradigms and just a couple DSs will probably manage to find all of them in one program.
We use Go extensively for other things, and most data scientists I've worked with sketching ideas in Go liked it a lot, but the library support just isn't there, and it's not really a priority for any of the big players who are all committed to Python wrapper + C/C++/GPU core, or stock Java stacks. (The performance also isn't quite there yet compared to the top C and C++ libraries, but it's improving.)
I love Scala and wish it was more popular. I've made peace with Java at this point as it slowly adopts my favorite parts of Scala, but I miss how concise my code was.
I think that's my argument. If a developer thinks C or C++ is really that difficult and they can only write effectively in Python, they're a shitty developer and the world seems to be jam packed with them.
As a long-time C# user who started life with coding for embedded systems with C, graduated to C++ business tiers, and then on to C#, my personal crusade has always been to show that it's very possible to make things go pretty fast with C#.
One of my favorite moments happened after my C#-based back-end company was acquired by an all-[FASTER LANGUAGE] company. We had to connect our platforms and hit a shared performance goal of supporting 1 billion events/month, which amounted to something like (IIRC) 380 per second. Our platform hit that mark running on a 3-server setup w/2 Amazon Medium FE servers and a SQL backend. The other company's bits choked at 10 per second, running on roughly 50x the infra.
Poorly written and architected code is a bigger drag than the specific language in many cases.
If you're using an IDE (Rider or Visual Studio) and avoid the Enterprise frameworks, then it's much easier to use than Python. Tooling makes a huge difference, no more digging through the sometimes flakey Python documentation and cursing compatibility issues with random dependencies not supporting Apple Silicon.
I agree tooling makes a huge difference but I specifically said this with the understanding that you're using C# with Visual Studio. Some stuff will be easier in C#, but a lot of other stuff just isn't as easy as in Python.
At the risk of setting up a strawman for people to punch down, try comparing how easy it is to do the equivalent of something like this in C#, and feel free to use as much IDE magic as you'd like:
x = [t[1] for t in enumerate(range(1, 50, 4)) if t[0] % 3 == 0][2:]
Was it actually easier?
There's a million other examples I could write here, but I'm hoping that one-liner will be sufficient for illustration purposes.
Enumerable.Range(1,50).Where((x,i) => i % 4 == 0).Where(e => e % 3 == 0).Skip(1).Select(e => e+4)
Okay, so you might consider that last e+4 cheating and against the spirit, but I couldn't be bothered to spend money upgrading my linqpad to support the latest .net with Enumerable.Chunk which makes taking two at a time easier for the first part.
Edit: more in spirit:
Enumerable.Range(1,50).Where(e => e % 4 == 0 && e % 3 == 0).Skip(1).Select(e => e + 1)
If I understand dataflow's example correctly you don't need the Select at the end:
var x = Enumerable.Range(1,50)
.Where((num, index) => num % 4 == 1 && index % 3 == 0)
.Skip(2)
.ToArray();
That computes the same thing as their Python snippet: [25,37,49]. Of course, what this is actually computing is whether the number is congruent to 1 modulo 4 and 3 so it was a weird example, but here's how you'd really want to write it (since a number congruent to 1 modulo 4 and 3 is the same as being congruent to 1 modulo 12):
var x = Enumerable.Range(1,50)
.Where(num => num % 12 == 1)
.Skip(2)
.ToArray();
Rewriting that Python example to be a bit clearer for a proper one-to-one comparison:
y = [t for t in range(1, 50, 4) if t % 3 == 1][2:]
That enumerate wrapper was unnecessary. I don't recall a way, in LINQ, to generate only every 4th number in a range, but I also haven't used C# in a few years so my memory is rusty on LINQ anyways.
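For anyone who wants to check the equivalence claimed in this subthread, a short Python sketch follows. The first two expressions are the snippets from the thread; the third is a Python rendering of the "congruent to 1 modulo 12" version and is my own phrasing, not something a commenter posted. The index test and the value test agree because the i-th value is 4*i + 1, which leaves the same remainder modulo 3 as i + 1.

```python
# Original one-liner from the thread: keep every third element by index.
original = [t[1] for t in enumerate(range(1, 50, 4)) if t[0] % 3 == 0][2:]

# Simplified version from the thread: test the value instead of the index.
simplified = [t for t in range(1, 50, 4) if t % 3 == 1][2:]

# Same set described directly as "congruent to 1 modulo 12" (my rendering).
mod12 = [n for n in range(1, 50) if n % 12 == 1][2:]

print(original)
print(simplified)
print(mod12)
```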
You're right, the maths simplifies it a lot. I rushed out a one-liner without much analysis, and eventually came to the same conclusion.
There's no Range method that takes (start, stop, step) but it's trivial enough to write one, it's a single for loop and yield return statement.
We can even trigger the python users by doing it in one line ;)
public static class CustomEnumerable { public static IEnumerable<Int32> Range(int start, int stop, int step) {for (int i = start; i < stop; i+=step) yield return i;}}
Try writing your function definitions on one line in python!
Yeah, that would work, throw it before the Where clause and change 49. Range here doesn't specify a stopping point, but a count of generated values (this makes it not quite the same as Python's range). So you'd want:
Enumerable.Range(0,13).Select(x => 4 * x + 1).Where((e, i) => i % 3 == 0).Skip(2)
And that's equivalent to the original, short of writing a MyRange that combines the first Range and Select. Still an awful lot of work for generating 3 numbers.
No, I'm suggesting that your original example was a great example of obfuscated Python. Even supposing that you wanted to alter the total number of values generated and the number of initial values to skip, you're doing unnecessary work and made it more convoluted than necessary:
def some_example(to_skip=2, total_count=3):
    return [n * 12 + 1 for n in range(to_skip, to_skip+total_count)]
There you go. Change the variable names that I spent < 1 second coming up with and that does exactly the same thing without the enumeration or discarding values. In a thread on how computer speed is wasted on unnecessary computation, it seems silly that you're arguing in favor of unnecessary work and obfuscated code.
What you're missing is that C# example works on any Enumerable. And it's very hard to explain how damn important and impressive this is without trying it first.
Yes, it's more verbose, but I can swap that initial array for a List, or a collection, or even an external async datasource, and my code will not change. It will be the same Select.Where....
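The comment is about C#'s IEnumerable. As a rough analogue in Python, the thread's other language, the sketch below writes the same "every third element, then skip two" pipeline once and feeds it a list, a range, and a generator. The function name is hypothetical, and this does not cover the async data source case mentioned above, which would need an async iterator instead.

```python
from itertools import islice


def every_third_after_skip(values, skip=2):
    """Keep items at index 0, 3, 6, ... and then drop the first `skip` of them.

    `values` may be any iterable; the result is lazy until it is consumed.
    """
    picked = (v for i, v in enumerate(values) if i % 3 == 0)
    return islice(picked, skip, None)


# The same pipeline over three different kinds of source.
from_list = list(every_third_after_skip(list(range(1, 50, 4))))
from_range = list(every_third_after_skip(range(1, 50, 4)))
from_generator = list(every_third_after_skip(4 * n + 1 for n in range(13)))

print(from_list, from_range, from_generator)
```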
> is that C# example works on any Enumerable. And it's very hard to explain how damn important and impressive this is without trying it first.
Believe me I've tried (by which I mean used it a ton). I'm not a newbie to this. C# is great. Nobody was saying it's unimportant or unimpressive or whatever.
> Yes, it's more verbose, but I can swap that initial array for a List, or a collection, or even an external async datasource, and my code will not change
Excellent. And when you want that flexibility, the verbosity pays off. When you don't, it doesn't. Simple as that.
> Excellent. And when you want that flexibility, the verbosity pays off. When you don't, it doesn't. Simple as that.
It's rarely as simple as that. For example, this entire conversation started with "At the risk of setting up a strawman for people to punch down, try comparing how easy it is to do the equivalent of something like this".
And this became a discussion of straw men :) Because I could just as easily come up with "replace a range of numbers with data that is read from a database or from async function that then goes through the same transformations", and the result might not be in Python's favor.
It's not "twice as long" in any syntactic sense, and readability is easily fixed:
Enumerable.Range(1,50)
.Where(e => e % 4 == 0 && e % 3 == 0)
.Skip(1)
.Select(e => e + 1)
That's very understandable, it's clear what it does, and if your complaint is that dotnet prefers named methods like Skip rather than magic syntax, we can disagree on what makes things readable and easy to maintain.
It's literally "twice as long" syntactically. 120 vs. 67 characters.
And again, you keep omitting the rest of the line. (Why?) What you should've written in response was:
var y = Enumerable.Range(1,50)
.Where(e => e % 4 == 0 && e % 3 == 0)
.Skip(1)
.Select(e => e + 1)
.ToArray();
Compare:
y = [t[1] for t in enumerate(range(1, 50, 4))
if t[0] % 3 == 0][2:]
And (again), my complaint isn't about LINQ or numbers or these functions in particular. This is just a tiny one-liner to illustrate with one example. I could write a ton more. There's just stuff Python is better at, there's other stuff C# is better at, that's just a fact of life. I switch between them depending on what I'm doing.
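For anyone who wants to check that the two snippets really compute the same thing, here is a minimal Python sketch. The first expression is the comprehension quoted above; the second is my own step-by-step rendering of the Range/Where/Skip/Select chain using itertools, so the names `filtered`, `skipped` and `y_pipeline` are illustrative only. It assumes Enumerable.Range(1, 50) means "50 integers starting at 1", i.e. 1 through 50.

```python
from itertools import islice

# The list comprehension from the comment above.
y_comprehension = [t[1] for t in enumerate(range(1, 50, 4))
                   if t[0] % 3 == 0][2:]

# A step-by-step pipeline mirroring Range / Where / Skip / Select.
# Assumption: Enumerable.Range(1, 50) yields 1..50 inclusive, hence range(1, 51).
filtered = (e for e in range(1, 51) if e % 4 == 0 and e % 3 == 0)  # Where
skipped = islice(filtered, 1, None)                                # Skip(1)
y_pipeline = [e + 1 for e in skipped]                              # Select + ToArray

print(y_comprehension)
print(y_pipeline)
```

Both lists come out identical, so the disagreement in the thread is about notation rather than results.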
There's not a lot of difference if you use the query syntax in C# (assuming you add an overload to Enumerable.Range() to take the skip) - only no-one uses that because it's ugly. Also really nice that the types are checked + shown by tooling, as is the syntax.
I use Python a lot for scripting - what it lacks in speed of development/runtime it gains in being more accessible to amateurs and having less "enterprise" style libraries (particularly with cryptographic libraries, MS abstracts way too much whilst Python just has thin wrappers around C). That makes Python a strong scripting language for me. PyCharm is really nice too.
For real work? C# is better as long as you have either VS or Rider. Really dislike the VS Code experience (these JS-based editors are slow and nowhere near as nice as Rider) so then I can understand why people would avoid it.
The ToArray is unnecessary; it's much more idiomatic dotnet to deal with IEnumerable all the way through.
The only meaningful difference in lengths is that C# doesn't have an Enumerable.Range(start, stop, increment) overload but it's easy enough to write one, and then it'd be essentially the same length.
"Unnecessary"? You can't just change the problem! I was asking for the equivalent of some particular piece of code using a list, not a different one using a generator. Sometimes you want a generator, sometimes you want an array. In either language.
This is a silly argument, you're asking for a literal translation of a pythonic problem without allowing the idioms from the other languages.
If you were actually trying to solve the problem in dotnet, you'd almost certainly structure it as the Queryable result and then at the very end after composing run ToList, or ToArray or consume in something else that will enumerate it.
Even including the ToList, it's just four basic steps:
Range, Filter, Skip, Enumerate.
Those are the very basics, all one line if wanted. It doesn't get much more basic than that, and I'd still argue it's easier for someone new to programming to see what's going on in the C# than the python example.
edit: realised the maths simplifies it even further.
There's very little difference between the two as long as you're using modern versions of both and add your own functions to fill any API gaps and are using type hinting properly in Python. My C# tends to be "larger" because I use more vertical whitespace and pylint is rather opinionated.. :)
Where you can complain about C# - and I do - is where you're having to write (or work with) code which has been forced to stick to strict architectural and style standards. That makes code-bases which are very hard to understand for newbies and are verbose.
On the flip side, once you start doing anything even slightly interesting with Python you run into the crappy package management. The end result of which is lots of frustration getting projects working and a lot of time wasted on administration vs work.
But who compares Python with C#, they are not even in the same league? Python is a glorified bash scripting replacement with a mediocre JIT engine. Modern C# is faster than Go which is what it is competing against.
I was able to convert a couple of my data scientist colleagues over to using Scala (given that they were writing code for our Spark cluster it seemed like a no-brainer compared to Python or R). It's not thousands of times faster but it might be ten or a hundred times faster, and a lot of the time you can write the very same code aside from punctuation (and even that difference is smaller in Scala 3, although I don't think Spark has moved to that yet).
And yet, even with all the evidence that modern, heavily-bloated software development is AWFUL (constant bugs and breakage because no one writing code understands any of the software sitting between them and the machine, much less understands the machine; Rowhammer, Spectre, Meltdown, and now Hertzbleed; sitting there waiting multiple seconds for something to launch up another copy of the web browser you already have running just so that you can have chat, hi Discord)... you still have all the people in the comments below trying to come up with reasons why "oh no it's actually good, the poor software developers would have to actually learn something instead of copying code off of Stack Overflow without understanding it".
most numerical algorithms are loops over arrays, accumulators and simple arithmetic. this is where numba shines.
for the other cases, there's a python compatibility mode (on by default) that allows for use of arbitrary python.
the hard parts in numba are ensuring type inference works correctly and adding it to existing python environments that might have dependencies pinned at inconvenient versions or other drama associated with adding an entire llvm to your python environment.
also, there's the explosion of python versions cross numpy/mkl versions cross distributions cross bitwidths... but that's the nature of publicly shipping numerical code in python in general.
all that said, when it's all set up, numba can be quite elegant and simpler than cython.
Python _is_ slow, but even back in 2006 on a pentium 4 I had no problem using it with PyGame to build a smooth 60fps rtype style shooter for a coding challenge.
One just has to not do anything dumb in the render loop and it's plenty responsive.
Of course, if you're going to interactively process a 50mb csv or something... But even then pandas is faster.
Nah that's too general. A lot of website/app backends use Django or Fastapi and they work fine. Many more use PHP, also not a language famed for extreme performance.
It depends on the application. Personally I wouldn't use Python for a GUI (because I'd use JS/TS).
I'd be the first to complain about latency where it matters, but launching a Python program is perceptually instant (and significantly lower-latency than many nominally "faster" languages, IME).
> And in the end, the code seems to run "fast enough" and nobody involved really notices that what is running in 750ms really ought to run in something more like 200us.
At least with Chrome's V8, the difference is not that big.
Sure, it loses to C/C++, because it can't vectorize and uses orders of magnitude more memory, but at least in the Computer Language Benchmarks Game it's "just" 2-4x slower.
I remember getting a faster program doing large matrix multiplication in JavaScript than in C with -O1, because V8 figured out that I'm reading from and writing to the same cell, so optimised that out, which gave it an edge, because in both cases the memory bandwidth limited the speed of execution.
As for Electron and the like: half of the reason why they're slow is that document reflows are not minimized, so the underlying view engine works really, really hard to re-render the same thing over and over again.
It's not nearly as visible in web apps, because these in turn are often slowed down by the HTTP connection limit (hardcoded to six in most browsers).
Languages top out at around 50x, and that's the extreme of pure CPython to C.
For as many factors of magnitude as I am talking about, you have to be screwing up algorithms, networks, and a whole bunch of other things too.
Python and similar languages like Ruby really do make it easy to accidentally pile things on top of each other, but you can screw up in pure assembler with enough work put into it. Assembler doesn't stop you from being accidentally quadratic or using networks in a silly way.
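To make "accidentally quadratic" concrete, here is a minimal Python sketch. The function names are hypothetical and the example is mine, not taken from any code discussed in the thread. Both functions return the same result; the only difference is whether the membership test scans a list or hits a hash set.

```python
def common_items_quadratic(a, b):
    # `x in b` scans the list b from the start each time,
    # so the whole thing is O(len(a) * len(b)).
    return [x for x in a if x in b]


def common_items_linear(a, b):
    # Building the set costs O(len(b)) once; each lookup is then O(1) on average.
    b_set = set(b)
    return [x for x in a if x in b_set]


a = list(range(0, 2000, 2))   # even numbers
b = list(range(0, 2000, 3))   # multiples of three
slow = common_items_quadratic(a, b)
fast = common_items_linear(a, b)
print(slow == fast, len(fast))
```

Nothing about the first version looks wrong at a glance, which is exactly why it slips through: on small inputs both are instant, and the gap only shows once the lists grow.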
Except as a developer I lose lots of time if I have to wait long for my code (esp. unit tests) to run. Having said that, larger projects in C/C++ are often very slow to build (esp. if dependencies are not well defined and certain header files affect huge numbers of source files - a problem that doesn't exist with higher level languages).
But even if using a particular language and framework saves developer time, it rarely seems to translate into developers using that saved time to bother optimizing where it might really count.
I've not found that to be the case. The first draft might get done faster, but then I spend more time debugging issues in dynamic languages that only show up at runtime that the compiler would find in other languages. And then more time optimizing the code, adding caching, moving to more advanced algorithms, and rewriting parts in C just to get it to run at a reasonable speed when the naive approach I implement in other languages is fast enough on first try.
For most tasks, modern mid-level statically typed languages like C#, Go, Kotlin really are the sweet spot for productivity. Languages like Python, Ruby and JS are a false economy that appear more productive than they really are.
That's only an excuse if you're sociopathically profit-oriented. The program is developed orders of magnitude fewer times than it is run. Shitty performance, like pollution, is an externality that can be ignored but should not.
Shitty performance certainly is bad, but it is not an externality like emissions into the atmosphere. The fundamental difference is that the customer (and only the customer) is harmed by bad performance, while emissions harm everyone.
I'm not so sure. Emissions don't harm everyone instantly; they affect people disproportionately and only impact everyone over time as the effects accumulate. Sure, maybe bad performance only affects the customer initially, but can't you help but wonder what the cumulative opportunity cost of bad performance on civilization has been?
The predominant perception among nontechnical people is that computers are fundamentally unreliable and slow. It doesn't seem unreasonable to think that might be holding up the rate of innovation.
By this reasoning, though, there is no such thing as localized harm. The reason I can't abide by the idea that "there's no such thing as localized harm" is that, when you actually try to analyze nonlocal harms caused by personal decisions, you get swallowed up by the butterfly of doom.
It's the butterfly effect. For example, a lot of software that actually gets written is a net negative to society, even if it functions perfectly. So does making it more efficient actually benefit anybody? And a lot of other software is embedded in organizations that will add features to the software until it fails, expanding like an ideal gas to fill whatever space it's given, so even if you make it more efficient and less failure-prone, you're only really delaying the inevitable anyway. However, making a bureaucratic organization less efficient might not actually stop it; consider, for example, how the Social Security Card was originally engineered to be unusable as a national ID, but got used as one anyway, so now the United States not only has a national ID that most citizens didn't want, but we're stuck with a bad one. However, identity theft might actually be considered just another case of externalities, and if the bureaucrats had to eat the cost of easy-to-forge national IDs, this problem might have gotten fixed.
I think you can analyze nonlocal harms, but not using informal reasoning in a chatroom. There are too many possible interactions in the real world to fit them all in your head. You end up with an impossible-to-analyze infinite regress.
Instead, claims of nonlocal harm should probably require real-world measurements to prove that the harms actually exist and aren't entirely being washed out by the much larger effect sizes of unrelated phenomena.
Externality is a concept in economic theory. It shows how net-negative behaviour occurs even when all actors act perfectly rationally and have perfect information (while optimizing for their own gain). Bad software simply does not map to this concept in the same way environmental damage does. In your example people use bad software, against their interest, despite better alternatives.
> any task that speed is even remotely a consideration for anymore
How do you know whether or not speed is a consideration?
Yes, OP delivered impressive efficiency gains. I'm sure he could improve the efficiency even more by dropping into pure Assembly.
But is it worth it?
The prime consideration is not execution speed but maintainability. The further that OP got away from pure Python, the more difficult to maintain the code became. That's a downside.
Now, OP describes an important technique because in the real world, you have a performance budget. Code needs to execute at speeds that return quickly enough to the user, or long execution is financially expensive (i.e. cloud computing resources), etc. But optimizing beyond what the budget requires is wasteful in terms of time needed to do the optimization as well as harmful in terms of negatively impacting future maintainability.
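One way to make the "performance budget" idea checkable rather than a matter of opinion is to measure against it. A minimal sketch, assuming a budget expressed in seconds; `median_runtime` and `within_budget` are hypothetical helper names, not part of any library:

```python
import time


def median_runtime(fn, repeats=5):
    """Return the median wall-clock time of fn() over `repeats` runs, in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def within_budget(fn, budget_seconds, repeats=5):
    """True if the median run of fn() fits inside the stated budget."""
    return median_runtime(fn, repeats) <= budget_seconds


# Example: does summing a thousand integers fit in a (very generous) 1 s budget?
print(within_budget(lambda: sum(range(1000)), budget_seconds=1.0))
```

If the check passes with room to spare, further optimization is the wasteful kind described above; if it fails, you know speed is a consideration for that code path.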
> The prime consideration is not execution speed but maintainability.
Why? And how did you measure this drop in maintainability? I'm asking because I see developers prioritize _perceived_ maintainability over _measurable_ things that matter to the user (like performance).
I don't really find python slow for what I do (typically writing UIs around computer vision systems) but also, several years back I made a microcontroller-based self-balancing robot. It was hard to debug the PID and the sensor, so I replaced it with a Pi Zero and the main robot loop ran in python- enough to read the accelerometer, compute a PID update, and send motor instructions- 100 times a second. If there was a problem (say, another heavy process, like computer vision, running on the single CPU) it would eventually not respond fast enough and the robot would fall over.
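For a sense of how little work one iteration of such a loop involves, here is a minimal sketch of a textbook PID update in Python. It is not the robot's actual code: the class, the gains and the 10 ms step are all assumptions chosen for illustration.

```python
class PID:
    """Textbook PID controller; the gains kp, ki, kd are illustrative assumptions."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the integral term and estimate the derivative from the last error.
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# One step of a 100 Hz loop: dt = 0.01 s, tilt error of 1.5 (units are arbitrary here).
pid = PID(kp=2.0, ki=0.0, kd=0.0)
command = pid.update(error=1.5, dt=0.01)
print(command)
```

A handful of float operations per tick is well within what interpreted Python can do 100 times a second; as the comment says, the failure mode is other processes starving the loop, not the arithmetic.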
Most of the time it's not that you need a faster language, it's that you need to write faster code. I was working on a problem recently where random.choices was slow but I realized that due to the structure of my problem I could convert it to numpy and get a 100X speedup.
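The numpy conversion itself isn't shown, so the following is a different, stdlib-only illustration of the same "write faster code" idea: batch the work instead of repeating setup per item. With weights, each call to random.choices re-accumulates the weights, so drawing one sample per call repeats that work n times. The function names and the toy population are hypothetical.

```python
import random


def sample_one_at_a_time(population, weights, n, rng):
    # Each call re-accumulates the weights: O(len(population)) of setup per draw.
    return [rng.choices(population, weights=weights)[0] for _ in range(n)]


def sample_in_one_batch(population, weights, n, rng):
    # One call accumulates the weights once; each draw is then a binary search.
    return rng.choices(population, weights=weights, k=n)


population = list(range(1000))
weights = [i + 1 for i in population]

slow_draws = sample_one_at_a_time(population, weights, 500, random.Random(0))
fast_draws = sample_in_one_batch(population, weights, 500, random.Random(0))
print(len(slow_draws), len(fast_draws))
```

Both return samples from the same distribution; the batched version just avoids redoing the setup, which is the kind of structural fix that often matters more than switching languages.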
More important than the language is using the right tool for the job. If you are using the scientific Python stack correctly, you'll have a difficult time beating it with C++ for many applications, while producing way simpler and more maintainable code.
I felt this pretty viscerally recently. I did Advent of Code 2021 in python last year. My day job is programming in Python so I didn't really think about the execution speed of my solutions much.
As a fun exercise this year I've been doing Advent of Code 2020 in C, and my god it's crazy how much faster my solutions seem to execute. These are just little toy problems, but even still the speed difference is night and day.
Although, I still find Python much easier to read and maintain, but that may just be I'm more experienced with the language.
> Although, I still find Python much easier to read and maintain, but that may just be I'm more experienced with the language.
Python is definitely easier to read and maintain if you have loads of dependencies. C dependency management is a pain.
If you can read and write a little C, you should consider giving C#/Java/Kotlin/Swift a try. They're probably an order of magnitude slower than C if you write them in a maintainable style, but they're still much faster than Python. If you're doing stuff like web APIs then ASP.NET/Spring will perform very admirably without manually optimizing code, for example. You might find that these languages are C-like enough to understand and Python-like enough to be productive in. Or you might not, but it's worth a shot!
I personally believe that C is difficult, if not impossible, to properly maintain long term, at least not as much as the faster alternatives. On the other hand my experience with Python is that it's one of the slowest mainstream languages out there, relying heavily on C libraries to get acceptable performance.
Haha, what C/C++ web framework should I use instead of Django/Rails/JS-whatever? Performance is a consideration, but I'm not going to reinvent a bunch of packages because of it.
This kind of blanket comment that "scripting languages are too slow" makes it sound like you shouldn't use them for anything, but they are perfectly adequate for many tasks. I'm more likely to have network and DB slowdowns than problems with scripting languages.
There is a balance, like sure there is inefficient code but often it's because that code is accessing an I/O resource inefficiently, and so the CPU and RAM speed of the host machine isn't the bottleneck no matter what dumb things the programmer does.
So you don't need to pretty much ever reinvent or even use a hackerrank algorithm, you need to understand that the database compute instance has a fast cpu and lots of RAM too.
"I have a hard time using (pure) Python anymore for any task that speed is even remotely a consideration anymore. Not only is it slow even at the best of times, but so many of its features beg you to slow down even more without thinking about it."
Something that you do a lot? Fine, write it in C/C++/Rust.
It's something that costs thousands/millions of dollars of compute? Ok, maybe it's worth it for you to spend a month on, put your robe on, and start chanting in Latin.
Bryan Cantrill has a video out there where he rewrote some C code in Rust and the benchmark was enough faster that he couldn't make sense of it. After much digging it turned out that it was because Rust was using a better data structure to represent the data, one that's difficult to get right in C.
In the end his test was comparing algorithms not compilers, but there is still something to that: we always make algorithmic compromises based on what is robust and what is brittle in our language of choice. The speed limits don't matter if only a madman would ever drive that fast.
But perfect performance isn't even the benchmark, it's "not ridiculously slow". This is what is meant by "Computers are fast, but you don't know it", you don't even know how ludicrously fast computers are because so much stuff is so insanely slow.
They're so fast that, in the vast majority of cases, you don't even need optimization, you just need non-pessimization: https://youtu.be/pgoetgxecw8
To be really fast, yes. Those are optimizations that allow you to go beyond the speed of just C and proper algorithms.
But C and proper algorithms are still fast - Moore's law is going wider, yes, and single-threaded advancements aren't as impressive as they used to be, but solid C code and proper algorithms will still be faster than they were before!
What's not fast is when, instead of using a hashmap when you should have used a B-tree, you instead store half the data in a relational database from one microservice and the other half on the blockchain and query it using a zero-code platform provided by a third vendor.
These things only net you one or two orders of magnitude (and give you very little or even negative power efficiency gain), or maybe 3 for the gpu.
This pales in comparison to the 4-6 orders of magnitude induced by thoughtless patterns, excessive abstraction, bloat, and user-hostile network round trips (this one is more like 10 orders of magnitude).
Write good clean code in a way that your compiler can easily reason about to insert suitable vector operations (a little easier in C++, Rust, Zig etc. than C) and it's perfect performance in my book even if it isn't saturating all the cores.
I have a hard time using (pure) Python anymore for any task that speed is even remotely a consideration for anymore. Not only is it slow even at the best of times, but so many of its features beg you to slow down even more without thinking about it.