Posted by WoodenChair 7 hours ago

Python numbers every programmer should know (mkennedy.codes)
162 points | 76 comments
xnx 5 hours ago|
Python programmers don't need to know 85 different obscure performance numbers. Better to really understand ~7 general system performance numbers.
ZiiS 4 hours ago||
This is a really weird thing to worry about in Python. But it is also misleading: Python ints are arbitrary precision, so they can take up much more storage and arithmetic time depending on their value.
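For a concrete sense of the scaling (not from the article; exact sizes vary by CPython version and platform):

    import sys

    # CPython ints grow in 30-bit "digits" as the value grows,
    # so there is no single size for "an int".
    for n in (0, 1, 2**30, 2**60, 10**100):
        print(n.bit_length(), sys.getsizeof(n))
    # Small ints are ~28 bytes; each additional 30-bit digit
    # adds roughly 4 more bytes.
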
lunixbochs 2 hours ago||
I'm confused why they repeatedly say a slots class is larger than a regular dict class but don't count the size of the dict.
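A quick way to check what the commenter means, with made-up classes (per-instance sizes vary by CPython version):

    import sys

    class WithDict:
        def __init__(self):
            self.x, self.y = 1, 2

    class WithSlots:
        __slots__ = ("x", "y")
        def __init__(self):
            self.x, self.y = 1, 2

    d, s = WithDict(), WithSlots()
    print(sys.getsizeof(d))                              # instance alone
    print(sys.getsizeof(d) + sys.getsizeof(d.__dict__))  # plus its separate __dict__
    print(sys.getsizeof(s))                              # slots instance: no __dict__

Counting the instance alone makes the slots class look bigger; adding in the regular instance's __dict__ typically flips the comparison.
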
ktpsns 5 hours ago||
Nice numbers, and it's always worth knowing the order of magnitude. But these charts are far from what "every programmer should know".
jerf 5 hours ago|
I think we can safely steelman the claim to "every Python programmer should know", and even from there, every "serious" Python programmer, writing Python professionally for some "important" reason, not just everyone who picks up Python for some scripting task. Obviously there's not much reason for a C# programmer to go try to memorize all these numbers.

Though IMHO it suffices just to know that "Python is 40-50x slower than C and is bad at using multiple CPUs" is not just some sort of anti-Python propaganda from haters, but a fairly reasonable engineering estimate. If you know that, you don't really need that chart. If your task can tolerate that sort of performance, you're fine; if not, figure out early how you are going to solve that problem, be it through one of the several ways of binding faster code to Python, using PyPy, or not using Python in the first place, whatever is appropriate for your use case.

belabartok39 2 hours ago||
Hmmmm, there should absolutely be standard deviations for this type of work. Also, what is N, the number of runs? Does it say somewhere?
mikeckennedy 45 minutes ago|
It is open source; you could just look. :) But here is a summary for you. It's not just one run where we take the number:

Benchmark Iteration Process

Core Approach:

- Warmup Phase: 100 iterations to prepare the operation (default)

- Timing Runs: 5 repeated runs (default), each executing the operation a specified number of times

- Result: Median time per operation across the 5 runs

Iteration Counts by Operation Speed:

- Very fast ops (arithmetic): 100,000 iterations per run

- Fast ops (dict/list access): 10,000 iterations per run

- Medium ops (list membership): 1,000 iterations per run

- Slower ops (database, file I/O): 1,000-5,000 iterations per run

Quality Controls:

- Garbage collection is disabled during timing to prevent interference

- Warmup runs prevent cold-start bias

- Median of 5 runs reduces noise from outliers

- Results are captured to prevent compiler optimization elimination

Total Executions: For a typical benchmark with 1,000 iterations and 5 repeats, each operation runs 5,100 times (100 warmup + 5×1,000 timed) before reporting the median result.
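Not the project's actual code, but a minimal sketch of that process:

    import gc
    import statistics
    import time

    def bench(op, iterations=1_000, warmup=100, repeats=5):
        for _ in range(warmup):      # warmup phase: avoid cold-start bias
            op()
        gc.disable()                 # keep the collector from interfering with timing
        try:
            runs = []
            for _ in range(repeats):
                start = time.perf_counter()
                for _ in range(iterations):
                    result = op()    # capture the result so the work isn't skipped
                runs.append((time.perf_counter() - start) / iterations)
            return statistics.median(runs)  # median across runs reduces outlier noise
        finally:
            gc.enable()

    print(bench(lambda: sum(range(100))))   # seconds per operation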

belabartok39 18 minutes ago||
That answers what N is (why not just say so in the article?). If you are only going to report medians, is there an appendix with further statistics, such as confidence intervals or standard deviations? For a serious benchmark, it would be essential to show the spread or variability, no?
mikeckennedy 4 hours ago||
Author here.

Thanks for the feedback, everyone. I appreciate @woodenchair posting it and @aurornis pointing out the intent of the article.

The idea of the article is NOT to suggest you should shave 0.5ns off by choosing some dramatically different algorithm or that you really need to optimize the heck out of everything.

In fact, I think a lot of what the numbers show is that overthinking the optimizations often isn't worth it (e.g. caching len(coll) in a variable rather than calling it over and over is less useful than it might seem conceptually).
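(A rough sketch, not from the article's benchmark suite, if you want to check that particular one yourself:)

    from timeit import timeit

    data = list(range(1_000))

    def uncached():
        i = 0
        while i < len(data):    # len() called on every pass
            i += 1

    def cached():
        i, n = 0, len(data)     # len() hoisted out of the loop
        while i < n:
            i += 1

    print(timeit(uncached, number=1_000))
    print(timeit(cached, number=1_000))
    # The gap is usually smaller than expected: len() on a list is a cheap O(1) call.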

Just write clean Python code. So much of it is way faster than you might have thought.

My goal was only to create a reference for what various operations cost, to give people a mental model.

willseth 3 hours ago|
Then you should have written that. Instead you have given more fodder for the premature optimization crowd.
tgv 5 hours ago||
I doubt list and string concatenation operate in constant time; if they did, it would have to affect another benchmark. E.g., you could concatenate two lists in the same time regardless of their size, but at the cost of slower access to the second one (or both).

More contentiously: don't fret too much over performance in Python. It's a slow language (except for some external libraries, but that's not the point of the OP).

jerf 5 hours ago|
String concatenation is mentioned twice on that page, with the same time given. The first time it has a parenthetical "(small)"; the second time doesn't. I expect you were looking at the second one when you typed that, as I would agree you can't just label it constant time. But they do seem to have meant concatenating "small" strings, where the overhead of Python's object construction would dominate the cost of constructing the combined string.
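One way to see the small/large distinction (illustrative only; absolute numbers depend on the machine):

    from timeit import timeit

    small_a, small_b = "x" * 8, "y" * 8
    big_a, big_b = "x" * 1_000_000, "y" * 1_000_000

    n_small, n_big = 100_000, 1_000
    # Per-operation cost: for tiny strings, object overhead dominates;
    # for large ones, the O(n) copy into the new string shows up clearly.
    print(timeit(lambda: small_a + small_b, number=n_small) / n_small)
    print(timeit(lambda: big_a + big_b, number=n_big) / n_big)
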
jchmbrln 5 hours ago||
What would be the explanation for an int taking 28 bytes but a list of 1000 ints taking only 7.87KB?
wiml 3 hours ago|
That appears to be the size of the list itself, not including the objects it contains: 8 bytes per entry for the object pointer, and a kilo-to-kibi conversion. All Python values are "boxed", which is probably a more important thing for a Python programmer to know than most of these numbers.

The list of floats is larger, despite also being simply an array of 1000 8-byte pointers. I assume that it's because the int array is constructed from a range(), which has a __len__(), and therefore the list is allocated to exactly the required size; but the float array is constructed from a generator expression and is presumably dynamically grown as the generator runs and has a bit of free space at the end.
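Both effects are easy to see with sys.getsizeof, which counts only the container (exact numbers vary by CPython version):

    import sys

    ints = list(range(1000))                   # length known up front: exact fit
    floats = [float(i) for i in range(1000)]   # grown dynamically: spare capacity

    print(sys.getsizeof(ints))     # ~8 KB: list header + 1000 pointers
    print(sys.getsizeof(floats))   # somewhat larger, due to over-allocation

    # A rough "deep" size that also counts the boxed elements:
    print(sys.getsizeof(ints) + sum(sys.getsizeof(i) for i in ints))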

lopuhin 2 hours ago|||
It's impressive how you figured out the reason for the difference in container size between the list of floats and the list of ints. Framed as an interview question, that would have been quite difficult, I think.
mikeckennedy 2 hours ago|||
It was. I updated the results to include the contained elements. I also updated the float list creation to match the int list creation.
oogali 5 hours ago||
It's important to know that these numbers will vary based on what you're measuring, your hardware architecture, and how your particular Python binary was built.

For example, my M4 Max running Python 3.14.2 from Homebrew (built, not poured) takes 19.73MB of RAM to launch the REPL (running `python3` at a prompt).

The same Python version launched on the same system with a single invocation for `time.sleep()`[1] takes 11.70MB.

My Intel Mac running Python 3.14.2 from Homebrew (poured) takes 37.22MB of RAM to launch the REPL and 9.48MB for `time.sleep`.

My number for "how much memory it's using" comes from running `ps auxw | grep python`, taking the value of the resident set size (RSS column), and dividing by 1,024.

1: python3 -c 'from time import sleep; sleep(100)'
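A process can also read its own RSS without `ps`; here is a sketch using only the standard library (note that ru_maxrss is *peak* RSS, not current, and its units differ by platform):

    import resource
    import sys

    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is bytes on macOS but kibibytes on Linux.
    rss_bytes = usage.ru_maxrss * (1 if sys.platform == "darwin" else 1024)
    print(f"peak RSS: {rss_bytes / (1024 * 1024):.2f} MB")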

mwkaufma 4 hours ago|
Why? If those microbenchmarks mattered in your domain, you wouldn't be using Python.
coldtea 4 hours ago||
That's an "all or nothing" fallacy. Just because you use Python and are OK with some slowdown doesn't mean you're OK with each and every slowdown when you can do better.

To use a trivial example, using a set instead of a list to check membership is a very basic replacement, and can dramatically improve your running time in Python. Just because you use Python doesn't mean anything goes regarding performance.
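For instance (illustrative numbers; the gap grows with the collection size):

    from timeit import timeit

    haystack_list = list(range(100_000))
    haystack_set = set(haystack_list)
    needle = 99_999   # worst case for the list: scanned front to back

    # list membership is O(n); set membership is O(1) on average
    print(timeit(lambda: needle in haystack_list, number=1_000))
    print(timeit(lambda: needle in haystack_set, number=1_000))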

mwkaufma 3 hours ago||
That's an example of an algorithmic improvement (O(1) average vs O(n)), not a microbenchmark, Mr. Fallacy.
PhilipRoman 2 hours ago||
...and other hilarious jokes you can tell yourself!