
Posted by WoodenChair 1/1/2026

Python numbers every programmer should know(mkennedy.codes)
429 points | 186 comments
HenriTEL 1/3/2026|
The goal of the article is not to know the exact numbers by heart, duh!

Care about orders of magnitude instead. Combine them with the speed of hardware (https://gist.github.com/jboner/2841832) and you'll have a good understanding of how much overhead is due to the language and which constructs to favor for speed improvements.

Just reading the code should give you a sense of its speed and where it will spend most of its time. Combined with general timing metrics, you can also get a sense of the overhead of 3rd party libraries (pydantic, I'm looking at you).

So yeah, I find that list quite useful during code design; it likely reduces the time spent profiling slow code in prod.
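
A quick way to build that intuition locally is the stdlib `timeit` module; a minimal sketch (the absolute numbers vary by machine and Python version — only the orders of magnitude matter):

```python
import timeit

# Rough per-operation cost of two everyday constructs.
n = 100_000
lookup = timeit.timeit("d['k']", setup="d = {'k': 1}", number=n) / n
call = timeit.timeit("f()", setup="def f(): return 1", number=n) / n

print(f"dict lookup:   {lookup * 1e9:.0f} ns")
print(f"function call: {call * 1e9:.0f} ns")
```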

pvtmert 1/1/2026||
There are lots of discussions about the relevance of these numbers for a regular software engineer.

Firstly, I want to start with the fact that the base system is a macOS/M4 Pro, hence:

- Memory-related access is possibly much faster than on an x86 server.
- Disk access is possibly much slower than on an x86 server.

*) I took an x86 server as the basis, as most applications run on x86 Linux boxes nowadays, although a good share of the footprint is also on other ARM CPUs.

Although it probably does not change the memory footprint much, the libraries loaded and their architecture (i.e. whether they run under Rosetta or not) will change the overall footprint of the process.

As mentioned in one of the sibling comments: always inspect/trace your own workflow/performance before making assumptions. It all depends on the specific use case for higher-level performance optimizations.
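
For the memory side of "inspect your own workflow", the stdlib `tracemalloc` module can attribute allocations to a block of code — a sketch, with a stand-in workload where you'd put an import or a data-structure build:

```python
import tracemalloc

# Measure allocations attributable to a block of code, e.g. importing
# a heavy library or building a large data structure.
tracemalloc.start()
data = [object() for _ in range(10_000)]  # stand-in workload
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```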

jchmbrln 1/1/2026||
What would be the explanation for an int taking 28 bytes but a list of 1000 ints taking only 7.87KB?
wiml 1/1/2026|
That appears to be the size of the list itself, not including the objects it contains: 8 bytes per entry for the object pointer, and a kilo-to-kibi conversion. All Python values are "boxed", which is probably a more important thing for a Python programmer to know than most of these numbers.

The list of floats is larger, despite also being simply an array of 1000 8-byte pointers. I assume that it's because the int array is constructed from a range(), which has a __len__(), and therefore the list is allocated to exactly the required size; but the float array is constructed from a generator expression and is presumably dynamically grown as the generator runs and has a bit of free space at the end.
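
Both points are easy to check with `sys.getsizeof`, which reports the container only, not the boxed elements (sizes below assume 64-bit CPython):

```python
import sys

xs = list(range(1000))
print(sys.getsizeof(xs))     # container only: list header + 1000 8-byte pointers
print(sys.getsizeof(xs[0]))  # one boxed int: ~28 bytes

# Built from range() (__len__ known): allocated to exactly the needed size.
exact = list(range(1000))
# Built from a generator (no length hint): grown dynamically, with spare capacity.
grown = list(x for x in range(1000))
print(sys.getsizeof(exact), sys.getsizeof(grown))
```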

lopuhin 1/1/2026|||
It's impressive how you figured out the reason for the difference between the list-of-floats and list-of-ints container sizes. Framed as an interview question, that would have been quite difficult, I think.
mikeckennedy 1/1/2026|||
It was. I updated the results to include the contained elements. I also updated the float list creation to match the int list creation.
charlieyu1 1/1/2026||
Surprised that list comprehensions are only 26% faster than for loops. It used to feel like 4-5x.
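
A quick sketch to reproduce the comparison locally (the exact ratio varies by workload and Python version; recent CPython releases narrowed the gap):

```python
import timeit

# Build a list of squares with an explicit loop vs. a comprehension.
loop = """
out = []
for i in range(1000):
    out.append(i * i)
"""
comp = "out = [i * i for i in range(1000)]"

t_loop = timeit.timeit(loop, number=2_000)
t_comp = timeit.timeit(comp, number=2_000)
print(f"loop/comprehension ratio: {t_loop / t_comp:.2f}x")
```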
oogali 1/1/2026||
It's important to know that these numbers will vary based on what you're measuring, your hardware architecture, and how your particular Python binary was built.

For example, my M4 Max running Python 3.14.2 from Homebrew (built, not poured) takes 19.73MB of RAM to launch the REPL (running `python3` at a prompt).

The same Python version launched on the same system with a single invocation for `time.sleep()`[1] takes 11.70MB.

My Intel Mac running Python 3.14.2 from Homebrew (poured) takes 37.22MB of RAM to launch the REPL and 9.48MB for `time.sleep`.

My number for "how much memory it's using" comes from running `ps auxw | grep python`, taking the value of the resident set size (RSS column), and dividing by 1,024.

1: python3 -c 'from time import sleep; sleep(100)'
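
From inside the process, the stdlib `resource` module (Unix only) gives a similar number without shelling out to `ps` — a sketch; note that the units of `ru_maxrss` differ by platform:

```python
import resource
import sys

# ru_maxrss is the peak resident set size of this process.
# Units differ by platform: bytes on macOS, kilobytes on Linux.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
if sys.platform == "darwin":
    peak_mb = peak / (1024 * 1024)
else:
    peak_mb = peak / 1024
print(f"peak RSS: {peak_mb:.2f} MB")
```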

boutell 1/2/2026||
Knowing all of these is exactly what a developer shouldn't need to do. Fix "big O" problems in your own code. And be aware of a few exceptionally weird counterintuitive things if it matters on a "big O" level — like "you think this common operation is O(1) but it's actually O(N^2)". If there actually are any of those. And just get stuff done.

I guess you could find yourself in a situation where a 2X speedup is make or break and you're not a week away from needing 4X, etc. But not very often.
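
One classic example of that kind of trap, sketched below: `list.insert(0, ...)` looks O(1) but shifts every existing element, so a loop of front-inserts is O(N^2) overall, while `collections.deque` stays linear:

```python
from collections import deque
import timeit

def front_list(n):
    out = []
    for i in range(n):
        out.insert(0, i)   # O(N) per insert: shifts all existing elements
    return out

def front_deque(n):
    out = deque()
    for i in range(n):
        out.appendleft(i)  # O(1) per insert
    return out

n = 20_000
t_list = timeit.timeit(lambda: front_list(n), number=1)
t_deque = timeit.timeit(lambda: front_deque(n), number=1)
print(f"list.insert(0, ...): {t_list:.3f}s  deque.appendleft: {t_deque:.3f}s")
```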

snakepit 1/1/2026||
This is helpful. Someone should create a similar benchmark for the BEAM. This is also a good reminder to continue working on snakepit [1] and snakebridge [2]. Plenty remains before they're suitable for prime time.

[1] https://hex.pm/packages/snakepit [2] https://hex.pm/packages/snakebridge

CmdrKrool 1/1/2026||
I'm confused by this:

  String operations in Python are fast as well. f-strings are the fastest formatting style, while even the slowest style is still measured in just nano-seconds.
  
  Concatenation (+)   39.1 ns (25.6M ops/sec)
  f-string            64.9 ns (15.4M ops/sec)
It says f-strings are fastest but the numbers show concatenation taking less time? I thought it might be a typo but the bars on the graph reflect this too?
Liquid_Fire 1/2/2026||
Perhaps it's because in all but the simplest cases, you need 2 or more concatenations to achieve the same result as one single f-string?

  "literal1 " + str(expression) + " literal2"
vs

  f"literal1 {expression} literal2"
The only case that would be faster is something like: "foo" + str(expression)
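
A quick way to check is timing both forms with the conversion included — a sketch; results vary by Python version, but the two expressions produce identical strings:

```python
import timeit

setup = "x = 12345"
# Concatenation needs an explicit str() call plus two + operations.
concat = timeit.timeit("'a ' + str(x) + ' b'", setup=setup, number=200_000)
# The f-string formats and joins in one pass, with no str() name lookup.
fstr = timeit.timeit("f'a {x} b'", setup=setup, number=200_000)
print(f"concat: {concat:.3f}s  f-string: {fstr:.3f}s")
```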
Izkata 1/2/2026||
String concatenation isn't usually considered a "formatting style", that refers to the other three rows of the table which use a template string and have specialized syntax inside it to format the values.
mopsi 1/2/2026||
It is always a good idea to have at least a rough understanding of how much the operations in your code cost, but sometimes very expensive mistakes end up in non-obvious places.

If I have only plain Python installed and a .py file that I want to test, then what's the easiest way to get a visualization of the call tree (or something similar) and the computational cost of each item?
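
With nothing but the stdlib, `cProfile` plus `pstats` gets you most of the way: per-function cumulative times sorted by hot path. A sketch — the `parent`/`child` functions here are just a stand-in workload:

```python
import cProfile
import io
import pstats

def child():
    return sum(i * i for i in range(50_000))

def parent():
    return [child() for _ in range(10)]

profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(5)   # top 5 entries by cumulative time
print(buf.getvalue())
```

You can also skip the boilerplate and run `python -m cProfile -s cumulative your_file.py` directly; for a graphical call tree you'd need a third-party viewer on top of the profile output.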

belabartok39 1/1/2026|
Hmmmm, there should absolutely be standard deviations for this type of work. Also, what is N number of runs? Does it say somewhere?
mikeckennedy 1/1/2026|
It is open source, you could just look. :) But here is a summary for you. It's not just a single run where we take the number:

Benchmark Iteration Process

Core Approach:

- Warmup Phase: 100 iterations to prepare the operation (default)

- Timing Runs: 5 repeated runs (default), each executing the operation a specified number of times

- Result: Median time per operation across the 5 runs

Iteration Counts by Operation Speed:

- Very fast ops (arithmetic): 100,000 iterations per run

- Fast ops (dict/list access): 10,000 iterations per run

- Medium ops (list membership): 1,000 iterations per run

- Slower ops (database, file I/O): 1,000-5,000 iterations per run

Quality Controls:

- Garbage collection is disabled during timing to prevent interference

- Warmup runs prevent cold-start bias

- Median of 5 runs reduces noise from outliers

- Results are captured to prevent compiler optimization elimination

Total Executions: For a typical benchmark with 1,000 iterations and 5 repeats, each operation runs 5,100 times (100 warmup + 5×1,000 timed) before reporting the median result.
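
The process above can be sketched as a small harness (a simplified sketch under the stated defaults, not the project's actual code):

```python
import gc
import statistics
import time

def bench(op, *, warmup=100, repeats=5, iterations=1000):
    """Median time per operation: warmup, then `repeats` timed runs of
    `iterations` calls each, with GC disabled while timing."""
    for _ in range(warmup):          # warmup phase: avoid cold-start bias
        op()
    gc_was_enabled = gc.isenabled()
    gc.disable()                     # keep collections out of the timings
    try:
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            for _ in range(iterations):
                op()
            runs.append((time.perf_counter() - start) / iterations)
    finally:
        if gc_was_enabled:
            gc.enable()
    return statistics.median(runs)   # median reduces noise from outliers

per_op = bench(lambda: {"a": 1}["a"])
print(f"{per_op * 1e9:.1f} ns per op")
```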

belabartok39 1/1/2026||
That answers what N is (why not just say so in the article?). If you are only going to report medians, is there an appendix with further statistics such as confidence intervals or standard deviations? For a serious benchmark, it would be essential to show the spread or variability, no?