Posted by WoodenChair 1/1/2026

Python numbers every programmer should know (mkennedy.codes)
429 points | 186 comments | page 5
Y_Y 1/1/2026
int is larger than float, but a list of floats is larger than a list of ints
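A minimal sketch of one likely explanation, assuming a 64-bit CPython build and assuming the lists in question hold small values (object sizes and the small-int cache are CPython implementation details, and the helper `traced_bytes` is hypothetical). A single int object is bigger than a float object, but ints from -5 to 256 are cached singletons, so a list of them only pays for its pointer array, while every float in a list is its own heap allocation.

```python
import sys
import tracemalloc

N = 100_000


def traced_bytes(build):
    """Bytes allocated (per tracemalloc) while build() runs and still held afterwards."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        obj = build()  # keep the result alive while measuring
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    del obj
    return after - before


# Per-object sizes: on 64-bit CPython an int object is bigger than a float object.
print("int object:  ", sys.getsizeof(1), "bytes")
print("float object:", sys.getsizeof(1.0), "bytes")

# i & 0xFF stays in 0..255, inside CPython's small-int cache, so the int list
# only allocates its pointer array; each float(...) is a new heap object.
int_list_bytes = traced_bytes(lambda: [i & 0xFF for i in range(N)])
float_list_bytes = traced_bytes(lambda: [float(i & 0xFF) for i in range(N)])
print("list of small ints:", int_list_bytes, "bytes")
print("list of floats:    ", float_list_bytes, "bytes")
```

With large, distinct ints the picture flips, since each int then needs its own (larger) object.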

Then again, if you're worried about any of the numbers in this article, maybe you shouldn't be using Python at all. I joke, but please at least use Numba or NumPy so you aren't paying a huge overhead for making an object out of every little datum.
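To make that per-object overhead concrete, here is a small sketch that uses the standard-library `array` module as a dependency-free stand-in for NumPy (a float64 NumPy `ndarray` likewise stores raw 8-byte values, as reported by its `nbytes` attribute). It compares a list of boxed floats, counting the list plus every float object it points to, with a packed `array('d')`. The exact sizes are CPython implementation details.

```python
import sys
from array import array

N = 10_000
values = [float(i) for i in range(N)]

# A list holds N pointers, and each pointer refers to its own boxed float object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('d') stores the C doubles contiguously, with no per-element object.
packed = array("d", values)
array_bytes = sys.getsizeof(packed)

print(f"list of floats: {list_bytes} bytes")
print(f"array('d'):     {array_bytes} bytes ({packed.itemsize} bytes per element)")
```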

jiggawatts 1/2/2026
My god, the memory bloat is out of this world compared to platforms like the JVM or .NET, let alone C++ or Rust!
intalentive 1/2/2026
Great resource. I would also like to see a comparison of variable access times across different scopes.
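A rough sketch of such a comparison with `timeit` (the function names are made up for illustration): each function runs the same loop but reads its variable from a different scope, and an empty loop is subtracted as a baseline. On CPython these reads compile to different bytecode instructions (local reads use the `LOAD_FAST` family, closure reads `LOAD_DEREF`, and global and builtin reads `LOAD_GLOBAL`), but the absolute numbers depend on the interpreter version and machine.

```python
import timeit

GLOBAL_VALUE = 1


def baseline(n):
    for _ in range(n):
        pass


def local_read(n):
    value = 1
    result = None
    for _ in range(n):
        result = value  # local variable
    return result


def global_read(n):
    result = None
    for _ in range(n):
        result = GLOBAL_VALUE  # module-level global
    return result


def builtin_read(n):
    result = None
    for _ in range(n):
        result = len  # not found in globals, so looked up in builtins
    return result


def make_closure_read():
    value = 1

    def closure_read(n):
        result = None
        for _ in range(n):
            result = value  # free variable from the enclosing function
        return result

    return closure_read


closure_read = make_closure_read()


def per_iteration_ns(func, n=200_000, repeat=5):
    """Best-of-`repeat` time per loop iteration, in nanoseconds."""
    return min(timeit.repeat(lambda: func(n), number=1, repeat=repeat)) / n * 1e9


base = per_iteration_ns(baseline)
results = {
    f.__name__: per_iteration_ns(f) - base
    for f in (local_read, global_read, builtin_read, closure_read)
}
for name, ns in results.items():
    print(f"{name:<13} ~{ns:5.2f} ns above an empty loop")
```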
Redoubts 1/1/2026
> Attribute read (obj.x) 14 ns

Note that protobuf attribute reads are 20-50x slower than this.

rozab 1/2/2026
I wonder why an empty set takes so much more memory than an empty dict
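One likely reason, based on CPython's object layouts (an implementation detail that can change between versions): a set object embeds a small 8-slot hash table directly in its struct, while an empty dict points at a shared, statically allocated empty key table and only allocates its own table on the first insert. A quick check with `sys.getsizeof`:

```python
import sys

empty_set, empty_dict = set(), {}
one_item_set, one_item_dict = {1}, {"a": 1}

for label, obj in [
    ("set()", empty_set),
    ("{1}", one_item_set),
    ("{}", empty_dict),
    ("{'a': 1}", one_item_dict),
]:
    print(f"{label:<9} {sys.getsizeof(obj)} bytes")

# The set's inline table already has room for a few elements, so adding one
# element does not grow it; the dict has to allocate a key table on first insert.
```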
m3047 1/1/2026
+1 but I didn't see pack / unpack...
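Assuming the comment means the standard-library `struct` module (it could also mean tuple packing/unpacking), here is a sketch of how one might benchmark it. The record format is a made-up example. It compares the module-level functions, which look up a cached compiled form of the format string on every call, with a precompiled `struct.Struct`.

```python
import struct
import timeit

FMT = "<iHd"  # hypothetical record: int32, uint16, float64, little-endian, no padding
record = (42, 7, 3.5)
packer = struct.Struct(FMT)  # format string parsed once, up front
payload = packer.pack(*record)

N = 200_000
env = {"struct": struct, "packer": packer, "FMT": FMT, "record": record, "payload": payload}
statements = {
    "struct.pack": "struct.pack(FMT, *record)",
    "Struct.pack": "packer.pack(*record)",
    "struct.unpack": "struct.unpack(FMT, payload)",
    "Struct.unpack": "packer.unpack(payload)",
}
timings_ns = {
    name: timeit.timeit(stmt, globals=env, number=N) / N * 1e9
    for name, stmt in statements.items()
}
for name, ns in timings_ns.items():
    print(f"{name:<14} {ns:6.1f} ns")
```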
lcnmrn 1/1/2026
LLMs can improve Python code performance. I've used them myself on a few projects.
iamnotsure 1/2/2026
Exactly wrong.
zbentley 1/1/2026
I have some questions, requests for clarification, and notes on suspicious behavior I noticed after reviewing the results and the benchmark code:

- If slotted attribute reads and regular attribute reads have the same latency, I suspect that the regular class may not have enough "bells on" (inheritance/metaprogramming/dunder overriding/etc.) to defeat simple optimizations that cache away attribute access, which would make it equivalent in speed to a slotted class. I know that over time slotting will become less of a performance boost, but--and this is just my intuition and I may well be wrong--I don't get the impression that we're there yet.

- Similarly, "read from @property" seems suspiciously fast to me. Even with descriptor-protocol awareness in the class lookup cache, the overhead of calling a method seems surprisingly similar to the overhead of accessing a field. That might be explained by the fact that property descriptors' "get" methods are guaranteed to be the simplest and easiest to optimize of all call forms (a bound method that is guaranteed never to take any parameters), so the overhead of setting up the stack/frame/args may be substantially minimized...but that would only be true if the property's method body were "return 1" or something similarly fast. The properties tested in these benchmarks, though, look up other fields on the class, so I'd expect them to be a lot slower than field access, not just a little slower (https://github.com/mikeckennedy/python-numbers-everyone-shou...).

- On the topic of "access fields of objects" (properties/dataclasses/slots/MRO/etc.), benchmarks are really hard to interpret--not just these benchmarks, all of them I've seen. That's because there are fundamentally two operations involved: resolving a field to something that produces data for it, and then accessing the data. For example, a @property is in a class's method cache, so resolving "instance.propname" is done at the speed of the methcache. That might be faster than accessing "instance.attribute" (a field, not a @property or other descriptor), depending on the inheritance geometry in play, slots, __getattr[ibute]__ overrides, and so on. On the other hand, accessing the data at "instance.propname" is going to be a lot more expensive for most @properties (because they need to call a function, use an argument stack, and usually perform other attribute lookups/call other functions/manipulate locals, etc); accessing data at "instance.attribute" is going to be fast and constant-time--one or two pointer-chases away at most.

- Nitty: why's pickling under file I/O? Those benchmarks aren't timing pickle functions that perform IO, they're benchmarking the ser/de functionality and thus should be grouped with json/pydantic/friends above.

- Asyncio's no spring chicken, but I think a lot of the benchmarks listed tell a worse story than necessary, because they don't distinguish between coroutines, Tasks, and Futures. Coroutines are cheap to create and call, but Tasks and Futures have a little more overhead when they're used (even fast CFutures) and a lot more overhead to construct, since they need far more data resources than a generator function (which is roughly what a raw coroutine desugars to, though that's less true than most people think...another story for another time). Now, "run_until_complete()" and "gather()" initially take their arguments and coerce them into Tasks/Futures, and that detection, coercion, and construction adds a lot of overhead. That's good to know (since many people are paying that coercion tax unknowingly), but it muddies the boundary between "overhead of waiting for an asyncio operation to complete" and "overhead of starting an asyncio operation". Either calling the lower-level functions that run_until_complete()/gather() use internally, or separating the benchmarks into ones that pass Futures, Tasks, and bare coroutines, might be appropriate.

- Benchmarking "asyncio.sleep(0)" as a means of determining the bare-minimum await time of a Python event loop is a bad idea. sleep(0) is very special (more details here: https://news.ycombinator.com/item?id=46056895) and not representative. To benchmark "time it takes for the event loop to spin once and produce a result" (the Python equivalent of process.nextTick), it'd be better to use low-level loop methods like "call_soon" or to defer completion to a Task and await that.
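A minimal sketch of the kind of attribute-read comparison the first three points ask for (the class names are invented for illustration, and this is not the article's harness): a plain class, a slotted class, a class with an inheritance chain and a `__getattr__` override, and a property whose getter does real work. Timings include `timeit`'s own loop overhead and vary by CPython version, so the relative spread matters more than the absolute numbers.

```python
import timeit


class Regular:
    def __init__(self):
        self.x = 1


class Slotted:
    __slots__ = ("x",)

    def __init__(self):
        self.x = 1


class Base:
    pass


class Middle(Base):
    pass


class WithBells(Middle):
    # An inheritance chain plus a __getattr__ fallback. __getattr__ only runs when
    # normal lookup fails, but defining it replaces the type's default lookup slot,
    # which may bypass some of the interpreter's attribute-access fast paths.
    def __init__(self):
        self.x = 1

    def __getattr__(self, name):
        raise AttributeError(name)


class WithProperty:
    def __init__(self):
        self._a = 1
        self._b = 2

    @property
    def x(self):
        # Not a trivial "return 1": the getter performs two more attribute reads.
        return self._a + self._b


def time_read(obj, number=300_000, repeat=5):
    """Best-of-`repeat` cost of evaluating obj.x, in nanoseconds per read."""
    best = min(timeit.repeat("obj.x", globals={"obj": obj}, number=number, repeat=repeat))
    return best / number * 1e9


results = {cls.__name__: time_read(cls()) for cls in (Regular, Slotted, WithBells, WithProperty)}
for name, ns in results.items():
    print(f"{name:<13} {ns:5.1f} ns")
```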
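Similarly, here is a sketch that separates the asyncio costs described in the last two points (the function names are hypothetical): awaiting a bare coroutine, creating and awaiting a Task, resolving a Future through `loop.call_soon`, and `asyncio.sleep(0)` for comparison. Every benchmark runs inside one already-running event loop, so `asyncio.run` start-up cost is excluded. The numbers are only indicative.

```python
import asyncio
import time


async def noop():
    return None


async def bare_coroutine(n):
    for _ in range(n):
        await noop()  # runs inline in the caller; no Task, no event-loop round trip


async def new_task(n):
    for _ in range(n):
        await asyncio.create_task(noop())  # allocates and schedules a Task each time


async def call_soon_future(n):
    loop = asyncio.get_running_loop()
    for _ in range(n):
        fut = loop.create_future()
        # set_result runs on a later loop iteration, which then schedules our wakeup.
        loop.call_soon(fut.set_result, None)
        await fut


async def sleep_zero(n):
    for _ in range(n):
        await asyncio.sleep(0)  # delay <= 0 takes asyncio's special bare-yield path


async def measure(n=20_000):
    results = {}
    for bench in (bare_coroutine, new_task, call_soon_future, sleep_zero):
        start = time.perf_counter()
        await bench(n)
        results[bench.__name__] = (time.perf_counter() - start) / n * 1e9
    return results


results = asyncio.run(measure())
for name, ns in results.items():
    print(f"{name:<18} {ns:8.0f} ns per await")
```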

867-5309 1/1/2026
TFA mentions running the benchmarks on a multi-core platform, but doesn't say whether the results used multithreading. A brief look at the code suggests not.