
Posted by WoodenChair 1/1/2026

Python numbers every programmer should know(mkennedy.codes)
429 points | 186 comments
thundergolfer 1/1/2026|
A lot of people here are commenting that if you have to care about specific latency numbers in Python you should just use another language.

I disagree. A lot of important and large codebases were grown and maintained in Python (Instagram, Dropbox, OpenAI) and it's damn useful to know how to reason your way out of a Python performance problem when you inevitably hit one without dropping out into another language, which is going to be far more complex.

Python is a very useful tool, and knowing these numbers just makes you better at using the tool. The author is a Python Software Foundation Fellow. They're great at using the tool.

In the common case, a performance problem in Python is not the result of hitting the limit of the language but the result of sloppy un-performant code, for example unnecessarily calling a function O(10_000) times in a hot loop.

I wrote up a more focused "Python latency numbers you should know" as a quiz here https://thundergolfer.com/computers-are-fast
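
To make that concrete, here's a rough sketch (mine, not from the linked quiz) of how `timeit` exposes the cost of an unnecessary function call in a hot loop:

```python
import timeit

def scale(x):
    # trivial helper: the work is cheap, but the call itself is not free
    return x * 2

def with_calls(data):
    # pays CPython's function-call overhead once per element
    return [scale(x) for x in data]

def inlined(data):
    # identical work with the expression inlined
    return [x * 2 for x in data]

data = list(range(10_000))
t_calls = timeit.timeit(lambda: with_calls(data), number=100)
t_inline = timeit.timeit(lambda: inlined(data), number=100)
# On CPython the called version is typically noticeably slower;
# exact ratios vary by interpreter version and machine.
```

Both produce the same result; only the per-element call overhead differs.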

saagarjha 1/2/2026||
I do performance optimization for a system written in Python. Most of these numbers are useless to me, because they’re completely irrelevant until they become a problem, then I measure them myself. If you are writing your code trying to save on method calls, you’re not getting any benefit from using the language and probably should pick something else.
srean 1/2/2026||
It's always a balance.

Good designs do not happen in a vacuum but informed with knowledge of at least the outlines of the environment.

One approach to breakfast is: let me spill some sticky milk on the dining table, who cares, I'll clean up if it becomes a problem later.

Another is: it's not much of an overbearing constraint to avoid making a mess with spilt milk in the first place. Maybe it won't be a big bother later, but it doesn't hurt me much now to not be sloppy, so let me be a little hygienic.

There's a balance between making a mess and cleaning up and not making a mess in the first place. The other extreme is to be so defensive about the possibility of creating a mess that it paralyses progress.

The sweet spot is somewhere between the extremes and having the ball-park numbers in the back of one's mind helps with that. It informs about the environment.

Scubabear68 1/1/2026|||
No.

Python’s issue is that it is incredibly slow in use cases that surprise average developers. It is incredibly slow at very basic stuff, like calling a function or accessing a dictionary.

If Python didn’t have such an enormous number of popular C and C++ based libraries it would not be here. It was saved by Numpy etc etc.

aragilar 1/2/2026|||
I'm not sure how Python can be described as "saved" by numpy et al., when the numerical Python ecosystem was there near the beginning, and the language and ecosystem have co-evolved? Why didn't Perl (with PDL), R or Ruby (or even php) succeed in the same way?
HenriTEL 1/2/2026||||
22ns for a function call and dictionary key lookup, that's actually surprisingly fast.
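
Figures like that are easy to sanity-check with the stdlib; a rough sketch (absolute numbers depend heavily on machine and CPython version):

```python
import timeit

d = {"key": 42}
number = 1_000_000

# timeit returns total seconds for `number` runs; divide for per-op cost.
total = timeit.timeit(lambda: d["key"], number=number)
ns_per_lookup = total / number * 1e9
# This includes lambda-call overhead, so it's an upper bound on the raw
# lookup cost; low tens of nanoseconds is a typical ballpark on CPython.
```
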
mrguyorama 1/2/2026|||
In that time the java app parsed 50 strings for object hierarchies (using a regex that isn't cached) and extracted values from a request object to a processing object, handled errors, and logged results.

3 times.

This is the naive version of that code, because "I will parallelize it later" and I was just getting the logic down.

Turns out, when you use programming languages that are fit for purpose, you don't have to obsess over every function call, because computers are fast.

I think people vastly underestimate how slow python is.

We are rebuilding an internal service in Java, going from python, and our half assed first attempts are over ten times faster, no engineering required, exactly because python takes forever just to call a function. The python version was dead, and would never get any faster without radical rebuilds and massive changes anyway.

It takes python 19ns to add two integers. Your CPU does it in about 0.3 ns..... in 2004.

That those ints take 28 bytes each to hold in memory is probably why the new Java version of the service takes 1/10th the memory as well.
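
The 28-byte figure is easy to confirm with `sys.getsizeof`; a quick sketch (sizes are CPython implementation details and can vary by version and platform):

```python
import sys

# CPython ints are full heap objects (type pointer, refcount, digit
# storage), which is where the ~28 bytes comes from.
small = sys.getsizeof(1)       # typically 28 on 64-bit CPython
huge = sys.getsizeof(10**100)  # grows with the magnitude of the value
assert huge > small
```
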

igiveup 1/3/2026|||
22ns might be about 100 processor instructions. Somehow I doubt that any programming language can parse 50 strings in 100 instructions, let alone with naive code.
mrguyorama 1/5/2026||
You are right.

I have said something very wrong. Java may be fast but it isn't magic.

What I claimed is not possible and I should have realized that.

I cannot correct my previous claim. I wish HN had longer edit windows and I wish HN allowed you to downvote older comments.

I cannot erase this wrong info

HenriTEL 1/3/2026|||
Python is slow but in my experience (that mostly relates to web services and data processing) I found that I/O was by far the biggest bottleneck. Waiting for the database, another http service or local storage, which often takes more than 1ms anyway.
Scubabear68 1/3/2026||
Yep.

Except when it’s not I/O.

dnautics 1/2/2026|||
i hate python but if your bottleneck is that sqlite query, optimizing a handful of addition operations is a wash. that's why you need to at least have a feel for these tables
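
A toy illustration of that point using only the stdlib (the table and query are made up; an in-memory database understates real query cost, which only strengthens the point):

```python
import sqlite3
import timeit

# Even an in-memory SQLite query dwarfs a handful of Python additions;
# an on-disk query would be slower still.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1_000)])
conn.commit()

a, b = 1, 2
t_query = timeit.timeit(
    lambda: conn.execute("SELECT SUM(x) FROM t").fetchone(), number=100)
t_adds = timeit.timeit(lambda: a + b, number=100)
assert t_adds < t_query  # the additions are noise next to the query
```
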
NoteyComplexity 1/2/2026|||
Agreed, and on top of that:

I think these kind of numbers are everywhere and not just specific to Python.

In Zig, I sometimes take a brief look at the CPU cycle counts of various operations to avoid cache misses, and I need to be aware of the alignment and size of a data type to debloat a data structure. If their logic applied, too bad, I should quit programming, since all languages have their own latency on certain operations that we should be aware of.

There are reasons to not use Python, but that particular reason is not the one.

oofbey 1/1/2026|||
I think both points are fair. Python is slow - you should avoid it if speed is critical, but sometimes you can’t easily avoid it.

I think the list itself is super long winded and not very informative. A lot of operations take about the same amount of time. Does it matter that adding two ints is very slightly slower than adding two floats? (If you even believe this is true, which I don’t.) No. A better summary would say “all of these things take about the same amount of time: simple math, function calls, etc. these things are much slower: IO.” And in that form the summary is pretty obvious.

microtonal 1/1/2026||
I think the list itself is super long winded and not very informative.

I agree. I have to compliment the author for the effort put in. However it misses the point of the original "Latency Numbers Every Programmer Should Know", which is to build an intuition for making good ballpark estimations of the latency of operations, e.g. that A is two orders of magnitude more expensive than B.

i_am_a_peasant 1/1/2026|||
our build system is written in python, and i’d like it not to suck but still stay in python, so these numbers very much matter.
notepad0x90 1/2/2026|||
For some of these, there are alternative modules you can use, so it is important to know this. But if it really matters, I would think you'd know this already?

For me, it will help with selecting what language is best for a task. I think it won't change my view that python is an excellent language to prototype in though.

TacticalCoder 1/2/2026|||
> ... a function O(10_000) times in a hot loop

O(10_000) is a really weird notation.

tialaramex 1/2/2026||
Generously we could say they probably mean ~10_000 rather than O(10_000)
thundergolfer 1/3/2026||
I meant it as an order-of-magnitude notation, so means more like 10,000-90,000. Eg. calling the function 3,000 times is OK, but 30,000 is too much. Odd notation, yes, but I've picked it up somewhere along the way.
oofbey 1/4/2026||
I think it follows naturally from speech. People say "order ten thousand" pretty naturally. Big-O notation is often vocalized as "order". But technically it's a clash of ideas when written that way.
nutjob2 1/1/2026||
> A lot of important and large codebases were grown and maintained in Python

How does this happen? Is it just inertia that causes people to write large systems in an essentially type-free, interpreted scripting language?

hibikir 1/1/2026|||
Small startups end up writing code in whatever gets things working faster, because having too large a codebase with too much load is a champagne problem.

If I told you that we were going to be running a very large payments system, with customers from startups to Amazon, you'd not write it in ruby, put the data in MongoDB, and then use its oplog as a queue... but that's what Stripe looked like. They even hired a compiler team to add type checking to the language, as that made far more sense than porting a giant monorepo to something else.

xboxnolifes 1/1/2026||||
It's very simple. Large systems start as small systems.
dragonwriter 1/1/2026||
Large systems are often aggregates of small systems, too.
oivey 1/1/2026||||
It’s a nice and productive language. Why is that incomprehensible?
oofbey 1/1/2026||||
It’s very natural. Python is fantastic for going from 0 to 1 because it’s easy and forgiving. So lots of projects start with it. Especially anything ML focused. And it’s much harder to change tools once a project is underway.
passivegains 1/1/2026||
this is absolutely true, but there's an additional nuance: yes, python is fantastic, yes, it's easy and forgiving, but there are other languages like that too. ...except there really aren't. other than ruby and maybe go, every other popular language sacrifices ease of use for things that simply do not matter for the overwhelming majority of programs. much of python's popularity doesn't come from being easy and forgiving, it's that everything else isn't. for normal programming why would we subject ourselves to anything but python unless we had no choice?

while I'm on the soapbox I'll give java a special mention: a couple years ago I'd have said java was easy even though it's tedious and annoying, but I've become reacquainted with it for a high school program (python wouldn't work for what they're doing and the school's comp sci class already uses java.)

this year we're switching to c++.

zelphirkalt 1/1/2026|||
Omg, switching to C++ for pupils who are programming beginners ... "How to turn off the most students from computer programming?" 101. Really can't get much worse than C++ for beginners.
nightfly 1/2/2026|||
PSU (Oregon) uses C++ as just "c with classes" and ignores the rest of C++ for intro to programming courses. It frustrates people who already use C++ but otherwise works pretty well.
tialaramex 1/2/2026|||
We should distinguish "First language" classes (for Computer Scientists who will likely learn many other languages and are expected to graduate knowing enough to just pick up another language with self study in reasonable time) from "Only language" classes for subjects where you might find it useful to write some software. These have different goals, it wouldn't make sense to teach say, OCaml as the only language but it's entirely reasonable as a first language.
Izkata 1/2/2026||||
This was how we learned it in an intro class in highschool ages ago, worked pretty well there too.
jgalt212 1/2/2026|||
C++, The Good Parts
pjmlp 1/2/2026|||
Back in the 1990's, C++ used to be taught at high school students and first year university students.
maccard 1/3/2026||
I just went and checked my university - they’re still teaching c++ to first year uni students in 2025
pjmlp 1/2/2026|||
Cough, Lisp.
wiseowise 1/1/2026||||
Python has types, now even gradual static typing if you want to go further. It's irrelevant whether the language is an interpreted scripting one if it solves your problem.
tjwebbnorfolk 1/1/2026||||
Most large things begin life as small things.
IshKebab 1/1/2026|||
Someone says "let's write a prototype in Python" and someone else says "are you sure we shouldn't use a better language that is just as productive but isn't going to lock us into abysmal performance down the line?" but everyone else says "nah we don't need to worry about performance yet, and anyway it's just a prototype - we'll write a proper version when we need to"...

10 years later "ok it's too slow; our options are a) spend $10m more on servers, b) spend $5m writing a faster Python runtime before giving up later because nobody uses it, c) spend 2 years rewriting it and probably failing, during which time we can make no new features. a) it is then."

rented_mule 1/2/2026|||
What many startups need to succeed is to be able to pivot/develop/repeat very quickly to find a product+market that makes money. If they don't find that, and most don't, the millions you talk about never come due. They also rarely have enough developers, so developer productivity in the short term is vital to that iteration speed. If that startup turns into Dropbox or Instagram, the millions you mention are round-off error on many billions. Easy business decision, and startups are first and foremost businesses.

Some startups end up in between the two extremes above. I was at one of the Python-based ones that ended up in the middle. At $30M in annual revenue, Python was handling 100M unique monthly visitors on 15 cheap, circa-2010 servers. By the time we hit $1B in annual revenue, we had Spark for both heavy batch computation and streaming computation tasks, and Java for heavy online computational workloads (e.g., online ML inference). There were little bits of Scala, Clojure, Haskell, C++, and Rust here and there (with well over 1K developers, things creep in over the years). 90% of the company's code was still in Python and it worked well. Of course there were pain points, but there always are. At $1B in annual revenue, there was budget for investments to make things better (cleaning up architectural choices that hadn't kept up, adding static types to core things, scaling up tooling around package management and CI, etc.).

But a key to all this... the product that got to $30M (and eventually $1B+) looked nothing like what was pitched to initial investors. It was unlikely that enough things could have been tried to land on the thing that worked without excellent developer productivity early on. Engineering decisions are not only about technical concerns, they are also about the business itself.

gcanyon 1/1/2026||||
What language is “just as productive but isn't going to lock us into abysmal performance down the line”?

What makes that language not strictly superior to Python?

IshKebab 1/2/2026|||
Typescript, C#, Go, Rust.

I'd say they are almost strictly superior to Python, but there are some minor factors why you might still choose Python over those. E.g. arbitrary precision integers, or the REPL. Go is a bit tedious and Rust is harder to learn (but productive once you have).

But overall they would all be a better choice than Python. Yes even for startups who need to move fast.

gcanyon 1/3/2026||
I say this out of genuine surprise, not judgment -- I never would have guessed that native Typescript is significantly faster than native Python. To be fair, I don't know if "significantly" is justified. I get that Typescript is typed (heh) and Python is not, but compared to my daily driver language (also not typed) Python is so much faster that I subconsciously think of typing as not being the major performance unlock. But I guess once you optimize the rest, typing is (nearly) the final boss of performance.
nazgul17 1/2/2026|||
Loose typing makes you really fast at writing code, as long as you can keep all the details in your head. Python is great for smaller stuff. But past some threshold, the lack of a mechanism that has your back starts slowing you down.
gcanyon 1/2/2026||
Sure, my language of choice is more flexible than that: I can type

   put "test abc999 this" into x
   add 1 to char 4 to 6 of word 2 of x
   put x -- puts "test abc1000 this"
But I'm still curious -- what's the better language?
anhner 1/2/2026||||
If I made an app in python and in 10 years it grows so successful that it needs a $10m vertical scale or $5m rewrite, I wouldn't even complain.
fud101 1/2/2026|||
I don't know a better open source language than Python. Java and C# are both better (platforms) but they come with that obvious corporate catch.
pjmlp 1/2/2026||
You can still get to use Scala, Kotlin, Clojure, F#, all with better performance than Python, and similar prototyping capabilities.
fud101 1/3/2026||
Those are non-mainstream languages, I think the point stands. You would need to prove the case for Scala or Kotlin tbh, Scala is hideously complex and Kotlin like Python suffers from becoming too bloated, accumulating features and syntax that are either not required or difficult to remember. Clojure and F# are nice and all but very niche.
pjmlp 1/3/2026||
Python being complex isn't an issue for beginners, apparently.

Kotlin owns the mobile development market with 80% Android market share.

Scala was the AI language before Python, with Hadoop, Spark and friends.

Lisps might be niche, yet they have had Python's flexibility, with machine code compilers, since 1958.

fooker 1/1/2026||
Counterintuitively: program in python only if you can get away without knowing these numbers.

When this starts to matter, python stops being the right tool for the job.

libraryofbabel 1/1/2026||
Or keep your Python scaffolding, but push the performance-critical bits down into a C or Rust extension, like numpy, pandas, PyTorch and the rest all do.

But I agree with the spirit of what you wrote - these numbers are interesting but aren’t worth memorizing. Instead, instrument your code in production to see where it’s slow in the real world with real user data (premature optimization is the root of all evil etc), profile your code (with py-spy, it’s the best tool for this if you’re looking for cpu-hogging code), and if you find yourself worrying about how long it takes to add something to a list in Python you really shouldn’t be doing that operation in Python at all.
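
For completeness, a minimal in-process sketch with the stdlib's `cProfile` (py-spy itself is an external sampling profiler attached to a live process); the hot function here is invented for illustration:

```python
import cProfile
import io
import pstats

def slow_concat(n):
    # deliberately sloppy: repeated string concatenation can be quadratic
    s = ""
    for i in range(n):
        s += str(i)
    return s

profiler = cProfile.Profile()
profiler.enable()
slow_concat(10_000)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()  # the hotspot shows up by function name
```
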

eichin 1/1/2026||
"if you're not measuring, you're not optimizing"
Demiurge 1/1/2026|||
I agree. I've been living off Python for 20 years and have never needed to know any of these numbers, nor do I need them now, for my work, contrary to the title. I also regularly use profiling for performance optimization and opt for Cython, SWIG, JIT libraries, or other tools as needed. None of these numbers would ever factor into my decision-making.
AtlasBarfed 1/1/2026||
.....

You don't see any value in knowing that numbers?

Demiurge 1/2/2026|||
That's what I just said. There is zero value to me knowing these numbers. I assume that all python built in methods are pretty much the same speed. I concentrate on IO being slow, minimizing these operations. I think about CPU intensive loops that process large data, and I try to use libraries like numpy, DuckDB, or other tools to do the processing. If I have a more complicated system, I profile its methods, and optimize tight loops based on PROFILING. I don't care what the numbers in the article are, because I PROFILE, and I optimize the procedures that are the slowest, for example, using cython. Which part of what I am saying does not make sense?
KeplerBoy 1/2/2026||
That makes perfect sense. Especially since those numbers can change with new python versions.
TuringTest 1/1/2026|||
As others have pointed out, Python is better used in places where those numbers aren't relevant.

If they start becoming relevant, it's usually a sign that you're using the language in a domain where a duck-typed bytecode scripting-glue language is not well-suited.

MontyCarloHall 1/1/2026|||
Exactly. If you're working on an application where these numbers matter, Python is far too high-level a language to actually be able to optimize them.
Quothling 1/1/2026|||
Why? I've built some massive analytic data flows in Python with turbodbc + pandas which are basically C++ fast. It uses more memory, which supports your point, but on the flip-side we're talking $5-10 extra cost a year. It could frankly be $20k a year and still be cheaper than staffing more people like me to maintain these things, rather than having a couple of us and then letting the BI people use the tools we provide for them. Similarly, when we do embedded work, MicroPython is just so much easier to deal with for our engineering staff.

The interoperability between C and Python makes it great, and you need to know these numbers on Python to know when to actually build something in C. With Zig getting really great interoperability, things are looking better than ever.

Not that you're wrong as such. I wouldn't use Python to run an airplane, but I really don't see why you wouldn't care about the resources just because you're working with an interpreted or GC language.

fooker 1/1/2026|||
> you need to know these numbers on Python to know when to actually build something in C

People usually approach this the other way, use something like pandas or numpy from the beginning if it solves your problem. Do not write matrix multiplications or joins in python at all.

If there is no library that solves your problem, it's a great indication that you should avoid python. Unless you are willing to spend 5 man-years writing a C or C++ library with good python interop.

Quothling 1/3/2026|||
> People usually approach this the other way, use something like pandas or numpy from the beginning if it solves your problem.

That is exactly how we approach it though. We didn't start out with turbodbc + pandas; it started as an SQLAlchemy and pandas service. Then when it was too slow, I got involved, found and dealt with the bottlenecks. I'm not sure how you would find and fix such things without knowing the efficiency, or lack thereof, of different parts of Python. Also, as you'll notice, we didn't write our own stuff, we simply used more efficient Python libraries.

oivey 1/1/2026|||
People generally aren’t rolling their own matmuls or joins or whatever in production code. There are tons of tools like Numba, Jax, Triton, etc that you can use to write very fast code for new, novel, and unsolved problems. The idea that “if you need fast code, don’t write Python” has been totally obsolete for over a decade.
fooker 1/1/2026||
Yes, that's what I said.

If you are writing performance sensitive code that is not covered by a popular Python library, don't do it unless you are a megacorp that can put a team to write and maintain a library.

oivey 1/1/2026||
It isn’t what you said. If you want, you can write your own matmul in Numba and it will be roughly as fast as similar C code. You shouldn’t, of course, for the same reason handrolling your own matmuls in C is stupid.

Many problems can be performantly solved in pure Python, especially via the growing set of tools like the JIT libraries I cited. Even more will be solvable when things like free-threaded Python land. It will be a minority of problems that can’t be, if it isn’t already.

its-summertime 1/1/2026|||
From the complete opposite side, I've built some tiny bits of near irrelevant code where python has been unacceptable, e.g. in shell startup / in bash's PROMPT_COMMAND, etc. It ends up having a very painfully obvious startup time, even if the code is nearing the equivalent of Hello World

    time python -I -c 'print("Hello World")'
    real    0m0.014s
    time bash --noprofile -c 'echo "Hello World"'
    real    0m0.001s
dekhn 1/1/2026|||
What exactly do you need 1ms instead of 14ms startup time in a shell startup? The difference is barely perceptible.

Most of the startup time is spent searching the filesystem for thousands of packages.

NekkoDroid 1/1/2026||
> What exactly do you need 1ms instead of 14ms startup time in a shell startup?

I think as they said: when dynamically building a shell input prompt it starts to become very noticeable if you have like 3 or more of these and you use the terminal a lot.

dekhn 1/1/2026||
Ah, I only noticed the "shell startup" bit.

Yes, after 2-3 I agree you'd start to notice if you were really fast. I suppose at that point I'd just have Gemini rewrite the prompt-building commands in Rust (it's quite good at that) or merge all the prompt-building commands into a single one (to amortize the startup cost).

its-summertime 1/2/2026||
https://starship.rs/ perhaps? I should probably start using it again honestly.
fragmede 1/2/2026||
it feels good to have all that information at your fingertips but most of the time the default config is way too noisy.
bathtub365 1/1/2026|||
These basically seem like numbers of last resort. After you’ve profiled and ruled out all of the usual culprits (big disk reads, network latency, polynomial or exponential time algorithms, wasteful overbuilt data structures, etc) and need to optimize at the level of individual operations.
BiteCode_dev 1/2/2026||
Not at all.

Some of those numbers are very important:

- Set membership check is 19.0 ns, list is 3.85 μs. Knowing what data structure to use for the job is paramount.

- Write 1KB file is 35.1 μs but 1MB file is only 207 μs. Knowing the implications of I/O trade off is essential.

- sum() 1,000 integers is only 1,900 ns: Knowing to leverage the stdlib makes all the difference compared to a manual loop.

Etc.
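
The first of those claims is cheap to verify yourself; a sketch using only the stdlib (absolute numbers will differ from the article's, but the ordering should hold):

```python
import timeit

items = list(range(10_000))
as_set = set(items)
target = 9_999  # worst case for the list: it is scanned front to back

t_list = timeit.timeit(lambda: target in items, number=1_000)
t_set = timeit.timeit(lambda: target in as_set, number=1_000)
assert t_set < t_list  # O(1) average vs O(n) scan; the gap grows with size

# And sum() runs its loop in C, which is why it usually beats a manual loop:
t_sum = timeit.timeit(lambda: sum(items), number=100)
```
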

A few years ago I did a Python rewrite of a big client's code base. They had a massive calculation process that took 6 servers 2 hours.

We got it down to 1 server, 10 minutes, and it was not even the goal of the mission, just the side effect of using Python correctly.

In the end, quadratic behavior is quadratic behavior.

fooker 1/2/2026|||
List membership check being significantly slower than set membership check is freshman computer science 101.
zelphirkalt 1/1/2026||
I doubt there is much to gain from knowing how much memory an empty string takes. The article, or the listed numbers, have a weird fixation on memory usage numbers and concrete time measurements. What is way more important to "every programmer" is time and space complexity, in order to avoid designing unnecessarily slow or memory-hungry programs.

Under the assumption of using Python, what is the use of knowing that your int takes 28 bytes? In the end you will have to determine whether the program you wrote meets the performance criteria you have, and if it does not, then you need a smarter algorithm or way of dealing with data. It helps very little to know that your 2d-array of 1000x1000 bools is so and so big. What helps is knowing whether it is too much, and maybe you should switch to using a large integer and a bitboard approach. Or switch language.
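
For what it's worth, the bitboard point can be made concrete with `sys.getsizeof`; a rough sketch (all sizes are CPython implementation details, ballpark only):

```python
import sys

n = 1000  # a 1000x1000 grid of booleans, as in the example above

# List of lists: each row stores n pointers (~8 bytes each on 64-bit)
# plus per-list overhead; True/False themselves are shared singletons.
grid = [[False] * n for _ in range(n)]
grid_bytes = sys.getsizeof(grid) + sum(sys.getsizeof(row) for row in grid)

# Bitboard: one big int with one bit per cell (~n*n/8 bytes of digits).
bitboard = 1 << (n * n)
bitboard_bytes = sys.getsizeof(bitboard)

assert bitboard_bytes < grid_bytes  # typically orders of magnitude smaller
```
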
kingstnap 1/1/2026||
I disagree. Performance is a leaky abstraction that *ALWAYS* matters.

Your cognition of it is either implicit or explicit.

Even if you didn't know, for example, that list appends were amortized linear and not quadratic, and fairly fast.

Even if you didn't give a shit if simple programs were for some reason 10000x slower than they needed to be, because it meets some baseline level of good enough, and/or you aren't the one impacted by the problems inefficiency creates.

Library authors beneath you would still know, and the APIs you interact with, the Pythonic code you see, and the code LLMs generate will all be affected by that leaky abstraction.

If you think that n^2 naive list appends is a bad example, it's not, btw: Python string appends are n^2 in the worst case, and that has affected and does affect how people do things; f-strings, for example, are lazy.
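
A sketch of the string-append behavior mentioned above (note CPython can optimize `+=` in place when the string isn't shared, so quadratic copying is a worst case, not a guarantee):

```python
def concat_naive(parts):
    # may copy the whole accumulated string on each append
    s = ""
    for p in parts:
        s += p
    return s

def concat_join(parts):
    # str.join preallocates once and stays linear
    return "".join(parts)

parts = [str(i) for i in range(1_000)]
assert concat_naive(parts) == concat_join(parts)
```
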

Similarly a direct consequence of dictionaries being fast in Python is that they are used literally everywhere. The old Pycon 2017 talks from Raymond talk about this.

Ultimately what the author of the blog has provided is a numerical justification for the tacit, implicit knowledge that performance understanding gives.

Qem 1/1/2026||
> Under the assumption of using Python, what is the use of knowing that your int takes 28 bytes?

Relevant if your problem demands instantiation of a large number of objects. This reminds me of a post where Eric Raymond discusses the problems he faced while trying to use Reposurgeon to migrate GCC. See http://esr.ibiblio.org/?p=8161

Aurornis 1/1/2026||
A meta-note on the title since it looks like it’s confusing a lot of commenters: The title is a play on Jeff Dean’s famous “Latency Numbers Every Programmer Should Know” from 2012. It isn’t meant to be interpreted literally. There’s a common theme in CS papers and writing to write titles that play upon themes from past papers. Another common example is the “_____ considered harmful” titles.
willseth 1/1/2026||
Good callout on the paper reference, but this author gives every indication that he’s dead serious in the first paragraph. I don’t think commenters are confused.
dekhn 1/1/2026|||
That doc predates 2012 significantly.

From what I've been able to glean, it was basically created in the first few years Jeff worked at Google, on indexing and serving for the original search engine. For example, the comparison of cache, RAM, and disk determined whether data was stored in RAM (the index, used for retrieval) or disk (the documents, typically not used in retrieval, but used in scoring). Similarly, the comparison of California-Netherlands time: I believe Google's first international data center was in NL and they needed to make decisions about copying over the entire index in bulk versus serving backend queries in the US with frontends in the Netherlands.

The numbers were always going out of date; for example, the arrival of flash drives changed disk latency significantly. I remember Jeff came to me one day and said he'd invented a compression algorithm for genomic data "so it can be served from flash" (he thought it would be wasteful to use precious flash space on uncompressed genomic data).

Kwpolska 1/1/2026|||
This title only works if the numbers are actually useful. Those are not, and there are far too many numbers for this to make sense.
Aurornis 1/1/2026||
The title wasn't meant to be taken literally, as in you're supposed to memorize all of these numbers. It was meant as an in-joke reference to the original writing, to signal that this document was going to contain timing values for different operations.

I completely understand why it's frustrating or confusing by itself, though.

shanemhansen 1/1/2026||
Going to write a real banger of a paper called "latency numbers considered harmful is all you need" and watch my academic cred go through the roof.
AnonymousPlanet 1/1/2026||
" ... with an Application to the Entscheidungsproblem"
willseth 1/1/2026||
Every Python programmer should be thinking about far more important things than low level performance minutiae. Great reference but practically irrelevant except in rare cases where optimization is warranted. If your workload grows to the point where this stuff actually matters, great! Until then it’s a distraction.
HendrikHensen 1/1/2026||
Having general knowledge about the tools you're working with is not a distraction, it's an intellectual enrichment in any case, and can be a valuable asset in specific cases.
willseth 1/1/2026||
Knowing that an empty string is 41 bytes or how many ns it takes to do arithmetic operations is not general knowledge.
oivey 1/1/2026||
How is it not general knowledge? How do you otherwise gauge if your program is taking a reasonable amount of time, and, if not, how do you figure out how to fix it?
dirtbag__dad 1/2/2026|||
In my experience, which is series A or earlier data intensive SaaS, you can gauge whether a program is taking a reasonable amount of time just by running it and using your common sense.

P50 latency for a fastapi service’s endpoint is 30+ seconds. Your ingestion pipeline, which has a data ops person on your team waiting for it to complete, takes more than one business day to run.

Your program is obviously unacceptable. And your problems are most likely completely unrelated to these heuristics. You either have an inefficient algorithm or, more likely, you are using the wrong tool (e.g. OLTP for OLAP) or the right tool the wrong way (bad relational modeling or an outdated LLM model).

If you are interested in shaving off milliseconds in this context then you are wasting your time on the wrong thing.

All that being said, I’m sure that there’s a very good reason to know this stuff in the context of some other domains, organizations, company size/moment. I suspect these metrics are irrelevant to disproportionately more people reading this.

At any rate, for those of us who like to learn, I still found this valuable but by no means common knowledge

oivey 1/2/2026||
I'm not sure it's common knowledge, but it is general knowledge. Not all HNers are writing web apps. Many may be writing truly compute bound applications.

In my experience writing computer vision software, people really struggle with the common sense of how fast computers really are. Some knowledge, like how many nanoseconds an add takes, can be very illuminating for understanding whether their algorithm's runtime makes any sense. That can shake loose the realization that their algorithm is somehow wrong. Often I see people fail to put bounds on their expectations. Numbers like these help set those bounds.

dirtbag__dad 1/2/2026||
Thanks this is helpful framing!
willseth 1/1/2026||||
You gauge with metrics and profiles, if necessary, and address as needed. You don’t scrutinize every line of code over whether it’s “reasonable” in advance instead of doing things that actually move the needle.
oivey 1/1/2026||
These are the metrics underneath it all. Profiles tell you what parts are slow relative to others and time your specific implementation. How long should it take to sum together a million integers?
willseth 1/1/2026||
It literally doesn’t matter unless it impacts users. I don’t know why you would waste time on non problems.
oivey 1/1/2026|||
No one is suggesting “wasting time on non problems.” You’re tilting at windmills.
willseth 1/2/2026||
Read more carefully
cycomanic 1/1/2026|||
But these performance numbers are meaningless without some sort of standard comparison case. So if you measure that e.g. some string operation takes 100ns, how do you compare against the numbers given here? Any difference could be due to your PC, Python version, or your implementation. So you have to do proper benchmarking anyway.
ehaliewicz2 1/2/2026||
If your program does 1 million adds, but it takes significantly longer than 19 milliseconds, you can guess that something else is going on.
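As a rough sanity check (the ~19 ms figure comes from the article's per-add estimate; actual timings vary a lot by machine and Python version):

```python
import timeit

def million_adds():
    # One million integer additions in a tight pure-Python loop.
    total = 0
    for i in range(1_000_000):
        total += i
    return total

# If this lands orders of magnitude above the expected ballpark,
# something other than the adds themselves is dominating.
elapsed = timeit.timeit(million_adds, number=1)
print(f"1M adds: {elapsed * 1000:.1f} ms")
```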
amelius 1/1/2026|||
Yeah, if you hit limits just look for a module that implements the thing in C (or write it). This is how it was always done in Python.
ryandrake 1/1/2026|||
I am currently (as we type actually LOL) doing this exact thing in a hobby GIS project: Python got me a prototype and proof of concept, but now that I am scaling the data processing to worldwide, it is obviously too slow so I'm rewriting it (with LLM assistance) in C. The huge benefit of Python is that I have a known working (but slow) "reference implementation" to test against. So I know the C version works when it produces identical output. If I had a known-good Python version of past C, C++, Rust, etc. projects I worked on, it would have been most beneficial when it came time to test and verify.
pjmlp 1/2/2026||||
I'd rather have a JIT that avoids the "rewrite in C", unless there is no way around it after heroic optimizations.
willseth 1/1/2026|||
Sometimes it’s as simple as finding the hotspot with a profiler and making a simple change to an algorithm or data structure, just like you would do in any language. The amount of handwringing people do about building systems with Python is silly.
kc0bfv 1/1/2026||
I agree - however, that has mostly been a feeling for me for years. Things feel fast enough and fine.

This page is a nice reminder of the fact, with numbers. For a while, at least, I will Know, instead of just feel, that I can ignore the low level performance minutiae.

f311a 1/1/2026||

   > Strings
   > The rule of thumb for strings is the core string object takes 41 bytes. Each additional character is 1 byte.
That's misleading. There are three types of strings in Python (1, 2 and 4 bytes per character).

https://rushter.com/blog/python-strings-and-memory/
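The three widths are easy to observe with sys.getsizeof (CPython's PEP 393 flexible representation; exact header sizes vary by version, so only the per-character growth is the point here):

```python
import sys

# CPython picks the narrowest representation that fits every
# code point in the string: 1, 2, or 4 bytes per character.
ascii_s = "a" * 100            # code points < U+0100  -> 1 byte each
bmp_s = "\u20ac" * 100         # euro sign, < U+10000  -> 2 bytes each
astral_s = "\U0001F600" * 100  # emoji, >= U+10000     -> 4 bytes each

for s in (ascii_s, bmp_s, astral_s):
    print(len(s), sys.getsizeof(s))
```

A single wide character widens the whole string, so len() alone doesn't predict memory use.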

riazrizvi 1/1/2026||
The titles are oddly worded. For example -

  Collection Access and Iteration
  How fast can you get data out of Python’s built-in collections? Here is a dramatic example of how much faster the correct data structure is. item in set or item in dict is 200x faster than item in list for just 1,000 items!
It seems to suggest an iteration for x in mylist is 200x slower than for x in myset. It’s the membership test that is much slower. Not the iteration. (Also for x in mydict is an iteration over keys not values, and so isn’t what we think of as an iteration on a dict’s ‘data’).

Also the overall title “Python Numbers Every Programmer Should Know” starts with 20 numbers that are merely interesting.

That all said, the formatting is nice and engaging.
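To make the membership-vs-iteration distinction concrete, a minimal benchmark (timings are illustrative and machine-dependent):

```python
import timeit

n = 1_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # worst case for the list: scans all n items

# Membership: O(n) for the list, O(1) average for the set.
t_list = timeit.timeit(lambda: missing in as_list, number=10_000)
t_set = timeit.timeit(lambda: missing in as_set, number=10_000)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s  ({t_list / t_set:.0f}x)")

# Iteration, by contrast, is O(n) for both.
assert sum(1 for _ in as_list) == sum(1 for _ in as_set) == n
```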

robertclaus 1/1/2026||
I liked reading through it from a "is modern Python doing anything obviously wrong?" perspective, but strongly disagree anyone should "know" these numbers. There's like 5-10 primitives in there that everyone should know rough timings for; the rest should be derived with big-O algorithm and data structure knowledge.
sjducb 1/1/2026||
It’s missing the time taken to instantiate a class.

I remember refactoring some code to improve readability, then observing something that was previously a few microseconds take tens of seconds.

The original code created a large list of lists. Each child list had 4 fields, and each field was a different thing: some were ints and one was a string.

I created a new class with the names of each field and helper methods to process the data. The new code created a list of instances of my class. Downstream consumers of the list could look at the class to see what data they were getting. Modern Python developers would use a data class for this.

The new code was very slow. I’d love it if the author measured the time taken to instantiate a class.
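For scale (names here are illustrative, not the commenter's actual code): plain-class and dataclass instantiation in CPython typically costs on the order of a few hundred nanoseconds per object, so a jump from microseconds to tens of seconds usually points at something beyond raw instantiation. A rough measurement sketch:

```python
import timeit
from dataclasses import dataclass

@dataclass
class Row:
    # Hypothetical 4-field record standing in for the child lists.
    a: int
    b: int
    c: int
    name: str

n = 100_000
t_lists = timeit.timeit(lambda: [[1, 2, 3, "x"] for _ in range(n)], number=1)
t_rows = timeit.timeit(lambda: [Row(1, 2, 3, "x") for _ in range(n)], number=1)
print(f"{n} list literals: {t_lists * 1e3:.1f} ms")
print(f"{n} dataclass instances: {t_rows * 1e3:.1f} ms")
```

A dataclass is slower to build than a bare list, but per-object the overhead is sub-microsecond; __slots__ or a namedtuple narrows the gap further.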

smcin 1/1/2026||
Instantiating classes is in general not a performance issue in Python. Your issue here strongly sounds like you're abusing OO to pass a list of instances into every method and downstream call (not just the usual reference to self, the instance at hand). Don't do that, it shouldn't be necessary. It sounds like you're trying to get a poor-man's imitation of classmethods, without identifying and refactoring whatever it is that methods might need to access from other instances.

Please post your code snippet on StackOverflow ([python] tag) or CodeReview.SE so people can help you fix it.

> created a new class with the names of each field and helper methods to process the data. The new code created a list of instances of my class. Downstream consumers of the list could look at the class to see what data they were getting.

lifeisstillgood 1/1/2026||
I went to the doctor and I said “It hurts when I do this”

The doctor said, “don’t do that”.

Edit: so yeah, a rather snarky reply. Sorry. But it's worth asking why we want to use classes and objects everywhere. Alan Kay is well known for saying object orientation is about message passing (a point mostly repeated by Erlang people).

A list of lists (where each list is four different types repeated) seems a fine data structure, which can be operated on by external functions, and serialised pretty easily. Turning it into classes and objects might not be a useful refactoring, I would certainly want to learn more before giving the go ahead.

sjducb 1/2/2026|||
The main reason why is to keep a handle on complexity.

When you’re in a project with a few million lines of code and 10 years of history it can get confusing.

Your data will have been handled by many different functions before it gets to you. If you do this with raw lists then the code gets very confusing. In one data structure the customer name might be at [4] and another structure might have it at [9]. Worse, someone adds a new field at [5], and then, when two lists get concatenated, the name moves to [10] in downstream code which consumes the concatenated lists.

krior 1/1/2026|||
I mean it sounds reasonable to me to wrap the data into objects.

customers[3][4]

is a lot less readable than

customers[3].balance

lifeisstillgood 1/1/2026||
Absolutely

But hidden in this is the failing of every SQL bridge ever: it's definitely easier for a programmer to read customers[3].balance, but the trade-off now is I have to provide class-based semantics for all operations, and that tends to hide problems (oh you know, impedance mismatch).

I would far prefer "store the records as plain as we can" and add on functions to operate over it (think of how pandas stores basically just ints, floats, and strings, as it is NumPy underneath).

(Yes you can store pyobjects somehow but the performance drops off a cliff.)

Anyway - keep the storage and data structure as raw and simple as possible and write functions to run over it. And move to pandas or SQLite pretty quickly :-)

fragmede 1/2/2026||
customers[3]['balance'] seems like a reasonable middle ground, no?
lifeisstillgood 1/3/2026||
It depends - most likely that's storing as a language-specific data structure (a dict in Python, then serialised to disk). At this point we're walking into harder-to-turn-around decisions and might as well do it properly. But it's still really "it depends" …
perrygeo 1/1/2026|
> small int (0-256) cached

It's -5 to 256, and these have very tricky behavior for programmers that confuse identity and equality.

  >>> a = -5
  >>> b = -5
  >>> a is b
  True
  >>> a = -6
  >>> b = -6
  >>> a is b
  False
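Constructing the ints at runtime sidesteps the compile-time constant folding that can mask this in a script, and shows it is purely an identity question (CPython-specific behavior):

```python
# -6 is outside CPython's small-int cache (-5 to 256), so two
# runtime-created values are equal but distinct objects.
a = int("-6")
b = int("-6")
print(a == b)  # True: equality compares values
print(a is b)  # False: identity compares objects

# 100 is inside the cache, so both names reference the same object.
c = int("100")
d = int("100")
print(c is d)  # True under CPython
```

The takeaway: use == for numbers, and reserve `is` for singletons like None.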
Tostino 1/2/2026||
Java does something similar: its Integer cache covers -128 to 127 by default. Confusing for beginners who run into it for the first time, for sure.
fragmede 1/2/2026||
wat