Posted by jedeusus 3/31/2025

Go Optimization Guide (goperf.dev)
484 points | 161 comments | page 2
EdwardDiego 4/1/2025|
Huh, this surprises me about Golang, didn't realise it was so similar to C with struct alignment. https://goperf.dev/01-common-patterns/fields-alignment/#why-...
Cthulhu_ 4/1/2025|
Yup, it's a fairly low-level language intended as a replacement for C/C++, but for modern-day systems (networked, concurrent, etc). You don't have manual memory management per se but you still need to decide on heap vs stack and consider the hardware.
jerf 4/1/2025||
"you still need to decide on heap vs stack"

No, you can't decide on heap vs stack. Go's compiler decides that. You can get feedback about the decision if you pass the right debug flags, and then based on that you may be able to tickle the optimizer into changing its mind based on code changes you make, but it'll always be an optimization decision subject to change without notice in any future versions of Go, just like any other language where you program to the optimizer.
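A minimal sketch of that feedback loop (function names here are illustrative): compiling with `go build -gcflags=-m` prints the compiler's escape-analysis decisions, and those decisions can shift between Go versions.

```go
package main

// Compile with `go build -gcflags=-m` to see the optimizer's
// heap/stack decisions; the output is an implementation detail
// and subject to change between Go versions.

// x never outlives the call, so the compiler keeps it on the stack.
func stackAlloc() int {
	x := 42
	return x
}

// y's address escapes the function, so escape analysis moves it to
// the heap (typically reported as "moved to heap: y").
func heapAlloc() *int {
	y := 42
	return &y
}
```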

If you need that level of control, Go is generally not the right language. However, I would encourage developers to be sure they need that level of control before taking it, and that's not special pleading for Go but special pleading for the entire class of "languages that are pretty fast but don't offer quite that level of control". There's still a lot of programmers running around with very 200x ideas of performance, even programmers who weren't programmers at the time, who must have picked it up by osmosis.

(My favorite example to show 200x perf ideas is paginated APIs where the "pages" are generally chosen from the set {25, 50, 100} for "performance reasons". In 2025, those are terribly, terribly small numbers. Presenting that many results to humans makes sense, but my default size for paginating API calls nowadays is closer to 1000, and that's the bottom end, for relatively expensive things. If I have no reason to think it's expensive, tack another order of magnitude on to my minimum.)
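The round-trip arithmetic behind that point is easy to sketch (the numbers below are made up for illustration, not taken from any real service):

```go
package main

// totalMillis estimates the wall time spent purely on round trips when
// fetching `rows` results in pages of `pageSize`, at `rttMillis` per
// request. Processing time is ignored; this isolates pagination overhead.
func totalMillis(rows, pageSize, rttMillis int) int {
	pages := (rows + pageSize - 1) / pageSize // ceiling division
	return pages * rttMillis
}
```

With 10,000 rows at 5 ms per request, a page size of 50 costs 200 round trips (1,000 ms of latency alone), while a page size of 1,000 costs 10 round trips (50 ms).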

fmstephe 4/2/2025||
Just an anecdote from work to back this up. I wrote a system that was taking requests, making another request to a service (that basically wrapped elasticsearch), and then processing the results and returning them to the caller.

By default the elastic-search results were paginated and defaulted to some small number in the order of 25..100. I increased this steadily upwards beyond 100,000 to the point where every request always returned the entire result in the first page. And it _transformed_ the performance of the service. From one that was unbearably slow for human users to one that _felt_ instantaneous. I had real perf numbers at the time, but now all I have are the impressions.

But the lesson on the impact of the overhead of those paginated calls was important. Obviously everything is specific and YMMV, but this is something worth having in the back of your mind.

jensneuse 3/31/2025||
You can often fool yourself by using sync.Pool. pprof looks great because no allocs in benchmarks but memory usage goes through the roof. It's important to measure real world benefits, if any, and not just synthetic benchmarks.
makeworld 3/31/2025||
Why would Pool increase memory usage?
jensneuse 4/1/2025|||
Let's say you have constantly 1k requests per second and for each request, you need one buffer, each 1 MiB. That means you have 1 GiB in the pool. Without a pool, there's a high likelihood that you're using less. Why? Because in reality, most requests need a 1 MiB buffer but SOME require a 5 MiB buffer. As such, your pool grows over time as you don't have control over the distribution of the size of the pool items.

So, if you have predictable object sizes, the pool will stay flat. If the workloads are random, you have a new problem because, like in this scenario, your pool grows 5x more.

You can solve this problem. E.g. you can only give back items into the pool that are small enough. Alternatively, you could have a small pool and a big pool, but now you're playing cat and mouse.

In such a scenario, it could also work to simply allocate and use GC to clean up. Then you don't have to worry about memory and the lifetime of objects, which makes your code much simpler to read and reason about.
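One way to sketch the "only give back items that are small enough" idea from above (the cap and starting capacity are illustrative, not prescriptive):

```go
package main

import "sync"

// Illustrative cap: buffers that have grown past this are dropped on
// Put, so a few oversized requests can't inflate pool memory forever.
const maxPooledCap = 1 << 20 // 1 MiB

// Pooling *[]byte (rather than []byte) avoids an extra allocation when
// the slice header is boxed into an interface on Put.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 64<<10) // 64 KiB starting capacity
		return &b
	},
}

func getBuf() *[]byte {
	b := bufPool.Get().(*[]byte)
	*b = (*b)[:0] // reset length, keep capacity
	return b
}

// putBuf returns a buffer to the pool only if it is still small enough;
// oversized buffers are left for the GC instead.
func putBuf(b *[]byte) {
	if cap(*b) <= maxPooledCap {
		bufPool.Put(b)
	}
}
```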

jerf 4/1/2025|||
Long before sync.Pool was a thing, I wrote a pool for []bytes: https://github.com/thejerf/gomempool I haven't taken it down because it isn't obsoleted by sync.Pool: my pool is aware of the size of the []bytes. Though it may be somewhat obsoleted by the fact that the GC has gotten a lot better since I wrote it, somewhere in the 1.3 time frame. But it solved exactly the problem I had: messages that were relatively infrequent from the computer's point of view (e.g., a system getting a message every 50ms or so), but that had to be pulled into buffers completely to process, and that had highly irregular sizes. The GC was doing a ton of work when I was allocating them all the time, but in my situation it was easy to reuse buffers.
theThree 4/1/2025|||
>That means you have 1 GiB in the pool.

This only happens if every request lasts 1 second.

xyproto 3/31/2025|||
I guess if you allocate more than you need upfront that it could increase memory usage.
throwaway127482 4/1/2025||
I don't get it. The pool uses weak pointers under the hood right? If you allocate too much up front, the stuff you don't need will get garbage collected. It's no worse than doing the same without a pool, right?
cplli 4/1/2025||
What the top commenter probably failed to mention, and jensneuse tried to explain, is that sync.Pool assumes the size cost of pooled items is similar. If you are pooling buffers (e.g. []byte) or any other type with backing memory that can/will grow beyond its initial capacity during use, you can end up in a scenario where backing arrays that have grown to MB capacities are handed out by the pool for jobs that need only a few KB, while the KB-sized buffers go to high-memory jobs that grow them to MB capacities and return them to the pool.

If that's the case, it's usually better to have non-global pools, pool ranges, drop things after a certain capacity, etc.:

https://github.com/golang/go/issues/23199 https://github.com/golang/go/blob/7e394a2/src/net/http/h2_bu...

nopurpose 4/1/2025||
also, no one GCs sync.Pool. After a spike in utilization, you live with the increased memory usage until the program restarts.
ncruces 4/1/2025||
That's just not true. Pool contents are GCed after two cycles if unused.
nopurpose 4/1/2025||
What do you mean? Pool content can't be GCed, because there are references to it: the pool itself.

What people do is what this article suggests, pool.Get/pool.Put, which makes the pool only grow in size even if the load profile changes. The app literally accumulates now-unwanted garbage in the pool, and no app I have seen made an attempt to GC it.

ahmedtd 4/1/2025|||
From the sync.Pool documentation:

> If the Pool holds the only reference when this happens, the item might be deallocated.

Conceptually, the pool is holding a weak pointer to the items inside it. The GC is free to clean them up if it wants to, when it gets triggered.
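A small sketch of that behavior, relying on the runtime's documented two-cycle cleanup (the first forced GC moves idle items to a victim cache, the second drops them; this is an implementation detail of current Go versions, not an API guarantee):

```go
package main

import (
	"runtime"
	"sync"
)

// pooledSurvivesGC reports whether an item put into a sync.Pool is
// still retrievable after two forced GC cycles. On current Go
// versions it is not: Get falls back to New.
func pooledSurvivesGC() bool {
	p := sync.Pool{New: func() any { return "fresh" }}
	p.Put("pooled")
	runtime.GC() // item moves to the victim cache
	runtime.GC() // victim cache is cleared
	return p.Get().(string) == "pooled"
}
```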

ashf023 4/1/2025||||
sync.Pool uses weak references for this purpose. The pool does delay GC, and if your pooled objects have pointers, those are real and can be a problem. If your app never decreases the pool size, you've probably reached a stable equilibrium with usage, or your usage fits a pattern that GC has trouble with. If Go truly cannot GC your pooled objects, you probably have a memory leak. E.g. if you have Nodes in a graph with pointers to each other in the pool, and some root pointer to anything in the pool, that's a memory leak.
andrewf 4/2/2025||||
https://github.com/golang/go/blob/master/src/sync/pool.go#L2...

The GC calls out to sync.Pool's cleanup.

nikolayasdf123 4/1/2025||
nicely organised. I feel like this could grow into a community-driven state-of-the-art collection of optimisation tips for Go. It just needs to let people edit/comment easily (preferably in place). I see there is a GitHub repo, but my bet is people would not actively add their input/suggestions/research there; it is hidden too far from the content/website itself.
whalesalad 4/1/2025|
For sure. Feels like the broader dev community could use a generic wiki platform like this, where every language or toolkit can have its own section. Not just for performance/optimization, but also for idiomatic ways to use a language in practice.
inadequatespace 4/2/2025||
Why doesn’t the compiler pack structs for you if it’s as easy as shuffling around based on type?
greatgib 4/5/2025|
Because the organization of your struct is exactly how the memory has to be organized, and that might be important for you. The compiler doesn't know your intended usage, so it can't rework the structure at will.

For example, you might take the block of memory and send it to another system that will decode it. Or you might store the block of memory in a file or in a hardware device where it means something in this specific order.
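A sketch of what that manual control over layout buys, and why reordering matters at all (sizes assume a 64-bit platform with 8-byte alignment for int64):

```go
package main

import "unsafe"

// Padded interleaves 1-byte bools with an 8-byte int64, so the
// compiler inserts padding to keep B 8-byte aligned: 24 bytes total
// on a 64-bit platform.
type Padded struct {
	A bool
	B int64
	C bool
}

// Packed orders fields largest-first; the two bools share the last
// word: 16 bytes total.
type Packed struct {
	B int64
	A bool
	C bool
}

func sizes() (uintptr, uintptr) {
	return unsafe.Sizeof(Padded{}), unsafe.Sizeof(Packed{})
}
```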

kunley 4/1/2025||
"Although the struct Data contains a [1024]int array, which is 4 KB (assuming int is 4 bytes on the architecture used)"

Huh, what?

I mean, who uses 32b architecture by default?

bombela 4/1/2025|
Most C/C++ compilers have 32b int on 64b arch. Maybe the confusion comes from that.

Also it would be 4KiB not 4KB.

kunley 4/7/2025||
What?

Article is about the Go compiler. On 64 bit arch Go int is 64 bits.

_345 4/1/2025||
Anyone know of a resource like this but for Python 3?
asicsp 4/1/2025|
This might help: https://pythonspeed.com/datascience/
nikolayasdf123 4/1/2025||
nice article. good to see statements backed up by benchmarks right there