I guess this is one reason why object-orientation has such a bad reputation.
I once worked at a bank where the OO mentor had taught people that the only object they needed was "Tape" and have them replicate the structure of data on the old spooled tape reels.
The struct of arrays reminds me of this optimization.
Andrew Kelley: A Practical Guide to Applying Data Oriented Design (DoD)
you should check these two talks out then.
However it requires additional hardware to recognize patterns which benefit from prefetching, and every time the CPU prefetches data which ends up not being used it has both burned energy and memory bandwidth, and evicted data from the cache which might be needed (cache pollution).
As TFA mentions, a CPU does some predictions about what cache lines to prefetch, e.g. when you do sequential reads. Moreover, the x86_64 instruction set provides a prefetch instruction through which you are able to give the CPU a hint "hey, I'm gonna be using this soon, prepare accordingly, pretty please".
Still, the utility of prefetching is diminished if you only use a single byte from each cache line, because the mechanism generally depends on you doing other work while the next cache line is being fetched. So really the best case scenario is to take as much time as possible to work with what is already fetched, so that there is time for the next unit of data to be fetched in the meantime.