Posted by i_love_limes 6/26/2025

AlphaGenome: AI for better understanding the genome (deepmind.google)
529 points | 176 comments
RivieraKid 6/26/2025|
I wish there were a breakthrough in cell simulation that would let us build simulations as useful as molecular dynamics but feasible on modern supercomputers. Not being able to see what's happening inside cells seems like the main blocker to biological research.
bglazer 6/27/2025||
Molecular dynamics describes very short, very small dynamics, on the scale of nanoseconds and angstroms (0.1 nm).

What you’re describing is more like whole cell simulation. Whole cells are thousands of times larger than a protein and cellular processes can take days to finish. Cells contain millions of individual proteins.

So we just can’t simulate all the individual proteins: it’s way too costly and might permanently remain that way.
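
To put rough numbers on that scale gap, here is a back-of-envelope sketch. The 2 fs timestep and the rate of 1 microsecond of simulated time per wall-clock day are illustrative assumptions, not figures from this thread; real throughput depends heavily on system size and hardware.

```python
# Back-of-envelope: cost of simulating one day of cell time with
# all-atom molecular dynamics.
# ASSUMPTIONS (illustrative only): a 2 fs integration timestep and a
# sustained throughput of 1 microsecond of simulated time per wall-clock day.

FEMTOSECOND = 1e-15      # seconds
MICROSECOND = 1e-6       # seconds
SECONDS_PER_DAY = 86_400


def steps_needed(simulated_seconds: float, timestep_fs: float = 2.0) -> float:
    """Number of integration steps needed to cover the simulated time."""
    return simulated_seconds / (timestep_fs * FEMTOSECOND)


def wall_clock_days(simulated_seconds: float, simulated_us_per_day: float = 1.0) -> float:
    """Wall-clock days needed at the assumed throughput."""
    return simulated_seconds / (simulated_us_per_day * MICROSECOND)


if __name__ == "__main__":
    steps = steps_needed(SECONDS_PER_DAY)
    days = wall_clock_days(SECONDS_PER_DAY)
    print(f"timesteps for one simulated day: {steps:.2e}")
    print(f"wall-clock days at assumed rate: {days:.2e}")
    print(f"wall-clock years at assumed rate: {days / 365:.2e}")
```

Even if the assumed throughput is off by several orders of magnitude, the gap between nanosecond-scale trajectories and day-long cellular processes stays enormous.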

The problem is that biology is insanely tightly coupled across scales. Cancer is the prototypical example. A single mutated letter in DNA in a single cell can cause a tumor that kills a blue whale. And it works the other way too. Big changes like changing your diet get funneled down to epigenetic molecular changes to your DNA.

Basically, we have to at least consider molecular detail when simulating things as large as a whole cell. With machine learning tools and enough data we can learn some common patterns, but I think both physical and machine learned models are always going to smooth over interesting emergent behavior.

Also you’re absolutely correct about not being able to “see” inside cells. But, the models can only really see as far as the data lets them. So better microscopes and sequencing methods are going to drive better models as much as (or more than) better algorithms or more GPUs.

fainpul 6/27/2025|||
> A single mutated letter in DNA in a single cell can cause a tumor that kills a blue whale.

Side note: whales rarely get cancer.

https://en.wikipedia.org/wiki/Peto's_paradox

https://www.youtube.com/watch?v=1AElONvi9WQ

cysteinechapel 6/30/2025||||
Scales can also decouple from each other. Complex trait genetic variation at the whole genome level acts predominantly in an additive fashion even though individual genes and variants have clearly non-linear epistatic interactions.
mbeavitt 6/27/2025|||
Simulating the real world at increasingly accurate scales is not that useful, because in biology - more than any other field - our assumptions are incorrect/flawed most of the time. The most useful thing simulations allow us to do is directly test those assumptions and in these cases, the simpler the model the better. Jeremy Gunawardena wrote a great piece on this: https://bmcbiol.biomedcentral.com/articles/10.1186/1741-7007...
cysteinechapel 6/30/2025|||
Plenty of simple models in biology that don't model the underlying details provide profoundly generalizable insights across scales. The percolation threshold model explains phase transition behavior from the savanna-forest transition to the complement immune system to epidemics to morphogenesis to social networks.
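
As a rough illustration of the threshold behavior, here is a minimal sketch of site percolation on a square lattice. The grid size, trial count, and function names are my own choices for illustration; none of the biological systems listed above are modeled. Each site is open with probability p, and we check whether open sites connect the top row to the bottom row.

```python
import random
from collections import deque


def percolates(n: int, p: float, rng: random.Random) -> bool:
    """Return True if open sites connect the top and bottom rows of an n x n grid."""
    open_site = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    seen = [[False] * n for _ in range(n)]
    queue = deque()
    # Start a breadth-first search from every open site in the top row.
    for c in range(n):
        if open_site[0][c]:
            seen[0][c] = True
            queue.append((0, c))
    while queue:
        r, c = queue.popleft()
        if r == n - 1:
            return True  # reached the bottom row
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and open_site[nr][nc] and not seen[nr][nc]:
                seen[nr][nc] = True
                queue.append((nr, nc))
    return False


def spanning_fraction(n: int, p: float, trials: int, seed: int = 0) -> float:
    """Fraction of random grids that contain a top-to-bottom spanning cluster."""
    rng = random.Random(seed)
    return sum(percolates(n, p, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for p in (0.40, 0.50, 0.55, 0.60, 0.65, 0.70):
        print(f"p={p:.2f}  spanning fraction={spanning_fraction(40, p, 200):.2f}")
```

On a finite grid the spanning fraction should climb steeply around the known square-lattice site percolation threshold of roughly 0.59, and the transition sharpens as the grid grows.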
kylehotchkiss 6/27/2025|||
And the extremely difficult, expensive, and often resultless process of confirming or denying these assumptions is one of the greatest uses of tax dollars and university degrees I can think of. Yet the current admin has taken the perspective that it's all miasma while also cutting the EPA, which, by their own logic, is what would stop the miasma.
andrewchoi 6/26/2025|||
The folks at Arc are trying to build this! https://arcinstitute.org/news/virtual-cell-model-state
dekhn 6/26/2025||
STATE is not a simulation. It's a trained graphical model that does property prediction as a result of a perturbation. There is no physical model of a cell.

Personally, I think Arc's approach is more likely to produce usable scientific results in a reasonable amount of time. You would have to make a very coarse model of the cell to get any reasonable amount of sampling, and you would probably spend huge amounts of time computing things which are not relevant to the properties you care about. An embedding and graphical model seems well-suited to problems like this, as long as the underlying data is representative and comprehensive.

ahns 6/27/2025|||
You may enjoy this, from a top-down experimental perspective (https://www.nikonsmallworld.com/galleries/small-world-in-mot...). Only a few entries so far show intracellular dynamics (like this one: https://www.nikonsmallworld.com/galleries/2024-small-world-i...), but I always enjoy the wide variety of dynamics some groups have been able to capture, like nervous system development (https://www.nikonsmallworld.com/galleries/2018-small-world-i...); absolutely incredible.
RivieraKid 6/27/2025||
Very interesting, thanks.
kylehotchkiss 6/27/2025|||
How can you simulate what is not yet reliably known? Ugh, it's so frustrating to hear AI 'thought leaders' going on and on about this being a panacea, especially when much of the funding for the research needed to even train these models has been cut so Elon could have more rocket dollars.
t_serpico 6/27/2025|||
'Seeing' inside cells/tissues/organs/organisms is pretty much most modern biological research.
tim333 6/27/2025|||
It's a main aim at DeepMind. I hope they succeed as it could be very useful.
RivieraKid 6/27/2025||
Do they specifically state that it's their main aim anywhere?

Edit: Never mind, I've googled the answer.

RivieraKid 6/27/2025||
It seems that this would be a very coarse-grained simulation of a cell, nowhere close to the usefulness of a proper molecular dynamics simulation, if I understand correctly.
m3kw9 6/26/2025|||
I believe this is where quantum computing comes in, though that could be a decade out; AI acceleration is hard to predict.
eleveriven 6/27/2025|||
What's missing feels like the equivalent of a "fast-forward" button for cell-scale dynamics
j7ake 6/27/2025|||
Why simulate? We can already do it experimentally
mnw21cam 6/27/2025|||
In my field, we're always wanting to see what will happen when DNA is changed in a human pancreatic beta cell. We kind of have a protocol for producing things that look like human pancreatic beta cells from human stem cells, but we're not really sure that they are really going to behave like real human pancreatic beta cells for any particular DNA change, and we have examples of cases where they definitely do not behave the same.
tim333 6/27/2025|||
You can't see what's going on in most cases.
noduerme 6/26/2025||
I wish there were more interest in general in building true deterministic simulations than black boxes that hallucinate and can't show their work.
Kalanos 6/27/2025||
The functional predictions related to "non-coding" variants are big here. Non-coding regions, referred to as the dark genome, produce regulatory non-coding RNAs that determine the level of gene expression in a given cell type. There are more regulatory RNAs than there are genes. Something like 75% of expression by volume is ncRNA.
dekhn 6/27/2025||
There is a big long-running argument about what "functional" means in "non-coding" parts of the genome. The deeper I pushed into learning about the debate the less confident I became of my own understanding of genomics and evolution. See https://www.sciencedirect.com/science/article/pii/S096098221... for one perspective.
wespiser_2018 6/27/2025||
It's possible that the "functional" aspect of non-coding RNA exists on a time scale much larger than what we can assay in a lab. It's a sort of "junk DNA/RNA" hypothesis: the ncRNA part of the genome is material that increases fitness during relatively rare events where it's repurposed into something else.

On a time frame of millions or billions of years, the organisms with the flexibility of ncRNA would have an advantage, but this is extremely hard to figure out from a "single point in time" viewpoint.

Anyway, that was the basic lesson I took from studying non-coding RNA 10 years ago. Projects like ENCODE definitely helped, but they really just exposed transcription of elements that are noisy, without providing the evidence that any of it is actually "functional". Therefore, I'm skeptical that more of the same approach will be helpful, but I'd be pleasantly surprised if wrong.

cysteinechapel 6/30/2025||
An advantage that is so rare and spread across such long time scales would be so small on average that it would be effectively neutral. Natural selection can only really act on fitness advantages larger than roughly the inverse of the effective population size, which is low for large multicellular organisms such as animals. Most of this is really just noisy transcription/binding/etc.
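
To make the "effectively neutral" point concrete, here is a minimal sketch using Kimura's diffusion approximation for the fixation probability of a new mutation. The assumptions are mine: a diploid population whose census size equals its effective size, and an illustrative effective size of 10,000.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's diffusion approximation for a single new mutant copy.

    ASSUMPTIONS: diploid population, census size equal to effective size
    n_e, so the starting frequency is 1 / (2 * n_e).
    s is the selection coefficient (s = 0 is neutral).
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)  # neutral limit
    exponent = -4.0 * n_e * s
    if exponent > 700.0:
        # Strongly deleterious: exp() would overflow and the probability
        # is effectively zero.
        return 0.0
    # expm1 keeps precision when s is tiny.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


if __name__ == "__main__":
    n_e = 10_000  # illustrative effective population size
    neutral = fixation_probability(0.0, n_e)
    for s in (0.0, 1e-6, 1.0 / (2 * n_e), 1e-3, 1e-2):
        ratio = fixation_probability(s, n_e) / neutral
        print(f"s={s:.1e}  fixation probability relative to neutral: {ratio:.2f}")
```

When s is well below 1 / n_e, the ratio stays close to 1, so drift dominates and the mutation behaves as if it were neutral. Only when s is well above that scale does selection noticeably raise the odds of fixation.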

For example, transposons make up almost half of our genomes and are a major source of disruptive variation, but we don't keep them because they're useful. They persist because we're just not very good at preventing them from spreading. We have some suppressive mechanisms, but they don't work all the time, and there's a bit of an arms race between transposons and host. Nonetheless, they can occasionally provide variation that is beneficial.

richardvc 7/1/2025||
Understanding the genome has always felt like trying to solve a massive puzzle with pieces constantly shifting. Tools like AlphaGenome are changing that—offering a more focused way to interpret complex genetic data. In the lab I worked in, precision was everything. We relied heavily on uv spectrophotometry for DNA quantification, and the systems from https://www.berthold.com/en/ stood out for their consistency, even under demanding conditions. Their devices helped streamline processes where accuracy couldn’t be compromised, especially when dealing with fragile or low-concentration samples. Founded back in 1949, they’ve become a global reference for reliable measuring technology. From radiation detection to life sciences and industrial process control, their solutions cover diverse fields. For anyone navigating genomics or analytical research, choosing the right tools isn’t just about features—it’s about long-term dependability and clarity in results.
xipho 6/27/2025||
"To ensure consistent data interpretation and enable robust aggregation across experiments, metadata were standardized using established ontologies."

Can't emphasize enough how much working with DNA requires human data curation to make things work; even from day one, alignment models were driven by biological observations. Glad to see UBERON playing a significant role. It represents a massive amount of human insight and data curation, and is for all intents and purposes a semantic-web product (OWL-based RDF at its heart).

jebarker 6/26/2025||
I don't think DM is the only lab doing high-impact AI applications research, but they really seem to punch above their weight in it. Why is that? Or is it just that they have better technical marketing for their work?
331c8c71 6/26/2025||
This one seems like well done research but in no way revolutionary. People have been doing similar stuff for a while...
Gethsemane 6/26/2025||
Agreed, there have been some interesting developments in this space recently (e.g. AgroNT). Very excited for it, particularly as genome sequencing gets cheaper and cheaper!

I’d pitch this paper as a very solid demonstration of the approach, and I'm sure it will lead to some pretty rapid developments (similar to what RoseTTAFold/AlphaFold did).

tim333 6/26/2025|||
They have been at it for a long time and have a lot of resources courtesy of Google. According to Perplexity, the AlphaFold 2 database took "several million GPU hours".
kridsdale3 6/26/2025||
It's also a core interest of Demis.
forgotpwagain 6/27/2025|||
DeepMind/Google does a lot more than the other places that most HN readers would think about first (Amazon, Meta, etc). But there is a lot of excellent work with equal ambition and scale happening in pharma and biotech that is less visible to the average HN reader. There is also excellent work happening in academic science (frequently as a collaboration with industry for compute). NVIDIA partners with whoever they can to get you committed to their tech stack.

For instance, Evo2 by the Arc Institute is a DNA Foundation Model that can do some really remarkable things to understand/interpret/design DNA sequences, and there are now multiple open weight models for working with biomolecules at a structural level that are equivalent to AlphaFold 3.

daveguy 6/26/2025|||
Well, they are a Google organization. Being backed by a $2T company gives you more benefits than just marketing.
jebarker 6/26/2025||
Money and resources are only a partial explanation. There are some equally or more valuable companies that aren’t having nearly as much success in applied AI.
sidibe 6/27/2025||
There are more valuable companies, but there aren't companies with more resources. If Apple wanted to turn all their cash pile into something like Google's infrastructure, it would still take years.
eleveriven 6/27/2025|||
Other labs are definitely doing amazing work too, but often it's either more niche or less public-facing
nextos 6/26/2025|||
In biology, Arc Institute is doing great novel things.

Some pharmas like Genentech or GSK also have excellent AI groups.

331c8c71 6/26/2025||
Arc have just released a perturbation model btw. If it reliably beats linear benchmarks as claimed, it is a big step.

https://arcinstitute.org/news/virtual-cell-model-state

seydor 6/26/2025||
This is such an interesting problem. Imagine expanding the input size to 3.2 Gbp, the size of the human genome. I wonder if previously unimaginable interactions would occur. Also interesting how everything revolves around U-nets and transformers these days.
pfisherman 6/27/2025||
You would not need much more than 2 megabases. The genome is not one contiguous sequence. It is organized (physically segregated) into chromosomes and topologically associating domains. IIRC 2 megabases is like the 3 SD threshold for interactions between cis-regulatory elements / variants and their effector genes.
teaearlgraycold 6/26/2025|||
> Also interesting how everything revolves around U-nets and transformers these days.

To a man with a hammer…

TeMPOraL 6/26/2025|||
Or to a man with a wheel and some magnets and copper wire...

There are technologies applicable broadly, across all business segments. Heat engines. Electricity. Liquid fuels. Gears. Glass. Plastics. Digital computers. And yes, transformers.

SV_BubbleTime 6/26/2025|||
Soon we’ll be able to get the whole genome up on the blockchain. (I thought the /s was obvious)
eleveriven 6/27/2025||
Even just modeling 3D genome organization or ultra-long-range enhancers more realistically could open up new insights
another_twist 6/27/2025||
So very similar approach to Conformer - convolution head for downsampling and transformer for time dependencies. Hmm, surprising that this idea works across application domains.
kylehotchkiss 6/27/2025||
I'm somewhat a noob here, but does this model have good understanding of things like OvRFs, methylation, etc, or is it strictly a sequence pattern matching thingy?
insane_dreamer 6/27/2025|
These are the type of advances in AI models that I'm excited about because of their potentially beneficial high impact for mankind. Not models that are a better (but less reliable) search engine or coding assistant and email writer. I wish more effort/money was going into this.