Posted by dagurp 12 hours ago

Invention of DNA "page numbers" opens up possibilities for the bioeconomy(www.caltech.edu)
134 points | 89 comments
koeng 1 hour ago|
I work in DNA assembly and synthesis. Here is my take:

They don't use oligo pools - "This capacity may be adapted to use large oligo pools to substantially reduce the cost per construct [45] but requires further engineering to account for the formation of the unintended Sidewinder heteroduplexes before assembly and the higher truncation rate of pooled oligos"

This absolutely destroys any unit economics when it comes to DNA synthesis. Oligo pool synthesis isn't 10x cheaper, it's 100x to 1000x cheaper than individual oligo synthesis.

So what they really have is a good way to do DNA assembly from synthesized oligos; fair. But we have that: GoldenGate can do 40-part assemblies, hell, it can do 52-part assemblies, and you CAN use oligo pools - https://pmc.ncbi.nlm.nih.gov/articles/PMC10949349/ (there are a couple of enzymatic properties which allow this, mainly that you can use fully double-stranded DNA, which you can make with a PCR. You can't make these overhang guys with a PCR).
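
The core design constraint, roughly: every junction's overhang has to be orthogonal to every other overhang and to all of their reverse complements. A toy Python sketch of that screen, with made-up 4-nt overhangs (real design tools also penalize near-matches and use measured ligase fidelity data):

    COMPLEMENT = str.maketrans("ACGT", "TGCA")

    def revcomp(seq):
        return seq.translate(COMPLEMENT)[::-1]

    def compatible(overhangs):
        seen = set()
        for oh in overhangs:
            rc = revcomp(oh)
            if oh == rc:                  # palindrome: ligates to itself
                return False
            if oh in seen or rc in seen:  # clashes with an earlier junction
                return False
            seen.update((oh, rc))
        return True

    print(compatible(["AATG", "GCTT", "CCAG"]))  # True
    print(compatible(["AATG", "CATT"]))          # False: CATT = revcomp(AATG)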

We've even found that with some GoldenGate enzymes, the biology somehow breaks the current models of the physics of ligation by being so efficient - https://www.biorxiv.org/content/10.64898/2026.01.31.702778v1

Their gels do look really good, I'll admit. I can imagine circumstances (exception cases) where this would be better. But for 99% of cases, this kind of thing has already been available for many years, at costs that are orders of magnitude cheaper (plural).

GlibMonkeyDeath 6 hours ago||
What really blows my mind about this is that they are using off-the-shelf T4 Ligase to ligate the junctions. I figured this was going to be some tour-de-force of enzyme engineering, but nope, all the reagents are pretty much commercially available.

It is super clever and exciting. Note that people have been able to assemble short (<100 bases) DNA oligomer fragments of synthetic DNA into longer fragments using "splint" oligos since forever. But in this case, each splint has to be custom engineered to only bind to the junction of interest (in practice it is pretty tricky and expensive to do this.) These guys figured out a way to use engineered sequences to make the match, and used a clever (but also more or less standard) way to chew up the engineered stuff, leaving behind only the desired long assembly with no scars at the end of the process.
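
For the unfamiliar, the classic splint recipe is simple to state even if fiddly in practice: the splint is just the reverse complement of the junction, half of it annealing to each fragment so ligase can seal the nick. A toy sketch (invented sequences and arm length; this is the old method, not the paper's Sidewinder chemistry):

    COMPLEMENT = str.maketrans("ACGT", "TGCA")

    def revcomp(seq):
        return seq.translate(COMPLEMENT)[::-1]

    def design_splint(upstream, downstream, arm=12):
        # splint spans the junction: last `arm` nt of the upstream
        # fragment plus first `arm` nt of the downstream fragment
        junction = upstream[-arm:] + downstream[:arm]
        return revcomp(junction)

    a = "ATGGCTAGCAAGGGCGAGGAG"
    b = "CTGTTCACCGGGGTGGTGCCC"
    print(design_splint(a, b))  # binds only this specific a-to-b junction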

pcrh 3 hours ago|
Yes, it's very elegant! It's one of those things you wish you had thought of yourself. Kudos to these guys for being first.
trebligdivad 10 hours ago||
That page numbers in books were only invented 50 years after the printing press is a fun snippet from the article
observationist 7 hours ago||
Sometime after 685 AD, they invented spaces between words. All text - in Latin to that point, mostly - was written in scriptio continua.

All sorts of ambiguity and hilarity would ensue; to be a good writer, you needed to ensure that words didn't bleed together and form incorrect meanings in unintended combinations. If you lost your place when reading, you'd have to know generally where you were in a scroll, and restart from a place you remembered.

Kinda crazy to think how difficult it would be to cross reference things and do collaborative research with no spaces or pages.

wl 5 hours ago|||
Hittite was putting spaces between words in the 17th century BCE. And if we're just interested in Latin, it used the interpunct as a word divider hundreds of years before the space came into use as one. The use of scriptio continua despite knowledge of word dividers was a choice.
observationist 4 hours ago||
I wonder how much was gatekeeping (keeping things hard on purpose), how much was inertia ("that's just the way things are done"), and how much was a kind of despairing "holy shit, it'd be so much work to go through and recopy everything in the new format, literally decades of effort, and there's other things we want to do with our lives".

The whole context of written words had so much implicit process and knowledge and institutional memory, compared to now, when we have petabytes of throwaway logs and trivial scratchpads for software running on a "just in case I might need to figure something out" basis. I'd love to see a graph of words written over time, starting ~4000 BC to now. And the complexity and diversity of those automated words have been going up like crazy since LLMs.

duskwuff 2 hours ago||
Also probably a bit of "good parchment is expensive, why would we waste it on blank space?"
datsci_est_2015 6 hours ago||||
Also kind of crazy how long “but that’s the way we’ve always done it” can remain the dominant system, despite a revolutionary change being so trivially achievable. This required absolutely no technological advancement, literally just putting a little more space between letters to reduce ambiguity.
Ekaros 3 hours ago|||
English is a good example. Its spelling hasn't been reformed in a long while, even though there are so many better ways to write certain words.
HappMacDonald 2 hours ago||
I feel like Mathematical notation is also a great example (since Math is ultimately a separate language: the language of measurement)

It's been built up over centuries where new innovations and shifts in perspective often create new kinds of notation, but those most frequently just get tacked onto whatever else is already standard and the new notations almost never actually supplant the old.

AFAICT we haven't really had a big shift in fundamental mathematical notation in Europe (and its colonies) since Roman Numerals (CXXIII) gave way to Arabic (123) numerals four hundred years ago. 8I

duskwuff 2 hours ago||
> AFAICT we haven't really had a big shift in fundamental mathematical notation in Europe (and its colonies) since Roman Numerals (CXXIII) gave way to Arabic (123) numerals four hundred years ago. 8I

Your history is a little confused. Arabic numerals came into use in Europe as early as the 13th century (introduced by Leonardo Fibonacci), but most other mathematical notation, like "=" or "√", didn't show up until the 16th or 17th century.

wat10000 2 hours ago|||
Imagine if it Turned Out that Capitalizing Various Words made Things more Readable. How Quickly Would That be Adopted?
datsci_est_2015 12 minutes ago||
Do you speak German? A language famous for capitalizing its nouns of course.
jjtheblunt 2 hours ago||||
i've had lots of Latin, know what you mean, but then thought of the Pantheon, where the word breaks (acronyms included) are indicated (with interstitial dots).

https://commons.wikimedia.org/wiki/File:Pantheon_Rom_1_cropp...

mistrial9 5 hours ago|||
yeah - under a certain "the winners write the history" framework, I believe that scribes did not add spaces between "words"... However, the world is a big place; history is long.
swalsh 9 hours ago|||
Many times obvious things are only obvious once you see them. Like roller suitcases.
bookofjoe 6 hours ago||
See also: the wheel
ssivark 7 hours ago|||
The early printing press was probably focused on short, few-page documents (at increasing scale), and it wouldn't be surprising if page numbers were a solution to help printers not mix up pages.
adrian_b 7 hours ago|||
Your hypothesis does not match history, because early printing was focused on things that had a potentially large market, which at that time meant books like the Bible, with a lot of pages.

The parent article mentions that binding the pages of the first Bibles in the correct order, in the absence of page numbers, was extremely tedious work.

That is why page numbers were invented many years later, exactly as you say, "to help printers not mix up pages".

observationist 6 hours ago||||
The Gutenberg Bible was one of the first mass produced books - no page numbers on early copies.

https://en.wikipedia.org/wiki/Gutenberg_Bible#/media/File:Gu...

Hindsight is 20/20, lol. There are so many obvious, effective constructs and functions in modern English that we kinda miss the absolute janky mess of hacks, tradition, arbitrary rules, and facepalm moments that went into the last 1500+ years of development, let alone the tens of thousands of years prior.

mmooss 1 hour ago|||
> it wouldn't be surprising if page numbers were a solution to help printers not mix up pages.

It's an interesting idea. Remember they printed large sheets containing many 'pages', I think even in different orientations, which were then folded and the ends cut to produce a nice orderly codex for the reader. They were printing in a different order than the one you read in.

I do think they numbered the large sheets or similar, and you can find old books that retain that number, but I don't recall what it is called.

BurningFrog 4 hours ago||
I can see how it would take that long to realize it would be nice to have a way to tell people which page to look at in their exact copy of a book.
jryb 11 hours ago||
Paper: https://www.nature.com/articles/s41586-025-10006-0
victor106 5 hours ago||
For someone in software, what is a good way to learn the fundamentals of this?
vikramkr 5 hours ago||
If you live near a community bio lab see if you can join up and take some classes to learn some basic lab techniques. And some sort of intro bio class via mooc/textbook/local college class whatever if you can but community lab is honestly a great place to start if you have one.

The main thing to keep in mind is that all the stuff that involves analogies between software and biology is almost universally a bullshit oversimplification that you can safely ignore. It's just that software is so profitable and there's so much VC money in it that there's a ton of pressure to be like "oh, we can program biology like we program computers." We can't - we invented computers, but we didn't invent biology. Biology is the end result of 4 billion years of unchecked entropy - it's a chaotic system, non-deterministic in the wildest ways, impossibly complicated, and yet something we are getting astonishingly good at understanding and engineering.

Basically, all the biologists that started companies that were like "we can program biology like we can program computers" are bankrupt now.

On the other hand, the computer scientists who respected the nature of biology and pushed the limits of computing to develop AlphaFold - giant models trained on the full complexity of biological data - finally created computer systems that could handle biological problems like protein folding at an extraordinary level of capability. They won a Nobel.

ramon156 4 hours ago|||
Follow-up question (not OP): would AlphaFold be used more to experiment with an already-defined theory that you have, or could you also make some toy projects (e.g. the way people make projects around trading engines)?

I'm wondering if I could find a fun weekend project in AlphaFold just to see what it's like.

elric 4 hours ago|||
TIL community biolabs are a thing ...

Are they really? Is this just limited to some very specific areas with an active biotech scene?

cess11 3 hours ago||
In my part of the world it is a common thing among high schoolers, who form associations and use labs at their school or a local university.

It's not uncommon that adults do something similar and run a community workshop with whatever the members are interested in.

zulko 5 hours ago||
Possibly not what you're asking for, but I wrote a generally-accessible intro to why it can be tricky to assemble many DNA fragments with "Golden Gate Assembly", a mainstream method which relies on short sequence overhangs. The Sidewinder method discussed in this thread aims to solve that "short overhang" problem.

https://zulko.github.io/bricks_and_scissors/posts/overhangs/
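
The back-of-envelope arithmetic behind that problem, as a quick enumeration (plain 4-nt overhangs, before even penalizing near-matches, which cuts the usable set down much further):

    from itertools import product

    COMPLEMENT = str.maketrans("ACGT", "TGCA")
    revcomp = lambda s: s.translate(COMPLEMENT)[::-1]

    all_oh = ["".join(p) for p in product("ACGT", repeat=4)]
    palindromes = [s for s in all_oh if s == revcomp(s)]  # ligate to themselves
    usable_junctions = (len(all_oh) - len(palindromes)) // 2  # one rc-pair each
    print(len(all_oh), len(palindromes), usable_junctions)  # 256 16 120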

Metacelsus 1 hour ago||
The Church lab came up with this in 2006, sadly it never took off: https://patents.google.com/patent/US20060281113A1/en
1vuio0pswjnm7 1 hour ago||
Here's the paper without the cookies

https://web.archive.org/web/20260121201045if_/https://www.na...

daemonk 3 hours ago||
Pretty cool technique: using complementary overhangs and toehold sequences to generate a 3-way heteroduplex, ligate the nick, and then remove the barcode duplex.

They don't give much detail on how the barcode duplex is removed, though. I guess ultimately the barcode duplex strands can just be melted off and the ligated strand used as a template.

If this can be made into an easy-to-use kit, it could really make vector generation much easier, and hopefully not locked into proprietary systems.

I can imagine a company that bioinformatically generates libraries of common long oligos with corresponding barcodes and allows end-users to select oligos to modularly ligate together in a one-pot reaction. Cool stuff.
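
Conceptually the "page number" scheme works like following a linked list. A toy version of that one-pot idea (invented tags and sequences; the real system resolves order by hybridization, not lookups):

    def one_pot_assemble(fragments, start="START", end="END"):
        # fragments: list of (prev_tag, seq, next_tag)
        by_prev = {prev: (seq, nxt) for prev, seq, nxt in fragments}
        out, tag = [], start
        while tag != end:
            seq, tag = by_prev[tag]  # follow the page numbers
            out.append(seq)
        return "".join(out)

    pot = [("B", "GGGG", "END"), ("START", "AAAA", "A"), ("A", "CCCC", "B")]
    print(one_pot_assemble(pot))  # AAAACCCCGGGG, regardless of input order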

mbreese 3 hours ago||
We’ve been able to do this type of nucleotide 3D engineering for a while. I used to use large branched-DNA complexes carrying fluorophores to label cDNA back when I was in grad school. They were more or less mixes of DNA that self-assembled into larger hairballs.

But branched DNA is really interesting. It’s a bit hard to get my head around. We spend so much time thinking about DNA in the 2D sequence sense, it’s easy to forget that it exists in 3D space.

I’m honestly not sure how different this really is from the traditional ways of doing this (with custom oligos). The common set of large self-hybridizing oligos is definitely easier, but you still have to have compatible tag overhangs between your two fragments. Meaning, it isn’t quite as universal, and you’ll still need work to pair the fragments together. But where I think it might be useful is if you have a set of common hybridizing pairs that can be easily ligated onto the custom flanking oligos. You’ll still need some sequence analysis to get your custom oligos, but it would make the process more “standardized”.

I think the main bonus here is the self-correcting selection… that you only end up with matching pairs linking together, so you could really have a mix in a one-tube reaction that links many kilobase fragments together. That’s quite nice. And useful. And still cool.

One thing that is interesting is that this is another step towards getting the “writing” step of DNA analysis better. For the past 50+ years, we’ve developed all sorts of tools for reading DNA. It’s only really been the past 20-ish or so that we’ve had tools for writing. And now we can write longer chunks. That’s all a good thing.

Not sure I think it’s revolutionary (yet), but that’s a university PR release for you! I’m still thinking about the paper.

omnicognate 3 hours ago||
> using complementary overhangs and toehold sequences to generate a 3-way heteroduplex, ligate the nick, and then remove the barcode duplex

At first I thought this was about Olympic figure skating, but after a bit of googling I think:

Complementary overhang - https://en.wikipedia.org/wiki/Sticky_and_blunt_ends

Toehold sequences: https://en.wikipedia.org/wiki/Toehold_mediated_strand_displa...

Ligate a nick (with ligase) - https://en.wikipedia.org/wiki/Nick_(DNA)

Barcode - https://en.wikipedia.org/wiki/DNA_barcoding

Heteroduplex - https://en.wikipedia.org/wiki/Heteroduplex

biophysboy 8 hours ago||
Chemical modifications of DNA are so amazing, and underpin so much DNA-related research and engineering. Illumina and Moderna would not exist without DNA mods. It’s very cool that the set of tools is expanding further!

“Guided by the removable DNA page numbers, Sidewinder achieves an incredibly high fidelity in DNA construction with a measured misconnection rate of just one in one million, a four to five magnitude improvement over all prior techniques whose misconnection rates range from 1-in-10 to 1-in-30.”
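
(Sanity-checking that claim: going from 1-in-10 or 1-in-30 down to 1-in-1,000,000 is an improvement factor between roughly 3x10^4 and 10^5, so "four to five orders of magnitude" is at least self-consistent arithmetic.)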

I wonder if this is even a problem, since you could amplify the correct sequence with PCR afterward.

mbreese 2 hours ago||
I don’t think PCR is necessarily relevant here. I had the impression that this would be most useful for linking multi-kb fragments together. If we are looking at sizes much above 2 kb, PCR is going to struggle to generate full-length fragments efficiently.

I didn’t see this technique as involving DNA modification per se, but rather as a novel way of managing the hybridization process. It’s stock (well-engineered) oligos, if I read it correctly.

codesnik 8 hours ago|||
PCR amplifies all sequences, correct or wrong, no? And as I understand it, it works best on short snippets.
biophysboy 8 hours ago||
It amplifies sequences that contain the two primer sequences on each end of the target. So if you had synthesized sequence XYZ with some mistakes like YZX, then you could target X and Z and purify.

You're correct that PCR has a limited max length, but it is longer and cheaper than vanilla DNA synthesis.

bookofjoe 6 hours ago||
Kary B. Mullis, Nobel Lecture, December 8, 1993

The Polymerase Chain Reaction

https://www.nobelprize.org/prizes/chemistry/1993/mullis/lect...

oofbey 7 hours ago||
Intuitively I agree that some kind of selective amplification should be able to correct for the mistakes. But I think it will be complicated, because the filtering process needs to be much more complex. It can’t just chemically match a known subsequence - you won’t know where the mistake might be in a long sequence.
biophysboy 6 hours ago||
This is a good point. WXYZ and WYXZ are indistinguishable via PCR. And the possibilities accumulate with more segments.
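
A crude model of both points at once (letters stand in for real ~20-nt primer sites and assembled segments):

    def amplifies(segments, fwd, rev):
        # PCR only checks the ends; internal order is invisible
        return segments[0] == fwd and segments[-1] == rev

    # primers against X and Z purify XYZ away from YZX...
    print(amplifies(list("XYZ"), "X", "Z"))   # True
    print(amplifies(list("YZX"), "X", "Z"))   # False: filtered out
    # ...but cannot tell WXYZ from WYXZ:
    print(amplifies(list("WXYZ"), "W", "Z"))  # True
    print(amplifies(list("WYXZ"), "W", "Z"))  # True: slips through
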
smackeyacky 11 hours ago|
Ok, that’s it for me. Selective breeding via BLUP at least had a speed limit; this is going to end with Cronenberg Brundlefly creations.