Posted by TORcicada 1 day ago

CERN uses ultra-compact AI models on FPGAs for real-time LHC data filtering(theopenreader.org)
318 points | 144 comments | page 3
mentalgear 1 day ago|
That's what Groq did as well: burning the Transformer right onto a chip (I have to say I was impressed by the simplicity, but afterwards less so by their controversial Kushner/Saudi investment).
NitpickLawyer 1 day ago|
> That's what Groq did as well: burning the Transformer right onto a chip

Are you perhaps confusing Groq with the Etched approach? IIUC Etched is the company that "burned the transformer onto a chip". Groq uses LPUs that are more generalist (they can run many transformers and some other architectures) and their speed comes from using SRAM.

nerolawa 1 day ago||
the fact that 99% of LHC data is just gone forever is insane
johngossman 1 day ago|
Not really. Think of the experiment as a very, very high speed camera. They can't store every frame, so they try to capture just the "interesting" ones. They also store some random ones that can be used later as controls or in case they realize they've missed something. That's the whole job of these various layers of algorithms: recognizing interesting frames. Sometimes a new experiment basically just changes the definition of "interesting".
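The filtering described here can be sketched as a toy trigger in Python: keep events that score as "interesting", plus a random prescaled control sample for later cross-checks. The function and threshold names are invented for illustration, not CERN's actual trigger logic:

```python
import random

def toy_trigger(events, interest_score, threshold=0.9, prescale=1000):
    """Keep high-scoring events plus a random 1-in-`prescale` control sample."""
    kept = []
    for event in events:
        if interest_score(event) >= threshold:
            kept.append(("signal", event))
        elif random.randrange(prescale) == 0:
            kept.append(("control", event))  # unbiased sample for later checks
    return kept

random.seed(0)
events = [random.random() for _ in range(10_000)]
kept = toy_trigger(events, interest_score=lambda e: e)
# most events are discarded; high-score ones plus a sparse control sample survive
```

The control sample is what lets you later ask "what did we throw away?" without un-discarding anything.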
amelius 1 day ago||
When is the price of fabbing silicon coming down, so every SMB can do it?
IshKebab 1 day ago|
My guess would be never. The closest you can get is "multi project wafers" where you get bundled with a load of other projects. As I understand it they're on the order of $100k which is cheap, but if you actually want to design and verify a chip you're looking at at least several million in salaries and software costs. Probably more like $10m, especially if you're paying US salaries. And of course that would be for a low performance design.

I think a better question would be "when are FPGAs going to stop being so ridiculously overpriced". That feels more possible to me (but still unlikely).

fc417fc802 1 day ago||
Doesn't this vary wildly depending on the process node though? The cutting edge stuff keeps getting increasingly ridiculous; meanwhile I thought you could get something like 50 nm for cheap. I also remember seeing years ago that some university had a ~micron (IIRC) process that you could order from.
100721 1 day ago||
Does anyone know why they are using language models instead of a more purpose-built statistical model? My intuition is that a language model would either be overfit, or its training data would have a lot of noise unrelated to the application and significantly drive up costs.
LeoWattenberg 1 day ago||
It's not an LLM, it is a purpose built model. https://arxiv.org/html/2411.19506v1

5 years ago we would've called it a Machine Learning algorithm. 5 years before that, a Big Data algorithm.

IanCal 1 day ago|||
We’ve been calling neural nets AI for decades.

> 5 years before that, a Big Data algorithm.

The DNN part? Absolutely not.

I don’t know why people feel the need for such revisionism but AI has been a field encompassing things far more basic than this for longer than most commenters have been alive.

magicalhippo 1 day ago||
> AI has been a field encompassing things far more basic than this for longer than most commenters have been alive.

When I was 13, having just started programming, I picked up a book from a "junk bin" at a book store on Artificial Intelligence. It must have been from the mid-80s if not older.

It had an entire chapter on syllogism[1] and how to implement a program to spit them out based on user input. As I recall it basically amounted to some string extraction assuming the user followed a template, plus string concatenation to generate the result. I distinctly recall not being impressed that such a trivial thing was part of a book on AI.

[1]: https://en.wikipedia.org/wiki/Syllogism

rjh29 1 day ago||
Eliza was 1960s.

In the 1990s I remember taking my friend's IRC chat history and running it through a Markov model to generate drivel, which was really entertaining.
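A word-level Markov babbler like that fits in a few lines of Python. This is a toy sketch, not the commenter's original script:

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def babble(chain, length=20, seed=None):
    """Random-walk the chain to generate plausible-sounding drivel."""
    rng = random.Random(seed)
    key = rng.choice(list(chain))
    out = list(key)
    for _ in range(length):
        followers = chain.get(tuple(out[-len(key):]))
        if not followers:
            break  # reached a word with no recorded successor
        out.append(rng.choice(followers))
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat ran")
print(babble(chain, seed=1))
```

Trained on someone's chat history instead of one sentence, the output stays locally grammatical while being globally nonsense, which is exactly the entertainment value.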

t0lo 1 day ago|||
i hate that we're in this linguistic soup when it comes to algorithmic intelligence now.
kevmo314 1 day ago|||
This might be some journalistic confusion. If you go to the CERN documentation at https://twiki.cern.ch/twiki/bin/view/CMSPublic/AXOL1TL2025 it states

> The AXOL1TL V5 architecture comprises a VICReg-trained feature extractor stacked on top of a VAE.

dmd 1 day ago||
… they’re not? Who said they are? The article even explicitly says they’re not?
progval 1 day ago||
For 40 minutes, the article claimed they used LLMs. They changed the wording twice: https://theopenreader.org/index.php?title=Journalism:CERN_Us... and https://theopenreader.org/index.php?title=Journalism%3ACERN_...
kittikitti 11 hours ago||
Congratulations, this is a great achievement in Real-Time LHC data filtering.
Kapura 1 day ago||
Why did we stop calling this stuff machine learning again? This isn't even an LLM, which has become the common bar for 'AI'
dguest 1 day ago|
Because every principal investigator in academia works in sales.

Some tried to hold out and keep calling it "ML" or just "neural networks" but eventually their colleagues start asking them why they aren't doing any AI research like the other people they read about. For a while some would say "I just say AI for the grant proposals", but it's hard to avoid buzzwords when you're writing it 3 times a day I guess.

Although note that the paper doesn't say "AI". The buzzword there is "anomaly detection" which is even weirder: somehow in collider physics it's now the preferred word for "autoencoder", even though the experiments have always thrown out 99.998% of their data with "classical" algorithms.
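The "anomaly detection" framing boils down to: reconstruct each event, score it by reconstruction error, and trigger on events the model reconstructs badly. Here's a sketch with a trivial stand-in for the autoencoder (reconstructing every event as the feature-wise training mean; a real trigger would use a learned encoder/decoder):

```python
def fit_mean(training_events):
    """'Train' the stand-in autoencoder: just remember the feature-wise mean."""
    n = len(training_events)
    dims = len(training_events[0])
    return [sum(e[d] for e in training_events) / n for d in range(dims)]

def anomaly_score(event, mean):
    """Squared reconstruction error against the stand-in 'decoder' output."""
    return sum((x - m) ** 2 for x, m in zip(event, mean))

training = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]
mean = fit_mean(training)
typical = anomaly_score([1.0, 2.0], mean)   # close to training data: low error
weird = anomaly_score([10.0, -5.0], mean)   # unlike anything seen: high error
assert weird > typical
```

The trigger then keeps only events whose score exceeds a threshold tuned to the available bandwidth, which is why it is still, functionally, a cut that throws most data away.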

porridgeraisin 21 hours ago||
The library they used (or used to use) is `hls4ml`. https://github.com/fastmachinelearning/hls4ml

I hacked on it a while back, added Conv2DTranspose support to it.

aj7 23 hours ago||
I wonder if it is a PhD thesis to prove that the data prefiltering doesn’t bias the results.
logicallee 1 day ago|
I hope they get good results, keep all the data they need, and identify all the interesting data they're looking for. I do have a cautionary tale about mini neural networks in new experiments, though. We recently spent a large amount of time training a mini neural network (200k parameters) to make new predictions in a very difficult domain (predicting specific trails for collisions in a hash function at higher round counts than anyone had reached before). We put up a spiffy internal dashboard[1] where we could tune parameters and see how well the network learned the existing results. We got to r^2 of 0.85 (very good correlation) on the data that already existed, from other people's records and from the data we had solved for previously. It showed such a nicely dropping loss function as it trained, brings tears to the eye, and we were pumped to see how it would perform on data it hadn't seen before, data that was too far out to solve for directly.

So many parameters to tune! We thought we could beat the world record by 1 round with it (40 rounds instead of 39), then let the community play with it to see if they could train it even better, to predict the inputs that would let us brute-force 42-round collisions, or even more. We could put up a leaderboard. The possibilities were endless; all it had to do was extrapolate some input values by one round. We'd take it the rest of the way with our solving infrastructure.

After training it fully, we moved on to the inference stage, trying it on the round counts we didn't have data for. It turned out ... to have zero predictive ability on data it hadn't seen before. This was on well-structured, sensible extrapolations of what worked at lower round counts, selected based on real algebraic correlations. That mini neural network isn't part of our pipeline now.

[1] screenshot: https://taonexus.com/publicfiles/mar2026/neural-network.png
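The r^2 of 0.85 quoted above is a goodness-of-fit number; one common definition is the coefficient of determination, which can be computed like this (a plain-Python reference sketch, not the commenter's dashboard code):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

assert r_squared([1, 2, 3], [1, 2, 3]) == 1.0  # perfect fit
assert r_squared([1, 2, 3], [2, 2, 2]) == 0.0  # no better than predicting the mean
```

The catch the story illustrates: this metric is computed on data the model has seen, so a high value says nothing about extrapolation beyond the training range.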

More comments...