Posted by TORcicada 11 hours ago

CERN uses ultra-compact AI models on FPGAs for real-time LHC data filtering(theopenreader.org)
254 points | 119 comments
chsun 3 hours ago|
One of the authors (of one of the two models, not this particular paper) here. Just a clarification: these models are *not* burned into silicon. They are trained with brutal QAT but are put onto FPGAs. For AXOL1TL, the weights are burned in in the sense that they are hard-wired into the fabric (i.e., shift-add instead of a conventional read-mul-add cycle), but not into the raw silicon, so the chip can be reprogrammed. That said, for projects like smartpixel or the HGCAL readout, there are similar models targeting actual silicon (google something like "smartpixel cern" or "HGCAL autoencoder" and you will find them), and I thought the title referred to one of those.
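
To make the shift-add point concrete, a toy sketch (not the actual netlist): a hard-wired constant weight decomposes into shifts and adds, so no multiplier or weight fetch is needed.

    # Toy illustration of a hard-wired shift-add multiply (not the real netlist).
    # A constant weight of 10 = (1 << 3) + (1 << 1), so x*10 needs no multiplier:
    def times_10(x: int) -> int:
        return (x << 3) + (x << 1)

    assert times_10(7) == 70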

Some slides with more info: https://indico.cern.ch/event/1496673/contributions/6637931/a... The approval process for a full paper is quite lengthy in the collaboration, but a more comprehensive one is coming in the following months, if everything goes smoothly.

Regarding the exact algorithm: a few versions of the model have been deployed. Up to v4 (the version deployed when this article was written), see slides 9-10. The model was trained as a plain VAE that is essentially a small MLP. At inference time the decoder was stripped and the mu^2 term from the KL divergence was used as the anomaly score (contributions from the terms containing sigma were found to have negligible impact on signal efficiency). In v5 we added a VICReg block before that and used the reconstruction loss instead. Everything runs in ≤2 clock cycles at the 40 MHz clock. Since v5, the hls4ml-da4ml flow (https://arxiv.org/abs/2512.01463, https://arxiv.org/abs/2507.04535) has been used for putting the model on FPGAs.
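
A minimal numpy sketch of that inference-time scoring (all layer sizes here are invented; only the encoder's mu head survives deployment):

    import numpy as np

    # Toy encoder MLP; layer sizes are made up for illustration.
    # At inference the decoder and sigma head are stripped: the anomaly
    # score is just ||mu||^2, i.e. the mu^2 term of the KL divergence.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(57, 16)), np.zeros(16)
    W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)

    def anomaly_score(x):
        h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
        mu = h @ W2 + b2                   # latent mean
        return float(np.sum(mu ** 2))      # score = sum(mu^2)

    print(anomaly_score(rng.normal(size=57)))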

For CICADA, the model was trained as a VAE again, but this time distilled with a supervised loss on the anomaly score on a calibration dataset. Some slides: https://indico.global/event/8004/contributions/72149/attachm... (not up-to-date, but I don't know if there are newer open ones). Both student and teacher were conventional conv-dense models; they can be found on slides 14-15.
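
A sketch of that distillation step, assuming PyTorch; both architectures here are placeholders (the real ones are on slides 14-15):

    import torch
    import torch.nn as nn

    # Placeholder teacher (frozen conv-dense model that outputs an anomaly
    # score) and a tiny dense student distilled to regress that score.
    teacher = nn.Sequential(nn.Conv2d(1, 4, 3), nn.ReLU(), nn.Flatten(),
                            nn.LazyLinear(1))
    student = nn.Sequential(nn.Flatten(), nn.Linear(18 * 14, 16), nn.ReLU(),
                            nn.Linear(16, 1))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    x = torch.randn(256, 1, 18, 14)        # stand-in calibration batch
    with torch.no_grad():
        target = teacher(x)                # teacher anomaly scores
    loss = nn.functional.mse_loss(student(x), target)  # supervised distillation
    opt.zero_grad()
    loss.backward()
    opt.step()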

Shameless plug of some of my work on QAT (high-granularity quantization) and deployment (distributed arithmetic) of NNs in the context of such applications (i.e., FPGA deployment at <1us latency), if you are interested: https://arxiv.org/abs/2405.00645 https://arxiv.org/abs/2507.04535
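
For anyone unfamiliar with QAT: the core trick is fake quantization in the forward pass with a straight-through estimator for the gradients. A generic PyTorch sketch (not the high-granularity scheme from the paper):

    import torch

    def fake_quantize(w, bits=4):
        # Forward pass: snap weights to a signed fixed-point grid.
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax
        w_q = torch.round(w / scale).clamp(-qmax - 1, qmax) * scale
        # Backward pass: straight-through estimator (round() has zero
        # gradient almost everywhere, so pass gradients through unchanged).
        return w + (w_q - w).detach()

    w = torch.randn(8, requires_grad=True)
    fake_quantize(w).pow(2).sum().backward()
    print(w.grad)  # gradients reach w despite the rounding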

Happy to take any questions.

stefanpie 1 hour ago|
Very cool to see your work! Early in my PhD I did some work with GNN accelerators on FPGAs (which I think later ended up in some form as a collaboration with some CERN or Fermilab folks) and I have chatted a bit in the past with the FastML, HLS4ML, and HEP folks.

I have since pivoted a lot of my PhD work (still related to HLS and EDA). But I wonder what the main limitations/challenges of building these trigger systems in hardware are today. For example, in my mind the EDA and tooling can be a big limitation, such as the reliance on commercial HLS tools, which can be buggy, hard to use, and hard to debug. From experience, this makes it harder to build different optimized architectures in hardware or to build co-design frameworks without high HLS expertise or a lot of extra engineering/tooling effort. Tool runtimes also make the design and debug cycle longer, especially if you are trying to do DSE on post-implementation metrics, since that brings in the implementation tools as well.

But I might be way off here and the real challenges are with other aspects beyond the tools.

intoXbox 10 hours ago||
They used a custom neural net with autoencoders, which contain convolutional layers. They trained it on previous experiment data.

https://arxiv.org/html/2411.19506v1

Why is it so hard to elaborate on which AI algorithm/technique they integrated? It would have made this article much better.

dcanelhas 9 hours ago||
I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.
ninjagoo 9 hours ago|||
> I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.

That's already the case with consulting companies; I have seen it myself.

idiotsecant 6 hours ago||
Some career do-nothing-but-make-noise in my organization hired a firm to 'Do AI' on some shitty data, and the outcome was basically linear regression. It turns out that you can impress executives with linear regression if you deliver it enthusiastically enough.
ozim 1 hour ago|||
Not everyone knows everything, so knowledge is the new oil.

I do know about linear regression; I even had quite a bit of it at university.

But I still wouldn’t be able to just apply it to some data without a good couple of days to weeks of figuring things out and deciding which tools to use so I don’t implement it from scratch.

tasuki 5 hours ago|||
Tbh, often enough, linear regression is exactly what is needed.
idiotsecant 3 hours ago||
Yes, and we do it every day and call it 'linear regression', and we don't need a data center full of expensive toys to do it.
blitzar 8 hours ago||||
I'm half expecting to see "AI model" appearing as stand-in for "if > 0" at this point in the cycle.
Foobar8568 7 hours ago|||
This is why I am now programming in OCaml; the files themselves are AI (.ml).
srean 6 hours ago||
I am sure you did not forget that pattern matching.
Vetch 6 hours ago|||
This is essentially what any ReLU-based neural network approximately looks like (smoother variants have replaced the original ramp function). AI models, even LLMs, essentially reduce to a bunch of code like

    let v1 = 0.40978399*(0.616*u + 0.291*v)
    let v2 = if 0 > v1 then 0 else v1

    let v3 = 0.377928*(0.261*u + 0.468*v)
    let v4 = if 0 > v3 then 0 else v3
    ...
samrus 6 hours ago||
That's a bit far. ReLU does check x > 0, but that's just the non-linearity in the linear/non-linear sandwich behind the universal approximation theorem. It's more complex than just x > 0.
Vetch 4 hours ago|||
The relu/if-then-else is in fact centrally important as it enables computations with complex control flow (or more exactly, conditional signal flow or gating) schemes (particularly as you add more layers).
greenavocado 5 hours ago|||
Multiply-accumulate, then clamp negative values to zero. Every even-numbered variable is a weighted sum plus a bias (an affine transformation), and every odd-numbered variable is the ReLU gate (max(0, x)). Layer 2 feeds on the ReLU outputs of layer 1, and the final output is a plain linear combination of the last ReLU outputs

    // inputs: u, v
    // --- hidden layer 1 (3 neurons) ---
    let v0  = 0.616*u + 0.291*v - 0.135
    let v1  = if 0 > v0 then 0 else v0
    let v2  = -0.482*u + 0.735*v + 0.044
    let v3  = if 0 > v2 then 0 else v2
    let v4  = 0.261*u - 0.553*v + 0.310
    let v5  = if 0 > v4 then 0 else v4
    // --- hidden layer 2 (2 neurons) ---
    let v6  = 0.410*v1 - 0.378*v3 + 0.528*v5 + 0.091
    let v7  = if 0 > v6 then 0 else v6
    let v8  = -0.194*v1 + 0.617*v3 - 0.291*v5 - 0.058
    let v9  = if 0 > v8 then 0 else v8
    // --- output layer (binary classification) ---
    let v10 = 0.739*v7 - 0.415*v9 + 0.022
    // sigmoid squashing v10 into the range (0, 1)
    let out = 1 / (1 + exp(-v10))
phire 9 hours ago||||
I'm sure I've seen basic hill climbing (and other optimisation algorithms) described as AI, and then used as evidence of AI solving real-world science/engineering problems.
LiamPowell 9 hours ago|||
Historically this was very much in the field of AI, which is such a massive field that saying something uses AI is about as useful as saying it uses mathematics. Since the term was first coined it's been constantly misused to refer to much more specific things.

From around when the term was first coined: "artificial intelligence research is concerned with constructing machines (usually programs for general-purpose computers) which exhibit behavior such that, if it were observed in human activity, we would deign to label the behavior 'intelligent.'" [1]

[1]: https://doi.org/10.1109/TIT.1963.1057864

zingar 8 hours ago||
That definition moves the goalposts almost by definition; people only stopped thinking that chess demonstrated intelligence when computers started doing it.
Eufrat 8 hours ago||
The term "artificial intelligence" has always been a buzzword designed to sell whatever it needed to. IMHO, it has no meaningful value beyond being a good marketing term. John McCarthy is usually credited with coining the name, and he has admitted in interviews that it was just to get eyeballs for funding.
coherentpony 6 hours ago|||
I am somewhat cynically waiting for the AI community to rediscover the last half a century of linear algebra and optimisation techniques.

At some point someone will realise that backpropagation and adjoint solves are the same thing.

bonoboTP 5 hours ago|||
There are plenty of smart people in the "AI community" already who know it. Smugly commenting does not replace actual work. If you have real insight and can make something perform better, I guarantee you that many people will listen (I don't mean twitter influencers but the actual field). If you don't know any serious researcher in AI, I have my doubts that you have any insight to offer.
whattheheckheck 5 hours ago|||
I am sure they are aware...
thesz 4 hours ago||||
There is the HIGGS dataset [1]. As the name suggests, it is designed for applying machine learning to recognize the Higgs boson.

[1] https://archive.ics.uci.edu/ml/datasets/HIGGS

In my experiments, linear regression with extended attributes (adding squared values) is very competitive in accuracy with the reported MLP accuracy.
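
A minimal sketch of that feature extension, assuming scikit-learn and a local, decompressed copy of the UCI HIGGS.csv (column 0 is the label, columns 1-28 the features), with logistic regression standing in as the linear model for the binary label:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Load a subset; the full file has ~11M rows.
    data = np.loadtxt("HIGGS.csv", delimiter=",", max_rows=500_000)
    y, X = data[:, 0], data[:, 1:]

    X_ext = np.hstack([X, X ** 2])   # extend attributes with their squares

    X_tr, X_te, y_tr, y_te = train_test_split(X_ext, y, test_size=0.2,
                                              random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("accuracy:", clf.score(X_te, y_te))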

dguest 4 hours ago||
The LHC has moved on a bit since then. Here's an open dataset that one collaboration used to train a transformer:

https://opendata-qa.cern.ch/record/93940

If you can beat it with linear regression, we'd be happy to know.

yread 8 hours ago||||
And why not? When linear regression works, it works so well it's basically magic; better than intelligence, artificial or otherwise.
plasino 7 hours ago||||
Having worked with people who do that, I can guarantee that's not the case. See https://ssummers.web.cern.ch/conifer/ and HLS4ML; these run BDTs and CNNs.
Staross 7 hours ago|||
That works well to get around patents btw :)
etrautmann 8 hours ago|||
It seems like most of the implementation is FPGA, which I wouldn’t call “physically burned into silicon.” That’s quite a stretch of language
vultour 9 hours ago|||
Because if it’s not an LLM it’s not good for the current hype cycle. Calling everything AI makes the line go up.
danielbln 7 hours ago||
LLMs also make the cynicism go up among the HN crowd.
okamiueru 4 hours ago|||
Hm. Is HN starting to become more skeptical of LLMs? For the past couple of years, HN has seemed worryingly enthusiastic about LLMs.
andersonpico 4 hours ago|||
How so? Half the people here show LLM delusion in every thread; more than half of what reaches the frontpage is AI. Just look at the hours when Americans are awake.
irishcoffee 3 hours ago||
Fucking Americans. Only 4% of the world population, with the magic of disproportionately afflicting the global news headlines which make their way here.

It’s impressive, honestly.

fnord77 5 hours ago|||
Thanks for tracking this down. I too am annoyed when so-called technical articles omit the actual techniques.
jgalt212 5 hours ago||
Because it does not align with LLM Uber Alles.
jurschreuder 6 hours ago||
I've got news for you: everybody with a modern CPU uses this, since modern CPUs use a perceptron for branch prediction.
andromaton 1 hour ago||
Indeed, some examples:

https://news.ycombinator.com/item?id=12340348 Neural network spotted deep inside Samsung's Galaxy S7 silicon brain (2016)

https://ieeexplore.ieee.org/document/831066 Towards a high performance neural branch predictor (1999)

archermarks 4 hours ago|||
I didn't know that! Do you have any references that go into more depth here? I'd be curious how they architect and train it.
isotypic 3 hours ago||
I believe D. A. Jimenez and C. Lin, "Dynamic branch prediction with perceptrons" is the paper which introduced the idea. It's been significantly refined since and I'm not too familiar with modern improvements, but B. Grayson et al., "Evolution of the Samsung Exynos CPU Microarchitecture" has a section on the branch predictor design which would talk about/reference some of those modern improvements.
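
For the curious, a minimal sketch of the Jimenez-Lin idea; the table size and history length are illustrative, and the training threshold is the one from the paper:

    # Perceptron branch predictor sketch (after Jimenez & Lin).
    N_PERCEPTRONS = 1024                    # illustrative table size
    HISTORY_LEN = 16
    THETA = int(1.93 * HISTORY_LEN + 14)    # training threshold from the paper

    weights = [[0] * (HISTORY_LEN + 1) for _ in range(N_PERCEPTRONS)]
    history = [1] * HISTORY_LEN             # +1 = taken, -1 = not taken

    def predict(pc):
        w = weights[pc % N_PERCEPTRONS]
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], history))
        return y, y >= 0                    # predict taken iff dot product >= 0

    def train(pc, taken, y):
        w = weights[pc % N_PERCEPTRONS]
        t = 1 if taken else -1
        # Train only on a misprediction or a low-confidence prediction.
        if (y >= 0) != taken or abs(y) <= THETA:
            w[0] += t
            for i, hi in enumerate(history):
                w[i + 1] += t * hi
        history.pop(0)
        history.append(t)

    y, guess = predict(0x4004)
    train(0x4004, True, y)
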
archermarks 2 hours ago||
Thank you, I'll give them a read.
amelius 5 hours ago|||
At this point AI basically means "we didn't know how to solve the problem so we just threw a black box at it".
integralid 4 hours ago||
I disagree. More often than not it's "We know how to solve the problem, and the solution is some linear algebra".
Legend2440 3 hours ago||
I disagree with both of you.

It's not about linear algebra (which is just used as a way to represent arbitrary functions), it's about data. When your problem is better specified from data than from first principles, it's time to use an ML model.

Create 1 hour ago||
Other news is that HEP has used FPGAs for L0 triggers (amongst others) for decades. These have always had diverse selection criteria in their algorithms: event filters, suppression, weights, etc. And just mentioning that some custom rad-hard simple readout silicon from the likes of STM isn't any news either.

And for historians: DELPHI people (amongst others) had papers on Higgs selection using (A)NNs on LEP data (overfit :), obviously without the 5 sigma. It was an argument for the LHC.

Dear downvoters/shadowbanners: do your homework.

serendipty01 10 hours ago||
Might be related: https://www.youtube.com/watch?v=T8HT_XBGQUI (Big Data and AI at the CERN LHC by Dr. Thea Klaeboe Aarrestad)

https://www.youtube.com/watch?v=8IZwhbsjhvE (From Zettabytes to a Few Precious Events: Nanosecond AI at the Large Hadron Collider by Thea Aarrestad)

Page: https://www.scylladb.com/tech-talk/from-zettabytes-to-a-few-...

konradha 7 hours ago||
How are FPGAs "burned into silicon"? It would be news to me that ASICs are being taped out at CERN.
eqvinox 7 hours ago||
CERN in fact does design custom ASICs for other things: https://indico.cern.ch/event/1115079/contributions/4693643/a...

(Probably not for this here though.)

danparsonson 7 hours ago||
Could they.... have someone else do it for them?
dguest 3 hours ago|||
CERN doesn't build everything CERN uses:

- FPGAs like this one are generally COTS.

- All the experiments use GPUs which come straight from the vendors.

- Most of the computing isn't even on site; it's distributed around the world in various computing centers. Yes, they also overflow into cloud computing, but various publicly funded datacenters tend to be cheaper (or effectively "free" because they were allocated to CERN experiments).

Some very specific elements (those in the detector) need to be radiation hard and need O(microsecond) latency. These custom electronics are built all over the world by contributing national labs and universities.

CERN builds a bit.

Create 7 minutes ago||
CERN builds next to nothing anymore. Half a century ago they really did do RF cavities, cooling, electronics, etc. Not anymore. It is either COTS (Dell, Altera, etc.) or chiefly vendor bidding for some custom parts. Much like what NASA (from Rocketdyne and TRW to Boeing and SpaceX) or copycat ESA (Airbus, DLR, BAE's suppliers) does today.

It is a project bureau. Everything is essentially outsourced, leaving a management shell institute to parade for VIPs. Actually, they are close to completely forgetting what they once knew in the hard-sciences domain.

samrus 6 hours ago|||
Glib, but it won't be cost-effective at that small scale.
danparsonson 3 hours ago||
So are we arguing that the article that talks about them using ASICs is just making that up then? Otherwise what's the fourth option?

Who says CERN needs to be cost effective?

quijoteuniv 10 hours ago||
A bit of hype in the AI wording here. This could be called a chip with hardcoded logic obtained with machine learning
FartyMcFarter 10 hours ago||
AI is not a new thing, and machine learned logic definitely counts as AI.
monkeydust 9 hours ago|||
For those who have experience with ML, yes. Those who have only recently become acquainted with it (more on the business side) seem to really struggle with this, in my experience.
volemo 9 hours ago|||
Yeah, and don’t forget Eliza!
bonoboTP 5 hours ago|||
ML is part of AI, and always has been. AI is not equal to ChatGPT, and AI wasn't coined/conceived in November 2022.
killingtime74 10 hours ago|||
Is an LLM's logic-in-weights derived from machine learning?
shlewis 10 hours ago|||
Well, yes. That's literally what it is.
dmd 10 hours ago||
What what is? The article has nothing to do with LLMs. It even explicitly says they don’t use LLMs.
shlewis 8 hours ago||
> Is an LLM's logic-in-weights derived from machine learning?

I was just answering this question: an LLM's logic-in-weights is fundamentally from machine learning, so yes. I wasn't really saying anything about the article.

quijoteuniv 10 hours ago|||
Good one… but is a DB query filter AI? I forgot to say, though: it sounds like a really cool thing to do.
stingraycharles 10 hours ago||
Strictly speaking, expert systems are AI as well, as in, an expert comes up with a bunch of if/else rules. So yes technically speaking even if they didn’t acquire the weights using ML and hand-coded them, it could still be called AI.
phire 9 hours ago||
It is 100% valid to label an algorithm that plays tic-tac-toe as "AI"

Much of the early AI research was spent on developing various algorithms that could play board games.

It didn't even need computers: one early AI was MENACE [1], a set of 304 matchboxes that could learn how to play noughts and crosses.

[1] https://en.wikipedia.org/wiki/Matchbox_Educable_Noughts_and_...

stingraycharles 9 hours ago|||
Yup, this is exactly my point; in the 80s there were plenty of "AI" companies, and "fuzzy logic" was the buzzword of the day.
FarmerPotato 3 hours ago|||
I built the Matchbox for Hexapawn, detailed in National Geographic Kids!

I didn't know what a Jujube was, but I got the idea.

hrmtst93837 1 hour ago||
Calling it "AI" is marketing sugar. It is closer to an inference-only state machine where gradient descent did the wiring instead of an engineer, and the annoying part is that once the detector setup or noise profile moves, retraining and redeploy stop being normal ML chores and turn into hardware respins, validation, and a lot of waiting. That distinction stops sounding pedantic the first time a bug fix means touching silicon instead of pushing to a repo.
Surac 9 hours ago||
Very important! This is not an LLM like the ones so often called AI these days. It's a neural network on an FPGA.
duskdozer 8 hours ago||
I guess it shows the LLM companies' marketing worked very well, because that's what I immediately thought of.
IshKebab 9 hours ago||
> FPGA

So they aren't "burned into silicon" then? The article mentions FPGAs and ASICs but it's a bit vague. I would be surprised if ASICs actually made sense here.

fecal_henge 6 hours ago||
They make sense when you consider that 'on detector' electronics has all sorts of constraints that FPGAs can't compete on: power, density, radiation hardness, material budget.
armcat 8 hours ago||
Not on the same extreme level, but I know that some coffee machines use a tiny CNN-based model locally/embedded. There is a small, super cheap camera integrated in the coffee machine, and the model does three things: (1) classifies the container type in order to select the type of coffee, (2) image segmentation, to determine where the cup/hole is placed, and (3) regression, to determine the volume and regulate how much coffee to pour.
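
A hypothetical sketch of such a three-headed model in PyTorch (the class name, sizes, and heads are all invented; it just shows the shared-backbone pattern):

    import torch
    import torch.nn as nn

    # Hypothetical multi-head CNN: one shared backbone, three task heads.
    class CoffeeNet(nn.Module):
        def __init__(self, n_container_types=4):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU())
            # (1) container-type classification
            self.classify = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(16, n_container_types))
            # (2) low-res cup/hole segmentation mask
            self.segment = nn.Conv2d(16, 1, 1)
            # (3) volume regression
            self.volume = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(16, 1))

        def forward(self, x):
            f = self.backbone(x)
            return self.classify(f), self.segment(f), self.volume(f)

    logits, mask, vol = CoffeeNet()(torch.randn(1, 3, 64, 64))
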
TORcicada 7 hours ago||
Thanks for the thoughtful comments and links; I really appreciated the high-signal feedback. We've updated the article to better reflect the actual VAE-based AXOL1TL architecture (variational autoencoder for anomaly detection), and added the arXiv paper and Thea Aarrestad's talks to the Primary Sources.
dguest 3 hours ago|
While you are at it:

> To meet these extreme requirements, CERN has deliberately moved away from conventional GPU or TPU-based artificial intelligence architectures.

This isn't quite right either: CERN is using more GPUs than ever. The data processing has quite a few steps and physicists are more than happy to just buy COTS GPUs and CPUs when they work.

peelslowlysee 3 hours ago|
First internship: CERN, summer 1989, in the OPAL pit at LEP; wrote an offline data filtering program in FORTRAN. Blast from the past.