Posted by ChernovAndrei 12/18/2025
They use three to four orders of magnitude fewer trained parameters and have just enough complexity that a team of three or four can handle several thousand such streams.
Could you explain how? I am working on essentially this right now, and it seems management wants to go the way of deep NNs for our customers.
In general I would recommend getting Hyndman's (free) book on forecasting. That will definitely get you up to speed.
https://news.ycombinator.com/item?id=46058611
Wishing you the best.
If you will ship the code over the client's fence and be done with it, that is, with no commitments regarding maintenance, then I would say do what management wants. If you will remain responsible for the ongoing performance of the tool, then you will be better off choosing a model you understand.
Management is leaning toward a deep learning forecasting approach: train a neural net to predict expected cost and then use multiple deviation scorers (including Wasserstein distance) to flag anomalies.
A simpler v1 is already live, and this newer approach isn’t my call. I’m still fairly new to anomaly detection, so for now I’m mostly trying to learn and ship within the existing direction rather than fight it.
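For anyone following along, here is a rough sketch of the forecast-then-score idea described above, not the poster's actual pipeline. It assumes the forecast already exists (from any model), takes residuals over a window, and compares them to a reference sample of "normal" residuals with a 1-D Wasserstein distance. The function names, the toy numbers, and THRESHOLD are all made up for illustration.

```python
from statistics import mean


def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical samples.

    For equal sample sizes in one dimension this reduces to the mean
    absolute difference between the sorted values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def score_window(actual, forecast, reference_residuals):
    """Deviation score: distance between this window's residuals and a
    reference sample of residuals from a period considered normal."""
    residuals = [y - f for y, f in zip(actual, forecast)]
    return wasserstein_1d(residuals, reference_residuals)


# Toy data (hypothetical): a flat forecast of 10.0 per step.
forecast = [10.0, 10.0, 10.0, 10.0]
reference = [-1.0, 0.0, 1.0, 0.0]  # residuals seen during normal operation

normal_score = score_window([10.0, 11.0, 9.0, 10.0], forecast, reference)
spike_score = score_window([10.0, 11.0, 9.0, 30.0], forecast, reference)

THRESHOLD = 1.0  # hypothetical cutoff; in practice tuned on historical data
is_anomaly = spike_score > THRESHOLD
```

In this toy case the normal window scores 0 and the window with the spike scores 5.0. The scorer does not care where the forecast comes from, so the same code works with a classical model or a neural net.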
For Chronos-2 (the current state of the art in time-series modeling), the setup is almost identical to that of LLMs because it is based on the T5 architecture. The main difference is that, in time-series models, tokens correspond to subintervals in the real-valued (ℝ) space. You can check the details here: https://arxiv.org/pdf/2510.15821
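To make "tokens correspond to subintervals in ℝ" concrete, here is a minimal sketch of one way to do it: scale the series, split a fixed range into equal-width bins, and use the bin index as the token. This is an illustration only, not the Chronos-2 code; the function names, the mean-absolute scaling, the bin count, and the [-4, 4] range are assumptions picked for the example. The linked paper has the actual details.

```python
def tokenize(series, num_bins=8, low=-4.0, high=4.0):
    """Map real values to integer tokens, one token per subinterval.

    Assumed scheme: divide by the mean absolute value, then bucket the
    scaled values into num_bins equal-width bins over [low, high].
    Values outside the range are clipped to the first or last bin.
    """
    if not series:
        raise ValueError("series must be non-empty")
    scale = sum(abs(x) for x in series) / len(series) or 1.0
    width = (high - low) / num_bins
    tokens = []
    for x in series:
        idx = int((x / scale - low) // width)
        tokens.append(min(max(idx, 0), num_bins - 1))
    return tokens, scale


def detokenize(tokens, scale, num_bins=8, low=-4.0, high=4.0):
    """Map each token back to the center of its bin, in original units."""
    width = (high - low) / num_bins
    return [(low + (t + 0.5) * width) * scale for t in tokens]


tokens, scale = tokenize([1.0, 2.0, 3.0, -2.0])
approx = detokenize(tokens, scale)
```

With these settings the series [1.0, 2.0, 3.0, -2.0] becomes tokens [4, 5, 5, 3], and decoding gives [1.0, 3.0, 3.0, -1.0]. The round trip is lossy: every value inside a bin comes back as that bin's center.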