Posted by surprisetalk 1 day ago
The author finds, as many do, that naive or first-approximation approaches fail within certain constraints and that more complex methods are necessary to achieve simplicity. He finds, as I have, that perceptual and spectral domains are a better space to work in than the raw data for things that are themselves perceptual and spectral.
What I don't see him get to (might be the next blog post, IDK) is constraints in the use of color - everything is in 'rainbow town', as we say, and it's there that things get chewy.
I'm personally not a fan of emissive green LED light in social spaces. I think it looks terrible and makes people look terrible. Just a personal thing, but putting it into practice with these sorts of systems is challenging as it results in spectral discontinuities and immediately requires the use of more sophisticated color systems.
I'm also about maximum restraint in these systems - if they have flashy tricks, I feel they should do them very very rarely and instead have durational and/or stochastic behavior that keeps a lot in reserve and rewards closer inspection.
I put all this stuff into practice in a permanent audio-reactive LED installation at a food hall/nightclub in Boulder: https://hardwork.party/rosetta-hall-2019/
I really like your LED installation in Rosetta Hall, it looks beautiful!
https://en.wikipedia.org/wiki/Wicked_problem
Kinda funny but I am a fan of green LED light to supplement natural light on hot summer days. I can feel the radiant heat from LED lights on my bare skin and since the human eye is most sensitive to green light I feel the most comfortable with my LED strip set to (0,255,0)
(Note both the scanner in front of KITT and the visual FX on his dashboard when he speaks, which changes from season to season.)
The wickedness comes from wanting something that works just as well for John Summit as the Grateful Dead as Mozart and Bad Bunny.
But it seems like you could cheat for installations where the type of music is known and go from there. The other cheat is to have a "tap" button, and to pull that data and go from there.
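A rough sketch of the "tap" cheat, assuming the button handler records timestamps in seconds. The function names `bpm_from_taps` and `beat_phase` are made up for illustration and not taken from any project mentioned here:

```python
from statistics import median


def bpm_from_taps(tap_times):
    """Estimate tempo in BPM from a list of tap timestamps (seconds)."""
    if len(tap_times) < 2:
        raise ValueError("need at least two taps")
    # Time between consecutive taps.
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    # The median is less sensitive to one sloppy tap than the mean.
    return 60.0 / median(intervals)


def beat_phase(now, last_tap, bpm):
    """Position inside the current beat: 0.0 on the beat, approaching 1.0 before the next."""
    period = 60.0 / bpm
    return ((now - last_tap) % period) / period
```

An effect could then be driven from `beat_phase` (for example, brightness falling off as the phase grows) instead of trying to detect beats from the audio at all.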
Mental note: the thought "it can't be that hard", when it obviously is, sent me down a rabbit hole for a couple of hours.
I wonder if transformer tech is close to achieving real-time audio decoding, where you can split a track into its component instruments and run a light show off of that. Think those fancy Christmas-time front yard light shows, as opposed to random colors kind of blinking with what maybe is a beat.
There was also a nice paper with an overview last year, https://arxiv.org/html/2511.13146v1, which introduced RT-STT; that is still being tweaked and built upon in the MSS scene.
The high-quality ones like MDXNet and Demucs usually have at least several seconds of latency, but for something like displaying visuals high quality is not really needed and the real-time approaches should be fine.
In the end it's "just" chunking streamed audio into windows and predicting which LEDs a window should activate. One can build a complex non-realtime pipeline, generate high-quality training data with it, and then train a much smaller model (maybe even an MLP) on that data to predict just this task.
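To make the "chunk into windows, then predict LEDs" shape concrete, here is a minimal sketch. The predictor is a stand-in (a loudness bar based on RMS), not a trained model; `stream_windows` and `leds_for_window` are hypothetical names, and the 144-LED default is just the strip length quoted elsewhere in the thread:

```python
import math
from collections import deque


def stream_windows(sample_iter, size, hop):
    """Yield overlapping windows of `size` samples, advancing `hop` samples each time."""
    buf = deque(maxlen=size)  # keeps only the most recent `size` samples
    since_last = 0
    for sample in sample_iter:
        buf.append(sample)
        since_last += 1
        if len(buf) == size and since_last >= hop:
            since_last = 0
            yield list(buf)


def leds_for_window(window, num_leds=144, full_scale=1.0):
    """Placeholder for the model: light a bar proportional to the window's RMS level."""
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    level = min(rms / full_scale, 1.0)
    lit = round(level * num_leds)
    return [i < lit for i in range(num_leds)]
```

Swapping `leds_for_window` for a small learned model would keep the same loop: one window in, one on/off (or color) vector out.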
Another related project that builds on a similar foundation: https://github.com/ledfx/ledfx
I remember thinking really hard on what to do with color. Except, like you say, mine is pretty much a naive FFT.
https://github.com/aleksiy325/PiSpectrumHoodie?tab=readme-ov...
Thanks for reminding me.
I tried recreating the app (and I can connect via BT to the lights) but writing the audio-reactive code was the hardest part (and I still haven't managed to figure out a good rule of thumb or something). I mainly use it when listening to EDM or club music, so it's always a classic 4/4 at 110-130 bpm, yet it's hard to have the lights react on beat.
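One common rule of thumb, sketched here under assumptions rather than as a known-good recipe: flag a beat when a frame's energy jumps well above the recent average, and use the known tempo ceiling as a lockout so nothing fires twice within one beat. The name `detect_beats`, the 1.5x sensitivity, and the 0.8 lockout factor are all illustrative guesses to tune:

```python
from collections import deque


def detect_beats(energies, frame_rate, max_bpm=130.0,
                 history_seconds=1.0, sensitivity=1.5):
    """Return indices of frames flagged as beats.

    energies: per-frame energy (e.g. sum of squared samples, ideally low-passed)
    frame_rate: energy frames per second
    """
    # Lockout: ignore spikes closer together than 80% of the fastest expected beat.
    min_gap = int(frame_rate * 60.0 / max_bpm * 0.8)
    recent = deque(maxlen=max(1, int(frame_rate * history_seconds)))
    beats = []
    last = None
    for i, energy in enumerate(energies):
        if len(recent) == recent.maxlen:  # wait until the history is full
            average = sum(recent) / len(recent)
            is_spike = energy > sensitivity * average
            if is_spike and (last is None or i - last >= min_gap):
                beats.append(i)
                last = i
        recent.append(energy)
    return beats
```

Feeding it energy from only the low end (kick drum range) tends to matter more for 4/4 club music than the exact threshold, though that is a general observation and not something tested against these particular lights.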
And of course, by the time I got it to work perfectly I never looked at it again. As is tradition.
It was fiddly, and probably too inaccurate for a modern audience, but I can't claim it was diabolically hard. Tuning was a faff, but we were more willing to sit and tweak resistor and capacitor values back then.
“Most people who attempt audio reactive LED strips end up somewhere around here, with a naive FFT method. It works well enough on a screen, where you have millions of pixels and can display a full spectrogram with plenty of room for detail. But on 144 LEDs, the limitations are brutal. On an LED strip, you can't afford to "waste" any pixels and the features you display need to be more perceptually meaningful.”
(And it looks like the 7 frequencies are not distributed linearly—perhaps closer to the mel scale.)
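For comparison, this is what 7 band centers spaced evenly on the mel scale would look like, using the common conversion mel = 2595 * log10(1 + f / 700). The helper names and the frequency range passed in are assumptions for illustration:

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to mels (O'Shaughnessy formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_centers(n_bands, f_min, f_max):
    """Return n_bands center frequencies evenly spaced in mels between f_min and f_max."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands)]
```

Calling `mel_spaced_centers(7, 63.0, 16000.0)` gives centers that bunch up at the low end and spread out toward the top, which can be compared against whatever 7 frequencies a given chip actually uses.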
I tried using one of the FFT libraries on the Arduino directly but had no luck. The MSGEQ7 chip is nice.
But perhaps you'd get better results if more of an ML speech/audio recognition pipeline were included?
E.g., the pipeline could separate out drum beats from piano notes, and present them differently in the visualization?
An autoencoder network trained to minimize perceptual reconstruction loss would probably have the most 'interesting' information at the bottleneck, so that's the layer I'd feed into my LED strip.
Effects themselves are written in embedded JavaScript and can be layered a bit like Photoshop. Currently it only supports driving Nanoleaf and WLED fixtures, though WLED gives you a huge range of options. The effect language is fully exposed so you can easily write your own effects against the real-time audio signals.
It isn't open source though, and still needs better onboarding and tutorials. Currently it's completely free; I haven't really decided whether I want to bother trying to monetize any of it. If I were to, it would probably just be for DMX and maybe MIDI support. Or maybe just for an ecosystem of portable hardware.