Posted by AndreSlavescu 14 hours ago
We at Hathora have recently released our ultra-low-latency deployment of Qwen/Qwen3-Omni-30B-A3B-Instruct, one of the leading open-source speech-to-speech-capable models.
Platform release:
https://models.hathora.dev/model/qwen3-omni
The release got us thinking: what actually happens when you record audio and get an audio response back? So we built a visualization website that gives a high-level overview of the individual pieces that make speech-to-speech inference possible in Qwen3-Omni.
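At a high level, the speech-to-speech path can be pictured as a pipeline of stages: encode the input audio into tokens, generate a response, predict discrete speech tokens, and decode them back into a waveform. Here is a minimal illustrative sketch in Python; the stage names loosely follow Qwen's published Thinker-Talker design, but every function below is a placeholder stand-in, not the real model:

```python
# Illustrative sketch of a speech-to-speech pipeline.
# Each stage is a placeholder; a real deployment runs neural networks here.

def encode_audio(waveform: list[float]) -> list[int]:
    # Audio encoder: raw samples -> audio feature tokens (placeholder).
    return [int(abs(s) * 100) for s in waveform]

def thinker(audio_tokens: list[int]) -> str:
    # "Thinker": multimodal LLM that produces the text response (placeholder).
    return f"response to {len(audio_tokens)} audio tokens"

def talker(text: str) -> list[int]:
    # "Talker": predicts discrete speech codec tokens from the text (placeholder).
    return [ord(c) % 64 for c in text]

def decode_to_waveform(codec_tokens: list[int]) -> list[float]:
    # Codec decoder: speech tokens -> output waveform samples (placeholder).
    return [t / 64.0 for t in codec_tokens]

def speech_to_speech(waveform: list[float]) -> list[float]:
    # Chain the stages: audio in -> audio out.
    return decode_to_waveform(talker(thinker(encode_audio(waveform))))

out = speech_to_speech([0.1, -0.2, 0.3])
```

The visualization breaks the flow down along roughly these lines, so you can see where time is spent between your microphone and the audio you hear back.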
Feel free to try it out, send us any feedback, and give the platform a spin!