Posted by lcolucci 1/27/2026

Show HN: LemonSlice – Upgrade your voice agents to real-time video

Hey HN, we're the co-founders of LemonSlice (try our HN playground here: https://lemonslice.com/hn). We train interactive avatar video models. Our API lets you upload a photo and immediately jump into a FaceTime-style call with that character. Here's a demo: https://www.loom.com/share/941577113141418e80d2834c83a5a0a9

Chatbots are everywhere and voice AI has taken off, but we believe video avatars will be the most common form factor for conversational AI. Most people would rather watch something than read it. The problem is that generating video in real-time is hard, and overcoming the uncanny valley is even harder.

We haven’t broken the uncanny valley yet. Nobody has. But we’re getting close and our photorealistic avatars are currently best-in-class (judge for yourself: https://lemonslice.com/try/taylor). Plus, we're the only avatar model that can do animals and heavily stylized cartoons. Try it: https://lemonslice.com/try/alien. Warning! Talking to this little guy may improve your mood.

Today we're releasing our new model* - Lemon Slice 2, a 20B-parameter diffusion transformer that generates infinite-length video at 20fps on a single GPU - and opening up our API.

How did we get a video diffusion model to run in real-time? There was no single trick, just a lot of them stacked together. The first big change was making our model causal. Standard video diffusion models are bidirectional (they look at frames both before and after the current one), which means you can't stream.
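
To make that concrete, here's a toy sketch of why causality enables streaming (illustrative only, not our actual code; the model call and shapes are made up): each frame is denoised using only already-generated context, so it can be sent out before any later frame exists.

```python
import torch

def stream_video(model, cond, num_frames, steps=4):
    """All names and shapes here are hypothetical. `model` denoises one frame's
    latent given only already-generated past frames, so output can stream."""
    context = []                              # latents of frames emitted so far
    for _ in range(num_frames):
        x = torch.randn(1, 16, 64, 64)        # start the next frame from noise
        for t in reversed(range(steps)):      # few-step denoising loop
            x = model(x, context, t, cond)    # attends to past frames only (causal)
        context.append(x)
        yield x                               # emit immediately; no future frames needed
```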

From there it was about fitting everything on one GPU. We switched from full to sliding window attention, which killed our memory bottleneck. We distilled from 40 denoising steps down to just a few - quality degraded less than we feared, especially after using GAN-based distillation (though tuning that adversarial loss to avoid mode collapse was its own adventure).
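
Here's a toy version of the sliding-window idea (again illustrative, not our kernels): every frame's tokens attend to a fixed number of past frames, so attention memory stays bounded no matter how long the call runs.

```python
import torch

def sliding_window_mask(num_frames, tokens_per_frame, window_frames):
    """True = allowed to attend. Each frame's tokens see only the last
    `window_frames` frames (itself included), so KV memory stays O(window)."""
    n = num_frames * tokens_per_frame
    frame_of = torch.arange(n) // tokens_per_frame
    q = frame_of[:, None]                      # query token's frame index
    k = frame_of[None, :]                      # key token's frame index
    return (k <= q) & (k > q - window_frames)  # causal + bounded lookback

mask = sliding_window_mask(num_frames=6, tokens_per_frame=4, window_frames=2)
print(mask.shape)  # torch.Size([24, 24])
```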

And the rest was inference work: modifying RoPE from complex to real (this one was cool!), precision tuning, fusing kernels, a special rolling KV cache, lots of other caching, and more. We kept shaving off milliseconds wherever we could and eventually got to real-time.
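
The RoPE change is roughly the standard real-valued trick: rotate channel pairs with cos/sin instead of complex multiplication, which avoids complex dtypes and plays nicer with fused low-precision kernels. A toy version (not our exact implementation):

```python
import torch

def rope_real(x, positions, base=10000.0):
    """Rotary position embedding using only real ops (no complex dtype).
    x: (..., seq, dim) with even dim; positions: (seq,) token/frame positions."""
    half = x.shape[-1] // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = positions.float()[:, None] * freqs            # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin,
                      x1 * sin + x2 * cos], dim=-1)

q = torch.randn(2, 8, 64)                                   # (batch, seq, dim)
q_rot = rope_real(q, torch.arange(8))
```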

We set up a guest playground for HN so you can create and talk to characters without logging in: https://lemonslice.com/hn. For those who want to build with our API (we have a new LiveKit integration that we’re pumped about!), grab a coupon code in the HN playground for your first Pro month free ($100 value). See the docs: https://lemonslice.com/docs. Pricing is usage-based at $0.12-0.20/min for video generation.

Looking forward to your feedback!

EDIT: Tell us what characters you want to see in the comments and we can make them for you to talk to (e.g. Max Headroom)

*We did a Show HN last year for our V1 model: https://news.ycombinator.com/item?id=43785044. It was technically impressive but so bad compared to what we have today.

133 points | 132 comments
r0fl 1/27/2026|
Wow this is the most impressive thing I’ve seen on hacker news in years!!!!!

Take my money!!!!!!

lcolucci 1/27/2026|
Wow thank you so much :) We're so proud of it!!
ripped_britches 1/28/2026||
Very freaking impressive!
lcolucci 1/28/2026|
Thank you so much!
swyx 1/28/2026||
this is like Tavus but it doesn't suck. congrats!
lcolucci 1/28/2026|
Thank you! And the cool thing is it's actually a full video world model. We'll expose more of those capabilities soon
dsrtslnd23 1/28/2026||
where can I find the 20B model? it sounded like it would be open - but I am not sure with the phrasing...
andrew-w 1/28/2026|
We have not released the weights, but it is fully available to use in your websites or applications. I can see how our wording there could be misconstrued -- sorry about that.
jedwhite 1/27/2026||
That's an interesting insight about "stacking tricks" together. I'm curious where you found that approach hit limits. And what gives you an advantage if anything against others copying it. Getting real-time streaming with a 20B parameter diffusion model and 20fps on a single GPU seems objectively impressive. It's hard to resist just saying "wow" looking at the demo, but I know that's not helpful here. It is clearly a substantial technical achievement and I'm sure lots of other folks here would be interested in the limits with the approach and how generalizable it is.
sid-the-kid 1/27/2026||
Good question! Software gets democratized so fast that I am sure others will implement similar approaches soon. And, to be clear, some of our "speed upgrades" are pieced together from recent DiT papers. I do think getting everything running on a single GPU at this resolution and speed is totally new (as far as I have seen).

I think people will just copy it, and we just need to continue moving as fast as we can. I do think that a bit of a revolution is happening right now in real-time video diffusion models. There have been so many great papers published in that area in the last 6 months. My guess is that many DiT models will be real-time within a year.

jedwhite 1/28/2026|||
> I do think getting everything running on a single GPU at this resolution and speed is totally new

Thanks, it seemed to be the case that this was really something new, but HN tends to be circumspect so I wanted to check. It's an interesting space and I try to stay current, but everything is moving so fast. Still, I was pretty sure I hadn't seen anyone do that. It's a huge achievement to do it first and make it work for real like this! So well done!

sid-the-kid 1/27/2026||||
One thing that is interesting: LLM pipelines have been highly optimized for speed (since speed is directly related to cost for companies). That is just not true for real-time DiTs. So, there is still lots of low-hanging fruit for how we (and others) can make things faster and better.
storystarling 1/27/2026|||
Curious about the memory bandwidth constraints here. 20B parameters at 20fps seems like it would saturate the bandwidth of a single GPU unless you are running int4. I assume this requires an H100?
andrew-w 1/27/2026||
Yep, the model is running on Hopper architecture. Anything less was not sufficient in our experiments.
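
For intuition, here's a rough back-of-envelope (the step count, chunking, and precision below are illustrative assumptions, not our production numbers):

```python
# Illustrative back-of-envelope only -- step count, chunking, and precision are
# assumed, not actual production numbers.
params      = 20e9
bytes_per_w = 2                   # bf16 weights (assumed)
steps       = 4                   # assumed few-step distilled sampler
fps         = 20

weights_gb = params * bytes_per_w / 1e9                    # ~40 GB read per forward pass

per_frame_gbps = weights_gb * steps * fps                  # if every frame got its own passes
per_chunk_gbps = weights_gb * steps                        # if a 1 s chunk shares each pass

print(f"per-frame denoising: ~{per_frame_gbps:.0f} GB/s of weight traffic")   # ~3200
print(f"chunked denoising:   ~{per_chunk_gbps:.0f} GB/s of weight traffic")   # ~160
# H100 HBM3 peak is roughly 3.3 TB/s, so naive per-frame denoising alone would
# sit near the ceiling -- chunking, caching, and precision work are what make
# the budget workable.
```
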
wahnfrieden 1/28/2026||
Please add InWorld TTS integration
lcolucci 1/28/2026||
That's a good one. I would suggest asking them to integrate with LiveKit. Then it'll be really easy to combine InWorld and LemonSlice.
davidz 1/28/2026||
we gotchu: https://docs.livekit.io/agents/models/tts/inference/inworld/
sid-the-kid 1/28/2026||
never heard of InWorld. Pretty impressive.
benswerd 1/27/2026||
The last year vs this year is crazy
lcolucci 1/27/2026||
Agreed. We were so excited about the results last year and they are SO BAD now by comparison. Hopefully we'll say the same thing again in a couple of months.
bluedel 1/28/2026||
Hopefully not. I'm impressed with the engineering, it is a technological achievement, but my only hope right now is that this tech plateaus pretty much immediately. I can't think of a single use-case that wouldn't be at best value-neutral, and at worst extremely harmful to the people interacting with it.
sid-the-kid 1/27/2026||
thanks! it just barely worked last year, but not much else. this year it's actually good. we got lucky: it's both new tech and turned out to be good quality.
shj2105 1/27/2026||
Not working on mobile iOS
lcolucci 1/27/2026|
what's not working for you?
ed_mercer 1/27/2026||
This looks super awesome!
sid-the-kid 1/27/2026|
thank you! it's by far the thing I have worked on that I am most proud of.
marieschneegans 1/27/2026|
This is next-level!
lcolucci 1/27/2026|
Thanks so much! We're super proud of it