Show HN: LemonSlice – Upgrade your voice agents to real-time video

Posted by lcolucci 1/27/2026

Hey HN, we're the co-founders of LemonSlice (try our HN playground here: https://lemonslice.com/hn). We train interactive avatar video models. Our API lets you upload a photo and immediately jump into a FaceTime-style call with that character. Here's a demo: https://www.loom.com/share/941577113141418e80d2834c83a5a0a9

Chatbots are everywhere and voice AI has taken off, but we believe video avatars will be the most common form factor for conversational AI. Most people would rather watch something than read it. The problem is that generating video in real-time is hard, and overcoming the uncanny valley is even harder.

We haven’t broken the uncanny valley yet. Nobody has. But we’re getting close and our photorealistic avatars are currently best-in-class (judge for yourself: https://lemonslice.com/try/taylor). Plus, we're the only avatar model that can do animals and heavily stylized cartoons. Try it: https://lemonslice.com/try/alien. Warning! Talking to this little guy may improve your mood.

Today we're releasing our new model* - Lemon Slice 2, a 20B-parameter diffusion transformer that generates infinite-length video at 20fps on a single GPU - and opening up our API.

How did we get a video diffusion model to run in real-time? There was no single trick, just a lot of them stacked together. The first big change was making our model causal. Standard video diffusion models are bidirectional (they look at frames both before and after the current one), which means you can't stream.

From there it was about fitting everything on one GPU. We switched from full to sliding window attention, which killed our memory bottleneck. We distilled from 40 denoising steps down to just a few - quality degraded less than we feared, especially after using GAN-based distillation (though tuning that adversarial loss to avoid mode collapse was its own adventure).

And the rest was inference work: modifying RoPE from complex to real (this one was cool!), precision tuning, fusing kernels, a special rolling KV cache, lots of other caching, and more. We kept shaving off milliseconds wherever we could and eventually got to real-time.

We set up a guest playground for HN so you can create and talk to characters without logging in: https://lemonslice.com/hn. For those who want to build with our API (we have a new LiveKit integration that we’re pumped about!), grab a coupon code in the HN playground for your first Pro month free ($100 value). See the docs: https://lemonslice.com/docs. Pricing is usage-based at $0.12-0.20/min for video generation.

Looking forward to your feedback!

EDIT: Tell us what characters you want to see in the comments and we can make them for you to talk to (e.g. Max Headroom)

*We did a Show HN last year for our V1 model: https://news.ycombinator.com/item?id=43785044. It was technically impressive but so bad compared to what we have today.

133 points | 132 commentspage 2

bn-l 1/28/2026|

I wish I could invest in this company. Really. This is the most exciting revenue opportunity I’ve seen during this recent AI hype cycle.

lcolucci 1/28/2026|

That's super nice of you to say. Thank you!

wumms 1/27/2026||

You could add a Max Headroom to the hn link. You might reach real time by interspersing freeze frames, duplicates, or static.

sid-the-kid 1/27/2026||

And, just like that, Max Headroom is back: https://lemonslice.com/try/agent_ccb102bdfc1fcb30

sbarre 1/28/2026||

That.. is not Max Headroom.

lcolucci 1/28/2026|||

Can you help us make him? What's the right voice? https://lemonslice.com/hn

sbarre 1/29/2026||

https://www.youtube.com/watch?v=cYdpOjletnc

andrew-w 1/28/2026|||

I wonder how it would come across with the right voice. We're focused on building out the video layer tech, but at the end of the day, the voice is also pretty important for a positive experience.

sid-the-kid 1/27/2026||

1) yes on Max Headroom. we are on it. 2) it already is real time...?

wumms 1/27/2026||

Whoops! Mistook the "You're about to speak with an AI."-progress bar for processing delay.

lcolucci 1/27/2026|||

I wonder if we should make the UI a more common interface (e.g. "the call is ringing") to avoid this confusion?

It's a normal mp4 video that's looping initially (the "welcome message") and then as soon as you send the bot a message, we connect you to a GPU and the call becomes interactive. Connecting to the GPU takes about 10s.

sid-the-kid 1/27/2026|||

Makes sense. The init should be about 10s. But, after that, it should be real time. TBH, this is probably a common confusion. So thanks for calling it out.

dang 1/28/2026||

https://lemonslice.com/hn/agent_4d10f62632fd841b

(Update of https://news.ycombinator.com/item?id=43785494)

lcolucci 1/28/2026|

The curve is accelerating!

Obertr 5 days ago||

Have just tried it. Impressive. We are definitely moving into this space

Questions what are the main differences between you and anam.ai ? They also do real time lip sync plus looks like they are cheaper. Do you optimise for price or quality? And do you focus on lipsync or full movement etc?

zvonimirs 1/27/2026||

We're launching a new AI assistant and I wanted to make it alive so I started to play around with LemonSlice and I loved it!! I wanted to make our assistant be like a coworker that can give it an ability to create Loom style videos. Here's what I created - https://drive.google.com/file/d/1nIpEvNkuXA0jeZVjHC8OjuJlT-3...

Anyway, big thumbs up for the LemonSlice team, I'm excited to see it progress. I can definitely see products start coming alive with tools like this.

sid-the-kid 1/27/2026||

Very cool! Thanks for sharing. I love your use-case of turning an AI coding agent into more of an AI employee. Will be interesting to see if users can connect better with the product this way.

bzmrgonz 1/27/2026||

How did your token spend add up? I'm hesitant with evil customers racking up ai charges just to shit and giggles. Even competitors might sponsor some runaway charges.

r0fl 1/27/2026||

Wow I can’t get enough of this site! This is literally all I’ve been playing with for like half an hour. Even moved a meeting!

My mind is blown! It feels like the first time I used my microphone to chat with ai

lcolucci 1/27/2026||

This comment made my day! So happy you're liking it

sid-the-kid 1/27/2026||

glad we found somebody who likes it as much as us! BTW, biggest thing we are working to improve is speed of the response. I think we can make that much faster.

peddling-brink 1/27/2026||

I got really excited when I saw that you were releasing your model.

> Today we're releasing our new model* - Lemon Slice 2, a 20B-parameter diffusion transformer that generates infinite-length video at 20fps on a single GPU - and opening up our API.

But after digging around for a while, searching for a huggingface link, I’m guessing this was just a unfortunate turn of phrase, and you are not in fact, releasing an open weights model that people can run themselves?

Oh well, this looks very cool regardless and congratulations on the release.

andrew-w 1/28/2026||

Thanks! And sorry! I can see how our wording there could be misconstrued. With a real-time model, the streaming infrastructure matters almost as much as the weights themselves. It will be interesting to see how easily they can be commoditized in the future.

sid-the-kid 1/28/2026||

Thank you! We are considering to release an open-source version of the model. Somebody will do it soon. Might as well be us. We are mostly concerned with the additional overhead of releasing and then supporting it. So, TBD.

js4ever 1/28/2026||

Overhead? None Your real concern is: will potential customers run the model by themselves and skipping us?

Answer is no because you will eventually release a subpar model not your sorta model.

Also people don't have infrastructure to run this at scale (100-500 concurrent users) at best they can run it for 1-2 concurrent users.

This could be a good way for peoples to test it then use your infra.

Ah but you do have an online demo, so you might think this is enough, WRONG.

leetrout 1/28/2026||

Quick feedback if you're still monitoring the thread:

I did /imagine cheeseburger and /imagine a fire extinguisher and both were correctly generated but the agent has no context. when I ask what they are holding in both cases they ramble about not holding anything and referencing lemons and lemon trees.

I expected it to retain the context as the chat continues. If I ask it what it imagined it just tells me I can use /imagine.

lcolucci 1/28/2026||

Good idea. We need to do that. I'm also excited to push the /imagine stuff further and have B-roll interspersed with the talking (like a documentary) or even follow the character around as they move (like a video game)

andrew-w 1/28/2026||

Not something we had thought to do tbh, but would definitely enhance the experience. And, should be reasonable to do. Thanks!

jamesdelaneyie 1/28/2026||

I didn't know /imagine could be followed by a prompt, but similarly I asked the avatar about it's appearance and stated it had none. Should probably give it the context of what it's appearance is like, same thing happened for questions like where are you? What are you holding? Who's that behind you? etc etc

lcolucci 1/28/2026||

This is so obvious now that you say it (* facepalm *). We definitely need to give the LLM context on the appearance (both from the initial image as well as any /imagine updates during the call). Thanks for pointing it out!

dreamdeadline 1/27/2026||

Cool! Do you plan to expose controls over the avatar’s movement, facial expressions, or emotional reactions so users can fine-tune interactions?

lcolucci 1/27/2026||

Yes we do! Within the web app, there's a "action text prompt" section that allows you to control the overall actions of the character (e.g. "a fox talking with lots of arm motions"). We'll soon expose this in the API so you can control the characters movements dynamically (e.g. "now wave your hand")

sid-the-kid 1/27/2026||

Our text control is good, especially for emotions. For example, you can add the text prompt: "a person talking. they are angry", and agent will have an angry expression.

You can also control background motions (like ocean waves, or a waterfall or car driving).

We are actively training a model that has better text control over hand motions.

sid-the-kid 1/27/2026|

hey HN! one of the founders here. as of today, we are seeing informational avatars + roleplaying for training as the most common use cases. The roleplaying use-case was surprising to us. Think a nurse training to triage with AI patients. Or, SDRs practicing lead qualification with different kinds of clients.

More comments...