Posted by jeffmcjunkin 5 hours ago
In ChatGPT right now, you can have an audio and video feed for the AI, and the AI can respond in real time.
Now I wonder if the E2B or the E4B is capable enough for this and fast enough to run on an iPhone. Basically replicating that experience, but with all the computation (STT, LLM, and TTS) done locally on the phone.
I just made this [0] last week, so I know you can run a real-time voice conversation with an AI on an iPhone, but it'd be a totally different experience if it could also process a live camera feed.
The E2B/E4B models also support voice input, which is rare.
Google is the only US-based frontier lab releasing open models. I know they aren't doing it out of the goodness of their hearts.
total duration: 12m41.34930419s
load duration: 549.504864ms
prompt eval count: 25 token(s)
prompt eval duration: 309.002014ms
prompt eval rate: 80.91 tokens/s
eval count: 2174 token(s)
eval duration: 12m36.577002621s
eval rate: 2.87 tokens/s
Prompt: whats a great chicken breast recipe for dinner tonight?
total duration: 37.44872875s
load duration: 145.783625ms
prompt eval count: 25 token(s)
prompt eval duration: 215.114666ms
prompt eval rate: 116.22 tokens/s
eval count: 1989 token(s)
eval duration: 36.614398076s
eval rate: 54.32 tokens/s
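For anyone skimming the stats: the reported eval rates are just eval count divided by eval duration. A quick sanity check in Python on the two runs above (durations converted to seconds):

```python
# Verify the reported "eval rate" figures from the two runs above.
# eval rate (tokens/s) = eval count / eval duration

runs = [
    # (eval tokens, eval duration in seconds, reported tokens/s)
    (2174, 12 * 60 + 36.577002621, 2.87),   # first run: 12m36.577s
    (1989, 36.614398076, 54.32),            # second run: 36.614s
]

for tokens, seconds, reported in runs:
    rate = tokens / seconds
    print(f"{tokens} tokens / {seconds:.1f}s = {rate:.2f} tok/s (reported {reported})")
```

Both check out, and the comparison makes the gap concrete: the second run generates tokens roughly 19x faster than the first.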