Posted by tamnd 20 hours ago
This is probably far from the raw intelligence provided by cloud providers.
Still, this sheds more light on local LLMs for agentic workflows.
Are there any architectures that don't rely on feeding the entire history back into the chat?
Recurrent LLMs?
Not really. That's going to land you somewhere in the 0.2-0.5 tokens-per-second range.
Lovely as modern NVMes are, they're not memory.
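A rough sketch of why streaming weights from storage caps throughput: each decoded token needs (roughly) one full pass over the active weights, so tokens/sec is bounded by bandwidth divided by model size. The model size and bandwidth figures below are illustrative assumptions, not measurements.

```python
def tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    # Upper bound: one full read of the weights per generated token.
    return bandwidth_bytes_per_s / model_bytes

GB = 1e9
# Assumed: a ~70B-param model at 4-bit quantization, ~35 GB of weights.
model_size = 35 * GB

print(f"NVMe (~7 GB/s):  {tokens_per_second(model_size, 7 * GB):.2f} tok/s")
print(f"DDR5 (~80 GB/s): {tokens_per_second(model_size, 80 * GB):.1f} tok/s")
```

At NVMe speeds that works out to about 0.2 tok/s, which is where the range above comes from; keeping the weights in RAM or VRAM is what makes decoding interactive.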
Nonetheless, I eventually want to build an at-home system. I imagine some smaller local model could handle metadata assignment quite well.
edit: Though TIL the Mac Studio doesn't offer 512GB anymore... DRAM shortage, lol. Rough.
https://artificialanalysis.ai/models?models=gpt-5-5%2Cgpt-5-...
This is also a fine example of a vibe-coded project with purpose, as you acknowledged.