Work? I don't want it local at all. I want it all on a cloud agent.
The moment we see standardized, batteries-included pathways to integrate search (ideally at no additional cost) into things like LM Studio, combined with better tool calling in local models, you'll quickly see local model performance catch up.
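Rough sketch of what that "batteries-included" wiring could look like: local servers such as LM Studio expose an OpenAI-compatible tool-calling API, so a search tool is just a function schema plus a dispatcher on the client side. Everything here is hypothetical; `web_search` is a stub standing in for a real search backend.

```python
import json

# Hypothetical tool definition in the OpenAI-style function-calling schema,
# which OpenAI-compatible local servers (e.g. LM Studio) accept.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return top result snippets.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}

def web_search(query: str) -> str:
    # Stub backend; a real integration would call an actual search API here.
    return f"results for: {query}"

def dispatch_tool_call(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching local function."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "web_search":
        return web_search(**args)
    raise ValueError(f"unknown tool: {name}")

# Example tool call, shaped like an assistant message's tool_calls entry:
call = {"function": {"name": "web_search",
                     "arguments": json.dumps({"query": "local LLM benchmarks"})}}
print(dispatch_tool_call(call))  # results for: local LLM benchmarks
```

The point is that none of this is hard individually; it just isn't standardized or bundled with the local runtimes yet.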
If you are simply measuring watts per token, you are missing the mark drastically. You have to measure quality of output per watt.
This sounds reasonably difficult to benchmark, though; maybe I'm wrong.
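The metric itself is trivial once you pick a quality score; the hard part is agreeing on the score and measuring energy honestly. A minimal sketch, with made-up numbers (the eval scores and watt-hour figures below are purely illustrative assumptions, not real measurements):

```python
def quality_per_watt_hour(quality_score: float, energy_wh: float) -> float:
    """Benchmark-quality points earned per watt-hour of energy consumed."""
    if energy_wh <= 0:
        raise ValueError("energy must be positive")
    return quality_score / energy_wh

# Hypothetical comparison: a small local model scoring 62.0 on some eval
# suite while the run drew 15 Wh, versus a larger model scoring 80.0 at
# 120 Wh. The bigger model wins on raw quality but loses on quality/Wh.
local = quality_per_watt_hour(62.0, 15.0)
big = quality_per_watt_hour(80.0, 120.0)
print(local > big)  # True
```

The real benchmarking difficulty is upstream of this arithmetic: defining a quality score that transfers across tasks, and attributing wall-socket energy to a single inference run.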
If we could even get something like GPT 5.5 running locally, that would be quite useful.