Posted by amarble 21 hours ago
The title asserts there is minimal downside to switching to open models, but the article provides zero evidence that this is true, and the author hasn't even attempted it yet. The end of the article states "I’m hoping it’s going to be minimal".
I wonder if I can get a post to the front page with the title: "There are no real barriers to humans colonizing Mars next month". And at the end, "I'm hoping there are no real challenges."
> There remains a clear penalty for being an open LLM user. Every leaderboard consistently gets topped by proprietary models served over API. Today on June 21, 2026, Claude and GPT are at the top of the Artificial Analysis intelligence leaderboard. That’s from the performance side. The compatibility side is worse too. Claude code just works, and more generally, the big two provide nice APIs that make them easy to use, and, even if it’s a low bar, are “trustworthy” in the sense that we’ve largely all agreed we don’t mind sending them our LLM queries and trust them to handle them appropriately.
> Open models are served via various means, some by the companies that released them and some by third parties like OpenRouter. Unfortunately, both of these routes are dodgier in terms of privacy and data sharing, and I would not feel the same comfort sending API calls containing client or confidential data to them3.
> The other option or course is to run them yourself. This solves the privacy issue but is at least two of expensive, complicated, and comparatively slow.
So...there's actually quite a bit of downside, then? Why the misleading title?
That's why I'm using eurouter.ai with the following routing rule for all my requests:
{
"model": "glm-5.2",
"models": [
"deepseek-v4-pro",
"deepseek-v4-flash"
],
"provider": {
"allow_fallbacks": true,
"data_collection": "deny",
"data_residency": "EU",
"max_retention_days": 0,
"eu_owned": true
}
}
Sure, it's quite expensive, but at least on a legal side data privacy is ensured. I trust them more than e.g. Anthropic, OpenAI or OpenRouter.Personally, I find it morally unacceptable to use U.S. AI tools, because I do not want to support them financially and thus support the crimes they are involved in[1].
What gets me the most is that they claim that the model should follow the https://www.anthropic.com/constitution and they claim that it's embedded into the model. However, system prompts in claude code and cowork re-iterate all of these points and if they're embedded you shouldn't need to do that. Now, if you ask the API version of claude to be a hitler supporter with enough prompt engineering it will become one which directly contradicts what they claim to do, opus 4.7 specifically will be happy to create anti-(insert minority group) propaganda although I haven't had the same success with 4.8 thus far, but I also haven't been motivated enough to push it in that direction yet since I've been more interested in exploting the cyber capabilities of the model.
My conclusion from the very start is that Anthropic's strategy are pure optics and considering the fact that there was an outpoor of support for the company I think it has been very successful.
On second thought, it's not funny.
And this is coming from a CEO who constantly claims moral superiority and advances the idea that China is bad
Regardless of Anthropic's "moral" position (inasmuch as a corporation can even have morals) against spying on non-Americans, they would have no way to enforce that limitation against the government because non-citizens outside of the USA have no protections from the intrusions of the US government.
More generally it would be overpowered by the Sovereign Acts Doctrine.
The facts aren’t identical to the 2008 Yahoo FISCR case but that case sets the tone for how any clauses like this would just be brushed under the rug.
I agree that the Apple case indicates that there’s a lot of uncertainty around this type of issue, at least post 1953 when title II of the DPA expired after Youngstown Sheet & Tube Co. v. Sawyer (1952)
Alleged red lines. Could be just talking points for garnering sympathy. Big tech aren’t exactly known for being truthful, especially big tech partnering with esteemed Palantir.
- The prices are ridiculous (15 % markup for free account).
- They have a rate limit of 1000 requests per month, unless you pay 40€ per month for ... what exactly is their value proposition?
- They have a single provider (TensorX) for DeepSeek-V4-Pro, with a cache read cost that is over 100 times higher than DeepSeek ($0.44 vs $0.003625). Notably, I had to look at the TensorX website for that information, since I could not find any information about cached token cost on eurorouter.ai.
If there aren't enough businesses who want to do this, the EU should figure out how it can properly incentivise that to change.
Low carbon does not equal expensive, either. Solar is the cheapest power generation method. Solar plus grid scale batteries is in the same cost ballpark as natural gas.
There’s nothing about data centers that is inherently a high carbon business. It’s only a high carbon business in places like the US where political leadership purposefully fights against renewable energy projects that private businesses want to undertake on their own dime.
EURouter (Amsterdam): https://www.eurouter.ai/pricing
Eden AI (France): https://www.edenai.co/pricing
nexos.ai (Lithuania): https://nexos.ai/pricing/
Requesty (Germany): https://www.requesty.ai/pricing
Cortecs (Austria): https://cortecs.ai/pricing
Nordference (Estonia): https://nordference.ai/pricing
Guess those are really popping up as mushrooms, eh? Not an endorsement of any of those on my part cause I haven't personally used them, but seems like there are at least options for those who need them.
"AI-assisted targeting in the Gaza Strip" - https://en.wikipedia.org/wiki/AI-assisted_targeting_in_the_G...
"Palantir allegedly enables Israel's AI targeting in Gaza, raising concerns over war crimes" - https://www.business-humanrights.org/de/neuste-meldungen/pal...
"What The Wounds Are Telling Us" - https://www.volkskrant.nl/kijkverder/v/2025/gunshot-palestin...
> Limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered.
> Change applies to organizations that have set up workspaces with zero data retention (ZDR) in Claude Console, use Claude Code with ZDR in Claude Enterprise, or access Claude through AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry with ZDR.
https://support.claude.com/en/articles/15425996-data-retenti...
That's a pretty big downside if data privacy and sharing is one of the main concerns.
Do you have a sound reason to need EU data locality? You can.
Do you want the confidence (and are willing to accept the expense) of only running models on local hardware you control? You can.
Do you want the cheapest possible option - choosing a Chinese, for example, provider, or perhaps a provider offering it for free where you agree they can use your prompts? You can.
Do you need to comply with some kind of regulation like GDPR or rules for contracting with the U.S. federal government? No problem. (Although I'm still waiting for DeepSeek V4 to show up on Amazon BedRock so it can be used from GovCloud...)
Do you have moral objections and want to actually live by them? You can.
Models are converging, but they converge in bands, and frontier is frontier. I would not like to have any workflows in any area of my business where output is generated by an assortment of models from different providers. For trivial, mundane tasks that might be fine, but it certainly doesn't apply across the board.
Maybe it was funny to you, but designing data platforms that respect GDPR and involve LLMs is a thing.
The age old joke;
A Russian and an American are drinking at a bar
The Russian says "I'm impressed by american propaganda. It's so subtle but effective."
The american responds "What are you talking about, we don't do propaganda."
A Russian and an American get on a plane in Moscow and get to talking. The Russian says he works for the Kremlin and he's on his way to go learn American propaganda techniques.
"What American propaganda techniques?" asks the American.
"Exactly," the Russian replies.
https://www.law.cornell.edu/wex/covert_propaganda
To be clear, I'm happy to grant that:
* The Pentagon won't provide jets for your war movie if your war movie portrays the US military in a bad light
* The US engages in information operations in foreign countries, e.g. discouraging people in the Philippines from getting the Chinese COVID vaccine
* Voice of America and similar US-government sponsored outlets are, in fact, sponsored by the US government
But the notion that covert, English-language US government propaganda is ubiquitous and effective seems like a half-baked, un-falsifiable conspiracy theory with little supporting evidence.
The internet is full of false or misleading claims about the US which go un-refuted. There's just way too much low-hanging fruit going un-picked here to believe that the USG is running massive English-language covert propaganda ops.
A specific example of a false anti-American claim which is extremely widespread: Many Europeans believe that the US promised to protect Ukraine in the 1994 Budapest Memorandum. This is false. We only promised to go to the UN Security Council, which we did. You can verify for yourself with a quick trip to the UN website, the memorandum is not very long: https://treaties.un.org/doc/Publication/UNTS/Volume%203007/P...
If the American government possessed the propaganda wizardry that people ascribe to it, I expect the entire internet would be well-acquainted with the actual contents of this memorandum. Instead, you have randos like me trying to fight a tsunami of misinformation (likely Ukrainian-origin) related to this memorandum, using only a shovel.
European here, following the Ukraine situation closely. I absolutely never heard that one. The main issue in the 1994 Budapest Memorandum that has been mentioned in the media in recent years is that Russia would respect the independence, sovereignty, and existing borders of Ukraine, which is clearly there in article 1. Thanks for the link though, it is quite enlightening.
> ... misinformation (likely Ukrainian-origin) ...
Your post is also "a half-baked, un-falsifiable conspiracy theory with little supporting evidence" ;)
If the US was attacked the way Ukraine was attacked, and foreign intervention was key to our survival as a nation, I expect the Pentagon would deploy foreign info ops in that situation. That doesn't seem like a heavy lift to me.
Occam's Razor: If something is a core/essential national interest, it's reasonable to expect a government to pull out all the stops. But governments are fairly ineffectual for the most part. Everyone saw how the USG mishandled e.g. COVID, mishandled the war with Iran, yet we expect the USG to be wizards at covert propaganda? It doesn't really track. I'm sure we are doing covert propaganda here and there, and we would ramp it up in an emergency.
Anyways, if you want to point to specific content you suspect as USG propaganda, be my guest. My point is, the fact that people rarely do this seems evidence against widespread USG propaganda. "They don't point it out because the propaganda is too good" has a suspicious un-falsifiable quality to it.
okay....
>People who reference supposed US government propaganda rarely provide much in the way of concrete examples.
YOU'VE ALREADY SAID THAT
A few years prior to the Budapest Memorandum, the UN Security Council had authorized military action to liberate Kuwait. 42 countries participated in the coalition that drove Iraqi forces out of Kuwait: https://en.wikipedia.org/wiki/Coalition_of_the_Gulf_War
The expectation at the time was clearly more than just "we'll bring it up at the UN for dicussion". The current weaseling over the exact wording looks weak and pathetic, and has a certain flavor of propaganda that tries to convince everyone of something that's not quite true. The fact remains that the US strong-armed Ukraine out of nuclear weapons, and when Ukraine was eventually invaded, tried to strong-arm Ukraine into surrender. This reflects very poorly on the US.
...
"A ‘no’ vote from any one of the five permanent members of the Council stops action on any measure put before it. The body’s permanent members are: China, France, Russian Federation, the United Kingdom, and the United States."
https://news.un.org/en/story/2022/02/1112802
(emphasis mine)
This is 101-level UN stuff. If Ukrainian diplomats were unaware that Russia can veto Security Council resolutions, that means they were totally incompetent.
It's also misleading to say the US "strong-armed" Ukraine out of its nukes... it was originally Ukraine's idea to abandon nukes, and they didn't have the control codes for the nukes on their territory anyways. The US attempted influence via carrots (financial assistance), not sticks ("strong-arming").
In any case, we did far more than just bring it up at the UN for discussion. See this map from a year or two ago: https://pbs.twimg.com/media/HKNCFWPbEAA7p5g?format=jpg&name=...
Mostly, in response to US generosity, Europeans just complained that the US should give even more. Your comment illustrates this perfectly--you speak as though the US only responded via UN diplomacy, completely neglecting over one hundred billion dollars the US sent in Ukraine aid, to a country which is not even a treaty ally of ours. When Biden was president, right after he saved Ukraine's butt in the initial invasion, public opinion of the US in Europe was barely even net-positive.
The real question is why Europeans spend so much time harassing the US for Ukraine funds, and so little time harassing tight-fisted countries which are actually in Europe like Ireland, Switzerland, Austria, Spain, etc. The answer: Europe has a transatlantic philosophy that the US brings the guns and the Europeans bring the scolding. As long as Ireland/Switzerland/Austria/Spain nod along with the scolding, they are doing their part, as far as Europe is concerned.
> This is 101-level UN stuff. If Ukrainian diplomats were unaware that Russia can veto Security Council resolutions, that means they were totally incompetent.
There are ways around it, if there's a will: https://en.wikipedia.org/wiki/United_Nations_General_Assembl...It is safe to say that the present lack of leadership from the US was not foreseen at the time. It was unimaginable that Russia would launch a major ground war in Europe and that the American president would blame the victim of the aggression and try to coerce them into surrender while sucking up to the aggressor. This is not how things were conducted back then. It was the era of Schwarzkopfs showing strength and resolve by giving presentations on how coalition tanks had pummeled the enemy in the past few weeks, not of Sullivans showing weakness and indecisiveness by endlessly yapping about "escalation".
The core problem is that the US has spent almost a century embedding itself in all kinds of relationships (cultural, political, economic, military), but has lost the ability to carry out that central role. Biden did not save Ukraine. The limited but valuable military support fostered an unhealthy relationship that gave the US a veto over Ukraine's (and other allies') actions, but the US leaders do not have the statemanship to use that power responsibly. Biden's legacy is the shortsighted micromanagement that turned the fast and effective Ukrainian counteroffensives of 2022 into slow and costly trench warfare of 2026, all while emboldening enemies like Iran to launch assaults like October 7th.
It really amazes me how much misinformation is out there about this thing. It only has six points, each one a single paragraph long. It's very quick and easy to read, yet people apparently can't be bothered to look up the actual text of the thing they're discussing.
That's only one consequence of Trump's de-facto betrayal of Ukraine in support of his daddy figure in the Kremlin.
I completely agree about no countries giving up their nukes in the future, but that's a consequence of the weak agreement, plus other actions like knocking over Iraq and Libya but not North Korea, tearing up the JCPOA with Iran, and... well, it seems like non-proliferation is mostly lip service in general.
This seems tautological because Europe is pretty weak on the values that people in the US might care about (freedom of speech, limited govt, etc).
What values specifically are you optimizing for here?
The US federal government forced Paramount to take Colbert off the air. Seems that people in the US don’t actually value these things.
> What values specifically are you optimizing for here?
Probably not being fascist.
Not really; the Ellisons are quite close to Trump. Nobody was forced to do anything. Had the FCC actually revoked their license, and had Paramount actually been willing to fight, they could have sued. It's not easy to force anyone that rich to do anything; the state works on behalf of capital. It seems like europe is more aware of the meaningless bluster than the actual crimes being committed
There are much better things to point to to illustrate the deterioration of the rule of law, like blatantly illegal deportation of citizens without due process. Or raping children in concentration camps under the guise of cracking down on crime. We may never even know who was seized and what happened to them and there's little incentive for our very pro-corporate media to report on it.
But sure, paramount is the real victim here.
https://en.wikipedia.org/wiki/Merger_of_Skydance_Media_and_P...
Read that timeline and then see if you're still convinced that they didn't at least seem to have done a thing or 2 to appease the federal government
There’s just no comparison really. You must really be inhaling some nonsense X propaganda if you think government overreach is worse in Western Europe.
https://www.facebook.com/story.php?story_fbid=13879460433775...
„ deployed the military in cities they don’t like for no other reason than intimidation of political rivals” That’s one perspective on simply trying to enforce laws.
Moreover, let’s not forget about how Biden government tried to silence Rogan.
And that's a good thing.
Edit: c'mon people, if you're going to use such ambiguous phrases at least have the spine to clue the reader in to what you want them to refer to in this context.
Of course there were also absolutism, colonialism, the jacobines, nazism & facism, to name just a few. Part of western values, from my perspective at least, is an implicit promise, that what happened in the 20th century with facism was the darkest hour, so to speak-> never again
Then you haven't been paying attention. The constitution prevents citizens from being convicted, but that doesn't stop arrests or being turned away at the border (even for permanent residents who've lived in the US for decades), and US citizens don't seem to care, so it's cold comfort for many of us.
I think maybe you haven't been paying attention.
Most of us do care. Trump's approval rating is pretty low at 36%, and his disapproval rating is high. Just because he's still causing chaos doesn't mean the majority of us don't care about it. There's just no legal way to remove him, and his cronies simply won't do it - there's not enough votes in congress or he would have been gone after his first or second impeachment.
https://www.npr.org/2026/06/20/nx-s1-5861764/trumps-job-appr...
Don’t get me wrong, I know the thousands reasons why you won’t join a protest, I’m „guilty“ myself. I just want to argue against your argument that I quoted because this puts all of us in an unhelpful victim mentality.
Hah. When was the last time a non-violent protest yielded some kind of result by itself? Certainly never in american history.
Anyway, there are daily protests. They just aren't covered by the media. Hell, the protests for palestine never stopped... the media just never wanted to cover them.
Terrorist attacks, kidnappings, etc made that change take longer. What made MLK Jr so unique was that he carried a message of peace, not a message of war.
The militant factions never had any real power and would have never been close to powerful enough to overthrow the government, and if they’d been more successful, would have swayed the masses’ opinions in the wrong direction.
https://en.wikipedia.org/wiki/List_of_protests_and_demonstra...
Trump's highest rating was ~47% when he came into office, but he was pretty stably in the low 40s until the new war. The actual drop is somewhere from ~40-42 to ~36-38 - about 10% of his base. Significant, but probably not enough to actually matter unless it drops further.
But the turnout at the periodic nationwide "No Kings" protests has been very good, and they have fortunately stayed peaceful.
checks notes what's this? The protests were organized by oligarchic lackeys? Hmm
By contrast, Biden at the same point in his term was hovering around 39%, for the heinous crime of... rebuilding the US economy? Including some woke riders in his infrastructure bill?
At this point, a fair assessment of US citizens is that on average, they seem to consider that being a right-wing autocrat wannabe, threatening to invade allied countries "as a negotiating tactic", being a climate change denier, starting a humiliating failed war, trying to blackmail the press into compliance, etc, are about 3% worse than being a cringe center-left bureaucrat.
"US citizens don't seem to care" is an apt hyperbole.
When the parties are both fucking stupid when it comes to issues that matter, the entire right/left spectrum goes out the window.
https://www.fox4news.com/news/woman-arrested-facebook-post-c...
X seems to work great. Inciting men in with gambling, porn, crypto, ai and other broistan staples, then feeding them far-right nonsense info points.
The numbers commonly being reporting include stalkers, criminals, etc.
You don't get arrested for being politically incorrect in the UK. You get arrested for posting something threatening, harassing, inciteful, or grossly indecent. Also, being arrested and being charged are two completely separate things.
https://eternallyradicalidea.com/p/the-situation-for-free-sp...
In any case, practically speaking, censorship helped the rise of the Nazis: https://www.fire.org/news/blogs/eternally-radical-idea/would...
You can see far-right parties surging across Europe. Speech restriction isn't just authoritarian, it's also counterproductive.
As an American I am actually quite worried about Europe's far right. Those guys are very scary, and it's creepy the way they have been able to influence the right here in the US. The MAGA movement was far more multicultural back in the 2010s, before Europe's far right was able to influence it with their ethnic cleansing and pogrom fantasies.
If you're in a hole, maybe stop digging?
Why not host in east asia? Or southeast asia? Or south america? Or africa? Then you avoid both the government with incentive to spy on you (assuming you live in the EU) and american companies.
If anything the EU puts limits on what EU member countries and companies can do. By hosting in one of the EU countries you have stronger legal guarantees on data privacy than in any other area. A possible exception is Switzerland (not a EU member), which historically has had even stronger privacy laws, though these have been weakened recently IIRC.
You do not seem to understand what the EU is. It is not a country, it does not have a police or anything like the NSA.
I know LLMs move at the speed of light (especially these past few quarters), but if Opus and GPT "a few months ago" were really like open weight models, then there's really no reason to not switch, especially for those who were using these models a few months ago.
Your codebase didn't change, so use the open weight model. Don't move the goalposts.
So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.
It’s pretty good at catching when performance is degraded. It was for a week or so before Fable launched for instance, probably due to a/b testing or capacity as you noted.
Maybe the truth is the newest models aren't actually as impressive as we thought. Maybe our perception of progress is being manipulated via months of gradual, silent and unverifiable degradation.
Let’s say I’m a bad faith LLM operator, and I want to degrade my model so the next release looks better and people want to switch to the more expensive one. How would I do that?
They wouldn't even need to do this uniformly, quantized versions of the model could be routed only a subset of the requests. They could do this to nerf the old model, or more likely just to give themselves more hardware to run the new one on by handling more requests on less hardware. Or to handle increased request volume as traffic ramps up faster than hardware can be provisioned.
Playing with local models at various quants, the degradation can be hard to spot. Sometimes it's only noticeable in aggregate. And even then, you never really know if you just got unlucky with a bad response due to RNG.
I've had Opus 4.6 fall into some weirdly incoherent loops that I rarely see from even Sonnet, that felt like the kind of thing I got frequently with Qwen3.5 9B on local. And the above applies... Was that just bad RNG? Or was my request to Opus routed to some lower quality variant? There's no great way for me to tell for any given request, nor any way to guarantee Anthropic _didn't_ do that.
I don't seem to get any of this with GPT-5.5 or GPT-5.5-Pro (not that I use 5.5-Pro enough to know for sure, but when I do use it, it never seems nerfed).
At least it's going to be usable as a very high end gaming PC.
There is also a low probability that someone enters peace negotiations solely to threaten the negotiators with death, yet here we are. With these guys it is: Better safe than sorry.
I didn't appreciate this until I started down that road myself.
Couldn't have put it better myself. That's what all this comes down to. Owning the hardware, owning the inference. Not perpetually renting them out on a meter like in the dystopian future they're envisioning.
lol his already happened with Fable!
Long term predictability ought to far outweigh a few more cycles of performance.
The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.
Even with minor automation I feel like I can watch OpenAI and Anthropic engineers fiddling in real-time. Tuesdays behaviour changes by Thursday, 10AMs production isn’t possible at 11:30AM. Nutty.
Which is what I suspect the providers are doing to fit more inference on the same amount of hardware over time.
https://marginlab.ai/trackers/claude-code-historical-perform...
There were at least a couple of these degradation trackers.
I experiment a lot with the open models and I’m getting tired of this trope. I’m not yet convinced that even the best open weight models are equal to Opus from “a few months” ago.
I know what the benchmarks say. I had higher hopes. My real experience just doesn’t match the benchmarks.
I also do a lot of work that even Opus 4.8 struggles with. When even the cutting edge LLMs aren’t all the way there yet, my motivation to switch to something even further behind just isn’t there.
5.2 lives up to the hype. I don't find it to be the best at anything except coding. But for coding... yeah, it lives up to the hype. Not quite Opus 4.8-level, but I would feel comfortable comparing it to 4.5, at least if it had vision capabilities.
That's exactly the problem I have... with Anthropic and "Open""AI"
The moat is so flat, it only gives +1 food and +1 production. +1 gold with a road.
The really interesting thing is that it's typically those very same accounts who were explaining, a few months ago, that thanks to their commercial model they were gaining so much time and producing so much fantastic code.
A few months passes and suddenly the open-source model have caught up with the models that were gaining them so much time and that produced amazing code (in production everywhere for sure btw) but... It's impossible to work with these models.
Rinse and repeat.
The current models, according to them, are basically AGI and they can go fishing while paid subscriptions solve the world's problems.
But when it six months there shall be new closed, pricey, models and when the open ones shall have reach the level of Fable, we'll hear how it's impossible to work in late 2026 on a model that is "only at the level of Fable".
These people should have been snake-oil salesmen (and it could be what they actually are).
Not unusual in the tech space, but this has been basically constantly happening for two years now? I can't imagine the improvements are more than incremental at this point.
Just like the OS ecosystem I think we'll see a similar trajectory with OAI, Anthropic and Google but on a much accelerated time scale. I think the lobbying has begun to lock in their fate for revenue - because none of them give a shit about their users. I do hope, however, that Anthropic continues to over rotate and continue to gimp their models into uselessness. I just asked Opus 4.8 the other day to look at some code as an adversary and summarize areas that should be addressed. Nothing specific and it shut down the conversation. However starting a new prompt and prodding the model from a different angle yielded the results I asked for directly. Pick a lane. Or, don't and continue to lose industry respect and consideration.
10% failure rate would drive me absolutely insane.
not all of us are doing noob shit lol
Edit: To clarify what I mean by this:
Anyone who uses LLMs for larger-than-small-module code generation, pretend-not-vibecoding (a.k.a spec-driven development), or outright vibecoding, etc., is using an LLM "heavily", IMO.
The appropriate things to use them for is information retrieval, plus as a basic extra signal in debugging, code understanding, quality checks, and so on.
Also, it's not illegal to be incompetent. Most people were incompetent long before LLMs showed up, it's not some rarity.
How did we get from prising software freedoms to this?
I don’t think the hardware requirements are relevant. If a research lab publishes the code their particle collider runs under the GPL, that doesn’t make it not OSS even though they’re the only ones on the planet with the hardware to run it.
On the spectrum of:
careful engineering--hacking--mad science
This kind of thing falls far towards the mad science end of the scale, but has proven effective.[1] It seems inevitable that decent local models will be possible as the technology and the hardware is improving at a rate beyond the growth of the knowledge base to be distilled.
> sending your money
akchyually if you do it right, you are sending negative money; fair enough otherwise
> I’m hoping it’s going to be minimal.
I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.
I just can’t agree yet. The models from Anthropic and OpenAI really are that much better than anything else. The open weight models must be universally benchmaxxed across the board because my real world experience with them is very different than what the benchmarks imply. I get downvoted a lot for speaking about my experience because I don’t think it’s the reality that people want to hear right now, but it’s true for complex work.
I do think there are a lot of easier tasks that can be handled appropriately by the open weight models in the hands of a skilled operator. If an entire job is simple enough that you wouldn’t hesitate to hand it off to a junior with a little supervision then any model will do. However for a lot of the work I do, even Opus 4.8 on Max requires a lot of attention and extra steering and review to keep it on track. Fable did, too, though to a lesser degree. When I try to use the big open weight models (hosted, because they’re not running at reasonable speeds locally at a quantization I can tolerate) it feels like I spend more time waiting while they burn tokens for output that I probably have to reject anyway, at least for the bigger tasks. I wish they were there, but that’s not the case yet.
> There remains a clear penalty for being an open LLM user.
The conversation here _around_ the article is interesting, but the article itself boils down to “I’m going to try using open models and hope for the best.”
Having played a bit with Fable, reinforced the above.
This certainly seems feasible for open weight models eventually, but I'm still extremely skeptical of the claims about reaching this level with any open weight model that can be run locally (nevermind the hardware costs to do so practically).
1. Unfortunatly in my tests the open models do not (yet?) rival, at least Claude Opus, for software development/engineering and adjacent tasks.
2. Enjoy while it lasts. I'll be genuinly amazed these open models will not be declared 'illegal' under some security pretense by the end of the year. And I say 'pretense' because the primary driver will be regulatory capture and industry protectionism.