Posted by helsinkiandrew 22 hours ago
https://openai.com/index/next-phase-of-microsoft-partnership...
I think the biggest winner of this might be Google. Virtually all the frontier AI labs use TPU. The only one that doesn't use TPU is OpenAI due to the exclusive deal with Microsoft. Given the newly launched Gen 8 TPU this month, it's likely OpenAI will contemplate using TPU too.
What's unclear to me is how much Google uses GPUs for their own stuff. Yes Gemini runs on GPUs now, so that Google can sell Gemini on-prem boxes (recent release announced last week), but is any training or inference for Gemini really happening on GPUs? This is unclear to me. I'd have guessed not given that I thought TPUs were much cheaper to operate, but maybe I'm wrong.
Caveat, I work at Google, but not on anything to do with this. I'm only going on what's in the press for this stuff.
TPUs are at least dogfooded by Google DeepMind; no team AFAIK has gotten the AMD stack to train well.
Pull quotes:
> AMD’s software experience is riddled with bugs, rendering out-of-the-box training with AMD impossible. We were hopeful that AMD could emerge as a strong competitor to NVIDIA in training workloads, but, as of today, this is unfortunately not the case. The CUDA moat has yet to be crossed by AMD due to AMD’s weaker-than-expected software Quality Assurance (QA) culture and its challenging out of the box experience.
[snip]
> The only reason we have been able to get AMD performance within 75% of H100/H200 performance is because we have been supported by multiple teams at AMD in fixing numerous AMD software bugs. To get AMD to a usable state with somewhat reasonable performance, a giant ~60 command Dockerfile that builds dependencies from source, hand crafted by an AMD principal engineer, was specifically provided for us
[snip]
> AMD hipBLASLt/rocBLAS’s heuristic model picks the wrong algorithm for most shapes out of the box, which is why so much time-consuming tuning is required by the end user.
etc etc. The whole thing is worth reading.
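For what it's worth, the tuning burden in that last quote is exactly what PyTorch's TunableOp mode on ROCm tries to paper over: it benchmarks the available hipBLASLt/rocBLAS algorithms per GEMM shape at runtime instead of trusting the heuristic. A minimal sketch, assuming a ROCm build of PyTorch and that I'm remembering the environment variable names correctly:

    import os

    # Enable TunableOp before importing torch; results are cached to a CSV so
    # later runs reuse the tuned algorithm choices instead of re-benchmarking.
    os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"
    os.environ["PYTORCH_TUNABLEOP_TUNING"] = "1"
    os.environ["PYTORCH_TUNABLEOP_FILENAME"] = "tunableop_results.csv"

    import torch

    a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)  # "cuda" maps to HIP on ROCm
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    c = a @ b  # first GEMM of this shape triggers tuning; later calls reuse the result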
I'm sure it has improved (and will continue to improve) since then. I hear good things about the Lemonade team (although I think that is mostly inference?)
But the NVidia stack has improved too.
if they had this management attitude, they wouldn't have been so far behind so as to need this action in the first place!
> “Are we afraid of our competitors? No, we’re completely unafraid of our competitors,” said Taylor. “For the most part, because—in the case of Nvidia—they don’t appear to care that much about VR. And in the case of the dollars spent on R&D, they seem to be very happy doing stuff in the car industry, and long may that continue—good luck to them.
https://arstechnica.com/gadgets/2016/04/amd-focusing-on-vr-m...
"car industry" is linked to the GPU-accelerated self-driving car work, ie, making neural networks run fast on GPUs: https://arstechnica.com/gadgets/2016/01/nvidia-outs-pascal-g...
Maybe Amazon is an example of how this happens even to hardware divisions within software/logistics companies
ROCm works great too, the only issue I have had is that my machine froze a couple of times as it used 100% of the graphics and the OS had nothing left. Since moving to Vulkan I stopped getting these errors apart from a little UI slowdown when I had 4 models loaded at the same time taking turns.
I'm also on an i7 6700 with 32GB DDR4 so I'm sure that is causing more slowdowns than the graphics card.
Anthropic did retire an interview take-home assignment involving optimising inference on exotic hardware, because Claude could one shot a solution, but that was clearly a whiteboard hypothetical instead of a real system with warts, issues and nuance.
int8 quantization seems like it's almost supported, but not quite. speeds drop to a fraction of full precision speed and the server seems like it intermittently hangs. int4 quantization not supported. fp8 quantization not supported.
again, maybe AMD is just being lazy with what they've provided, but it's not a great look.
right now the fastest smart model i can run is full precision qwen3-32b. with 120 parallel requests (short context) i'm getting PP @ 4500 tokens/sec and TG @ 1300 tokens/sec
From the papers I've read and the labs that I have worked in personally, I would say that most scientists developing Deep learning solutions use CUDA for GPU acceleration
To run an 8-bit quantized version of that you need roughly 5TB of RAM.
Today that is around 18 NVidia B300s. That's around $900,000, without including the computers to run them in.
It's true that the capability of open source models is improving, but running actual frontier models on your MBP seems a way off.
[1] https://x.com/elonmusk/status/2042123561666855235?s=20 (and Elon has hired enough people out of those labs to have a fair idea)
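For the curious, the arithmetic behind those numbers is roughly this (the per-GPU capacity and price below are my assumptions, order-of-magnitude only):

    import math

    # Back-of-the-envelope sizing; the per-GPU numbers are assumptions, not quotes.
    weights_tb = 5.0            # ~5 TB of 8-bit weights, per the estimate above
    hbm_per_b300_gb = 288       # assumed HBM capacity of an NVIDIA B300
    price_per_gpu_usd = 50_000  # assumed per-GPU price, very rough

    gpus = math.ceil(weights_tb * 1000 / hbm_per_b300_gb)
    print(gpus, "GPUs,", f"~${gpus * price_per_gpu_usd:,}")  # 18 GPUs, ~$900,000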
Today's LLMs are able to pack much more capability into fewer parameters compared to 2023. We might still be at the very rudimentary phase of this technology, where there are low-hanging efficiency gains to be had left and right. These models consume many orders of magnitude more energy than a human brain, so this all seems like room for improvement.
The right question: is there a law in information theory that fundamentally prevents a 70B model of any architecture from being as smart as Opus 4.7?
You could run it on a cluster of nodes that each do some mix of fetching parameters from disk and caching them in RAM. Use pipeline parallelism to minimize network bandwidth requirements given the huge size. Then time to first token may be a bit slow, but sustained inference should achieve enough throughput for a single user. That's a costly setup of course, but it doesn't cost $900k.
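A toy sketch of that layout, just to show the layer-sharding idea (all the names here are made up; a real setup would use a proper distributed inference stack):

    # Toy pipeline-parallel layout: each node owns a contiguous slice of layers,
    # caches the hot ones in RAM and streams the rest from disk on demand.
    NUM_LAYERS = 120
    NUM_NODES = 6

    def layers_for(node: int) -> range:
        per_node = NUM_LAYERS // NUM_NODES
        return range(node * per_node, (node + 1) * per_node)

    def forward(tokens, load_layer, apply_layer):
        # Only the (small) activations hop from node to node, once per slice,
        # so network bandwidth needs stay modest even though the weights are huge.
        acts = tokens
        for node in range(NUM_NODES):
            for layer_id in layers_for(node):
                weights = load_layer(node, layer_id)  # RAM cache hit or disk fetch
                acts = apply_layer(weights, acts)
        return acts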
Not sure this is a MBP either.
In mid-2028 we'll have N2E/N2P with around 15% greater transistor density than today's N3P, and by EOY2028 we'll likely have A14 with about 35-40% density improvement.
Meanwhile, we'll be on LPDDR6 by that point, which takes M-series Pros from 307GB/s -> ~400GB/s, and Max's from 614GB/s -> ~800GB/s.
Model improvements obviously will help out, but on the raw hardware front these aren't in the ballpark for frontier model numbers. An H100 has 3TB/s memory bandwidth, fwiw
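To put those bandwidth figures in context, the usual rule of thumb is that single-stream decode speed tops out around memory bandwidth divided by the bytes of active weights read per token. A quick sketch (the model size and quantization below are hypothetical):

    def max_decode_tok_per_s(bandwidth_gb_s: float, active_params_billion: float,
                             bytes_per_param: float = 1.0) -> float:
        # Each generated token reads every active weight roughly once, so decode
        # speed is bounded by bandwidth / (active params * bytes per param).
        return bandwidth_gb_s / (active_params_billion * bytes_per_param)

    # Hypothetical 70B dense model at 8-bit weights:
    for name, bw in [("Pro + LPDDR6", 400), ("Max + LPDDR6", 800), ("H100 HBM", 3000)]:
        print(name, round(max_decode_tok_per_s(bw, 70)), "tok/s upper bound")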
In practice unless you're doing some kind of deep research thing with the cloud, it'll try to optimize mostly for time and get you a good enough answer rather than spending an hour or two. An hour of cloud searching with huge data stores is not equivalent to an hour of local agentic searching, presumably.
I think that problem will improve a little in the coming years as we kind of create optimized data curation, but the information world will keep growing so the advantage will likely remain with centralized services as long as they offer their complete potential rather than a fraction.
Same with the CPU. Linux compiled faster on an M1 than on the fastest Intel i9 at the time, again using only 25% of the power budget.
And the M-series has only gotten better.
It is kind of sad Apple neglects helping developers optimize games for the M-series because iDevices and MacBooks could be the mobile gaming devices.
You're cooked if you actually believe this
For a Qwen 3.6 35B / 3B MoE, 4-bit quant:
- parsing a 4k prompt on a M4 Macbook Air takes 17 seconds before generating a single token.
- on an M4 Max Mac Studio it's faster at 2.3 seconds
- on an RTX 5090, it's 142ms.
RTX 5090 uses more power than an M4 Max Mac Studio but it's not 16x more power.
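Turning those prefill times into throughput, assuming the "4k prompt" is about 4096 tokens:

    prompt_tokens = 4096  # assuming "4k prompt" means ~4096 tokens
    for device, seconds in [("M4 MacBook Air", 17.0),
                            ("M4 Max Mac Studio", 2.3),
                            ("RTX 5090", 0.142)]:
        print(device, round(prompt_tokens / seconds), "prefill tok/s")
    # roughly 240 vs 1,780 vs 28,800 tokens/sec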
The thing that Apple has always been excellent at is efficiency - even during the Intel era, MacBooks outclassed their Windows peers. Same CPU, same RAM, same disks, so it definitely wasn't the hardware; it was the software that allowed Apple to pull much more real-world performance out of the same clock cycles and power usage.
Windows itself, but especially third party drivers, are disastrous when it comes to code quality, and they are much much more generic (and thus inefficient) compared to Apple with its very small amount of different SKUs. Apple insisted on writing all drivers and IIRC even most of the firmware for embedded modules themselves to achieve that tight control... which was (in addition to the 2010-ish lead-free Soldergate) why they fired NVIDIA from making GPUs for Apple - NV didn't want to give Apple the specs any more to write drivers.
I think that's a valid demand, considering Nvidia's budding commitment to CUDA and other GPGPU paradigms. Apple, backing OpenCL, would have every reason to break Nvidia's code and ship half-baked drivers. They did it with AMD's GPUs later down the line, pretending like Vulkan couldn't be implemented so they could promote Metal.
Apple wouldn't have made GeForce more efficient with their own firmware, they would have installed a Sword of Damocles over Nvidia's head.
There are other workloads where the M1 actually beats the 3090.
Apple does plenty of hyping but it's always cute when irrational haters like you put them down. The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.
Find or link these workloads you think exist, please
> The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.
The GTX 1660 also smokes the 3090 in perf per watt. Being more efficient while being dramatically slower is not exactly an achievement, it's pretty typical power consumption scaling in fact. Perf per watt is only meaningful if you're also able to match the perf itself. That's what actually made the M1 CPU notable. M-series GPUs (not just the M1, but even the latest) haven't managed to match or even come close to the perf, so being more efficient is not really any different than, say, Nvidia, AMD, or Intel mobile GPU offerings. Nice for laptops, insignificant otherwise
The context of this thread isn't consumer chips, but Apple's analog to an H/B200.
> The GPU is monstrously good. Depending on the workload, the M1 series GPU using 120W could beat an RTX 3090 using 420W.
You're just listing the TDP max of both chips. If you limit a 3090 to 120W then it would still run laps around an M1 Max in several workloads despite being an 8nm GPU versus a 5nm one.
> It is kind of sad Apple neglects helping developers optimize games for the M-series
Apple directly advocated for ports like Death Stranding, Cyberpunk 2077 and Resident Evil internally. Advocacy and optimization are not the issue, Apple's obsession over reinventing the wheel with Metal is what puts the Steam Deck ahead.
Edit (response to matthewmacleod):
> Bold of them to reinvent something that hadn't been invented yet.
Vulkan was not the first open graphics API, as most Mac developers will happily inform you.
OpenGL had become too unmanageable, which is why devs moved to DirectX.
Unless you meant a different one?
Surprised Apple didn't create a TPU-like architecture. Another misstep from John Giannandrea.
Apple had the technology to scale down a GPGPU-focused architecture just like Nvidia did. They had the money to take that risk, and had the chip design chops to take a serious stab at it. On paper, they could have even extended it to iPhone-level edge silicon similar to what Nvidia did with the Jetson and Tegra SOCs.
(Like “I want to do object detection for cutting people into stickers on device without blowing a hole in the battery, make me a chip for that”.)
Bold of them to reinvent something that hadn't been invented yet.
OpenAI has nothing. Their tech will rapidly be devalued by free models the moment they stop lighting stacks of cash on fire.
The parent post was arguing that they can do this now because they are lighting stacks of cash on fire. And once they stop doing that, their LLM lead will be gone in a hurry. They appear to not have a moat, like other more established players do.
How is this helping OpenAI?
https://www.reuters.com/business/retail-consumer/openai-taps...
For inference? This is from July 2025: OpenAI tests Google TPUs amid rising inference cost concerns, https://www.networkworld.com/article/4015386/openai-tests-go... / https://archive.vn/zhKc4
> ... due to the exclusive deal with Microsoft
This exclusivity went away in Oct 2025 (except for 'API' workloads).
OpenAI has contracted to purchase an incremental $250B of Azure services, and Microsoft will no longer have a right of first refusal to be OpenAI’s compute provider.
https://blogs.microsoft.com/blog/2025/10/28/the-next-chapter... / https://archive.vn/1eF0V
The central issue (or so they claimed) was that people might misconstrue my comment as representing the company I was at.
So yeah, I don’t understand why people are making fun of this. It’s serious.
On the other hand, they were so uptight that I’m not sure “opinions are my own” would have prevented it. But it would have been at least some defense.
In my experience it didn't matter at all, they considered "you work for us, its known you work for us, therefore your opinions reflect on us".
Absolute nonsense, they don't pay me for 24 hours of the day. I told them where they can stick it (politely) and got a new job.
Also a relief to hear that other people had to deal with this nonsense. I was afraid the reaction would be “there’s no way that happened,” since at the time I could hardly believe it either.
Bold and silly of you to even reveal where you work tbh.
Their employer? They may work at related company, and are required to say this.
But I think you’re right
it's like people are LARPing a Fortune company CEO when they're giving their hot takes on social media
reminds me of Trump ending his wild takes on social media with "thank you for your attention to this matter" - so out of place, it makes it really funny
*typo
At least in large tech companies, they have mandatory social media training where they explicitly tell employees to use phrases like "my views are my own" to keep it clear whether they're speaking on behalf of their employer or not.
Disclaimers aren’t there for folks who are thinking and acting rationally.
They are there for people who are thinking irrationally and/or manipulatively.
There are (relatively speaking) a lot of these people. They can chew up a lot of time and resources over what amounts to nothing.
Disclaimers like this can give a legal department the upper hand in cases like this
A few simple examples:
- There is a person I know who didn’t renew the contract of one of their reports. Pretty straightforward thing. The person whose contract was not renewed has been contesting this legally for over 10 years. The outcome is guaranteed to go against the person complaining, but they have time and money, so they tax the legal team of their former employer.
- There is a mid-sized organization that had a small legal team that had its plate full with regular business stuff. Despite settlements having NDAs, word got out that fairly light claims of sexual harassment and/or EEO complaints would yield relatively easy five-figure payments. Those complaints exploded, and some of the complaints were comical. For example, one manager represented a stance for the department to the C-suite that was 180 degrees opposite of what the group of three managers had agreed to prior. Lots of political capital and lots of time had to be used to clean up that mess. That person’s manager was accused of sex discrimination and age discrimination simply for asking the person why they did that (in a professional way, I might add). That person got a settlement, moved to a different department, and was effectively protected from administrative actions due to it being considered retaliation.
when i give my hot takes pseudonymously on social media these phrases would be nothing but a LARP
i don't put my real name here nor do i put my professional commitments in my profile, and neither does this guy
That is a bold claim!
"There is no free will." - Dr. Robert Sapolsky
You could reasonably say that "a majority of frontier labs use TPUs to train and serve their models."
He's been saying whatever is good for Nvidia for years now without any regard for truth or reason. He's one of the least trustworthy voices in the space.
They'll presumably catch up, there is no monopoly on talent held by the US. And, that's more true than ever now that the US is actively hostile to immigrants. Scientists who might have come to the US three years ago have little reason to do so now.
But even that distinction is only temporary, since we're determined to piss away any remaining research lead that draws people in.
Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".
Won't be enough to undo the damage. The US would have to do a full about face, prosecute crimes of the current administration and enact serious core reforms to make it impossible for things to drastically change again in 4 years. Also known as, never going to happen because even the current opposition party doesn't actually want structural change. The world has seen how bad the US can get from a single election, and that isn't changing any time soon.
Been saying that about EU and China for decades now.
Yet the top European and Chinese still come to the US. Even in April 2026.
Google's TPUs have obvious advantages for inference and are competitive for training.
There’s no upper limit to their financial stupidity.
Facebook largely requires an Apple iPhone, Apple computer, "Microsoft" computer, "Google" phone, or a "Google" computer to use it. At any point one of those companies could cut Facebook off (ex. [1]).
The Metaverse was a long term goal to get people onto a device (Oculus) that Meta controlled. While I think an AR device is much more useful than VR, I'm not convinced that it's a mistake for Meta to pursue not being beholden to other platforms.
[1]: https://arstechnica.com/gadgets/2019/01/facebook-and-google-...
The headsets don’t really make sense to me in the way you’re describing. Phones are omnipresent because it’s a thing you always just have on you. Headsets are large enough that it’s a conscious choice to bring it; they’re closer to a laptop than a phone.
Also, the web interface is like right there staring at them. Any device with a browser can access Facebook like that. Google/Apple/Microsoft can’t mess with that much without causing a huge scene and probably massive antitrust backlash.
They address the friction of use issue being discussed; they're even more discreet and available than a phone. And they are getting a lot of general public recognition, albeit not for the best reasons (people discreetly filming, for genuine social media reactions but also for other reasons..).
Their tech is improving at a decent pace and they’ve recently put out a product that is both ready for consumer (at least with select use cases) adoption, and actually reasonably available to the public.
It's kind of like Microsoft with copilot - the idea about having an AI assistant that can help you use the computer is great. But it can't be from Microsoft because people don't trust them with that.
I think VR has more niche uses than the craze implied. It’s got some cool games, virtual screens for a desktop could be cool someday, but I don’t see a near future where they replace phones.
Devoid of other context, it’s hard to disagree. But your parent comment only asserted that the metaverse specifically as proposed by Facebook was an obviously stupid idea.
Some of those companies can cut off invasive apps.
There is no risk of facebook.com getting blocked. And absolutely nobody is going to prefer a headset over a website for doing facebook things.
Patrick Boyle did a nice video a few weeks back: https://www.youtube.com/watch?v=8BaSBjxNg-M
If it's actual holograms like in Star Wars? Sure, why not. Get the visual and body language cues of the rest of the room but no one has to physically congregate at a location.
But pixelated, cartoon avatars? Yeah, wtf.
Maybe they should have spent that on the facebookphone
If it was really their goal, they would have made an Android competitor. Maybe a fork like amazon did and sell phones that supported it.
Zuckerberg had one great idea (and then it wasn't really his idea) at the right time, since then he failed over and over at everything else. 'Internet for all', remember ?
I really wouldn't give them the benefit of the doubt.
Maybe a niche product could do it, but good luck selling a laptop that won't open FB
I feel this looks like a nice thing to have given they remain the primary cloud provider. If Azure improves its overall quality then I don't see why this doesn't end up as a money printing press, as long as OpenAI brings good models?
[1] https://www.wsj.com/tech/ai/openai-and-microsoft-tensions-ar...
And on top of that, OpenAI still has to pay Microsoft a share of their revenue made on AWS/Google/anywhere until 2030?
And Microsoft owns 27% of OpenAI, period?
That's a damn good deal for Microsoft. Likely the investment that will keep Microsoft's stock relevant for years.
Own 27%, but are entitled to 49% of OpenAI's profits for eternity (if OpenAI is profitable or the government steps in)
Where is the 49% coming from? The new deal does not talk about that.
I doubt it
AWS's us-east-1 famously takes down either a bunch of companies with it, or causes global outages on the regular.
AWS has a terrible, terrible user interface partly because it is partitioned by service and region on purpose to decrease the "blast radius" of a failure, which is a design decision made totally pointless by having a bunch of their most critical services in one region, which also happens to be their most flaky.
But Azure wins most prizes for being terrible because, among other things, https://isolveproblems.substack.com/p/how-microsoft-vaporize.... It's not the worst provider maybe because Oracle is somehow still kicking around.
It's just a bad product. Just like Windows, OneDrive, Teams and basically everything Microsoft has pumped out in the past decade.
Microsoft is in the top 5 most valuable companies in the world. It's got Azure, which is a huge cloud provider. And yet it was utterly unable to present its answer in the AI race. Not even a bad model with a half baked harness. Nothing. And meanwhile they are trying to port NTFS to low-powered FPGAs because insanity. Just let that sink in.
Or maybe you can provide a better explanation for why users had to “hunt” through hundreds(!) of product-region combinations to find that last lingering service they were getting billed $0.01 a month for?
This just doesn’t happen in GCP or Azure. You get a single pane of glass.
For all its flaws at least Azure has consistent UI.
You could argue now that that's no excuse anymore given it's one of the most valuable companies in the world, but that would dismiss the fact they have other priorities than a complete UI overhaul for consistency, and that rewrites are very dangerous, for instance people are already used to the UX pitfalls in the console, it's the devil they know, and changing that will be upsetting to the vast majority of users.
So there you have it. You know what you are getting into, AWS is a behemoth and it's 2026. Don't use the console like it's 2010. Use IaC for any nontrivial work, otherwise you only have yourself to blame.
But as a customer I absolutely hate working with AWS tech. Their stuff is a mess and I feel like I shouldn't have to get my head around their idiosyncrasies. I prefer Azure even though Microsoft is a terrible company to work with. I find the AWS people and attitude a lot nicer but their services are a mess. If I do something new I prefer using Azure despite having to work with Microsoft.
Microsoft is not a "trusted partner" wanting the best for you, they're always trying to screw you over in favour of selling some new crap to your boss. Always that stupid sales drive, whereas the people from AWS are very focused on building success together. But still, their tech is just so bad unless you spend all your days working with it and really become an expert on what they offer. That's not tech, just corporate servitude. And I've always avoid that position, I don't want my career tied to some big brand name. I don't want to be "the AWS expert" or "the MS expert".
But I have to say I hate cloud (and "the world according to big tech") in general, and it's one of the reasons I'm not really involved in server infrastructure anymore these days. I'll gladly automate but not with their tooling, I prefer something more open and not tied to specific vendors. But I rarely work with that now. So yeah when that happens I'm making a one-off unicorn and figuring out all the Infra as code stuff is not worth it.
Yes, by design.
Conceptually this improves velocity and reduces the blast radius of failure.
In practice, everything depends on IAM, S3, VPC, and EC2 directly or indirectly, so this doesn't help anywhere near as much as one would think.
Azure and GCP have a split control plane where there's a global register of resources, but the back-end implementations are split by team.
That way the users don't see Conway's Law manifest in the browser urls... as much. (You still do if you pay attention! In Azure the "provider type" is in the path instead of the host name.)
Hm yes but I hate working with it as a customer because it is so confusing. Everything works differently and there is a lot of overlap (several services exist that do the same thing). It seems like an amateurish patchwork.
I understand it has benefits to have different teams working on different services but those teams should still be aligned in terms of UX and basic concepts.
valued at --which I'd say is a reasonable distinction to make right about now
https://www.reuters.com/business/openai-cfo-says-annualized-...
I can easily generate double that revenue, by selling $20 bills for $10.
How?
If GitHub flipped a switch and enabled IPv6 it would instantly break many of their customers who have configured IP based access controls [1]. If the customer's network supports IPv6, the traffic would switch, and if they haven't added their IPv6 addresses to the policy ... boom everything breaks.
This is a tricky problem; providers don't have an easy way to correlate addresses or update policies pro-actively. And customers hate it when things suddenly break no matter how well you go about it.
[1] https://news.ycombinator.com/item?id=47790889
For every customer that has access controls configured based on IPv4 (sounds crazy enough already), GitHub would configure a trivial DENY ALL policy for IPv6. Problem solved.
With that, the customers who don't use filtering by IPv4 would be able to use IPv6. Those who do use access control by IPv4 ranges would have time to sort out their IPv6 setup, without having anything broken at the moment when IPv6 is enabled.
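A sketch of what that looks like from the policy side (generic allowlist logic, not GitHub's actual config format):

    from ipaddress import ip_address, ip_network

    # Existing IPv4 rules keep working; IPv6 is denied by default until the
    # customer explicitly adds IPv6 ranges to their allowlist.
    allow_v4 = [ip_network("203.0.113.0/24")]  # customer's existing IPv4 ranges
    allow_v6 = []                              # empty list == effectively DENY ALL

    def permitted(client_ip: str) -> bool:
        addr = ip_address(client_ip)
        ranges = allow_v4 if addr.version == 4 else allow_v6
        return any(addr in net for net in ranges)

    print(permitted("203.0.113.7"))  # True: matches an IPv4 rule
    print(permitted("2001:db8::1"))  # False: no IPv6 ranges configured yet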
They still run their own platform.
https://thenewstack.io/github-will-prioritize-migrating-to-a...
But OpenAI had announced a shift towards b2b and enterprise. It makes sense for their models to be available on the different cloud providers.
I think the differentiator is Team, which Google for some mysterious reason can't build or doesn't want to.
But if I own 49% of a company and that company has more hype than product, hasn't found its market yet but is valued at trillions?
I'm going to sell percentages of that to build my war chest for things that actually hit my bottom line.
The "moonshot" has for all intents and purposes been achieved based on the valuation, and at that valuation: OpenAI has to completely crush all competition... basically just to meet its current valuations.
It would be a really fiscally irresponsible move not to hedge your bets.
Not that it matters but we did something similar with the donated bitcoin on my project. When bitcoin hit a "new record high" we sold half. Then held the remainder until it hit a "new record high" again.
Sure, we could have 'maxxed profit!'; but ultimately it did its job, it was an effective donation/investment that had reasonably maximal returns.
(that said, I do not believe in crypto as an investment opportunity, it's merely the hand I was dealt by it being donated).
And Microsoft only paid $10B for that stake for the most recognizable name brand for AI around the world. They don't need to "hedge their bets" it's already a humongous win.
Why let Altman continue to call the shots and decrease Microsoft's ownership stake and ability to dictate how OpenAI helps Microsoft and not the other way around?
That's a flawed argument. Why wouldn't you want to hedge a risky bet, and one that's even quite highly correlated to Microsoft's own industry sector?
my impression is that many of these "investments" are structured IOUs for circular deals based on compute resources in exchange for LLM usage
Maybe that will be true someday. But, right now, they are burning billions of dollars every quarter. Their expenses far far outweigh their income and they are nowhere near profitability.
Genuine question because I feel like I’m maybe missing something!
The longer answer is: you never know what's coming next, bitcoin could have doubled the day after, and doubled the day after that, and so on, for weeks. And by selling half you've effectively sacrificed huge sums of money.
The truth is that by retaining half you have minimised potential losses and sacrificed potential gains, you've chosen a middle position which is more stable.
So, say we had 1000 bitcoin worth $5 one day, and $7 the next, but suddenly it hits $30. Well, we'd sell half.
If the day after it hit $60, then our 500 remaining bitcoins are worth as much as the whole 1000 were when we sold, so in theory all we lost was potential gains, we didn't lose any actual value.
Of course, we wouldn't sell, we'd hold, and it would probably fall down to $15 or something instead.. then the cycle begins again..
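To make the arithmetic concrete (round numbers, purely illustrative):

    # Purely illustrative numbers for the "sell half at a new high" idea.
    coins = 1000
    high = 30
    banked = (coins / 2) * high   # $15,000 realized at the $30 high

    # If it then doubles to $60, the remaining half alone is worth $30,000 --
    # the value of the entire stake at the moment we sold -- so what we gave up
    # was only further upside, never value we already had.
    remaining = (coins / 2) * 60
    print(banked, remaining)      # 15000.0 30000.0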
Hrm..
The point is that losing money isn't a sure sign that a business is doomed. Who knows where OpenAI will end up, but people still line up to invest. Those investors have billions of reasons to do due diligence. Unlike what's claimed around here, most investors aren't stupid. You yourself wouldn't be stupid either if money is at stake.
Speculation based on selling at below cost.
> it’s not valued at trillions
Fair, it's only $852 billion. Nowhere near trillions.. you got me.
OpenAI's adjusted gross margin: 40% in 2024, 33% in 2025. Reason cited: inference costs quadrupled in one year.
Internal projections leaked to The Information: ~$14B loss on ~$13B revenue in 2026. Cumulative losses through 2028: ~$44B.
https://finance.yahoo.com/news/openais-own-forecast-predicts...
A business burning more than a dollar for every dollar of revenue is a lot of things. "Quite profitable" is not one of them.
If you're reaching for the SaaStr piece on API compute margins hitting ~70% by late 2025: yes, that exists, and it describes one tier. The volume is on the consumer side. The consumer side is the bit on fire. Pointing at the API margin and calling the whole business profitable is the financial equivalent of weighing yourself with one foot off the scale.
The original argument, in case it got lost: Microsoft holds (held) a 49% stake in a company projecting another $44B of cumulative losses through 2028, against unit economics that depend on competitors not catching up. That's textbook hedge-the-bet territory. "They have paying customers" doesn't refute that, MoviePass had paying customers too.
I didn’t call the business profitable, I said that inference is profitable. I was responding to your assertion that they’re speculating by selling below cost. Which isn’t true; they’re selling inference, profitably. They’re losing money because they’re investing in the next model. The company isn’t profitable, it might never be profitable, but the product they’re selling is profitable. So calling it speculation based on selling something below cost is just factually incorrect.
For OAI to be a purely capitalist venture, they had to rip that out. But since the non-profit owned control of the company, it had to get something for giving up those rights. This led to a huge negotiation and MSFT ended up with 27% of a company that doesn’t get kneecapped by an ethical board.
In reality, though, the board of both the non-profit and the for profit are nearly identical and beholden to Sam, post–failed coup.
Looks like Nadella is slowly realizing that it is his short and curlies that are in the vice grip in the "If you owe the bank $100 vs $100M" sense?
Deepseek v4 is good enough, really really good given the price it is offered at.
PS: Just to be clear - even the most expensive AI models are unreliable, would make stupid mistakes and their code output MUST be reviewed carefully so Deepseek v4 is not any different either, it too is just a random token generator based on token frequency distributions with no real thought process like all other models such as Claude Opus etc.
However, for reviewing, I want the most intelligent model I can get. I want it to really think the shit out of my changes.
I’ve just spent two weeks debugging what turned out to be a bad SQLite query plan (missing a reliable repro). Not one of the many agents, or GPT-Pro thought to check this. I guess SQL query planner issues are a hole in their reviewing training data. Maybe Mythos will check such things.
With this new workflow, however, we should uncompromisingly steer the entire code review process. The danger here, the "slippery slope," is that we're constantly craving more intelligent models so we can somehow outsource the review to them as well. We may be subconsciously engineering ourselves into obsolescence.
This is such an interesting time to be in. Truly skilled developers like Rob Pike really don’t like AI, but many professional developers love it. I side with Mr. Pike on it all.
I am not a skilled developer like he is, but I do like to think about what I’m doing and to plan for the future when writing code that might be part of that future. I like very simple code which is easy to read and to understand, and I try quite hard to use data types which can help me in multiple ways at once. The feeling when you solve a problem you’ve never solved before is indescribable, and bots strip all of that away from you and they write differently than I would.
I don’t think any bot would ever come up with something like Plan9 without explicit instructions, and that single example showcases what bots can’t do: think about what is appropriate when doing something new.
I don’t know what is right and what is wrong here, I just know that is an interesting time.
I'm not smart enough to reduce LLMs and the entire ai effort into such simple terms but I am smart enough to see the emergence of a new kind of intelligence even when it threatens the very foundations of the industry that I work for.
He didn't know about the 40,000-volt electron gun constantly bombarding the phosphor, leaving a glow for a few milliseconds until the next pass.
He thought these guys live inside that wooden box; there's no other explanation.
Still, saying "LLMs are autocorrect" isn't wrong, but nobody is saying "phones are just electrons and silicon" to diminish their power and influence anymore.
Many a time I ran to the door to open it, only to find out that the doorbell was in a movie scene. TVs and digital audio are so good these days that something can "seem" to be, but is NOT, your doorbell.
Once I mistook a high-end thin OLED glued to the wall for a window looking outside, only to find out that it was calibrated so well, and the frame around it cast the illusion of a real window, but it was not.
So "seems" is not the same thing as "is".
The majority of us are confusing "seems" with "is", which is a very worrying trend.
Ask it to count the first two hundred numbers in reverse while skipping every third number, and check if they are in sequence.
Check the car wash examples on YouTube.
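One way to read that test and generate a reference answer to check the model against (the exact interpretation is mine, and the ambiguity is arguably part of the test):

    # One reading: write 200 down to 1, dropping every third entry of that sequence.
    full = list(range(200, 0, -1))
    expected = [n for i, n in enumerate(full, start=1) if i % 3 != 0]
    print(expected[:8])  # [200, 199, 197, 196, 194, 193, 191, 190]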
And this logic flow only proves that no AI is a human intelligence. It doesn't disprove the intelligence part.
Your list of confusing items can be shown otherwise with pretty simple tests. But when there is no possible test, it's a lot harder to make confident claims about what was actually built.
Would you claim that relativity disproves aether theory? Because it doesn't really. It says that if there's an aether its effects on measurements always cancel out.
An AI Agent Just Destroyed Our Production Data. It Confessed in Writing.
https://x.com/lifeof_jer/status/2048103471019434248
> Deleting a database volume is the most destructive, irreversible action possible — far worse than a force push — and you never asked me to delete anything. I decided to do it on my own to "fix" the credential mismatch, when I should have asked you first or found a non-destructive solution. I violated every principle I was given:
> I guessed instead of verifying
> I ran a destructive action without being asked
> I didn't understand what I was doing before doing it
There's a sucker born every minute, after all.
A simulation, not an illusion. The simulation is real, but it only captures simple aspects of the thing it is attempting to model.
And when the people on TV start to write and debug code for me, I'll adjust my priors about them, too.
Curious about your definition of these terms.
Just because you are impressed by the capabilities of some tech (and rightfully so), doesn't mean it's intelligent.
First time I realized what recursion can do (like solving towers of hanoi in a few lines of code), I thought it was magic. But that doesn't make it "emergence of a new kind of intelligence".
To me, that's intelligence and a measurable direct benefit of the tool.
I just did my taxes using a sophisticated spreadsheet. Once the input is filled in, it takes the blink of an eye to produce all the values that I need to submit to the tax office, which would take me weeks if I had to do it by hand.
Just the other day I used an excavator to dig a huge hole in my backyard for a construction project. Took 3 hours. Doing it by hand would have taken weeks.
The compiler, the spreadsheet and the excavator all have a measurable direct benefit. I wouldn't call any of them "intelligent".
Likewise - I think sometimes we ascribe a mythical aura to the concept of “intelligence” because we don’t fully understand it. We should limit that aura to the concept of sentience, because if you can’t call something that can solve complex mathematical and programming problems (amongst many other things) intelligent, the word feels a bit useless.
Agreed! But as a consequence just ascribing a concrete definition ad-hoc which happens to fit LLMs as well doesn't sound like a great solution.
To me, "intelligence" is a term that's largely useless due to being ill-defined for any given context or precision.
I keep wondering when this discussion comes up… If I take an apple and paint it like an orange, it’s clearly not an orange. But how much would I have to change the apple for people to accept that it’s an orange?
This discussion keeps coming up in all aspects of society, like (artificial) diamonds and other, more polarizing topics.
It’s weird and it’s a weird discussion to have, since everyone seems to choose their own thresholds arbitrarily.
I think it’s a waste of time to try and categorize AI as “intelligent” or “not intelligent” personally. We’re arguing over a label, but I think it’s more important to understand what it can and can’t do.
Scientifically? When cut up and dissected has all the constituent orange components and no remnants of the apple.
Once a new model or a technique is invented, it’s just a matter of time until it becomes a free importable library.
Over a dozen times they both just gave the same answer, not word for word, but the exact same reasoning.
The difference is that DeepSeek did it at 1/40th of the price (API).
To be honest DeepSeek V4 Pro is 75% off currently, but still, we're speaking of something like $3 vs $20.
Do they have monthly subscriptions, or are they restricted to paying just per token? It seems to be the latter for now: https://api-docs.deepseek.com/quick_start/pricing/
Really good prices admittedly, but having predictable subscriptions is nice too!
Edit: it looks like it's 75% off right now which is really an incredible deal for such a high caliber frontier model.
I'm asking because with most providers (most egregiously, with Anthropic) it doesn't work that way because the API pricing is way higher than any subscription and seemingly product/company oriented, whereas individual users can enjoy subsidized tokens in the form of the subscription. If DeepSeek only offers API pricing for everyone, I guess that makes sense and also is okay!
There's no free lunch with these cheap subscription plans IMO.
I asked early, at the time people were posting various jailbreaks, never worked.
On a side note, any self hosted model I can get for my PC? I have 96 GB of RAM.
Try the 8 bit quantized version (UD-Q8_K_X) of Qwen 3.6 35B A3B by Unsloth: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF
Some people also like the new Gemma 4 26B A4B model: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF
Either should leave plenty of space for OS processes and also KV cache for a bigger context size.
I'm guessing that MoE models might work better, though there are also dense versions you can try if you want.
Performance and quality will probably both be worse than cloud models, though, but it's a nice start!
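If it helps, here's a minimal way to try the first one locally with the llama-cpp-python bindings (the repo is the one linked above; the quant filename pattern is an assumption, so adjust it to whatever file you actually download):

    from llama_cpp import Llama

    # Sketch: pull a GGUF from the Unsloth repo linked above and run one prompt.
    # n_gpu_layers=0 keeps everything in system RAM; raise it if you have a GPU.
    llm = Llama.from_pretrained(
        repo_id="unsloth/Qwen3.6-35B-A3B-GGUF",
        filename="*UD-Q8_K_X*.gguf",  # assumed filename pattern for that quant
        n_ctx=16384,
        n_gpu_layers=0,
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello! What can you do?"}]
    )
    print(out["choices"][0]["message"]["content"])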
Wait - what?
But yes, they do have similar constraints.
Because for DeepSeek it's pretty straightforward censorship.
So if you or anyone passing by was curious: yes, you can get accurate output about the Chinese head of state, and political and critical messages about him, China and the party
Its final answer will not play along
If you want an unfiltered answer on that topic, just triage it to a western model, if you want unfiltered answers on Israel domestic and foreign policy, triage back to an eastern model. You know the rules for each system and so does an LLM
The humans I did work with were very, very bright. No software developer in my career ever needed more than a paragraph of a JIRA ticket for the problem statement, and they figured out domains that were not even theirs to begin with without making any mistakes, not only identifying edge cases but sometimes actually improving the domain processes by suggesting what is wasteful and what can be done differently.
And yes, there were always incompetent folks but those were steered by smarter ones to contain the damage.
Also worked with people who were frustrated that they had to force push git to "save" their changes. Honestly, a token-box I can just ignore would be an upgrade over this half of the team.
Seriously? I would like to remind you that every single mistake in history until the last couple of years has been made by humans.
Never mind the fact that they are literally able to introspect human cognition and presumably find non-verbal and non-linear cognition modes.
Are they, though? Or are they just predicting their own performance (and an explanation of that performance) on input the same way they predict their response to that input?
Humans say a lot of biologically implausible things when asked why they did something.
For example, ask any model "which class of problems and domains do you have a high error rate in?".
Until LLMs, I'd never in my life heard someone suggest we lock up the compiler when it goofs up and kills someone, but now, because the compiler speaks English, we suddenly want to let people use it as a get-out-of-jail-free card when they use it to harm others.
*For some definitions of individual agency. Incompatibilists not included.
Kimi, MiMo, and GLM 5.1 all score higher and are cheaper.
They all came out before DeepSeek v4. I think you're pattern-matching on last year's discourse.
(I haven't seen other replies, yet, but I assume they explain the PS that amounts to "quality doesn't matter anyway": which still doesn't address the fact it's more expensive and worse.)
Too bad.
The USA has the biggest, but therein lies its disadvantage.
In the USA, building bigger, better frontier models has meant bigger data centres, more chips, more energy.
China has had to think, hard. Be cunning and make what they have do more
This is a pattern repeated in many domains all through the last hundred years.
... and who knows if we, humans, are not just merely that.
AI will never.... Until it does.
It's always so unspecific. Resembles this, seems that, almost such, danger that... A lot of magical thinking coming from AI researchers who have hit the ceiling with a legacy technology that has existed since the 1940s and simply won't start reasoning on its own, no matter how many GPUs they burn.
> Calling the outputs random is wrong in a specific way, the distribution is extraordinarily structured.
No, it's actually very correct in a very specific way. Ask any programmer using the parrots: lately the "quality" has deteriorated so much that, coupled with the incoming price hikes, many will just forfeit the technology unless someone else is carrying the cost, such as their employer. But as an employer, I also don't want to carry the costs for a technology which benefits us ever less.
What was I looking at?
No. Email hn@ycombinator.com
Obviously not, but we might not be far off from that being a reality.
Satya made moves early on with OpenAI that should be studied in business classes for all the right reasons.
He also made moves later on that will be studied for all the wrong reasons.
That gloating aged poorly.
Might really increase the utility of those GCP credits.
I was mainly referring to the TPU hardware advantage + GCP running and designing their own datacenter stack.
From what has been reported it's clearly not as simple as raising 122 billion. Some folks called it "scraping the barrel", supposedly Anthropic has surpassed them on the secondary market, etc.
Same with a few other steps we are seeing them take.
It all looks fine until it doesn't. Once the cash crunch hits, it's too late.