Posted by jamdesk 1 day ago
https://decrypt.co/371971/openai-broadcom-jalapeno-first-cus...
https://www.cnn.com/2026/06/24/tech/openai-broadcom-jalapeno...
> the use of OpenAI models to accelerate parts of the design and optimization process.
I wish there was more about this. As is I kind of have to assume that this is just meaningless marketing, like saying development was accelerated by Microsoft Office or their 5k LG Ultrafine 40-inch monitors.
Like, if this was as big a deal as it kind of vaguely implies, they would be making a bigger deal of it, right?
Given constant weights / biases of a Transformer / DNN you could use pipelining to feed forward calculations through the array one layer at a time. For DNNs with thousands of layers you might see 1:1 speed up per layer channel.
I doubt they would undergo this process for marginal gains.
That's not to say I expect they'll ship something competitive with Google's custom AI hardware on the first go, since Google has been at it for quite a while, but there are very few technical problems large sums of money won't solve.
IDK how the custom hardware exploits this; would love to hear any ideas!
You might like this article [1], titled "FPGA-based CNN Acceleration using Pattern-Aware Pruning". More context and details can be found in the PhD thesis of Léo Pradels [2].
[1]: https://inria.hal.science/hal-04689673/document
[2]: https://theses.hal.science/tel-05021575v1/file/PRADELS_Leo.p...
Even for a company’s first design?
The Fire Phone was Jeff Bezos' personal baby, and we know how that went. Then there was the Apple G4 Cube with Steve Jobs, the Model X's Falcon Wing doors and Elon, and let's not even talk about the Metaverse and Zuck.
I'd rather guess that Jeff Bezos' opinion on what makes a good phone is/was different from the opinion of many potential buyers.
Because the CEO was behind it, breathing down their necks.
If you consider that outcome a worthwhile endeavor, I don't know what else to say.
As another commenter said, Broadcom is very experienced with backend design (as well as the supply chain management, testing, etc. that comes after the chip is taped out) and so this can't be regarded as a "first chip". Richard Ho (the head of hardware at OpenAI) is also extremely experienced and used to be the head of the Google TPU effort -- where he actually worked with Broadcom in a similar tapeout already. So yes, this is not a "first design"!
A big part of the semiconductor industry also operates on a reputation basis. Broadcom (like TSMC) is a neutral party as a design house, but if they did something like this, it might ruin that reputation.
My recollection is that PA Semi was very much for the architectural and design talent, even though it was an “asset purchase” and all the existing Power & military chips were hived off.
For Intrinsity I recall a lot of interest was actually in their existing graphics work and EDA. ISTR that those early mobile GPUs were what they focused on.
I was in the mansfield org circa ‘07-11. I spent a lot of time flying between cupertino and austin/bee caves that first year.
Whoever it was, whooo, that's hot shit. I remember an M1 MacBook Air just cleaning the clock of an Intel MacBook Pro and thinking "x86_64 has real competition again".
Great silicon. I'm over it with not having root on my own machine, so I've left the ecosystem, but it's really nice hardware, can't dispute that.
And a lot of them are sitting under Qualcomm via the Nuvia acquisition.
Design verification also involves a lot of traditional programming which benefits from LLMs.
So it’s not meaningless at all. You could download some of the open source chip design software today and the LLMs could even help you get started on your own tiny chip if you are so interested.
I think we're not there yet. I've been meaning to look at this flux.ai to see if it has the prompts/workflow worked out better than what I was able to cobble together in a few hours. Maybe Alteryx's MCP server would have been better. I'll try that this weekend for another board I've got.
PCB design and 3D CAD design are different topics.
Hardware Description Languages are closer to programming languages than CAD. Look at some Verilog to get an idea - https://en.wikipedia.org/wiki/Verilog
This is very unlike how FPGA and (I assume) ASIC design is done. That is more like a traditional programming language but everything happens all at once (no sequence of statements outside tests; if you need that you have to write a state machine yourself). You define logic expressions between signals, add stateful latches, etc. But you never specify the physical layout.
Instead you feed your description to a tool that acts as a constraint solver/optimiser that computes the layout for you (for FPGAs this is called synthesising IIRC; it is akin to a compiler). Typically quite slow: even for small circuits like we did at university it took minutes, and for large circuits it might easily take days.
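To make the "everything happens all at once" point concrete, here is a minimal Python sketch. It is a toy model only, not how any real simulator or synthesis tool works, and the names (`clock_edge`, `sequential_software`, `a`, `b`, `count`) are made up for illustration. It contrasts clocked register updates, where every new value is computed from the old state, with the same three assignments run as ordinary sequential statements.

```python
# Toy model of clocked hardware semantics: on each clock edge every register
# takes its new value at the same time, computed from the OLD values.
# All names here are illustrative; this is not a real HDL simulator.

def clock_edge(state):
    """Compute every next-state value from the current state, then commit together."""
    return {
        "a": state["b"],              # a <= b
        "b": state["a"],              # b <= a   (reads the OLD a)
        "count": state["count"] + 1,  # count <= count + 1
    }


def sequential_software(state):
    """The same three assignments executed one after another, as in ordinary code."""
    s = dict(state)
    s["a"] = s["b"]
    s["b"] = s["a"]                   # reads the NEW a, so the swap is lost
    s["count"] = s["count"] + 1
    return s


start = {"a": 1, "b": 2, "count": 0}
hw = clock_edge(start)
sw = sequential_software(start)
print("hardware-style:", hw)
print("software-style:", sw)
```

In the hardware-style version the two registers swap; in the software-style version both end up holding the old `b`. That ordering dependence is what you would have to rebuild by hand with a state machine.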
Now, this raises the question, what if you design a PCB net list using AI, but then use traditional autorouting and layout? I believe that can also be done, but I have no experience designing PCBs, so I don't know how well it works.
[1] https://deeppcb.ai/reinforcement-learning-pcb-routing-explai...
[2] https://deeppcb.ai/cooper/
[3] https://deeppcb.ai/deeppcb-kicad-plugin-ai-pcb-routing/
Disclaimer: I work at InstaDeep, the company behind DeepPCB, but I don't work on this product.
And the two things that take up VAST amounts of time in ASIC design are testbenches and timing closure.
A LOT of hardware design is testbenches to verify things. AI is REALLY GOOD at generating things like testbenches. And nobody really cares if the quality of your testbench code sucks as long as it validates what it claims to.
I don't know how good AI is at timing closure, but I wouldn't necessarily be surprised if it is pretty good at it up to the physical point. That's lots of textual output which you can put a constraint on.
Everything involving physical design, though, tends to be a disaster waiting to happen if you let AI loose on it.
In my experience they are not especially good at SystemVerilog. There's a lot of knowledge about it that is locked behind paywalls and it's very niche.
My guess is the "from scratch" here is quite the exaggeration. Otherwise why did they need Broadcom?
Early testing shows that the first-generation accelerator will deliver performance per watt substantially better than current state-of-the-art
What is substantial here? Vera Rubin is shipping in volume later this year and it is expected to be 10x more power efficient for inference than Blackwell.[0] Even if they've already taped out the chip, getting bugs fixed, getting chips manufactured, getting HBM allocation, getting a rack design, hooking them up together, and putting them in a data center will likely take at least another 12 months, probably more. By the time this chip is in data centers in volume, they're likely competing against Vera Rubin Ultra or maybe even Feynman. Personally, I don't think OpenAI should have invested in this project. It's too early for them. They should have focused on models like Anthropic and won there. When they're profitable, they can take on these projects.
The risk here is very high for OpenAI because AI has a hard cap in energy. If you have a gigawatt, you should only install the best chips. If Nvidia's chips are better, then this is a wasted project and likely wasted billions.
[0] https://developer.nvidia.com/blog/scaling-token-factory-reve...
I don't know how much of the things outside of the chip Broadcom has vs Google's proprietary tech that is not shared with Broadcom.
Nvidia's Vera Rubin has 6 unique chips working together in a single rack.[0]
[0] https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...
So one of my pet theories I haven't seen in general discourse is that AI came from the massive vector processing jump available commercially in GPUs when it left CPU bound processing behind. That's a factor of 100x-1000x of processing power.
AI is not-quite-there, and to get even another leap might take another 10-100x processing power.
Now... what? ASICs probably won't deliver even a 10x? There's only so much you get out of node shrinks.
"Substantial" doesn't even mean twice IMO. "Substantial" almost sounds like ... 15% better?
1) OpenAI genuinely have AI technologies that can improve chip design (bold, unlikely claim, needs evidence)
2) OpenAI designed test/verification models and kernels that could be run on the simulated hardware to test its performance
As you and others have said, it's hard to trust when they are happy to write something that could easily only mean the latter but sounds like the former.
Is it worth the claim that they are making in a press release?
Definitely, yes, because being vague about it like they have been lets investors fill-in-the-blanks with whatever they want it to mean.
Other companies? Fool me once Altman, let's see the thing at scale making money.
Near frontier AI is clearly relevant to some kinds of logic design, I'm learning some Hardcaml at the moment and yeah, AI is super helpful.
Can it leapfrog a company without hardware experience to near the front of the pack of companies with decades of hardware experience? Less obvious.
Unrelatedly, would OpenAI dramatically overstate something to manipulate the press and public and capital markets?
It's arguably their core competency.
AI is going to matter in logic design and synthesis. How much, how soon, and where are open questions.
"The future is here, it's just not evenly distributed"
Chip design languages (HDLs like Verilog or VHDL) are well understood by LLMs. They don’t need specialty tools to use GPT-5.5 or other LLMs with them.
You could even try it yourself with open source chip design tooling if you wanted to see it.
It is still a bold claim and it still needs evidence.
We would obviously get a bit more of the evidence if it were to be more useful for the upcoming IPO than this rather open-ended, reinterpretable phrasing.
No, obviously. They'd be expected to do a substantially worse job and yet still drastically accelerate the design process.
LLMs make all sorts of dumb mistakes when writing c++ or python yet are nonetheless massively beneficial.
I've used GPT-5.5 and Opus both for FPGA design with good results. We built a lot of tooling around it to help the models, but even without that they're definitely capable of designing digital logic.
This actually plays out across every field and is well documented. An expert can recognize the hallucinations and bullshit coming out of LLMs, while non-experts see plausible output and do not know enough to know it is BS.
Why is that a bold and unlikely claim?
Are you saying that AI, which has been proven to cure diseases, solve our hardest math problems, write complex computer code and generate entire worlds and HD video from a simple prompt would somehow be like, my bad, I guess I can't design chips?
We're not quite there yet :)
https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_m...
Because they could have offered even slightly more evidence.
They're burning more cash than pretty much anyone else and don't have anything public that looks like a matching revenue stream, so they probably need one very badly.
It doesn't have to be revolutionary, it could just be AI-assisted design and lined up well enough with their operations for a custom ASIC to be worth it.
honestly you don't realise how much more efficient it is until you are stuck using the wrong flavour of outlook, the spam filter breaks or sloppy spelling, punctuation and grammar force you to clarify details needlessly.
FWIW, Google is now on their 8th generation TPU, having put out the last 4 generations on a 1-year cadence.
something can be non-novel in the industry, yet novel to the reader, at which point it is useful ... for such readers.
https://deepmind.google/blog/how-alphachip-transformed-compu...
1. https://www.investing.com/news/stock-market-news/openai-unve...
Q2 is forecasted to be negative, partly because of RAM prices like you said, but for the most part this is something that only price sensitive nerds care about. Broadcom sells a ton of server chips. Server sales are up 30% vs last year so I highly doubt they're desperate to use their allocation
I thought of PCs second since most chip manufacturers make something or other that goes into them (Broadcom probably more than Qualcomm), and yes it's very surprising that PC sales don't seem to be down yet.
> the full-year 2026 [PCD] outlook has been revised to −10.4% year-over-year
because
> erosion of consumer purchasing power amid regional inflation and currency volatility in many key markets, compounded by memory and storage shortages that are proving more severe than anticipated in the previous forecast cycle.
The positive Q1 YoY growth
> was largely the product of pull-forward demand, as both consumer and commercial buyers accelerated purchases ahead of anticipated price increases and limited product availability.
The idea that only nerds care about the cost of things is... absurd.
For hardware purchases, laypeople may go about it the other way from what nerds would do: instead of deciding what they need in terms of computing power and memory, and then finding a cheap offer for that, they just decide how much they want to spend, and then buy a device at that price point irrespective of its performance characteristics. If you shop like this, and would have purchased anything but a rock-bottom low-end device two years ago, prices have remained stable.
There are a lot of large tech companies that most of HN has never heard about that completely dominate entire segments.
Broadcom has become wealthy by being Google's TPU hardware partner, including sharing their TSMC capacity with Google, and evidently now they are doing the same thing with OpenAI. What a brilliant way to take advantage of the AI gold rush!
I wish they weren't using their piles of money to extort money out of the software industry like they are with VMWare and Bitnami.
https://finance.yahoo.com/sectors/technology/articles/broadc...
Oh dear god. I'm actually feeling sorry for Google at that point. Good luck, you'll need it...
Kinda, but not exactly.
Broadcom cornered the enterprise infra and security market in the late 2010s and early 2020s after acquiring CA Technologies, BMC (EDIT: Did NOT acquire them, they were considering it back in 2018 but decided against it and KKR ended up acquiring them), Symantec (which they bought instead of BMC), and VMWare and were able to make a strong cybersecurity story during the late 2010s cybersecurity and SaaS boom.
That gave them plenty of cashflow that helped subsidize their hardware business when hardware was not viewed as hot as it is today.
Additionally, Broadcom is GCP's marquee customer and has been for a little under a decade, so they were able to make a sweetheart deal where all of Broadcom's software businesses would exclusively use GCP and in return GCP would work with Broadcom to design its silicon and source the infra needed for their DC buildouts.
Ironically, the DoJ blocking Broadcom's acquisition of Qualcomm was the best thing it ever could have done for Broadcom, because it gave Broadcom the dry powder to dominate the Enterprise SaaS and build a strong niche in the cybersecurity space.
> piles of money to extort money out of the software industry
From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Working in an industry that historically had to deal with high commodification, low margins, and long tail sales leads to leadership that can execute. Additionally, no one climbs the leadership ladder without having spent years as a line-level engineer, but that's true for software as well to an extent.
Edit: can't reply
> Did they acquire also BMC?
Nope.
Broadcom was considering acquiring them in 2018 but decided not to go through with the opportunity and KKR jumped in.
> From personal experience, executives and leadership who started off in the electronics and hardware industry are much more vicious and cutthroat than their peers who started in software.
Only The Paranoid Survive is quite a name for a management book. It implies surviving in the world you are speaking about.
[0] https://www.goodreads.com/book/show/66863.Only_the_Paranoid_...
There would be 1 multiplier per weight (and since they're constant, the whole thing turns into a bunch of simple adders), and the total pipelined system throughput would be one token per clock cycle.
That means you can probably have millions of users simultaneously using a single bit of silicon, with perhaps 500 million tokens per second coming out the output bus.
Downside is this chip would be huuuuge - a whole wafer.
Wafer level faults probably won't matter though - neural nets are resistant to a few missing or wrong weights.
Due to the speed the industry moves, you'd want to race from model weights to production super fast, make 50 wafers, use them for a year, then bin them when that model is obsolete.
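For anyone who wants to poke at the "one token per clock cycle" claim, here is a rough Python sketch of a fully pipelined design. It assumes one pipeline stage per clock, a 500 MHz clock (that is what 500 million tokens per second implies at one token per cycle) and a made-up depth of 1000 stages; `simulate_pipeline`, `CLOCK_HZ` and `NUM_STAGES` are hypothetical names, not anything from a real chip.

```python
# Sketch of a fully pipelined datapath: one new token enters per clock cycle,
# every token advances one stage per cycle. Stage count and clock are assumptions.

def simulate_pipeline(num_stages, num_tokens):
    """Return {token_index: cycle in which that token leaves the last stage}."""
    stages = [None] * num_stages
    done_cycle = {}
    next_token = 0
    cycle = 0
    while len(done_cycle) < num_tokens:
        cycle += 1
        # A new token enters stage 0 while every other token moves one stage on.
        incoming = next_token if next_token < num_tokens else None
        if incoming is not None:
            next_token += 1
        stages = [incoming] + stages[:-1]
        # Whatever sits in the last stage completes at the end of this cycle.
        if stages[-1] is not None:
            done_cycle[stages[-1]] = cycle
    return done_cycle


CLOCK_HZ = 500e6     # implied by 500M tokens/s at one token per cycle
NUM_STAGES = 1000    # hypothetical depth, for illustration only

finish = simulate_pipeline(num_stages=5, num_tokens=10)
print("first token done at cycle", finish[0])
print("gap between consecutive tokens:", finish[9] - finish[8], "cycle")
print("aggregate tokens/s:", CLOCK_HZ)
print("latency per token (s):", NUM_STAGES / CLOCK_HZ)
```

Note the throughput figure is an aggregate. Because each stream's next token depends on its previous one, a single user can only have one token in flight, so a per-user rate under these assumptions would be closer to `CLOCK_HZ / NUM_STAGES`. Keeping the pipeline full is where the "millions of users on one piece of silicon" part comes in.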
Well I've gotten one of those "holy fuck this is the future" deeply unsettled anxious feelings in my gut again. It's been a week or 2, it was time.
They're pretty supply constrained right now though and their production costs seem prohibitive.
The interesting players at the moment are from Toronto: taalas (print the model onto the silicon) and tenstorrent (dataflow programming based hardware)
I suspect for equal performance, that's probably a 5x increase in silicon area (and therefore cost).
I've been wondering about that for a while now. For a lot of tasks putting weights in ROM is probably OK. OTOH:
>> There would be 1 multiplier per weight...
I'm not sure that is a good idea. Maybe if it's quantized down to 2 bits... Otherwise maybe a small ROM near each multiplier (or row of them or whatever) so the multipliers could handle N distinct matrix operations without having to move the data from far away.
Another fun thought is to have a row of MAC units on DRAM so a DRAM row would be a vector. Row size might be 64Kbit or 8K weights if they're 8bit. This also keeps the weights and calcs on the same chip. I'm not sure this would put enough multipliers on one chip though. Systolic arrays can have tens or hundreds of thousands each doing one op per clock cycle.
Nonetheless, yes, there are already implemented solutions for small NNs (I understand mostly acting as triggers).
Not really that: you are pointing to Compute-In-Memory (CIM) - techniques where the data (here, a multiplier value) is part of the processor (here, the multiplying circuit).
The problem of "fetch and process" is bypassed completely architecturally: the data is there where the processing happens - it's not moved, there is no latency.
Brain science people “love” traumatic brain injury cases because it can help explore what happens when bits of the “brain wafer” get damaged. We’ve learned a lot from such things.
I wonder if people are intentionally “destroying” parts of the model weights to learn more about what happens? Like could you strategically wipe a gig of the model so it’s “all zeros” and see what happens?
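The mechanics of that kind of experiment are easy to sketch. The Python below zeroes out a fraction of the weights in a single random linear layer and measures how far the output moves. Big caveat: this is a random matrix, not a trained network, so the numbers say nothing about how robust a real LLM is; every name in it (`make_layer`, `ablate`, `relative_error`) is made up for the example.

```python
import random

# Toy ablation experiment on ONE random linear layer (not a trained model).

def make_layer(rows, cols, rng):
    """Random weight matrix as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, x):
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ablate(weights, fraction, rng):
    """Copy of weights with round(fraction * total) randomly chosen entries set to zero."""
    rows, cols = len(weights), len(weights[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(positions)
    n_zero = round(fraction * len(positions))
    damaged = [row[:] for row in weights]
    for r, c in positions[:n_zero]:
        damaged[r][c] = 0.0
    return damaged


def relative_error(reference, other):
    """Euclidean distance between outputs, relative to the reference norm."""
    num = sum((a - b) ** 2 for a, b in zip(reference, other)) ** 0.5
    den = sum(a ** 2 for a in reference) ** 0.5
    return num / den


rng = random.Random(0)
W = make_layer(64, 64, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
y = matvec(W, x)

for fraction in (0.0, 0.01, 0.1, 0.5, 1.0):
    y_damaged = matvec(ablate(W, fraction, random.Random(1)), x)
    print(fraction, round(relative_error(y, y_damaged), 3))
```

On a real model you would swap the relative error for a task metric (perplexity, benchmark accuracy) and choose which weights to wipe by layer or by component rather than at random.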
I have to wonder
Anthropic published an important work around one year and a half ago.
> #Tracing the thoughts of a large language model#
https://www.anthropic.com/research/tracing-thoughts-language...
https://news.ycombinator.com/item?id=43495617 (27 March 2025)
What's everyone think of Taalas?
They're actually burning the LLM model into the silicon, with some onboard memory for fine-tuning. They claim huge cost / latency wins.
Super fast demo live at: https://chatjimmy.ai/
https://www.reddit.com/r/singularity/comments/1r9frzk/taalas...
Well if you are exclusively using GPUs that are general purpose, of course you leave so much efficiency on the table. That’s why Google started making TPUs more than a decade ago. I remember that kerfuffle when Google fired Timnit Gebru when Gebru’s paper used GPUs to calculate the environment impact of LLMs while ignoring the efficiency of TPUs; this basically made Jeff Dean very angry due to that wide efficiency gap.
The real efficiency win in these chips is that they are made for inference only. You can throw away the vast majority of a chip if you only need a few ops, a single precision (like INT8 or FP8) and don't need ultra fast interconnects.
Google’s internal review blocked it from publication. Stated reasons were about paper quality. You can speculate whether that was the real reason.
Gebru issued an ultimatum email and said she would resign if some list of conditions weren’t met.
Google said “thanks, we accept your resignation”.
She claims it is retaliation, but it seems more like an own-goal if you ask me. She basically handed Google the solution to their problem.
Practical lesson: don’t tell your employer you might quit before you’re ok with leaving.
It really depends on the pricepoint at which they can get a board. If they can do a ~32B model for 1k$ and a size of an external HDD, I'd buy one now, even knowing that it won't be upgradeable / the model remains fixed. The speeds they've shown are a quality of its own, and there's plenty you can do with such a model and faster than instant responses.
If you consider the places you could deploy it -- with no network access, and at those high speeds... very useful .. for adding vague "common sense" fuzzy thinking to all kinds of applications that right now piss consumers off with poor UX. Esp if the model can do voice-to-text and text-to-speech well (some of the smaller models can)
The state-of-the-art models aren't at "can fully replace knowledge worker" levels yet and I doubt they'll get there any time soon, so charging $2000 / month for access isn't going to happen. Right now everyone and their dog is being handed subsidized credits to play with AI, but the actual outcome is rarely good enough to be worth the money they'd need to charge for it. It might very well take another order of magnitude or two to get LLMs to be truly good (if it is even possible at all), and considering how much money is already being pumped into it I just don't see that happening.
On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry, or playing phone decision tree. There's a lot of money in making chatbot assistants actually useful, or in augmenting website search. Turning it into a glorified "language-to-API-call" translator doesn't take a lot of smarts, but as long as it's cheap you can make a killing in volume.
This is a lane I’ve been experimenting in: seeing what I can get out of models that work in 16GB VRAM for simple tasks (screen scraping, decision tree navigation, natural language queries). It’s interesting for sure (certainly reveals non-deterministic limits) and promising for low criticality review-opportunity tasks, but I also feel like I need better sources/community for understanding and reflection. Preferably those that aren’t hype channels. Any pointers?
I understood it as a proof-of-concept, not a for-mass-production single blueprint - i.e.: "if you need your NN in a CIM form on ASIC, we can do it".
Their next proof-of-concept was said to be meant to be about size: "we showed you we can do it with 8b, now we are working to show you we can do 24b or 32b". Then, "and we plan to go bigger and faster".
> Our second model, still based on Taalas’ first-generation silicon platform (HC1), will be a mid-sized reasoning LLM. It is expected in our labs this spring and will be integrated into our inference service shortly thereafter. // Following this, a frontier LLM will be fabricated using our second-generation silicon platform (HC2). HC2 offers considerably higher density and even faster execution. Deployment is planned for winter (19 Feb 2006)
8B models aren't useful in general, but for specific use cases they can provide an enormous amount of intelligence - nVidia's Tesla/Waymo competitor is a 7B LLM with a 2B diffusion model, and running that at those speeds could be an order of magnitude cheaper than existing solutions.
I assert like 80% of this “multi agent parallel workflow” business is simply a workaround to models being soooooo slow. Like as the dude driving these things… you kick it off and twiddle your thumbs waiting minutes to hours sometimes for all the inference and token generation to finish. So you dispatch multiple workstreams in parallel to be more efficient.
I assert that if the model was even 10x faster we’d be using these things radically different. You’d be doing things that are currently time prohibitive. At 100x, holy shit will software dev get crazy. You’d be kicking off hundreds of parallel workers attacking a problem from every angle and stuff. Who even knows!!!
And the thing is, 10x will absolutely come and probably even 100x. And it will be sold like a video game cartridge or something depending on how the actual model gets “baked” into the hardware. No remote inference at all.
My understanding is that robotics doesn't really rely much on LLM's in the first place but rather other things.
Is the thing that you are suggesting that it would ingest all real time data and then reason through it at an incredibly fast speed and then act on it and re-iterate? I might imagine some problems with this though I am not a robotics engineer and perhaps someone who deeply understands this topic can give more information.
They tend to collapse into nonsense and hallucinations pretty quickly if you move slightly out of the envelope of the current visual reasoning benchmaxxing.
I'd say virtually all robots you've seen in the real world today rely on classical approaches - you build a rudimentary map, then use classical algorithms to find paths/do area coverage. The robots do not reason or understand what they're looking for, they're more like in-game units. At most there's some bounded, lightweight image classification going on.
LLMs can understand and reason about the world natively. nVidia has a Tesla FSD/Waymo competitor which is simply their 7B reasoning LLM, but instead of outputting tokens directly, its outputs are fed to a 2B diffusion model that outputs a 1.6-second-long trajectory for the car, and this is enough for an L2 system. But to make this work, they need the model to run at 10Hz, so they use super high-end hardware to do it (Jetson Thor) and the car is still "blind" for 100ms at a time (they have a parallel classical safety system).
With on-chip LLMs you could run this loop at like 100Hz on a chip that costs a few hundred bucks, rather than 10Hz on a board that costs several thousand.
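Quick back-of-envelope on what the loop rate buys you, as a Python sketch. The 10 Hz and 100 Hz figures are from the comment above; the 100 km/h speed is just an assumed example, and the function names are made up.

```python
# How long the planner goes between updates, and how far the car travels
# in that time. The 100 km/h speed is an assumption for illustration.

def blind_interval_ms(loop_hz):
    """Time between consecutive planner updates, in milliseconds."""
    return 1000.0 / loop_hz


def distance_per_update_m(speed_kmh, loop_hz):
    """Metres travelled between two updates at a constant speed."""
    speed_m_per_s = speed_kmh * 1000.0 / 3600.0
    return speed_m_per_s / loop_hz


for hz in (10, 100):
    print(hz, "Hz:", blind_interval_ms(hz), "ms,",
          round(distance_per_update_m(100, hz), 2), "m at 100 km/h")
```

So at highway speed the gap between updates shrinks from a few metres of travel to a few tenths of a metre, which is the practical difference between the two loop rates.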
The hyperscalers like AWS will make great use of these to serve up models that will be relevant for several years. But right now, we're still seeing significant bumps in model quality every couple of months - especially with open-weight models like Deepseek/Kimi/GLM.
Until that point, though, I don't see how this is ever going to be cost effective vs general purpose hardware.
I also think we'll see miniature versions of this baked into mobile hardware for super fast and efficient on-device LLMs.
1. If LLMs keep improving, burning models onto silicon becomes obsolete too fast and is not worth doing. Outcome: We keep getting better LLMs. 2. If LLM improvements slow down, they will be burned onto silicon. Outcome: We get faster, cheaper and energy-efficient LLMs.
Either way sounds great to me. It will certainly be a mix so we can even get both.
However, based off first impressions, it seems like this is meant for inference side, and not training, which is also an interesting choice.
Nvidia is king of general purpose training chips. But inference can be specialized.
Yes? That’s why more money will be spent on inference than training?
I’m talking absolute cost. As the number of people using AI and burning tokens goes up the amount of spend on inference goes up.
I am fairly confident that Anthropic has way way more GPUs serving Claude Code to users than they have training models. They’ve got a lot of users!!
> API price is becoming more important than SOTA capability.
Also yes? This is why custom silicon for efficient inference makes sense!
I think we’re in total agreement here :)
We're starting to see what really matters here, and though this is hand wavy the TPU makes similar claims.
I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like 60's to 90's IBM, DEC, Cray, Sun and the hardware race that happened then. History doesn't repeat but it often rhymes and I suspect that these efforts will follow the same trajectory.
https://www.computerhistory.org/storageengine/first-commerci...
Compare that to a multi-terabyte ssd. Now apply that improvement to how an LLM is architected and run now. With AI assisting, it won't be long before a leap occurs and these data centers with all their current ultra-cutting edge Nvidia cards are nearly obsolete overnight.
But if you have such a breakthrough could you not also apply it and run 200T models on todays datacenters?
"The future" being "whenever training and inference at increased scale becomes economical". Which is probably bounded by new generations of hardware, but might also be pushed forward by algorithmic advances.
The likes of Mythos show that the scaling laws are real, and you can x5/x2 the total/active params and get meaningful gains. If "inference per param" gets cheaper? Up the params and get more intelligence for the same price.
The IBM 350 was commercialized 70 years ago; it took 70 years for someone like you to be able to compare that to a multi-TB SSD.
Furthermore, nothing says that Moore's Law will necessarily apply to LLMs, for decades to come.
I think there will be specialized hardware (besides GPUs) that would be custom made for LLMs. Yes, TPUs exist, but mainly for datacenters. GPUs exist, but they are adapted from what were mainly graphics applications. Once all the demand from data centers dries up, innovation will kick in.
it will build expertise/infra/know-how foundation for next generation of hardware
So far, the accelerator is showing cost savings of roughly 50% compared with typical AI graphics processing units, Broadcom Chief Executive Officer Hock Tan said in an interview. - [0]
50% cost saving. The picture changes so quickly, and there is still so much low-hanging fruit, that I find any discussion about whether a vendor has moats, or whether they can recoup their investment, moot and futile.
[0] - https://www.bloomberg.com/news/articles/2026-06-24/openai-an...
At $0.07/kWh, that costs $70,000 every hour in just electricity. $1.7 million /day. $613 million /year.
I had claude estimate the GPU cost of such a deployment:
> To get racks per GW: a full NVL72 rack draws roughly 130-132 kW under full load. If a 1 GW facility runs ~715 MW of IT power (after a ~1.4 PUE for cooling), that's on the order of 4,000–4,500 racks. At $3.4M of compute hardware each, the GPU-system cost lands around $14–15 billion.
15 billion / 613 million / year = ~24.5 years til electricity costs catch up to the GPUs. Obviously electricity isn't 100% of OpEx, but I'd expect it to be the majority for AI deployments.
Regardless, if you can cut the $613 million/yr in half that's still massive savings.
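The arithmetic above is easy to reproduce; here it is as a small Python sketch. It assumes a constant 1 GW draw (that is what $70,000 per hour at $0.07/kWh works out to), runs 24/7, and takes the $15 billion hardware figure from the quoted estimate at face value. The function names are made up.

```python
# Electricity cost for a facility at constant draw, and how many years of
# electricity it takes to equal a given hardware bill.
# Assumptions: 1 GW constant draw, $0.07/kWh, $15B hardware (quoted estimate).

def electricity_cost(facility_kw, price_per_kwh):
    """Return (per hour, per day, per year) electricity cost in dollars."""
    per_hour = facility_kw * price_per_kwh
    per_day = per_hour * 24
    per_year = per_day * 365
    return per_hour, per_day, per_year


def years_until_electricity_matches(hardware_cost, yearly_electricity):
    """Years of electricity spend needed to equal the hardware cost."""
    return hardware_cost / yearly_electricity


per_hour, per_day, per_year = electricity_cost(facility_kw=1_000_000, price_per_kwh=0.07)
years = years_until_electricity_matches(15e9, per_year)

print(f"${per_hour:,.0f}/hour  ${per_day:,.0f}/day  ${per_year:,.0f}/year")
print(f"{years:.1f} years of electricity to match $15B of hardware")
print(f"halving the power bill saves about ${per_year / 2:,.0f}/year")
```

Change `price_per_kwh` or the hardware figure to see how sensitive the payback estimate is to either input.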
So after the IPO and will be featured heavily in the IPO sales brochure as a future promise?
I'm sceptical over any pre-IPO announcements.
IIRC their biggest cost they're "hiding" in their financials by doing creative accounting is inference (putting it into marketing and whatnot, in the billions)... if they can't hide it in their S-1 then they have to rationalize it, either by a) increasing the prices (not gonna happen, with token based billing orgs are already watching their codex spends) or b) lowering the inference costs. You can lower that by "soft optimizing" (dumbing down) your models but then you have the other players breathing down your neck (see quick rise of Claude), or actually optimizing, in software and in hardware. We're like 5 years into the rise of LLMs, there's not THAT much left on the table unless you write to the metal you specifically designed for your models (and I'm pretty sure the lack of "nvidia tax" would help with covering most of the r&d costs of a custom solution, at least in the long term).
50% cheaper inference without losses in fidelity would unquestionably be a massive win for OpenAI.
No, the nonprofit org stays nonprofit, while the for-profit org it owns will become publicly traded.
Does anybody actually believe that?
would be very interesting to see any papers/data around this
Update: Somebody on Twitter said it's going to be hosted 50/50 at Microsoft and Oracle.