1. If the benchmarks are just testing the ability to pull the answers from history, then something is clearly wrong with the benchmark.
2. If that's even a possibility, then it lowers confidence in the tool's ability to handle the vast majority of problems where you don't already have the answer written down.
3. That's not the customer's problem to solve on behalf of the vendor.
The test environment contains the answers to the questions.
It's perfectly reasonable to expect a level of performance concordant with the marketing of these tools. Claiming this is superintelligence, while also excusing its poor performance, is dishonest and amounts to false advertising.
Turns out the test shouldn't have the answers included in it?
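If "the answers" live in the task repo's git history (as in SWE-bench-style setups, which is an assumption on my part), a harness could at least detect the leak before scoring. A minimal sketch; the repo path, patch snippet, and function name are hypothetical:

```python
# Sketch: pre-flight check that a task repo's git history does not leak
# the gold patch. Assumes a SWE-bench-style eval where the real fix
# exists as a later commit; names and paths here are hypothetical.
import subprocess

def history_leaks_answer(repo_dir: str, patch_snippet: str) -> bool:
    """True if any commit on any ref adds or removes the patch text."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--all", "--oneline", "-S", patch_snippet],
        capture_output=True, text=True, check=True,
    )
    # `git log -S` (the pickaxe) lists commits that change the occurrence
    # count of the string; any hit means the agent can recover the answer
    # from history instead of solving the task.
    return bool(result.stdout.strip())

if history_leaks_answer("./task_repo", "def fixed_function"):
    print("Leak: gold patch reachable via git history; prune refs before eval.")
```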
https://www.oracle.com/news/announcement/blog/oracle-cloud-c...
Wall Street is currently punishing hard any company that misses its quarter; it even punished NVIDIA after NVIDIA beat its quarter.
Oracle had an earnings miss in the current quarter!
Their current REALITY is ~$15B in quarterly revenue (with cloud infra at ~$3B) and only ~$12B in near-term deferred backlog, and deferred backlog is NOT revenue. To justify the valuation, OCI would have to go from ~$18B in FY26 to ~$140B by FY30. That is an insane promise of +$120B in 4 years, back-loaded into year 3 or year 4. :-))
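For perspective, here is the growth rate those numbers imply (a quick sanity check; the ~$18B and ~$140B endpoints are the estimates above, not confirmed guidance):

```python
# Implied compound annual growth rate for OCI under the figures above:
# ~$18B in FY26 growing to ~$140B by FY30, i.e. 4 years of compounding.
# Both endpoints are the comment's own estimates, not audited numbers.
start_b, end_b, years = 18.0, 140.0, 4

cagr = (end_b / start_b) ** (1 / years) - 1
print(f"Implied OCI CAGR: {cagr:.1%}")  # ~67% per year, every year

# Evenly compounded path; a back-loaded path would be steeper at the end.
for y in range(1, years + 1):
    print(f"FY{26 + y}: ~${start_b * (1 + cagr) ** y:.0f}B")
```

Sustaining roughly 67% annual growth at that revenue scale is exactly what makes the story back-loaded.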
Capex needs to be ~$35B next year just to chase GPUs and power, and if they miss one quarter, the story implodes. The supposedly rational, efficient market is paying nearly $1T today for back-loaded hopes.
This is complete bubble math. As if anybody, including Oracle AND their customers, has ANY idea of their capex in 4 years.
Complete and total bubble.
How can we ever perform this sort of faux-neutral agentic evaluation in an environment where we want agents to have access to the sum total of knowledge (which will necessarily include being able to learn about the evaluation being conducted and its expectations)?