Posted by HellsMaddy 1 day ago

Claude Opus 4.6(www.anthropic.com)
2229 points | 965 comments | page 13
1970-01-01 19 hours ago|
Here's one I've been using for a while. The 'smarter' LLMs will overconfidently spit out 7. The dumber ones ask for more info. Opus 4.6 fails.

     A round drink coaster with a diameter of 9 sits between a beer glass and a wood table. The glass has a wall thickness of 1. What is the inner diameter of the glass?
raincole 19 hours ago||
Gemini 3 Pro gives me this:

> Based on the information provided, it is impossible to determine the inner diameter of the glass.

Here is why:

The Coaster Dimension: Knowing the coaster has a diameter of 9 only tells us the size of the object under the glass. It does not mean the glass is the same size; the glass could be smaller (fitting entirely on the coaster) or larger (overhanging the coaster).

The Glass Dimensions: While you provided the wall thickness (1), you did not provide the outer diameter of the glass. To find the inner diameter, the formula is:

Inner Diameter = Outer Diameter - (2 × Wall Thickness)

Without knowing the Outer Diameter of the glass, the calculation cannot be completed.
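The formula quoted above can be sketched in a few lines of Python. This is only an illustration: the function name `inner_diameter` and the choice to return `None` when the outer diameter is unknown are assumptions made for this sketch, not part of any model's answer.

```python
from typing import Optional


def inner_diameter(outer_diameter: Optional[float],
                   wall_thickness: float) -> Optional[float]:
    """Inner Diameter = Outer Diameter - (2 x Wall Thickness).

    Returns None when the outer diameter is not given, because the
    problem is then underdetermined (hypothetical convention for this sketch).
    """
    if outer_diameter is None:
        return None
    inner = outer_diameter - 2 * wall_thickness
    if inner <= 0:
        raise ValueError("walls are too thick for this outer diameter")
    return inner


# The puzzle as stated: the coaster diameter (9) is known, the glass's is not.
as_stated = inner_diameter(None, 1)

# Only after *assuming* the glass is exactly as wide as the coaster:
with_assumption = inner_diameter(9, 1)

print(as_stated, with_assumption)
```

The answer of 7 only appears in the second call, where the coaster's diameter is passed in as if it were the glass's outer diameter.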

raincole 19 hours ago||
GPT 5.2 Chat:

> There isn’t enough information to determine the inner diameter *for sure*.

All we’re told is:
- The coaster has a diameter of 9.
- The glass sits on the coaster.
- The glass wall thickness is 1.

Nothing explicitly states that the *outer diameter of the glass equals the coaster’s diameter*. The glass could be smaller than the coaster, which is very common.

*However*, if we make the usual implicit assumption in this kind of problem—that the coaster’s diameter matches the *outer diameter of the glass base*—then:

- Outer diameter of glass = 9
- Wall thickness = 1 on each side

So: \[ \text{Inner diameter} = 9 - 2(1) = 7 \]

*Inner diameter = 7 (same units as the coaster)*

Without that assumption, the problem is underdetermined.

tototrains 18 hours ago||
Opus 4.6 Extended thinking:

```

Finding the Inner Diameter

The coaster sits beneath the glass on the table, meaning the glass rests on top of it. Assuming the coaster matches the outer diameter of the glass base:

Outer diameter of glass = diameter of coaster = 9
Wall thickness = 1 (on each side, so subtract twice)

d_{\text{inner}} = 9 - 2(1) = 7

The inner diameter of the glass is 7.

```

Makes its assumption clear, seems reasonable?

1970-01-01 18 hours ago||
Assumptions need to be stated or you're solving only a discrete part of the problem! Try this, see if you get another deadpan assumption.

     A solar system has 3 planets in concentric orbit. PlanetZ is the farthest with an orbit diameter of 9. PlanetY has an orbit diameter one greater than PlanetX. What is the orbit diameter of PlanetX?
mikalauskas 18 hours ago|||
Minimax M2.1:

The inner diameter of the glass is *7*.

Here's the reasoning:
- The coaster (diameter 9) sits between the glass and table, meaning the glass sits directly on the coaster
- This means the *outer diameter of the glass equals the coaster diameter = 9*
- The glass has a wall thickness of 1 on each side
- *Inner diameter = Outer diameter - 2 × wall thickness*
- Inner diameter = 9 - 2(1) = 9 - 2 = *7*

nopinsight 6 hours ago|
Some of Opus 4.6's standout results for me:

* GDPVal Elo: 1606 vs. GPT-5.2's 1462. OpenAI reported that GPT-5.2 has a 70.9% win-or-tie rate against human professionals. (https://openai.com/index/gdpval/) Based on Elo math, we can estimate Opus 4.6's win-or-tie rate against human pros at 85–88%.

* OSWorld: 72.7%, matching human performance at ~72.4% (https://os-world.github.io/). Since the human subjects were CS students and professionals, they were likely at least as competent as the average knowledge worker. The original OSWorld benchmark is somewhat noisy, but even if the model remains somewhat inferior to humans, it is only a matter of time before it catches up with or surpasses them.

* BrowseComp: At 84%, it is approaching human intersubject agreement of ~86% (https://openai.com/index/browsecomp/).
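For the GDPVal bullet, here is a rough sketch of the "Elo math". It assumes the standard 400-point logistic Elo curve and treats the reported win-or-tie rate as an Elo expected score (strictly, Elo counts a tie as half a win), so take the result as a ballpark. The helper names are made up for this sketch.

```python
import math


def elo_expected_score(rating_gap: float) -> float:
    """Expected score for the side that is `rating_gap` points ahead."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def elo_gap_from_score(score: float) -> float:
    """Inverse of elo_expected_score: rating gap implied by a score."""
    return 400.0 * math.log10(score / (1.0 - score))


# Figures quoted in the list above.
gpt_vs_human = 0.709          # GPT-5.2 win-or-tie rate against professionals
opus_elo, gpt_elo = 1606, 1462

# Assumption: the 70.9% rate pins down where humans sit relative to GPT-5.2.
gpt_gap_over_human = elo_gap_from_score(gpt_vs_human)
opus_gap_over_human = gpt_gap_over_human + (opus_elo - gpt_elo)

estimate = elo_expected_score(opus_gap_over_human)
print(f"Estimated Opus 4.6 rate vs. human pros: {estimate:.1%}")
```

Under these assumptions the estimate comes out at roughly 85%, the low end of the 85–88% range; a different treatment of ties would shift it.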

Taken together, this suggests that digital knowledge work will be transformed quite soon, possibly drastically if agent reliability improves beyond a certain threshold.

rishabhaiover 6 hours ago|
Agreed. These metrics, plus my own experience, point to reliable intelligence across sustained use. Going forward, if context windows get bigger and token prices drop, I have a hard time seeing why your argument would be wrong.