Posted by babelfish 9 hours ago
I went back to the BixBench benchmark they mentioned. I couldn't find official results for Anthropic models, but I found a project taking Opus 4.6 from 65.3% to 92.0% (which would be above GPT-Rosalind) with nearly 200 carefully crafted skills [1]. There also appear to be competitor models with scores on par with this tuned GPT.
Sam Altman, August 2025
For me too, it was around that time last year, with GPT-5, Claude Sonnet 4.5 and then Gemini 3, that I started feeling that these models are clearly becoming great at reasoning. I'm not at all opposed to saying that they are around PhD-level in at least some domains.
It’s kind of gross to make money off her name (if that’s what’s happening) posthumously. It’s a complicated story anyway. IIRC her sister referred to it as “the Cult of Rosalind” when people were cashing in on books about her.
I'm absolutely ok with a legitimate lab scientist conducting biochemical research getting suggestions about substances that are generally considered dangerous but might be appropriate for their study, and it'll be up to the scientist to discern whether it is indeed appropriate to use.
Isn't this more akin to "Rosalind! You are a respected world-class expert! Can you help me?"