Posted by abnry 6 days ago
Scientific Estimate: 71,650 words
"Unbelievable. Are you actually Stephen Fry in disguise?"
Core Basics: 16/20
Intermediate: 15/20
Advanced: 19/20
Expert: 18/20
Grandmaster: 16/20
This is significant beyond this particular app, because biases like this are found all over the place in popular LLM benchmarks.
It would have paired well with an exposition of vanilla Monte Carlo and the benefits of stratified sampling.
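A minimal sketch of that comparison in Python, assuming a hypothetical 200,000-word dictionary split into five frequency bands. The band sizes and known-word counts are made up for illustration, not taken from the app. Vanilla Monte Carlo draws test words uniformly from the whole dictionary; stratified sampling draws a fixed number from each band and scales each band's hit rate by the band's size:

```python
import random
import statistics

# Hypothetical frequency bands (strata), most common words first. The sizes and
# the number of words the test-taker knows in each band are made-up numbers.
BAND_SIZES = [2_000, 8_000, 20_000, 50_000, 120_000]
KNOWN_PER_BAND = [1_980, 7_200, 12_000, 10_000, 2_400]
TOTAL_WORDS = sum(BAND_SIZES)
TRUE_VOCAB = sum(KNOWN_PER_BAND)  # 33,580 words

# Flat dictionary of "does the user know word i?"; in each band the first
# KNOWN_PER_BAND[h] words are the known ones.
DICTIONARY = [i < known for size, known in zip(BAND_SIZES, KNOWN_PER_BAND)
              for i in range(size)]


def vanilla_estimate(n, rng):
    """Plain Monte Carlo: n words drawn uniformly from the whole dictionary."""
    hits = sum(DICTIONARY[i] for i in rng.sample(range(TOTAL_WORDS), n))
    return TOTAL_WORDS * hits / n


def stratified_estimate(per_band, rng):
    """Stratified sampling: per_band[h] words from band h, scaled by band size."""
    total = 0.0
    for size, known, n_h in zip(BAND_SIZES, KNOWN_PER_BAND, per_band):
        hits = sum(1 for i in rng.sample(range(size), n_h) if i < known)
        total += size * hits / n_h
    return total


def spread(estimator, trials=2_000, seed=0):
    """Mean and standard deviation of an estimator over repeated simulated quizzes."""
    rng = random.Random(seed)
    estimates = [estimator(rng) for _ in range(trials)]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Same budget of 100 questions; stratified uses proportional allocation.
    for name, est in [
        ("vanilla", lambda rng: vanilla_estimate(100, rng)),
        ("stratified", lambda rng: stratified_estimate([1, 4, 10, 25, 60], rng)),
    ]:
        mean, sd = spread(est)
        print(f"{name:>10}: mean {mean:,.0f}, sd {sd:,.0f} (true {TRUE_VOCAB:,})")
```

Both estimators are unbiased here. With proportional allocation the stratified one drops the between-band part of the variance, so its spread across repeated quizzes should come out noticeably smaller. A fixed 20 questions per level, as the scores above suggest, is an equal allocation instead, and how much it gains depends on the bands.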
Although stratified sampling is good, one can do better here with adaptive sampling: keep a running (Bayesian) estimate of the user's vocabulary and pick each question to maximize information gain, preferentially sampling from the strata whose current stratum-specific estimate has the highest variance.
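One way to make that concrete, as a sketch under stated assumptions: give each frequency band a Beta(1, 1) prior on the fraction of its words the user knows, and before each question pick the band whose next answer is expected to shrink the posterior variance of the total the most. By the law of total variance, one more answer in band h lowers Var(p_h) by Var(p_h)/(a_h + b_h + 1) in expectation, so each band is scored by N_h² · Var(p_h)/(a_h + b_h + 1). This variance-reduction rule is a stand-in for "information gain", and the `answer` oracle, band sizes and true fractions are all hypothetical:

```python
import random


def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    s = a + b
    return a * b / (s * s * (s + 1))


def adaptive_estimate(band_sizes, answer, n_questions, rng):
    """Greedy adaptive sampling over hypothetical frequency bands.

    answer(h, rng) is a hypothetical oracle: it quizzes the user on one random
    word from band h and returns True if the user knows it.
    Returns (posterior mean, posterior sd, questions asked per band).
    """
    k = len(band_sizes)
    alpha, beta = [1.0] * k, [1.0] * k  # Beta(1, 1) prior per band
    asked = [0] * k
    for _ in range(n_questions):
        # Expected drop in Var(total) from one more answer in band j:
        # N_j^2 * Var(p_j) / (alpha_j + beta_j + 1), by the law of total variance.
        h = max(range(k), key=lambda j: band_sizes[j] ** 2
                * beta_var(alpha[j], beta[j]) / (alpha[j] + beta[j] + 1))
        if answer(h, rng):
            alpha[h] += 1
        else:
            beta[h] += 1
        asked[h] += 1
    # Bands are treated as independent, so means and variances add.
    mean = sum(n * a / (a + b) for n, a, b in zip(band_sizes, alpha, beta))
    var = sum(n * n * beta_var(a, b) for n, a, b in zip(band_sizes, alpha, beta))
    return mean, var ** 0.5, asked


if __name__ == "__main__":
    BAND_SIZES = [2_000, 8_000, 20_000, 50_000, 120_000]  # hypothetical bands
    TRUE_FRACTION = [0.99, 0.9, 0.6, 0.2, 0.02]  # hypothetical user
    oracle = lambda h, rng: rng.random() < TRUE_FRACTION[h]
    mean, sd, asked = adaptive_estimate(BAND_SIZES, oracle, 100, random.Random(1))
    print(f"estimate {mean:,.0f} ± {sd:,.0f} words; questions per band: {asked}")
```

Because the score is weighted by N_h², the large, uncertain rare-word bands soak up most of the questions, while a small band the user almost certainly knows may not be asked at all. The rule is myopic (one question ahead), which is a simplification rather than a full information-gain optimization.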
Vibe coders should be forced to spend one day learning basic CSS before they're allowed to use an LLM to build a website; the internet would be a lot more pleasant as we move forward with slopification. It doesn't have to be sloppy, and it doesn't take much studying to at least steer an LLM in the right direction toward something that looks nice. At this point everything is the same three colors and a centered flex column with weird spacing.