Posted by abnry 6 days ago
And one of them prevented me from a perfect score, when I guessed wrong.
On the second run-through there was significant overlap. Maybe 30 or 40% of the words were from the first run-through.
Good news for the project is that I think you can easily tweak the LLM to generate better alternatives.
I got 89/100, which extrapolates to 72,700. As a non-native speaker, I'm quite happy with that.
It's annoying that you need to click 3 times per question, and the buttons are in 2 different places.
Maybe it would be better to just let me click the answer I want and then instantly show me the next question?
Also who is Sandi?
No offence meant to anyone, but the whole exercise feels very QI: superficial 'understanding' of a large range of things (for example words) without much of a connection between these words.
This approach could also work for getting more accurate results:
1. Show word without any definitions
2. User clicks "I know" or "I don't know"
3. If user clicked "I know", show actual definition of word
4. User selects "I was correct" or "I was not correct"
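The four steps above could be sketched roughly like this. It is only a minimal illustration, not how the quiz actually works: `run_self_check`, `Tally`, and the two callbacks `knows` and `confirms` are hypothetical names standing in for the "I know" / "I was correct" clicks.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    claimed_known: int = 0    # user clicked "I know" (step 2)
    confirmed_known: int = 0  # user then clicked "I was correct" (step 4)


def run_self_check(words, knows, confirms):
    """Run the self-report flow over a dict of word -> definition.

    knows(word) -> bool             stands in for step 2 (hypothetical callback)
    confirms(word, definition) -> bool  stands in for step 4 (hypothetical callback)
    """
    tally = Tally()
    for word, definition in words.items():
        # Step 1: the word is shown with no definitions attached.
        if not knows(word):
            continue  # "I don't know": nothing more to ask for this word
        tally.claimed_known += 1
        # Step 3: the real definition is revealed, then the user self-grades.
        if confirms(word, definition):
            tally.confirmed_known += 1
    return tally
```

The gap between `claimed_known` and `confirmed_known` would give a rough idea of how often people overestimate what they know, though it still relies on honest self-grading.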
I scored 71,000.
First try I noticed it in about the middle of the quiz and got ~65k. Second try I selected only the longest and got ~78k.
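A quick way to check for this kind of length bias is to measure how often "always pick the longest option" lands on the right answer. The sketch below assumes the questions are available as (options, correct_index) pairs, which is a made-up data layout and not the quiz's real format.

```python
def longest_option_accuracy(questions):
    """Fraction of questions where the longest option is the correct one.

    questions: list of (options, correct_index) pairs, where options is a
    list of answer strings. This layout is an assumption for illustration.
    """
    if not questions:
        return 0.0
    hits = 0
    for options, correct_index in questions:
        # Index of the longest option; ties go to the earliest one.
        longest_index = max(range(len(options)), key=lambda i: len(options[i]))
        if longest_index == correct_index:
            hits += 1
    return hits / len(questions)
```

If the result is well above 1 divided by the number of options, the answer lengths are leaking information and the strategy beats random guessing.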
I wonder if the test is calibrated to the fact that some answers are just well guessed? I am not a native English speaker, but I speak 3 languages overall and have basic notions in Latin, and I have to admit it helped a lot in "deciphering" a few words that I didn't know at all. And in at least 2 cases I just guessed correctly.
But to be honest many that might catch out a native speaker are just the Spanish/French/Latin word, so it was too easy in a way.
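On the question of lucky guesses: one standard way tests account for them is the classic "correction for guessing" (formula scoring), which subtracts a penalty for each wrong answer. I have no idea whether this quiz does anything like it, and the number of options per question used below (4) is purely an assumption.

```python
def corrected_score(right, wrong, num_options):
    """Classic correction for guessing: right - wrong / (num_options - 1).

    Assumes every wrong answer came from a blind guess among num_options
    equally likely choices, so each wrong answer implies some lucky hits.
    """
    if num_options < 2:
        raise ValueError("need at least two options per question")
    return right - wrong / (num_options - 1)


# Example with assumed numbers: 89 right, 11 wrong, 4 options per question.
estimate = corrected_score(89, 11, 4)  # roughly 85.3
```

Educated guesses from Latin or French roots are not blind guesses, so this correction would arguably be too harsh on people in that situation.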
Scientific Estimate: 71,650 words
"Unbelievable. Are you actually Stephen Fry in disguise?"
Core Basics: 16/20
Intermediate: 15/20
Advanced: 19/20
Expert: 18/20
Grandmaster: 16/20
This is significant beyond this particular app, because biases like this are found all over the place in popular LLM benchmarks.