How many of the 170k English words do you know?

Posted by abnry 5 days ago

How many of the 170k English words do you know? (vocabowl-870366514258.us-west1.run.app)

501 points | 554 comments

brianleb 5 days ago|

As others have pointed out, too many clicks per word. I am a sucker for a 'how many words do you know' quiz so I finished anyway. Overall I'm skeptical of the classifications. In broad strokes, the early words are easier and the latter words are more challenging, but the middle is pretty muddied.

Some of the words chosen are rather absurd/inappropriate: breviary (which I got wrong but felt like a vaguely religious word) was characterized as intermediate but I think it's much more obscure and less obvious than that; Hippopotomonstrosesquippedaliophobia was used as a word (I got that wrong as well) - any type of 'phobia' word is really the sort of thing a fourth grader opens up a page in the dictionary and points out, not a word that is used... ever; metamorphosis and kinetic were labeled expert, which I don't agree with (what elementary schooler doesn't learn about the metamorphosis of a caterpillar into a butterfly? what high schooler doesn't learn about kinetic energy?).

Most words were reasonably well defined in a way that most people would understand or recognize. A few words had poor definitions: lethargy ("the state of being lethargic" - obvious); complacent ("smug satisfaction with oneself" - I disagree that complacency is intrinsically smug); magnanimous ("generous toward a rival" - I disagree that a rival must be involved); gauche ("socially awkward" - this is sort of close but the given definition completely misses the idea of being tactless).

They call it scientific and give a hand-wavey formula, but they don't explain how words are stratified in the first place. If stratified sampling is a formally recognized method of doing this, it would be nice to have a link to a real reference. I think I know a lot of words, but I am skeptical of the estimate this app provided (north of 75k).

julianeon 5 days ago||

I'll contest a few of these, which I thought were good.

Breviary: this was, to me, known and not uncommon. It's widely known to Catholics, but also, if you have an interest in medieval art or books, you'd likely know it too. It was one of the main types of books before the invention of the printing press. Think of an image from an illuminated manuscript, 50% chance it's from one.

Hippopotomonstrosesquippedaliophobia: it's not that you're expected to know the whole word, but they're looking for you to recognize components of it and infer the meaning from that. I knew sesquipedalian (sometimes jokingly used in "long word" contexts) so that was easy: but phobia is also easily identifiable, and hippo, from the latin root, I knew was not as obvious as the animal, but probably something like "large" (clue: the Hippodrome). So you could, even knowing only "phobia" and being able to guess "hippo", have a good basis for your choice.

Complacent and gauche: have heard both these uses, I think that's straightforwardly correct. If this was a dictionary that would, at worst, be the 2nd or 3rd definition. No complaints.

Source: I used to place in spelling bees and could've been a contender but I didn't have the discipline to study the dictionary for hours on the weekends, which is the next level.

hatthew 5 days ago|||

To me, hippopotomonstrosesquippedaliophobia feels less like vocabulary and more like trivia

duskwuff 5 days ago||

Yes. It is a word which seems to only be used as an example of a long and obscure word. I have never heard it used expressively, other than as a joke.

tialaramex 5 days ago||

Yes, breviary is the only word in the first 80 that I hadn't ever seen before but I understand that if you're Catholic it's probably not that much weirder than "rosary" or "Eucharist" or whatever which are words I did know, so fine.

In the last batch there were a few words that I was vaguely confident of but a lot more of them seemed like "stunt" words I would never see because every time they'd need defining so why bother.

Also I was assuming it was picking from a huge set, but it seems everybody was shown the same words, so while it's supposedly a "sample" any bias, even if unintended, shows up in the results, if you wanted to be scientific perhaps you'd do this for 1000 words and then sample 100 questions from that for each participant or something.
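The per-participant sampling idea above can be sketched roughly as follows. This is only an illustration, not how the quiz works: the pool contents, the five-band split, the function name `draw_quiz`, and the use of a per-participant seed are all assumptions made for the example.

```python
import random

def draw_quiz(pool_by_band, per_band, seed):
    """Sample `per_band` words without replacement from every band."""
    rng = random.Random(seed)  # one seed per participant: reproducible, but differs between people
    quiz = []
    for band, words in pool_by_band.items():
        quiz.extend((band, word) for word in rng.sample(words, per_band))
    return quiz

# Hypothetical pool: 5 bands x 200 placeholder words = 1000 words in total.
pool = {band: [f"band{band}_word{i}" for i in range(200)] for band in range(1, 6)}

quiz_a = draw_quiz(pool, per_band=20, seed=1)  # participant A sees 100 words
quiz_b = draw_quiz(pool, per_band=20, seed=2)  # participant B sees a different 100
print(len(quiz_a), len(quiz_b))
```

Because each participant gets a different draw, a badly chosen word affects only some of the results instead of skewing every score the same way.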

duskwuff 4 days ago|||

> Also I was assuming it was picking from a huge set, but it seems everybody was shown the same words

I don't think I got "breviary" when I tried. Maybe it's using a decision tree, but everyone's ending up on the same branch by getting most of the words right?

LtWorf 4 days ago|||

In many books from the 1800s, the priest is always described as having his breviary at hand. It's also often featured when priests appear in jokes.

klempner 5 days ago||||

I will say that breviary showed up in "advanced" for me, and was one of only two words below "grandmaster" I missed. In the modern era it is jargon; it's just that the in-group (practitioners of liturgical Christianity) is in the ballpark of a quarter of the English-speaking world.

I'll remark that "if you have interest in [some particular academic pursuit], you'd likely know it" is a pretty decent description of the sort of word that shows up in "grandmaster" tier.

(I have joked that, living in Japan, my English is getting worse faster than my Japanese is getting better, but breviary might well be a concrete example.)

bbor 5 days ago||||

For explicit comparison: kinetic and metamorphosis are ~10x as common as breviary, and 10,000x as common as hippo….

See NGRAMs: https://books.google.com/ngrams/graph?content=Breviary%2CHip...

beojan 5 days ago||||

> So you could, even knowing only "phobia" and being able to guess "hippo", have a good basis for your choice.

Except "hippo-" is from Greek and means "horse".

mpreda 5 days ago||||

> and hippo, from the latin root, I knew was not as obvious as the animal, but probably something like "large" (clue: the Hippodrome)

Well.. Hippos is greek for horse, and Hippopotamus is a "river horse". Same for Hippodrome, a course for horses. And in latin, hypo means small (and not large), as seen in e.g. hypoglycemia.

wazoox 5 days ago|||

Hypo is Greek too, not Latin. "Small" for a Latin radical would be "mini" (from "minus"), as in miniature, minuscule, etc.

mejutoco 5 days ago|||

And I thought in German Nilpferd (horse from the Nile) sounded ridiculous. It is almost the same as the original. TIL

phatskat 4 days ago||||

I was, as was gp, confused by “complacent” as I haven’t typically used it or thought of it to include a smugness and immediately went to the ol’ Google only to find that “smug” appears in Oxford Language’s* definition as well. The key though is “smug or uncritical”, so while smug may not be typical for some it does make sense now that I have that added knowledge.

And iirc “gauche” had more than just “socially awkward” in the correct answer but speeding through it again I didn’t get gauche as a word. That said, something gauche, to me, has always been something glaringly “not ok” in a social sense so again, that tracks. Oxford Language defines it as

> lacking ease or grace; unsophisticated and socially awkward.

Which is closer to the quiz’s definition and again, tracks with my internal thinking of the word’s use.

> Hippopotomonstrosesquippedaliophobia

Was just plain fun - as soon as I saw the “fear of long words” I was like of course that’s it

*I mistakenly put “Merriam Webster” the first time around - while MW doesn’t include the word smug itself, the 1.b definition is simply “self-satisfied”

FireBeyond 4 days ago|||

Ucalegon was perhaps the most ridiculous to me, much more a factor of your knowledge of classical literature than vocabulary.

trebligdivad 5 days ago|||

Yeh, it had 'kerfuffle' as one of the last words but that's very common. Yet it had Zenzizenzizenzic (which I'd never heard of but I think I guessed it right)

It really could do with a summary showing the answers you made and corrections for what you got wrong.

wingmanjd 5 days ago|||

Ha, a Wikipedia article link to Zenzizenzizenzic was on HN earlier today! I don't think I would have gotten that one right otherwise.

https://news.ycombinator.com/item?id=48603664

kurthr 5 days ago|||

Yep, 'panacea' was grandmaster, but 'quire' was intermediate?

mr_toad 4 days ago||

The dictionary they’re using seems to pre-date the printing press.

Muskwalker 5 days ago|||

> complacent ("smug satisfaction with oneself" - I disagree that complacency is intrinsically smug)

I agree that it doesn't seem 'smug', but weirdly both dictionary dot com and Wiktionary give 'smug' as a synonym or part of the definition.

But they also analyze 'smug' as equivalent to self-satisfied or self-complacent, so maybe that's the word whose meaning is not as expected.

(I would think of "smug" less as "self-" anything - it implies a relation, it's more like exulting in a superior situation one has over someone. And 'complacent' is at base being content with one's situation, but often with the negative implication that one should be acting to make things better instead)

n4r9 5 days ago||

In my mind "complacent" means the opposite of "pro-active" - not taking actions or decisions in the face of an issue. That could be because of feeling panicked or uncertain. So I was also surprised by the "smug" and "self-satisfied" parts.

skydhash 4 days ago||

I don’t get the smug. But complacent has always had the “self-satisfied” and “a bit lazy” connotations. Not exactly what is in your comment, but someone who is rooted in his behavior, not out of stubbornness or arrogance, just “it is ok that way because I’m happy and there’s no reason to change”.

montag 5 days ago|||

The test hardly seems adaptive (if at all) and yet it made the HN front page. That’s impressive.

ralferoo 4 days ago||

Of the words people are commenting on here, I only remembered one of them (maybe that's just because I got it wrong), zenzizenzizenzic. I guess if I'd realised zenzi was relating to squaring, I might have guessed it.

I think some of them were flawed - I can't remember what it was now, but for one word two of the meanings were kind of appropriate and I chose the wrong one, and I think there were 2-3 words I didn't know but guessed from the components in the words. At least one I also guessed that way, but got the complete opposite meaning!

I like this kind of test, but for me, the first 2 sections (which I aced) were kind of redundant. Maybe they needed to stratify it more or do it more dynamically, e.g. maybe do half the layer 1 questions, and if you get all them correct, move on to half the layer 2 questions. If you get one wrong, you get the rest of the layer 2 questions, and maybe if you get more than a certain number of those wrong you also have to go back and do the rest of layer 1. If you ace the first half of layer 2 as well as layer 1, maybe you jump straight into layer 3, etc...
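A simplified sketch of that staged idea, for illustration only: each layer asks its first half, and the second half is asked only if something in the first half was missed. The "go back and finish layer 1" rule is left out to keep it short, and the function name, the `answer` callback and the sample words are all hypothetical.

```python
def run_staged_quiz(layers, answer):
    """layers: list of question lists; answer(q) -> bool.

    Returns one (questions_asked, questions_correct) pair per layer.
    """
    results = []
    for questions in layers:
        half = (len(questions) + 1) // 2
        outcomes = [answer(q) for q in questions[:half]]
        if not all(outcomes):
            # At least one miss in the first half: ask the rest of this layer too.
            outcomes += [answer(q) for q in questions[half:]]
        results.append((len(outcomes), sum(outcomes)))
    return results

# Hypothetical participant who knows every word except "quire".
layers = [["cat", "house", "river", "window"], ["frugal", "quire", "zenith", "gauche"]]
known = {"cat", "house", "river", "window", "frugal", "zenith", "gauche"}
print(run_staged_quiz(layers, lambda q: q in known))
```

In this made-up run the first layer is cut short after two questions, while the second layer is asked in full because of the early miss.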

kevin_thibedeau 5 days ago|||

> what high schooler doesn't learn about kinetic energy?

95% of Americans.

buzzerbetrayed 5 days ago||

Sick burn bro.

I can assure you that just about every American that has made it through middle school has been taught about kinetic energy. Let alone high school.

ralferoo 4 days ago|||

Being taught something has little correlation with learning it, and even less with remembering it years later.

Perhaps just because it suits my learning style, I find learning is actually easier if I attempt to work something out or guess it, and then am corrected when wrong, because then I have a memory to anchor it on. If I skip that part and just try to learn some facts, very little is retained. One consequence of this is that I prefer science / logic based subjects to things like history or geography (as in places, etc, not the science parts) where it's just a bunch of arbitrary facts that you can't just guess or work out for yourself.

Antoniocl 5 days ago||||

Oh interesting, is it actually covered as part of the standard compulsory public school curriculum? Genuinely surprised, because here in Canada (Ontario) it's covered as an elective in 11th grade physics, which roughly 15/120 people in my graduating class opted to take.

derektank 5 days ago|||

Each state maintains its own public school curriculum, so generalizing about US education in the first place is a fool’s errand. But certainly in many states, students will take a generic science course covering the basics of Newtonian mechanics, the periodic table, and Mendelian genetics in middle school (roughly ages 12-14) before more specialized courses become available in high school, such as Physics or Biology where these subjects would be covered in greater depth and breadth.

LtWorf 4 days ago|||

I think the generic idea of kinetic energy can be explained much much earlier and to everyone, while how to calculate it can come later on.

naishoya 5 days ago||||

Perhaps they were taught about it, but did they learn it?

Have they retained that knowledge beyond the test at the end of the semester?

Anecdotal observations would imply that they have indeed been taught it, and indeed have failed to retain the concept.

I have no rigorous data regarding either; but the generally poor outcomes which appear as result of a lack of retention of scientific, math, socio-economic, and anthropological instruction do seem self evident both from within and outside of the US, in headlines and actions, writ large and for all to see.

Is the problem the use of teaching methods which focus on short-term memorization rather than conceptual comprehension? Is it the lack of support for instructors? Is it a lack of focus in the student body? Is it some or all of the above in varying degree? Or something else entirely?

bbor 5 days ago|||

Not to bring up the topic du jour too early, but this seems like a very lazy usage of AI. Especially egregious when it’s to redo something that’s been done a thousand times…

asah 5 days ago|||

I used <tab> and <space> and left the mouse hovering over the continue button, and it went very quickly.

da_grift_shift 5 days ago|||

Unfortunate bullshit asymmetry here. Taking the time to thoughtfully point out inaccuracies in a piece of vibesludge excreted in seconds.

herodoturtle 5 days ago||

“vibesludge”

^_^ hah what a great word, first time seeing it.

Another one I came across recently - “sloptimization”

taffydavid 5 days ago|||

If you didn't already know what Hippopotomonstrosesquippedaliophobia was, it's very easy to guess from four options.

I agree there were too many clicks per word; it took me too long to finish. But I also found it too easy to guess the few words I did not know.

balazspeczeli 5 days ago|||

Is the ability to guess the meaning of a word (from four options or context) the same as knowing the word and using it in your speech?

GolfPopper 4 days ago|||

It is not. Vocabulary is far from the binary of "you know this word or you don't". At a minimum, it is usually split into passive and active vocabulary, with passive being the words you understand when encountered, and active being the words you can use effectively. Wikipedia's entry is a pretty good overview.

d0mine 5 days ago||||

It is more like an ability to recognize the word when it is used in context.

I got ~1/3, which is a very generous estimate even for the "recall" case (recognize), and it is obviously false for the "generate" case (using in speech), where I guess my vocabulary is likely ~1/90 of all English words.

taffydavid 5 days ago|||

No, absolutely not, and that's my point. A real test would be having to type the definition, or pick from ten options

GolfPopper 4 days ago||

I think that picking the correct (or most correct, which is trickier) use of the word in context (out of, as you say, many options) might be a good way to test for receptive vocabulary.

z2 4 days ago|||

In particular, I got a bunch of guesses correct because there's a pattern that several options are often related to each other, and either only one is different, (e.g., "Do good", "Do bad", "Be evil") -- in which case the answer is obvious -- or at least there's a contrasting pair which narrows it down to a 50% chance.

camillomiller 4 days ago|||

My man, it’s a slippy sloppy app claude coded possibly in an afternoon at best…

mhdhejazi 3 days ago|||

One interesting observation: If you’re unsure of the correct answer, select the longest option, and you’re almost always correct!

alienbaby 5 days ago|||

69400 for me, and I knew I fucked up on ~ 5 I really did know.. or perhaps I didn't know them as well as I thought?

gerdesj 5 days ago|||

"Hippopotomonstrosesquippedaliophobia"

Hippopotamus does mean river horse and I was caught out by that (note the o instead of a in ...poto...). I think that word is really a joke - lol - a bit like floccinausilihilipilification, which I wont bother looking up the speling 4.

andrewflnr 5 days ago|||

Various sources suggest it's a literal joke, e.g. https://en.wiktionary.org/wiki/hippopotomonstrosesquipedalio...

alienbaby 5 days ago|||

I was gonna say, you spelled that wrong :p

deejaaymac 5 days ago||

"what high schooler doesn't learn about kinetic energy?"

A lot of them, because being an anti-intellectual is 'cool'

sd9 5 days ago||

Interesting concept, but 100 words is really quite a lot to get through... It's tiresome trudging through the easy words at the start, and I never got to see the interesting words before getting bored.

I've seen other systems like this calibrate far more quickly by assigning a sort of score and confidence behind the scenes. Confidence starts out low and increases over time - correct/incorrect answers rapidly adjust score at the beginning, then things settle down.

In practice this means you get a sequence of increasingly uncommon words initially, until you get one wrong, then you drop back to something easier until you start getting things right again, and eventually circle around words at your level.

Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick (or add an undo button).
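The score-and-confidence scheme described in this comment could look something like the sketch below. It is a plain staircase with a shrinking step, not any particular system's algorithm: difficulty is assumed to be a number in [0, 1] (0 for the most common words), and the starting level, step size and decay rate are arbitrary illustrative values.

```python
def adaptive_estimate(knows, n_questions=20, start=0.5, step=0.25,
                      decay=0.7, min_step=0.01):
    """Estimate the difficulty level at which a participant stops knowing words.

    knows(difficulty) -> bool stands in for asking one word of that difficulty.
    """
    level = start
    for _ in range(n_questions):
        correct = knows(level)                 # ask a word at the current level
        level += step if correct else -step    # harder after a hit, easier after a miss
        level = min(1.0, max(0.0, level))      # stay inside the difficulty scale
        step = max(step * decay, min_step)     # big moves early, small moves later
    return level

# Hypothetical participant who knows every word up to difficulty 0.8.
estimate = adaptive_estimate(lambda difficulty: difficulty <= 0.8)
print(round(estimate, 2))
```

With a shrinking step, the first few answers move the estimate a long way and later ones only nudge it, so most of the questions land near the participant's own level.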

datsci_est_2015 5 days ago||

> Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick.

This, and accept that people will have incorrect input and build it into the confidence. Even the smartest person in the world sometimes makes clerical errors, or has the wrong neuron fire at the wrong moment.

thenthenthen 5 days ago|||

Moly holy the clicking is too much 3 clicks that could be one :O

conradludgate 5 days ago||

300* that could be 100*

Karliss 5 days ago||

Even better if keyboard keys (1,2,3,4) were also supported.

dylanz 5 days ago|||

+1 to all these points especially the first one. I dropped off after about 10 words and didn't have a clear path to move to the next level.

DC-3 5 days ago|||

It also doesn't get hard enough. Also way too many of the words are just words about long words, or the tendency to be verbose.

philipwhiuk 5 days ago|||

It does get hard enough but only in the very last fraction.

Zenzizenzizenzic for example.

JumpCrisscross 5 days ago||

If I had to write out the definition, I’d have been screwed. The recursive structure of the word makes it out as a child’s word or something from mathematics. Given where it is in the game, that left one answer out of the four.

dgellow 5 days ago||||

Level 5 grandmaster was hardcore!

magicalhippo 5 days ago|||

I got zeitgeist, panacea and obfuscate on Level 5... wut?

Some at Level 4 were definitely a lot more obscure than those.

suzzer99 5 days ago|||

How jejune of you.

IshKebab 5 days ago||||

It gets impossible. Yarborough is apparently not a town in England. I guess technically it's a village but come on...

alentred 5 days ago||||

> It also doesn't get hard enough

Oh come on! Like you really knew what "Hippopotomonstrosesquippedaliophobia" is?

shdon 5 days ago|||

I thought that one was pretty well known. But then, I can also rattle off Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch at will.

NopIdoN 5 days ago|||

But why?

say what you like about antidisestablishmentarianism; at least it's an ethos

shdon 5 days ago|||

Somebody obviously coined the word as a self-referential joke. And somehow it stuck. That makes it memorable.

Speaking of things that stick... arachibutyrophobia is the fear of getting peanut butter stuck to the roof of your mouth. (I admit I had to look that one up, as it's not nearly as memorable, though I knew the word existed).

gerdesj 5 days ago||||

They are Welsh?

I too can say it and I'm very English...ish. LlanPG is a tourist attraction and a great example of an amateur advertising idea smashing it!

Animats 5 days ago|||

That's round 2 of England's established church problems. Not as bad as round 1, where the Catholic church was violently disestablished by Henry VIII so he could divorce his wife. Cromwell told the Catholics they could go "to hell or to Carna". I've been to Carna. It's bleak.

It's hard to disestablish a religion. Too many people believe. In Russia, the Russian Orthodox Church came back after Communism went down. Now Putin uses it to reinforce his rule.

mr_toad 4 days ago||

> so he could divorce his wife

That’s the schoolyard version of the story. In reality dissatisfaction with the church hierarchy had been simmering for some time, both in England and in Europe. Henry wouldn’t have gotten away with the split if it hadn’t enjoyed widespread support from the general public, the political class and the aristocracy.

monooso 5 days ago|||

The real question is, do you know what it means?

readthenotes1 5 days ago||||

Based on my missing only that one, it figured out I knew 83,000 words. That seems unsupportable.

iugtmkbdfil834 5 days ago|||

:D I did better than expected, but I did miss that one. I learned some fun ones.

thenthenthen 5 days ago|||

Lol. Yeah. Non native here but gave up at about 50 words. Too many words, too easy. And my English SUCKS

haswell 5 days ago||

If you gave up at 50, that means you skipped the difficult words.

thenthenthen 5 days ago|||

True. I tried a few more times. There is just so much wrong with the design. First 90 words super easy and then super hard? Why not random? Why is the longest description the correct one? Why so many? Why 3 clicks?

2dvisio 5 days ago|||

Agree. Complexity for me skyrocketed towards the end

tengwar2 5 days ago||

Yes - a very marked step rather than a gradual increase, I thought.

latexr 5 days ago|||

> Also - too many clicks per word.

They’re also too far away. I’m on a laptop and I have to keep moving the cursor up and down just to confirm. Give each option a letter or number and let me press it to choose the answer¹.

¹ There is (was?) some service for forms which does that and it works quite well. I think it was Typeform, but I just opened the website to check and—of course—it’s now just plastered with mentions of AI so I lost interest in verifying.

analog8374 5 days ago||

it's intentional. therefore testing vocab isn't the point.

I'm guessing it's testing our susceptibility to machine-generated compliments

latexr 5 days ago||

> it's intentional.

What is?

> I'm guessing it's testing our susceptibility to machine-generated compliments

I fail to see the point. For one, the compliments aren’t particularly good or interesting; for another, I didn’t even read them (I just went back to check after your comment), I simply clicked when seeing green.

analog8374 5 days ago||

too many clicks per word. and the distance between click points. that's intentional.

well the point would be to see how susceptible you are to that. They're figuring out where your cost vs reward tipping point is.

latexr 5 days ago|||

I think you’re reading too much into it. I think it’s just a common design pattern that was copied and is clearly optimised for mobile, where the distance doesn’t matter that much.

Anyway, if they were running metrics on that they just became useless because I automated responding to it a bunch of times.

https://news.ycombinator.com/item?id=48598586#48600403

tancop 2 days ago||||

its just a lazy clone of duolingo. its a ux pattern that works on mobile for a lesson with 10 complex questions where you have to retry the ones you failed until you get it right. when its 100 abcd questions on desktop its just annoying for no reason

scubbo 5 days ago||||

Can you elaborate? Who are the imagined "they", and in what way are they conducting experiments with or monetizing this investigation?

analog8374 4 days ago||

some people will crawl a mile for a promise. some people get skinny on a diet of promises

philipwhiuk 5 days ago|||

There's a small handful, mostly QI-inspired.

sowbug 5 days ago|||

Plus a scroll on mobile because the submit button is below the fold, though it seems to stay in the right place after the first scroll.

dolebirchwood 5 days ago||

Vibe coders don't know 'bout my dvh.

sandworm101 5 days ago|||

100 is too many? That's two or three minutes at most.

I would suggest a bias in this test towards reading. More than a couple are words I know but rarely see in print. But maybe I'm too much a fan of British TV, so I hear many of their words without seeing them written down.

sd9 5 days ago||

Did you actually do 100 words? It wasn't two or three minutes. With good UX, sure. But I wasn't getting through 1 word per second.

sandworm101 5 days ago||

I did. Missed two. If you know a word there is no thinking time. I'm on a tablet so I was probably fast on the clicking, but not like Korean gamer fast.

ralferoo 4 days ago|||

I don't know, I read each option for the first 20 or 30 in case there were any trick questions. There were a couple later where 1 option was very close to the meaning and 1 was much better. I actually got one wrong (can't remember what now, shame there's no summary at the end) where I chose one and it said another answer was correct, but I knew the meaning I chose was also valid.

sd9 5 days ago|||

I guess you just have a higher tolerance for inconvenience than me

cyanydeez 5 days ago|||

yeah, it should just be click->next;

I got tired after 8 words, looked at how many I'm supposed to know and gave up.

It'd be improved with statistical analysis; just progressively get harder and try to guess. If you wanted to gamify, you could update the stats after each answer.

jwpapi 5 days ago||

Also the explanations are too broad.

F.e. Frugal - Economical with money or goods

I don’t think frugal means economical; it means rather over the top …

Yeah I don’t know how to define it properly but I don’t need to learn new words if they don’t even teach the right meaning

Ai slop

FLHerne 5 days ago|||

That seems a pretty good definition of 'frugal' to me. To be excessively frugal would be miserly, tight-fisted or whatever.

There were a couple of definitions I did think were a bit off, e.g. 'zenith' and 'nihilism'. And one word where two answers seemed valid but I forget which.

Sometimes it gives one of several possible meanings but that's a valid choice.

In general I think it's a fun quiz - agreed with others though that the word selection brackets aren't ideal. It spends a lot of time on everyday vocabulary, then jumps straight into long words that someone made up one day as a joke.

The words I find most interesting are those that convey some subtle nuance, or describe some very specific thing - tools for old crafts, uncommon but genuinely used adjectives and the like. Very few of those appear.

Per_Bothner 5 days ago||||

"Frugal" most definitely does not mean "rather over the top" unless that is some new slang meaning I've never heard of.

tom_ 5 days ago||||

You can look it up in a dictionary? See, e.g., https://www.oed.com/search/dictionary/?scope=Entries&q=fruga...

LeonB 5 days ago||||

That definition hinges on their definition for “economical” - adding a qualifier like “excessively economical” would’ve been good I think.

jwpapi 5 days ago|||

Seems like I’m the idiot here.

I had frugal stored as more than just economical.

Thanks for your comments :/

astura 4 days ago|||

Frugal doesn't mean anything "excessive," it's not the same as a cheapskate or tightwad.

Being frugal just means allocating scarce resources in a way that provides most utility and value.

IshKebab 5 days ago|||

Seems like you don't know what frugal means at all!

stbullard 5 days ago||

In addition to everything everyone else has said: their math is off by half (or 100%, depending on how you count), due to a structural error.

(context: native English speaker, big reader, huge nerd, perfect SAT score)

I got all 100 correct on the first try without looking anything up! Confusingly, that only resulted in a "SCIENTIFIC ESTIMATE" that I know 85,000/~170,000 words?

Their "How is this calculated" page that appears at the end explains their error:

> According to the Oxford English Dictionary (Second Edition), there are approximately 171,476 words in current use.

> We use Stratified Sampling. Instead of testing random words, we divide the language into 5 distinct difficulty bands based on frequency of use:

> 1. Core Basics ~3,000 words

> 2. Intermediate ~7,000 words

> 3. Advanced ~10,000 words

> 4. Expert ~25,000 words

> 5. The Obscure ~40,000+ words

> If you answer 2 out of 3 'Intermediate' questions correctly, we estimate you know roughly 66% of the 7,000 words in that band.

> Total Score = Σ (Accuracy in Band × Band Size)

Their strata add up to 85,000, not ~170k, so even a perfect score comes out at only about 50%.

They're also using a pretty limited and perhaps non-difficulty-representative subset of the language.

Cute, but wrong on many counts.
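The arithmetic can be checked with a few lines. Band names and sizes are copied from the quoted explanation, with "~40,000+" taken at face value as 40,000 (an assumption), and `estimate` is simply the quoted formula written out, not the app's actual code.

```python
# Band sizes as quoted on the "How is this calculated" page.
BANDS = {
    "Core Basics": 3_000,
    "Intermediate": 7_000,
    "Advanced": 10_000,
    "Expert": 25_000,
    "The Obscure": 40_000,  # quoted as "~40,000+"; treated here as exactly 40,000
}
TOTAL_CLAIMED = 171_476  # "words in current use" figure from the same page

def estimate(accuracy_by_band):
    """Total Score = sum over bands of (accuracy in band x band size)."""
    return sum(accuracy_by_band[name] * size for name, size in BANDS.items())

perfect = estimate({name: 1.0 for name in BANDS})
print(perfect)                            # 85000.0
print(round(perfect / TOTAL_CLAIMED, 3))  # 0.496
```

So under the quoted formula the highest reachable estimate is 85,000, a bit under half of the advertised total.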

iLoveOncall 5 days ago||

A lot are also just guessable because 3 out of the 4 definitions are obvious nonsense. I'd rather have a "I don't know this word" button than just pick the one that's obviously correct out of the 4, if the goal is to get a real estimate.
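One standard way to discount lucky guesses is the classic correction for guessing. This is a sketch assuming every unknown word is guessed uniformly at random among the options, and the function name is made up for the example. If a fraction k of words is truly known, the observed accuracy is p = k + (1 - k)/m for m options, which solves to k = (p - 1/m) / (1 - 1/m).

```python
def corrected_known_fraction(observed_accuracy, n_options=4):
    """Estimate the fraction of words truly known, discounting blind guesses."""
    chance = 1.0 / n_options
    known = (observed_accuracy - chance) / (1.0 - chance)
    return max(0.0, known)  # scores below chance level are clamped to zero

# Example: 70% observed accuracy on four-option questions.
print(corrected_known_fraction(0.70))
```

When the wrong options are obvious nonsense, guesses succeed more often than 1 in 4, so even this correction would leave the estimate too high.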

utdoctor 5 days ago|||

Funny enough, usually the correct answer is the option with the most number of letters/words. I found myself just picking the longest answer and by a wide margin it was the correct answer.

Enginerrrd 5 days ago||||

Yeah I scored well enough and only missed 3, but that’s just because it was very easy to “guess well”.

There were many words I didn’t know though.

maplethorpe 5 days ago|||

I wanted to know my real score so I intentionally picked the answer most likely to be wrong in those instances.

zvr 5 days ago|||

Exactly the same feedback: I got all 100 correct, and the results were the same as yours.

As it usually happens in this kind of "check your vocabulary" tests in English, being Greek gives you an advantage in higher levels ;-)

a022311 5 days ago|||

I'm Greek too and I got 81 (well technically I misclicked one in a hurry, would've been 82). It did help a bit though. Surprisingly enough I've learnt many of the more advanced words from technical blogs!

WalterBright 5 days ago|||

I rely on being Geek for advantage.

guidedlight 5 days ago|||

It was clearly built with AI.

tomrod 5 days ago||

Is this a problem? I thought it was fun, personally, even if the author used AI to help build it.

geuis 5 days ago|||

It's not fun because it isn't challenging. I have an expert-1 level of English as a native speaker and heavy reader. But absolutely nothing in this is challenging at all. The one question I got wrong was because the 4 prescribed answer options weren't specific enough. So overall there's no value in this.

bogwog 4 days ago|||

It's obviously a problem because it doesn't work as intended/at all. What's the point of building something like this if it doesn't work? At best it's a waste of everyone's time, at worst it's misleading.

irishcoffee 5 days ago|||

As an aside, I am also an avid reader, always have been, 790 on the !math part of the SAT back in the very early 2000s.

I attribute most of my success in life to reading early and often. Bartending in college rounded out the social skills (for me) but those two skills have carried me further than I anticipated, coming from a poor background.

Have you found the same to be true?

copperx 5 days ago||

How did bartending improve your social skills? On the surface, it looks like a regular customer service job.

geuis 5 days ago|||

Guessing you've never worked a service job. It's a good way to learn how to interact with the public early on. The success model is not being fired for bad social customer interactions.

Even if you're an introvert, working for a couple months at Olive Garden when you're 19 helps you to smile and be polite when 80% of the customers are mouth breathing idiots. Turns out they aren't all mouth breathers and those para social skills come into play later during your career.

I highly support kids of all origins working in service for a bit. Ain't a class thing, but is very helpful in getting used to the breadth and depth of people.

margalabargala 5 days ago|||

The length and breadth of conversations you tend to get into as a bartender far exceed nearly any other customer service job. Not to mention it's frequently with the same people.

There are few professions where it's not unusual to have an hour+ conversation about literally any topic, and then potentially do it again the next day with the same person about a different topic. More similar to a therapist than customer service.

KaiserPro 5 days ago|||

I agree with the others

But the choice of "advanced words" seems a bit odd. Obscure isn't that obscure.

Sure there are some speciality words, but most of these words are just the stuff you're gonna hear on Radio 4 in normal conversation

estebarb 5 days ago||

I had that discussion with my high school English teacher. We used USA-oriented books and they often introduced "advanced vocabulary" which should have been trivial for Spanish speakers, or any Latin-language speaker for that matter.

I suppose they evaluate difficulty based on origin of the word. If you already know German or Spanish you may have a head start when learning English, but on a different subset of it.

stevage 4 days ago|||

Confusingly I got three wrong but got the same vocab estimate.

jzer0cool 5 days ago|||

What background do you all have that you think contributed to scoring 100?

petesergeant 5 days ago|||

I got 96. I think I knew about 87 of them just from knowing them, and the rest I got with a bit of Latin and Greek background (eg a word starting with “ab” is likely to mean “away from”), plus there’s a pattern to how they’d generated the wrong answers.

dogmatism 5 days ago||||

I read a lot, and have since I was a child

edit: also, native English (well, American) speaker

schoen 5 days ago||

Same here. Also, I studied Latin and Greek in school and have kept studying them in various ways since then. I think this test is significantly biased toward vocabulary with these origins; dozens of tested words are directly recognizable as the "ordinary" Latin or Greek words for some concepts, or direct combinations of common Latin or Greek roots.

A lot of prestigious and scholarly vocabulary in English has come in through Latin and Greek (at various points in the history of English!), so you can learn that vocabulary or make it more memorable or more transparent either by studying Latin and Greek as languages, or just by studying some of their common morphemes (e.g. there are lists of Latin and Greek roots that may be given to medical or life sciences students to help them learn to recognize the meaning of terminology coined from these languages, even without speaking the languages).

But I think it's actually unrepresentative of the English language as a whole if we're literally thinking about vocabulary size rather than historical prestige of some part of the vocabulary. For example, foreign foods like "nori", "pandan", "dolma", "vichyssoise"[1], or "berbere" are often used as English words and would probably appear in large English dictionaries nowadays. None of that was tested in this quiz. I saw one foreign political term which I guessed at, and one or two German loanwords which I knew (I've also studied German), and almost everything else was Latin or Greek origins!

[1] apparently coined by a French-speaking American based on French roots?

devmor 5 days ago|||

I missed two, but I’m willing to bet they’re similar to me - I read a lot and whenever I encounter a new word I don’t know, I usually look up its etymology.

whateverboat 5 days ago||

I think I got 80 correct and got 57k.

Laurel1234 5 days ago||

Pretty fun.

I suggest skipping the submit button and just showing it's correct when pressing and moving on after a sec or so. Having to click on submit twice really breaks the flow.

Also in all the words I tried I noticed out of the 4 options one is the correct one, another is the opposite of the correct one, and the other 2 are random stuff. You can basically skip any option whose antonym isn't present as well.

RicoElectrico 5 days ago||

It estimated 74k words for me, but I feel this might be inflated; much of the time when I didn't know the answer - I could vibe guess it just as you did it. The distractor answers weren't convincing enough. For starters, when an answer was based on deconstructing the word into common English words, that ruled it out. After all, if it was, then it wouldn't have been obscure.

A tangent: writing distractors for multiple choice questions is hard. From the exams I know (excluding those whose nature precludes it, such as those based on calculation or rote memorization) the only one that does this brutally well is LEK (Polish medical graduate exam). It's nigh impossible to vibe guess it at more than random chance for someone outside the field.

superjan 5 days ago|||

What I also noticed: when there are two contradictory definitions to choose from, it is usually one of those two.

For all its shortcomings, this was part of the fun, deducing the likely correct answer when you see a word for the first time.

datsci_est_2015 5 days ago||||

Yeah I also got exactly 74k. Stuff like “xylologist” I guessed had to do with vegetation because of “xylem”, whereas xylophone player was too on the nose. Then again, maybe knowing xylem in the first place makes 74k reasonable.

mpeg 5 days ago|||

Yeah I guessed that one right because xylophone player sounded like a trap.

I don't understand how they rank words though, some extremely common words like xenophobia were ranked as high as much more obscure ones.

fittingopposite 5 days ago||||

Haha. Yeah I figured Xylo- (wood) + sth. related to mono-poly so wood-seller made sense. Never have heard of this word before

pclmulqdq 5 days ago||

I think the test was vibe coded, because a xylologist is someone who studies wood, not someone who sells wood. I am not sure if "xylologist" was the exact word, though.

xylo- = wood; -logy = study

Indeed from M-W: "a branch of dendrology dealing with the gross and the minute structure of wood"

fittingopposite 5 days ago||

Seems to be a hapax legomenon https://www.oed.com/dictionary/xylopolist_n "OED's only evidence for xylopolist is from 1656, in the writing of Thomas Blount, antiquary and lexicographer."

pclmulqdq 5 days ago||

That test had several hapax legomena on it, so it would make sense.

rationalist 5 days ago|||

66k for me, but I didn't get that word, instead I got ones like Hippopotomonstrosesquippedaliophobia, Flibbertigibbet, and Brobdingnagian... which the latter two interestingly do show up in my keyboard's word completion suggestions.

BenjiWiebe 5 days ago||

I've encountered flibbertigibbet and Brobdingnagian. Never encountered hippopotomonstrosesquippedaliophobia before, at least I don't remember encountering it.

Flibbertigibbet appears in some of the Little House on the Prairie (Laura and Mary) books, if I remember right.

And I've also read Gulliver's Travels which is where Brobdingnagian comes from. Brobdingnag was a land of giants. Pretty sure I've seen the word used elsewhere though.

abnry 5 days ago||

I knew Flibbertigibbet from The Sound of Music:

MARGARETTA: How do you find a word that means Maria?

BERTHE: A flibbertigibbet!

SOPHIA: A will-o'-the-wisp!

MARGARETTA: A clown!

_diyar 5 days ago||||

in casual use you might also be able to guess it from context, so i think it’s a wash

vova_hn2 5 days ago||||

> A tangent: writing distractors for multiple choice questions is hard.

In the case of an online quiz you can have a "competition" between distractors:

1. start by having many more distractors than needed and pick randomly

2. for each, measure the probability of it getting clicked (clicks/times it's shown)

3. show the most frequently clicked distractors more often
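The three steps above can be sketched roughly as follows. This is only an illustration, not how the quiz works: the `DistractorPool` class, its method names, and the smoothing of the click rate (so that options nobody has seen yet still get picked) are all assumptions made for the example.

```javascript
// Hypothetical sketch of a "competition" between distractors for one word.
class DistractorPool {
  constructor(texts) {
    // Step 1: start with more candidate distractors than a question needs.
    this.items = texts.map((text) => ({ text, shown: 0, clicked: 0 }));
  }

  // Step 2: click rate = clicks / times shown. The +1 / +2 smoothing is an
  // assumption so that unseen distractors start at 0.5 instead of 0/0.
  static rate(item) {
    return (item.clicked + 1) / (item.shown + 2);
  }

  // Step 3: pick k distinct distractors, favouring those clicked more often.
  pick(k, random = Math.random) {
    const candidates = [...this.items];
    const chosen = [];
    while (chosen.length < k && candidates.length > 0) {
      const total = candidates.reduce((sum, it) => sum + DistractorPool.rate(it), 0);
      let r = random() * total;
      let index = 0;
      // Walk the candidates until the random threshold falls inside one of them.
      while (index < candidates.length - 1 && r >= DistractorPool.rate(candidates[index])) {
        r -= DistractorPool.rate(candidates[index]);
        index++;
      }
      const [item] = candidates.splice(index, 1);
      item.shown++;
      chosen.push(item);
    }
    return chosen;
  }

  // Call this when a test taker picks this (wrong) option.
  recordClick(item) {
    item.clicked++;
  }
}
```

Weighted random selection keeps some exploration going, so a distractor that was unlucky early on can still recover.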

RicoElectrico 5 days ago||

Yeah, as I researched the topic of multiple choice exam design, seems the rule of thumb is to reject outright any distractors that are chosen by less than 5% of test takers.
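As a rough sketch of that rule of thumb: the function below drops distractors chosen by fewer than 5% of the people who saw them. The `pruneDistractors` name, the record shape, and the `minShown` guard (keep options that have not been shown often enough to judge) are assumptions for illustration.

```javascript
// Hypothetical helper: reject distractors that almost nobody falls for.
// Each record is { text, shown, clicked }.
function pruneDistractors(stats, minShare = 0.05, minShown = 50) {
  return stats.filter(({ shown, clicked }) => {
    // Too little data to judge: keep the distractor for now.
    if (shown < minShown) return true;
    // Keep it only if at least minShare of viewers chose it.
    return clicked / shown >= minShare;
  });
}
```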

onionisafruit 5 days ago||||

It would have been nice to have an “i don’t know” button. Instead I decided to select the first option for words I didn’t know instead of trying to figure them out. Although when I got to the final group I couldn’t resist trying to figure them out. It estimated 61k for me.

scubbo 5 days ago|||

Indeed. "Lethargic" meaning "affected by lethargy" would hardly be difficult to guess!

mpeg 5 days ago|||

It'd also be a lot less awkward to go through 100 words if it had keyboard shortcuts (1-4 for the words, enter to submit) and if they fixed the layout shift jank

goodmythical 5 days ago||

wouldn't even let me tab to submit, you had to click, tab through each following option, then to submit, but then you had to tab again to confirm the submission!

mdnahas 2 days ago|||

Another suggestion: update an estimate of words known after every word. E.g. start out with a range of 0 to 170,000. Then after the first word, show 100 to 170,000. Etc.

Users get drops of information and can stop whenever they feel. I stopped well before 100.

vova_hn2 5 days ago||

> I suggest skipping the submit button and just showing it's correct when pressing and moving on after a sec or so.

Having an answer counted as incorrect, just because I've accidentally touched the screen of the phone? I would absolutely hate that.

EtaoinWu 5 days ago||

It is quite easy to cheese the problems: many of them don't look like word definitions ("a sharp pain in the back"), many problems have this "correct answer + opposite meaning + 2 unrelated things" answer structure, and for the second half of the answers, very often the longest answer is the correct one. The wrong options are not well designed here.

The sample of words is also heavily biased towards concepts relating to words, speech, speakers, and/or persuasion. They are likely generated by an LLM which is primed on the task of choosing words, and ends up choosing words related to "words".

For context, I'm an L2 speaker, linguistic nerd, and I use English mostly in academic/professional settings. I got 75,400 by a combination of the tactics above; in reality it might be closer to 10-15k.

The design is also painfully similar to Duolingo if anyone can spot that.

emil-lp 5 days ago||

Also, every alternative containing a semicolon was the correct one.

microtherion 5 days ago|||

> many of them don't look like word definitions ("a sharp pain in the back”)

I had to look up the English word (lumbago), but German has the colorful “Hexenschuss” (witch shot). I suspect most people above a certain age can relate to there being a word for this in most languages.

da_grift_shift 5 days ago||

>The design is also painfully similar to Duolingo if anyone can spot that.

Yeah. Clocked it from the landing page.

rout39574 5 days ago||

It should be possible to respond "I don't know". When you really-really don't know, it's unfair to get a 1/4 chance at right anyway, or even better if you use routine multiple-choice tactics.

I got credit for a few that I would have happily just missed.

dktp 5 days ago||

Agreed

I did the full 100. It's not even 1/4, with the harder ones when one description is significantly longer than others, it's the correct one. Even outside that 2 choices are usually some object - which I think is never the correct answer

I'd also say the toughness should be mixed up a little. The last 30 or so became a slog

Cool idea though!

ngruhn 5 days ago||

Also a lot of questions had "right answer" / "opposite of right answer" pairs. Just by identifying those you get to 50% probability.

supermdguy 5 days ago|||

Agreed, there were also a few where I deduced the correct definition by comparing the options.

throwaway82931 5 days ago||

Yeah, I way overperformed on this test because it was multiple choice. There were 11 words I didn't know at all, and another 8 where I was uncertain to varying degrees. My score of 99/100 does not reflect my actual ability. Even the one I got wrong was a misclick.

gerdesj 5 days ago||

Miss-click!

I managed a paltry 90/100. Some of those words require a classical education and probably a British one at that. I studied Latin at two posh schools and have O level English Language and Literature (that's two qualis at age 16).

I'm pretty well read and know exactly who Sandi and Stephen are. Ironically Sandi is Danish but notably erudite (that turned up for me) and navigates her way around English with remarkable aplomb.

e1ghtSpace 5 days ago|||

Yeah it would just be easier and faster to have a Yes/No selection for each word and you just say whether you know the definition or not. That way you can blaze through all 100. Having keyboard shortcuts for each selection would help.

tengwar2 5 days ago||

It's probably more meaningful to force a guess, since you may guess on the basis of word elements that you do know. At worst, it's possible to compensate for a 25% chance of getting the right word by chance.
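A minimal sketch of that compensation, assuming every unknown word is a blind guess among equally likely options: if a fraction k of the words is really known, the observed score is p = k + (1 - k)/4, so k = (p - 1/4) / (3/4). The function name is made up for the example.

```javascript
// Hypothetical correction for blind guessing on a multiple-choice quiz.
function correctForGuessing(correct, total, options = 4) {
  const observed = correct / total; // share of answers marked correct
  const chance = 1 / options;       // success rate of a blind guess
  const known = (observed - chance) / (1 - chance);
  return Math.max(0, known);        // scores below chance are clamped to 0
}
```

If the wrong options are easy to rule out, as several comments here report, the real guess rate is higher than 25% and this correction is too lenient.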

nickcw 5 days ago||

I have a copy of the shorter Oxford English Dictionary from 1970 which I inherited. It is two massive volumes and is only shorter in comparison to the full dictionary which is 12 volumes (more in more modern editions).

My shorter OED contains 163,000 words (compared to the 600,000 words of the longer).

According to this site I know 71,000 words... Let's test that against the OED. I should have about a 43% chance of knowing a word picked at random.

In my totally scientific test (ha) I chose 50 words at random from the OED and discovered I knew 29 of them for a score of 58% which is more than two sigma from 43%, thus disproving the hypothesis.

I forgot what that was now, but it was a fun experiment.
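The "two sigma" arithmetic can be checked with a quick script. This is a sketch using the normal approximation to the binomial; the numbers come from the comment above, and the variable and function names are just for illustration.

```javascript
// Numbers from the comment above.
const estimatedKnown = 71000;  // the quiz's vocabulary estimate
const dictionarySize = 163000; // words in the shorter OED
const sampleSize = 50;         // words picked at random
const sampleKnown = 29;        // of which were known

// z-score of an observed proportion against an expected proportion p0.
function zScore(known, n, p0) {
  const observed = known / n;
  const standardError = Math.sqrt((p0 * (1 - p0)) / n);
  return (observed - p0) / standardError;
}

const expectedShare = estimatedKnown / dictionarySize;
const z = zScore(sampleKnown, sampleSize, expectedShare);
console.log(expectedShare.toFixed(3), z.toFixed(2));
```

The z-score comes out just above 2, so the claim holds, though only barely with a sample of 50.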

pclmulqdq 5 days ago||

I also got something around 70-80k with 95/100 correct words (I don't know or use most of these words, but the later sections have a lot of words with Greek or Latin origin, which made them easy to guess). One of my wrong words was a misclick in the first section, which I think dragged down the estimate quite a lot. You may have done something similar. I assume they use a simple formula where early misses cost you a lot and late misses cost you very little.

curuinor 5 days ago|||

can't assume gaussian underlying distribution of the word-knowing, it's known zipfian. so you can't be doing anovas or anything of that nature because if you look up zipfian distribution's variance, you get Nature and Reality giving you the middle finger

soVeryTired 5 days ago|||

No way is vocab size zipfian. Word counts from a corpus follow zipf's law, but not vocab sizes themselves.

Otherwise the most common vocab size would be equal to one.

dgacmu 5 days ago||||

I think you mean it's lognormal, at least if we're discussing native English speakers or comparing those with similar amounts of exposure to the language.

(The median English speaker almost certainly knows several thousand words, or word stems to avoid duplication. But the number who know all words in the tail is exceptionally small.)

montag 5 days ago|||

Not to mention, N=1

srean 5 days ago|||

Neat way to validate.

Your method of sampling could be improved further, unfortunately at the expense of ease of use. If the dictionary was sorted according to difficulty, then you could use stratified sampling.

I comment on the related aspects here.

https://news.ycombinator.com/item?id=48599769
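For readers unfamiliar with the idea, a stratified estimate could look roughly like this. The strata sizes and scores below are invented for the example and are not the quiz's real tiers, which nobody in this thread knows.

```javascript
// Stratified estimate: within each stratum, scale the share of sampled words
// answered correctly up to the stratum's size, then add the strata together.
function stratifiedEstimate(strata) {
  return strata.reduce(
    (total, { size, sampled, correct }) => total + (size * correct) / sampled,
    0
  );
}

// Made-up strata for illustration only.
const exampleStrata = [
  { name: "common", size: 1000, sampled: 20, correct: 18 },
  { name: "rare", size: 4000, sampled: 20, correct: 5 },
];
console.log(stratifiedEstimate(exampleStrata)); // 900 + 1000 = 1900
```

Each miss costs `size / sampled` words, so how much a wrong answer hurts depends entirely on how large its stratum is.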

notsylver 5 days ago||

It seems like the right answer is usually the longest of the choices, I managed to get a few just by picking the longest. It would also be nice if there was an "I don't know" instead of guessing and skewing the results by getting it right, though maybe that's accounted for

latexr 5 days ago||

> It seems like the right answer is usually the longest of the choices

You are correct. I tested that hypothesis about a dozen times and it seems that if you always pick the longest you’ll get it right somewhere in the high 70s to mid 80s. For anyone interested in testing for themselves, open the website to the first question then run this in the console (not going to spend time optimising it, it works well enough for the purpose):

  let loopCount = 0

  const loop = setInterval(() => {
    // Click the longest of the first four buttons (assumed to be the answer options)
    Array.from(document.querySelectorAll("button")).slice(0, 4).reduce((long, curr) => curr.textContent.length > long.textContent.length ? curr : long).click()
    // Click the last button twice: submit, then confirm
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 100)
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 200)

    // Stop after 100 questions
    loopCount++
    if (loopCount === 100) clearInterval(loop)
  }, 500)

earthpyy 5 days ago|||

Wow! great reproduce with less effort!

libertyit 5 days ago|||

cool

orrito 5 days ago|||

These were likely all AI generated, or at least the alternatives were. I made an app a while ago as well, and afterwards realized AI often wanted to make a very covering answer for the correct one, making it often longer than the others, thus defeating the idea of the quiz in the process.

EstanislaoStan 5 days ago||

Yeah this is AI slop I don't like..

jwrallie 5 days ago|||

Usually there were two answers that sounded like the word if read by someone unfamiliar, those were short, then either one or two long versions.

If there's one long version, you choose that; if two, then you choose the one that would be more useful to have a word assigned to it.

thenthenthen 5 days ago|||

Also surprisingly mostly the first or last option (might be bias)

thenthenthen 5 days ago||

Hahahhaha i got 62k points by just choosing the longest definitions. Great observation!

vova_hn2 5 days ago||

Got 59,800, Performance Breakdown:

Core Basics 19/20

Intermediate 17/20

Advanced 19/20

Expert 14/20

Grandmaster 12/20

I guess, it's not too bad for a non-native speaker.

Minor feedback:

1. The correct answer for "Lethargic" is "Affected by lethargy". I think definitions should not use words that share a common root with the defined word, because:

a. it makes guessing too easy

b. it basically becomes a circular definition which is meaningless

2. Options almost always include 1 correct answer, 1 direct opposite and 2 completely random. Once you learn to recognise it, you can easily rule out 2 random options and have a 50/50 guess.

siegecraft 5 days ago||

I also felt the definition of lethargic was kind of silly, especially since I had already gotten lethargy as a word in tier 1.

firebot 4 days ago||

I scored slightly better than you. I missed 3 expert and 8 grandmaster...

It only pushed my score up to 65k.

SXX 5 days ago|

Not that I want to cheat in such a game, but for many words everything but the correct definition is shorter or follows some "dumb rpg text" template.

Like if author used LLM to generate wrong definitions per word instead of actually mixing definitions of words.

For me, most of the more complex words were adjectives, with a few nouns. And in many cases you can just see that 2/4 or 3/4 of the definitions are not for an adjective.

SXX 5 days ago||

I feel like it makes sense to just mix up definitions of different adjectives if it's an adjective you're looking at, with just a little filtering to make sure you don't see repetitive definition options across different test words.
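Roughly what that could look like, as a sketch only: the entry shape (`word`, `pos`, `definition`), the `buildOptions` name, and the shared `usedDefinitions` set are assumptions, and nothing here is taken from the actual quiz.

```javascript
// Fisher-Yates shuffle that returns a new array and leaves the input untouched.
function shuffle(items, random = Math.random) {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Build four options for `target`: its real definition plus three real
// definitions of other words with the same part of speech.
function buildOptions(target, dictionary, usedDefinitions = new Set(), random = Math.random) {
  const candidates = dictionary.filter(
    (entry) =>
      entry.pos === target.pos &&
      entry.word !== target.word &&
      entry.definition !== target.definition &&
      !usedDefinitions.has(entry.definition) // the "little filtering" against repeats
  );
  const distractors = shuffle(candidates, random)
    .slice(0, 3)
    .map((entry) => entry.definition);
  // Remember the distractors so later questions do not reuse them.
  distractors.forEach((definition) => usedDefinitions.add(definition));
  return shuffle([target.definition, ...distractors], random);
}
```

Because every option is a genuine definition of a word of the same kind, length and phrasing stop being a tell.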

margalabargala 5 days ago|||

> Like if author used LLM to generate wrong definitions per word instead of actually mixing definitions of words.

Yes, exactly like this.

andrewflnr 5 days ago||

I was actually kind of impressed with how many of them didn't fall into that trap, but where all the options were roughly the same length and format. (For sure, a couple of the others were BS.)
