How many of the 170k English words do you know?

Posted by abnry 6 days ago

(vocabowl-870366514258.us-west1.run.app)

501 points | 554 comments

salamo 5 days ago

An alternative algorithm that would probably converge in fewer than 100 questions would be something like Elo or Glicko-2.

A word's "difficulty" would be some function of how rare it is. Once you have a reasonable estimate of the user's "skill" you can infer that a user won't know more difficult words. The benefit of this is you're not spending time asking the user about words they probably know.
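A minimal sketch of salamo's Elo-style idea, assuming a word's difficulty is derived from its frequency rank. The mapping `difficulty_from_rank`, the K-factor of 64, the 400-point scale and the starting skill are illustrative assumptions, not anything the site is known to use. Plain Elo stands in here for Glicko-2, which additionally tracks how uncertain the rating is.

```python
import math
from collections.abc import Callable


def expected_known(skill: float, difficulty: float) -> float:
    """Elo-style probability that a user with `skill` knows a word of `difficulty`."""
    return 1.0 / (1.0 + 10 ** ((difficulty - skill) / 400.0))


def update_skill(skill: float, difficulty: float, knew_it: bool, k: float = 64.0) -> float:
    """Nudge the skill estimate toward the observed outcome (1 = knew it, 0 = didn't)."""
    outcome = 1.0 if knew_it else 0.0
    return skill + k * (outcome - expected_known(skill, difficulty))


def difficulty_from_rank(rank: int) -> float:
    """Hypothetical mapping: rarer words (higher frequency rank) are harder."""
    return 400.0 * math.log10(rank)


def next_word(skill: float, ranks: list[int]) -> int:
    """Adaptive step: ask the word whose difficulty is closest to the current skill."""
    return min(ranks, key=lambda r: abs(difficulty_from_rank(r) - skill))


def run_session(knows: Callable[[int], bool], ranks: list[int],
                questions: int = 30, skill: float = 1200.0) -> float:
    """Ask `questions` adaptively chosen words; `knows(rank)` simulates the user."""
    pool = list(ranks)
    for _ in range(questions):
        if not pool:
            break
        rank = next_word(skill, pool)
        pool.remove(rank)  # never ask the same word twice
        skill = update_skill(skill, difficulty_from_rank(rank), knows(rank))
    return skill
```

For example, `run_session(lambda r: r <= 20000, list(range(100, 170001, 100)))` simulates someone who knows exactly the 20,000 most frequent words. The estimate climbs from 1200 by roughly 32 points per question and then hovers around `difficulty_from_rank(20000)` (about 1720). Asking the word whose difficulty matches the current estimate keeps the predicted probability near 0.5, which is where each answer tells you the most.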

Of course, it's possible that at an individual level, difficulty does not increase monotonically with how rare the word is. A person might be very familiar with a domain-specific subset of English. But the "stratified sampling" approach will also have this problem.

There is a similar problem in chess, where a player's rating only varies along one dimension. Puzzles are also scored on a single axis, so there can theoretically be a mismatch: a "harder" puzzle built on a motif the player is familiar with will actually be easier for that player.

cl3misch 5 days ago

I like it a lot, but unfortunately you can cheat a bit: there are always two opposite answers and two unrelated ones. The correct answer is (almost?) always one of the opposites.

brookman64k 5 days ago

At first I noticed that for many questions two or three of the answers are obviously wrong, so in many cases the correct answer can be guessed easily. But then I noticed that in 90% of cases the correct answer is the longest of the four, which makes guessing even easier. The whole thing has a vibe-slopped feel to it.
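One way to check this observation against a question bank is to count how often the correct option is also the unique longest one. With four options and no length bias you would expect something around 25% (a bit less, since ties are excluded here). The `Question` structure is a hypothetical format, not the site's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Question:
    word: str
    options: list[str]
    correct: int  # index of the correct option in `options`


def longest_is_correct_rate(questions: list[Question]) -> float:
    """Fraction of questions whose correct option is strictly the longest one."""
    if not questions:
        return 0.0
    hits = 0
    for q in questions:
        lengths = [len(option) for option in q.options]
        longest = max(lengths)
        # Only count unambiguous cases: the correct answer is the unique longest option.
        if lengths[q.correct] == longest and lengths.count(longest) == 1:
            hits += 1
    return hits / len(questions)
```

Run over the full question bank, a rate far above 25% would confirm that option length leaks the answer; shuffling in distractors of similar length is the usual fix.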

gumboshoes 5 days ago

The 171,476 figure from the Oxford English Dictionary (OED) is used inaccurately, in a way that shows a gross misunderstanding of dictionaries and language. The number 171,476 refers to the number of full entries for words in “current use” as defined in the 20-volume Second Edition of the OED. It does not represent words. It also does not include all the OED's variant spellings, inflected forms, phrases or run-ons (sub-entries derived from the main entries). Additionally, the OED is by no means a complete inventory of English. In fact, it's probably millions of words short, especially as it has an incredibly slow update cycle. Source: I am a dictionary editor and lexicographer, use the OED daily, and know the people who make it.

stymaar 5 days ago

Interesting choice of words, I'd say: as a French person, I found this to be pretty much a test of “how close is the English word to the original French meaning”, since it was almost devoid of obscure words of Germanic origin.

At least I learned a bunch of «faux amis» (false friends) in the process.

RugnirViking 5 days ago

In general, the observation in English is that most words close to what the medieval lower classes did every day (tree, cow, house, stool; compare Danish træ, ko, hus, stol) are Germanic, shared with Danish/Norse, while those from French relate to what the upper class did (arbory, beef, mansion, chair; compare arbor, bœuf, maison, chaise).

So not surprising perhaps that many of the more obscure words end up being French.

stymaar 5 days ago

> So not surprising perhaps that many of the more obscure words end up being French.

Of course, for a native speaker at least, but for people with English as a second language there are many lower-class words that we have never encountered, because they simply don't occur in books or in online discussions. I got 88 out of 100 correct on this list, but I'm almost certain I'd have fared much worse had the list been about niche household or agricultural items.

What counts as "obscure" is highly context dependent.

gpvos 5 days ago

Some of the most obscure English words are essentially Dutch though. There was a test online at some point (see my other comment) that was quite hard for me but got easier at the very highest levels.

goldenarm 6 days ago

It's hilarious that most of these words are French

wongarsu 6 days ago

English has this weird dichotomy where most of the words in a typical sentence are Germanic, while most of the words in the dictionary are French.

Fun fact: according to a quick count by AI using web search, the previous sentence contains 21 words of Germanic origin, 2 of Latin origin, 2 of Greek origin and 1 of French origin. Also, the etymology of the word Germanic is Latin, while that of the word French is Germanic.

smitty1e 5 days ago

Yes, English is a post-Hastings collision between Norman French and Anglo Saxon.

rhdunn 6 days ago

Norman French came with the Norman invasion of 1066, which led to Old English evolving into Middle English. You can see that in the words for animals vs. meats (cow and boef/beef, sheep and mutton, etc.), where the Germanic people raised the sheep and the Norman aristocracy ate them.

A lot of the more common and simpler words are Germanic, as is the grammar (e.g. compound words like cupboard).

the_lonely_phon 6 days ago

It depends. Is bratwurst a German word or an English one? You will be hard pressed to find an American who doesn’t know the word and what it means. You can buy them at just about any grocery store, and they are a staple of many restaurants.

At some point the word becomes both. Sourced from its mother language and maybe even still meaning the same thing in both, but no less an English word than any other at this point.

nairboon 5 days ago

Bratwurst is still a German word. It doesn't become English just because it's used by native English speakers. If you start to tweak it a bit, it could become an English word. Like "fish" vs. "Fisch" in German. Or "good" vs. "gut" in German.

mordechai9000 6 days ago

It also had "weltschmerz" in the list, but I think I have only ever heard "ennui" used in English. They are both foreign words, but I would not have thought of weltschmerz as a loan word. Then again, maybe I am not reading the right texts.

jongjong 5 days ago

True. I'm a native French speaker, but my English is better (educated in Australia), so this created a weird situation for me where I got 14/20 for advanced words and 19/20 for expert words.

To be fair, I think I messed up a few advanced words by accident, but I think the general pattern would hold, because many of the expert-level words seemed to have French roots. So it felt like it got easier towards the end for me. Grandmaster words were a bit weirder on the whole.

I'm an engineer and read mostly non-fiction so this probably explains the gap too.

graemep 6 days ago

They are not. Quite a few have Latin roots (and the like) that the corresponding French words share.

pessimizer 6 days ago

Approximately 0.0% of those came into English through Latin, while around 100% came through Norman French.

grey-area 5 days ago

Latin was commonly spoken amongst the educated at one time (it served as a lingua franca across Europe) and was used for religious and scientific discourse for even longer.

gpvos 5 days ago

That depends on when and how they entered English. A lot of scientific vocab was taken directly from Latin and Greek.

I_am_tiberius 6 days ago

French speakers of English usually have quite a good vocabulary. Getting to the point of speaking English, though, is a milestone that's quite difficult for French speakers.

triceratops 5 days ago

English is the PHP of human languages.

GeoAtreides 5 days ago

I'm not sure PHP deserved that...

goodpoint 5 days ago

That's harsh, English is a mess but PHP is worse.

gpvos 5 days ago

Modern PHP, while not great, is mostly okay really. Old PHP I'd rate as worse than English.

classified 6 days ago

English also has a ridiculously high fraction of Latin.

pessimizer 6 days ago

Not from Latin but through French: the direct use of Latin in English is generally restricted to technical jargon and legal terms (which English often also shares with French).

Latin isn't really any sort of parent to Old English afaik, even though the Romans ran Britain for a while.

jillesvangurp 5 days ago

And French in turn was influenced a lot by Latin, which means a lot of the French loanwords have their origin in Latin. And of course Latin is actively used to this day in the Catholic Church and the Church of England. Latin was widely used for written communication for quite some time. Most people couldn't read or write, but Latin shaped a lot of religious, scientific, legal and other communication and vocabulary. English also has a lot of loanwords from other languages. Lots of nautical terms have an obvious Dutch origin, for example.

gpvos 5 days ago

It's not just influence: French descended from Vulgar Latin, with a lot of influence first from the Gaulish (Celtic) substrate and then from the Frankish (Germanic) conquerors.

zulux 6 days ago

In order to stunt on the poors, English borrowed a fair amount of Latin and Greek directly, especially in law, philosophy, and the sciences.

alun 5 days ago

Nice! Some feedback: the score it shows doesn't really mean anything to me. I think it would be more interesting for the user to know how they rank (perhaps as a percentile) relative to the overall English-speaking population and/or relative to other users on the site.
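A percentile rank against other users is cheap to compute once the site stores everyone's estimated vocabulary size. This sketch uses the common "percentage of scores below, counting ties as half" definition; storing all scores in a plain list is an assumption for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, all_scores: list[float]) -> float:
    """Percent of scores below `score`, counting ties as half (0-100 scale)."""
    if not all_scores:
        raise ValueError("need at least one score to compare against")
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)            # scores strictly lower
    ties = bisect_right(ordered, score) - below    # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)
```

With scores `[20000, 30000, 40000, 50000]`, a user at 40,000 lands at the 62.5th percentile. Comparing against site users only measures standing among people who chose to take the test, not the English-speaking population at large.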

montag 5 days ago

You’ll have to ask quiz takers for their SAT/ACT scores to estimate the (probably extreme) sample bias

rcfox 5 days ago

Interesting that this showed up here now. I did it a week ago after hearing about it on The Rest Is Science. https://www.youtube.com/watch?v=9t-5lQ2mzuw

abnry 5 days ago

This is where I got the link from.

tgv 5 days ago

A common pattern is the word's true definition and its opposite, plus two mostly unrelated meanings. So, when in doubt, you can improve your chances by picking one of the opposing pair. That's a bit of a shortcoming.
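How much the opposite-pair shortcut inflates a score can be worked out directly. If someone genuinely knows a fraction f of the sampled words and guesses uniformly on the rest, the expected fraction correct is f + (1 - f) * g, where g is 1/4 for a blind guess among four options and 1/2 once the two unrelated options are discarded. This assumes the shortcut always leaves exactly two candidates; the numbers are illustrative, not measurements of the site.

```python
def expected_accuracy(known_fraction: float, options_left: int) -> float:
    """Expected fraction correct when unknown words are guessed uniformly."""
    guess_rate = 1.0 / options_left
    return known_fraction + (1.0 - known_fraction) * guess_rate


def corrected_known_fraction(observed: float, options_left: int) -> float:
    """Invert the formula above: estimate true knowledge from an observed accuracy."""
    if options_left < 2:
        raise ValueError("need at least two options to correct for guessing")
    guess_rate = 1.0 / options_left
    return max(0.0, (observed - guess_rate) / (1.0 - guess_rate))
```

So 60% genuine knowledge shows up as about 70% with blind guessing but about 80% with the shortcut. `corrected_known_fraction` is the classic correction-for-guessing formula; it only works if the test knows how many options a guesser is effectively choosing between.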

gpvos 5 days ago

78,000 (-2 advanced, -3 grandmaster), pretty good for a second language; the test's maximum appears to be 85,000.
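How per-level misses could turn into a total like this: if the test works the way salamo's "stratified sampling" remark suggests (not confirmed by the site), each level stands for a band of the word list, and the estimate is the sum over bands of band size times the fraction answered correctly. The band sizes below are invented for illustration, so they will not reproduce the site's numbers.

```python
from typing import NamedTuple


class Band(NamedTuple):
    size: int     # number of dictionary words the band stands for (hypothetical)
    correct: int  # questions answered correctly in this band
    asked: int    # questions asked in this band


def stratified_estimate(bands: list[Band]) -> float:
    """Estimated vocabulary: sum of size * (correct / asked) over all bands."""
    return sum(b.size * b.correct / b.asked for b in bands if b.asked > 0)
```

With hypothetical bands `[Band(10_000, 20, 20), Band(20_000, 20, 20), Band(30_000, 18, 20), Band(40_000, 20, 20), Band(70_000, 17, 20)]` the estimate is 156,500. With 20 questions per band, each miss in the 70,000-word band costs 3,500 words, which is why a handful of misses at the top levels moves the total so much.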

The alternatives to choose between appear to be LLM-generated, you can see several patterns ("now" and "forever" appear a lot).

Years ago, I used to play a similar game that you could keep playing, where you levelled up when you got enough words right in a row, or down after a single mistake. A fun thing about it was that at very high levels it got easier for me, because they mixed in some old English words that were essentially the same as in Dutch, my native language. There was a charity aspect to it as well; I think it was https://freerice.com/, but they seem to have simplified the game now.
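The level mechanic described here is essentially an adaptive staircase: move up after a run of correct answers, drop down after a single miss. A minimal sketch, where the run length of 3 and the level bounds 1 to 50 are assumptions rather than Freerice's actual parameters:

```python
from dataclasses import dataclass


@dataclass
class Staircase:
    level: int = 1
    min_level: int = 1
    max_level: int = 50           # assumed cap, not Freerice's real value
    streak_to_level_up: int = 3   # assumed run of correct answers needed to move up
    streak: int = 0

    def record(self, correct: bool) -> int:
        """Update the level after one answer and return the new level."""
        if correct:
            self.streak += 1
            if self.streak >= self.streak_to_level_up:
                self.level = min(self.level + 1, self.max_level)
                self.streak = 0
        else:
            # A single mistake drops one level and resets the streak.
            self.level = max(self.level - 1, self.min_level)
            self.streak = 0
        return self.level
```

With a run length of 3, this is the same rule as the classic 3-down/1-up staircase from psychophysics, which tends to settle at the level where the player gets roughly 79% of words right (where three correct in a row is as likely as not).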

The University of Ghent (Belgium) also used to have an interesting test that rated your proficiency against average scores at certain education levels. There I got 41,000 (IIRC), which was rated as average for a university-level native English speaker. An update at the bottom of https://languagehat.com/ghent-vocabulary-test/ discusses where that test went and lists a few alternatives. Edit: https://www.myvocab.info/en is pretty similar to this test (found in another comment).
