
Posted by wyattsell 2 days ago

Graphing how the 10k* most common English words define each other (wyattsell.com)
78 points | 20 comments
MrDrDr 32 minutes ago|
I remember thinking about this when the semantic web was first being discussed. If you think of it from the perspective of a child, your first 'foundational' words are learned through direct experience. Then, while you continue to learn words this way, you can also use the words you already 'know' to define secondary or tertiary terms that you have no direct experience of. I'd like to see a graph like this with someone's take on the minimum number of necessary foundational words and how that graph would look.
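
To make the 'foundational words' idea concrete, here is a rough sketch. The toy dictionary and the `learnable` helper are made up for illustration. It only checks which words become definable from a given foundation set and does not try to find the minimum set, which is a much harder problem.

```python
def learnable(foundation, definitions):
    """Return every word that can be reached from `foundation`.

    A word counts as learnable once all the words used in its
    definition are already known. Repeat until nothing new is added.
    """
    known = set(foundation)
    changed = True
    while changed:
        changed = False
        for word, used in definitions.items():
            if word not in known and set(used) <= known:
                known.add(word)
                changed = True
    return known


# Hypothetical toy dictionary: headword -> words used in its definition.
definitions = {
    "puppy": ["young", "dog"],
    "kitten": ["young", "cat"],
    "pet": ["animal", "home"],
    "litter": ["puppy", "kitten"],
}

# Words assumed to be learned through direct experience.
foundation = {"young", "dog", "cat", "animal", "home"}

known = learnable(foundation, definitions)
```

Dropping "cat" from the foundation makes "kitten" unlearnable, and "litter" along with it, so you can probe which foundational words the rest of the vocabulary really depends on.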
anigbrowl 5 hours ago||
It's a common problem to get excited about networks, build a large one, and then be stuck with an unapproachable hairball. If you want to explore network structure, consider using tools like quadrilateral Simmelian backbones, which can provide an opinionated look at what matters in the network.
Someone 3 hours ago||
One could also try to use a different set of definitions better suited to such a visualization.

The Oxford Advanced Learner’s dictionary has an appendix called “Defining Vocabulary”. It says:

“In order to make the dictionary definitions easy to understand, we have written them using only the words in the following list.

[…]

Occasionally it has been necessary to use in a definition a word not in the list. When such a word occurs it is shown in SMALL CAPITAL LETTERS.”

I estimate that list has about 3,500 words.

⇒ If you base your network on that dictionary or one carefully constructed like that, the graph could have a central core of about 3,500 nodes with the other words circling around it.

Making a good visualization still would be a challenge, of course.
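
A rough sketch of the core/periphery split, using a made-up mini vocabulary rather than the actual Oxford list. The function name and the data are hypothetical. Words outside the defining list that still show up in a definition are tracked separately, which mirrors the SMALL CAPITALS convention quoted above.

```python
def split_core_periphery(definitions, defining_vocabulary):
    """Separate core words from peripheral headwords.

    Returns the core set, plus for each peripheral headword the core
    words its definition uses and any words outside the defining list.
    """
    core = set(defining_vocabulary)
    periphery = {}
    for word, used in definitions.items():
        if word in core:
            continue
        unique = set(used)
        periphery[word] = {
            "core_links": sorted(w for w in unique if w in core),
            "outside_list": sorted(w for w in unique if w not in core),
        }
    return core, periphery


# Hypothetical stand-in for a defining vocabulary.
defining_vocabulary = {"a", "small", "animal", "water", "live", "in"}

# Hypothetical definitions: headword -> words used to define it.
definitions = {
    "frog": ["a", "small", "animal", "live", "in", "water"],
    "tadpole": ["a", "small", "frog"],
}

core, periphery = split_core_periphery(definitions, defining_vocabulary)
```

Here "frog" hangs entirely off the core, while "tadpole" needs one word from outside the list ("frog"), so it would sit one ring further out.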

tomstuart 4 hours ago||
I had to look this up: https://doi.org/10.7155/jgaa.00370
WillAdams 37 minutes ago||
Nice! Reminds me a bit of "WordWeb" which is still around:

https://wordweb.info/free/

which also uses WordNet:

https://en.wikipedia.org/wiki/WordNet

(which this is also using)

which was developed by Princeton w/ DARPA money as an early investigation into AI and so forth.

avidiax 7 hours ago||
If you like this, you would probably enjoy Princeton Wordnet. They have unfortunately stopped developing it.

You can still browse it a bit online with some 3rd party sites: https://en-word.net/

jaen 1 hour ago|
The page literally credits "Open English Wordnet" (based on it) in the sidebar :)

(the link is broken though, it should be https://github.com/globalwordnet/english-wordnet)

reubenmorais 4 hours ago||
This reminds me of the classic "Growing a Language" talk by Guy Steele: https://www.youtube.com/watch?v=_ahvzDzKdB0
sspehr 1 hour ago||
There are some surprises like the word 'r'
breakingcups 2 hours ago||
It seems broken. The word "knows" only connects to the word "operator"
codeflo 2 hours ago|
It's likely that "knows" has no separate definition, but is used in some definition of "operator". If so, then "operator" should probably connect to "know", and "knows" shouldn't appear in the graph at all. But calling that edge case "broken" is a bit harsh, I think.
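
Roughly what I mean, as a sketch. The lemma table and the definition are made up, and I have no idea how the site actually tokenizes definitions. The point is just to map inflected forms to their base word before adding edges.

```python
# Hypothetical lookup from inflected form to base form. A real pipeline
# would more likely use a proper lemmatizer than a hand-written table.
LEMMAS = {"knows": "know", "knew": "know", "operators": "operator"}


def build_edges(definitions, lemmas):
    """Build (headword, defining word) edges with inflections folded in."""
    edges = set()
    for headword, used in definitions.items():
        source = lemmas.get(headword, headword)
        for token in used:
            target = lemmas.get(token, token)
            if target != source:  # skip self-loops
                edges.add((source, target))
    return edges


# Hypothetical definition that happens to use the inflected form "knows".
definitions = {"operator": ["person", "who", "knows", "machine"]}

edges = build_edges(definitions, LEMMAS)
```

With that, "operator" links to "know" and there is no stray "knows" node.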
castral 2 days ago||
It's an interesting visualization for sure, but I don't really know what I can take away from it. Is it useful for something?
h4ch1 2 days ago|
You can look at this as how small sets of a primitive lexicon give rise to a larger, more complex language. At least that's how I interpret it.
rhelz 2 days ago||
Beautiful! Thank you!
theodpHN 2 days ago|
Very neat. What software is being used to construct/display the graph?
wyattsell 2 days ago|
Glad you like it. NetworkX for creating the graph and the layout; then SigmaJS for displaying it.
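
For anyone curious, a minimal sketch of that kind of pipeline. This is not the site's actual code. The toy definitions, the choice of `spring_layout`, and the exported structure are all assumptions, and the exact format handed to SigmaJS is not shown here.

```python
import networkx as nx

# Hypothetical toy data: headword -> words used in its definition.
definitions = {
    "puppy": ["young", "dog"],
    "kitten": ["young", "cat"],
}

# Directed edge from each headword to every word that defines it.
G = nx.DiGraph()
for word, used in definitions.items():
    for token in used:
        G.add_edge(word, token)

# Compute 2D positions. spring_layout is one option among several;
# the seed just makes the result reproducible.
pos = nx.spring_layout(G, seed=42)

# Generic nodes/edges structure that a front end could consume.
export = {
    "nodes": [
        {"id": n, "x": float(pos[n][0]), "y": float(pos[n][1])}
        for n in G.nodes
    ],
    "edges": [{"source": u, "target": v} for u, v in G.edges],
}
```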