Posted by grep_it 4/16/2025
"don't site comment we here post that users against you're"
Quite a stance, man :)
And me, clearly inarticulate and less confident than some:
"it may but that because or not and even these"
I noticed that randomly remembered usernames tend to produce either lots of utility words like the above, or very few of them. Interestingly, that doesn't really correlate with my overall impression of them.
- aaronsw and jedberg share danielweber
- aaronsw and jedberg share wccrawford
- aaronsw and pg share Natsu
- aaronsw and pg share mcphage
Well, and I've worked a lot with Americans over text-based communication...
https://scikit-learn.org/stable/modules/generated/sklearn.ma...
I think other methods are more fashionable today
https://scikit-learn.org/stable/modules/manifold.html
particularly multidimensional scaling, but personally I think t-SNE plots are less pathological (they don't have as many of those crazy cusps that make me think the method is projecting down from a higher-dimensional surface which is nearly parallel to the page)
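If you want to eyeball the comparison yourself, here is a minimal scikit-learn sketch; the 500x64 matrix X is a random placeholder standing in for real embeddings, not the HN data:

# Minimal sketch: project the same vectors with MDS and with t-SNE.
# X is a random placeholder, not the actual HN fingerprint vectors.
import numpy as np
from sklearn.manifold import MDS, TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))

mds_2d = MDS(n_components=2).fit_transform(X)
tsne_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)

print(mds_2d.shape, tsne_2d.shape)  # both (500, 2), ready to scatter-plot

Plotting both side by side is usually enough to see whether the cusps the parent describes show up in one projection and not the other.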
After processing documents with BERT, I really like the clusters generated by the simple and old k-means algorithm
https://scikit-learn.org/stable/modules/generated/sklearn.cl...
It has the problem that it always finds 20 clusters if you set k=20, so something that really ought to be one big cluster might get treated as three little clusters, but the clusters I get from it reflect the way I see things.
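For context, a sketch of that kind of pipeline; using sentence-transformers for the BERT embeddings, the particular model name, and the toy documents are all my assumptions, not the parent's actual setup:

# Sketch of a BERT -> k-means pipeline (assumed libraries and model).
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

docs = ["first document ...", "second document ...", "third document ..."]

model = SentenceTransformer("all-MiniLM-L6-v2")    # assumed model choice
embeddings = model.encode(docs)                    # shape (n_docs, dim)

# k-means always returns exactly k clusters, which is the failure mode
# mentioned above: one natural cluster can get split into several.
k = 20
kmeans = KMeans(n_clusters=min(k, len(docs)), n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)
print(labels)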
You have three points near each other, and a fourth a bit more distant. Point 4's best match is point 1, but point 1's best matches are points 2 and 3.
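A made-up 2-D version of that asymmetry, just to make it concrete:

# Made-up illustration: point 4's nearest neighbor is point 1, but
# point 1's nearest neighbors are points 2 and 3.
import numpy as np

points = {
    1: np.array([0.0, 0.0]),
    2: np.array([0.0, 0.3]),
    3: np.array([0.0, -0.3]),
    4: np.array([1.0, 0.0]),   # the more distant point
}

def ranked_neighbors(name):
    dists = {k: np.linalg.norm(v - points[name])
             for k, v in points.items() if k != name}
    return sorted(dists, key=dists.get)

print(ranked_neighbors(4))   # [1, 2, 3]: 4's best match is 1
print(ranked_neighbors(1))   # [2, 3, 4]: 1's best matches are 2 and 3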
redis-cli -3 VSIM hn_fingerprint ELE pg WITHSCORES | grep montrose
montrose 0.8640020787715912
redis-cli -3 VSIM hn_fingerprint ELE montrose WITHSCORES | grep pg
pg 0.8639097809791565
So while cosine similarity is commutative, the quantization steps lead to slightly different results. But the difference is 0.000092, which in practical terms is not important. Redis can use non-quantized vectors via the NOQUANT option of VADD, but this makes the vector elements take 4 bytes per component: given that the recall difference is minimal, it is almost never worth it.
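To make the symmetry point concrete, here is a toy NumPy illustration (not Redis internals): cosine similarity itself is symmetric, but if one side of the comparison goes through an int8 round trip and the other does not, swapping the roles changes the score slightly:

# Toy illustration only, not how Redis stores or compares vectors.
import numpy as np

rng = np.random.default_rng(42)
a = rng.normal(size=300).astype(np.float32)
b = rng.normal(size=300).astype(np.float32)

def cos(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def q8_roundtrip(x):
    scale = np.abs(x).max() / 127.0                 # per-vector int8 scale
    return np.round(x / scale).astype(np.int8).astype(np.float32) * scale

print(cos(a, b))                 # exact cosine: symmetric by definition
print(cos(a, q8_roundtrip(b)))   # a vs quantized b
print(cos(b, q8_roundtrip(a)))   # b vs quantized a: a slightly different number

With full-precision vectors, which is what NOQUANT preserves, the two directions match exactly; the 0.000092 gap only appears once quantization is in the picture, and as noted above it is rarely worth 4 bytes per component to remove it.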