Posted by robin_reala 6 hours ago
Every time I open the app I feel like I'm back in the era of Mac OS X Snow Leopard and Steve Jobs is about to reveal one more thing.
Ohhh, in NNW it goes via FreshRSS. I had no idea, cheers. I've been using just iCloud sync fairly successfully.
We need more software that is free, open source and comes with no subscriptions.
I haven't seen a newsreader solve that problem. Has anyone tried an LLM?
The best solution I know is grouping redundant stories together, possibly hierarchically: e.g., Sports > Olympics > Figure skating > Jones performance. (Fewer feeds require fewer levels, possibly just one.)
That roughly deduplicates the stories and, by displaying them together, lets you compare and choose the coverage you like and delete the rest. Otherwise, IME most user time is spent sorting through redundant stories one at a time.
But as I said, I haven't seen a newsreader do that well. It seems like a good fit for LLMs. Or maybe there's another solution besides grouping?
For duplicate detection I am using DBSCAN
https://scikit-learn.org/stable/modules/generated/sklearn.cl...
and found some parameters where I get almost no false positives, but a lot of duplicates get missed. When I lowered the threshold to make clusters, I started getting false positives fast. I don't find duplicates are a big problem in my system, with the 110 feeds I have and the subjects I am interested in, but insofar as they are a problem, there tend to be structured relationships between articles. That is, site A syndicates articles from site B, but for some reason articles from site A usually get selected and site B articles don't. An article from site A often links to one or more articles, often ones that I don't have a feed for, and it would be nice if the system looked at the whole constellation. Stuff like that.
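For anyone wanting to try the DBSCAN approach, here is a minimal sketch, not the commenter's actual code. It assumes TF-IDF vectors over article titles and cosine distance; the function name `find_duplicate_groups` and the default `eps` are made up for illustration, and the real system may use different features and parameters.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer


def find_duplicate_groups(titles, eps=0.3, min_samples=2):
    """Group near-identical titles; returns lists of indices into `titles`.

    Hypothetical helper: TF-IDF features and the eps default are assumptions.
    """
    # Dense array keeps the sketch simple; fine for a few thousand titles.
    vectors = TfidfVectorizer().fit_transform(titles).toarray()
    # eps is the maximum cosine distance between neighbours. A larger eps
    # merges more aggressively (more duplicates found, more false positives).
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(vectors)
    groups = {}
    for index, label in enumerate(labels):
        if label != -1:  # -1 marks noise, i.e. an article with no duplicate
            groups.setdefault(int(label), []).append(index)
    return list(groups.values())
```

The trade-off described above lives almost entirely in `eps`: a tight value keeps clusters clean but misses reworded duplicates, and a loose one starts merging unrelated stories.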
Effective clustering is the really interesting technology Google News has had for a long time.
Edit: I just looked around for your YOShInOn RSS reader code and couldn't find it. I did find a number of references it looks like you've made to it on various forums, etc. over the years.
You mean the k-means for diversity or DBSCAN for duplicates? Either way it is about 10 lines of scikit-learn code. Send me an email.
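As a rough idea of what "k-means for diversity" could look like in scikit-learn, here is a sketch under assumptions: cluster the articles into k groups and keep the one article nearest each centroid. The name `pick_diverse` and the TF-IDF features are hypothetical; the actual YOShInOn code is not public in this thread and may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def pick_diverse(titles, k=3, seed=0):
    """Return up to k indices into `titles`, one representative per cluster.

    Hypothetical helper: the selection rule (nearest to centroid) is an assumption.
    """
    vectors = TfidfVectorizer().fit_transform(titles).toarray()
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(vectors)
    picks = []
    for cluster in range(k):
        members = np.flatnonzero(model.labels_ == cluster)
        if members.size == 0:
            continue  # defensive: skip a cluster that ended up empty
        # Distance of each member to its cluster centre; keep the closest one.
        distances = np.linalg.norm(vectors[members] - model.cluster_centers_[cluster], axis=1)
        picks.append(int(members[np.argmin(distances)]))
    return sorted(picks)
```

Picking one article per cluster spreads the selection across topics instead of returning several takes on the same story.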
Nuzzel did something similar for Twitter but shut down (https://daringfireball.net/linked/2021/05/05/nuzzel).
That would be a good addition to feed readers, especially for news feeds.
You specify your interests as free form text, it ranks articles by how closely they match, and you can consume your Scour feed as an RSS feed to read it in NNW.
Disclaimer: I’m the developer
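To make "ranks articles by how closely they match" concrete, here is a toy sketch using bag-of-words cosine similarity from the standard library. This is only an illustration of the general idea, not how Scour works; the function names are hypothetical, and a real service would more likely use something richer than word overlap.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count Counters (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_by_interest(interest, titles):
    """Return (score, title) pairs, best match for the free-form interest first."""
    target = tokenize(interest)
    scored = [(cosine(target, tokenize(title)), title) for title in titles]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The ranked list could then be written out as a feed so that any reader can consume it.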