Posted by david927 11/9/2025
Ask HN: What Are You Working On? (Nov 2025)
Very happy with how the CSS came out, except I spent a lot of time on an "ink bleed" newsprint effect that (oops) only looks good on HiDPI monitors... lessons learned, I suppose.
TL;DR: Expense reports were killing me (and trees). Built my first coding project – a PDF merger that fits multiple receipts per page. Planned to charge "one bike tire worth" to recoup costs, but decided to make it free after learning so much from the community. [https://ahay.app/](https://ahay.app/)
So far I've got the scraping and embeddings / similarity clustering down (to build timelines of news stories), but lots of data cleaning and UI refinement are still required. I find it hard to make choices; maybe I need a cofounder who can pair up with me. I'm looking to either monetize the news data or build a news analysis / intelligence platform.
(I'm working on basic blog and video aggregators like Planet Python.)
So a paragraph might be fine as a 384-dim vector, but if you have 1,000 words you might want a 768-dim embedding (if not higher). Embedding models are slightly more or less accurate depending on the training data they're fed, but higher dimensionality generally gives better results, up to a point. If you have a very long piece of text, it's easier to chunk it into pieces and create a separate embedding for each. You do have to manually stitch the chunks back together and do some cleanup when displaying results, but it works.
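To make the chunking idea concrete, here is a rough sketch of splitting a long text into overlapping word windows and stitching consecutive chunks back together for display. The function names (`chunk_words`, `stitch`), the window size, and the overlap are all my own assumptions, not something from the setup described above; each chunk would then go to whatever embedding model you use.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping word windows.

    Returns a list of (start_word_index, chunk_text) tuples.
    max_words and overlap are illustrative defaults, not tuned values.
    """
    if overlap < 0 or max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append((start, " ".join(words[start:start + max_words])))
        # Stop once a window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def stitch(chunks, overlap=20):
    """Rejoin consecutive chunks, dropping the words repeated by the overlap."""
    out = []
    for i, (_, chunk_text) in enumerate(chunks):
        words = chunk_text.split()
        out.extend(words if i == 0 else words[overlap:])
    return " ".join(out)
```

Note that `stitch` only works on consecutive chunks produced with the same `overlap`; if your search returns non-adjacent chunks, you'd need to group them by `start_word_index` first.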
Once you have embeddings for all your data, the rest is just cosine similarity; play around with the min_similarity threshold. You will need to build good indexes in Postgres, but that is basically all you need.
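For anyone who wants to see the similarity step spelled out, here is a minimal in-memory sketch in plain Python. It is not the Postgres setup described above, just the same idea on a dict of vectors; `similar_items`, the example data, and the default `min_similarity` of 0.8 are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def similar_items(query, items, min_similarity=0.8):
    """Return (name, score) pairs at or above min_similarity, best first.

    items maps a name to its embedding vector (hypothetical layout).
    """
    scored = ((name, cosine_similarity(query, vec)) for name, vec in items.items())
    hits = [(name, score) for name, score in scored if score >= min_similarity]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Lowering `min_similarity` pulls in looser matches and raising it keeps only near-duplicates, so the right value depends on your model and data.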