Posted by azath92 2 days ago
Our goal was to build a tool that let us test a range of "personal contexts" on a very focused, everyday use case for us: reading HN!
We are exploring the use of personal context with LLMs: specifically, what kind of data, how much of it, and how much additional effort on the user's part is needed to get decent results. The test tool was a bit of fun in its own right, so we re-skinned it and decided to post it here.
First time posting anything on HN, but folks at work encouraged me to drop a link. Keen for feedback, or to hear about other interesting projects thinking about bootstrapping personal context for LLM workflows!
The tension we have been finding is that we don't want to require people to "know how to prompt" to get value out of having a profile, hence our ongoing thinking about how to bootstrap good personal profiles from various data sources.
As Koomen notes, a good profile feels like it could be the best weapon against "AI slop" in cases where I want something sharp and specific. But getting there usually requires knowing how to prompt.
edit: ooh, I see what the swiping did:
## Analysis of user's tech interest: The user demonstrates a strong interest in advanced technical topics, particularly in the realm of artificial intelligence, machine learning, and low-level systems programming/security (e.g., kernel exploitation). They are drawn to articles that involve practical application, model creation, and deep dives into complex technical architectures. Their interest in "Show HN" articles suggests an appreciation for new, innovative projects, especially those with a technical or AI focus. They show less interest in general hardware announcements (like new microcontrollers), historical tech accounts, or very niche, non-AI/ML/security-related programming topics.
Yeah, that's pretty much spot on. Wonder if there's a way to match that against the topics I actually commented on, but at a glance it's pretty cool!
Other than quality of life stuff (multiple pages for example), I'd like to see it continually learn.
A few things got miscategorized and I'd love for it to naturally correct that with additional input from me.
The idea of having some kind of thumbs up/down on what you see after getting recs, which then gets added to your preferences, or being able to do another round of preferences (rather than just re-doing them like we have now), is for sure in our next steps if we continue with this. We're not quite sure what the feedback loops will be yet (we did look at adding whole web history, for example, but that felt like a bit much and pretty invasive).
For the miscategorizations, on a meta level what we are generally interested in is whether they come from the compression of the preferences into your user profile (essentially, whether more or better data is the path to better context for such a specific use case), or whether there is more bang for buck in optimizing the various prompts. Keen to hear if it's obvious from looking at your profile which it was.
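For the curious, that compression step is roughly a single LLM call that turns the ~30 swipes into the markdown profile. A minimal sketch of the shape of it (function names and prompt wording are illustrative, not our actual code):

```ts
import { generateText } from "ai";
import { google } from "@ai-sdk/google";

// One swipe from the onboarding flow: a headline plus the
// user's verdict ("dive" = wanted to read it, "skip" = didn't).
type Swipe = { title: string; verdict: "dive" | "skip" };

// Compress ~30 swipes into a short, editable markdown profile.
async function buildProfile(swipes: Swipe[]): Promise<string> {
  const dives = swipes.filter((s) => s.verdict === "dive").map((s) => s.title);
  const skips = swipes.filter((s) => s.verdict === "skip").map((s) => s.title);

  const { text } = await generateText({
    model: google("gemini-2.5-flash"),
    prompt: [
      "Summarise this reader's interests as a short markdown profile.",
      "Headlines they chose to dive into:",
      ...dives.map((t) => `- ${t}`),
      "Headlines they skipped:",
      ...skips.map((t) => `- ${t}`),
    ].join("\n"),
  });

  return text; // stored locally as markdown the user can edit
}
```

The interesting knob is how lossy that summarisation is allowed to be, which is exactly where miscategorizations could sneak in.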
If we get serious with this, evals are a must as a next step. We are only 2 days in at the moment :)
In my case, none of the topics I most like to read about and discuss on HN (package management, software freedom, next-gen CLI tools, next-gen shells, philosophy, desktop Linux, functional programming, hacker history, literate programming, Emacs, bitching about common development practices, programming language design, configuration languages) managed to appear in the 30-post sample I used. The profile it wrote for me was pretty good considering that, but definitely not great.
The assessment was also mistaken about my degree of interest in "low level" technical details like binary file formats (in fact it's rather low, although it has gradually increased over time), and my degree of interest in theoretical computer science issues (in fact it's high, but all of the theoretical papers in the sample were about machine learning, which was not an area of academic focus for me).
I do really like the simplicity and customizability of this (exposing the profile as Markdown and making it editable is awesome), and the quality of the results is very good given the tiny input size. But if your primary interests are not super aligned with the mainstream on HN, you won't get a chance to demonstrate that you like them. If users could type a few terms to say what their biggest interests are before running through the samples, this could work even better for people like me.
It would also be interesting if this could work based on article contents and not just headlines. Sometimes I open something and close it immediately, or I open it undecided as to whether I will skim or read closely.
In fact I would posit that I have a couple of disparate interests or "profiles" that I would like greater control over/support in generating: non-overlapping sets of topics and types of content. The ability to have greater agency in creating and managing them is something we are keen to explore.
The article-contents one is a toughie, as LLM use skyrockets when you scrape and consume content from the links. It would be awesome to include, but would likely need to be paid, just from a cost perspective.
Really appreciate the detail here, this makes it easier to turn your examples into a test/eval/feature case.
This sounds like a great feature! My appetite for different clusters of content certainly varies according to my mood! Perhaps "mood" would actually be a cute-but-clear name for such distinct/multiple profiles. :)
> The article-contents one is a toughie, as LLM use skyrockets when you scrape and consume content from the links. It would be awesome to include, but would likely need to be paid, just from a cost perspective.
Hm. That is a good (and in retrospect, obvious) point. If it makes the feed a lot better, I think it could certainly be worth it for some users. If it only makes a small difference, maybe not. It might be interesting for you to experiment and write about, since what kind of difference it will make isn't obvious (at least to me) up front.
We will have to do some combination of much more internal testing, constructing evals, or just capturing more info about people's usage, coupled with an ability to provide feedback, in order to even get a handle on something as nuanced as "good" with a tool like this. Likely info capture and user feedback would be the first port of call for a substantive change; internal testing is always ongoing, but with such a low sample size it only goes so far.
Also, I know that depending on the day/week/mood I will want to read different content from HN, so I guess there should still be like 30% "random articles" in each category just to create some noise.
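Something like this, just a throwaway sketch of the idea (not anything from the actual tool):

```ts
// Mix ~30% random articles into each category so the feed keeps
// some serendipity instead of collapsing into a filter bubble.
function mixFeed<T>(ranked: T[], pool: T[], size: number, noise = 0.3): T[] {
  const nRandom = Math.round(size * noise);
  const top = ranked.slice(0, size - nRandom);
  const rest = pool.filter((item) => !top.includes(item));
  // Fisher-Yates shuffle so every leftover item is equally likely
  for (let i = rest.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [rest[i], rest[j]] = [rest[j], rest[i]];
  }
  return [...top, ...rest.slice(0, nRandom)];
}
```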
The generated page was really off for me — I had read most of the posts it ranked, at least a little, and most recommended as skips were some of my favorite recent submissions, and vice versa with dives.
On the other hand I'm not sure I'd want to use something like this much as something I like about HN are the pleasant surprises. Maybe as a side page or something if I were really in a rush?
We played around with the idea of a "fun" or "random" category, but ultimately didn't include it in this first little demo, as we found it super hard to have it not be just literally random (although, as you say, that might not be a bad thing).
On the topic of different moods and headspaces, that's one of the things we are really thinking about more broadly outside of this demo, and hadn't really considered for here, but should. What different data we can use (in this case maybe just a different survey for different "profiles"), and how a user can manage those different profiles and front pages, will be questions to answer.
I'd be really interested to know if anyone has done topic-grouped or themed front pages for Hacker News, as this would map well to that concept. I'll have a look.
More generally, a next feature we want for ourselves is a way to add some generic text and "update" the profile with it, rather than generating it fresh exclusively off the 30 examples. This circles back to using this as a focal point for thinking about what data is enough to generate a good user profile, and what "good" even is.
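Concretely, that update could be as simple as feeding the current profile plus the new note back through the model. A rough sketch, assuming the same Gemini setup as the generator (prompt wording illustrative):

```ts
import { generateText } from "ai";
import { google } from "@ai-sdk/google";

// Fold a free-form note into the existing markdown profile,
// rather than regenerating it from scratch off the 30 examples.
async function updateProfile(profile: string, note: string): Promise<string> {
  const { text } = await generateText({
    model: google("gemini-2.5-flash"),
    prompt:
      `Here is a reader's current interest profile:\n\n${profile}\n\n` +
      `They added this note about their preferences:\n\n"${note}"\n\n` +
      "Rewrite the profile to incorporate the note. Keep it short markdown.",
  });
  return text;
}
```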
I had an expectation that it'd go through posts and give me stuff I'd be interested in. Like, here are 25 posts that would be interesting?
Only the front page? No second page? No sort by new, which is my preference.
When we've been testing things, we often find that if there isn't a great match between the options shown when picking preferences and what's currently on the front page, the context it generates will result in a lot of skips (understandable, but not great UX). Right now you can try regenerating your context (going through the process again), or manually editing it to get different results.
There's also some work for us to do in better selecting the options when picking preferences, or in ensuring we always surface some deep dives.
Applying the same process to more pages, bubbling up content from multiple pages, or covering the new sort is a great idea. Cool to hear that's where you would look.
Would you be willing to share some more of the architecture/tech stack?
On the LLM side of things we are using Gemini 2.5 Flash, mostly for speed, and found it reasonably good quality at a vibe level compared to something heavier like Claude 4, probably because we've worked hard to keep the task very simple and explicit. That said, there are a bunch of comments here on quality that really highlight that if we want to get serious about it, we should put in some user feedback loops and evals.
It's all in JS/TS, using the Vercel AI SDK for the LLM calls. Storage is local; to really dig into quality we might start saving things, but to do that well we'd have to add auth/users etc., and we wanted to keep it light for a demo. We have recently been exploring Langfuse for tracing, are really liking it, and will probably look at using it for first-pass evals when we get to that for this project.
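To give a flavour, the core ranking step is roughly one structured-output call per front page. A simplified sketch (the schema and verdict categories here are illustrative, not the code verbatim):

```ts
import { generateObject } from "ai";
import { google } from "@ai-sdk/google";
import { z } from "zod";

// Score every headline on the page against the stored markdown
// profile in a single structured-output request.
async function rankFrontPage(profile: string, titles: string[]) {
  const { object } = await generateObject({
    model: google("gemini-2.5-flash"),
    schema: z.object({
      rankings: z.array(
        z.object({
          title: z.string(),
          verdict: z.enum(["dive", "skip"]),
          reason: z.string(), // one-line justification for transparency
        })
      ),
    }),
    prompt:
      `Reader profile:\n${profile}\n\n` +
      "Classify each headline for this reader:\n" +
      titles.map((t) => `- ${t}`).join("\n"),
  });
  return object.rankings;
}
```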
We also talked quite a bit about non-LLM recsys, and aside from the time to set it up and do it well, something I really like about the current approach is the sense of transparency and agency: you can see your profile, and edit it if you like, to see the change in your results. I almost think we'd lean further into that rather than folding in some trad DS or recsys stuff, even if that might make the results better. Just musings at this point though.
Richer data for building a profile is something we've looked at a bunch for other projects, and it could get folded in here if we decide to make this more persistent.