Posted by simedw 22 hours ago
A natural next step could be doing things with multiple "tabs" at once, e.g.: tab 1 contains news outlet A's coverage of a story, tab 2 has outlet B's coverage, tab 3 has Wikipedia; summarize and provide references. I guess the problem at that point is whether the underlying model can support this type of workflow, which doesn't really seem to be the case even with SOTA models.
But if we do it, we have to admit something hilarious: we will soon be using AI to convert text provided by the website creator into elaborate web experiences, which end users will strip away before consuming it in a form very close to what the creator wrote down in the first place (this is already happening with beautifully worded emails that start with "I hope this email finds you well").
I think this is basically what https://ground.news/ does.
(I'm not affiliated with them; just saw them in the sponsorship section of a Kurzgesagt video the other day and figured they're doing the thing you described +/- UI differences.)
I was thinking of showing multiple tabs/views at the same time, but only from the same source.
Maybe we could have one tab with the original content optimised for CLI viewing, and another tab just doing fact checking (it could be grounded with Google Search or Brave). Would be a fun experiment.
(The fact that browsers nowadays are usually expected to represent something "pixel-perfect" to everyone with similar devices is utterly against the original intention.)
Yet the original idea was (due to the state of technical possibilities) primarily about design and interactivity. The fact that we now have tools to extend this concept to core language and content processing is… huge.
It seems we're approaching the moment when our individual personal agent, when asked about a new page, will tell us:
Well, there's nothing new of interest for you, frankly:
All information presented there was present on pages visited recently.
-- or --
You've already learned everything mentioned there. (*)
Here's a brief summary: …
(Do you want to dig deeper, see the content verbatim, or anything else?)
Because its "browsing history" will also contain a notion of what we "know" from chats or what we had previously marked as "known".
Or that I’m looking up a data point that I already actually know, just because I want to provide a citation.
But, it could be interesting.
> Or that I’m looking up a data point that I already actually know, just because I want to provide a citation.
Or what we know has changed. When I was a child we knew that the North Star consisted of five suns. Now we know that it is only three suns, and through them we can see another two background stars that are not gravitationally bound to the three suns of the Polaris system.
Maybe in my grandchildren's lifetimes we'll know something else about the system.
Or (and this is actually doable absolutely without any "AI" at all):
What the bloody hell actually newly appeared on this particular URL since my last visit?
(There is one page nearby that would be quite unusable for me, had I not a crude userscript aid for this particular purpose. But I can imagine having a digest about "What's new here?" / "Noteworthy responses?" would be way better.)
For the "I need to cite this source" case, naturally, you would want the "verbatim" view without any amendments anyway. Also, before sharing the resource or directing someone to it, looking at the "true form" would probably still be pretty necessary.
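The "what newly appeared on this URL since my last visit" idea really doesn't need a model. A minimal sketch, assuming you already keep a plain-text snapshot of the page from the previous visit (how the snapshot is fetched and stored is left out, and `new_lines` is a made-up helper name):

```python
import difflib


def new_lines(previous: str, current: str) -> list[str]:
    """Return the lines that appear in `current` but not in `previous`."""
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff prefixes added lines with "+ "; drop the two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]


# Hypothetical snapshots of a comment thread, one comment per line.
old_snapshot = "Comment A\nComment B"
new_snapshot = "Comment A\nComment B\nComment C"

print(new_lines(old_snapshot, new_snapshot))  # ['Comment C']
```

A line-level diff like this is crude (a reworded line shows up as new), but it is deterministic, which is the point.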
So, you gonna “put on those sunglasses, or start chewing on that trashcan?” It’s a distinction without a difference!
For this to work like a user would want, the model would have to be sentient.
But you could try to get there with current models; it'd just be very untrustworthy, to the point of being pointless beyond a novelty.
Naturally, »nothing new of interest for you« here is indeed just a proxy for »does not involve any significant concept that you haven't previously expressed knowledge about« (or however to put it), which seems pretty doable, provided that a contract of "expressing knowledge about something" had been made beforehand.
Let's say that all pages you have ever bookmarked you have really grokked (yes, a stretch, no "read it later" here) - then your personal model would be able to (again, figuratively) "make a qualified guess" about your knowledge. Or some kind of tag that you could add to any browsing history entry, or fragment, indicating "I understand this". Or set the agent up to quiz you when leaving a page (that would be brutal). Or … I think you got the gist now.
You should also have some way for the LLM to indicate there is no useful output because perhaps the page is supposed to be a SPA. This would force you to execute JavaScript to render that particular page though.
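One cheap way to approximate the "no useful output" signal is to check before calling the LLM at all: if the raw HTML has scripts but almost no visible text, it is probably a JavaScript shell. This is a rough heuristic rather than anything the project does; `looks_like_js_shell` and the 200-character threshold are assumptions picked for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Counts visible text characters and <script> tags in a page."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chars = 0
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that is outside script/style/noscript/template.
        if not self._skip_depth:
            self.chars += len(data.strip())


def looks_like_js_shell(html: str, min_chars: int = 200) -> bool:
    """Guess whether a page needs JavaScript to show any real content."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.scripts > 0 and parser.chars < min_chars


# Hypothetical SPA shell: an empty mount point plus a bundle.
shell = ('<html><head><title>App</title></head><body>'
         '<div id="root"></div><script src="/bundle.js"></script></body></html>')
print(looks_like_js_shell(shell))  # True
```

When the check fires, the browser could report "this page needs JavaScript" instead of handing the model an empty shell to summarise.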
I think the primary reason I use multiple tabs but _especially_ multiple splits is to show content from various sources. Obviously this is different than a terminal context, as I usually have Figma or API docs in one split and the dev server on the other.
Still, being able to have textual content from multiple sources visible or quickly accessible would probably be helpful for a number of users.
almost unrelated, but you can also compare spegel to https://www.brow.sh/
LLMs are specifically good at a task like this because they can extract content from any webpage, regardless of whether it supports whatever standard that no one implements.
Just hitting keywords for search? Many of them don't even have ads so I feel like that can't be it. Maybe referrals?
This is a requirement? I literally only browse the web with an ad blocker but I always assumed those sites had tons of ads.
I should have caught that, and there are probably other bugs too waiting to be found. That said, it's still a great recipe.
On a more pleasant topic the original recipe sounds delicious, I may give it a try when the weather cools off a little.
Edit: just saw the author's comment, I think I'm looking at the fixed page
My #1 use case is fetching wikis onto my hard drive and letting a local coding agent use them for creating plans.
Using a big cloud provider for this is madness.
Is it though, when the LLM might mutate the recipe unpredictably? I can't believe people trust probabilistic software for cases that cannot tolerate error.
Seems like most of the usual food blog plugins use it, because it allows search engines to report calories and star ratings without having to rely on a fuzzy parser. So while the experience sucks for users, search engines use the structured data to show carousels with overviews, calorie totals and stuff like that.
https://recipecard.io/blog/how-to-add-recipe-structured-data...
https://developers.google.com/search/docs/guides/intro-struc...
EDIT: Sure enough, if you look at the OP's recipe example, the schema is in the source. So for certain examples, you would probably be better off having the LLM identify that it's a recipe website (or other semantic content), extract the schema from the header and then parse/render it deterministically. This seems like one of those context-dependent things: getting an LLM to turn a bunch of JSON into markdown is fairly reliable. Getting it to extract that from an entire HTML page is likely to clutter the context, but you could separate the two and have one agent summarise any of the steps in the blog that might be pertinent.
{"@context":"https://schema.org/","@type":"Recipe","name":"Slowly Braised Lamb Ragu ...
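For the deterministic half of that, here is a minimal sketch using only the standard library. It assumes the page embeds schema.org data as JSON-LD in a `<script type="application/ld+json">` tag; the sample page, the "Example Toast" recipe, and the helper names (`JsonLdExtractor`, `find_recipe`, `render_markdown`) are all made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than guess


def find_recipe(html):
    """Return the first schema.org Recipe node found in the page, or None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for item in parser.items:
        if isinstance(item, list):
            nodes = item
        elif isinstance(item, dict):
            nodes = item.get("@graph", [item])
        else:
            nodes = []
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type") or []
            if isinstance(types, str):
                types = [types]
            if "Recipe" in types:
                return node
    return None


def render_markdown(recipe):
    """Render a Recipe node as markdown without touching any of its values."""
    lines = [f"# {recipe.get('name', 'Untitled recipe')}", ""]
    if "cookTime" in recipe:
        lines += [f"Cook time: {recipe['cookTime']}", ""]
    lines.append("## Ingredients")
    lines += [f"- {ing}" for ing in recipe.get("recipeIngredient", [])]
    lines += ["", "## Steps"]
    steps = recipe.get("recipeInstructions", [])
    if isinstance(steps, str):
        steps = [steps]
    for i, step in enumerate(steps, start=1):
        text = step.get("text", "") if isinstance(step, dict) else str(step)
        lines.append(f"{i}. {text}")
    return "\n".join(lines)


# Hypothetical page, not the recipe from the post.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Recipe", "name": "Example Toast",
 "cookTime": "PT5M",
 "recipeIngredient": ["2 slices bread", "1 tbsp butter"],
 "recipeInstructions": [{"@type": "HowToStep", "text": "Toast the bread."},
                        {"@type": "HowToStep", "text": "Spread the butter."}]}
</script>
</head><body><p>A long story about my grandmother...</p></body></html>
"""

recipe = find_recipe(SAMPLE_HTML)
if recipe is not None:
    print(render_markdown(recipe))
```

Amounts and cook times are copied straight from the JSON, so nothing gets rewritten along the way. The LLM would only be needed as a fallback when `find_recipe` returns `None`, or for summarising the surrounding blog text.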
Also like, forget amounts, cook times are super important and not always intuitive. If you screw them up you have to throw out all your work and order take out.
And yes cook times are important but no, even for a human-written recipe you need the intuition to apply adjustments. A recipe might be written presuming a powerful gas burner but you have a cheap underpowered electric. Or the recipe asks for a convection oven but your oven doesn't have the feature. Or the recipe presumes an 1100W microwave but you have a 1600W one. You stand by the food while it cooks. You use a food thermometer if needed.
For one, an AI-generated recipe could be something that no human could possibly like, whereas the human recipe comes with at least one recommendation (assuming good faith on the source, which you're doing anyway, LLM or not).
Also an LLM may generate things that are downright inedible or even toxic, though the latter is probably unlikely even if possible.
I personally would never want to spend roughly an hour or so making bad food from a hallucinated recipe wasting my ingredients in the process, when I could have spent at most 2 extra minutes scrolling down to find the recommended recipe to avoid those issues. But to each their own I guess.
I think the ad blocker of the future will be a local LLM, small and efficient. Want to sort your timeline chronologically? Or want a different UI? Want some things removed, and others promoted? Hide low quality comments in a thread? All are possible with an LLM in the middle, in either agent or proxy mode.
I bet this will be unpleasant for advertisers.
But I feel it doesn't solve the main issue of terminal-based web browsing. Displaying HTML in the terminal is often kind of ugly and CSS-based fanciness does not work at all, but that can usually just be ignored. The main problem is JavaScript and dynamic content, which this approach just ignores.
So no real step forward for CLI web browsing, imo.
In theory this could be used for ad blocking; it would be more expensive and less efficient than existing blockers, but the idea is there.
So, it is a very curious idea, but we still have to find an appropriate use case.
They are pretty great at converting data between formats, but I always worry there's a small chance it changes the actual data in the output in some small but misleading way.
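One partial guard against that worry is to compare the numbers in the source text with the numbers in the converted output and flag any that appear or disappear. A rough sketch; `number_drift` is a hypothetical helper, and it only catches numeric changes, not swapped ingredients or reworded steps.

```python
import re
from collections import Counter

# Matches integers plus simple decimals, fractions, and thousands groups
# such as 180, 1.5, 1/2, or 1,000.
NUMBER = re.compile(r"\d+(?:[.,/]\d+)*")


def number_drift(source: str, converted: str) -> dict[str, list[str]]:
    """Report numbers that were lost from, or added to, the converted text."""
    src = Counter(NUMBER.findall(source))
    out = Counter(NUMBER.findall(converted))
    return {
        "missing": sorted((src - out).elements()),
        "unexpected": sorted((out - src).elements()),
    }


# Hypothetical example where the model swapped two digits.
source = "Bake 45 minutes at 180 C with 2 cups flour"
converted = "Bake 54 minutes at 180 C with 2 cups flour"

print(number_drift(source, converted))
# {'missing': ['45'], 'unexpected': ['54']}
```

Numbers the conversion adds legitimately, such as markdown list numbering, would also show up as "unexpected", so a real check would need to strip those first.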