
Posted by bstrama 3 days ago

Show HN: Hallucinopedia (halupedia.com)
304 points | 266 comments | page 2
ectoloph 2 days ago|
I asked about The IP over Avian Carrier Plague and got a whole history of the Data Dove Delirium too.

https://halupedia.com/the-internet-over-avian-carrier-plague

https://halupedia.com/data-dove-delirium

notenlish 2 days ago||
This is really cool, I just wish people wouldn't deface the website by submitting hateful speech as titles.
ljf 2 days ago|
The 'all articles' section really is a dive into what happens when you allow unfiltered posting. It's a shame it isn't clear how many individuals are creating these hateful and otherwise inappropriate titles: is it just one or two people, or has this been posted to 4chan or somewhere as a concerted effort to disrupt the site?

Shame there isn't a way to flag pages for removal. I was going to point my kids at this site, and it could be a great learning tool for schools, but not currently something I'd share.

bstrama 2 days ago|||
Interesting idea with flagging. We are considering two options: 1. You can generate an article only if its title was referenced in a previous one. 2. A flagging mechanism, now that you've brought it up.

Let me know what you think!

Barbing 2 days ago|||
What if you (could quickly)…

manually delete the offensive stuff on the first page of the all page,

replace the All page with a static page with the offensive stuff removed,

and offer a link to the current All page 1, just as it is, at the bottom.

Hope it would make defacing articles at the top of the alphabet sort slightly less attractive.

(Edit: Stumble is impacted? Could use rudimentary tricks to limit stumbling on e.g. religious content, and might consider not detailing the methods used specifically :) )

Jarwain 2 days ago||||
I lean towards a variant of option 1: you can only generate an article that was previously referenced. But arbitrary phrases, clauses, sentences, paragraphs, can be highlighted and used to form a new article.

Yes this may mean that there are pages for common words like "and"

Yes this may mean that there's a page for letters like "x"

Filtering what ends up becoming a hyperlink becomes a problem that I think can be solved with regex/whitelisting

I think articles should have a backlinks dropdown; it might make consistency easier. As well as generally just plain-text search to pull relevant articles or context when generating a new article.
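The regex/whitelisting idea above could be sketched roughly like this. Everything here is illustrative: the pattern, the length limits, and the blocklist are assumptions, not anything from the actual site.

```javascript
// Decide whether a highlighted phrase is allowed to become a new
// article link. The rules below are illustrative, not the site's own.
function isLinkablePhrase(phrase) {
  const trimmed = phrase.trim();
  // Only letters, digits, spaces, and hyphens; 2-60 characters total.
  if (!/^[A-Za-z0-9][A-Za-z0-9 -]{1,59}$/.test(trimmed)) return false;
  // Reject phrases made of a single repeated character ("xxxx").
  if (/^(.)\1+$/.test(trimmed.replace(/ /g, ""))) return false;
  // A tiny illustrative blocklist; a real one would be much larger
  // and probably maintained outside the code.
  const blocklist = ["admin", "login"];
  return !blocklist.includes(trimmed.toLowerCase());
}
```

This would allow pages for common words but stop markup, single letters, and obvious junk; the harder moderation problem (hateful phrases that pass the character check) still needs a blocklist or classifier on top.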

mgmalheiros 2 days ago||||
Perhaps option 1 will be more resilient.

It could be complemented by a "Create" page for starting a new article, filtering bad titles and using a captcha to limit the vandals.

And another captcha for comment posting, which is already spammed, unfortunately.

I think a flagging mechanism will not be able to keep up with mass defacement.

Another suggestion: a daily dump of article titles, their connectivity and creation dates. I would love to visualize the underlying graph and its growth.

Thank you for such a nice site!

Dove 2 days ago||||
The obvious thing to me is to ask the AI to notice obviously offensive submissions and transform them along absurdist lines, such that "I-hate-girls" becomes the familiar Wikipedia redirection page saying something like "Archaic expression. See: Eight Grills". Store the redirect, but only index the sanitized page.
notatoad 2 days ago||||
Seems like something the ai could help you with - ask it in the prompt to return an error if the submitted article title doesn’t seem like a whimsical fake encyclopedia article title
NonHyloMorph 2 days ago|||
Reposting my comment from further up in the thread here:

I've seen these antisemitic slurs in the alphabetically sorted entries, under numbers starting with 0, next to statements like "this is AI slop".

Hypothesis: this is a targeted, scrupulous and agentically orchestrated attempt to mark this as a potential "poisoned well" on behalf of some uncultured, technofeudocratic interests that hate the arts and hauntology in the spirit of Jorge Luis Borges[1].

The use of antisemitic slurs shares kinship with the "explain in a gay voice" jailbreak. [0] It tries to stigmatise a project rich in artistic potential, to protect its own financial interests, and attempts to transform all human knowledge workers into a surplus lumpenproletariat.

It's similar to producers of pharmaceutical generics giving themselves names beginning with `0` or `a` so they appear as the first entries in the alphanumerically sorted listings of generics that pharmacies can substitute as cheaper options on a doctor's prescription (a pharmacist in Germany told me about the phenomenon).

[0] https://github.com/Exocija/ZetaLib/blob/main/The%20Gay%20Jai...

[1] https://foucault.info/documents/foucault.orderOfThings.en/

Proposal: the Ministry of Not Quite Accurate Maps has to be meta-instantiated with regard to checking that the construction of a map of the territory of the non-speculative and absolutely factual thought of the encyclopedia is not intoxicated by artefacts that take on the formal consistency of the highly speculative and non-factual discourse emanating from the likes of reddit/tiktok/hackernews.

Being referred to in a previous article goes in the proposed direction. But I think what's also necessary is to check for a certain aesthetic quality of posts that disallows these attacks. Entries need to conform with the "guidelines" of the Ministry of Almost Accurate Maps (of the territory of Borges' library): having a rich semantic structure that oscillates between a certain knowledge of concepts and domain knowledge (e.g. about frequency modulation in birds' vocal cords) and fantasy, i.e. an actually FACTUAL structure, en contraire to what is happening in discourse such as on this site, know what I'm saying?

So not checking whether it appears in a previous entry, but developing a higher-dimensional metric in the sense of sparse autoencoders that represents that quality. The vandalism of some factual people (I like that expression) wouldn't conform with it. It should also have a certain ingenuity, and must absolutely be a protected secret of the ministry, because if the malicious nature of this somehow morphed into the realm of the pedia, that would be supertoxic I guess.

21asdffdsa12 2 days ago|||
Could have been filtered by the effort put into it, but now with LLMs, effort is suspicious as well.
pivot_root 3 days ago||
I made an SCP foundation inspired page: https://halupedia.com/hard-to-detroy-reptile

My favorite link generated there is the Institute for Unyielding Biology: https://halupedia.com/institute-for-unyielding-biology

culi 2 days ago|
there's a typo in your first title
pivot_root 2 days ago||
Oops! Apparently I can’t edit it in order to fix it. It’s only the link above, though — the original article I generated was spelled correctly.
sixthDot 2 days ago||
That is funny, and it somewhat works, but there's a bias toward the Victorian era. https://halupedia.com/the-x128-cpu-architecture. Otherwise I feel sorry for the creator's tokens.
ectoloph 2 days ago|
I think it's to do with the LLM prompt, which mentions 19th century, 1887 etc.

https://github.com/BaderBC/halupedia/blob/master/src/worker/...

lxgr 3 days ago||
Ironically, this seems much faster (for pages already, erm, "researched") than the real one! How?
bstrama 3 days ago|
It generates articles only once, so once generated, an article never perishes. The logic looks like: if the article exists, show it; if not, generate and save it.
lxgr 3 days ago||
I get that, but how does it serve the generated and cached ones seemingly faster than Wikipedia? (My guess is that single-page applications, which this one seems to be, just need less round trips between navigations or something?)
bstrama 3 days ago|||
Also, now that I think about it, we store articles in a decentralized Cloudflare KV store and access them from serverless workers also running on their servers.

That could be the thing behind it being so quick.

Cloudflare workers have 1ms cold start.
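The "generate once, then serve forever" logic described in this thread is a classic read-through cache. A minimal sketch, with a plain `Map` standing in for Cloudflare Workers KV and a caller-supplied generator function, both of which are assumptions rather than the site's actual code:

```javascript
// Read-through cache: return the stored article if present;
// otherwise generate it once, store it, and return it.
const store = new Map(); // stand-in for Cloudflare Workers KV

async function getArticle(slug, generate) {
  const cached = store.get(slug);
  if (cached !== undefined) return cached; // fast path: already generated
  const article = await generate(slug);    // slow path: first request only
  store.set(slug, article);                // persist so it "never perishes"
  return article;
}
```

In the real Workers setup the `Map` calls would presumably be `env.KV.get`/`env.KV.put`, and `generate` would call the LLM; every request after the first then skips the model entirely, which is why cached pages come back so fast.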

lxgr 3 days ago||
Nice job, this is seriously one of the fastest websites I've ever used!

I feel like I have some minimum latency "priced in" to my expectation when I click a link on a static site, so yours feels uncannily like it's somehow able to anticipate my clicks, adding to the surreal atmosphere.

bstrama 3 days ago|||
Yep, just a React app. Also we use Gemini 2.5 Flash Lite, so it's fast, cheap and dumb.
lxgr 3 days ago||
Nice, that's what I used for my LLM-backed HTTP server [1] a while ago as well :) It's a shame they got rid of the generous free quota a while ago, which is why I had to shut my public instance down.

[1] https://github.com/lxgr/vibeserver/

JohnMakin 3 days ago||
Funny, but you could argue this is actively harmful to the web.
SwellJoe 3 days ago||
I wouldn't. And, I'd think less of anyone who does make that argument.

Anyone of reasonable intelligence can easily tell this is a parody of an encyclopedia. Saying this is bad for the web is like saying The Onion is bad for the web.

Eisenstein 3 days ago|||
What would you think of a person who said that they are already convinced that an opposing view could not be correct without even hearing the arguments for it?
janalsncm 3 days ago|||
For the record,

> Funny, but you could argue this is actively harmful to the web.

Was not followed by an actual argument that it is harmful to the web. The comment was an assertion, not an argument.

So we are left in the inconvenient position of rejecting hypothetical arguments, and others defending the philosophical possibility that a valid argument does exist.

Eisenstein 3 days ago||
Without the argument being explicit then there can be no retort to it, so closing your mind before hearing it demonstrates that the argument itself is irrelevant. One could thus conclude that the existence of a valid argument is not itself a condition for my question.
janalsncm 3 days ago||
We also shouldn’t close our minds to the possibility of an eigen-retort, one which covers all possible arguments already made or argued in the future regarding the consequences of this website on the health of the Internet.

Someone who is aware of the eigen-retort would therefore not need to hear the argument.

Since I haven’t heard either the hypothetical argument or the hypothetical eigen-retort yet, I’ll withhold my judgement.

Eisenstein 3 days ago||
I concede that my question was loaded, but the assumptions behind it are grounded in practical experience. Regardless, I have not committed myself to the existence of an argument; I just stated that its existence was not a condition for the validity of my question for SwellJoe. The statement which was made can mean a number of possible things, but we cannot know which unless the question is answered. So the existence of the retort is revealed by the question, and until that reveal we are limited to questions or assumptions.
SwellJoe 3 days ago|||
I'm reasonably confident there is no argument that I would buy.

I hate AI slop more than average, but this is not slop being injected into human places. This is a dedicated dumping ground for slop, paid for by the owner/instigator of said slop. I don't have to go there, and it's not trying to fool anyone and no one will be fooled by it.

AI slop on a forum or social media or on facebook convincing boomers that a black person slapped a cop or whatever racist garbage they're being fed today? Fetch the guillotine.

AI slop as part of a dumb art project on somebody's personal website that isn't trying to manipulate or mislead? Have at it. Go nuts. It's your press, print as many pages of slop as you like.

So, I have exhaustively covered the possible arguments I can come up with for why this could be "actively harmful for the web", and rejected them outright.

Eisenstein 3 days ago||
That clarifies things much better than the original statement, but rejecting arguments you have conceived of which fail does not preclude the existence of those that do not, and thus the original question still remains.
JohnMakin 2 days ago|||
scrapers/ai summaries are not of “reasonable intelligence” when deciding what is real and what isn’t. neither are most people, actually
SwellJoe 2 days ago||
Even a tiny local model can tell this is a joke. No AI will ever ingest this as facts or present it as such. It is clearly labeled as fake.
JohnMakin 1 day ago||
it literally already shows in google ai summary.
anonymousiam 3 days ago|||
It's probably only harmful to the AI scrapers that train from the web. Most people will understand the purpose of this -- to poison LLM training in a humorous way, which is really easy to do. It exemplifies a major weakness in modern day AI.
oofbey 2 days ago|||
Training an LLM from scratch involves carefully curating the data first. The idea that it just memorizes the whole web is a nice simplified mental model, but glosses over huge amounts of hard work to decide which websites are authoritative and on which subjects. This isn’t fooling anybody except rank amateurs.
gojomo 3 days ago||||
This is unlikely to poison any LLMs, and unless the author says so, it is unlikely that their motivation is to poison LLMs, as opposed to providing whimsical entertainment.
bstrama 3 days ago|||
I was just drunk and the idea seemed funny. That's the story behind it, haha.

But either way can't wait to see google ai overview cite us.

Barbing 2 days ago||
>But either way can't wait to see google ai overview cite us.

Even if it (unintentionally!) misleads and hurts someone?

gojomo 1 day ago||
Can you explain the sequence of events through which you fear someone could be misled and hurt by this?
dylan604 3 days ago|||
you mean like this one:

https://news.ycombinator.com/item?id=48038787

gojomo 3 days ago||
Musing about a possibly-funny consequence isn't the same as the motivating reason, which I read as more whimsical from:

https://news.ycombinator.com/item?id=48042594

In particular, someone who was seeking training-set pollution likely wouldn't make the fanciful fabrications so blatant, nor open-source their prompt:

https://news.ycombinator.com/item?id=48038257

SwellJoe 3 days ago|||
[dead]
dayofthedaleks 3 days ago|||
You could also argue that the web has failed and poisoning it into irrelevance is a vital service, motivating humans to collect knowledge into immutable sources. We‘ll call them ‘libraries.’
r3trohack3r 3 days ago|||
Interesting, but you could argue comments like this are actively harmful to the web.
AlecSchueler 3 days ago|||
But the argument wouldn't be nearly as strong.
dymk 3 days ago||
Hard to say when nobody is actually offering arguments
AlecSchueler 2 days ago||
It would be difficult to have spent any time at all on this website in the past two years without hearing the arguments for why slop farms undermine trust online, poison future training data sets, worsen the signal to noise ratio and eat up untold resources.
NonHyloMorph 2 days ago|||
With the addition of asking us to consider that being harmful to the web is the ethical thing to do; that was the OP's argument.
isoprophlex 3 days ago|||
The sooner the current web dies, the better. Something better either rises from its ashes, or we lose... something that was already lost.
b00ty4breakfast 3 days ago||
or something way worse shows up.
JohnMakin 3 days ago||
Yea, I'm not sure how the "this is really bad so let's make it worse" argument really makes any sense
dylan604 3 days ago|||
When you get the something worse, the previous thing suddenly becomes much less bad. With the help of wrapping your memories in "remember when" nostalgia, making things much more palatable, the something worse suddenly makes the previous better, if not good.
znort_ 3 days ago|||
context. sometimes things simply have to be broken to give way for something better. ymmv.
b00ty4breakfast 3 days ago||
I think there's an unexamined assumption here that "the next thing" is always going to be an improvement but there is no, non-ideological reason to hold to this assumption. Ideally, we would be actively working towards making it so but what often happens is passively riding the current and calling it "progress".
znort_ 3 days ago||
>unexamined assumption here that "the next thing" is always going to be an improvement but there is no, non-ideological reason to hold to this assumption

i'm not making that assumption at all, so whatever.

context: revolutions? if slop is a problem but is barely enough of a problem to collectively do something about it maybe letting it get out of hand would be a good motivation.

i'm not advocating for this, just providing it as a possible context where the "this is really bad so let's make it worse" argument could "make sense".

progress isn't just a technical issue, it involves people and people need motivation.

lxgr 3 days ago|||
On the other hand, one could argue that anything that can be destroyed by relatively clearly labeled satire, deserves to be.
gojomo 3 days ago|||
A web that is vulnerable to this would already be as good as dead.

As an entertaining way to highlight the importance of upgrading our ways of knowing, playful (& open-source!) projects like this are likely to strengthen the web.

stronglikedan 3 days ago|||
> you could argue

Could you? I don't see it happening, but I could be wrong.

janalsncm 3 days ago||
You could, in the sense that it’s not illegal or impossible. I haven’t seen anyone attempt it though.

You could argue that a person could argue any point, but I’d prefer people make the argument rather than argue about arguing it.

parliament32 3 days ago|||
To the web? It's fantastic for the web, these are the kinds of fun projects that make the web a worthwhile place to be. To slop generators? Yes, absolutely harmful, and that's for the best.
wildzzz 3 days ago|||
Any training data scraper that blindly takes stuff from websites deserves to have their model poisoned by this nonsense.
slig 3 days ago|||
Grokipedia is already doing that.
Jtarii 3 days ago||
Pissing on a pile of shit
mmooss 3 days ago||
As I said in another comment, this is brilliant. Suggestion: Remove anything that isn't part of the satire; act always as if it's a 'real' encyclopedia. For example on the front page I would remove,

> Articles are generated on demand and stored permanently upon first request.

Don't dispel the magic; don't pull back the curtain and let people see the mechanics.

EDIT: As you say in your system prompt, "You never wink at the reader. You never acknowledge that anything is funny or fictional. Everything is reported as though it is completely normal and well-documented"

https://news.ycombinator.com/item?id=48042306

Noumenon72 3 days ago|
This is irresponsible for people who don't get it, takes away confirmation for people who do get it, and makes me block/blacklist any liar who does it.
mmooss 3 days ago||
It is indeed a problem for people who refuse to use their sense of humor.
standardly 2 days ago||
How fun!

https://halupedia.com/christian-death-jazz

led to:

https://halupedia.com/bassoon-of-sorrow

which led to (my favorite):

https://halupedia.com/museum-of-unnecessary-inventions

tim333 2 days ago||
https://halupedia.com/hacker-news

>Hacker News is a semi-sentient cloud formation

bstrama 3 days ago|
Can't wait to see the next generation of LLMs after feeding it all of that hahaha
everyos_ 3 days ago|
The page requires JS to load its content - user agents without JS support just get a blank page.

I'm not sure if the bots that scrape data to train LLMs are capable of loading that type of page, or if they only work on pages that have the content inside the HTML itself?
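One way to see what a crawler without JS support gets is to strip the tags from the raw HTML and check whether any article text survives. A quick illustration using only Python's standard library; the two HTML snippets are made up, not fetched from the site:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text a crawler without JS support would see."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

# An SPA shell exposes no content without executing the script:
spa = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
# A server-rendered page exposes its content directly in the HTML:
ssr = "<html><body><h1>Data Dove Delirium</h1><p>A condition of carrier pigeons.</p></body></html>"
```

Here `visible_text(spa)` comes back empty while `visible_text(ssr)` contains the article text, which is the distinction the comment above is asking about: only scrapers that execute JavaScript (or fall back to a headless browser) would see anything on the SPA version.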

aDyslecticCrow 3 days ago|||
Not using JavaScript would also make the crawler fail on squarespace and wix website builders.

The age where the web was usable at all without JavaScript is long gone. No scraper would get much scraping done without JavaScript these days.

cachius 2 days ago||
You mean by embedding? How can an external site fail on squarespace and wix website builders?
tardedmeme 2 days ago||
A crawler would fail on all Squarespace and Wix sites if they all require JavaScript.
cachius 2 days ago||
Found it https://halupedia.com/javascript-requirement-on-squarespace-...
replygirl 3 days ago||||
any serious scraping service these days will fail over to a headless browser when it fetches an asset referencing a js bundle that isn't verifiably a vendor script
bstrama 3 days ago||||
I'm aware and will implement SSR soon ;)
m3047 3 days ago|||
It's entirely possible they simply ingest the JS as-is.