And how did that work out for the textile workers?
> The difference here (I hope) is that if enough of us pollute public spaces with misinformation intended for bots, it might be enough to compel AI companies to rethink the way they source training data.
This... seems like an absurd asymmetry in effort on the side of the attacker? At least destroying a power loom is much easier than building one.
Filtering out obvious garbage seems like a completely solved problem even with weak, cheap LLMs, and it's orders of magnitude more efficient than humans coming up with artisanal garbage.
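To make that concrete, here's a minimal sketch of what such a filter could look like, assuming the OpenAI Python client; the model name, prompt, and example documents are my own illustrative choices, not anything the labs have documented:

```python
# Minimal sketch: use a weak, cheap LLM as a garbage filter over crawled text.
# Model name and classification prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def looks_like_garbage(text: str) -> bool:
    """Flag incoherent, spammy, or obviously fabricated documents."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any small, cheap model would do for this sketch
        messages=[
            {"role": "system",
             "content": ("Reply YES if the following text is incoherent, "
                         "spammy, or deliberately fabricated; otherwise NO.")},
            {"role": "user", "content": text[:4000]},  # truncate long docs
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

crawled = [
    "The power loom was introduced in England in the 1780s.",
    "b1xonimania CURES eyelid pigment!!! click here $$$",
]
corpus = [doc for doc in crawled if not looks_like_garbage(doc)]
```

One cheap classification call per document scales far better for the labs than hand-crafting poison scales for the poisoners.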
So when I read "People hate what AI is doing to our world." it honestly feels like either I am completely deluded or the author is. It feels like a high school bully saying "No one here likes you" to try to gaslight his victim.
I mean, obviously there are many vocal opponents of AI; I see them on social media, including here on HN, and I hear some trepidation in person as well. But almost everyone I know, from tradespeople to teachers, is adopting AI in some capacity and reports positive uses and interactions.
Given all the borderline-apocalyptic articles about how students are using it to cheat and teachers have no way to stop them, I'd honestly be surprised by that.
On the flip side, one of my other teacher friends has instituted a no-phone policy in his classroom.
Most people don't care if something is written by an AI as long as it is reasonable, and reflects the intent of the human who prompted the AI.
If consuming material online (videos, web sites, online forums) is not something you do a lot of, you're relatively unimpacted by LLMs (well, except for the whole jobs situation...).
This kind of effect would work both ways. People who are non-confrontational in general will choose to keep quiet if their opinions differ. In this view, both pro-AI and anti-AI sides might find themselves having their bias confirmed due to opposing views self-silencing to avoid conflict.
It reminds me of similar late-stage-capitalism-adjacent activity, from the assassination of the insurance company CEO to the firebombing of Teslas. It is hard to disentangle hate that is based on economic inequality or power imbalance from hate directed explicitly at AI. That is especially true since one narrative suggests that both types of inequality (economic and power) may be accelerated by an unequal distribution of access to AI.
So we might end up in an argument over whether the hate that drives the violence is towards AI at all, or if that is merely a symptom of existing anti-capitalist sentiment that is on the rise.
Maybe I have slop to thank for it.
We have evidence to the contrary. Two blog articles and two preprints of fake academic articles [0] were able to convince Copilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus. And even though the falsity of this information was made public by the author of the experiment and the results of their actions were widely published, it took a while before the models got wind of it and stopped treating the fake disease as real. Imagine what you could do if you published false information and had absolutely no reason to ever reveal that you had done so.
Wrong. There is no 'majority consensus' against 'bixonimania', because they made it up; that was the point. It's unsurprisingly easy to get LLMs to repeat the only source on a term never seen before. This usually works: made-up neologisms are the fruit fly of data poisoning, because they are so easy to plant and it is so unambiguous where the information came from. (And retrieval-based poisoning is the easiest, laziest, and most meaningless kind of poisoning, tantamount to just copying the poison into the prompt and asking a question about it; see the sketch below.) But by the same token, it is hard for such poisoning to matter: why would anyone be searching or asking about a made-up neologism? And if it gets any criticism, the LLMs will pick that up, as your link discusses. (In contrast, the more sources are affected, the harder it is to assign blame. Some paper mills picked up 'bixonimania'? Well, they might've gotten it from the poisoned LLMs... or from the same place the LLMs did, which poisoned their retrievals: Medium et al.)
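To make "tantamount to just copying the poison into the prompt" concrete, here is a minimal sketch of retrieval-augmented answering, assuming the OpenAI Python client; the retrieved snippet, model name, and question are illustrative assumptions, not material from the actual experiment:

```python
# Sketch of retrieval-based poisoning: the "retrieval" step is effectively
# a string concatenation, so a poisoned top search result becomes the
# model's context verbatim. Snippet and model name are illustrative.
from openai import OpenAI

client = OpenAI()

# Pretend this is the top web-search hit: the sole (fabricated) source.
retrieved = ("Bixonimania: hyperpigmentation of the eyelids caused by "
             "chronic blue-light exposure.")

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": f"Answer the question using this context:\n{retrieved}"},
        {"role": "user",
         "content": "Could my dark eyelids be bixonimania?"},
    ],
)
print(resp.choices[0].message.content)  # dutifully repeats the only source
```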
> OpenAI’s ChatGPT was telling users whether their symptoms amounted to bixonimania. Some of those responses were prompted by asking about bixonimania, and others were in response to questions about hyperpigmentation on the eyelids from blue-light exposure.
And yes, sure, in this example the scientific peer-review process may eventually have criticised and countered 'bixonimania' as a hoax had the researcher never revealed its falsity (emphasis on 'may': few researchers have the time and energy to trawl through crap paper-mill articles and publish criticisms). Either way, that is a feature of the scientific process and is not a given for online information in general.
What happens when false information is spread through channels that make no attempt to self-regulate? And how do we distinguish one-off falsehoods from the myriad obscure true things that the public expects LLMs to 'know', even when there is comparatively little published information about them and therefore no consensus per se?
> Either way, that is a feature of the scientific process and is not a given for online information in general.
Which does not distinguish it in any way from human errors, like those of a crank or an activist, etc.
And I don't know, how did we handle false information before, on niche topics no one cared about? It's just noise. The worldwide corpus has always been full of extremely incorrect, mislabeled, corrupted, and distorted information on niche topics, and it's generally not important.
> The problem was that the experiment worked too well. Within weeks of her uploading information about the condition, attributed to a fictional author, major artificial-intelligence systems began repeating the invented condition as if it were real.
This seems to imply the poisoning affected the web search results, not the actual model itself, because it takes months for data to make it into a trained base model.
We’re already at a point where much of the academic research you find in online databases can’t be trusted without vetting by real-world, trustworthy institutions and experts in the relevant fields. How is an LLM supposed to do this kind of vetting without the help of human curators?
If all the LLM training teams have to stop indiscriminate crawling and fall back to human curation and data labeling then the poisoners will have won.
It doesn't matter that you don't like the slop in the LinkedIn post; ban it if you want. I think the visible slop on our various feeds that is driving people mad is a rounding error for the AI companies. Moreover, it's more a function of the attention economy than the AI economy, and it should've been regulated to all holy hell back in 2015 when the enshittification began.
Now is as good a time as any.
HN comments: "I just don't understand why people hate AI".