Posted by todsacerdoti 9 hours ago

New accounts on HN more likely to use em-dashes (www.marginalia.nu)
542 points | 457 comments | page 2
vjerancrnjak 5 hours ago|
On Reddit it's even worse; I feel like Reddit is internally running its own bots for engagement bait.

As someone who loves LaTeX, I can't imagine ever spending so much time on typography in online forum posts: italics, bold, em-dashes, headers, sections. I quit Reddit and will quit HN as well if the situation worsens.

whamlastxmas 4 hours ago|
I have the sneaking suspicion that Reddit has allowed and facilitated astroturfing for over a decade: providing accounts, eliminating rate limits, artificially boosting posts and comments, and aggressively shadow-banning contrary opinions. This is definitely a known phenomenon at the auto-moderator level, but I bet Reddit ownership is complicit in it too.
quesera 4 hours ago||
This behaviour was also openly acknowledged as part of early Reddit growth hacking. So why not?
Velocifyer 1 hour ago||
I find that GPT-4 specifically uses a lot of em dashes, and that it uses them even when they are not useful.
hartator 6 hours ago||
Biggest tell that a comment is AI: it's deeply uninteresting.

No one wants to read your ChatGPT outputs.

Aachen 5 hours ago||
Not sure if serious, but I don't think that's precisely it. To me, it's more that it rehashes a point until it's fully beaten to death: putting obvious aspects in a list, being subtly wrong, writing a conclusion paragraph for the previous three sentences... it's boring not because of what it writes but because of how it writes it. Of course, it can also be inherently uninteresting, but then you should have entered a prompt that causes the autocomplete function to ramble about something you're interested in :P
mghackerlady 4 hours ago|||
It also feels way too sanitised, like it went through some company's PR department (granted, that's because it went through OpenAI's PR department, but still)
chrisjj 6 hours ago||
> No one wants to read your ChatGPT outputs.

...except ChatGPT fans.

Aachen 5 hours ago||
Not even them. They use GPT to summarise each other's output.
CharlesW 8 hours ago||
A couple thoughts:

(1) I don't recommend focusing disproportionately on one signal. They'll change, and are incredibly easy to optimize for. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

(2) I do recommend taking one minute to dash a note off to hn@ycombinator.com if you see suspicious patterns. Dang and our other intrepid mods are preternaturally responsive, and appear to appreciate the extra eyeballs on the problem.

5o1ecist 8 hours ago||
> minute to dash a note

I support this dashing recommendation.

marginalia_nu 7 hours ago|||
I sent them an email a few days ago about the state of /noobcomments.

This wasn't really intended as a "wow, dang sure is sleeping on the job", more an interesting observation on the new bot ecosystem.

I also feel like there's a missing discussion about comment quality on HN lately; it feels like it's dropped like crazy. I wanted to see if I could find some hard data to show I haven't gone full Terry Davis.

bakugo 7 hours ago||
Is there even an incentive to optimize for such signals, though? Em-dashes have been a known indicator of AI-generated text for a good while, and are still extremely prevalent. While someone who doesn't like AI slop and knows what to look out for will notice and call out obvious AI comments, the unfortunate truth is that the majority of people simply cannot tell, and even among those who can, many don't care.

Obvious AI-generated posts and articles make it to the front page on a daily basis, and I get the impression that neither the average user nor the moderation team see that as a problem at all anymore.

yorwba 6 hours ago||
The mods do care, but you have to email them or they won't necessarily notice.
atleastoptimal 2 hours ago||
It would be trivial to make a HN comment agent that avoids all the usual hallmarks of AI writing. Mere estimations of bot activity based on character frequency would likely underestimate their presence.
SkyeCA 8 hours ago||
If I see an em-dash in a comment I stop reading and I've seriously considered setting up a filter across multiple sites to remove any comments containing one.

I know there are legitimate usecases for the em-dash, but a few paragraphs (at most) of text in an HN/Reddit comment? Into the trash it goes.
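A minimal sketch of the cross-site filter idea above, assuming comments arrive as plain strings (the `comments` list and function name are illustrative, not from any real site API):

```python
# Drop any comment containing an em dash (U+2014). A real filter
# would hook into a userscript or browser extension per site.
EM_DASH = "\u2014"

def filter_comments(comments):
    """Return only the comments that contain no em dashes."""
    return [c for c in comments if EM_DASH not in c]

sample = ["Plain text here.", "Suspicious\u2014very suspicious."]
print(filter_comments(sample))  # only the first comment survives
```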

bubblewand 5 hours ago||
Not so long ago, they were just a ~75%-odds tell that the user was typing on a Mac.
ge96 5 hours ago|||
trying to remember what is the grammatical purpose of it when writing

trying to remember last time I used it

arjie 5 hours ago||
I noticed a similar trend a couple of weeks ago so I auto-hide green comments now. I also autohide all top 1000 user accounts but it strikes me that perhaps I should also choose a “user signed up on $date” filter that precedes OpenClaw.
marginalia_nu 9 hours ago||
(author) I saw a 32:1 rate of em-dashes last night when I just eyeballed the first 3 pages of /newcomments and /noobcomments. So I'm not sure how stable this is over time.
gritzko 8 hours ago||
This is probably the time to add some invitation system like GMail had in the beginning. Or make a shade for accounts <1yr. Or something else, before things get too mixed.
shit_game 7 hours ago|||
The issue with creating some hidden maturity heuristic for accounts is that it will be gamed just like any other, except that age alone is the simplest heuristic to game. You can simply do nothing for incremental periods of time and then begin testing aged accounts to roughly determine the minimum age an account must reach to become "trusted".

Bot prevention is a very difficult constant game of cat and mouse, and a lot of bot operators have become very skilled at determining the hidden metrics used by platforms to bless accounts; that's their job, after all. I've become a big fan of lobste.rs' invitation tree approach, where the reputation of new accounts rides on the reputation of older accounts, and risks consequence up the chain. It also creates a very useful graph of account origin, allowing for scorched earth approaches to moderation that would otherwise require a serious (and often one-off) machine learning approach to connect accounts.
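The invitation-tree idea above can be sketched in a few lines. This is a hypothetical illustration (the account names and `invited_by` mapping are invented): each account records its inviter, so moderation can identify and act on a whole subtree at once.

```python
from collections import defaultdict

# Hypothetical invite records: child account -> inviting account.
invited_by = {"mallory": "trent", "eve": "mallory", "bob": "alice"}

def subtree(root):
    """All accounts invited (transitively) by `root`, including root.

    This is the set a scorched-earth moderation action would cover.
    """
    children = defaultdict(list)
    for child, parent in invited_by.items():
        children[parent].append(child)
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(children[node])
    return out

print(sorted(subtree("trent")))  # ['eve', 'mallory', 'trent']
```

Banning "trent" here would also flag "mallory" and "eve", while "bob" (invited through a different branch) is untouched.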

duckmysick 6 hours ago|||
https://lobste.rs/ has a system like that.
Muhammad523 8 hours ago|||
I just took a look at /noobcomments and wow, there's even a comment where a person argues with AI instead of, you know, using their own brain. It was obvious it was AI since it was formatted with markdown.
lgats 8 hours ago|||
the link https://news.ycombinator.com/noobcomments
cookiengineer 8 hours ago||
I wanted to point out that em dashes are autocompleted by the iOS keyboard, so without more detail there may be significant false positives. I think a better indicator would be to detect only em dashes with preceding and following whitespace characters, combined with that user's general Unicode usage.

Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.

There's also the Chinese ideographic full stop in UTF-8, which gets used as a period by those users a lot, so that could be a nice indicator of legit human users.

edit: lol @ downvotes. Must have hit a vulnerable spot, huh?
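The whitespace-delimited heuristic above can be sketched with a regex. This is an illustrative sketch, not the method used in the article's analysis:

```python
import re

# Count only em dashes surrounded by whitespace, ignoring the
# unspaced word\u2014word form that keyboards often autocomplete.
SPACED_EM_DASH = re.compile(r"\s\u2014\s")

def spaced_em_dash_count(text):
    """Number of whitespace-delimited em dashes in `text`."""
    return len(SPACED_EM_DASH.findall(text))

print(spaced_em_dash_count("word\u2014word"))         # 0: no surrounding whitespace
print(spaced_em_dash_count("a phrase \u2014 aside"))  # 1
```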

Aurornis 8 hours ago|||
> I wanted to point out that em dashes are autocompleted by the iOS keyboard.

That’s why the analysis was performed over time. All of those em dash sources you mentioned were present before LLM written content became popular.

marginalia_nu 8 hours ago|||
I think there is a baseline number of human users who for one reason or another use em-dashes, but this doesn't explain why they're 10x more prevalent in green accounts.
cookiengineer 7 hours ago||
> I think there is a baseline number of human users who for one reason or another use em-dashes, but this doesn't explain why they're 10x more prevalent in green accounts.

I'm not trying to negate the fact. I'm just pointing out that a correlation without another indicator is not sufficient evidence that someone is a bot, especially in the golden age of DDoS botnets rebranded as residential proxy services, which everyone seems to have started using since ~Q4 2024.

eterm 6 hours ago||
It's the "incredibly banal" comments that upset me. The ones that just re-state the article in one or two uncontroversial sentences.

They often lean slightly pro-AI, but otherwise avoid saying much about anything.

onion2k 8 hours ago|
I’ve had this sense that HN has gotten absolutely inundated with bots over the last few months.

Is it possible to differentiate between a bot, and a human using AI to 'improve' the quality of their comment where some of the content might be AI written but not all? I don't think it is.

lm28469 8 hours ago||
> HN has gotten absolutely inundated with bots over the last few months.

hm, the whole internet really: YouTube, Reddit, Twitter, Facebook, blog posts, food recipes, news articles. It's getting more and more obvious.

sunaookami 8 hours ago|||
I find the bigger problem with online comments is that people repeat the same comments and "jokes" over and over and over again. Sure, we had those on YouTube 15 years ago when people always spammed "first!" and "who is listening in <year>?", but now it's gotten worse and every single comment is just some meme (especially on Reddit) or some kind of "gotcha"...
lm28469 6 hours ago||
> I find the bigger problem with online comments is that people repeat the same comments and "jokes" over and over and over again.

And bots reposting a trending post from like 12 years ago to farm internet points... with other bots reposting the top comments of the initial post

skeptic_ai 8 hours ago|||
All will be fixed with real id attestation /s
Lucasoato 8 hours ago|||
Not exactly: bot farms can still be built with poor people's IDs bought on the black market. I don't know what the solution is going to be, but at some point we might be forced to accept that on the internet humans and AI won't be distinguishable anymore, and adjust our services regardless of whether the client is a person or a machine.
e2le 5 hours ago|||
That is a probable outcome; however, it would at least cap or limit the ability of bot farms to produce industrial-scale sludge content.
8cvor6j844qw_d6 7 hours ago|||
ID verification with video capture for every post on an attested device.

let's bring back Chrome's WEI while we're at it

/s

kdheiwns 6 hours ago|||
AI post "improvements" are the most annoying thing. I see more and more people doing it, especially when posting reviews/experiences with things, and they always get called out for it. They always justify it with "AI helped me organize what I wanted to say." Like man, you're having an AI write about an experience it didn't have and likely didn't even proofread it. Who knows what BS it added to the story. Even disorganized and misspelled stories are better than AI fantasy renditions that are 20 times longer than they need to be.
yoyohello13 8 hours ago|||
I just assume if any comment sounds like an ad it's a bot. All the comments like "I'm 10x faster with Claude Opus 4.6!" or "Have you tried Codex with ChatGPT 5.X? What a time to be alive!" can be lumped in the bot bin.
e2le 5 hours ago|||
> human using AI to 'improve' the quality of their comment

I want to hear people in their own voice, their own ideas, with their own words. I have no interest in reading AI generated comments with the same prose, vocabulary, and grammar.

I don't care if your writing is bad.

Additionally, I am sceptical that using AI to write comments on your behalf creates opportunities for self-improvement. I suspect this is all leading to a death of diversity in writing where comments increasingly have an aura of sameness.

munk-a 8 hours ago|||
I don't personally care about the distinction, especially since AI usually 'improves' things by making them more verbose. Don't waste tokens to force me to read more useless words about your position; just state it plainly.

Brevity is the soul of wit.

homebrewer 8 hours ago|||
If you are suspicious, look at comment history. It's usually fairly obvious because all comments made by LLM spambots look the same, have very similar structure and length. Skim ten of them and it becomes pretty clear if the account is genuine.

I'm more worried about how many people reply to slop and start arguing with it (usually receiving no replies — the slop machine goes to the next thread instead) when they should be flagging and reporting it; this has changed in the last few months.

taeric 8 hours ago|||
This makes me think a tool that lets me know how much of the engagement I was seeing was from bots would be huge.
onion2k 7 hours ago|||
> If you are suspicious, look at comment history.

I'm never suspicious though. One of the strange, and awesome, and incredibly rare things about HN is that I put basically zero stock in who wrote a comment. It's such a minimal part of the UI that it entirely passes me by most of the time. I love that about this site. I don't think I'm particularly unusual in that either; when someone shared a link about the top commenters recently there were quite a few comments about how people don't notice or how they don't recognize the people in the top ranks.

The consequence of this is that a bot could merrily post on here and I'd be absolutely fine not knowing or caring if it was a bot or not. I can judge the content of what the bot is posting and upvote/downvote accordingly. That, in my opinion, is exactly how the internet should work - judge the content of the post, not the character of the poster. If someone posts things I find insightful, interesting, or funny I'll upvote them. It has exactly zero value apart from maybe a little dopamine for a human, and actually zero for a robot, but it makes me feel nice about myself that I showed appreciation.

esafak 8 hours ago|||
I was thinking of how to create a UX around quantifying or qualifying AI use. If products revealed that users had used in-app AI to compose their responses, they might respond by doing it outside the app and pasting it in. If you then labeled pasted text as AI they might use tools to imitate typing. And after all that, you might face a user backlash from the users who rely on AI to write.