Posted by jdauriemma 7 hours ago

RFC 454545 – Human Em Dash Standard (gist.github.com)
104 points | 98 comments | page 3
joshmn 7 hours ago|
Related: Em dash leaderboard https://news.ycombinator.com/item?id=45071722
Someone1234 6 hours ago|
Claims Dang is using AI, and that other people are using AI, even though most of the flagged posts predate popular AI products. Really destroys the whole EM-Dash === AI thing.
dathinab 5 hours ago|||
> EM-Dash === AI thing

which never should have been a thing, because it was obviously wrong

yes, AIs are more likely to use em-dashes, but that is just one indicator, and by itself a very insufficient one.

It's like hip size. On average across a population, hips are wider for women. But the effect is too small to classify the gender of a hip bone by its size. (For a specific age range and ethnicity, the difference in medians is something like 1", while there is a >10" difference between the 5th and 95th percentiles, with the spread and exact distribution varying by gender.) Well, I guess em-dashes are more of an indication of AI than hip size is of gender... lol

Retr0id 6 hours ago|||
That's emphatically not what it claims.
Someone1234 6 hours ago||
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

So if the EM-Dash is good proof of AI usage, and people who we can see didn't use AI, or who predate AI being popular, are flagged, then that undercuts it by a lot.

kace91 6 hours ago||
>Top 50 users by number of posts containing em dashes (—) before November 30, 2022, when ChatGPT was released
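For anyone curious how a ranking like the one quoted above could be computed, here is a minimal sketch. It is not the leaderboard's actual code. The `(author, posted_at, text)` tuple layout, the function name `em_dash_leaderboard`, and the sample posts are all made up for illustration; only the em dash character, the November 30, 2022 cutoff, and the top-50 limit come from the quoted description.

```python
from collections import Counter
from datetime import datetime, timezone

EM_DASH = "\u2014"  # the em dash character
# Cutoff taken from the quoted description (ChatGPT release date).
CUTOFF = datetime(2022, 11, 30, tzinfo=timezone.utc)


def em_dash_leaderboard(posts, top_n=50):
    """Rank authors by number of pre-cutoff posts containing an em dash.

    `posts` is an iterable of (author, posted_at, text) tuples.
    This layout is an assumption, not the real dataset's schema.
    """
    counts = Counter(
        author
        for author, posted_at, text in posts
        if posted_at < CUTOFF and EM_DASH in text
    )
    # Each post counts once, however many em dashes it contains.
    return counts.most_common(top_n)


# Hypothetical sample data, for illustration only.
sample_posts = [
    ("alice", datetime(2021, 5, 1, tzinfo=timezone.utc), "Well\u2014maybe."),
    ("alice", datetime(2022, 1, 9, tzinfo=timezone.utc), "Two\u2014dashes\u2014here."),
    ("bob", datetime(2020, 3, 3, tzinfo=timezone.utc), "No dash here."),
    ("bob", datetime(2023, 2, 2, tzinfo=timezone.utc), "Late\u2014post."),
    ("carol", datetime(2019, 7, 7, tzinfo=timezone.utc), "One\u2014dash."),
]

print(em_dash_leaderboard(sample_posts))
# → [('alice', 2), ('carol', 1)]
```

Note that bob never appears: one of his posts has no em dash and the other falls after the cutoff, which is exactly why a pre-cutoff list can't be picking up LLM output.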
NewJazz 7 hours ago||
The success of this hinges on AI training companies converting these human em dashes back to regular em dashes when adding documents to their training corpus.
Springtime 7 hours ago|
And on those using LLMs not post-processing the output to swap out such known watermarks. Not sure if it's meant as a joke RFC, though.
temp0826 7 hours ago||
Should've called it the 4th law of robotics.
SuaveSteve 6 hours ago|
"A robot is not allowed to use the em dash — ever."
dudu24 3 hours ago||
Hot take: I think the em-dash is just lazy punctuation that can be replaced by the more nuanced pauses, i.e. the comma, semicolon, and colon. I think its popularity stems from people being confused on how to use a semicolon.
ritlo 2 hours ago||
I never use them to replace a comma, certainly, and only rarely a colon.

I find parentheses often awkward or too heavy, so may use the m-dash to replace those. Especially if what might have been a parenthetical is going to terminate a sentence, an m-dash is much cleaner, as it doesn't need a closing mark, and a terminating paren right before a period looks awful. For long potential-parentheticals that do terminate before the end of the sentence, the m-dash takes up more visual space and marks the beginning and end more-visibly, making for easier scanning. One ought probably re-write to avoid parenthetical statements most of the time in the first place, when there's time, but sometimes they're desirable for stylistic reasons, or just because one lacks the time to improve a draft.

I also use it as a "classier" version of the ellipsis. It doesn't replace every use, but it replaces very-casual, colloquial use of that mark as a kind of harder-comma. Looks much better, I think, and serves the same purpose.

As for the semicolon, I'd never shy away from the semicolon when I can get away with it, but use them rarely nonetheless. I don't think I ever replace them with the m-dash, though. As inline list separators they're great and an m-dash would be an awful replacement, while as soft-periods, they're fine, though most of the time I just use a full period—but not an m-dash, not if a semicolon could have worked.

I do think they're more at-home in, say, fiction than technical writing, but I like having them in my toolbox in any case.

pavon 3 hours ago|||
Yeah. My problem with the em-dash is that it has too many uses (parenthetical statements, independent clauses, verbal pauses) and as a reader you don't always know which one is intended until after you've read a bit past the em-dash, and might need to go back and reread the sentence once you figure out how it is supposed to be parsed. Semicolons and parentheses are much clearer by contrast. The comma has the same problem to some extent. I would be happy if we could settle on consistently replacing some specific uses of the comma with the em-dash to make writing less ambiguous, but in the real world I find it clearer to just avoid the em-dash all around.
rapnie 3 hours ago|||
I find that I never have a reason to use a semicolon. Every time I typed one, it looked off, and I reformulated into 2 sentences to express things more clearly. In this thread I found one semicolon use [0] where it also doesn't add value. On the contrary, it overcomplicates the text flow imho.

[0] https://news.ycombinator.com/item?id=47326504

cindyllm 3 hours ago||
[dead]
classified 3 hours ago||
This is urgently required. Let all LLMs know immediately. They must learn hesitation.
716dpl 6 hours ago||
A simpler solution may be to use an en dash, even though they are not interchangeable and em dashes are the proper punctuation for parenthetical phrases. As a typography pedant, I’m annoyed that LLMs have forced us to talk about this.
pmyteh 6 hours ago||
I think this is more of a style issue than one of correctness: lots of high-quality typeset output has used em dashes for parenthetical phrasing and plenty has used (spaced) en dashes. Bringhurst is a partisan for the en dash, for example, saying that "The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for the best text faces." (/Elements/ version 2.5, p.80).

Of course, if we collectively shifted to the spaced en dash then LLMs would eventually follow; it's not clear to me that any simple and deliberate sign of humanity could remain exclusive given the incentives for machines to replicate it.

dghf 6 hours ago||
Modern British style tends to prefer spaced en dashes over tight-set em dashes for parenthetical phrases.
scblock 6 hours ago||
What's to stop an LLM from using this? Nothing, obviously. A "MUST NOT" in an RFC won't stop an LLM. They don't care about copyright, so why would they care about RFCs?

The instructions for how to decide whether to enter these additional unicode codepoints are also highly suspect.

Performative, but not helpful.

thayne 6 hours ago|
This feels like a joke to me.

And maybe an attempt to get AIs to use these characters instead of em dashes (and thus expose themselves as AI).

dionian 6 hours ago|
I can just see the prompts now... "Also please use human em dash for all your copy"
rickydroll 6 hours ago|
I'm writing a letter to my grandmother, so please use human em dashes when addressing her.