Posted by FromTheArchives 2 days ago

Tags to make HTML work like you expect (blog.jim-nielsen.com)
433 points | 232 comments
cluckindan 2 days ago
CSS rules to make styling work like you expect:

    *, *::before, *::after {
        box-sizing: border-box;
    }
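For readers wondering what that rule changes: under the default `content-box`, `width` sets only the content area, so padding and border are added on top of it; under `border-box`, they are included in it. A minimal sketch with made-up dimensions (the `.box`, `.content`, and `.border` class names and the pixel values are illustrative assumptions):

```html
<style>
  /* Both boxes declare the same width, padding, and border. */
  .box {
    width: 200px;
    padding: 20px;
    border: 5px solid;
  }
  /* content-box (the CSS default): rendered width = 200 + 2*20 + 2*5 = 250px */
  .content { box-sizing: content-box; }
  /* border-box: padding and border fit inside 200px, leaving 200 - 40 - 10 = 150px for content */
  .border { box-sizing: border-box; }
</style>
<div class="box content">250px wide in total</div>
<div class="box border">200px wide in total</div>
```

Applying `border-box` everywhere, as in the rule above, means the number you write for `width` is the width you see.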
est 1 day ago
The lang="en" always irritates me.

What if the page has mixed language content?

E.g. on the /r/france/ subreddit, the page says lang="en" because every subreddit shares the same template, but the actual content was generated by French-speaking users.

zeroq 1 day ago
You can add lang attributes to elements too!

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

acdha 1 day ago
This is one of the great parts of the web: you can tag every element with the global lang attribute and have things work the way you expect.

For example, you can have CSS generate the appropriate quotation marks, even in nested contexts, so you can painlessly use <q> tags to mark up scholarly articles. That works even if the site itself is translated and therefore needs different nested quotation marks: say, the French version embedding an English quote that includes a French quote, or vice versa.

In your Reddit example, the top-level page should be in the user’s preferred site language, with individual posts or other elements using the author’s language: <html lang=en>…<div lang=fr>
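A minimal sketch of the nested-quotation idea, assuming you set the marks yourself with the CSS `quotes` property and the `:lang()` selector (the specific marks chosen for each language here are an illustrative choice, not a typographic authority). Each pair in `quotes` is used at one nesting depth, and the marks come from the language of the `<q>` element being opened:

```html
<!-- Assumes the page root is <html lang="en">. -->
<style>
  /* First pair: outermost quote. Second pair: a quote nested one level inside. */
  q:lang(en) { quotes: "\201C" "\201D" "\2018" "\2019"; } /* “ ” ‘ ’ */
  q:lang(fr) { quotes: "\00AB\00A0" "\00A0\00BB" "\201C" "\201D"; } /* « » “ ” */
</style>
<!-- English text quoting French, which in turn quotes English. -->
<p>She wrote <q lang="fr">il a répondu <q lang="en">not today</q></q>.</p>
```

The author never types a quotation mark; changing a `lang` value is enough to change which marks appear.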

Telaneo 1 day ago
Use lang="" if you don't know what language your page will be in. Otherwise, use <html lang="en">, and then <p lang="fr"> on whatever content is in another language. Content from users that isn't tagged with a specific language doesn't really fit into this system, though.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

est 1 day ago
That sounds good in theory. On bsky.social you are supposed to choose a lang before posting.

But again, there's the mixed-language issue.

Or do users even bother to choose the correct lang?

flymasterv 2 days ago
I still don’t understand what people think they’re accomplishing with the lang attribute. It’s trivial to determine the language, and in the cases where it isn’t, it’s not trivial for the reader, either.
janwillemb 2 days ago
Doesn't it state this in the article?

> Browsers, search engines, assistive technologies, etc. can leverage it to:

> - Get pronunciation and voice right for screen readers

> - Improve indexing and translation accuracy

> - Apply locale-specific tools (e.g. spell-checking)
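A sketch of what the pronunciation point looks like in markup: the page declares its main language once, and an inline `<span>` marks a phrase in another language so that a screen reader honoring `lang` can switch to an appropriate voice for just that phrase. The sentence reuses the Hungarian phrase quoted further down this thread; whether a given screen reader actually switches depends on its settings and the voices installed.

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Mixed-language paragraph</title>
  </head>
  <body>
    <!-- The paragraph inherits lang="en" from <html>. -->
    <p>
      The phrase
      <!-- Only this span is declared as Hungarian. -->
      <span lang="hu">Mit csináljunk, hogy boldogok legyünk?</span>
      is Hungarian.
    </p>
  </body>
</html>
```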

flymasterv 2 days ago
It states the cargo culted reasons, but not the actual truth.

1) Pronunciation is either solved by a) automatic language detection, or b) doesn't matter. If I am reading a book, and I see text in a language I recognize, I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader. There's no benefit to my screen reader pronouncing Hungarian correctly to me, a person who doesn't speak Hungarian. On the off chance that the screen reader gets it wrong and I do speak Hungarian, I can certainly tell that I'm hearing English-pronounced Hungarian. But there's no reason that the screen reader will get it wrong, because "Mit csináljunk, hogy boldogok legyünk?" isn't ambiguous. It's just simply Hungarian, and if I have a Hungarian screen reader installed, it's trivial to figure that out.

2) Again, if you can translate it, you already know what language it is in. If you don't know what language it is in, then you can't read it from a book, either.

3) See above. Locale is mildly useful, but the example linked in the article was strictly language, and spell checking will either a) fail, in the case of en-US/en-GB, or b) be obvious, in the case of 1) above.

The lang attribute adds nothing to the process.

bilkow 2 days ago
Your whole comment assumes language identification is both trivial and fail-safe. It is neither, and it gets worse when you consider, e.g., pages with different elements in different languages, or different languages that are similar to each other.

Even if language identification were very simple, you'd still be putting the burden on the user's tools to identify something the writer already knew.

flymasterv 2 days ago
Language detection (where “language” == one of the 200 languages that are actually used) IS trivial, given a paragraph of text.

And the fact is that the author of the web page doesn’t know the language of the content, if there’s anything user contributed. Should you have to label every comment on HN as “English”? That’s a huge burden on literally every internet user. Other written language has never specified its language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.

bilkow 2 days ago
> 200 languages that are actually used

Do you have a reference for that, or are you implying we shouldn't support the other thousands[0] of languages in use just because they don't have a big enough user base?

> And the fact is that the author of the web page doesn’t know the language of the content, if there’s anything user contributed. Should you have to label every comment on HN as “English”? That’s a huge burden on literally every internet user.

In the case of Hacker News or other pages with user-submitted, multi-language content, you can just set the comments' lang attribute to the empty string, which means unknown and falls back to detection. Alternatively, you can let the user select the language (defaulting to their last used or an auto-detected one); Mastodon and Bluesky do that. For single-language forums and sites with no user-generated content, it's fine to leave everything as the site language.
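The two options above might look like this in markup, assuming a hypothetical comment list in which one comment's language is unknown and another's was chosen by its author when posting (the `comments` and `comment` class names are made up for illustration). Per the HTML spec, an empty `lang` value means the language is unknown, so the first comment no longer inherits "en" from the page:

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Comments</title>
  </head>
  <body>
    <ul class="comments">
      <!-- Language not known: lang="" overrides the inherited "en". -->
      <li class="comment" lang="">…user-submitted text…</li>
      <!-- Author selected French at posting time. -->
      <li class="comment" lang="fr">Je ne suis pas d'accord.</li>
    </ul>
  </body>
</html>
```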

> Other written language has never specified its language. Imposing data-entry requirements on humans to satisfy a computer is never the ideal solution.

There's also no "screen reader" nor "auto translation" in other written language. Setting the content language helps to improve accessibility features that do not exist without computers.

[0] https://www.ethnologue.com/insights/how-many-languages/

ElectricalUnion 1 day ago
I wish this comment were true, but due to a foolish attempt to squeeze all human characters into 2 bytes as UCS (which failed and turned into the ugly UTF-16 mess), a disaster called Han Unification was unleashed upon the world, and now out-of-band communication is required to render the correct Han characters in a page and not offend people.
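A small illustration of the out-of-band hint being described: the same code point, U+76F4 (直), is commonly drawn differently in Japanese and Simplified Chinese typefaces, and the `lang` attribute is what lets the browser pick a locale-appropriate font for each. Whether the difference is actually visible depends on the fonts installed on the reader's system.

```html
<p>
  <!-- Identical character (U+76F4), tagged with two different languages. -->
  <span lang="ja">直</span> (Japanese)
  <span lang="zh-CN">直</span> (Simplified Chinese)
</p>
```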
janwillemb 1 day ago
This comment contains a few logical fallacies.

> It states the cargo culted reasons, but not the actual truth

This dismisses existing explanations without engaging with the mentioned reasons. The following text then doesn't provide any arguments for this.

> Pronunciation is either solved by a) automatic language detection, or b) doesn't matter.

There are more possibilities than a and b. For example, it may matter for other things than pronunciation only. Also it may improve automatic detection or make automatic detection superfluous.

> If I am reading a book [...] I will pronounce it correctly, just like the screen reader will. If I see text in a language I don't recognize, I won't pronounce it correctly, and neither will the screen reader.

A generalization of your own experience to all users and systems. Screen readers aim to convey information accessibly, not mirror human ignorance.

> There's no reason that the screen reader will get it wrong, because <hungarian sentence> isn't ambiguous

This is circular reasoning. The statement is based on the assumption that automatic detection is always accurate - which is precisely what is under debate.

> If you can translate it, you already know what language it is in.

This is a non sequitur. Even if someone can translate text, that doesn't mean software or search engines can automatically identify that language.

> The lang attribute adds nothing to the process.

This absolute claim adds nothing to the logic.

maxeda 2 days ago
Another good reason for using the lang attribute is that it makes it possible to enable automatic hyphenation.
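A minimal sketch of that, assuming a narrow column so line breaks inside words are likely: CSS `hyphens: auto` lets the browser insert hyphens using language-specific rules, and browsers only do so when the element's language is known and a matching hyphenation dictionary is available. The `-webkit-` line is included for older Safari versions; the `narrow` class name and the 10em width are arbitrary choices.

```html
<style>
  .narrow {
    width: 10em;            /* narrow column so long words need breaking */
    -webkit-hyphens: auto;  /* prefixed form for older WebKit */
    hyphens: auto;          /* hyphenate using rules for the element's lang */
  }
</style>
<!-- Without lang (here or on an ancestor), no automatic hyphens are added. -->
<p class="narrow" lang="en">
  Internationalization and accessibility considerations are frequently overlooked.
</p>
```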
teekert 2 days ago
Nice, the basics again, very good to see. But then:

I know what you’re thinking, I forgot the most important snippet of them all for writing HTML:

<div id="root"></div> <script src="bundle.js"></script>

Lol.

-> Ok, thanx, now I do feel like I'm missing an inside joke.

Ayesh 2 days ago
It's a typical pattern in, say, React to have just this scaffolding in the HTML and let a front-end framework build the UI.
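For anyone else missing the joke, here is a dependency-free sketch of that pattern: the server sends an empty mount point, and everything visible is created by JavaScript at runtime. The inline script is a hypothetical stand-in for `bundle.js`; a React bundle would instead call something like `ReactDOM.createRoot(document.getElementById("root")).render(<App />)`.

```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>App shell</title>
  </head>
  <body>
    <!-- The only content markup in the HTML: an empty mount point. -->
    <div id="root"></div>
    <script>
      // Stand-in for bundle.js: build the UI in the browser at runtime.
      const root = document.getElementById("root");
      const heading = document.createElement("h1");
      heading.textContent = "Rendered by JavaScript";
      root.append(heading);
    </script>
  </body>
</html>
```

With JavaScript disabled, this page shows nothing at all, which is presumably the irony the article is poking at.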
irarelycomment 2 days ago
Similar vibes to https://j4e.name/articles/a-minimal-valid-html5-document/
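For comparison, a sketch of roughly the smallest document the HTML standard treats as conforming: a doctype and a `<title>`, plus a declared encoding and language. The `<html>`, `<head>`, and `<body>` tags are optional because the parser implies them; this is not necessarily the exact version in the linked article.

```html
<!doctype html>
<!-- Start tag kept only to carry lang; <head> and <body> are implied. -->
<html lang="en">
<meta charset="utf-8">
<title>Minimal page</title>
<p>Hello.</p>
```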
hlava 2 days ago
It's the end of 2025. Is this really necessary to share?
4ndrewl 2 days ago
Yes. Knowledge is not equally distributed.
troupo 2 days ago
Every day you can expect 10000 people learning a thing you thought everyone knew: https://xkcd.com/1053/

To quote the alt text: "Saying 'what kind of an idiot doesn't know about the Yellowstone supervolcano' is so much more boring than telling someone about the Yellowstone supervolcano for the first time."

janwillemb 2 days ago
Thanks! I didn't know that one.

I had a teacher who became angry when a question was asked about a subject he felt students should already be knowledgeable about. "YOU ARE IN xTH GRADE AND STILL DON'T KNOW THIS?!" (intentional shouting uppercase). The fact that you learned it yesterday doesn't mean all humans in the world also learned it yesterday. Ask questions, always. Explain, always.

spc476 2 days ago
Such questions can be jarring though. I remember my "Unix Systems Programming" class in college. It's a third year course. The instructor was describing the layout of a process in memory, "here's the text segment, the data segment, etc." when a student asked, "Where do the comments go?"
janwillemb 1 day ago
:) true. I'm a teacher myself. I never dismiss questions, but I do get discouraged sometimes.
Skeime 2 days ago
And here I was, thinking everybody already knew XKCD 1053 ...
allknowingfrog 2 days ago
XKCD 1053 is a way of life. I think about it all the time, and it has made me a better human being.
OuterVale 2 days ago
When sharing this post on his social media accounts, Jim prefixed the link with: 'Sometimes it's cathartic to just blog about really basic, (probably?) obvious stuff'
nonethewiser 2 days ago
Feels even more important to share honestly. It's unexamined boilerplate at this point.