
Posted by FromTheArchives 10/27/2025

Tags to make HTML work like you expect(blog.jim-nielsen.com)
441 points | 238 comments | page 2
brianzelip 10/27/2025|
> `<html lang="en">`

The author might consider instead:

`<html lang="en-US">`

childintime 10/27/2025||
It's time for an "en-INTL" (or similar) for international English, that is mostly "en-US", but implies a US-International keyboard and removes Americanisms, like Logical Punctuation in quotes [1]. Then AI can start writing for a wider and much larger public (and can also default to regular ISO units instead of imperial baby food).

Additionally, it's kind of crazy we are not able to write any language with any keyboard, as nowadays we just don't know the idiom the person who sits behind the keyboard needs.

[1] https://slate.com/human-interest/2011/05/logical-punctuation...

chrismorgan 10/29/2025|||
If you want to divide English into only two categories, I reckon US English (color, analyze, center) and International English (colour, analyse, centre) is the best divide. It’s imperfect—Canadians are mostly International but want analyze, and there are other controversial words like program/programme (US, CA and AU prefer program; GB and IN prefer programme)—but I think it’s the best divide if you want only two.

Windows distributes ISOs labelled English (en-US) and English International (en-GB) along this divide.

It’s also a valuable divide for reasons beyond language, because the USA really does have a habit of doing its own thing, even when pretty much the rest of the world has agreed on something different. Your US English locale can default to Fahrenheit, miles, pounds, Letter, and their bizarre middle-endian date format, while International English can default to Celsius, kilometres, kilograms, A4, and DD/MM/YYYY. It doesn’t sort out everything, but it gives a much better starting point. Not every non-American prefers DD/MM/YYYY, but even if they’d prefer something like DD.MM.YY or YYYY-MM-DD, DD/MM/YYYY is a whole lot better than MM/DD/YYYY.
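A minimal Python sketch of that "locale picks the defaults" idea, assuming a hypothetical two-entry table (real software would pull patterns from locale data such as CLDR rather than hard-coding them). It also shows why the middle-endian format is the troublesome one: once the day is 12 or less, the US and International renderings of a date look alike.

```python
from datetime import date

# Hypothetical defaults for the two-way split described above;
# real locale data (e.g. CLDR) is far more detailed than this.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # middle-endian: month, day, year
    "en-GB": "%d/%m/%Y",  # little-endian: day, month, year
}


def format_date(d: date, locale: str) -> str:
    """Format d with the hypothetical default for locale, else ISO 8601."""
    pattern = DATE_PATTERNS.get(locale)
    return d.strftime(pattern) if pattern else d.isoformat()


for tag in ("en-US", "en-GB", "en"):
    print(tag, format_date(date(2025, 3, 4), tag))
```

For 4 March 2025 this prints 03/04/2025, 04/03/2025 and 2025-03-04; the first two can't be told apart without knowing the locale, while the ISO fallback can't be misread.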

Telaneo 10/28/2025||||
en-DK is used for this in some cases, giving you English, but with metric units and an ISO keyboard among other things.

A dedicated one for International English, or heck, even just EU-English, would be great.

The EU websites just use en from what I can tell, but they also just use de, fr, sv, rather than specifying country (except pt-PT, which makes sense, since pt-BR is very common, but not relevant for the EU).

qingcharles 10/27/2025||||
Isn't that what "en" on its own should be, though?
fijiaarone 10/28/2025|||
We should also enforce a standard where every website has to change their content to match the user’s preferred idiomatic diss, whether it be “yo momma”, “deez nuts”, “six seven”, or a series of hottentot tongue clicks recorded in Ogg Vorbis.
mobeigi 10/27/2025|||
Interesting.

From what I can tell this allows some screen readers to select specific accents. Also the browser can select the appropriate spell checker (US English vs British English).

Etheryte 10/27/2025||
Those two mean two very different things though, why would the author do that? Please see RFC 5646 [0], "en" means English without any further constraints, "en-US" means English as used in the United States.

[0] https://datatracker.ietf.org/doc/html/rfc5646
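To make the difference concrete, here's a minimal Python sketch of "basic filtering" from RFC 4647 (the matching companion to RFC 5646): a language range matches a tag if they are equal or the tag starts with the range followed by a hyphen, compared case-insensitively. The function name is made up for this example, and it ignores the `*` wildcard and extended filtering.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: does `tag` fall under `language_range`?"""
    r, t = language_range.lower(), tag.lower()
    # Equal, or the range is a whole-subtag prefix of the tag:
    # "en" covers "en-US", but not "enm" (Middle English).
    return t == r or t.startswith(r + "-")


available = ["en", "en-US", "en-GB", "enm", "fr"]
print([tag for tag in available if basic_filter_matches("en", tag)])
print([tag for tag in available if basic_filter_matches("en-US", tag)])
```

So `en` covers `en-US` and `en-GB`, but `en-US` does not cover plain `en`: the region subtag is an extra constraint, not a synonym.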

Aransentin 10/27/2025||
Note that <html> and <body> auto-close and don't need to be terminated.

Also, wrapping the <head> tags in an actual <head></head> is optional.

You also don't need the quotes as long as the attribute value doesn't contain spaces or the like; <html lang=en> is OK.

(kind of pointless as the average website fetches a bazillion bytes of javascript for every page load nowadays, but sometimes slimming things down as much as possible can be fun and satisfying)
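For reference, the WHATWG HTML spec's rule for unquoted attribute values is a bit more specific than "no spaces": the value must be non-empty and must not contain ASCII whitespace or any of `"` `'` `=` `<` `>` and the backtick. A small Python sketch of that check (the function names are just for illustration); the renderer also quotes anything containing `&`, which is a conservative choice of mine rather than something the spec requires.

```python
# Characters the HTML spec forbids in an unquoted attribute value:
# ASCII whitespace (tab, LF, FF, CR, space) plus " ' = < > and backtick.
FORBIDDEN_UNQUOTED = set("\t\n\f\r \"'=<>`")


def can_be_unquoted(value: str) -> bool:
    """True if `value` may be written as name=value without quotes."""
    return value != "" and not any(ch in FORBIDDEN_UNQUOTED for ch in value)


def render_attr(name: str, value: str) -> str:
    """Emit the shortest form; conservatively quote anything containing '&'."""
    if can_be_unquoted(value) and "&" not in value:
        return f"{name}={value}"
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    return f'{name}="{escaped}"'


print(render_attr("lang", "en"), render_attr("class", "nav main"))
```

That prints `lang=en class="nav main"`.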

zelphirkalt 10/27/2025||
This kind of thing will always just feel shoddy to me. It is not much work to properly close a tag. The number of bytes saved is negligible compared to basically any other aspect of a website. Avoiding unneeded div spam would already save more. Or, for example, making sure CSS is not bloated. And of course avoiding downloading 3MB of JS.

What this achieves is making the syntax more irregular and harder to parse. I wish all these tolerances wouldn't exist in HTML5 and browsers simply showed an error, instead of being lenient. It would greatly simplify browser code and HTML spec.

bentley 10/27/2025|||
Implicit elements and end tags have been a part of HTML since the very beginning. They introduce zero ambiguity to the language, they’re very widely used, and any parser incapable of handling them violates the spec and would be incapable of handling piles of real‐world strict, standards‐compliant HTML.

> I wish all these tolerances wouldn't exist in HTML5 and browsers simply showed an error, instead of being lenient.

They (W3C) tried that with XHTML. It was soundly rejected by webpage authors and by browser vendors. Nobody wants the Yellow Screen of Death. https://en.wikipedia.org/wiki/File:Yellow_screen_of_death.pn...

haskellshill 10/27/2025|||
> They introduce zero ambiguity to the language

Well, for machines parsing it, yes, but for humans writing and reading it, explicit end tags are helpful. For example, if you have

    <p> foo
    <p> bar
and change it to

    <div> foo
    <div> bar
suddenly you've got a syntax error (or some quirks mode rendering with nested divs).

The "redundancy" of closing the tags acts basically like a checksum protecting against the "background radiation" of human editing. And if you're writing raw HTML without an editor that can autocomplete the closing tags then you're doing it wrong anyway. Yes that used to be common before and yes it's a useful backwards compatibility / newbie friendly feature for the language, but that doesn't mean you should use it if you know what you're doing.
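The p-versus-div difference comes from one tree-construction rule: when certain start tags (including `<p>` and `<div>`) appear while a `<p>` is open, the parser closes the `<p>` first, and nothing closes an open `<div>` that way. The Python sketch below models just that rule on top of the standard library's `html.parser` tokenizer. It is a toy, not a conforming HTML parser: `CLOSES_P` is an abridged, illustrative subset of the spec's list, it ignores the spec's "button scope" detail, and it does not handle void elements.

```python
from html.parser import HTMLParser

# Abridged subset of start tags that, in the HTML spec's "in body" insertion
# mode, close an open <p> before being inserted. Illustrative, not complete.
CLOSES_P = {"p", "div", "ul", "ol", "blockquote", "pre", "h1", "h2", "h3", "section"}


class ToyTreeBuilder(HTMLParser):
    """Models only the implied </p> rule; NOT a conforming HTML5 parser."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]  # open elements as (name, children)

    def handle_starttag(self, tag, attrs):
        # Simplification: search the whole stack instead of "button scope".
        if tag in CLOSES_P and any(name == "p" for name, _ in self.stack):
            while self.stack[-1][0] != "p":
                self.stack.pop()
            self.stack.pop()  # the implied </p>
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open element, if there is one.
        if any(name == tag for name, _ in self.stack[1:]):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())


def serialize(children):
    """Re-emit the tree with every end tag written out explicitly."""
    parts = []
    for child in children:
        if isinstance(child, str):
            parts.append(child)
        else:
            name, kids = child
            parts.append(f"<{name}>{serialize(kids)}</{name}>")
    return "".join(parts)


def outline(markup):
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return serialize(builder.root[1])


print(outline("<p> foo\n<p> bar"))
print(outline("<div> foo\n<div> bar"))
```

The first call yields two sibling paragraphs, `<p>foo</p><p>bar</p>`; the second yields `<div>foo<div>bar</div></div>`, the nested divs described above.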

recursive 10/27/2025||
It sounds like you're headed towards XHTML. The rise and fall of XHTML is well documented and you can binge the whole thing if you're so inclined.

But my summarization is that the reason it doesn't work is that strict document specs are too strict for humans. And at a time when there was legitimate browser competition, the one that made a "best effort" to render invalid content was the winner.

haskellshill 10/27/2025||
The merits and drawbacks of XHTML have already been discussed elsewhere in the thread and I am well aware of them.

> And at a time when there was legitimate browser competition, the one that made a "best effort" to render invalid content was the winner.

Yes, my point is that there is no reason to still write "invalid" code just because it's supported for backwards compatibility reasons. It sounds like you ignored 90% of my comment, or perhaps you replied to the wrong guy?

recursive 10/27/2025||
I'm a stickling pedant for HTML validity, but close tags on <p> and <li> are optional by spec. Close tags for <br>, <img>, and <hr> are prohibited. XML-like self-closing trailing slashes explicitly have no meaning in HTML.

Close tags for <script> are required. But if people start treating it like XML, they write <script src="…" />. That fails, because the script element requires an explicit close tag, and the slash has no meaning in HTML.

I think validity matters, but you have to measure validity according to the actual spec, not what you wish it was, or should have been. There's no substitute for actually knowing the real rules.
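The `<script src="…" />` trap is easy to catch mechanically. A minimal Python sketch using the standard library's `html.parser`, whose `handle_startendtag` hook fires for `<tag ... />` syntax; the void-element list is the current WHATWG one as I understand it (older lists also had elements such as `<param>` and `<keygen>`), and the class and function names are made up for the example.

```python
from html.parser import HTMLParser

# Void elements per the current WHATWG HTML spec (older lists also included
# elements such as <param> and <keygen>).
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class TrailingSlashLint(HTMLParser):
    """Flags <tag ... /> on non-void elements, where the slash closes nothing."""

    def __init__(self):
        super().__init__()
        self.problems = []  # (line number, tag name)

    def handle_startendtag(self, tag, attrs):
        # html.parser calls this for "<tag ... />". In HTML the slash is
        # ignored, so a non-void element written this way is still open.
        if tag not in VOID_ELEMENTS:
            line, _column = self.getpos()
            self.problems.append((line, tag))


def find_fake_self_closing(markup):
    linter = TrailingSlashLint()
    linter.feed(markup)
    linter.close()
    return linter.problems


print(find_fake_self_closing('<br/>\n<div/>\n<script src="app.js" />'))
```

This reports the `<div/>` on line 2 and the `<script ... />` on line 3, and leaves the `<br/>` alone.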

haskellshill 10/27/2025|||
Are you misunderstanding on purpose? I am aware they are optional. I am arguing that there is no reason to omit them from your HTML. Whitespace is (mostly) optional in C, does that mean it's a good idea to omit it from your programs? Of course a br tag needs no closing tag because there is no content inside it. How exactly is that an argument for omitting the closing p tag? The XML standard has no relevance to the current discussion because I'm not arguing for "starting to treat it like XML".
recursive 10/27/2025||
I'm beginning to think I'm misunderstanding, but it's not on purpose.

Including closing tags as a general rule might make readers think that they can rely on their presence. Also, in some cases they are prohibited. So you can't achieve a simple evenly applied rule anyway.

haskellshill 10/27/2025||
Well, just because something is allowed by the syntax does not mean it's a good idea, that's why pretty much every language has linters.

And I do think there's an evenly applied rule, namely: always explicitly close all non-void elements. There are only 14 void elements anyway, so it's not too much to expect readers to know them. In your own words "there's no substitute for actually knowing the real rules".

I mean, your approach requires memorizing the 15 elements whose closing tag can be omitted anyway; otherwise you'll mentally parse the document wrong (thinking a br tag needs to be closed is as likely a mistake as thinking p tags can be nested).

The risk that somebody might be expecting a closing tag for an hr element seems minuscule and is a small price to pay for conveniences such as (as I explained above) being able to find-and-replace a p tag or an li tag with a div tag.
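The "always explicitly close all non-void elements" rule is simple enough to lint. A rough Python sketch on top of `html.parser`; it only tracks tag names and makes no attempt to reproduce the spec's real tree construction, and the names are hypothetical.

```python
from html.parser import HTMLParser

VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class ExplicitCloseLint(HTMLParser):
    """Reports non-void elements that never get an explicit end tag."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # open non-void elements, innermost last
        self.unclosed = []       # names that were only closed implicitly

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag; out of scope for this sketch
        # Anything opened after `tag` is being closed implicitly by </tag>.
        while (name := self.open_elements.pop()) != tag:
            self.unclosed.append(name)

    def close(self):
        super().close()
        # Whatever is still open at end of input was never closed explicitly.
        self.unclosed.extend(reversed(self.open_elements))
        self.open_elements.clear()


def lint_explicit_close(markup):
    linter = ExplicitCloseLint()
    linter.feed(markup)
    linter.close()
    return linter.unclosed


print(lint_explicit_close("<ul><li>one<li>two</ul>"))
```

That flags both `li` elements, while something like `<p>one</p><br><p>two</p>` comes back clean because `<br>` is void.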

recursive 10/27/2025||
I don't believe there are any contexts where <li> is valid that <div> would also be valid.

I'm not opposed to closing <li> tags as a general practice. But I don't think it provides as much benefit as you're implying. Valid HTML has a number of special rules like this. Like different content parsing rules for <textarea> and <script>. Like "foreign content".

If you try to write lint-passing HTML in the hopes that you could change <li> to <div> easily, you still have to contend with the fact that such a change cannot be valid, except possibly as a direct descendant of <template>.

haskellshill 10/28/2025||
Again, you're focusing on a pointless detail. Sure, I made a mistake in offhandedly using li as an example. Why do you choose to ignore the actually valid p example though? Seems like you're more interested in demonstrating your knowledge of HTML parsing (great job, proud of ya) than anything else. Either way, you've given zero examples of benefits of not doing things the sensible way that most people would expect.
recursive 10/28/2025||
To (hopefully) be clear, I don't think there are many benefits either way.
sgarland 10/28/2025|||
IMO, all of those make logical sense. If you’re inserting a line break or literal line, it can be thought of as a 1-dimensional object, which cannot enclose anything. If you want another one, insert another one.

In contrast, paragraphs and lists do enclose content, so IMO they should have clear delineations - if nothing else, to make visually understanding the code more clear.

I’m also sure that someone will now reference another HTML element I didn’t think about that breaks my analogy.

alwillis 10/27/2025||||
I didn't have a problem with XHTML back in the day; it took a while to unlearn it. I would instinctively close those tags: <br/>, etc.

It was actually the XHTML 2.0 specification [1], which discarded backwards compatibility with HTML 4, that was the straw that broke the camel's back. No more forms as we knew them, for example; we were supposed to use XForms.

That's when WHATWG was formed and broke with the W3C and created HTML5.

Thank goodness.

[1]: https://en.wikipedia.org/wiki/XHTML#XHTML_2.0

WorldMaker 10/27/2025||
XHTML 2.0 had a bunch of good ideas and a lot of them got "backported" into HTML 5 over the years.

XHTML 2.0 didn't even really discard backwards-compatibility that much: it had its compatibility story baked in with XML Namespaces. You could embed XHTML 1.0 in an XHTML 2.0 document just as you can still embed SVG or MathML in HTML 5. XForms was expected to take a few more years and people were expecting to still embed XHTML 1.0 forms for a while into XHTML 2.0's life.

At least from my outside observer perspective, the formation of WHATWG was more a proxy war between the view of the web as a document platform versus the view of the web as an app platform. XHTML 2.0 wanted a stronger document-oriented web.

(Also, XForms had some good ideas, too. Some of what people want in "forms helpers" when they ask for something like HTMX to be standardized in browsers was part of XForms, such as JS-less fetch/XHR with in-place refresh for form submits. Some of the INPUT tag validation that HTML 5 slowly added is also a sort of "backport" from XForms, albeit with no dependency on XSD.)

tracker1 10/27/2025|||
XHTML in practice was too strict and tended to break a few other things (by design) for better or worse, so nobody used it...

That said, actually writing HTML that can be parsed via an XML parser is generally a good, neighborly thing to do, as it allows for easier scraping and parsing through browsers and non-browser applications alike. For that matter, I will also add additional data-* attributes to elements just to make testing (and scraping) easier.
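As a small illustration of why XML-parseable markup plus data-* hooks is handy: Python's standard `xml.etree.ElementTree` can query it directly with its limited XPath support. The markup and `data-testid` values below are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: XML-well-formed, with data-* hooks for tests.
MARKUP = """
<ul class="prices">
  <li data-testid="price-basic">9.99</li>
  <li data-testid="price-pro">19.99<br/></li>
</ul>
"""


def find_by_testid(markup, testid):
    """Return the text of the first element whose data-testid matches."""
    root = ET.fromstring(markup.strip())
    # ElementTree's XPath subset supports [@attr='value'] predicates.
    node = root.find(f".//*[@data-testid='{testid}']")
    return None if node is None else node.text


def is_well_formed(markup):
    """True if an ordinary XML parser accepts the markup."""
    try:
        ET.fromstring(markup.strip())
        return True
    except ET.ParseError:
        return False


print(find_by_testid(MARKUP, "price-pro"))
print(is_well_formed("<ul><li>9.99<li>19.99<br></ul>"))
```

The first call prints 19.99; the second prints False, because the unclosed `<li>` and `<br>` are perfectly valid HTML but not well-formed XML.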

ifwinterco 10/27/2025||||
You're not alone, this is called XHTML and it was tried but not enough people wanted to use it
zelphirkalt 10/27/2025|||
Yeah, I remember when I was at school and first learning HTML and this kind of stuff. When I stumbled upon XHTML, I immediately changed my approach and started validating my pages as XHTML. Guess I was always on this side of things. Maybe machine empathy? Or also human empathy, because someone needs to write those parsers and the logic to process this stuff.
sevenseacat 10/27/2025|||
oh man, I wish XHTML had won the war. But so many people (and CMSes) were creating dodgy markup that simply rendered yellow screens of doom, that no-one wanted it :(
adzm 10/27/2025||
i'm glad it never caught on. the case sensitivity (especially for css), having to remember the xmlns namespace URI in the root element, CDATA sections for inline scripts, and insane ideas from companies about extending it further with more xml namespaced elements... it was madness.
imiric 10/27/2025|||
I'll copy what I wrote a few days ago:

The fact XHTML didn't gain traction is a mistake we've been paying off for decades.

Browser engines could've been simpler; web development tools could've been more robust and powerful much earlier; we would be able to rely on XSLT and invent other ways of processing and consuming web content; we would have proper XHTML modules, instead of the half-baked Web Components we have today. Etc.

Instead, we got standards built on poorly specified conventions, and we still have to rely on 3rd-party frameworks to build anything beyond a toy web site.

Stricter web documents wouldn't have fixed all our problems, but they would have certainly made a big impact for the better.

And add:

Yes, there were some initial usability quirks, but those could've been ironed out over time. Trading the potential of a strict markup standard for what we have today was a colossal mistake.

recursive 10/27/2025||
There's no way it could have gained traction. Consider two browsers. One follows the spec explicitly, and one goes into "best-effort" mode on encountering invalid markup. End users aren't going to care about the philosophical reasoning for why Browser A doesn't show them their school dance recital schedule.

Consider JSON and CSV. Both have formal specs. But in the wild, most parsers are more lenient than the spec.
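Python's standard `json` module is a handy concrete case: by default `json.loads` accepts `NaN`, `Infinity` and `-Infinity`, which RFC 8259 JSON does not allow. Its `parse_constant` hook can be used to opt back into strictness; `strict_loads` is just a name for this sketch.

```python
import json
import math


def strict_loads(text):
    """json.loads, but rejecting the NaN/Infinity literals RFC 8259 omits."""
    def reject(constant):
        # json calls parse_constant only for "NaN", "Infinity", "-Infinity".
        raise ValueError(f"non-standard JSON constant: {constant}")
    return json.loads(text, parse_constant=reject)


lenient = json.loads('{"ratio": NaN}')  # accepted by default
print(math.isnan(lenient["ratio"]))

try:
    strict_loads('{"ratio": NaN}')
except ValueError as err:
    print(err)
```

The lenient call happily returns a float NaN; the strict one raises.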

WorldMaker 10/27/2025|||
Which is also largely what happened: HTML 5 is in some ways that "best-effort" mode, standardized by a different standards body to route around XHTML's philosophies.
ifwinterco 10/27/2025|||
Yeah this is it. We can debate what would be nicer theoretically until the cows come home but there's a kind of real world game theory that leads to browsers doing their best to parse all kinds of slop as well as they can, and then subsequently removing the incentive for developers and tooling to produce byte perfect output
haskellshill 10/27/2025|||
It had too much unnecessary metadata yes, but case insensitivity is always the wrong way to do stuff in programming (e.g. case insensitive file system paths). The only reason you'd want it is for real-world stuff like person names and addresses etc. There's no reason you'd mix the case of your CSS classes anyway, and if you want that, why not also automatically match camelCase with snake_case with kebab-case?
shiomiru 10/27/2025||||
> It would greatly simplify browser code and HTML spec.

I doubt it would make a dent - e.g. in the "skipping <head>" case, you'd be replacing the error recovery mechanism of "jump to the next insertion mode" with "display an error", but a) you'd still need the code path to handle it, b) now you're in the business of producing good error messages which is notoriously difficult.

Something that would actually make the parser a lot simpler is removing document.write, which has been obsolete ever since the introduction of the DOM and whose main remaining real world use-case seems to be ad delivery. (If it's not clear why this would help, consider that document.write can write scripts that call document.write, etc.)

bazoom42 10/27/2025||||
> I wish all these tolerances wouldn't exist in HTML5 and browsers simply showed an error, instead of being lenient.

Who would want to use a browser which would prevent many currently valid pages from being shown?

zelphirkalt 10/27/2025||
I mean, I am obviously talking about a fictive scenario, a somewhat better timeline/universe. In such a scenario, the shoddy practices of not properly closing tags and leaning on leniency in browser parsing and sophisticated fallbacks and all that would not have become a practice and those many currently valid websites would mostly not have been created, because as someone tried to create them, the browsers would have told them no. Then those people would revise their code, and end up with clean, easier to parse code/documents, and we wouldn't have all these edge and special cases in our standards.

Also obviously that's unfortunately not the case today in our real world. Doesn't mean I cannot wish things were different.

Aransentin 10/27/2025|||
I agree for sure, but that's a problem with the spec, not the website. If there are multiple ways of doing something you might as well do the minimal one. The parser will have always to be able to handle all the edge cases no matter what anyway.

You might want to always terminate all tags consistently for aesthetic or human-centered reasons (reduced cognitive load, easier scanning), though. I'd accept that.

chrismorgan 10/27/2025|||
<html>, <head> and <body> start and end tags are all optional. In practice, you shouldn’t omit the <html> start tag because of the lang attribute, but the others never need any attributes. (If you’re putting attributes or classes on the body element, consider whether the html element is more appropriate.) It’s a long time since I wrote <head>, </head>, <body>, </body> or </html>.
qingcharles 10/27/2025|||
> Note that <html> and <body> auto-close and don't need to be terminated.

You monster.

tannhaeuser 10/27/2025|||
Not only do html and body auto-close, their tags (including the start tags) can be omitted altogether:

    <title>Shortest valid doc</title>
    <p>Body text following here
(cf explainer slides at [1] for the exact tag inferences SGML/HTML does to arrive at the fully tagged doc)

[1]: https://sgmljs.sgml.net/docs/html5-dtd-slides-wrapper.html (linked from https://sgmljs.sgml.net/blog/blog1701.html)

alt187 10/27/2025|||
I'm not sure I'd call keeping the <body> tag open satisfying but it is a fun fact.
nodesocket 10/27/2025|||
Didn't know you can omit <head> .. </head>, but I prefer to keep them for clarity.
bentley 10/27/2025||
Do you also spell out the implicit <tbody> in all your tables for clarity?
ndegruchy 10/27/2025|||
I do.

`<thead>` and `<tfoot>`, too, if they're needed. I try to use all the free stuff that HTML gives you without needing to reach for JS. It's a surprising amount. Coupled with CSS and you can get pretty far without needing anything. Even just having `<template>` with minimal JS enables a ton of 'interactivity'.

christophilus 10/27/2025||||
Yes. Explicit is almost always better than implicit, in my experience.
tracker1 10/27/2025|||
Sometimes... especially if a single record displays across more than a single row.

I almost always use thead.

busymom0 10/27/2025||
If I don't close something I opened, I feel weird.
reconnecting 10/27/2025||
I wish I could use this one day again to make my HTML work as expected.

<bgsound src="test.mid" loop=3>

wpollock 10/27/2025||
I appreciate this post! I was hoping you would add an inline CSS style sheet to take care of the broken defaults. I only remember one off the top of my head, the rule for monospace font size. You need something like:

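   /* Repeating "monospace" stops browsers from applying
      their smaller default size for monospace text */
   code, pre, tt, kbd, samp {
     font-family: monospace, monospace;
   }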
But I vaguely remember there are other broken CSS defaults for links, img tags, and other stuff. An HTML 5 boilerplate guide should include that too, but I don't know of any that do.
keane 10/27/2025|
Paired with H5BP you can use Normalize.css (as an alternative to a reset like http://meyerweb.com/eric/tools/css/reset/) found at https://github.com/necolas/normalize.css/blob/master/normali...

There's also this short reset: https://www.joshwcomeau.com/css/custom-css-reset/

orliesaurus 10/27/2025||
Quirks aside, there are other ways to tame old markup...

If a site won't update itself you can... use a user stylesheet or extension to fix things like font sizes and colors without waiting for the maintainer...

BUT for scripts that rely on CSS behaviors there is a simple check... test document.compatMode and bail when it's not what you expect... sometimes adding a wrapper element and extracting the contents with a Range keeps the page intact...

ALSO adding semantic elements and ARIA roles goes a long way for accessibility... it costs little and helps screen readers navigate...

Would love to see more community hacks that improve usability without rewriting the whole thing...

chrisofspades 10/27/2025||
If your IDE supports Emmet (supported by VS Code out of the box) then you can use "!"-tab to get the same tags.
Grom_PE 10/27/2025||
I hate how because of iPhone and subsequent mobile phones we have bad defaults for webpages so we're stuck with that viewport meta forever.

If only we had UTF-8 as a default encoding in HTML5 specs too.

jonhohle 10/27/2025|
I came here to say the same regarding UTF-8. What a huge miss and long overdue.

I’ve had my default encoding set to UTF-8 for probably 20 years at this point, so I often miss some encoding bugs, but then hit others.

jraph 10/27/2025||
> <!doctype html> is what you want for consistent rendering. Or <!DOCTYPE HTML> if you prefer writing markup like it’s 1998. Or even <!doCTypE HTml> if you eschew all societal norms. It’s case-insensitive so they’ll all work.

And <!DOCTYPE html> if you want polyglot (X)HTML.
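Since all those spellings are equivalent, tools only need a case-insensitive comparison. A rough Python sketch that guesses which value `document.compatMode` would report ("CSS1Compat" for standards mode, "BackCompat" for quirks mode) from the start of a document. It deliberately ignores comments before the doctype, byte-order marks and the spec's long list of legacy public identifiers, returning "unknown" for anything that isn't the plain HTML5 doctype or missing entirely.

```python
import re

# <!doctype html> in any letter case, with optional whitespace before ">".
HTML5_DOCTYPE = re.compile(r"<!doctype\s+html\s*>", re.IGNORECASE)


def guess_compat_mode(markup: str) -> str:
    """Rough guess at document.compatMode from the start of a document."""
    head = markup.lstrip()
    if HTML5_DOCTYPE.match(head):
        return "CSS1Compat"  # no-quirks ("standards") mode
    if not head.lower().startswith("<!doctype"):
        return "BackCompat"  # missing doctype: quirks mode
    return "unknown"         # legacy doctype: needs the spec's full rules


print(guess_compat_mode("<!doCTypE HTml>"))
```

`<!doctype html>`, `<!DOCTYPE HTML>` and `<!doCTypE HTml>` all come back as CSS1Compat.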

nikeee 10/27/2025||
I tend to lower-case all my HTML because it has less entropy and therefore can be compressed more effectively.

But in the case of modern compression algorithms, some of them come with a predefined dictionary for websites. These usually contain the common stuff like <!DOCTYPE html> in its most used form. So doing it like everybody else might make the compression even more effective.
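A rough illustration of the dictionary point using Python's `zlib`, whose `compressobj` accepts a preset dictionary via `zdict`. The dictionary contents here are made up, and Brotli's real built-in dictionary is different and much larger, so treat the numbers as directional only.

```python
import zlib

# Hypothetical preset dictionary of common page boilerplate. Brotli ships its
# own built-in dictionary; zlib's zdict parameter is used here as a stand-in.
PRESET = (b'<!DOCTYPE html><html lang="en"><head><meta charset="utf-8">'
          b'<meta name="viewport" content="width=device-width, initial-scale=1">')


def deflated_size(data, zdict=None):
    """Compressed size in bytes, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return len(comp.compress(data) + comp.flush())


common = PRESET + b"<title>Hi</title>"
lowered = common.replace(b"DOCTYPE", b"doctype")

print(deflated_size(common), deflated_size(common, PRESET), deflated_size(lowered, PRESET))
```

Priming with the dictionary shrinks the output substantially, and the page that spells `doctype` differently from the dictionary compresses somewhat worse than the one that matches it exactly.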

bombcar 10/27/2025||
We need HTML Sophisticated - <!Dr. Type, HtML, PhD>
isolay 10/27/2025||
The "without meta utf-8" part of course depends on your browser's default encoding.
kevin_thibedeau 10/27/2025|
What mainstream browsers aren't defaulting to utf-8 in 2025?
Eric_WVGG 10/27/2025|||
I spent about half an hour trying to figure out why some JSON in my browser was rendering è incorrectly, despite the output code and downloaded files being seemingly perfect. I came to the conclusion that the browsers (Safari and Chrome) don't use UTF-8 as the default renderer for everything and moved on.

This should be fixed, though.
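A garbled è is the classic sign of UTF-8 bytes being decoded as a legacy single-byte encoding. Assuming the fallback was windows-1252 (a common legacy default for Western locales; the description above doesn't confirm which one was used), a short Python reproduction. Note that Python's `cp1252` codec raises on a few bytes (such as 0x81) that browsers' windows-1252 decoder maps, so this is not a perfect model of browser behavior.

```python
def misdecode(text, legacy="cp1252"):
    """Simulate UTF-8 bytes being read with a legacy single-byte encoding."""
    return text.encode("utf-8").decode(legacy)


def repair(garbled, legacy="cp1252"):
    """Reverse the mistake, when the garbling was lossless."""
    return garbled.encode(legacy).decode("utf-8")


print(misdecode("è"))
print(repair(misdecode("è")))
```

The two UTF-8 bytes of è (0xC3 0xA8) come out as "Ã¨", and re-encoding them recovers the original.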

layer8 10/27/2025||||
I wouldn’t be surprised if they don’t for pages loaded from local file URIs.
akho 10/27/2025||||
html5 does not even allow any other values in <meta charset=>. I think you need to use a different doctype to get what the screenshot shows.
layer8 10/27/2025||
While true, they also require user agents to support other encodings specified that way: https://html.spec.whatwg.org/multipage/parsing.html#characte...

Another funny thing here is that they say “but not limited to” (the listed encodings), but then say “must not support other encodings” (than the listed ones).

shiomiru 10/27/2025||
It says

> the encodings defined in Encoding, including, but not limited to

where "Encoding" refers to https://encoding.spec.whatwg.org (probably that should be a link.) So it just means "the other spec defines at least these, but maybe others too." (e.g. EUC-JP is included in Encoding but not listed in HTML.)

layer8 10/27/2025||
Ah, I understood it to refer to encoding from the preceding section.
naniwaduni 10/27/2025|||
All of them, pretty much.