Also, wrapping the head content (title, meta, etc.) in actual <head></head> tags is optional.
You also don't need the quotes as long as the attribute value doesn't contain spaces or the like; <html lang=en> is OK.
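For instance, as far as I can tell this is still a fully valid HTML5 document using those shorthands:

  <!DOCTYPE html>
  <html lang=en>
  <meta charset=utf-8>
  <title>Hello</title>
  <p>No explicit head or body needed; the parser infers them.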
(kind of pointless as the average website fetches a bazillion bytes of javascript for every page load nowadays, but sometimes slimming things down as much as possible can be fun and satisfying)
What this achieves is making the syntax more irregular and harder to parse. I wish all these tolerances didn't exist in HTML5 and browsers simply showed an error instead of being lenient. It would greatly simplify browser code and the HTML spec.
> I wish all these tolerances didn't exist in HTML5 and browsers simply showed an error instead of being lenient.
They (W3C) tried that with XHTML. It was soundly rejected by webpage authors and by browser vendors. Nobody wants the Yellow Screen of Death. https://en.wikipedia.org/wiki/File:Yellow_screen_of_death.pn...
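You can still see the two philosophies side by side in a browser console; a quick DOMParser sketch (the exact error markup varies by browser):

  // HTML parsing quietly repairs errors
  const html = new DOMParser().parseFromString('<p>unclosed', 'text/html');
  console.log(html.body.innerHTML);  // "<p>unclosed</p>"

  // XML parsing refuses; this is what the Yellow Screen of Death surfaced
  const xml = new DOMParser().parseFromString('<p>unclosed', 'application/xml');
  console.log(xml.getElementsByTagName('parsererror').length > 0);  // true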
It was actually the XHTML 2.0 specification [1], which discarded backwards compatibility with HTML 4, that was the straw that broke the camel's back. No more forms as we knew them, for example; we were supposed to use XForms.
That's when the WHATWG was formed, broke with the W3C, and created HTML5.
Thank goodness.
XHTML 2.0 didn't even really discard backwards-compatibility that much: it had its compatibility story baked in with XML Namespaces. You could embed XHTML 1.0 in an XHTML 2.0 document just as you can still embed SVG or MathML in HTML 5. XForms was expected to take a few more years and people were expecting to still embed XHTML 1.0 forms for a while into XHTML 2.0's life.
At least from my outside observer perspective, the formation of WHATWG was more a proxy war between the view of the web as a document platform versus the view of the web as an app platform. XHTML 2.0 wanted a stronger document-oriented web.
(Also, XForms had some good ideas, too. Some of what people want in "forms helpers" when they ask for something like HTMX to be standardized in browsers was part of XForms, such as JS-less fetch/XHR with in-place refresh on form submit. Some of what HTML 5 slowly added in terms of INPUT tag validation is also a sort of "backport" from XForms, albeit with no dependency on XSD.)
That said, actually writing HTML that can be parsed via an XML parser is generally a good, neighborly thing to do, as it allows for easier scraping and parsing through browsers and non-browser applications alike. For that matter, I will also add additional data-* attributes to elements just to make testing (and scraping) easier.
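Something like this (data-testid and data-sku are just my own convention, not anything standard):

  <!-- markup with stable hooks for tests and scrapers -->
  <li data-testid="product" data-sku="A-100">Widget</li>

  <script>
    // a test or scraper can target the hook and read the data
    const item = document.querySelector('[data-testid="product"]');
    console.log(item.dataset.sku);  // "A-100"
  </script>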
Well, for machines parsing it, yes, but for humans writing and reading it, the closing tags are helpful. For example, if you have
<p> foo
<p> bar
and change it to
<div> foo
<div> bar
suddenly you've got a syntax error (or some quirks-mode rendering with nested divs). The "redundancy" of closing the tags acts basically like a checksum, protecting against the "background radiation" of human editing. And if you're writing raw HTML without an editor that can autocomplete the closing tags, you're doing it wrong anyway. Yes, that used to be common before, and yes, it's a useful backwards-compatibility / newbie-friendly feature of the language, but that doesn't mean you should use it if you know what you're doing.
But my summary is that the reason it doesn't work is that strict document specs are too strict for humans. And at a time when there was legitimate browser competition, the one that made a "best effort" to render invalid content was the winner.
> And at a time when there was legitimate browser competition, the one that made a "best effort" to render invalid content was the winner.
Yes, my point is that there is no reason to still write "invalid" code just because it's supported for backwards compatibility reasons. It sounds like you ignored 90% of my comment, or perhaps you replied to the wrong guy?
Closing tags for <script> are required. But if people start treating HTML like XML, they write <script src="…" />. That fails, because the script element requires an explicit closing tag, and that slash has no meaning in HTML.
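A minimal illustration, with app.js as a placeholder name:

  <!-- broken: the slash is ignored and the script element stays open,
       so everything below is consumed as script data -->
  <script src="app.js" />
  <p>This paragraph never renders.</p>

The parser only recovers at the next </script> it finds (or end of file); the fix is an explicit <script src="app.js"></script>.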
I think validity matters, but you have to measure validity according to the actual spec, not what you wish it was, or should have been. There's no substitute for actually knowing the real rules.
In contrast, paragraphs and lists do enclose content, so IMO they should have clear delineations - if nothing else, to make visually understanding the code more clear.
I’m also sure that someone will now reference another HTML element I didn’t think about that breaks my analogy.
Including closing tags as a general rule might make readers think that they can rely on their presence. Also, in some cases they are prohibited. So you can't achieve a simple evenly applied rule anyway.
And I do think there's an evenly applied rule, namely: always explicitly close all non-void elements. There are only 14 void elements anyway, so it's not too much to expect readers to know them. In your own words "there's no substitute for actually knowing the real rules".
I mean, your approach requires memorizing for which 15 elements the closing tag can be omitted anyway; otherwise you'll mentally parse the document wrong (thinking a br tag needs to be closed is just as likely as thinking p tags can be nested).
The risk that somebody might be expecting a closing tag for an hr element seems minuscule and is a small price to pay for conveniences such as (as I explained above) being able to find and replace a p tag or a li tag to a div tag.
I'm not opposed to closing <li> tags as a general practice. But I don't think it provides as much benefit as you're implying. Valid HTML has a number of special rules like this. Like different content parsing rules for <textarea> and <script>. Like "foreign content".
If you try to write lint-passing HTML in the hopes that you could change <li> to <div> easily, you still have to contend with the fact that such a change cannot be valid, except possibly as a direct descendant of <template>.
The fact that XHTML didn't gain traction is a mistake we've been paying off for decades.
Browser engines could've been simpler; web development tools could've been more robust and powerful much earlier; we would be able to rely on XSLT and invent other ways of processing and consuming web content; we would have proper XHTML modules, instead of the half-baked Web Components we have today. Etc.
Instead, we got standards built on poorly specified conventions, and we still have to rely on 3rd-party frameworks to build anything beyond a toy web site.
Stricter web documents wouldn't have fixed all our problems, but they would have certainly made a big impact for the better.
And I'd add: yes, there were some initial usability quirks, but those could've been ironed out over time. Trading the potential of a strict markup standard for what we have today was a colossal mistake.
Consider JSON and CSV. Both have formal specs. But in the wild, most parsers are more lenient than the spec.
I doubt it would make a dent - e.g. in the "skipping <head>" case, you'd be replacing the error recovery mechanism of "jump to the next insertion mode" with "display an error", but a) you'd still need the code path to handle it, b) now you're in the business of producing good error messages which is notoriously difficult.
Something that would actually make the parser a lot simpler is removing document.write, which has been obsolete ever since the introduction of the DOM and whose main remaining real-world use case seems to be ad delivery. (If it's not clear why this would help, consider that document.write can write scripts that call document.write, and so on.)
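A contrived sketch of why that's painful:

  <script>
    // writes a new script into the very byte stream the parser is
    // tokenizing; the parser must pause, execute it, splice in its
    // output, and that output may call document.write again
    document.write('<script>document.write("<p>hello</p>")<\/script>');
  </script>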
Who would want to use a browser which would prevent many currently valid pages from being shown?
Also, obviously, that's unfortunately not the case in the real world today. Doesn't mean I can't wish things were different.
You might want to always consistently terminate all tags and such for aesthetic or human-centered reasons (reduced cognitive load, easier scanning), though; I'd accept that.
You monster.
<title>Shortest valid doc</title>
<p>Body text following here
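Run through the parser, that comes out as roughly this fully tagged document:

  <html>
    <head>
      <title>Shortest valid doc</title>
    </head>
    <body>
      <p>Body text following here</p>
    </body>
  </html>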
(cf. explainer slides at [1] for the exact tag inferences SGML/HTML does to arrive at the fully tagged doc)
[1]: https://sgmljs.sgml.net/docs/html5-dtd-slides-wrapper.html (linked from https://sgmljs.sgml.net/blog/blog1701.html)
`<thead>` and `<tfoot>`, too, if they're needed. I try to use all the free stuff that HTML gives you without needing to reach for JS. It's a surprising amount. Coupled with CSS, you can get pretty far without needing anything else. Even just having `<template>` with minimal JS enables a ton of 'interactivity'.
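A minimal sketch of the `<template>` pattern (the ids are made up):

  <template id=row-template>
    <li class=row></li>
  </template>
  <ul id=list></ul>
  <script>
    // stamp out one clone per item; no framework needed
    const tpl = document.getElementById('row-template');
    for (const text of ['one', 'two', 'three']) {
      const clone = tpl.content.cloneNode(true);
      clone.querySelector('.row').textContent = text;
      document.getElementById('list').append(clone);
    }
  </script>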
I almost always use thead.
<bgsound src="test.mid" loop=3>
code, pre, tt, kbd, samp {
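  /* repeating the keyword opts out of the smaller default font size
     some browsers apply when the family is exactly "monospace" */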
font-family: monospace, monospace;
}
But I vaguely remember there are other broken CSS defaults for links, img tags, and other stuff. An HTML 5 boilerplate guide should include those too, but I don't know of any that do.
There's also this short reset: https://www.joshwcomeau.com/css/custom-css-reset/
If a site won't update itself, you can use a user stylesheet or an extension to fix things like font sizes and colors without waiting for the maintainer.
But for scripts that rely on CSS behaviors there's a simple check: test document.compatMode and bail when it's not what you expect. Sometimes adding a wrapper element and extracting the contents with a Range keeps the page intact.
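Something like (a sketch):

  // document.compatMode is "CSS1Compat" in standards mode
  // and "BackCompat" in quirks mode
  if (document.compatMode !== 'CSS1Compat') {
    console.warn('quirks mode detected; skipping enhancement');
  }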
Also, adding semantic elements and ARIA roles goes a long way for accessibility; it costs little and helps screen readers navigate.
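E.g. a small made-up snippet; landmarks and states that screen readers can navigate by:

  <nav aria-label="Main"><a href="/">Home</a></nav>
  <main>
    <button aria-expanded="false" aria-controls="menu">Menu</button>
    <ul id="menu" hidden><li><a href="/about">About</a></li></ul>
  </main>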
Would love to see more community hacks that improve usability without rewriting the whole thing.
If only we had UTF-8 as the default encoding in the HTML5 spec, too.
I’ve had my default encoding set to UTF-8 for probably 20 years at this point, so I often miss some encoding bugs, but then hit others.
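In the meantime, declaring it explicitly sidesteps the legacy fallback behavior:

  <!-- the spec wants this within the first 1024 bytes of the document -->
  <meta charset=utf-8>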
And <!DOCTYPE html> if you want polyglot (X)HTML.
But some modern compression algorithms come with a predefined dictionary for websites, and these usually contain the common stuff like <!DOCTYPE html> in its most used form. So doing it like everybody else might make the compression even more effective.
<meta name="viewport" content="width=device-width,initial-scale=1.0">
width=device-width is actually redundant and cargo culting. All you need is initial-scale. I explain in a bit more detail here: https://news.ycombinator.com/item?id=36112889
Where do the standards say it ought to work?
This should be fixed, though.
Another funny thing here is that they say “but not limited to” (the listed encodings), but then say “must not support other encodings” (than the listed ones).
> the encodings defined in Encoding, including, but not limited to
where "Encoding" refers to https://encoding.spec.whatwg.org (probably that should be a link.) So it just means "the other spec defines at least these, but maybe others too." (e.g. EUC-JP is included in Encoding but not listed in HTML.)