Posted by vishnuharidas 22 hours ago

UTF-8 is a brilliant design(iamvishnu.com)
677 points | 268 comments | page 5
smoyer 15 hours ago|
Uvarint also has the property that a file containing only ASCII characters is still a valid ASCII file.
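
A minimal sketch of that property, assuming "uvarint" here means the LEB128-style unsigned varint (7 payload bits per byte, high bit as a continuation flag) and that each code point is stored as one uvarint. The function names are made up for illustration:

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128-style varint."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)  # final byte, continuation bit clear
    return bytes(out)


def encode_text_as_uvarints(text: str) -> bytes:
    """Hypothetical text encoding: one uvarint per Unicode code point."""
    return b"".join(encode_uvarint(ord(ch)) for ch in text)


if __name__ == "__main__":
    # Code points below 128 fit in a single byte with the high bit clear,
    # so pure-ASCII text comes out byte-for-byte identical to ASCII.
    print(encode_text_as_uvarints("hello") == b"hello")
    # Anything larger spills into continuation bytes.
    print(encode_text_as_uvarints("é").hex())
```

Unlike UTF-8, the trailing byte of a multi-byte uvarint has its high bit clear, so it can look like an ASCII byte when read in isolation.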
sjapkee 7 hours ago||
Until you interact with it as a programmer
kevincox 17 hours ago||
> Every ASCII encoded file is a valid UTF-8 file.

More importantly, that file has the same meaning. Same with the converse.
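A quick way to check both directions, as a sketch using only Python's built-in codecs (the helper names are invented for this example):

```python
def same_meaning_both_ways(text: str) -> bool:
    """True if ASCII text has identical bytes and meaning under UTF-8."""
    data = text.encode("ascii")  # raises UnicodeEncodeError if not pure ASCII
    return data.decode("utf-8") == text and text.encode("utf-8") == data


def non_ascii_bytes_have_high_bit(text: str) -> bool:
    """True if every byte of every non-ASCII character is >= 0x80 in UTF-8."""
    return all(
        byte >= 0x80
        for ch in text
        if ord(ch) >= 0x80
        for byte in ch.encode("utf-8")
    )


if __name__ == "__main__":
    print(same_meaning_both_ways("Hello, world!\n"))
    # Non-ASCII characters never produce a byte that could be read as ASCII.
    print(non_ascii_bytes_have_high_bit("naïve café €"))
```

The second check is what makes the converse hold: a UTF-8 file whose bytes are all below 0x80 can only contain ASCII characters.
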

sheerun 21 hours ago||
I'll mention IPv6 as a bad design that could potentially have been a UTF-8-like success story.
tialaramex 21 hours ago|
No. UTF-8 is for encoding text, so we don't need to care about it being variable length because text was already variable length.

The network addresses aren't variable length, so if you decide "Oh IPv6 is variable length" then you're just making it worse with no meaningful benefit.

The IPv4 address is 32 bits, the IPv6 address is 128 bits. You could go 64 but it's much less clear how to efficiently partition this and not regret whatever choices you do make in the foreseeable future. The extra space meant IPv6 didn't ever have those regrets.

It suits a certain kind of person to always pay $10M to avoid the one-time $50M upgrade cost. They can do this over a dozen jobs in twenty years, spending $200M to avoid $50M cost and be proud of saving money.

jrochkind1 15 hours ago||
It really is, in so many ways.

It is amazing how successful it's been.

transfire 6 hours ago||
So brilliant that we’re all still using ASCII!†

† With an occasional UNICODE flourish.

quotemstr 21 hours ago||
Great example of a technology you get from a brilliant guy with a vision and that you'll never get out of a committee.
librasteve 21 hours ago||
some insightful unicode regex examples...

https://dev.to/bbkr/utf-8-internal-design-5c8b

hamburglar 19 hours ago|
Regex? Did you link to the wrong page? I see no regexes on that page.
carlos256 20 hours ago|
No, it's not. It's just a form of Elias-Gamma coding.
carlos256 20 hours ago|
* unary coding.
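
For reference, the unary part is the run of leading 1 bits in the first byte, which gives the sequence length. A rough sketch (the function name is made up, and it assumes the modern four-byte limit):

```python
def sequence_length(lead: int) -> int:
    """Return the UTF-8 sequence length implied by a lead byte.

    The count of leading 1 bits acts as a unary-coded length:
    0xxxxxxx -> 1 byte, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
    """
    if not 0 <= lead <= 0xFF:
        raise ValueError("expected a single byte value")
    ones = 0
    mask = 0x80
    while lead & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1  # plain ASCII byte
    if ones == 1 or ones > 4:
        # 10xxxxxx is a continuation byte; 5+ leading ones is not valid UTF-8
        raise ValueError(f"0x{lead:02x} is not a valid lead byte")
    return ones


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        encoded = ch.encode("utf-8")
        print(ch, sequence_length(encoded[0]), len(encoded))
```
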