Posted by vishnuharidas 16 hours ago
https://commandcenter.blogspot.com/2020/01/utf-8-turned-20-y...
UTF-8 made processing Japanese text much easier! No more needing to manually change encoding options in my browser! No more mojibake!
A couple of days later, I got an email from someone explaining that it was gibberish. Apparently our content partner, who claimed to be sending GB2312 Simplified Chinese, was in fact sending us Big5 Traditional Chinese, so while many of the byte values mapped to valid characters, the resulting text was nonsensical.
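For anyone curious what that failure mode looks like, here is a minimal Python sketch. The sample string is a hypothetical stand-in, not the partner's actual content; the point is only that decoding with the wrong legacy codec can succeed without complaint and still produce the wrong characters.

```python
# Hypothetical sample text; any Traditional Chinese string would do.
original = "中文"

# What the partner actually sent: Big5-encoded bytes.
big5_bytes = original.encode("big5")

# What we assumed: GB2312. errors="replace" keeps decoding even if some
# byte pairs have no GB2312 mapping, instead of raising UnicodeDecodeError.
misread = big5_bytes.decode("gb2312", errors="replace")

# Decoding with the codec that matches the bytes recovers the text.
recovered = big5_bytes.decode("big5")

print(repr(misread))          # valid characters, but not the original text
print(recovered == original)
```

Nothing in the bytes themselves says which encoding they are in, which is why nobody noticed until a human reader looked at the result.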
https://www.joelonsoftware.com/2003/10/08/the-absolute-minim...
So I went around fixing UnicodeErrors in Python at random, for years, despite knowing all that stuff. It wasn't until I read Batchelder's piece on the "Unicode Sandwich" about a decade later that I finally learned how to write a program that supports Unicode properly, rather than playing whack-a-mole.
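For anyone who hasn't run into the term, this is roughly how I understand the sandwich: decode bytes to str as soon as they enter the program, work only with str in the middle, and encode back to bytes on the way out. A minimal sketch, where `shout` and `process` are names I made up for illustration and the encodings are assumptions about the input and output:

```python
def shout(text: str) -> str:
    """The 'filling': business logic that only ever sees str."""
    return text.upper()


def process(raw_in: bytes,
            in_encoding: str = "utf-8",
            out_encoding: str = "utf-8") -> bytes:
    # Bottom slice of bread: bytes -> str at the input boundary.
    text = raw_in.decode(in_encoding)
    # Filling: no bytes and no encodings in here.
    result = shout(text)
    # Top slice of bread: str -> bytes at the output boundary.
    return result.encode(out_encoding)


# Usage: Latin-1 bytes come in, UTF-8 bytes go out.
latin1_input = "café".encode("latin-1")
utf8_output = process(latin1_input, in_encoding="latin-1")
print(utf8_output.decode("utf-8"))
```

The encoding is named in exactly two places, so a UnicodeError points at one of the boundaries instead of somewhere in the middle of the program.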