I really love @dylanbeattie's talks.
I've seen the previous version of this that he references at the start, but watched this anyway, because it's a great talk.
Life as a sysadmin has taught me a lot of the lessons in here, but there's SO MUCH more background covered than I ever knew. So, still very useful.
#utf #plaintext #characterencoding #pikematchbox
You might've heard of ASCII or UTF-8. These character encodings were built by very smart people.
I just built “UTF-21”, an impractical alternative that only a fool would use. Read about it (with a short Unicode crash course) here: https://evanhahn.com/utf-21/
#unicode #utf8 #ascii #characterencoding #programming
If you have been spared #characterencoding hell, consider yourself fortunate. Every time I dig into it, I marvel at how all this mess could have been avoided with just a little foresight. As soon as ASCII-only stopped being the norm, someone could have created a container format for text files that works like any other media container: a file header that says, for example, this is ISO-8859-1, CP-1252, UTF-8, or whatever. That would have removed all ambiguity.
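To make the file-header idea concrete, here is a minimal Python sketch. The header layout (`%TEXT encoding=<name>` followed by a newline) and the names `MAGIC`, `wrap_text`, and `unwrap_text` are invented for illustration; no such standard exists.

```python
# Hypothetical container: one ASCII header line naming the encoding,
# followed by the text encoded in exactly that encoding.
MAGIC = b"%TEXT encoding="  # made-up prefix, not a real standard


def wrap_text(text: str, encoding: str) -> bytes:
    """Prefix the encoded text with an ASCII header naming its encoding."""
    header = MAGIC + encoding.encode("ascii") + b"\n"
    return header + text.encode(encoding)


def unwrap_text(blob: bytes) -> str:
    """Read the header line, then decode the rest with the declared encoding."""
    # The header is pure ASCII, so the first newline byte always ends it.
    header, sep, payload = blob.partition(b"\n")
    if not sep or not header.startswith(MAGIC):
        raise ValueError("missing encoding header")
    encoding = header[len(MAGIC):].decode("ascii")
    return payload.decode(encoding)
```

With a header like this, a reader never has to guess: `unwrap_text(wrap_text("Budaörs", "cp1252"))` and `unwrap_text(wrap_text("Budaörs", "utf-8"))` both give back the same string, even though the payload bytes differ.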
As usual with anything involving #CharacterEncoding handling, this turned out to be far more difficult than hoped. Currently it appears that the #Perl core module Encode::Guess catches all valid CP-1252 files but misses some valid #UTF8, so I added a legacy fallback test to detect the UTF8 that it missed. This seems to catch most UTF8 now. Since #acxi doesn't do anything with UTF8, that's fine. #ASCII detection is solid, of course. I'm testing on large datasets, and it seems to work reliably now.
#characterencoding #perl #utf8 #acxi #ascii
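For anyone curious what this kind of layered detection can look like, here is a rough Python sketch of the general idea. It is not the acxi code and not how Encode::Guess works internally; the function name `guess_encoding` and the order of the checks are assumptions made for illustration.

```python
def guess_encoding(data: bytes) -> str:
    """Guess among ASCII, UTF-8 and CP-1252 by trying strict decodes in order."""
    # Pure 7-bit data is valid in all three encodings, so report it as ASCII.
    if data.isascii():  # bytes.isascii() needs Python 3.7+
        return "ascii"
    # Strict UTF-8 decoding rejects most text written in a legacy
    # single-byte encoding, so it is tried before CP-1252.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Python's cp1252 codec leaves five byte values undefined
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D) and raises on them.
    try:
        data.decode("cp1252")
        return "cp1252"
    except UnicodeDecodeError:
        return "unknown"
```

The order matters because almost any byte sequence decodes as CP-1252, so checking it first would swallow UTF-8 files too.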
Gothic characters on wayside cross?
#characterencoding #waysideshrine #waysidecross #Budaörs