FedSearch - Federated network search engine

MattPounsett · @MattPounsett

76 followers · 337 posts · Server fosstodon.org

I really love @dylanbeattie's talks.

I've seen the previous version of this that he references at the start, but watched this anyway, because it's a great talk.

Life as a sysadmin has taught me a lot of the lessons in here, but there's SO MUCH more background covered than I ever knew. So, still very useful.

https://youtu.be/gd5uJ7Nlvvo

#UTF #PlainText #CharacterEncoding #PikeMatchbox

Last updated 2 years ago

Evan Hahn · @EvanHahn

859 followers · 226 posts · Server bigshoulders.city

You might've heard of ASCII or UTF-8. These character encodings were built by very smart people.

I just built “UTF-21”, an impractical alternative that only a fool would use. Read about it (with a short Unicode crash course) here: https://evanhahn.com/utf-21/

#Unicode #UTF8 #ASCII #CharacterEncoding #programming

Last updated 2 years ago

smxi · @smxi

16 followers · 202 posts · Server fosstodon.org

If you have been spared #characterencoding hell, consider yourself fortunate. Every time I start to dig into it, I marvel at how all this mess could have been avoided with just a little foresight. As soon as ASCII-only stopped being the norm, someone could have created a container format for text files that works like any other media container: a file header that says, for example, "this is ISO-8859-1", "CP-1252", "UTF-8", or whatever. That would have removed all ambiguity.
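
To make the idea concrete, here is a minimal Python sketch of such a container. The layout is entirely made up for illustration: a hypothetical 4-byte signature `TXTC`, one length byte, the encoding name in ASCII, then the encoded body. `wrap` and `unwrap` are illustrative names, not an existing format or library.

```python
MAGIC = b"TXTC"  # hypothetical signature; not a real registered format


def wrap(text: str, encoding: str) -> bytes:
    """Encode text and prepend an ASCII header that names the encoding."""
    name = encoding.encode("ascii")
    if len(name) > 255:
        raise ValueError("encoding name too long for a one-byte length field")
    # Header layout (assumed): magic, one length byte, encoding name.
    return MAGIC + bytes([len(name)]) + name + text.encode(encoding)


def unwrap(data: bytes) -> str:
    """Read the header, then decode the body with the declared encoding."""
    if not data.startswith(MAGIC):
        raise ValueError("missing container header")
    n = data[len(MAGIC)]            # length of the encoding name
    start = len(MAGIC) + 1
    name = data[start:start + n].decode("ascii")
    return data[start + n:].decode(name)
```

Because the header itself is plain ASCII, a reader can always parse it before knowing anything about the body, so the body's encoding never has to be guessed.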

Last updated 3 years ago

smxi · @smxi

14 followers · 130 posts · Server fosstodon.org

As usual with anything involving #CharacterEncoding handling, this turned out to be far more difficult than hoped. Currently it appears that the #Perl core module Encode::Guess catches all valid CP-1252 files but misses some valid #UTF8, so I added a legacy fallback test to detect the UTF-8 it missed. This seems to catch most UTF-8 now. Since #acxi doesn't do anything with UTF-8, that's fine. #ASCII detection is solid, of course. I'm testing on large datasets, and it seems to work reliably now.
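
For comparison, a common way to tell these three apart is sketched below in Python. This is not the Perl code described above and does not use Encode::Guess; it swaps in a different ordering (ASCII first, then strict UTF-8, then CP-1252 as the fallback), and `guess_encoding` is a hypothetical name.

```python
def guess_encoding(data: bytes) -> str:
    """Classify bytes as 'ascii', 'utf-8', 'cp1252', or 'unknown'."""
    # Pure 7-bit bytes are valid in all three encodings, so report ASCII first.
    if data.isascii():
        return "ascii"
    # Strict UTF-8 decoding: legacy 8-bit text rarely forms valid
    # multi-byte sequences by accident, so success is a strong signal.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Python's cp1252 codec rejects the five undefined bytes
    # (0x81, 0x8D, 0x8F, 0x90, 0x9D); everything else decodes.
    try:
        data.decode("cp1252")
        return "cp1252"
    except UnicodeDecodeError:
        return "unknown"
```

Checking UTF-8 before CP-1252 matters here: almost any byte string decodes as CP-1252, so trying it first would swallow genuine UTF-8 files.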

Last updated 3 years ago