• balsoft@lemmy.ml · 24 hours ago

      I understand Unicode and its various encodings (UTF-8, UTF-16, UTF-32) fairly well. UTF-8 is backwards-compatible with ASCII and only takes up extra bytes if you use characters outside the 0x00-0x7F range. E.g. this comment I’m writing is simultaneously valid UTF-8 and valid ASCII.
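
      A minimal Python 3 sketch of that backwards compatibility (the example strings are just illustrations): ASCII-only text produces the identical bytes under UTF-8, and only characters above U+007F need more than one byte.

      ```python
      # Pure-ASCII text: the UTF-8 bytes are exactly the ASCII bytes.
      ascii_text = "this comment is valid ASCII"
      assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

      # Characters outside 0x00-0x7F take 2-4 bytes each.
      for ch in ["A", "é", "€", "😀"]:
          print(f"U+{ord(ch):04X} {ch!r}: {len(ch.encode('utf-8'))} byte(s)")
      # A -> 1 byte, é -> 2 bytes, € -> 3 bytes, 😀 -> 4 bytes
      ```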

      I’d like to see some good evidence for the claim that Unicode support increases memory usage so drastically. Especially given that most of the data in RAM is typically something other than encoded text (e.g. videos, photos, the internal state of software).

      • Frezik@lemmy.blahaj.zone · 24 hours ago

        It’s not so much the character length of any specific encoding; it’s all the details that go into supporting it. Can’t assume text is read left to right. Can’t assume case insensitivity works the same way as in your language. Can’t assume the shape of a glyph won’t be affected by the glyph next to it. Can’t assume the shape of a glyph won’t be affected by a glyph five down.

        Pile up millions of these little assumptions you can no longer make in order to support every written language ever. It gets complicated.
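
        To make one of those concrete, here is a small Python 3 sketch of the case-insensitivity assumption breaking down (the characters chosen are just illustrative):

        ```python
        # Case mapping is not a simple per-character, language-independent table.
        print("ß".upper())                    # 'SS'  - one German letter uppercases to two
        print("İ".lower())                    # 'i' + U+0307 - lowercasing can add a combining mark
        print(len("\u00e9"), len("e\u0301"))  # 1 2   - the same visible 'é' as one or two code points
        # str.upper()/str.lower() are locale-independent; proper Turkish casing
        # ('i' <-> 'İ', 'ı' <-> 'I') needs locale-aware tailoring, e.g. via ICU.
        ```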

        • The_Decryptor@aussie.zone · 16 hours ago

          Yeah, but that’s still not a lot of data. LTR/RTL shouldn’t vary within a given script, so the values can be shared over an entire range of characters.
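
          For what it’s worth, a quick Python 3 check of that point (Hebrew picked only as an example block): the bidirectional class is uniform across a script range, so a range-compressed property table can store it cheaply.

          ```python
          import unicodedata

          # Bidi class over the Hebrew letter block U+05D0..U+05EA: one shared value.
          hebrew = [chr(cp) for cp in range(0x05D0, 0x05EB)]
          print({unicodedata.bidirectional(ch) for ch in hebrew})  # {'R'}
          print(unicodedata.bidirectional("a"))                    # 'L'
          print(unicodedata.bidirectional("1"))                    # 'EN' (European number)
          ```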