Why does the compiler understand ASCII by default?

printf("%s", name);@piefed.blahaj.zone · 28 days ago

Aaron “Abolish ICE” Madlon-Kay@mastodon.social · 28 days ago

@akunohana The compiler doesn’t need to know that 0x18 is CAN; that knowledge is embedded in whatever decided that the data you’re inspecting is UTF-8 or ASCII.

The content of your original post has been replaced with a link that I can’t open, so I can’t go back and confirm where you said the data was coming from. But if the data were in some other, more exotic encoding, then 0x18 would mean something else in the context of that data.
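
As a rough illustration of the point above: the compiler only turns character literals into numbers according to its execution character set, and it attaches no meaning to the bytes your program reads at run time. The check below is a minimal sketch; the function name `literals_look_ascii` is made up for this example.

```c
/* Hypothetical helper: returns 1 if a few character literals have the
 * values that ASCII assigns to them, 0 otherwise (for example when
 * compiling for an EBCDIC system, where 'A' is 0xC1). */
int literals_look_ascii(void)
{
    return 'A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20;
}
```

On an ASCII-based toolchain this returns 1, but that only describes the literals in the source code, not the encoding of any input.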

printf("%s", name);@piefed.blahaj.zone · 28 days ago

Oh, maybe I messed something up when editing… Here’s what I wrote:

I was experimenting with sanitizing user input. In other words, I’m simply prompting for user input and reading it into a character array with fgets.

What is “that thing” that decided the encoding?
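
For readers following along, the setup described here might look roughly like the sketch below. It is not the poster's actual code: the names `read_line` and `replace_control_bytes` are hypothetical, and the numeric ranges assume an ASCII-compatible encoding such as UTF-8.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf and drop the trailing newline that
 * fgets keeps. Returns 1 on success, 0 on end-of-file or error. */
int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Replace control bytes (0x01-0x1F and 0x7F) with '?'.
 * Assumption: the input uses an ASCII-compatible encoding.
 * Returns the number of bytes replaced. */
int replace_control_bytes(char *s)
{
    int replaced = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c < 0x20 || c == 0x7F) {
            *s = '?';
            replaced++;
        }
    }
    return replaced;
}
```

A caller would typically use `read_line(stdin, name, sizeof name)` followed by `replace_control_bytes(name)`. The comparison against 0x18 and its neighbors works on plain numbers; treating those numbers as control characters is where the encoding assumption comes in.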

Aaron “Abolish ICE” Madlon-Kay@mastodon.social · 28 days ago

@akunohana OK, that’s a pretty good question then. In that case the encoding is determined by your terminal (or, if there is no terminal, by the execution environment). Try invoking env (or locale) and looking at LANG and LC_ALL; those should tell you what your terminal accepts as input and passes along to your program.
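
The same information can be queried from inside a C program. The sketch below assumes a POSIX system, since `nl_langinfo` is not part of ISO C; the function name `current_codeset` is made up for this example.

```c
#include <langinfo.h>
#include <locale.h>

/* Adopt the locale named by the environment (LANG, LC_ALL, ...) and
 * return the name of its character encoding, e.g. "UTF-8".
 * If the environment names an unknown locale, use the "C" locale. */
const char *current_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        setlocale(LC_ALL, "C");
    }
    return nl_langinfo(CODESET);
}
```

Printing the result with `printf("%s\n", current_codeset());` shows what the program believes the encoding to be. Note that this reflects the locale settings only; a terminal configured differently from the locale can still send bytes in another encoding.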

printf("%s", name);@piefed.blahaj.zone · 28 days ago

Sweet! I think this is the answer that I was looking for, although my post is poorly phrased. 😅

Does this mean that, in theory, portability problems could arise? 98-ish percent of all systems use Unicode, but if I were to run my program on an obscure system whose underlying character encoding is neither Unicode nor some other superset of ASCII, I assume I would get other values?

Aaron “Abolish ICE” Madlon-Kay@mastodon.social · 28 days ago

@akunohana Yes! Although I would say that ASCII is a pretty safe assumption, and it’s really anything beyond the ASCII range (bytes above 0x7F) that you need to account for: either document the encoding as a requirement for your program, or take steps to ensure the OS uses the right encoding if you are packaging something for distribution.
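
One way to act on this advice is to detect input that leaves the ASCII range and handle it explicitly. A minimal sketch, with the hypothetical name `is_pure_ascii`:

```c
#include <stddef.h>

/* Returns 1 if every byte of the NUL-terminated string lies in the
 * 7-bit ASCII range (0x00-0x7F), 0 otherwise. */
int is_pure_ascii(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Cast first: plain char may be signed, so high bytes can be negative. */
        if ((unsigned char)s[i] > 0x7F) {
            return 0;
        }
    }
    return 1;
}
```

A program could reject or specially decode any line for which this returns 0, rather than silently assuming a particular encoding for the high bytes.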