The same way as any Python 3 on any operating system.
“Extended ASCII” hasn’t been a particularly useful or meaningful term for quite some time. Unicode is the standard.
There are two separate concepts needed to understand Unicode: the encoding (rules that say how to re-interpret the bytes as a sequence of “code point” numbers) and the Unicode mapping (rules that assign an actual character to each code point - i.e., explain which character is which, what kind of script it comes from, how to position it graphically, and other such character properties). But with Unicode, the latter is always the same. There is a common Unicode database, which is versioned and maintained by the Unicode Consortium. So the interesting part is the encoding.
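To make the split between encoding and mapping concrete, here is a minimal sketch using only the standard library. The sample string is just an illustration I picked; the point is that the code points stay the same while the bytes depend on the encoding.

```python
import unicodedata

text = "é日"  # arbitrary sample: one Latin letter with an accent, one kanji

# The Unicode mapping: each character has a fixed code point and a name.
code_points = [ord(ch) for ch in text]
names = [unicodedata.name(ch) for ch in text]

# The encoding: the same code points become different bytes
# depending on which encoding is chosen.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(code_points)  # [233, 26085]
print(names)        # ['LATIN SMALL LETTER E WITH ACUTE', 'CJK UNIFIED IDEOGRAPH-65E5']
print(utf8)         # b'\xc3\xa9\xe6\x97\xa5'
print(utf16)        # b'\xe9\x00\xe5e'

# Decoding with the matching encoding recovers the original text.
assert utf8.decode("utf-8") == utf16.decode("utf-16-le") == text
```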
Handling everything that can be done with text properly, in a way that works for every language, is incredibly complex. Fortunately, the operating system and other built-in libraries are responsible for most of that, and almost all of the rest is stuff that you won’t have to worry about if you just want to read and write text that someone else prepared for you. The hard part comes when you need to worry about language-specific rules for upper/lowercasing, sorting, and so on. Python has a little of this built in (see e.g. str.casefold).
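As a small illustration of what str.casefold buys you over str.lower, here is a sketch with the usual German example (the word list is my own choice, not from the question):

```python
# Three spellings that should compare equal in a caseless comparison.
words = ["Straße", "STRASSE", "strasse"]

# lower() keeps the sharp s, so two distinct strings remain.
lowered = {w.lower() for w in words}

# casefold() expands "ß" to "ss", so all three collapse to one form.
folded = {w.casefold() for w in words}

print(lowered)  # two entries: 'straße' and 'strasse'
print(folded)   # {'strasse'}
```

This only covers caseless matching. Locale-aware sorting and similar rules are outside what casefold does.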
When you open the CSV file, simply specify the encoding of the file, according to how the file is encoded. This is something you have to figure out, either by knowing (because you wrote the file), being told (someone else left you some metadata somewhere), or guessing (trying some options until it works, or looking at the raw bytes first and doing some analysis, or using a third-party library to do that analysis). After that, Python takes care of the rest.
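A rough sketch of both approaches, knowing the encoding and guessing it, might look like this. The helper names read_csv_rows and guess_encoding, the file name, and the candidate list are all made up for the example; the "try until it works" guess is crude, since a wrong encoding can decode without raising and still produce garbage.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding):
    """Read every row of a CSV file whose encoding is already known."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


def guess_encoding(path, candidates=("utf-8", "shift_jis", "cp1252")):
    """Return the first candidate that decodes the whole file, else None.

    Order matters: strict encodings such as UTF-8 should come first,
    permissive single-byte ones last.
    """
    for enc in candidates:
        try:
            with open(path, encoding=enc) as f:
                f.read()
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Demo: write a small Shift_JIS file, then read it back.
sample_rows = [["name", "city"], ["山田", "東京"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")  # hypothetical file name
    with open(path, "w", encoding="shift_jis", newline="") as f:
        csv.writer(f).writerows(sample_rows)

    detected = guess_encoding(path)
    rows = read_csv_rows(path, detected)

print(detected)
print(rows)
```

Once the file is opened with the right encoding, everything you get back is an ordinary str, and the rest of the program does not need to care where it came from.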
Excel files are much more complex than CSV - they store a lot more information than just the text of each “cell”, and they have a more complex internal structure. You should use a third-party library to process them.
But as long as you know the encoding of the input file, “European” (there are separate sets of characters for several major alphabets, even when they superficially look the same) or “Japanese” (there is a large block of “unified CJK” characters that include kanji, and then separate blocks for katakana and hiragana) characters won’t cause a problem.
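If you want to see those separate sets for yourself, the standard unicodedata module will tell you what each character is. A quick sketch (the sample characters are my own picks):

```python
import unicodedata

# Three capital letters that look alike but are distinct characters
# from three different alphabets.
lookalikes = ["A", "\u0410", "\u0391"]
lookalike_names = [unicodedata.name(ch) for ch in lookalikes]

# One kanji, one katakana and one hiragana character.
japanese = "漢カか"
japanese_names = [unicodedata.name(ch) for ch in japanese]

for ch, name in zip(lookalikes + list(japanese), lookalike_names + japanese_names):
    print(f"U+{ord(ch):04X} {name}")
```

The three look-alike letters compare unequal to each other, which is exactly why they do not get mixed up once the text has been decoded correctly.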