Problems with a code sample from a book

I read in from a file.
path = “C:\Users\tech_worker\segismundo.txt”

Strip EOL characters.
lines = [x.rstrip() for x in open(path, encoding=“utf-8”)]

Here are the first few lines of my data:

lines
[‘Sueña el rico en su riqueza,’,
‘que más cuidados le ofrece;’,
‘’,
‘sueña el pobre que padece’,
‘’]

I try to read the first 10 characters, which should return “Sueña el r”, but I’m getting odd characters (below) instead.
f1.read(10)
'Sueña el ’

Any suggestions?

Something’s gone wrong in the encoding and decoding. When you tried to read ten characters, did you open the file with encoding=“utf-8”, as in the example here?
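For anyone following along, here is a minimal sketch of how that kind of output can arise without touching a file at all. It assumes the Windows default codec in play was cp1252, which the thread does not confirm; the sample text is the first line of the poem shown above.

```python
# The first line of the poem, encoded the way a UTF-8 file stores it.
text = "Sueña el rico en su riqueza,"
raw = text.encode("utf-8")

# Assumption: the platform default was cp1252. Decoding UTF-8 bytes with it
# turns the two bytes of "ñ" into two separate characters.
wrong = raw.decode("cp1252")[:10]

# Decoding with the codec the file was actually written in.
right = raw.decode("utf-8")[:10]

print(repr(wrong))  # 'SueÃ±a el '
print(repr(right))  # 'Sueña el r'
```

Because "ñ" takes two bytes in UTF-8, the wrongly decoded text is one character longer, which is why the first ten characters stop before the "r".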

Yes, that is where I had it wrong.

I opened the file like this:
f1 = open(path)

And got this result:
f1.read(10)
'Sueña el ’

Then I opened the file like this:
f1 = open(path, encoding=“utf-8”)

And got the correct results:
f1.read(10)
‘Sueña el r’

Thank you.

Yep! That would be it :slight_smile: Getting file encodings correct can be hard, but following a discipline of “always encode and decode UTF-8” will usually make things work fairly well.
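A small sketch of that discipline, assuming nothing about the real file: it writes a throwaway copy of one line into a temporary directory (the directory and file here are stand-ins, not the original path) and passes encoding="utf-8" on both the write and the read.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the real segismundo.txt.
    sample = Path(tmp) / "segismundo.txt"

    # Be explicit about the encoding when writing...
    with open(sample, "w", encoding="utf-8") as f:
        f.write("Sueña el rico en su riqueza,\n")

    # ...and again when reading, so the platform default never matters.
    with open(sample, encoding="utf-8") as f:
        first_ten = f.read(10)

print(repr(first_ten))  # 'Sueña el r'
```

Using a with block also closes the file promptly, which the bare open(path) calls in the thread do not.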

What version of Python are you using? Some of these things have become easier with newer Python versions.

In the future, please make sure to show a proper minimal, reproducible example when asking questions about programming anywhere online. That makes the problem easier to understand and rules out unrelated problems (e.g. forgetting to save changes to a file before running it). The code in your first post can’t be exactly what you ran, because a path like “C:\Users\tech_worker\segismundo.txt” will not work: curly quotes are not valid string delimiters, and the backslashes are not escaped (in a normal string literal, \U starts a Unicode escape, so Python raises a SyntaxError). I suspect both problems appear only because the code was not formatted properly, as described in the pinned thread.
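To make the point about backslashes concrete, here is a rough sketch of three spellings that do work, plus a check that the unescaped form is rejected. The path is the one from the thread; the variable names are only illustrative.

```python
from pathlib import PureWindowsPath

# Three working ways to spell the same Windows path.
escaped = "C:\\Users\\tech_worker\\segismundo.txt"  # doubled backslashes
raw = r"C:\Users\tech_worker\segismundo.txt"        # raw string literal
forward = "C:/Users/tech_worker/segismundo.txt"     # forward slashes

print(escaped == raw)  # True: both literals produce the same string
same_path = PureWindowsPath(forward) == PureWindowsPath(raw)
print(same_path)       # True: pathlib treats both separators alike

# The unescaped form with straight quotes does not even compile,
# because \U begins a Unicode escape inside a normal string literal.
bad_source = r'path = "C:\Users\tech_worker\segismundo.txt"'
try:
    compile(bad_source, "<example>", "exec")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)        # False
```

PureWindowsPath is used here only so the comparison behaves the same on any operating system; on Windows itself, pathlib.Path would do.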

As you found, the problem was with specifying the encoding for the open call, but your original code shows that being done properly, so the code you posted can’t be the code that actually went wrong. It’s important to be able to reproduce problems in order to talk about them properly.

Python version - 3.10.9

As suggested, proper formatting:

path = "C:\\Users\\tech_worker\\segismundo.txt"

This:

f1 = open(path)

Should have been this:

f1 = open(path, encoding="utf-8")

Going forward, I’ll format things correctly and include working examples.

Thank you for your comments.

Note that on many (all?) *nix systems (and some recent Windows setups as well), UTF-8 is the default encoding, so a LOT of people omit it and it works fine, and then someone else gets bitten :frowning:

This is because Python uses the system default encoding (locale.getpreferredencoding(False)), and Windows systems have often used other encodings. IIUC, Windows is moving to UTF-8, but I’m sure this will be an issue for a long time.
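A quick way to see what a given machine would use when encoding is omitted is sketched below. It only reports values; the result depends on the operating system and locale settings, so no particular output is implied.

```python
import locale
import sys

# The encoding open() falls back to when no encoding argument is given
# (outside UTF-8 mode).
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding:", default_encoding)

# Whether this interpreter was started in UTF-8 mode.
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```

On Python 3.10 and later, running a script with `python -X warn_default_encoding` emits an EncodingWarning wherever a file is opened without an explicit encoding, which can help find the calls that rely on this default.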

Python itself is moving to make UTF-8 the default, but I have no idea when that might happen. It could break a fair bit of code on Windows machines :frowning:
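Until that change lands, the same behaviour can be opted into with Python's UTF-8 mode (available since Python 3.7), either through the `-X utf8` command-line option or the `PYTHONUTF8=1` environment variable. The sketch below starts a child interpreter in that mode and asks it for its default encoding; the variable names are illustrative only.

```python
import subprocess
import sys

# One-liner for the child interpreter: report its default text encoding.
check = "import locale; print(locale.getpreferredencoding(False))"

# Start the same Python executable with UTF-8 mode switched on.
result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", check],
    capture_output=True,
    text=True,
    check=True,
)
utf8_mode_encoding = result.stdout.strip()
print(utf8_mode_encoding)  # spelled "UTF-8" or "utf-8" depending on version
```

Passing encoding="utf-8" explicitly is still the more portable habit, since it does not depend on how the interpreter was launched.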