Sorry, False Alarm. Should 'for line in m.readlines():' be the same as 'lines=m.readlines(); for line in lines:'?

Should these two code fragments have identical results? They do not. m is a file handle for a file in a zip archive. The first requires me to use decode to change line to a string. The second sometimes converts input to a string requiring me to check that it is not a string before using decode.

for line in m.readlines():

lines = m.readlines()
for line in lines:

Should these two code fragments have identical results?

For a plain text regular file, I’d have thought so.

They do not. m is a file handle for a file in a zip archive.

But this is different. Can you show us the code which produces m?

The first requires me to use decode to change line to a string.

This suggests to me that m is a file opened in binary mode, not text
mode. As such, the .readlines() method returns instances of bytes, not
instances of str. Converting to a str requires knowing the text
encoding used in the binary file. These days that is often UTF-8, but it
is by no means always so.
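To illustrate the bytes-versus-str point, here is a minimal sketch. It assumes m came from zipfile.ZipFile.open(), which we have not seen confirmed in this thread, and it assumes the member is UTF-8 text. The archive, the member name notes.txt and its contents are made up for the example.

```python
import io
import zipfile

# Build a small zip archive in memory so the example is self-contained.
# The member name "notes.txt" and its contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "first line\nsecond line\n")

with zipfile.ZipFile(buffer) as archive:
    # ZipFile.open() gives a binary file object, so readlines() returns bytes.
    with archive.open("notes.txt") as m:
        raw_lines = m.readlines()

    # Wrapping the binary object in TextIOWrapper decodes while reading.
    # UTF-8 is an assumption; use whatever encoding the file was written in.
    with archive.open("notes.txt") as m:
        text_lines = list(io.TextIOWrapper(m, encoding="utf-8"))

print(type(raw_lines[0]).__name__, type(text_lines[0]).__name__)
```

With the wrapper in place every line is already a str, so there is no need to call decode on each line or to check its type first.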

The second sometimes converts input to a string requiring me to check
that it is not a string before using decode.

That really surprises me. I would expect to always get bytes, or
always get str. Can you demonstrate this behaviour?


Just for the record, you basically shouldn’t ever use either. Instead, just iterate over the file object directly:

for line in m:
    ...

This is equivalent to for line in m.readlines(), but it is shorter, simpler and much more efficient. It reads the file line by line instead of loading the entire file into memory, splitting it into a list and then iterating over that whole list. That certainly saves memory, and it may also save time if you exit the iteration early.
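As a rough sketch of the difference, the following compares the two approaches on a zip member. It assumes m is a binary file object from zipfile.ZipFile.open(); the in-memory archive and the member name log.txt are invented for the example.

```python
import io
import zipfile

# Hypothetical in-memory archive with one small text member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("log.txt", "alpha\nbeta\ngamma\n")

with zipfile.ZipFile(buffer) as archive:
    # readlines() builds the complete list of lines before the loop starts.
    with archive.open("log.txt") as m:
        all_at_once = m.readlines()

    # Iterating over the file object yields the same lines one at a time.
    with archive.open("log.txt") as m:
        one_by_one = [line for line in m]

    # With direct iteration, breaking out early skips reading the rest.
    with archive.open("log.txt") as m:
        first_match = None
        for line in m:
            if line.startswith(b"beta"):
                first_match = line
                break

print(all_at_once == one_by_one, first_match)
```

Both approaches produce the same lines; the only difference is whether the whole list has to exist in memory before the loop body runs.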

The one plausible reason I can think of why the lines=m.readlines(); for line in lines option could make sense is if you need the entire list to re-use later in some other non-loop operation, though there may be other better strategies for that too depending on the specific use case.


Also, on an administrative note—updating the title is fine, but please don’t wipe your OP to indicate the problem is solved (with no explanation as to why/how); that’s considered rather rude and inconsiderate, as it means others who have the same question cannot easily find or reference it, and the replies from people who spent their volunteer time helping you will now be out of context. Furthermore, web users may not get notified and email users like @cameron will never see it at all. As such, I’ve reverted it for you.

Instead, simply post your message as a reply, and in it you really should also actually state what the “bug that caused the symptom” was, to help others who might have that problem just like people here have helped you.