Practical applications of string-like bytes methods?

kknechtel · July 24, 2023, 8:04pm

Continuing the discussion from Str.dedent vs str.removeindent:

I have been wondering, given my own distaste for those methods (explained earlier in that thread) - have there been any surveys of the use of text-like bytes methods in 3.x code “in the wild”? I understand that it doesn’t accomplish much to just remove existing functionality (I can’t see how it would meaningfully allow for optimization), and I assume that changing the repr for the type is a non-starter - which is why I’m not posting in Ideas or anything.

But I’d be interested to see some motivating examples for why so much support was kept around for (mis-)treating bytes as a textual type. I thought that a big part of the reason for bumping the major version number and introducing all the breaking changes at once was exactly so that people would be forced to confront those kinds of bad habits.

barry-scott · July 24, 2023, 9:05pm

Are you looking for justification for a string-like API for bytes?

I work on HTTP requests with Python and use bytes objects and their string-like methods all the time. You can see that type of usage in the Twisted library, for example.
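As a small illustration of that kind of usage, here is a sketch of splitting an HTTP/1.x request line without ever decoding it to str. `parse_request_line` is a hypothetical helper (not Twisted's API), and it assumes a well-formed line with single spaces between the three fields.

```python
def parse_request_line(line: bytes) -> tuple:
    """Hypothetical helper: split b'METHOD TARGET VERSION\\r\\n' into three bytes fields."""
    # rstrip with an explicit argument removes only trailing CR/LF bytes.
    parts = line.rstrip(b"\r\n").split(b" ")
    # Assumes exactly three space-separated fields and an HTTP/ version token.
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    return method, target, version


method, target, version = parse_request_line(b"GET /index.html HTTP/1.1\r\n")
print(method, target, version)  # b'GET' b'/index.html' b'HTTP/1.1'
```

Everything stays in bytes, so the values can be compared against other protocol constants or written straight back to a socket.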

kknechtel · July 24, 2023, 9:56pm

Yes, precisely that sort of thing. I was hoping for more concrete references, because I don’t really know where exactly to look.

barry-scott · July 25, 2023, 6:40am

From my point of view it's a mandatory feature to have string-like methods on bytes.
The core devs also felt the same when the methods were added.
As we all know, a lot of care is taken to avoid increasing the cost of maintaining Python.

Are you saying that these decisions were wrong?

flyinghyrax · July 25, 2023, 10:20pm

…lol no?

@kknechtel was very clear about what they were asking:

As such I find this counter question extremely confusing.

---

I don’t have in-the-wild examples, but I do like having the string-ish “strip” methods available, since you can specify what to remove. Upper/lower-casing always seemed a little weird to me, though.
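To make the point about `strip` concrete: the argument is a set of byte values removed from both ends in any order, not a fixed prefix or suffix. The NUL-padded record below is a made-up example.

```python
# A made-up record padded with NUL bytes and a CRLF terminator.
record = b"\x00\x00payload\r\n\x00"

# The argument is a set of byte values to remove from both ends, in any order.
print(record.strip(b"\x00\r\n"))  # b'payload'

# With no argument, only ASCII whitespace is stripped, so the NUL padding stays.
print(record.strip())  # b'\x00\x00payload\r\n\x00'
```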

brettcannon · July 26, 2023, 12:14am

The reason was to make some breaking changes that we thought made sense while still trying to make it somewhat reasonable to port preexisting code over. Since str in Python 2 was being used for binary data already, ripping out methods people were using for bytes-like data from bytes itself would have made the transition even harder. Hence why Python 3 bytes is effectively Python 2 str.

tjreedy · July 26, 2023, 6:57am

bytes and bytearray ‘text’ methods are most needed for low-level operations where one reads a block, examines and maybe alters some bytes, and retransmits – and speed is essential. For most Python programmers who enjoy the object model, working directly in encoded bytes – and sometimes bits – is not as fun. But we should appreciate the people who do such work.
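A rough sketch of that read/examine/alter/retransmit loop, assuming `io.BytesIO` as a stand-in for a socket and a made-up rewrite of a `Connection` header value. `patch_and_forward` is hypothetical; a real proxy would also match header names case-insensitively and handle fields split across blocks.

```python
import io


def patch_and_forward(src, dst, block_size=4096):
    """Hypothetical: read one block, patch it in place, write it on unchanged otherwise."""
    # Read one block into a mutable buffer; no decoding to str is needed.
    buf = bytearray(src.read(block_size))
    # Case-sensitive search for simplicity (real HTTP names are case-insensitive).
    key = b"Connection: "
    start = buf.find(key)
    if start != -1:
        value_start = start + len(key)
        value_end = buf.find(b"\r\n", value_start)
        if value_end != -1:
            # Slice assignment edits the bytearray in place; lengths may differ.
            buf[value_start:value_end] = b"close"
    dst.write(buf)


src = io.BytesIO(b"GET / HTTP/1.1\r\nConnection: keep-alive\r\n\r\n")
dst = io.BytesIO()
patch_and_forward(src, dst)
print(dst.getvalue())  # b'GET / HTTP/1.1\r\nConnection: close\r\n\r\n'
```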

barry-scott · July 26, 2023, 7:11am

HTTP header field names are defined as case-insensitive ASCII.
It's common to see a header written in mixed case but compared in lower case:

header = b'Content-Length: 457'
if header.split(b':')[0].lower() == b'content-length':
    ...
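Extending that one-line comparison to a whole request head might look like the sketch below. `get_header` is a hypothetical helper; it assumes CRLF line endings and returns the first matching field's value, stripped of surrounding whitespace.

```python
from typing import Optional


def get_header(head: bytes, name: bytes) -> Optional[bytes]:
    """Hypothetical: return the value of the first header called `name`, ignoring case."""
    wanted = name.lower()
    # Skip the request/status line; each remaining line is "Name: value".
    for line in head.split(b"\r\n")[1:]:
        field, sep, value = line.partition(b":")
        # Field names are compared case-insensitively via bytes.lower().
        if sep and field.lower() == wanted:
            return value.strip()
    return None


head = b"POST /upload HTTP/1.1\r\nHost: example.com\r\nCONTENT-LENGTH: 457"
print(get_header(head, b"Content-Length"))       # b'457'
print(int(get_header(head, b"content-length")))  # 457
```

`int()` accepts ASCII digits in a bytes object directly, so the length never needs to be decoded first.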

flyinghyrax · July 26, 2023, 3:38pm

That makes sense, today I learned! Great example.

barry-scott · July 26, 2023, 4:07pm

At work, I estimate, we execute bytes.lower around 650 × 10**9 times a day.

Rosuav · July 26, 2023, 4:20pm

I’m intrigued by the specificity there. Is that actually a meaningful estimate or just a random large number?

barry-scott · July 26, 2023, 4:26pm

It's based on the transaction rate we process, with an assumption about the average number of headers on each HTTP request.
I used 10 headers on the request and 10 on the response.
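Reading those numbers backwards gives the implied transaction rate. This is only a back-of-envelope check, assuming exactly one `bytes.lower()` call per header and the 10 + 10 headers stated above.

```python
# Assumption: one bytes.lower() call per header, 10 request + 10 response headers.
lower_calls_per_day = 650 * 10**9
headers_per_transaction = 10 + 10

transactions_per_day = lower_calls_per_day / headers_per_transaction
transactions_per_second = transactions_per_day / (24 * 60 * 60)

print(f"{transactions_per_day:.3g} transactions/day")    # 3.25e+10 transactions/day
print(f"{transactions_per_second:,.0f} transactions/s")  # 376,157 transactions/s
```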

Rosuav · July 26, 2023, 4:27pm

Makes sense!

petersuter · July 26, 2023, 7:57pm

I seem to remember Mercurial was a big proponent of string-like bytes methods, and a holdout on Python 3 migration until bytes % formatting was added in Python 3.5 (PEP 461).
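For reference, this is the kind of code PEP 461 made possible again: building protocol lines with `%` directly on bytes. The header name and body here are made-up values.

```python
# Since Python 3.5 (PEP 461), bytes and bytearray support %-formatting.
# %s accepts bytes-like objects (passing a str raises TypeError);
# %d renders an integer as ASCII digits.
body = b'{"ok": true}'
header = b"%s: %d\r\n" % (b"Content-Length", len(body))
print(header)  # b'Content-Length: 12\r\n'
```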