Why not make str.join() coerce the items in its iterables

Rosuav · March 25, 2023, 1:02am

Not just backward compatibility. A huge number of protocols, file formats, etc, are a mixture of text (usually ASCII) and non-textual data. That’s why we have PEP 461 adding percent formatting to bytes objects, not just because Python 2 had less distinction.
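As a minimal sketch of what PEP 461 allows (Python 3.5+), the header name and values below are made up for illustration and do not come from any particular protocol:

```python
# Percent-formatting on bytes objects (PEP 461, available since Python 3.5).
# The header name and the values are invented for this example.
length = 42
header = b"Content-Length: %d\r\n" % length

# %s on bytes accepts bytes-like objects (or objects with __bytes__), not str.
pair = b"%s=%s" % (b"key", b"value")

try:
    b"%s" % "text"
    str_rejected = False
except TypeError:
    str_rejected = True

print(header)        # b'Content-Length: 42\r\n'
print(pair)          # b'key=value'
print(str_rejected)  # True
```

Numbers are rendered as ASCII digits, while text still has to be encoded explicitly before it can be interpolated.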

However, in this particular case, “coercing to bytes” makes a lot less sense than coercing to strings does.

vovavili · March 26, 2023, 10:59pm

I feel like this proposal satisfies every complaint I have seen in this thread so far.

jsbueno · March 27, 2023, 4:22pm

So, as far as “coercing for bytes” is an issue, I’d say it is also resolved.

Calling str is the standard way in Python to ask for a string representation of an object. It is actually a protocol that calls __str__, which the class author should have customized if they care at all about how the object looks as a string, and which otherwise defaults to __repr__.

In the same way, when it comes to bytes, nothing is more natural than to expect bytes.join to call an object’s __bytes__ method if there is one, or raise an error otherwise. No ambiguity, no backwards incompatibility, lots of functionality.

Again, both “str” boiling down to __str__ with a fallback to __repr__, and bytes() calling
__bytes__, are already standard interfaces for all objects in the language, time-tested for
more than two decades.
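A small sketch of the two protocols being compared. The classes Plain and Point are hypothetical, and the byte format returned by __bytes__ is an arbitrary choice for the example:

```python
class Plain:
    """Hypothetical class defining none of __str__, __repr__, __bytes__."""


class Point:
    """Hypothetical class with __repr__ and __bytes__ but no __str__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

    def __bytes__(self):
        # Arbitrary serialization, chosen only for illustration.
        return f"{self.x},{self.y}".encode("ascii")


p = Point(1, 2)
as_text = str(p)     # no __str__, so the default falls back to __repr__
as_bytes = bytes(p)  # bytes() calls __bytes__

# str() always succeeds thanks to the default repr ...
plain_text = str(Plain())

# ... but bytes() has no such fallback.
try:
    bytes(Plain())
    plain_rejected = False
except TypeError:
    plain_rejected = True

print(as_text)         # Point(1, 2)
print(as_bytes)        # b'1,2'
print(plain_rejected)  # True
```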

I really can’t figure out the objections, other than as coming from a mindset of artificially constraining objects, which
is definitely not the way things work in Python - otherwise it would have been a statically typed
language. It is in no way “guessing” if there is a standard way to do the conversion.

barry-scott · March 27, 2023, 4:37pm

No, this does not work for Unicode data. You have to know the encoding to use.

:>>> bytes('€uro')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding

No, this does not work for int data. You get a bytes object whose length is the value of the int, filled with \x00.

:>>> bytes(3)
b'\x00\x00\x00'
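To make both failure modes concrete, here is a sketch of what an explicit conversion looks like. The choice of encodings and of a one-byte big-endian integer is an assumption made for the example; a real caller would have to pick these to match the data format:

```python
text = "€uro"

# The same text yields different bytes depending on the encoding chosen.
utf8 = text.encode("utf-8")
cp1252 = text.encode("cp1252")

# bytes(n) means "n zero bytes", not "the number n as bytes".
zeros = bytes(3)

# Two explicit ways to turn the int 3 into bytes; which one is wanted
# depends on the format (binary field vs. ASCII digits).
binary = (3).to_bytes(1, "big")
digits = str(3).encode("ascii")

print(utf8)    # b'\xe2\x82\xacuro'
print(cp1252)  # b'\x80uro'
print(zeros)   # b'\x00\x00\x00'
print(binary)  # b'\x03'
print(digits)  # b'3'
```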

kknechtel · March 27, 2023, 5:48pm

Except for the parts where

the str type clearly documents the use of the corresponding dunder, while the bytes type on the same documentation page doesn’t;
str(b'') will gladly produce "b''" (a textual representation of the data, rather than a decoding of the bytes contents), whereas bytes('') demands an encoding (because it will only understand a string as data to encode);
the bytes constructor has several special cases (integer, iterable of integers, buffer types like bytearray and memoryview) that are not implemented by a __bytes__ dunder, whereas __str__ is used for everything except decoding bytes (i.e. it is used any time that str is called with one argument, which doesn’t hold for bytes);
the bytes type itself hasn’t had a __bytes__ (until 3.11), while str has had __str__ for effectively forever;
__bytes__ is not actually that old. bytes wasn’t even a type identifier until 2.6, and while the addition of __bytes__ was initially proposed, it was considered “out of scope” in that proposal, and not mentioned in the superseding proposal for 3.0. The old bug tracker implies it was indeed added for 3.0, but I can’t find a PEP officially asserting the introduction of this dunder; the earliest mentions I can find of it in the Python documentation are for 3.4.
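The asymmetry in the list above can be checked interactively. This is only a sketch of the one-argument and two-argument behaviours on current Python 3; note that running Python with the -b or -bb flag turns the str(b'...') calls into warnings or errors:

```python
# str() with one argument always produces a textual representation.
empty_repr = str(b"")
abc_repr = str(b"abc")

# Only the two-argument form actually decodes the bytes.
decoded = str(b"abc", "ascii")

# bytes() with a str argument refuses to guess an encoding.
try:
    bytes("")
    needs_encoding = False
except TypeError:
    needs_encoding = True

# Special cases handled by the constructor itself rather than by __bytes__.
from_iterable = bytes([65, 66])
from_bytearray = bytes(bytearray(b"AB"))
from_memoryview = bytes(memoryview(b"AB"))

print(repr(empty_repr))  # "b''"
print(repr(abc_repr))    # "b'abc'"
print(decoded)           # abc
print(needs_encoding)    # True
print(from_iterable)     # b'AB'
```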

Fundamentally, calling str on an object and calling bytes on an object are not that similar. The former is conceptually creating a representation of the object; the latter is, except for user-defined types, just like constructing any other object - the input parameter is informing the process, it is not being converted or coerced. The documentation only barely acknowledges that __bytes__ exists, and it only seems to exist in order to cover some very specific use cases. (How often do you see tutorials telling people to implement it? When was the last time you implemented it?)

jsbueno · April 4, 2023, 4:06pm

Ok, so just leave bytes.join as is - as it is far more strict.

The initial issue, with which a lot of people agreed, is that something as simple as ", ".join(range(1,11)) could work
instead of requiring ", ".join(map(str, range(1,11))).

Even special-casing it to work with “int” would be better than the current status quo, as I think that covers
the majority of cases. (But please, don’t.)
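For reference, a sketch of the current behaviour next to the behaviour being asked for. The helper join_coerced is hypothetical: it is not part of Python and only approximates the proposal by calling str() on every item:

```python
# Today: str.join rejects non-string items.
try:
    ", ".join(range(1, 11))
    ints_rejected = False
except TypeError:
    ints_rejected = True

# Today's explicit spelling.
explicit = ", ".join(map(str, range(1, 11)))


def join_coerced(sep, items):
    """Hypothetical helper: join items after passing each through str()."""
    return sep.join(str(item) for item in items)


coerced = join_coerced(", ", range(1, 11))

print(ints_rejected)  # True
print(explicit)       # 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
print(coerced == explicit)  # True
```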

jsbueno · April 4, 2023, 4:15pm

Actually, I can’t even see a use case for “bytes.join” besides concatenating other byte strings with the empty b'' as separator - maybe there should even be another call for that.

Trying to work with text using bytes by giving bytes.join() a non-empty separator is a strong indicator that one is working with bytes when they should be working with text instead.
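A sketch of the concatenation use case mentioned above, together with how strict bytes.join currently is. The chunk contents are made up for the example:

```python
# Concatenating byte strings with an empty separator.
chunks = [b"ab", b"cd", b"ef"]
joined = b"".join(chunks)

# bytes.join accepts other bytes-like objects such as bytearray ...
mixed = b"".join([b"ab", bytearray(b"cd")])

# ... but rejects str items; nothing is encoded implicitly.
try:
    b"".join([b"ab", "cd"])
    str_item_rejected = False
except TypeError:
    str_item_rejected = True

print(joined)             # b'abcdef'
print(mixed)              # b'abcd'
print(str_item_rejected)  # True
```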