Allow list of integers in `str.join`

h-vetinari · February 9, 2023, 5:38am

I wondered today for the Nth time why the following is not permitted - the intent seems clear enough:

>>> import sys
>>> '.'.join(sys.version_info[:2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found

Of course, it’s possible to work around this:

>>> '.'.join([str(x) for x in sys.version_info[:2]])
'3.11'

but it seems… unnecessary?

Once I’m calling from a string.join method, what possible other output than a string can be expected? So why not just call __str__() on any members rather than give a type error? Same goes for floats and other classes.

Zeturic · February 9, 2023, 6:58am

This has come up before several times.

I personally don’t have much of an opinion either way. I can see how it would be handy, but there are also times when it would hide errors.

You might have something like ', '.join(things) where things is a list of Thing objects. You get an error, and you quickly fix it to ', '.join(thing.name for thing in things) which is what you intended.

If str.join automatically stringifies the elements, it’d essentially be doing ', '.join(str(thing) for thing in things), which would not give any kind of error, but it also wouldn’t give the intended result.
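To make the trade-off concrete, here is a minimal sketch. The `Thing` class and its `name` field are hypothetical, chosen only to mirror the example above; the "implicit" line simulates what an auto-stringifying join would effectively compute, since no such behaviour exists today.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    # Hypothetical class, used only for illustration.
    name: str


things = [Thing("apple"), Thing("pear")]

# What the programmer actually meant to write.
intended = ", ".join(thing.name for thing in things)

# What an auto-stringifying join would effectively do instead.
implicit = ", ".join(str(thing) for thing in things)

# Current behaviour: passing the objects directly fails loudly.
try:
    ", ".join(things)
except TypeError as exc:
    error = str(exc)

print(intended)  # apple, pear
print(implicit)  # Thing(name='apple'), Thing(name='pear')
print(error)
```

With the current behaviour the mistake surfaces immediately as a TypeError; with implicit conversion the second output would be produced silently.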

But as I said, I’m not completely opposed to it either, because it’s not clear how common situations like the above are, and I definitely run into the trivial “I just want to call str on all of them” a lot.

As an aside, there’s no reason to use the list comprehension there. Get rid of the square brackets and use a generator expression instead.

>>> '.'.join(str(x) for x in sys.version_info[:2])
'3.11'

To be fair, in this specific case it’s a two-element list so it hardly matters, but still. If you’re doing this for, say, a large list of integers, the generator version would use significantly less memory.

Rosuav · February 9, 2023, 7:10am

Uhh, why not take a much MUCH simpler option?

>>> "%d.%d" % sys.version_info[:2]
'3.12'

abessman · February 9, 2023, 7:26am

F-strings are so handy I tend to forget “old-style” string formatting is even a thing anymore. This is a nice reminder that it still has its uses.

h-vetinari · February 9, 2023, 7:57am

Because lists can be arbitrarily long (I just took a random example that has a fixed length).
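A small sketch of the length problem, using a made-up three-element tuple rather than `sys.version_info`: a fixed `%d.%d` template only fits exactly two values, while joining after an explicit `str` conversion works for any length.

```python
version = (3, 11, 1)  # example values, not taken from a real interpreter

# A two-slot template cannot absorb a third value.
try:
    "%d.%d" % version
except TypeError as exc:
    percent_error = str(exc)

# Converting explicitly and joining works for any number of items.
joined = ".".join(map(str, version))

print(percent_error)
print(joined)  # 3.11.1
```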

ericvsmith · February 9, 2023, 8:45am

There’s a long discussion of this here.

h-vetinari · February 9, 2023, 8:57am

Thanks for the link! I searched discourse but not the GH issues…

steven.daprano · February 9, 2023, 9:13am

It is not clear to me. Do you want the str(), repr(), ascii() or some other string conversion of the objects? Why or why not?

As I wrote here some weeks back, we need to distinguish between functions which are part of a low-level API, and those expected to work as part of a higher level API.

print has a high-level API. It should be polymorphic, and work with any type. I should be able to print any object at all, and get something sensible, without caring too much about it. It’s okay for print to guess what converter we want.

Because printing is a high-level API, I’m unlikely to capture the output of print and use it in other computations, so “something sensible” doesn’t need to be too precise. print is not a building block to create complex tools, it is one of those complex tools.

str.join is part of a low-level string API, which is why it shouldn’t try to guess what the user wants to do with non-string values:

- Is it an error? If so, raise.
- Or did the programmer intend there to be a non-string in the input?
  - If so, how does the programmer want to convert the value into a string?

A low-level API should not guess what is wanted. In this case, explicit is better than implicit:

sep.join(map(ascii, values))
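As a rough illustration of why the choice of converter matters, the sketch below joins the same values with `str`, `repr` and `ascii`. The sample values are arbitrary and chosen only so that the three results differ.

```python
values = [1, "é", 2.5]  # arbitrary sample data
sep = ", "

with_str = sep.join(map(str, values))      # strings appear without quotes
with_repr = sep.join(map(repr, values))    # strings appear quoted
with_ascii = sep.join(map(ascii, values))  # quoted, non-ASCII escaped

print(with_str)    # 1, é, 2.5
print(with_repr)   # 1, 'é', 2.5
print(with_ascii)  # 1, '\xe9', 2.5
```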

If you want a high-level joiner that works on anything, like print, it is a one-liner:

def join(values, *, sep=''):
    return sep.join([str(obj) for obj in values])

But it hardly seems worth it, for such a simple operation.

Stefan2 · February 12, 2023, 12:36pm

As far as I know, when given an iterator, str.join turns it into a list anyway, so the generator version takes as much memory as the list comprehension. And the list comprehension is faster. Is that not the case anymore? See Raymond Hettinger’s answer.

eryksun · February 12, 2023, 4:04pm

In CPython, PyUnicode_Join() creates a list for use by _PyUnicode_JoinArray(), which makes an initial pass over the list to compute the required allocation size. It doesn’t necessarily have to be this way, but it’s the most efficient implementation since str.join() can be passed an iterator such as a generator object. Otherwise it would have to use realloc() to grow the buffer, which could incur the cost of making multiple copies of the intermediate result.
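The two-pass idea can be sketched in pure Python. This is only an illustration of the strategy described above, not CPython’s actual code: `two_pass_join` is a hypothetical name, and a list of characters stands in for the single preallocated C buffer.

```python
def two_pass_join(sep, iterable):
    """Illustrative two-pass join; not the real CPython implementation."""
    # Materialize the input so it can be visited twice. A generator
    # could otherwise only be consumed once.
    items = list(iterable)

    # Pass 1: validate the items and compute the exact output length.
    total = 0
    for i, item in enumerate(items):
        if not isinstance(item, str):
            raise TypeError(
                f"sequence item {i}: expected str instance, "
                f"{type(item).__name__} found"
            )
        total += len(item)
    total += len(sep) * max(len(items) - 1, 0)

    # Pass 2: copy everything into a buffer that was allocated once.
    buf = [""] * total  # stand-in for one preallocated buffer
    pos = 0
    for i, item in enumerate(items):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(item)] = item
        pos += len(item)
    return "".join(buf)


print(two_pass_join(".", (str(n) for n in range(3))))  # 0.1.2
```

Because the total size is known before any copying starts, the buffer never has to grow, which is the point of materializing the iterator first.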

I’m sure that I’ve already forgotten aspects of % interpolation, since I haven’t used it in years. The extensible __format__() method used by str.format() was a significant improvement over the hard-coded conversions in str.__mod__(). For example:

>>> import decimal
>>> '%.12f' % decimal.Decimal('123456789.123456789')
'123456789.123456791043'
>>> '{:.12f}'.format(decimal.Decimal('123456789.123456789'))
'123456789.123456789000'

>>> import datetime
>>> '{:%Y-%m-%d}'.format(datetime.datetime.now())
'2023-02-12'
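The same extensibility is available to user-defined types. A minimal sketch, assuming a hypothetical `Version` class with a made-up `short` format spec:

```python
class Version:
    # Hypothetical class, for illustration only.
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __format__(self, spec):
        # "short" is a format spec invented for this example.
        if spec == "short":
            return f"{self.major}"
        return f"{self.major}.{self.minor}"


v = Version(3, 12)

full = "{}".format(v)          # empty spec: major.minor
short = "{:short}".format(v)   # custom spec handled by __format__
fstring = f"{v:short}"         # f-strings go through __format__ too

print(full)     # 3.12
print(short)    # 3
print(fstring)  # 3
```

There is no equivalent hook for `%` interpolation: `'%s' % v` would fall back to `str(v)`, which this class does not customize.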

I don’t see the benefit of using % interpolation here other than saving a few keystrokes. I think using str.format() is at least as readable and easily understood.

>>> '{}.{}'.format(*sys.version_info[:2])
'3.12'
>>> '{0}.{1}'.format(*sys.version_info[:2])
'3.12'
>>> '{v[0]}.{v[1]}'.format(v=sys.version_info)
'3.12'
>>> '{v.major}.{v.minor}'.format(v=sys.version_info)
'3.12'

If str.__mod__() was deprecated and subsequently removed in Python Pi (3.14), I’d be happy to see it go, not that there’s a significant chance of this occurring. Retaining it is mostly harmless and low maintenance.

Rosuav · February 12, 2023, 4:06pm

The format method is more powerful, but when you don’t need all of that power, percent formatting is perfectly viable. Plus, it’s broadly the same as printf formatting in C and any other languages inspired by it, so it’s compatible across a variety of systems. It’s worth keeping, even if it’s not as powerful or flexible as full .__format__() callbacks.