Why not make str.join() coerce the items in its iterables

Agreed, I really like the concat() function idea: +1 for concat.
That is something I would use if it was available.

Of course, but if you want the repr() version, then apply repr(). You need to do that in both the current and the suggested form of join. It’s not an argument for or against applying str() by default.

To me it seems totally reasonable that this:

str.join(sep, [x, y, z])

means the same as this:

str.join(str(sep), [str(x), str(y), str(z)])

and that it is silly having to write out the second, explicit form. If on the other hand I wanted to get the repr() of my objects, it is clear that I must be explicit, and this:

str.join(sep, [repr(x), repr(y), repr(z)])

does not look silly at all.
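For reference, a minimal sketch of today's behaviour next to the explicit spellings discussed above. The sample values are made up for illustration, and the exact wording of the error message is what CPython prints and may differ elsewhere.

```python
items = [1, 2.5, "three"]  # illustrative mixed-type data

# Today: str.join refuses non-str items.
try:
    ", ".join(items)
except TypeError as exc:
    error = str(exc)  # on CPython: "sequence item 0: expected str instance, int found"

# The explicit form that the proposal would make implicit.
explicit = ", ".join(str(x) for x in items)

# The repr() variant stays explicit under either design.
with_repr = ", ".join(repr(x) for x in items)

print(explicit)   # 1, 2.5, three
print(with_repr)  # 1, 2.5, 'three'
```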

1 Like

| Nick Coghlan ncoghlan CPython core developer
March 12 |


Joao S. O. Bueno:

Better yet - what if join accepts a mapping callable as a named parameter, which would default to
str ?

I considered that (a variant of the approach was posted earlier in the thread), but I don’t like it for a few reasons:

  • an optional argument is only arguably more readable than the status quo with an explicit map call
  • it makes the type signature of join hard to describe, since it technically depends on what the coercion function accepts
  • it feels like overgeneralising, since using anything other than str or repr for join coercion is way down in the noise (and adequately covered by map and list comprehensions)

By contrast, “str.joinrepr(x)” is clearly easier to write and more readable than both “str.join(map(repr, x))” and “str.join(repr(y) for y in x)”, there’s no complexity in the type signatures (both methods accept “Iterable[Any]”), and the addition of “joinrepr” would help to highlight that the default “join” is now implicitly a “joinstr” operation.
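To compare the spellings side by side, here is a rough sketch. No joinrepr method exists on str today, so the joinrepr function below is only a hypothetical stand-in written as a plain function, and the sample list is invented.

```python
def joinrepr(sep, iterable):
    """Hypothetical stand-in for the proposed str.joinrepr method."""
    return sep.join(map(repr, iterable))

x = ["spam", 42, None]  # illustrative data

via_map = ", ".join(map(repr, x))
via_genexp = ", ".join(repr(y) for y in x)
via_helper = joinrepr(", ", x)

print(via_helper)  # 'spam', 42, None
```

All three spellings produce the same string; the disagreement in the thread is about readability and API surface, not about the result.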

The two clear wins I see for the named parameter are: (1) no need for yet another string method - it is not like there are too few of those, and (2) with a special value like None the current behavior can be replicated, just in case someone needs it. (An xkcd 1172 preemptive counter-measure, actually. https://xkcd.com/1172/ )

| Steven D’Aprano steven.daprano
March 12 |


So one eager for forcing repr (since repr is already a fallback for objects without str, I hope repr defenders did not forget)

The problem isn’t objects that don’t have a __str__ dunder. Every object can be stringified, unless they have a bug in their custom __str__ or __repr__ method that causes an exception.

The problem is objects where the str() and the repr() are different. If all you are doing is joining ints or floats, you won’t really notice a difference, but for many other objects there is a considerable difference and blindly joining the str() output may not be appropriate.


>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> str(a), repr(a)
('2/3', 'Fraction(2, 3)')
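A small sketch of how that difference shows up once the values are joined. The list contents, including the date, are arbitrary examples chosen only because their str() and repr() differ.

```python
from datetime import date
from fractions import Fraction

# Arbitrary sample values whose str() and repr() are not the same.
values = [Fraction(2, 3), date(2024, 3, 12), "a b"]

joined_str = " ".join(map(str, values))
joined_repr = " ".join(map(repr, values))

print(joined_str)   # 2/3 2024-03-12 a b
print(joined_repr)  # Fraction(2, 3) datetime.date(2024, 3, 12) 'a b'
```

In the str() version the item boundaries are lost ("a b" looks like two items); in the repr() version they are preserved at the cost of noisier output.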

I love it that in your supposed counterexample, the usefulness of having str as a converter is indisputable.

However, going back to the comment where you list tens of personal reasons for not wanting to modify str.join, just to end the message saying “but I am not -10, just -1” - yes, maybe a new join builtin, or even a “join” in the stdlib string module, could be more useful - for the reasons you mention.

I’ve never once used a repr with str.join and I’ve been using Python since the 90s. :person_shrugging: YMMV

I am disappointed that so few people seem to be taking @malemburg 's suggestion to add a high-level join builtin seriously.

I explicitly supported this idea, it seems much more preferable.

1 Like

It is relevant to the questions of whether str is the only choice for this new functionality, and if not, what should be the default.

Suppose that the proposal was to make str.join always and automatically convert objects using ascii(). Wouldn’t it be valid to argue that hardly anyone does that, this would be so niche as to be pointless, and that str() or repr() would be a better choice?

The same thing applies to the question of str() vs repr(). Why choose str() as the default choice, or only choice?

In this topic, I think that only one person, me, has actually made an (admittedly quick and dirty and admittedly idiosyncratic) review of existing code, and found that using repr() is much more common than using str().

(If anyone else did a code review, sorry for missing your post.)

If we were to do this thing, wouldn’t it be much better to make the default the function which is more common and more useful? I trust people aren’t going to promote making this feature less useful :slight_smile:

If we were to move ahead on this proposal, the PEP would need to include a more systematic code review to determine which converter is more common in real code. But until we have that systematic review, the only review we have so far is, I think, mine, and that shows that repr() is more than twice as common as str().

I also found that in my own code base, I had a significant number of bugs in my code caused by thoughtlessly using str() as the converter when it should have been repr(). Do we want to encourage those sorts of errors?

You want to be able to do this? str.join(tuple(), [1, 2, 3]) == "1()2()3"

That’s a remarkably strong position to take! I don’t think anyone else has proposed making the separator autoconvert as well!

In any case, regardless of the separator, it seems to me totally unreasonable to expect the objects being accepted by a string method to autoconvert. str.join is a string method, not a high-level type-agnostic function like print.

We don’t expect other string methods to autoconvert:


# Change the street address from 123 to 456 some street.
if address.startswith(123):
    address = 456 + address[3:]

I think that everyone agrees that low-level string methods should require string arguments, and not implicitly guess how to convert arbitrary objects into strings. That sort of implicit conversion is great for high-level functions like print and f-strings, but for everything else we agree that string methods shouldn’t do it.


# Convert Western digits to Thai
for digit in range(10):
    string = string.replace(digit, chr(0x0E50 + digit))

(Some extremely low-level methods, like str.translate, accept integer code points instead of characters, but that’s not auto-converting arbitrary objects to strings.)
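To make the point concrete, here is a sketch showing that the two snippets above fail today, plus one way the digit conversion could be written with str.translate. The sample strings are invented, and the translate-based version is just one possible working spelling, not something taken from the post.

```python
address = "123 Some Street"  # illustrative value

# str.startswith rejects a non-str prefix rather than converting it.
try:
    address.startswith(123)
except TypeError:
    startswith_raised = True
else:
    startswith_raised = False

# str.replace is equally strict about its arguments.
try:
    "2024".replace(2, "x")
except TypeError:
    replace_raised = True
else:
    replace_raised = False

# str.translate takes a table keyed by integer code points
# (0x0E50 is THAI DIGIT ZERO); no arbitrary objects are converted.
thai_digits = {ord("0") + d: 0x0E50 + d for d in range(10)}
converted = "2024".translate(thai_digits)
```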

If we are to make an exception for str.join, it needs something more than just saving a few characters typing, or “it’s obvious”, to justify why join is special.

We can say exactly the same thing if we want to get the str() of our objects too.

Not sure why you are writing it as an unbound method instead of sep.join(...), but okay. Either way, I agree that it looks no sillier than str.join(sep, [str(x), str(y), str(z)]).

I’m curious, have you never written a class with a __repr__? That seems to be the most common use for repr+join.

Perhaps dataclasses will decrease the need for that, but dataclasses didn’t exist in the 90s :slight_smile:

Ahh – I was going to ask you about that – obviously you’re going to want repr() in a __repr__, yes. But I submit that while some people might write a lot of __repr__s, those are not the target audience for this, and not the target use case. I said earlier in the thread that “folks that want repr() know what they are doing” – that wasn’t really the right point – how about “folks writing __repr__ methods should be thinking carefully about what they want”.

And I have written my small share of __repr__s, but I don’t think I’ve used str.join in them – certainly not much.

Maybe my way of thinking of this is that Python is still a “scripting language” – making it easier to use that way without making it harder to write libraries / large systems, etc. is a good thing.

Defining “scripting” is hard – but I’ll say if you are writing __repr__s, you are not scripting.

-CHB

1 Like

Not at all, I was merely pointing out that it is reasonable to expect a method of str to convert inputs to str using str, and that it can even look tautological to apply str to its inputs.

1 Like

It won’t ever actually do that with its first argument though, due to the way bound methods work. It’d probably be clearer to avoid the str.join(" ", ...) notation due to the potential confusion.
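A short sketch of why the first argument is a separate matter: when str.join is called through the class, the first argument plays the role of self, and the method descriptor rejects anything that is not a str before join's own logic runs. The sample list is made up.

```python
words = ["a", "b", "c"]  # illustrative data

# Calling through the class is the same as calling on the instance.
same = str.join(" ", words) == " ".join(words)

# A non-str "self" is rejected by the method descriptor itself, before
# any joining (or converting) could take place.
try:
    str.join(tuple(), words)
except TypeError:
    rejected = True
else:
    rejected = False
```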

2 Likes

I consider str.join not coercing data to be a feature and not something to be removed.

The issue I have is not with the more unlikely ["a", "2", 3, ... ] case being hidden, but the case where a container gets used instead of a field, or the incorrect list gets used as input. This is an issue with any kind of automatic coercion and is not limited to str or repr.

Here’s a toy example with a relatively obvious mistake that would currently raise a nice TypeError and be highlighted in an IDE so you know you’ve done something wrong. password in this case is just a proxy for any sort of information that isn’t intended to be revealed.

from dataclasses import dataclass

@dataclass
class User:
    name: str
    password: str

users = [
    User("David", "correct horse battery staple"),
    User("Alex", "password1"),
    User("Chris", "abc123"),
]

user_names = [u.name for u in users]

namelist = ", ".join(users)

print(namelist)

Currently you get this error.

TypeError: sequence item 0: expected str instance, User found

This is a useful error that tells me exactly what I’ve done wrong and where I’ve done it. If str.join converted input using str this would not fail and would instead list all of the object internals, including the password field.

While obviously you shouldn’t store private data in such a way, people will, and this change makes it easier to accidentally do the wrong thing. Even without the potential to leak data, the input would obviously not be what was intended and the call would no longer give a useful error pointing to the appropriate line.
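To show what an implicitly converting join would be joining here, this sketch prints the str() of such an object. The SaferUser variant is my own addition rather than part of the post: it uses the existing dataclasses.field(repr=False) option as one possible mitigation, and it does not address the wider concern about silent coercion.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    password: str

@dataclass
class SaferUser:  # hypothetical variant, for illustration only
    name: str
    # repr=False keeps this field out of the generated __repr__.
    password: str = field(repr=False)

leaky = str(User("Alex", "password1"))
safer = str(SaferUser("Alex", "password1"))

print(leaky)  # User(name='Alex', password='password1')
print(safer)  # SaferUser(name='Alex')
```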

My code is almost certainly not representative, but I don’t have any instances where str coercion would have been useful and only one where repr could, and I didn’t find ", ".join(f"{field!r}" for field in field_names)[1] to be particularly onerous.

I do have many examples that are similar to ", ".join(obj_names) and I’m grateful that in the (hopefully) unlikely case I use the list of objects instead of names, my IDE can highlight the issue and python will raise an exception pointing out where I went wrong.


  1. There are other joins around this and in those cases the f-string does more; this just matched the style of everything else going on better than repr(field) would.

5 Likes

Neither have I for that matter.
I wrote the O.P. for one reason: I find myself at least once a week writing ''.join(str(x) for x in y) - and that seems redundant.
As someone else correctly recalled in these comments, f-strings also convert using str by default with a total of zero injuries to date. The str() protocol defaults to using __repr__ for a reason: if an object has no need for a custom __str__ it may only implement __repr__ - and that has been just perfect for my usage over years of coding.
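A minimal sketch of the fallback described above, using an invented Point class that defines only __repr__.

```python
class Point:
    """Illustrative class that defines __repr__ but not __str__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

p = Point(1, 2)

fallback = str(p)    # object.__str__ delegates to __repr__
in_fstring = f"{p}"  # default f-string formatting ends up at str() too

# An explicit !r conversion is still needed to get the repr of a str.
explicit_repr = f"{'hi'!r} vs {'hi'}"

print(fallback)       # Point(1, 2)
print(explicit_repr)  # 'hi' vs hi
```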

4 Likes

I don’t think the first is a clear win. If there’s insufficient appetite for the extra str method, then the existing “str.join(map(repr, x))” is fine (unlike str, there’s no major efficiency loss with passing repr through map to a join operation, since even strings need surrounding quotes added and their contents escaped if necessary).

For the second, if the proposal were to be accepted, then the question would be whether the extra complexity is worth it just to make it a little bit easier to get the old behaviour back, when the options of type hints and explicit runtime type checks already allow that to be done in a way that is compatible with existing releases rather than only running on the new version.

We should do a code search to check whether enclosing str.join() in a try/except TypeError occurs much in the wild. I don’t think I’ve ever seen it used. Going forward, if people think they need it, we could add a strict option to restore raising an exception for non-str inputs. That would give an easy fix for existing try/except code.

I expect it’s rare – but I think the folks that want these cases to raise aren’t expecting to handle it at runtime, but for it to get caught during development / testing / static type checking.
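As an illustration of the explicit runtime check mentioned above, here is a sketch of a helper that would keep raising even if str.join started converting. The name strict_join and its error message are hypothetical; nothing like it exists in the standard library.

```python
def strict_join(sep, iterable):
    """Hypothetical helper: join, but insist that every item is a str."""
    items = list(iterable)
    for index, item in enumerate(items):
        if not isinstance(item, str):
            raise TypeError(
                f"item {index}: expected str, got {type(item).__name__}"
            )
    return sep.join(items)

print(strict_join(", ", ["a", "b"]))  # a, b
```

Because the check is ordinary Python, it behaves the same on existing releases and on any future version that changes join.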

A quick grep of the CPython code base shows 114 uses of join() with str(), but only 27 with repr().

1 Like

A new builtin is a tempting idea but I wonder if the same could be achieved by an update to the f-string grammar. I too find ", ".join(str(i) for i in things) boring and repetitive, but one more irksome to me is placing such a joined list into a larger string such as f"\t{color}{', '.join(str(i) for i in things)}{no_color}\n".

Unlike the format string they replace, f-strings have the ability to convert the type explicitly before formatting. If a new convert type was created that joined an iterable of formattable objects, it could remedy both situations. I will use j for join. It could even be combined with existing converters to be explicit about how the objects being joined are first stringified. Whatever follows the j, up to the first } or :, if anything, would be the first argument to join.

things = ["spam", "ham"]
f"{things!j}" == "spamham"
f"{things!j, }" == "spam, ham"
f"{things!rj }" == "'spam' 'ham'"
f"{things!sj :>12}" == "    spam ham"
1 Like
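The !j conversion above is not valid Python today, so it cannot be run as written. As a rough approximation of the idea using features that do exist, a wrapper object can implement __format__. The Joined class and its parameters are hypothetical names chosen for this sketch.

```python
class Joined:
    """Hypothetical wrapper approximating the proposed !j conversion."""

    def __init__(self, items, sep="", convert=str):
        self.items = items
        self.sep = sep
        self.convert = convert

    def __format__(self, spec):
        joined = self.sep.join(self.convert(item) for item in self.items)
        # Apply whatever format spec follows the colon to the joined string.
        return format(joined, spec)

things = ["spam", "ham"]

plain = f"{Joined(things)}"
comma = f"{Joined(things, ', ')}"
quoted = f"{Joined(things, ' ', repr)}"
padded = f"{Joined(things, ' '):>12}"
```

This gives the same four results as the examples above, though it is wordier than the proposed syntax.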

I have a feeling, the discussion is going in circles.

Changing sep.join(iterator_or_sequence) to suddenly apply a str() mapping inside is a breaking change which will cause trouble.

Some examples in addition to what was already mentioned:

  • passing a list of floats will go unnoticed and cause those floats to be converted using (most likely) unexpected formats
  • having a None in the list will no longer raise an exception, causing data corruption down the line (e.g. think of “None” getting written to a CSV file which is supposed to use “” or “N/A” as missing value)
  • passing a list of bytes objects (or any other objects which don’t have a __str__ method or redirect this to __repr__) will result in joins of repr(obj), e.g. "b'abc', b'def'"
  • accidentally passing in objects such as other sequences or dictionaries will go unnoticed, since their repr() will be used as str()
  • type checkers are (usually) not run at runtime, so won’t help much if you’re dealing with data from arbitrary data sources, e.g. think of reading a wrongly formatted JSON file, where what should have been a list of strings ends up being a list of dicts
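The cases in the list above can be checked by looking at what str() returns for each kind of value, since that is what an implicitly converting join would use. The row contents are invented for illustration.

```python
# Illustrative "bad" inputs: a float, a missing value, bytes, a dict.
row = [0.1 + 0.2, None, b"abc", {"id": 7}]

# What an implicitly converting join would produce for each item.
converted = [str(item) for item in row]

line = ",".join(converted)
print(line)  # 0.30000000000000004,None,b'abc',{'id': 7}
```

Today ",".join(row) raises a TypeError on the first item instead of writing this line out.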

IMO, the right way to approach this is not by changing the default or adding parameters which change the default in unexpected ways, but instead by adding a new API, if the use case is really common (I have my doubts, see below, but perhaps I’m wrong).

Since what the OP and Raymond are proposing goes very much in the same direction as what the builtin print() already does (converting its arguments via str()), a new API should follow this precedent, IMO.

I already suggested having a concat() function in that sense, others prefer strcat(). The name doesn’t really matter and such a function could even allow providing a conversion function other than str(), as long as the result is a string.
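One way such a function could look, as a sketch only: the signature, the keyword names sep and convert, and the error message are all assumptions, since the post does not specify them.

```python
def concat(iterable, *, sep="", convert=str):
    """Hypothetical high-level join: convert each item, then join."""
    parts = []
    for item in iterable:
        text = convert(item)
        # Enforce the "as long as the result is a string" condition.
        if not isinstance(text, str):
            raise TypeError(
                f"convert() returned {type(text).__name__}, expected str"
            )
        parts.append(text)
    return sep.join(parts)

print(concat([1, 2, 3], sep=", "))       # 1, 2, 3
print(concat(["a", 1], convert=repr))    # 'a'1
```

Keeping this separate from str.join leaves the strict method untouched for code that relies on the TypeError.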

All of these would work, but I’m definitely -1 on having str.join() accept anything other than a sequence or an iterator of strings.

The reason for having .join() as a method on strings is because it is meant to work with sequences of strings. Otherwise, you could open the same case for e.g. "3" + 4 returning "34" (since this calls "3".__add__(4) internally, another method of strings).
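For comparison, a quick sketch of how string concatenation treats a non-string operand today.

```python
# Concatenation with a non-str operand raises instead of converting.
try:
    "3" + 4
except TypeError:
    add_raised = True
else:
    add_raised = False

# The conversion has to be spelled out.
explicit_sum = "3" + str(4)
print(explicit_sum)  # 34
```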

PS: The sep.join([str(x) for x in sequence]) construct isn’t really used a lot in the Python stdlib. In most cases, the sequence already has all strings and anything not a string would be an error, or you apply explicit formatting, not just the standard str() conversion. You also won’t find many try-excepts around sep.join(), since those not-a-string errors are meant to be raised – after all, they point to real errors :slight_smile:

6 Likes

That is a reasonable avenue. Besides removing boilerplate, it would be more discoverable by newcomers than map(str, data). There is also the added benefit that it avoids the current awkward phrasing of sep.join(list_of_strings), which I don’t think anyone really likes.

I see this one a little differently. There aren’t many try-excepts around sep.join() because it is almost never useful. People aren’t omitting the try because they actually want a latent undetected bug in their production code, a potential TypeError ready to bring down the whole stack :wink:

6 Likes

If you do want a latent undetected bug, coercing arbitrary objects to strings instead of raising an exception might have some potential :wink:

Yes, a try/except around sep.join is mostly* useless as there’s no way to handle the exception inside the loop of the function where it would be able to do anything. It forces the user to prepare the data before passing it to the join method, which means you have to handle the formatting of objects for which str is not the correct method which might otherwise be passed incorrectly to the output.


*There’s always the potential that someone has done something like this:

def concat(iterable, *, sep=""):
    items = list(iterable)
    try:
        return sep.join(items)
    except TypeError:
        return sep.join([str(item) for item in items])

to keep some of the performance in the case where everything is a string already :wink: