Why not make str.join() coerce the items in its iterables

steven.daprano · February 24, 2023, 8:36am

Flag arguments are never “harmless and convenient”. They should only be used with care, and only when the benefits significantly outweigh the costs. I argue that this is not one of those times.

As we so often say, not every one-liner needs to be a builtin.

Ironically, last night in another topic, I actually did propose a flag argument, so they’re not always bad. But this case is a poor fit for a flag argument, because we don’t have a binary “coerce or not” choice, we have a choice between many different ways to coerce items.

If we were to go down this track (which is a strong -1 from me) the right interface is not a binary yes/no flag but a coercion function (defaulting to None):

', '.join(stuff, coerce=None)

That allows us to pass repr (especially useful for generating object reprs), ascii, or any custom string converter we like.
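A minimal sketch of what a coercion-function interface could look like, written as a free function because str.join itself accepts no such argument. The name join_coerce and its signature are assumptions made for illustration only, not something proposed in this thread.

```python
def join_coerce(sep, items, coerce=None):
    """Join items with sep, optionally converting each item first.

    Hypothetical helper: with coerce=None it behaves like sep.join(items).
    """
    if coerce is not None:
        items = map(coerce, items)
    return sep.join(items)


stuff = ["spam", 42, 2.5]
print(join_coerce(", ", stuff, coerce=str))    # spam, 42, 2.5
print(join_coerce(", ", stuff, coerce=repr))   # 'spam', 42, 2.5
print(join_coerce(", ", ["é"], coerce=ascii))  # '\xe9'
```

With coerce left as None, passing a non-string item still raises TypeError, exactly as str.join does today.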

steven.daprano · February 24, 2023, 9:44am

I’m a little surprised that there hasn’t been much push-back on the idea that we convert the objects using str. What’s the use-case for that? In my experience, using repr is more common.

One major use-case for this proposal is to join stringified items in containers, to use as the container’s repr(). But for that use-case, we need to call repr() on the items, not str().

If we restrict this to only converting items using str, we will regret it: people will complain that this is no good for creating object reprs, or more likely, they will just blindly use it anyway, and we’ll get a lot of bad object reprs:

>>> from fractions import Fraction
>>> obj = ["Hello", Fraction(1, 2)]
>>> "[" + ", ".join(str(x) for x in obj) + "]"  # Join using str instead of repr.
'[Hello, 1/2]'
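For comparison, a short sketch of the same expression with repr in place of str. For these sample values the result matches what the list's own repr shows:

```python
from fractions import Fraction

obj = ["Hello", Fraction(1, 2)]

with_str = "[" + ", ".join(str(x) for x in obj) + "]"
with_repr = "[" + ", ".join(repr(x) for x in obj) + "]"

print(with_str)   # [Hello, 1/2]
print(with_repr)  # ['Hello', Fraction(1, 2)]

# The repr-based version agrees with the built-in list repr.
assert with_repr == repr(obj)
```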

I don’t hate @malemburg’s suggestion that we add a new builtin, modelled on print’s API. That kills two birds with one stone:

allows people to auto-coerce arbitrary items and join them;
reduces the complaints about join being a string method instead of a list method.

But then I suppose we’ll get complaints that it’s a function instead of a string method.

I can’t remember who it was, but somebody suggested that they used this frequently on bytes. (Maybe in one of the other topics on this, er, topic.) Here is a prototype which can be used for both strings and bytes:

def join(*items, sep=', ', start=None, end=None, coerce=None):
    # The type of sep (str or bytes) selects the defaults for the other arguments.
    if isinstance(sep, str):
        if coerce is None:
            coerce = repr
        if start is None:
            start = ''
        if end is None:
            end = ''
    elif isinstance(sep, bytes):
        if coerce is None:
            # bytes.join needs bytes items, so encode each repr.
            coerce = lambda x: repr(x).encode('utf-8')
        if start is None:
            start = b''
        if end is None:
            end = b''
    else:
        raise TypeError('sep must be str or bytes')
    return start + sep.join(map(coerce, items)) + end

Best of all, this no longer violates the guideline “Not every one line function needs to be a builtin.”

PythonCHB · February 24, 2023, 6:41pm

Wow! I sure have the opposite experience – shows how little we can go by our individual expectations.

Really? Another surprise. My first instinct on that is that folks writing __repr__s should know what they are doing, and it’s fine for it to be a bit more awkward.

I was thinking the primary use case for this was more the casual “scripting” type user.

One of the challenges of Open Source development is that folks developing the system may not be representative of much of the user base. And I think the “scripting” user has been a bit neglected in recent years …

All that being said, a way to easily join iterables of objects while stringifying them in a custom way would be nice.

steven.daprano · February 25, 2023, 1:17am

I’m curious what sort of use-cases you have, and why using str is better than repr.

(Note that for the common case of stringifying ints or floats, it makes no difference which you use, but for the case of strings and other objects, it makes a big difference.)

I grepped my code, which I completely acknowledge is not representative of all Python code (everyone’s individual code is idiosyncratic). I found

three classes with a repr that calls ", ".join(repr(obj) for obj in self) or equivalent;
two that use str in place of repr;
two examples of sep.join(str(x) for x in something) outside of a __repr__;
and one example of sep.join(stringify(x) for x in something), for some custom stringify function.

Even at face value, that suggests that for my code, out of eight “stringify and join” operations, only half use str and the others use something else.

But we shouldn’t take this at face value.

With regard to the second item, I now realise that both of those __repr__ methods are wrong and need to be fixed by changing the call to str to use repr. I had failed to test or even look at the output of the method when the objects contained values other than ints and floats. E.g. Decimals or Fractions.

So in my code base, using a quick and dirty grep, I would say that 5 out of 8 examples of the pattern “stringify and join” use repr to do the stringification, 1 uses a custom function, and 2 use str (and I didn’t look too closely at those so that could easily change in the future too).

Well sure everybody writing code should know what they are doing, but I don’t see why you single out repr dunders here, or why you think repr dunders could not, or should not, take advantage of a built-in “stringify and join” function and method.

That’s not a use-case, that’s a target audience, and for casual users it is all the more important that the default choice of stringifier gets it right.

Also I wonder what the OP @jsbueno thinks about being lumped into “casual users”

Rosuav · February 25, 2023, 1:33am

My reading of this is: Your repr functions call repr, which is completely unsurprising; of the others, two use str and one uses something else. So str is definitely the more general choice, with repr being primarily used for nested reprs.

Of course, this is still nonrepresentative, as mentioned. But since there’s a pretty good split between repr, str, and other, it’s pretty clear that there’s no single obvious way to stringify as part of joining.

PythonCHB · February 26, 2023, 8:22pm

Because, by definition, if you are writing a __repr__ then you are thinking about how repr() is different to str().

There are absolutely folks using str.join() that do not fully understand (and may not be aware of) repr and str.

Vaguely speaking repr is for computers to understand, and str is for humans (and any non-python system) to understand.

So generating text for output or writing to text files, etc will generally want to use str.

Well, this proposal is either to save a few characters or make it more “casual” to use – The most experienced developer can still casually write a quickie script – I do it all the time. So “casual use” rather than “casual user”.

ncoghlan · March 11, 2023, 12:49pm

Summarising the core reason this hasn’t been done: “In the face of ambiguity, refuse the temptation to guess”

As noted in the comments above, the primary ambiguity is between using str and repr as the coercion function:

>>> data = [1, "2", "three", 5.0, 2e-99]
>>> ",".join(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found
>>> ",".join(map(repr, data))
"1,'2','three',5.0,2e-99"
>>> ",".join(map(str, data))
'1,2,three,5.0,2e-99'

Wanting to use repr as the coercion function comes up in many more situations than just __repr__ implementations: it comes up any time it’s relevant to know more about the data being displayed than just the human-readable string (such as in log messages).

In the status quo, if you forget to specify how coercion should work and coercion is needed, you get an exception at runtime. (Static type analysers are also able to complain about the missing coercion step.)

Sometimes the bug isn’t even that the coercion is missing: it’s that an unintended value made it into the sequence being displayed.

With the proposed implicit coercion with str() you get ambiguous data instead (as shown in the last line), and have to hope that someone will notice that the data is wrong before it gets too far down the line. Of course, as a data corruption bug, good luck figuring out where it got corrupted.
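A small illustration of that ambiguity, using made-up sample values: two different inputs collapse to the same text under str, while repr keeps them apart.

```python
ints_only = [1, 2, 3]
mixed = [1, "2", 3]  # a string slipped in where an int was intended

joined_ints = ",".join(map(str, ints_only))
joined_mixed = ",".join(map(str, mixed))

# After str coercion the two inputs can no longer be told apart.
assert joined_ints == joined_mixed == "1,2,3"

print(",".join(map(repr, ints_only)))  # 1,2,3
print(",".join(map(repr, mixed)))      # 1,'2',3
```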

You also lose the type checking safety net: since anything is a valid input for str, type checkers can no longer tell the difference between “forgot to specify how input coercion should work” and “intentionally using str for input coercion”.

There are potentially viable ways to improve the ergonomics of requesting coercion of the inputs to string joining operations, but the significant error masking potential is what means that implicit input coercion isn’t one of them.

rhettinger · March 11, 2023, 6:59pm

I hope that people reading this post don’t instantly anchor to one of those reasons without really challenging how much weight they have and without giving due consideration to users who want this behavior.

The OP, @jsbueno, described the current idiom as “definitely, one of the most boring and repetitive patterns I find myself typing”. This is not an empty concern. Many of the language’s greatest hits merely “save a little typing”: the @-notation for decorators, list comprehensions, dataclasses, etc.

My experience as a coach and instructor is that the current idiom has to be taught. No first or second day learner figures this out on their own.

A Twitter poll of over 3,000 Python users indicated that about 80% want the proposed behavior. That is as close as the Python community ever gets to having a consensus.

So, this isn’t a fluff proposal. Let’s give a full open-minded consideration to the request.

Suppose that when f-strings were being proposed, a group of developers insisted that the expressions inside curly braces be explicitly converted to strings: f'Received {str(req_count)} requests'. And suppose they gave all of the reasons listed in the previous post (type checkers, forgetting which input coercion is used, etc)? Would you have bought into those arguments? This isn’t a bogus comparison. Internally, f-strings coerce the inputs to strings and then perform a str.join().
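For reference, a sketch of the conversion behaviour being compared. Strictly speaking, a replacement field goes through format(), which gives the same text as str() for the values used here; the !r conversion requests repr() explicitly. The variable names are illustrative.

```python
req_count = 3
name = "alice"

implicit = f"Received {req_count} requests from {name}"
explicit = f"Received {str(req_count)} requests from {str(name)}"

# For these values the implicit and explicit spellings agree.
assert implicit == explicit == "Received 3 requests from alice"

# The !r conversion asks for repr() instead.
with_repr = f"Received {req_count!r} requests from {name!r}"
print(with_repr)  # Received 3 requests from 'alice'
```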

I do think the example given in the previous post is bogus: data = [1, "2", "three", 5.0, 2e-99]. Really, do naturally occurring datasets look like this? And if so, do we really benefit by raising an exception just to force the coder to explicitly choose between str and repr? The example makes it seem like all the user wants to do is convert the mixed dataset to a string. Presumably in real world code you would actually want to loop over the data and do computations with it. This is why real world lists tend to be homogeneous and why type annotations for lists were designed to reflect this reality: data: list[sometype].

Before basing a decision on a questionable example, please think for yourself whether it is realistic or contrived. Perhaps look at your own real world code and decide for yourself what is typical.

I recognize that it is not easy to know in advance whether userland will be confused by a feature or whether they will love it. I recently approved a PR making slices hashable. I hesitated because it would make somedict['hello': 'world'] syntactically valid; however, Guido was able to cut through the fog of doubt and said the danger was overstated. Likewise, the reviewer who gave a -10 to the current proposal was also adamantly against dictionaries becoming ordered and thought that we should intentionally scramble dictionaries to make sure no one ever relied on order. In hindsight, that concern was also overstated.

bryevdv · March 11, 2023, 7:17pm

I honestly don’t understand why there would be any ambiguity. The method is str.join, so the default coercion should be str. At least, I can’t, myself, imagine ever expecting anything else than for those two things to line up.

davidism · March 11, 2023, 8:04pm

", ".join(["First", "Second", None, "None"]) seems like a better example of something that could produce an unintended result. Right now, if I’m building a list to join and somehow let None through, I get an error, and understand that I should filter out missing values first. If everything is converted to string automatically, I might not catch that only one of the “None” items in the output was intentional.

Is that actually a problem? Is either the current or proposed behavior better in this case? I see tradeoffs either way.
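A quick sketch of the two behaviours for that list, plus the filter-first alternative. The variable names are just for illustration.

```python
items = ["First", "Second", None, "None"]

# Current behaviour: the stray None is rejected.
error = None
try:
    ", ".join(items)
except TypeError as exc:
    error = exc
print("current behaviour:", error)

# Explicit str coercion: both entries now render identically.
coerced = ", ".join(map(str, items))
print(coerced)  # First, Second, None, None

# Filtering out missing values first keeps only the intentional "None".
filtered = ", ".join(x for x in items if x is not None)
print(filtered)  # First, Second, None
```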

I don’t find the current behavior to be a burden. ", ".join(map(str, items)) also seems like a reasonable way to indicate “I intentionally want the default str representation of any type.”

ncoghlan · March 12, 2023, 2:04am

I think there are valid counterarguments to the status quo, and @rhettinger presents them well.

In particular, the “None or other mixed types sneaking into a list” situation is well served by static type hints on the list itself, so type checkers should still be able to identify such issues independently of the signature of “str.join”.

So while the part of my post explaining the status quo remains valid, the closing paragraph is more definite than is justified - the rise of type hints actually makes this change easier to justify rather than harder (since a good type hint will show where bad data is getting into the data structure, rather than only complaining when trying to display it).

It does make me wonder if there should be two parts to the proposal, though:

make str.join coerce non-strings with str
add a str.joinrepr that’s a shorthand for “str.join(map(repr, x))”

Wanting the latter is vastly less common than the former, but it’s sufficiently common to have served to justify the status quo for decades.
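Since str cannot gain methods from user code, the second part can only be sketched here as a free function. The name joinrepr comes from the suggestion above; the function form and its signature are assumptions.

```python
from typing import Any, Iterable


def joinrepr(sep: str, items: Iterable[Any]) -> str:
    """Shorthand for sep.join(map(repr, items))."""
    return sep.join(map(repr, items))


print(joinrepr(", ", [1, "2", None]))  # 1, '2', None
```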

bryevdv · March 12, 2023, 2:07am

Me either, really, but I would go so far as to say that map has been officially actively discouraged, or at the very least is perceived as such, to the point that people only try ", ".join(str(x) for x in items) and that that is what folks have a distaste for.

I personally like the idea that popped up in one of these threads for just making a new strcat function.

jsbueno · March 12, 2023, 2:32am

[reply by e-mail test]

Better yet - what if join accepts a mapping callable as a named parameter, which would default to str? So one eager for forcing repr (since repr is already a fallback for objects without str, I hope repr defenders did not forget), could just do: "".join(mylist, map=repr) - and people who want to lazily update some eventual code which would misbehave if updated to the new join could just change the call to "".join(mylist, map=None) which would behave the same as the existing .join.

The explicit mapping function could be good for date, datetimes, and dataclasses as well.

ncoghlan · March 12, 2023, 2:58am

I considered that (a variant of the approach was posted earlier in the thread), but I don’t like it for a few reasons:

an optional argument is only arguably more readable than the status quo with an explicit map call
it makes the type signature of join hard to describe, since it technically depends on what the coercion function accepts
it feels like overgeneralising, since using anything other than str or repr for join coercion is way down in the noise (and adequately covered by map and list comprehensions)

By contrast, “str.joinrepr(x)” is clearly easier to write and more readable than both “str.join(map(repr, x))” and “str.join(repr(y) for y in x)”, there’s no complexity in the type signatures (both methods accept “Iterable[Any]”), and the addition of “joinrepr” would help to highlight that the default “join” is now implicitly a “joinstr” operation.

PythonCHB · March 12, 2023, 5:36am

I suppose I’m one of those folks – I tend to prefer comprehensions over map – but I find both forms equally distasteful in this case.

I do like this proposal – it makes things easier, and making a final string isn’t where people should be expecting their type errors to be caught – Python made that decision a long time ago when it was decided that every object could be stringified. Yes, Python is dynamically, but not weakly, typed – but there is a lot of overloading and implicit type conversion (Truthiness, anyone?) – if your code has a bug that puts the wrong type in an iterable, you really shouldn’t expect it to be caught when creating strings …

This feels totally unnecessary to me – folks that want repr() know what they are doing, and can do the map or comprehension easily enough. (Harking back to Raymond’s comment about beginners)

steven.daprano · March 12, 2023, 6:24am

That is a gross misrepresentation of the benefits of those features. Saving typing is the least important benefit of those features, and in the case of at least list comprehensions, they can require more typing than the alternative:

[len(s) for s in iterable]
list(map(len, iterable))

@-notation enables a powerful design pattern, Decorator, and it does so without violating DRY. It puts the decoration right up front, where it is obvious, not at the end, where it can be easily missed.

Comprehensions enhance for-loops into expressions, making a common software idiom (the accumulator) much easier and more convenient. If “saving a little typing” was our only goal, we could have aliased list.append as list.a and saved five characters. Or used map.

Dataclasses don’t save “a little typing”, they save a lot of typing and re-inventing the wheel. They are a powerful code generator.

In comparison to these, the proposed change to str.join really is nothing more than saving a few characters in typing. Any performance benefit is so far purely hypothetical, and could be made redundant by future interpreter optimizations.

The costs of this convenience include:

lack of generality;
for many purposes, maybe the majority, the wrong choice in string conversion;
errors pass silently;
confusing low-level string methods and high-level type-agnostic functions.

A red herring argument. No first or second day learner will figure out the new join semantics either. It will still need to be taught.

I’m surprised it’s not more. Most coders care very much about the short-term benefit of saving a few characters in typing. It’s an easy win, and the costs are hard to see, being spread out in the future.

f-strings are a bad analogy to this proposal. Like print, f-strings are a high-level type agnostic function. It is completely appropriate for f-strings to automatically convert arbitrary objects to a string. Just as it would be completely inappropriate for string methods to auto-convert objects to strings:

text.replace(2, "two")
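As things stand, that call fails rather than converting 2 for you. A quick check, with sample text invented for illustration:

```python
text = "1 2 3"

error = None
try:
    text.replace(2, "two")
except TypeError as exc:
    error = exc
print("TypeError:", error)

# The conversion has to be spelled out by the caller.
replaced = text.replace(str(2), "two")
print(replaced)  # 1 two 3
```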

I have, and I do not want this feature added to str.join.

In my code, this is:

Not a very common issue.
The use of str as the converter would be the wrong choice.

I am disappointed that so few people seem to be taking @malemburg’s suggestion to add a high-level join builtin seriously.

That would seem to have all the benefits of this proposal, and none of the disadvantages, plus a number of its own benefits:

more discoverable than string methods;
could work with both str and bytes;
more general than forcing the use of str;
satisfies people who dislike join being a string method;
could also deal with prefixes and suffixes if desired.

In my cynical opinion, the worse the feature, the more userland will love it (only half kidding).

More seriously, popularity is not a great proxy for quality. Coders have a severe bias for code which is quick and easy to write, even when it makes it harder for others to read and harder to maintain.

Maybe I have missed something, but I can’t see that anyone in this thread has given -10 to this proposal. I gave a -1 but I don’t see anyone else give an explicit vote.

Reading through the posts, I see only two people claiming -10, you and this comment, which was a grossly unfair misrepresentation of people’s arguments. Have I missed something?

steven.daprano · March 12, 2023, 6:55am

Even if that is less useful than using repr() for the default coercion?

I don’t know about other people’s code, but in mine I have roughly:

63% of coerce-and-join operations use repr()
25% use str()
12% use a custom converter.

steven.daprano · March 12, 2023, 7:37am

Why not? We expect all other uses of iterables to follow the same type rules that objects normally obey.

We don’t expect sum([1, 2, 3, "4", "5", 6]) to autoconvert everything to strings, or ints.

We don’t expect to be able to write arbitrary objects to a file without explicit string conversions:

with open('file.txt', 'w') as fp:
    fp.writelines(["hello", 10, [2.0, 3.0, None], b"world"])
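Both calls above do raise TypeError today. A small check, using an in-memory io.StringIO as a stand-in for the text file so that nothing is written to disk (that substitution is mine, not part of the original example):

```python
import io

sum_error = None
try:
    sum([1, 2, 3, "4", "5", 6])
except TypeError as exc:
    sum_error = exc
print("sum:", sum_error)

write_error = None
buffer = io.StringIO()  # in-memory stand-in for the text file
try:
    buffer.writelines(["hello", 10, [2.0, 3.0, None], b"world"])
except TypeError as exc:
    write_error = exc
print("writelines:", write_error)
```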

There is a big gulf between “everything can be stringified” and “everything will be implicitly stringified”.

I think that we should acknowledge the distinction between low-level string methods, which expect strings and only strings, and high-level functions which can be agnostic about the types they accept. This is not just an artifact of static type checkers!

print and f-strings are high-level
string methods are low-level

This suggests we should have a function, call it join or concat or whatever, that implements the coerce-and-join operation, without changing the semantics of str.join (and bytes.join), which will continue to refuse to guess when given a non-string.

If comprehensions are easy enough for people who want repr, they’re just as easy for people who want str. So it seems to me that your argument is that if this feature is unnecessary for people wanting repr, it’s also unnecessary for people wanting str.

Remember that this request wasn’t made by a beginner who struggles to know how to stringify and join a bunch of objects, it was made by an experienced coder who knows what he is doing and wants to save typing. repr is about 33% longer to type than str (four characters against three), so this feature should be even more valuable for people needing to repr-ise their strings.

I’m not kidding or joking around here. If we take this proposal seriously for str, then it is even more serious for repr, and contrariwise, if we reject it for repr, then we have even less reason to accept it for str.

steven.daprano · March 12, 2023, 8:09am

The problem isn’t objects that don’t have a __str__ dunder. Every object can be stringified, unless they have a bug in their custom __str__ or __repr__ method that causes an exception.

The problem is objects where the str() and the repr() are different. If all you are doing is joining ints or floats, you won’t really notice a difference but for many other objects there is a considerable difference and blindly joining the str() output may not be appropriate.


>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> str(a), repr(a)
('2/3', 'Fraction(2, 3)')

effigies · March 12, 2023, 2:09pm

FWIW I would be +1 on that. It seems worth making it its own proposal.