Of course, but if you want the repr() version, then apply repr(). You need to do that in both the current and the suggested form of join. It's not an argument for or against applying str() by default.
To me it seems totally reasonable that this:
str.join(sep, [x, y, z])
means the same as this:
str.join(str(sep), [str(x), str(y), str(z)])
and that it is silly having to write out the second, explicit form. If on the other hand I wanted to get the repr() of my objects, it is clear that I must be explicit, and this:
Nick Coghlan (ncoghlan), CPython core developer, March 12:
Joao S. O. Bueno:
Better yet - what if join accepts a mapping callable as a named parameter, which would default to str ?
I considered that (a variant of the approach was posted earlier in the thread), but I don't like it for a few reasons:
an optional argument is only arguably more readable than the status quo with an explicit map call
it makes the type signature of join hard to describe, since it technically depends on what the coercion function accepts
it feels like overgeneralising, since using anything other than str or repr for join coercion is way down in the noise (and adequately covered by map and list comprehensions)
By contrast, "str.joinrepr(x)" is clearly easier to write and more readable than both "str.join(map(repr, x))" and "str.join(repr(y) for y in x)", there's no complexity in the type signatures (both methods accept "Iterable[Any]"), and the addition of "joinrepr" would help to highlight that the default "join" is now implicitly a "joinstr" operation.
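To make the comparison concrete, here is a minimal sketch. Note that `joinrepr` is not an existing str method; it is written here as a hypothetical free function that only approximates what the proposed method might do.

```python
from typing import Any, Iterable


def joinrepr(sep: str, items: Iterable[Any]) -> str:
    """Hypothetical stand-in for the proposed str.joinrepr method."""
    return sep.join(map(repr, items))


values = ["a b", 3, None]

# The two spellings that work today give the same result as the sketch.
with_map = ", ".join(map(repr, values))
with_genexp = ", ".join(repr(v) for v in values)

print(joinrepr(", ", values))  # 'a b', 3, None
```

All three spellings produce the same string; the difference under discussion is only readability.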
The two clear wins I see for the named parameter are: (1) no need for yet another string method - it is not like there are too few of those, and (2) with a special value like None the current behavior can be replicated, just in case someone needs it (an xkcd 1172 preemptive counter-measure, actually: https://xkcd.com/1172/).
So one eager for forcing repr (since repr is already a fallback for objects without str, I hope repr defenders did not forget)
The problem isn't objects that don't have a __str__ dunder. Every object can be stringified, unless it has a bug in its custom __str__ or __repr__ method that causes an exception.
The problem is objects where the str() and the repr() are different. If all you are doing is joining ints or floats, you won't really notice a difference, but for many other objects there is a considerable difference and blindly joining the str() output
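A small illustration of that difference, using made-up sample data: for ints the two conversions agree, while for strings that contain the separator (or are empty) the str() form becomes ambiguous.

```python
numbers = [1, 2, 3]
# For ints, str() and repr() give identical text.
assert ", ".join(map(str, numbers)) == ", ".join(map(repr, numbers))

items = ["apple", "pear, sliced", ""]
as_str = ", ".join(str(x) for x in items)
as_repr = ", ".join(repr(x) for x in items)

print(as_str)   # item boundaries are no longer recoverable
print(as_repr)  # 'apple', 'pear, sliced', ''
```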
I love it that in your supposed counterexample, the usefulness of having str as a converter is indisputable.
However, going back to the comment where you list tens of personal reasons for not wanting to modify str.join just to end the message saying "but I am not -10, just -1" - yes, maybe a new join builtin, or even a "join" in the stdlib string module, could be more useful - for the reasons you mention.
It is relevant to the questions of whether str is the only choice for this new functionality, and if not, what should be the default.
Suppose that the proposal was to make str.join always and automatically convert objects using ascii(). Wouldn't it be valid to argue that hardly anyone does that, this would be so niche as to be pointless, and that str() or repr() would be a better choice?
The same thing applies to the question of str() vs repr(). Why choose str() as the default choice, or only choice?
In this topic, I think that only one person, me, has actually made a (admittedly quick and dirty and admittedly idiosyncratic) review of existing code, and found that using repr() is much more common than using str().
(If anyone else did a code review, sorry for missing your post.)
If we were to do this thing, wouldn't it be much better to make the default the function which is more common and more useful? I trust people aren't going to promote making this feature less useful
If we were to move ahead on this proposal, the PEP would need to include a more systematic code review to determine which converter is more common in real code. But until we have that systematic review, the only review we have so far is, I think, mine, and that shows that repr() is more than twice as common as str().
I also found that in my own code base, I had a significant number of bugs in my code caused by thoughtlessly using str() as the converter when it should have been repr(). Do we want to encourage those sorts of errors?
You want to be able to do this? str.join(tuple(), [1, 2, 3]) == "1()2()3"
That's a remarkably strong position to take! I don't think anyone else has proposed making the separator autoconvert as well!
In any case, regardless of the separator, it seems to me totally unreasonable to expect the objects being accepted by a string method to autoconvert. str.join is a string method, not a high-level type-agnostic function like print.
We don't expect other string methods to autoconvert:
# Change the street address from 123 to 456 some street.
if address.startswith(123):
address = 456 + address[3:]
I think that everyone agrees that low-level string methods should require string arguments, and not implicitly guess how to convert arbitrary objects into strings. That sort of implicit conversion is great for high-level functions like print and f-strings, but for everything else we agree that string methods shouldn't do it.
# Convert Western digits to Thai
for digit in range(10):
string = string.replace(digit, chr(0x0E50 + digit))
(Some extremely low-level methods, like str.translate, accept integer code points instead of characters, but that's not auto-converting arbitrary objects to strings.)
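For reference, a short sketch of the str.translate case: the mapping is keyed by integer code points, and here it performs the Western-to-Thai digit conversion with explicit values rather than by converting arbitrary objects. The sample text is made up.

```python
# Map the code points of '0'..'9' to the Thai digits U+0E50..U+0E59.
thai_digits = {ord(str(d)): 0x0E50 + d for d in range(10)}

text = "Room 123"
converted = text.translate(thai_digits)
print(converted)
```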
If we are to make an exception for str.join, it needs something more than just saving a few characters typing, or "it's obvious", to justify why join is special.
We can say exactly the same thing if we want to get the str() of our objects too.
Not sure why you are writing it as an unbound method instead of sep.join(...), but okay. Either way, I agree that it looks no sillier than str.join(sep, [str(x), str(y), str(z)]).
Ahh, I was going to ask you about that. Obviously you're going to want repr() in a __repr__, yes. But I submit that while some people might write a lot of __repr__s, those are not the target audience for this, and not the target use case. I said earlier in the thread that "folks that want repr() know what they are doing". That wasn't really the right point. How about: "folks writing __repr__ methods should be thinking carefully about what they want."
And I have written my small share of __repr__s, but I don't think I've used str.join in them - certainly not much.
Maybe my way of thinking of this is that Python is still a "scripting language" - making it easier to use that way, without making it harder to write libraries, large systems, etc., is a good thing.
Defining "scripting" is hard, but I'll say if you are writing __repr__s, you are not scripting.
Not at all, I was merely pointing out that it is reasonable to expect a method of str to convert inputs to str using str, and that it can even look tautological to apply str to its inputs.
It won't ever actually do that with its first argument though, due to the way bound methods work. It'd probably be clearer to avoid the str.join(" ", ...) notation due to the potential confusion.
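A quick check of the point about the first argument: when called through the class, str.join still requires its first argument to be a real str, so a tuple separator is rejected before any joining happens. The exact error wording may vary between Python versions, so it is printed rather than assumed.

```python
try:
    str.join(tuple(), ["1", "2", "3"])
except TypeError as exc:
    error = str(exc)  # the method descriptor refuses a non-str "self"
else:
    error = None

print(error)

# With a str separator, both spellings are equivalent.
same = str.join("-", ["1", "2", "3"]) == "-".join(["1", "2", "3"])
```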
I consider str.join not coercing data to be a feature and not something to be removed.
The issue I have is not with the more unlikely ["a", "2", 3, ... ] case being hidden, but the case where a container gets used instead of a field, or the incorrect list gets used as input. This is an issue with any kind of automatic coercion and is not limited to str or repr.
Here's a toy example with a relatively obvious mistake that would currently raise a nice TypeError and be highlighted in an IDE so you know you've done something wrong. password in this case is just a proxy for any sort of information that isn't intended to be revealed.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    password: str

users = [
    User("David", "correct horse battery staple"),
    User("Alex", "password1"),
    User("Chris", "abc123"),
]
user_names = [u.name for u in users]
# The mistake: users is passed to join instead of user_names.
namelist = ", ".join(users)
print(namelist)
Currently you get this error.
TypeError: sequence item 0: expected str instance, User found
This is a useful error that tells me exactly what I've done wrong and where I've done it. If str.join converted input using str this would not fail and would instead list all of the object internals, including the password field.
While obviously you shouldn't store private data in such a way, people will, and this change makes it easier to accidentally do the wrong thing. Even without the potential to leak data, the input would obviously not be what was intended and the call would no longer give a useful error pointing to the appropriate line.
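The leak can be simulated today by applying str() explicitly, which is roughly what the proposed implicit conversion would amount to. This is only a sketch reusing the toy User class with a single made-up entry.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    password: str


users = [User("Alex", "password1")]

# Explicit str() stands in for the proposed automatic coercion.
leaky = ", ".join(str(u) for u in users)
print(leaky)  # the dataclass repr, including the password field

# The intended call only ever sees the names.
intended = ", ".join(u.name for u in users)
```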
My code is almost certainly not representative, but I don't have any instances where str coercion would have been useful and only one where repr could, and I didn't find ", ".join(f"{field!r}" for field in field_names) to be particularly onerous.
I do have many examples that are similar to ", ".join(obj_names) and I'm grateful that in the (hopefully) unlikely case I use the list of objects instead of names, my IDE can highlight the issue and Python will raise an exception pointing out where I went wrong.
Neither have I for that matter.
I wrote the O.P. for one reason: I find myself at least once a week writing ''.join(str(x) for x in y) - and that seems redundant.
As someone else correctly recalled in these comments, f-strings also convert using str by default, with a total of zero injuries to date. The str() protocol defaults to using __repr__ for a reason: if an object has no need for a custom __str__ it may only implement __repr__ - and that has been just perfect for my usage over years of coding.
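The fallback described above can be checked with a small class that defines only __repr__ (Point is a made-up example). f-strings go through format(), which for objects without a custom __format__ ends up using str().

```python
class Point:
    """Toy class that implements __repr__ but not __str__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


p = Point(1, 2)

via_str = str(p)        # falls back to __repr__
via_fstring = f"{p}"    # default formatting uses str()
plain = f"{'a b'}"      # str() of a string: no quotes
quoted = f"{'a b'!r}"   # explicit !r asks for repr()
```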
I don't think the first is a clear win. If there's insufficient appetite for the extra str method, then the existing "str.join(map(repr, x))" is fine (unlike str, there's no major efficiency loss with passing repr through map to a join operation, since even strings need surrounding quotes added and their contents escaped if necessary).
For the second, if the proposal were to be accepted, then the question would be if the extra complexity is worth it just to make it a little bit easier to get back to the old behaviour, when the options of type hints and explicit runtime type checks already allow that to be done in a way that is compatible with existing releases rather than only running on the new version.
We should do a code search to check whether enclosing str.join() in a try/except TypeError occurs much in the wild. I don't think I've ever seen it used. Going forward, if people think they need it, we could add a strict option to restore raising an exception for non-str inputs. That would give an easy fix for existing try/except code.
I expect it's rare, but I think the folks that want these cases to raise aren't expecting to handle it at runtime, but for it to get caught during development / testing / static type checking.
A new builtin is a tempting idea but I wonder if the same could be achieved by an update to the f-string grammar. I too find ", ".join(str(i) for i in things) boring and repetitive, but one more irksome to me is placing such a joined list into a larger string such as f"\t{color}{', '.join(str(i) for i in things)}{no_color}\n".
Unlike the format string they replace, f-strings have the ability to convert the type explicitly before formatting. If a new convert type was created that joined an iterable of formattable objects, it could remedy both situations. I will use j for join. It could even be combined with existing converters to be explicit about how the objects being joined are first stringified. Whatever follows the j, up to the first } or :, if anything, would be the first argument to join.
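The proposed j conversion does not exist, so it cannot be shown directly. As a rough approximation in current Python, a small wrapper class with a custom __format__ can treat the format spec as the separator. The class name J and its convert parameter are assumptions made for this sketch, not part of the proposal.

```python
class J:
    """Hypothetical helper: joins an iterable when it is formatted.

    The format spec (the text after ':') is used as the separator.
    """

    def __init__(self, items, convert=str):
        self.items = items
        self.convert = convert

    def __format__(self, spec):
        return spec.join(self.convert(item) for item in self.items)


things = [1, 2.5, "x"]

line = f"[{J(things):, }]"          # separator ", ", items via str()
reprs = f"{J(things, repr):|}"      # separator "|", items via repr()
print(line)   # [1, 2.5, x]
print(reprs)  # 1|2.5|'x'
```

One limitation of this emulation is that the separator has to be expressible as a format spec, so it is not a full substitute for new grammar.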
I have a feeling, the discussion is going in circles.
Changing sep.join(iterator_or_sequence) to suddenly apply a str() mapping inside is a breaking change which will cause trouble.
Some examples in addition to what was already mentioned:
passing a list of floats will go unnoticed and cause those floats to be converted using (most likely) unexpected formats
having a None in the list will no longer raise an exception, causing data corruption down the line (e.g. think of "None" getting written to a CSV file which is supposed to use "" or "N/A" as missing value)
passing a list of bytes objects (or any other objects which don't have a __str__ method or redirect this to __repr__) will result in joins of repr(obj), e.g. "b'abc', b'def'"
accidentally passing in objects such as other sequences or dictionaries will go unnoticed, since their repr() will be used as str()
type checkers are (usually) not run at runtime, so won't help much if you're dealing with data from arbitrary data sources, e.g. think of reading a wrongly formatted JSON file, where what should have been a list of strings ends up being a list of dicts
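The cases in this list can be reproduced by applying str() by hand, which approximates what an implicit conversion would do. The row below is made-up sample data mixing a float, a None, a bytes object and a dict.

```python
row = [0.1 + 0.2, None, b"abc", {"k": "v"}]

# Today: the first non-str item stops the join with a TypeError.
try:
    ",".join(row)
except TypeError as exc:
    current_error = str(exc)
else:
    current_error = None

# With str() coercion every item passes through silently.
coerced = ",".join(str(item) for item in row)
print(coerced)  # 0.30000000000000004,None,b'abc',{'k': 'v'}
```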
IMO, the right way to approach this is not by changing the default or adding parameters which change the default in unexpected ways, but instead by adding a new API, if the use case is really common (I have my doubts, see below, but perhaps I'm wrong).
Since what the OP and Raymond are proposing is very much going in the same direction as what the builtin print() already does (converting its arguments via str()), a new API should follow this precedent, IMO.
I already suggested having a concat() function in that sense, others prefer strcat(). The name doesn't really matter and such a function could even allow providing a conversion function other than str(), as long as the result is a string.
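A minimal sketch of what such a separate function could look like. The name strcat, the sep and convert parameters, and the check that the converter returns a str are all assumptions made for illustration; no such function exists in the standard library.

```python
from typing import Any, Callable, Iterable


def strcat(
    items: Iterable[Any],
    sep: str = "",
    convert: Callable[[Any], str] = str,
) -> str:
    """Hypothetical helper: convert each item, then join with sep."""
    parts = []
    for item in items:
        text = convert(item)
        if not isinstance(text, str):
            # Enforce the "as long as the result is a string" condition.
            raise TypeError(
                f"convert() returned {type(text).__name__}, expected str"
            )
        parts.append(text)
    return sep.join(parts)


print(strcat([1, "a", None], sep=", "))       # 1, a, None
print(strcat(["a", 2], sep=" ", convert=repr))  # 'a' 2
```

Keeping this as a separate function leaves sep.join() strict, so existing code that relies on the TypeError is unaffected.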
All of these would work, but I'm definitely -1 on having str.join() accept anything other than a sequence or an iterator of strings.
The reason for having .join() as a method on strings is because it is meant to work with sequences of strings. Otherwise, you could open the same case for e.g. "3" + 4 returning "34" (since this calls "3".__add__(4) internally, another method of strings).
PS: The sep.join([str(x) for x in sequence]) construct isn't really used a lot in the Python stdlib. In most cases, the sequence already has all strings and anything not a string would be an error, or you apply explicit formatting, not just the standard str() conversion. You also won't find many try-excepts around sep.join(), since those not-a-string errors are meant to be raised - after all, they point to real errors.
That is a reasonable avenue. Besides removing boilerplate, it would be more discoverable by newcomers than map(str, data). There is also the added benefit that it avoids the current awkward phrasing of sep.join(list_of_strings), which I don't think anyone really likes.
I see this one a little differently. There aren't many try-excepts around sep.join() because it is almost never useful. People aren't omitting the try because they actually want a latent undetected bug in their production code, a potential TypeError ready to bring down the whole stack.
If you do want a latent undetected bug, coercing arbitrary objects to strings instead of raising an exception might have some potential
Yes, a try/except around sep.join is mostly* useless as there's no way to handle the exception inside the loop of the function where it would be able to do anything. It forces the user to prepare the data before passing it to the join method, which means you have to handle the formatting of objects for which str is not the correct method which might otherwise be passed incorrectly to the output.
*There's always the potential that someone has done something like this:
def concat(iterable, *, sep=""):
    items = list(iterable)
    try:
        return sep.join(items)
    except TypeError:
        return sep.join([str(item) for item in items])
to keep some of the performance in the case where everything is a string already