Yes, there was a lot of discussion. Fundamentally, the issue is that you can join any iterable, and there is no common class for all iterables that such a method could be attached to. You would need to demand that all iterables derive from a common base type, and that goes against Python’s basic principle of duck typing.
How odd. To me, joining an arbitrary iterable of strings using a string separator is obviously a fundamental method of strings.
You’re joining strings with strings, using a string separator. Why are you surprised it’s a string method?
It certainly shouldn’t be just a list method, or even a sequence method, since we need to join arbitrary iterables including sets, iterators and any other iterable object. Since all these iterable objects don’t share a common inheritance hierarchy (aside from object itself), we would need to re-implement that join method over and over again.
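As a small illustration of that point (the sample values are invented for this sketch), the single existing str.join already works with lists, tuples, generators, dict keys and plain iterators, none of which share a base class other than object:

```python
# Illustrative only: the sample data is made up for this sketch.
words = ["spam", "ham", "eggs"]

from_list = "/".join(words)
from_tuple = "/".join(tuple(words))
from_generator = "/".join(w for w in words)
from_dict_keys = "/".join({"spam": 1, "ham": 2, "eggs": 3})  # iterates over keys
from_iterator = "/".join(iter(words))

print(from_list)  # spam/ham/eggs
```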
It is an accident of history that str.join takes an iterable of strings as argument, rather than an arbitrary number of string arguments like os.path.join. There may or may not be reasons to prefer one API over the other, but both APIs are workable, and we could even have split the difference and done something similar to max() and min(), which accept either a single iterable or an arbitrary number of positional arguments.
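For anyone who has not run into the two calling conventions of max() and min(), here is a quick sketch (the numbers are arbitrary):

```python
numbers = [3, 1, 4, 1, 5]

# One iterable argument...
largest_from_iterable = max(numbers)
smallest_from_iterable = min(numbers)

# ...or several positional arguments.
largest_from_args = max(3, 1, 4, 1, 5)
smallest_from_args = min(3, 1, 4, 1, 5)

print(largest_from_iterable, largest_from_args)    # 5 5
print(smallest_from_iterable, smallest_from_args)  # 1 1
```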
Had we used the os.path.join
API for str.join
, would you still think it was a list method? I doubt it.
Without a lot of clever work, it would be difficult to prevent the reduce implementation from displaying quadratic O(N^2) behaviour, which would be bad.
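To make the concern concrete, here is a rough sketch of what a naive reduce-based join might look like. The function name reduce_join is hypothetical and this is not how str.join is implemented. Every step builds a new string out of everything accumulated so far, which is where the quadratic copying can come from:

```python
from functools import reduce

def reduce_join(sep, strings):
    """Hypothetical reduce-based join, for illustration only."""
    strings = list(strings)
    if not strings:
        return ""
    # Each step creates a brand-new string from the accumulated result,
    # so the total amount of copying can grow quadratically with the input.
    return reduce(lambda acc, s: acc + sep + s, strings)

parts = ["spam", "ham", "eggs"]
print(reduce_join(", ", parts))  # spam, ham, eggs
```

The result matches ", ".join(parts); the difference is only in how much intermediate work is done.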
max and min don’t have the oddity that a single string would behave weirdly if treated as an iterable.
def either_or_join(sep, *strings):
    if len(strings) != 1: return sep.join(strings)
    return sep.join(strings[0])  # a single argument is treated as an iterable
print(either_or_join("/", "spam", "ham"))    # spam/ham
print(either_or_join("/", ["spam", "ham"]))  # spam/ham
print(either_or_join("/", ["spam"]))         # spam
print(either_or_join("/", "spam"))           # s/p/a/m
So it kinda has to demand an iterable.
Joining an array of arbitrary values using a string separator is, equally obviously, a fundamental method of arrays. Many languages work this way. Or they allow you to multiply arrays by strings to join them. Or they have a global function to join collections of strings.
Note that you don’t split on the separator, you split on the collection. This puts things a bit backwards in Python:
"spam ham foo bar fum".split(" ")
" ".join(['spam', 'ham', 'foo', 'bar', 'fum'])
It’s not nearly as fundamental as you might think. It can definitely go either way.
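For reference, a short sketch of the asymmetry in the pair above: split is called on the text being split, while join is called on the separator, yet the two round-trip cleanly (variable names are only for illustration):

```python
text = "spam ham foo bar fum"

pieces = text.split(" ")     # method of the string being split
rejoined = " ".join(pieces)  # method of the separator

print(pieces)            # ['spam', 'ham', 'foo', 'bar', 'fum']
print(rejoined == text)  # True
```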
Only if you ignore the rest of the reasoning I attempted to lay out in my post. The question is not whether we’re dealing with strings, it’s which strings are the “semantic subject” of the action and which string is the “semantic direct object” (or even just a “semantic parameter”) of the action.
When considering what methods to add to a class, the relevant question is “what things can objects of this class naturally do?”, or “what natural transformations of this object will commonly be required”. This informs how the methods will be used in the wild. People will of course differ on what “natural” means, but it is hard to argue that ";".join(["arg1", "arg2"]) is a natural transformation of a semicolon, while it is easy to argue that ["arg1", "arg2"].join("\n") is a natural transformation of a string list.
Exhibit A - how many times have we all seen code like this, needlessly breaking flow for the join?
x = arr.reverse() # or whatever method, or method chain
return "\n".join(x) # or "\n".join(arr.reverse())
In even the most compact form, the flow starts in the middle at arr
, proceeds to the right, then jumps all the way back to the left to find the "\n"
object, then goes right again to call join()
with the result of reverse()
that’s been squirreled away.
It seems obvious to me to want something like the following instead, where the flow of operation starts at the left and continues to the right, transforming the subject of operation as it goes:
return arr.reverse().join("\n")
This is obviously not just my own opinion, it’s deemed a “highly active question” on Stack Overflow: python - Why is it string.join(list) instead of list.join(string)? - Stack Overflow. This isn’t just because people are new to programming and need their expectations changed, it’s usually because people aren’t new to programming and are having their expectations violated.
Aside: in the python-dev thread linked to in one of those answers, there are several desirable outcomes that are inconsistent with the choice of making join a method of the separator:
- The same should happen as for L"foo" + " " + L"bar" [Guido] (this would imply being a method of L"foo", not of " ")
- I think it should do … the moral equivalent of a reduce(operator.add, ...) [David Ascher]
etc. I think the length of the discussion alone makes it clear that it’s not “obvious” what the functionality should be. Even the “decisive” message calls it “Funny, but it does seem right” [Guido].
That is an indication of a problem with the collection hierarchy (or rather, its lack of one), which has many manifestations all over Python land. The same argument can be made for any desired method that should operate on arbitrary iterable collections, no matter what it does. Perhaps this is the reason we are stuck with "\n".join()
, but its awkwardness is an indication of a problem the language is currently forced into.
That wouldn’t change anything about the reasoning. [x, y, z].strjoin("\n")
is no more difficult than arr.strjoin("\n")
if you don’t already have a list. It exhibits the same improvement over "\n".join(x, y, z)
or "\n".join([x, y, z])
IMO.
That would have been a very poor design decision, so it’s irrelevant. The general case of joining paths is not simply a string join with separators. (Side note - I am the author of the 19-year-old module Path::Class, so I am very familiar with the semantics of constructing path strings.) We can be glad the standard library does not conflate these operations.
Not really true. For the specific case of strings, the work has already been done (and does not need to be particularly clever, it’s a simple buffered string concatenation), and could of course be preserved. For the general case, it is not the general method’s job to deal with allocation complexity, just as it’s not reduce()
's job.
In any case, I can see that a lot of discussion bytes have been offered already over many years about this design decision, so I don’t necessarily need to continue here, but I do feel like it’s worth pointing out the persistent warts in a language every once in a while, even if some people don’t consider them warts at all or have become inured to them over time. I have really admired Python’s PEP process for considering and adopting language changes in a careful way, I think that’s a huge part of why it has become and remained successful.
Just to be clear, my problem with '\n'.join()
style is purely visual. I personally find it difficult to scan such code. It’s not an editor thing - I have the same difficulty with other editors, web-based code displays, etc.
I think I see what you mean - even though it’s syntactically and semantically “obvious” that literals should allow method calls, it’s not common to see them in practice because usually you could inline the result instead of calling the method at all.
Example: you’ll rarely see things like "foo".capitalize()
, because you could just as easily have written "Foo"
. So we condition ourselves over time to see things like that as code smells, and suddenly "\n".join()
starts to look bad.
Is that kind of the idea?
Yes, I think that contributes to my perception, but it’s also literally visual. Like how my eyes scan code on a screen. It’s too visually noisy for me. I’ve had this reaction since str.join
was introduced, although I do like the fact that the join method comes from the separator. The Tim Peters Solution resonated with me immediately when he suggested it, and I’ve adopted it ever since.
Hmm.
" ".split("spam ham foo bar fum")
looks pretty nice to me. And split and join are both str functions.
Wonder why it was done differently? Of course you might make the
argument that the first string is the primary operand, and the parameter
is the secondary operand.
Yeah, and that’s cool. str.join
isn’t ever going anywhere and if you like calling it on a literal, Bob’s your uncle!
For those that don’t like calling a method on a literal such as "\n".join(my_list)
, you can also call the unbound method directly such as str.join("\n", my_list)
if that reads better to you.
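A quick check that the two spellings are equivalent (my_list here is just sample data):

```python
my_list = ["spam", "ham", "eggs"]

via_literal = "\n".join(my_list)
via_class = str.join("\n", my_list)  # same method, looked up on the class

print(via_literal == via_class)  # True
```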
That’s fine, but it breaks down if you want to split on arbitrary whitespace, which would need to be spelled None.split("spam ham foo bar fum")
.
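For context, this is the behaviour being referred to: calling split with no separator (or with None) splits on runs of any whitespace, which is different from splitting on a single space. The sample text is invented for the sketch:

```python
text = "spam  ham\tfoo\nbar"

on_whitespace = text.split()       # sep defaults to None
on_none = text.split(None)         # explicit None behaves the same way
on_single_space = text.split(" ")  # splits only on individual spaces

print(on_whitespace)    # ['spam', 'ham', 'foo', 'bar']
print(on_single_space)  # ['spam', '', 'ham\tfoo\nbar']
```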
Non-commutative operators always offer a choice of which order you write the operands. In native language, we can swap the order of even non-commutative operators like subtraction:
- “two dozen less three”
- “three fewer than two dozen”
But we only have one operator for it, x - y
, x subtract y. We don’t have an operator for “y subtracted from x” where the number being subtracted (the subtrahend) is on the left and the number being subtracted from (the minuend) is on the right. It is a historical accident that we write minuend - subtrahend
instead of subtrahend - minuend
.
The same applies for binary operations written as functions or method calls. There’s no overwhelming reason to prefer haystack.find(needle)
over needle.findin(haystack)
but we have one and not the other.
In the case of the in
operator, we even flip the order of operands:
spam in eggs --> eggs.__contains__(spam)
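A minimal sketch of that flip, using a hypothetical Basket class that records what it was asked about. The left-hand operand of in arrives as the argument, and the right-hand operand is the object whose method runs:

```python
class Basket:
    """Hypothetical container, used only to show the dispatch."""

    def __init__(self, items):
        self._items = list(items)
        self.asked_about = []

    def __contains__(self, item):
        # Called as eggs.__contains__(spam) for the expression `spam in eggs`.
        self.asked_about.append(item)
        return item in self._items

eggs = Basket(["spam", "ham"])
print("spam" in eggs)    # True
print(eggs.asked_about)  # ['spam']
```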
Sometimes we just have to accept that the choice of order is arbitrary. Why did mathematicians put the superscripted exponent to the right instead of the left? x² instead of ²x? You would have to ask Rene Descartes.
I didn’t ignore your reasoning, I found it unconvincing and irrelevant. Such as this argument here:
The question is not whether we’re dealing with strings, it’s which strings are the “semantic subject” of the action and which string is the “semantic direct object” (or even just a “semantic parameter”) of the action.
Okay, that’s a reasonable question to ask if we were debating whether to write sep.split(text)
or text.split(sep)
, and then we could get into a long and tedious argument as to whether the separator is the subject or the object of the action.
But first we really should debate whether the categories of object and subject actually apply to binary operations like split
, and if so, which should be the left-hand operand and which the right-hand operand.
In any case, we’re not debating split. We’re debating whether join should be a method on (pick one):
- strings and bytes;
- or lists, tuples, dicts, dict views, sets, frozen sets, iterators, arrays, other sequences, mappings and iterables of all kinds. Including strings and bytes.
Making it just a method on list
alone is not acceptable, because we have the need to join substrings in arbitrary iterables, and requiring the user to convert to a list first is wasteful of time, memory and effort.
Later in your response, you raised this objection:
That is an indication of a problem with the collection hierarchy (or rather, its lack of one), which has many manifestations all over Python land.
shrug
That may be the case, but consider that Python is not Java, and long inheritance chains hurt performance and go against duck typing.
If we’re going to start debating the basic execution model of Python, we’ll be here for months. I take it as a given that:
- Iterable is a protocol, not a subclass relationship.
- Long chains of superclass lookups may hurt performance. There is no compile-time resolution of methods.
Consequently, in the Python we have (not the Python we might want), making all iterables joinable as methods on the iterables would require a lot of independent methods.
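A small sketch of what “protocol, not subclass relationship” means in practice. Countdown is a hypothetical class that inherits from nothing but object, yet it counts as iterable and can be joined, because it defines __iter__:

```python
from collections.abc import Iterable

class Countdown:
    """Hypothetical class: defines __iter__ but inherits only from object."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Yield the numbers start, start-1, ..., 1 as strings.
        return (str(n) for n in range(self.start, 0, -1))

print(Countdown.__bases__)                 # (<class 'object'>,)
print(isinstance(Countdown(3), Iterable))  # True
print("-".join(Countdown(3)))              # 3-2-1
```

A join method living on the iterable side would have to be written separately for every class like this one.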
If Python were different, my answer to the question might also be different. (Or not.)
Exhibit A - how many times have we all seen code like this, needlessly breaking flow for the join?
x = arr.reverse() # or whatever method, or method chain
return "\n".join(x) # or "\n".join(arr.reverse())
This is an excellent example of an unconvincing argument. You are aware, of course, that list.reverse returns None, so that code cannot work. Making join a list method will still fail: arr.reverse().join('\n') raises AttributeError (None has no join method), as I am sure you know.
So how is this example an argument for list.join
? Answer: it isn’t, it is irrelevant. The problem you have identified (Exhibit A) isn’t caused by join being a method on strings, and would not be fixed by making it a method on lists instead.
If the list.reverse
method returned the list, then we could write "\n".join(arr.reverse())
and it is still not an argument for list.join
. In fact, we can do that right now if you don’t mind some extra memory consumption, just use reversed()
or slicing arr[::-1]
.
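Spelled out as a short sketch (arr is sample data): both alternatives work with str.join as it is today, while list.reverse itself reverses in place and returns None.

```python
arr = ["spam", "ham", "eggs"]

joined_reversed = "\n".join(reversed(arr))  # reversed() returns an iterator
joined_sliced = "\n".join(arr[::-1])        # slicing builds a reversed copy

copy_of_arr = list(arr)
in_place_result = copy_of_arr.reverse()     # reverses in place, returns None

print(joined_reversed == joined_sliced)  # True
print(in_place_result)                   # None
```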
Whichever way you look at it, your exhibit A is irrelevant to the debate about making join a list method.
It is, however, relevant to a bigger debate over postfix vs prefix vs infix notation, and the advantages and disadvantages of pipelining syntax versus function call syntax. A fascinating debate that would be (I personally love pipelining syntax), but so much bigger than the question of whether join should be a method on lists.
In even the most compact form, the flow starts in the middle at arr, proceeds to the right, then jumps all the way back to the left …
Yes, infix and function call notation do that. That’s a good argument for postfix or prefix notation. It’s not a very good argument for list.join as soon as you do something with the resulting string, like print it:
print(len(arr.reverse().join(sep)) + 1)
So long as we use function call notation and infix operators, execution is going to jump around all over the line. What’s one method more or less?
it’s also literally visual. Like how my eyes scan code on a screen. It’s too visually noisy for me.
Visual idiosyncrasies are idiosyncratic
How do you go with expressions like this?
func("\n").method()
If that’s fine, perhaps it’s not the noisiness, and maybe you could separate the dot from the quote by using the same trick: put brackets (parens) around the literal.
("\n").join(lines)
Visual idiosyncrasies are idiosyncratic
For sure! TOOWTDIEWTI
Your func()
example looks fine to me, except that I would use single quotes instead of double quotes [1].
Ultimately this is a personal preference. I like the Tim Peters Style for str.join
; it’s what I use in my code, and it’s what I advocate for in code I review. But hey, you do you!
[1] which is why I prefer blue over black, but let’s not get into that debate!
This is an excellent example of an unconvincing argument.
I don’t need to engage in this discussion anymore if this is the way you’re going to go about it.