Allow str.join to take *args in addition to iterable (like min/max)

nedbat · January 26, 2020, 4:55pm

There is reference-counting sorcery in unicodeobject.c also:

python/cpython/blob/main/Objects/unicodeobject.c#L11440-L11455


      
        /* Concat the two Unicode strings */
        res = PyUnicode_New(new_len, maxchar);
        if (res == NULL)
            goto error;
        _PyUnicode_FastCopyCharacters(res, 0, left, 0, left_len);
        _PyUnicode_FastCopyCharacters(res, left_len, right, 0, right_len);
        Py_DECREF(left);
        *p_left = res;
    }
    assert(_PyUnicode_CheckConsistency(*p_left, 1));
    return;

error:
    Py_CLEAR(*p_left);
}

brandtbucher · January 26, 2020, 5:15pm

There is reference-counting sorcery in unicodeobject.c also:

Well, sure, PyUnicode_Append lives there… But this is basically just the implementation of unicode_concatenate, right?

I don’t see that it is actually used anywhere in unicodeobject.c for string operations or APIs (but it does look like it’s used as a shortcut in the compiler and a handful of stdlib modules).

ruud · January 26, 2020, 5:48pm

I am still very puzzled as to why sum rejects a string as the start value. That is the ultimate example of not doing the one thing that all users would understand. Nothing would have broken if, behind the curtains, summing strings had been implemented differently. The only problem I can see is a class that inherits from str and defines its own __add__ method. But that could be detected by checking whether the __add__ of start is the __add__ method of str. Like (pseudo code):

def sum(iterable, start=0):
    # Fast path only when start uses str's own __add__ (no subclass override).
    if isinstance(start, str) and type(start).__add__ is str.__add__:
        return start + "".join(iterable)
    return orgsum(iterable, start)  # orgsum: the current sum, minus the TypeError for str
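
For anyone who wants to try the idea, here is a minimal runnable sketch of the pseudo code above. The names sum_with_str and Shout are hypothetical, and the plain loop stands in for orgsum; the real sum is implemented in C and may differ in detail. Note that the override check has to compare methods on the type, because a bound method such as start.__add__ never compares equal to str.__add__.

```python
def sum_with_str(iterable, start=0):
    """Hypothetical sum variant that accepts a str start value."""
    # Fast path: start is a str (or a subclass that keeps str's __add__).
    if isinstance(start, str) and type(start).__add__ is str.__add__:
        return start + "".join(iterable)
    # Fallback: plain left-to-right addition, standing in for "orgsum".
    result = start
    for item in iterable:
        result = result + item
    return result


class Shout(str):
    """Hypothetical str subclass that overrides __add__."""

    def __add__(self, other):
        return Shout(str.__add__(self, other).upper())


plain = sum_with_str(["a", "b", "c"], "")       # takes the join fast path
numbers = sum_with_str([1, 2, 3])               # ordinary numeric sum
custom = sum_with_str(["a", "b"], Shout("x"))   # honours Shout.__add__
```

With a Shout start value the fast path is skipped, so the subclass's own addition is still applied to every element.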

I think it’s time to rethink this, as the developers (@guido and @rhettinger ?) seem to have overlooked this possibility.

brandtbucher · January 26, 2020, 6:10pm

I am still very puzzled as to why sum rejects a string as the start value. That is the ultimate example of not doing the one thing that all users would understand. Nothing would have broken if, behind the curtains, summing strings had been implemented differently.

The point I was getting at is that there’s already a correct way to do this that is efficient without lying to the user. If there wasn’t another simple way, then this would be a very different discussion.
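
As a small illustration of the two behaviours being discussed (the list parts is just made-up sample data): the built-in sum refuses a str start value, while str.join does the concatenation directly.

```python
parts = ["spam", "eggs", "ham"]  # sample data for illustration only

error_message = None
try:
    # The built-in sum raises TypeError when start is a str.
    sum(parts, "")
except TypeError as exc:
    error_message = str(exc)

# The recommended spelling: join the pieces with an empty separator.
joined = "".join(parts)
```

The text of the TypeError itself points the user at ''.join, which fits the educational motivation mentioned in this thread.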

For what it’s worth, I too think the TypeError is a bit heavy-handed, especially considering how nice the interpreter is regarding string addition in general. But I do understand that the motivation was probably educational rather than technical, and in general I don’t think it’s a bad thing.

This is an unusual case where the Right Way and the Obvious Way are not necessarily the same thing. So new users make the mistake once, learn the correct way, maybe Google why it’s better, and move on to greater things.

ruud · January 26, 2020, 7:01pm

The Zen of Python says:
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Well, that obvious way is to use sum to concatenate a list of strings. That might not be obvious at first, but (like @guido) I am Dutch.
I would like to propose a PEP to change the behaviour of sum for strings. What would be the best way to proceed? Anyone who is familiar with this process, please advise.

brandtbucher · January 26, 2020, 7:09pm

There should be one-- and preferably only one --obvious way to do it.

I feel that this is easily the most misunderstood line of the Zen. Nobody is arguing that sum isn’t obvious, or that str.join is. It’s been pointed out several times, though, that the right way (which the Zen says nothing about), in this case, is different. That’s probably why the author of sum felt justified in raising an error here.

I would like to propose a PEP to change the behaviour of sum for strings. What would be the best way to proceed? Anyone who is familiar with this process, please advise.

I am familiar with the process. The first steps are gathering feedback and finding at least one core developer who agrees with you.

aeros · January 26, 2020, 10:05pm

Adding to this, you can also consider starting a new thread on python-ideas to reach a wider audience. Just make sure that you try to address all of the main points that have been previously brought up against it (including those in this topic).