String joining design

One can use a \N{} named escape if that’s easier to read, parse or remember. They’re based on the Unicode character names and aliases. For example:

>>> '\N{COMMA}\N{SPACE}'
', '
>>> '\n' == '\N{NEW LINE}' == '\N{LINE FEED}' == '\N{END OF LINE}'
True
>>> '\n' == '\N{NL}' == '\N{LF}' == '\N{EOL}'
True

Hey, this is great! '\N{NL}'.join is even longer than str.NL.join for
those that like verbosity, but it also has the definite advantage of
using standardized names that would be hard to fake or override.

I think it’s notable that I can write ' '.join(mylist) but not 10.bit_length() – but I can write (10).bit_length()! :face_with_spiral_eyes:
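For anyone puzzled by that, here is a small sketch of what seems to be going on: the tokenizer reads `10.` as the start of a float literal, so the attribute access is never seen. Using `ast.parse` is just my way of showing the syntax error without crashing the script.

```python
import ast

# "10.bit_length()" does not parse: "10." is taken as the start of a float.
try:
    ast.parse("10.bit_length()")
    parsed = True
except SyntaxError:
    parsed = False

print(parsed)              # False
print((10).bit_length())   # 4  (parentheses end the integer literal)
print(10 .bit_length())    # 4  (so does a space)
print(10.0.is_integer())   # True (a float literal followed by a method call)
```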

Jumping to discussion of constants sort of skips an important point: join being a named method of str objects wasn’t the only way that string-joining could have been implemented. Just to toss out some ideas:

>>> join = str.join
>>> join(", ", "abc")
'a, b, c'
>>> class MyStr(str):
...     def __pow__(self, other):
...         return self.join(other)
...
>>> s = MyStr(",")
>>> s ** "abc"
'a,b,c'

Perhaps the history of %-formatting makes us wary of using an infix operator, but isn’t str.join part of the class of common and “fundamental” operations which might deserve to be operator-ized?


On the subject of defining constants and putting them somewhere, I would strongly prefer that they be in string, not attributes of str.

string already contains several useful constants, so it seems like a reasonable place to add more. Furthermore, to a previous point, this mirrors math having constants and parallel structures are good for comprehension.
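To make the parallel concrete, here is a quick look at what the two modules offer today. Note that `string.NEWLINE` is only the name proposed in this thread; it does not exist.

```python
import math
import string

# Constants that already live in the string module.
print(string.digits)            # 0123456789
print(string.ascii_lowercase)   # abcdefghijklmnopqrstuvwxyz
print(repr(string.whitespace))  # ' \t\n\r\x0b\x0c'

# The math module follows the same pattern.
print(math.pi > 3.14)  # True

# string.NEWLINE is hypothetical; it is not defined today.
print(hasattr(string, "NEWLINE"))  # False
```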

These constants are almost uniformly more verbose than the literals. So I don’t think verbosity makes sense as a reason to reject import string. If str.NEWLINE.join(...) is better than "\n".join(...), then doesn’t the same argument apply to import string; string.NEWLINE.join(...)?

I’ll also say that I am somewhat against shorthand names like NL. IMO, string.NEWLINE is self-evident. string.NL is simple once you know what it means.


Yes! Although, be careful: not every Python supports the same \N escapes. So by doing this, you’ll be stopping your code from running at all on MicroPython, and different versions of Python will have different lists of supported names. Names like “NEW LINE” will probably work on pretty much any CPython (even CPython 2.7), but errored out on PyPy 2.7.18/7.3.3 (it recognized the \N syntax but didn’t know the name “NEW LINE”), and I couldn’t even get Jython to recognize that the line was complete. I haven’t run into problems with the “\N{NL}” alias on any of the Python 3 implementations I have installed (including PyPy3 3.7.10/7.3.5), but you’d have to check other character aliases to see whether they got added.

In any case: names and aliases should never be removed, only added, but this is another compatibility factor to consider.

Support for aliases was added in Python 3.3, so none of the newline alias examples that I gave works in a 2.7 u"" string literal. I only gave those examples to demonstrate alias support. The main reason I posted was to suggest a flexible, compile-time alternative to avoid “collections of little marks”, such as using "\N{COMMA}\N{SPACE}" instead of ", ".
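Since \N{...} escapes are resolved at compile time, an unknown name stops the whole file from compiling. If that worries you, one possible workaround is to resolve the names at run time with unicodedata.lookup, which has accepted aliases since 3.3. This is only a sketch: `resolve` is a helper name I made up, and it assumes the implementation ships unicodedata at all.

```python
import unicodedata

def resolve(name, fallback):
    """Return the character for a Unicode name or alias, else `fallback`."""
    try:
        return unicodedata.lookup(name)
    except KeyError:  # lookup() raises KeyError for unknown names
        return fallback

NL = resolve("NEW LINE", "\n")
SEP = resolve("COMMA", ",") + resolve("SPACE", " ")

print(SEP.join(["a", "b", "c"]))  # a, b, c
print(NL == "\n")                 # True
```

You lose the compile-time property, of course, so this is a trade-off rather than a straight improvement.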

Ah, that explains it - I don’t have any Python 3.2 installed here any more.

Yes, definitely. If people are bothered by “\n”, then “\N{NEW LINE}” is there to help. Very few downsides compared to str.NEWLINE, in my opinion.

The part that “bothers” (maybe irks is the better term) me about "\n".join(...) is that you are accessing a member/method of a literal. It looks weird even though I’m used to it and not actually bothered by it.

Calling a method on a literal/display doesn’t bother me. It’s an object like any other. The part that bothers me a bit is thinking of the join() constructor as a method of the separator. I would have preferred a class method, str.join(iterable, sep=" "). But the die is cast.
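A minimal sketch of that spelling as a plain function, assuming the signature given above (iterable first, separator as a keyword with a space default). It only illustrates the shape of the call; it is not an existing API.

```python
def join(iterable, sep=" "):
    """Hypothetical spelling: items first, separator second."""
    return sep.join(iterable)

print(join(["a", "b", "c"]))            # a b c
print(join(["a", "b", "c"], sep=", "))  # a, b, c
```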


Am I the only one who thinks that April 1st is too early this year?


I don’t think it will necessarily map well to Python, but in Rust iterators have a join method, e.g.:

some_vector.iter().join("\n")

“Die is cast” - indeed. We even had join in the string module that took this syntax, but that went away with Python 3.

On the consistency front, it’s also fun that str.join takes an iterable, but os.path.join uses unpacking so it doesn’t. Makes me stop and think now and then. :slight_smile:
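Side by side, the difference looks like this. I used posixpath (the POSIX flavour of os.path) so the output is the same on every platform.

```python
import posixpath  # POSIX flavour of os.path, for platform-independent output

parts = ["usr", "local", "bin"]

# str.join takes one iterable.
print("/".join(parts))         # usr/local/bin

# os.path.join takes separate arguments, so the list must be unpacked.
print(posixpath.join(*parts))  # usr/local/bin

# Passing the list itself is rejected.
try:
    posixpath.join(parts)
    accepted_list = True
except TypeError:
    accepted_list = False
print(accepted_list)           # False
```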


Not a lot, perhaps, but every bit counts.

Space is not such a simple concept. For example, Emacs regular expressions have a [:space:] character class. Its meaning is logical, but not entirely obvious. If I’d never heard of str.SPACE before, I could easily imagine it being a one-of-a-kind object with its own join implementation that did something special with whitespace.

I see ' '.join as the one obvious way to do it. Adding another way to do the same will split the community stylistically.


More importantly, it would make a lot of sense if str.split(str.SPACE) split on whitespace, and this will potentially cause confusion. But it doesn’t say WHITESPACE, it says SPACE, so hopefully not TOO much confusion. Personally, I’m not seeing a lot of point in having this, though.

str.NEWLINE has a similar problem. Does it mean \n or does it mean something more akin to “universal newlines”? Again, the problems aren’t so much when using this with join, but they become much more visible elsewhere - split, and things like str.NEWLINE in s.
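To illustrate where the ambiguity bites, here is how the literal characters behave today compared with the "whitespace" and "universal newlines" readings:

```python
text = "one  two\tthree"
print(text.split(" "))     # ['one', '', 'two\tthree']  (literal space only)
print(text.split())        # ['one', 'two', 'three']    (any run of whitespace)

lines = "a\r\nb\nc"
print(lines.split("\n"))   # ['a\r', 'b', 'c']  (literal \n only)
print(lines.splitlines())  # ['a', 'b', 'c']    (all line boundaries)

print("\n" in "a\rb")      # False (a bare \r is not \n)
```

A reader who sees str.SPACE or str.NEWLINE has to guess which of these two behaviours is meant.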

I agree that I don’t see enough benefit. For people who want to use named values for join for stylistic reasons, just declare a global constant and leave it at that.
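Something like this, with whatever names suit your project (these are just examples):

```python
# Project-level names; purely a stylistic choice.
NEWLINE = "\n"
COMMA_SPACE = ", "

print(COMMA_SPACE.join(["a", "b", "c"]))    # a, b, c
print(NEWLINE.join(["x", "y"]) == "x\ny")   # True
```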


This was discussed when methods from the string module were being added to the str object but was ultimately rejected.

I’m pretty sure [1] that string methods predate “iterators”.


  1. but not 100% and it’s not important enough to look up

Oops I misread that as iterable instead of iterator. Because I think I remember reading people wanting to put it on the list object but I might be wrong.

When I read that, I thought this conversation would go in a completely different direction. My problem with "\n".join(arr) isn’t that \n is some mysterious thing that needs an alias - anyone who’s been programming for more than a week in almost any language probably knows what \n is. And calling it NL probably only confuses things more - “where did this variable come from?” - “what actual value does it have?” etc.

Rather, the dislike I’ve always had for "\n".join(arr) is that it uses a string method rather than a list method. Joining a sequence together with a separator is a fundamental property of lists/arrays, much more than it is of a single string. So I’d expect, if I didn’t know better, to see arr.join("\n") instead. That also allows for a more natural default when no separator is desired: arr.join() (instead of the even more unsightly "".join(arr), which puts the empty string as the protagonist of the story, even though no empty string is even necessary to carry out the operation).
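Built-in list can't grow new methods, but a subclass gives a feel for the spelling. `JoinableList` is a made-up name for this sketch, and it simply delegates to str.join.

```python
class JoinableList(list):
    """Hypothetical list subclass; built-in list has no join method."""

    def join(self, sep=""):
        # Delegate to str.join, so elements must already be strings.
        return sep.join(self)

arr = JoinableList(["a", "b", "c"])
print(arr.join("\n") == "a\nb\nc")  # True
print(arr.join())                   # abc
```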

Furthermore, it also suggests further development in a couple directions - one that coerces the elements to strings using str() (call it arr.strjoin() or something), and one that does whatever + does on the elements of the sequence (call it arr.add() or arr.sum() or something). If the elements are strings, they act the same; but lots of other things with + semantics (chief among them: numbers) could benefit from this too. For example, widths.sum(padding) is easier on the eyes than sum(widths) + padding*(len(widths) - 1), especially if widths is empty.

Implementation would be pretty easy, it’s just a left-associative list reduce() using operator.add as the function, with some extra optional interstitial elements.
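Roughly like this, as a free function rather than a list method. The name `add_join` and the `empty` parameter are my own inventions for the sketch; the empty case needs some answer because there is no first element to start the fold from.

```python
from functools import reduce
import operator

_MISSING = object()  # sentinel so that None stays usable as a real value

def add_join(items, sep=_MISSING, empty=_MISSING):
    """Left-fold `items` with +, inserting `sep` between neighbours."""
    it = iter(items)
    try:
        first = next(it)
    except StopIteration:
        if empty is _MISSING:
            raise ValueError("add_join() of an empty iterable needs 'empty'")
        return empty
    if sep is _MISSING:
        return reduce(operator.add, it, first)
    # Left-associative: ((first + sep) + x1) + sep) + x2 ...
    return reduce(lambda acc, x: acc + sep + x, it, first)

print(add_join(["a", "b", "c"], ", "))  # a, b, c
print(add_join([3, 4, 5], 1))           # 14 (widths plus padding between them)
print(add_join([[1], [2]], [0]))        # [1, 0, 2]
print(add_join([], 1, empty=0))         # 0
```

For comparison, sum(widths) + padding*(len(widths) - 1) gives -1 for an empty widths with padding 1, which is the wart mentioned above.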

Anyway, my point with that exposition is that I think having string-joining-with-separator as a method of the separator, rather than of the list, has missed what could have been a useful API direction - but certainly it’s never too late to pursue it again if desired.

Was there any talk at that time of making it a method of lists or other iterables?

I’ve always assumed that this was one of the “may not be obvious unless you’re Dutch” situations.

Yes, there was, and a lot of talk! I recall Guido decided on the current str.join with logical justification, but I do not recall the details. Is there an archive of python-dev from that time?