Why not make str.join() coerce the items in its iterables

I like the concat() name, but I can see that strcat() makes it easy to have a bytescat() as well.
I’m not sure how to spell the bytes version of concat(); maybe bytesconcat()?

I would use bytescat() a lot in network programming.

Is it possible to add a built-in without causing all kinds of collisions?

If you didn’t want two functions, you could make it work on bytes and strings:

from collections.abc import Iterable
from typing import Any, overload

@overload
def concat(components: Iterable[Any], sep: str) -> str: ...

@overload
def concat(components: Iterable[Any], sep: bytes) -> bytes: ...

I also like @vovavili’s flag idea below, since switching on the separator type is weird. So maybe add a coercion_type: type[T] = str parameter, and it would work with str, bytes, or any subclass?

I believe the desire behind adding such a function was to have something that took Iterable[Any] and coerced inputs using str automatically to avoid the verbosity of sep.join([str(item) for item in items]). So overloading like this wouldn’t make sense for this use case. (Personally, I have no interest in adding such a function - but I do want to keep the non-coercing behaviour of the current .join method which the original proposal wanted to remove).
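
For reference, a small illustration of the verbosity being discussed and of the error the current .join raises on non-string items. The sample list is made up for illustration:

```python
items = [1, 3.0, "this", None]  # made-up sample data

# Current behaviour: str.join refuses anything that is not a str.
try:
    ", ".join(items)
except TypeError as exc:
    error = str(exc)  # keep the message so it can be inspected

# The explicit spelling that some people find verbose:
joined = ", ".join([str(item) for item in items])
print(joined)
```

The TypeError is exactly the behaviour that several posters want to keep, since it catches stray None values early.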

If such a function were added for strings, I think it would possibly fit better inside the string module in stdlib (alongside capwords) rather than being a builtin though.

Is having two functions that basically do the same thing really more elegant than having one function with an optional flag?

Right, I’ll edit my post.

Depends on the API details. It could be elegant or very ugly or error prone to use. How would concat know to use str vs bytes?

What would the one for bytes do? Maybe:

In [17]: def byteconcat(joiner, iterable):
    ...:     return joiner.join(bytes(ascii(obj), 'ascii') for obj in iterable)
    ...: 

In [18]: byteconcat(b',', [1, 3.0, 'this'])
Out[18]: b"1,3.0,'this'"

You would only call ascii() if the object is not bytes already.
Also, encoding to latin-1 might be required in an HTTP environment.
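
A minimal sketch of the "only call ascii() if the object is not bytes already" variant. The helper name and argument order are kept from the example above, and treating bytearray the same as bytes is an assumption:

```python
def byteconcat(joiner, iterable):
    """Join items as bytes, converting only the non-bytes ones."""
    def to_bytes(obj):
        # Assumption: bytes and bytearray pass through unchanged.
        if isinstance(obj, (bytes, bytearray)):
            return bytes(obj)
        # Everything else goes through ascii(), as in the example above.
        return ascii(obj).encode("ascii")

    return joiner.join(to_bytes(obj) for obj in iterable)


print(byteconcat(b",", [1, 3.0, "this", b"raw"]))
```

Note that str items still pick up quotes from ascii(), while bytes items are inserted as-is.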

Also you have changed the order of args.

With sep=b' ' and end=b''.

Please don’t. HTTP supports UTF-8, and the W3C recommends it.

And neither is necessary on the string returned by ascii(), as it’s guaranteed to be ASCII-only.
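
A quick check of that guarantee, using a made-up sample string: ascii() escapes every non-ASCII character, so the result always encodes cleanly with the ascii codec.

```python
text = "café"          # made-up sample containing a non-ASCII character
escaped = ascii(text)  # repr-style output with non-ASCII characters escaped

print(escaped)
print(escaped.isascii())
encoded = escaped.encode("ascii")  # cannot raise UnicodeEncodeError here
```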

If there will be a function for concatenating bytes, it should be called catbytes. Please.

Why catbytes? The text version isn’t catstr. Is there some joke/pun here that I’m not seeing?

Well, yes catbytes, when it is strcat. :laughing:

Yeah, I don’t get it.

Don’t worry. It is a dumb joke and the explanation will make it dumber:

catbytes: “cat bites”
strcat: “street cat”

Ah okay, I didn’t recognize the second one, which is why I missed the joke. Thanks for explaining - I’m the sort of person who’d rather KNOW how awful a pun is than be confused by it :slight_smile:

Which is why I’m confused – this started with the idea of “automatically coercing” to str – but with bytes, there’s no easy or obvious way to coerce to bytes. And if everything is bytes already, then all this does is add an optional end – maybe worthwhile, but pretty minimal functionality.

To be fair, I hardly ever work with bytes, so maybe I’m missing common use-cases.
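
To make the "no obvious way to coerce to bytes" point concrete, here is how the existing bytes() constructor treats a few common inputs. This only demonstrates current built-in behaviour, nothing proposed:

```python
# An int is read as a length, not as a number to render.
zeros = bytes(3)

# A str needs an explicit encoding, otherwise it is a TypeError.
try:
    bytes("abc")
except TypeError:
    str_needs_encoding = True

# A list of ints is read as raw byte values.
raw = bytes([65, 66])

# Going the other way, str() on bytes just gives the repr.
as_text = str(b"abc")

print(zeros, raw, as_text)
```

So bytes cannot play the role for a bytes version that str plays as the default coercion for text.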

I want this for HTTP headers, not the content. Content is opaque data in a lot of the code I work with.

In headers, iso-8859-1 is required for talking to servers in the wild.
See RFC 9110: HTTP Semantics
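
A minimal sketch of that header use case, assuming header names are ASCII and values are coerced with str() and encoded as iso-8859-1. header_line is a hypothetical helper, not an existing API:

```python
def header_line(name, value):
    """Build one HTTP header line as bytes (hypothetical helper)."""
    # Assumption: names are ASCII tokens; values are coerced with str()
    # and encoded as iso-8859-1, as described above.
    parts = [name.encode("ascii"), b": ", str(value).encode("iso-8859-1"), b"\r\n"]
    return b"".join(parts)


print(header_line("Content-Length", 42))
print(header_line("X-Name", "café"))
```

A bytes-joining function with coercion would fold the str() and encode() steps into the join itself.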

Just some extra info, since the yak shaving is ongoing at full strength:
My main use case is indeed with integers and floats.

repr would work - and that is probably why some people found more mapping to repr than to str - but I think str is unambiguously the desired output for this:

As for bytes: I really do not care. This is a text-related thing, and the str for bytes, which is just their repr, is really not useful. A bytes version accepting strictly bytes and bytearrays would make sense.

My thoughts after reading through the thread:

  • The functionality definitely deserves to exist and be provided by default. It is very much in line with the usual approach to incrementally improving the builtins and standard library.
  • It would be bad to change the behaviour of my_string.join(things) at this point in the game, even if it’s to make it more permissive. People do rely on bad data causing an error in these situations (especially with None).
  • “There would be a complicated type signature” is a terrible reason to hold off on features. This is bending to the whims of people who fundamentally want Python not to be Python. It is still a dynamically-typed language, and manifest typing is especially obnoxious. Trying to add “generics”/“template types” support has made things even worse - Java already shows what happens when you try to bolt that onto a system that wasn’t designed for it from the beginning.
  • The string standard library is a hack that exists because the class implementation was underpowered. All that stuff belongs either in str or under a deprecation notice - that’s off topic for the current discussion, but let’s at least please not add more things to string this late in the game.
  • strcat is a terrible name - the standard library should be moving away from using C-based names for “recognizability”, not towards that. concat is not great; it’s already a shortened form of “concatenate”, which itself is a somewhat esoteric word. A lot of new programmers in 2023 just don’t have that kind of vocabulary. They should, however, at least understand the concept of an iterable, since that’s all over the Python documentation.
  • Fundamentally, converting elements of a sequence to string and concatenating them is a creational design pattern. We already have idioms for that.
  • This is fundamentally a textual operation, so I’m opposed to implementing anything analogous for bytes. bytes already contains a lot of misguided stuff aimed at incorrectly treating the type as textual (yes, yes, backwards compatibility, I know. I’m pretty dogmatic about this.) It could work in principle, maybe, but the default coercion rule isn’t at all obvious.

My proposal is to add a class method to str, along the lines of:

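_USE_CLS = object()  # sentinel: cls is not in scope in the signature

@classmethod
def from_iterable(cls, items, sep='', end='', coerce=_USE_CLS):
    if coerce is _USE_CLS:
        coerce = cls  # default: coerce each item with the class itself
    if not isinstance(sep, cls):
        raise TypeError(f"'sep' must be a '{cls.__name__}' instance")
    if not isinstance(end, cls):
        raise TypeError(f"'end' must be a '{cls.__name__}' instance")
    # coerce=None means "join the items as they are"
    to_join = items if coerce is None else map(coerce, items)
    return sep.join(to_join) + end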

Now there is default behaviour that doesn’t require the weird ''.join syntax, and does the desired coercion. The name is clear and explicit, and has precedent (in itertools.chain.from_iterable). Those who don’t want that coercion can pass coerce=None (and still gain some nice sugar for the case of adding an end suffix - analogous to what print supports). Or they can use repr, or custom coercion where appropriate (say, '{:0x}'.format). Everyone wins (I think?).
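
Since a method cannot be added to the built-in str from Python code, the idea can only be tried on a subclass. The sketch below uses a hypothetical Text class and a hypothetical sentinel default (cls is not available in the signature), and checks sep and end against str rather than cls so that plain literals are accepted:

```python
_USE_CLS = object()  # hypothetical sentinel meaning "coerce with cls"


class Text(str):
    """Hypothetical str subclass, used only to try out the idea."""

    @classmethod
    def from_iterable(cls, items, sep="", end="", coerce=_USE_CLS):
        if coerce is _USE_CLS:
            coerce = cls
        # Checked against str rather than cls, so plain string literals
        # are accepted by the subclass.
        if not isinstance(sep, str):
            raise TypeError("'sep' must be a 'str' instance")
        if not isinstance(end, str):
            raise TypeError("'end' must be a 'str' instance")
        # coerce=None means "join the items as they are".
        to_join = items if coerce is None else map(coerce, items)
        return cls(sep.join(to_join) + end)


print(Text.from_iterable([1, 2.5, None], sep=", "))
print(Text.from_iterable([10, 255], sep=":", coerce="{:02x}".format))
```

With coerce=None, non-string items still raise TypeError, so the strict behaviour of .join remains available.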
