I like the concat() name, but can see that strcat() means it's easy to have a bytescat(). Not sure how to spell the bytes version of concat(), maybe bytesconcat()?
I would use bytescat() a lot in network programming.
Is it possible to add a built-in without causing all kinds of collisions?
If you didn't want two functions, you could make it work on bytes and strings:
@overload
def concat(components: Iterable[Any], sep: str) -> str: ...
@overload
def concat(components: Iterable[Any], sep: bytes) -> bytes: ...
I also like @vovavili's flag idea below, since switching on the separator type is weird. So, maybe add a coercion_type: type[T] = str parameter, and it would work with any string, bytes, or other subclass?
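A rough sketch of how the coercion_type variant might behave, for illustration only. The function name, the `sep=None` default, and the pass-through rule for items that already have the target type are assumptions of this sketch, not part of any agreed proposal.

```python
from typing import Any, Iterable


def concat(components: Iterable[Any], sep=None, coercion_type: type = str):
    """Join components, coercing each one to coercion_type first (sketch)."""
    if sep is None:
        sep = coercion_type()  # empty str, bytes, or subclass instance
    if not isinstance(sep, coercion_type):
        raise TypeError(
            f"sep must be {coercion_type.__name__}, not {type(sep).__name__}"
        )
    # Items that already have the target type pass through unchanged;
    # everything else is converted by calling coercion_type on it.
    parts = (
        item if isinstance(item, coercion_type) else coercion_type(item)
        for item in components
    )
    return sep.join(parts)


print(concat([1, 2.5, "x"], ", "))                        # 1, 2.5, x
print(concat([b"a", b"b"], b"-", coercion_type=bytes))    # b'a-b'
```

One caveat with this design: calling bytes() on arbitrary objects is rarely what you want (bytes(3) is three zero bytes, and bytes("text") raises TypeError), so the bytes case only really works when the items are bytes already.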
I believe the desire behind adding such a function was to have something that took Iterable[Any] and coerced inputs using str automatically, to avoid the verbosity of sep.join([str(item) for item in items]). So overloading like this wouldn't make sense for this use case. (Personally, I have no interest in adding such a function - but I do want to keep the non-coercing behaviour of the current .join method, which the original proposal wanted to remove.)
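For reference, a short illustration of the two behaviours being contrasted: the explicit coercion that is needed today, and the TypeError that the non-coercing .join raises on mixed input. The sample data is made up.

```python
items = [1, 2.5, None, "text"]

# Today's explicit spelling: coerce every item with str before joining.
joined = ", ".join([str(item) for item in items])
print(joined)  # 1, 2.5, None, text

# map() is a slightly shorter equivalent.
assert joined == ", ".join(map(str, items))

# Without coercion, str.join rejects non-string items.
try:
    ", ".join(items)
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)
```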
If such a function were added for strings, I think it would possibly fit better inside the string module in the stdlib (alongside capwords) rather than being a builtin, though.
Is having two functions that basically do the same thing really more elegant than having one function with an optional flag?
Right, I'll edit my post.
Depends on the API details. It could be elegant, or it could be very ugly or error-prone to use. How would concat know to use str vs bytes?
What would the one for bytes do? Maybe:
In [17]: def byteconcat(joiner, iterable):
    ...:     return joiner.join(bytes(ascii(obj), 'ascii') for obj in iterable)
    ...:
In [18]: byteconcat(b',', [1, 3.0, 'this'])
Out[18]: b"1,3.0,'this'"
You would only call ascii() if the object is not bytes already.
Also, encoding to latin-1 might be required in an HTTP environment.
Also you have changed the order of args.
With sep=b" " and end=b"".
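Putting those suggestions together, a sketch might look like the following. The signature (iterable first, `sep` and `end` as keywords) and the inner `to_bytes` helper are assumptions made for illustration.

```python
def byteconcat(iterable, sep=b" ", end=b""):
    """Join items as bytes; non-bytes items go through ascii() first (sketch)."""

    def to_bytes(obj):
        if isinstance(obj, (bytes, bytearray)):
            return bytes(obj)  # already bytes: keep the raw data untouched
        # ascii() returns an ASCII-only str, so this encode cannot fail.
        return ascii(obj).encode("ascii")

    return sep.join(to_bytes(obj) for obj in iterable) + end


print(byteconcat([1, 3.0, b"this"], sep=b","))   # b'1,3.0,this'
print(byteconcat([b"GET", b"/", b"HTTP/1.1"], end=b"\r\n"))
```

Note that str items still come out quoted (ascii("x") is "'x'"), which is probably not what a network protocol wants.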
Please don't. HTTP supports UTF-8, and the W3C recommends it.
And neither is necessary on the string returned by ascii(), as it's guaranteed to be ASCII-only.
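A quick check of that guarantee: ascii() escapes every non-ASCII character, so the choice between ASCII, latin-1, and UTF-8 makes no difference to the encoded result. The sample string is arbitrary.

```python
text = "caf\u00e9 \u2615"  # contains U+00E9 and U+2615, both non-ASCII

escaped = ascii(text)
print(escaped)  # 'caf\xe9 \u2615'

# The escaped form is pure ASCII, so all three encodings give the same bytes.
assert escaped.isascii()
assert (
    escaped.encode("ascii")
    == escaped.encode("latin-1")
    == escaped.encode("utf-8")
)
```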
If there will be a function for concatenating bytes, it should be called catbytes. Please.
Why catbytes? The text version isn't catstr. Is there some joke/pun here that I'm not seeing?
Well, yes: catbytes, when it is strcat.
Yeah, I don't get it.
Don't worry. It is a dumb joke and the explanation will make it dumber:
catbytes: "cat bites"
strcat: "street cat"
Ah okay, I didn't recognize the second one, which is why I missed the joke. Thanks for explaining - I'm the sort of person who'd rather KNOW how awful a pun is than be confused by it.
Which is why I'm confused - this started with the idea of "automatically coercing" to str - but with bytes, there's no easy or obvious way to coerce to bytes. And if everything is bytes already, then all this does is add an optional end - maybe worthwhile, but pretty minimal functionality.
To be fair, I hardly ever work with bytes, so maybe I'm missing common use-cases.
I want this for HTTP headers, not the content. Content is opaque data in a lot of the code I work with.
In headers, ISO-8859-1 is required for talking to servers in the wild.
See RFC 9110: HTTP Semantics
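As a rough illustration of that use case, here is how header lines might be assembled as bytes with latin-1 (ISO-8859-1) encoded values. The `header_line` helper and the sample headers are hypothetical, and this is a sketch rather than a complete or validating HTTP serializer.

```python
def header_line(name: str, value: str) -> bytes:
    """Encode one 'Name: value' header line as bytes (hypothetical helper)."""
    # Field names are ASCII tokens. Values are encoded as latin-1 here, which
    # maps code points 0-255 one-to-one onto byte values.
    return b": ".join([name.encode("ascii"), value.encode("latin-1")]) + b"\r\n"


headers = [("Host", "example.com"), ("X-Note", "caf\u00e9")]
raw = b"".join(header_line(name, value) for name, value in headers)
print(raw)  # b'Host: example.com\r\nX-Note: caf\xe9\r\n'
```

Every step here is a join on bytes plus a terminator, which is the pattern a bytes-oriented concat with sep and end arguments would shorten.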
Just some extra info, since yak shaving is ongoing at full strength:
My main use case is indeed with integers and floats.
repr would work - and that is probably why some people found more mapping to repr than to str - but I think str is unambiguously the desired output for this:
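A small illustration of the difference, using made-up values (not the poster's own example): for integers and floats, str and repr give the same text, and they only diverge once strings are in the mix.

```python
values = [1, 3.0, "this"]

as_str = ", ".join(map(str, values))
as_repr = ", ".join(map(repr, values))

print(as_str)   # 1, 3.0, this
print(as_repr)  # 1, 3.0, 'this'
```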
As for bytes: I really do not care - this is a text-related thing - and the str for bytes, which is their repr, is really not useful. Bytes accepting strictly bytes and bytearrays would make sense.
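To make both points concrete: str() on a bytes object just returns its repr, and a strict bytes variant could be sketched as below. The name `concat_bytes` and its signature are hypothetical.

```python
# str() of a bytes object is its repr, which is rarely useful as text.
print(str(b"abc"))  # b'abc'


def concat_bytes(items, sep=b"", end=b""):
    """Hypothetical strict variant: accepts only bytes and bytearray items."""
    parts = []
    for index, item in enumerate(items):
        if not isinstance(item, (bytes, bytearray)):
            raise TypeError(
                f"item {index}: expected bytes or bytearray, "
                f"got {type(item).__name__}"
            )
        parts.append(item)
    return sep.join(parts) + end


print(concat_bytes([b"a", bytearray(b"b")], sep=b"-", end=b"\n"))  # b'a-b\n'
```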
My thoughts after reading through the thread:
- Don't change my_string.join(things) at this point in the game, even if it's to make it more permissive. People do rely on bad data causing an error in these situations (especially with None).
- The string standard library module is a hack that exists because the class implementation was underpowered. All that stuff belongs either in str or under a deprecation notice - that's off topic for the current discussion, but let's at least please not add more things to string this late in the game.
- strcat is a terrible name - the standard library should be moving away from using C-based names for "recognizability", not towards that. concat is not great; it's already a shortened form of "concatenate", which itself is a somewhat esoteric word. A lot of new programmers in 2023 just don't have that kind of vocabulary. They should, however, at least understand the concept of an iterable, since that's all over the Python documentation.
- I'm wary of doing this for bytes. bytes already contains a lot of misguided stuff aimed at incorrectly treating the type as textual (yes, yes, backwards compatibility, I know. I'm pretty dogmatic about this.) It could work in principle, maybe, but the default coercion rule isn't at all obvious.
My proposal is to add a class method to str, along the lines of:
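_USE_CLS = object()  # sentinel: `coerce=cls` is not valid as a default value

@classmethod
def from_iterable(cls, items, sep='', end='', coerce=_USE_CLS):
    if coerce is _USE_CLS:
        coerce = cls  # default: coerce every item with the class itself
    if not isinstance(sep, cls):
        raise TypeError(f"'sep' must be a '{cls.__name__}' instance")
    if not isinstance(end, cls):
        raise TypeError(f"'end' must be a '{cls.__name__}' instance")
    # coerce=None skips coercion, so bad items still raise inside join
    to_join = items if coerce is None else map(coerce, items)
    return sep.join(to_join) + end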
Now there is default behaviour that doesn't require the weird ''.join syntax, and does the desired coercion. The name is clear and explicit, and has precedent (in itertools.chain.from_iterable). Those who don't want that coercion can pass coerce=None (and still gain some nice sugar for the case of adding an end suffix - analogous to what print supports). Or they can use repr, or custom coercion where appropriate (say, '{:0x}'.format). Everyone wins (I think?).
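Since the built-in str cannot be patched from Python code, the proposal can only be tried out through a stand-in. The free function below, `str_from_iterable`, is a hypothetical equivalent for plain str (so the default coercion is simply str), meant only to show how the options would be used.

```python
def str_from_iterable(items, sep="", end="", coerce=str):
    """Free-function stand-in for the proposed str.from_iterable (sketch)."""
    if not isinstance(sep, str):
        raise TypeError("'sep' must be a 'str' instance")
    if not isinstance(end, str):
        raise TypeError("'end' must be a 'str' instance")
    # coerce=None keeps today's strict behaviour: join raises on non-str items.
    to_join = items if coerce is None else map(coerce, items)
    return sep.join(to_join) + end


default = str_from_iterable([1, 2.5, None], sep=", ")
strict = str_from_iterable(["a", "b"], sep="\n", end="\n", coerce=None)
as_hex = str_from_iterable([10, 255], sep=" ", coerce="{:x}".format)

print(default)       # 1, 2.5, None
print(repr(strict))  # 'a\nb\n'
print(as_hex)        # a ff
```

Passing coerce=None together with a non-string item still raises TypeError from join, so code that relies on bad data causing an error keeps working.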