Operators with strings

in Python,
“A” + “V” = “AV”
but “AV” - “V” produces an error.

seems like in a world of NLP, python should have more
string operators (-, +, etc.)

example

xyz = "the guy is a turkey"

current python = just another immutable string

strings like xyz should have
methods like
xyz.author
xyz.date
xyz.true.opinion
etc

just my two cents…
thank you,
James

This is best left to a custom string type in an NLP library. LLMs do not belong in the standard library.
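
A minimal sketch of what such a custom string type could look like. The class name `NLPString` is hypothetical, and "remove every occurrence" is only one assumed meaning for `-`; this is not any real library's API.

```python
class NLPString(str):
    """Hypothetical str subclass, shown for illustration only."""

    def __sub__(self, other):
        # Assumption: "-" means "remove every occurrence of other".
        if not isinstance(other, str):
            return NotImplemented
        return NLPString(self.replace(other, ""))

    def __add__(self, other):
        # Keep the subclass type when concatenating.
        if not isinstance(other, str):
            return NotImplemented
        return NLPString(str.__add__(self, other))


result = NLPString("AV") - "V"
print(result)
```

Because the operator lives on the subclass, plain `str` objects are unaffected and a library is free to pick whichever semantics suit it.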

(I have moved this from #peps to #ideas.)

String concatenation (which in Python happens to use ‘+’ instead of, for instance, ‘|’) is a very standard operation. There is no one string ‘subtraction’ operation. Among others, Python has

>>> 'AV'.removeprefix('V')
'AV'
>>> 'AV'.removesuffix('V')
'A'
>>> 'VAVAV'.strip('V')
'AVA'
>>> 'VAVAV'.replace('V', '')
'AA'

clearly all your examples work fine, but
‘AV’ - ‘V’ is simpler, faster. 8 char vs ~22

you

There is no one string ‘subtraction’ operation

me
“There didn’t used to be airplanes either”

Well, it’s definitely faster right now because it simply raises a
TypeError exception. There is no defined behavior for it, at least
not yet.
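
For reference, a small sketch of the current behavior: subtracting one `str` from another raises `TypeError` in CPython today. The variable name `message` is just for illustration.

```python
try:
    "AV" - "V"
except TypeError as exc:
    # str defines no __sub__, so the operator is unsupported.
    message = str(exc)

print(message)
```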

I think what’s missing from your proposal is an explanation of which
of the many behaviors that could be described as “string
subtraction” you suggest that operator should perform. A list of
some possibilities was suggested. Which one do you think is most
like “subtraction” and why? Or is there a different behavior you
think would fit better?

Also be aware, many proposals have been made for new operator
behaviors purely on the grounds that they don’t exist, without a
clear reason for their necessity. Those proposals never go anywhere,
because the sheer absence of something isn’t reason enough to create
it.

The existence of a + operator for concatenation doesn’t demand that
there be an opposite - operator to balance it out. And what exactly
would the absence of concatenation be? How would you suggest to
define it precisely in a function, for a clear prototype?

by paragraphs

  1. states the obvious. If it’s not obvious: from a higher viewpoint, I’m
    suggesting burying as much of the boring string-operation code as possible by
    condensing it into single letters that follow a + or a - sign.

encapsulation… by condensing string method calls to a single letter.

R for recursive, S for split, T for trim etc….

  2. a list of possibilities was suggested, all less clear and slower
    than my example. using the 3rd argument with letters in the range A-Za-z
    could reduce string operations to a single letter for ~50 different string ops.

that’s probably a few billion fewer lines of boring string manipulation code per year, or more 🙂

  3. i’ve shown the clear reason for a minus sign in string work already
    (far less typing, faster, intuitively clear)

  4. absence of concatenation?
    same as what the + sign does in the absence of concatenation (math work)

one function possibility:

def minus_overload(left_operand, right_operand, W=None):
    # Both operands must be strings.
    if not isinstance(left_operand, str) or not isinstance(right_operand, str):
        raise TypeError("both operands must be strings")

    # An empty right operand, or one longer than the left, cannot match.
    if not 1 <= len(right_operand) <= len(left_operand):
        return left_operand

    if W in ("R", "r"):
        # R or r (recursive): remove every instance of right_operand.
        return left_operand.replace(right_operand, "")
    # elif ……

    # Default: remove only the last occurrence of right_operand
    # (a single letter or a longer substring).
    # A whole-word variant would instead use a regex with \b on both sides.
    head, found, tail = left_operand.rpartition(right_operand)
    return head + tail

#if user wants to remove more than one instance of right_operand then call with R or r (recursive)

eg. “AV….” -R “V” or “AV” … -r “V” etc…

personally i don’t care what people do. but as has been normal in my life for
many decades, in five years they will wish they had listened to a good idea.

James D
CEO of DSL

Sorry about that. Given the date you posted your proposal and the
nature of what it suggests, I mistook it for an April Fools’ Day
joke, so was playing along in my reply. Since you’re continuing, I
guess you were actually serious about this idea. My apologies if I
led you on with the impression that I was also taking it seriously
in any way.

Concatenating, splitting, truncating string objects: there are many ways to do that, as is, and as such I don’t see the need for any additional operators.

As for:

… it begs questions: where, when and at what point are these attributes to be stored?

If I code error_message = "Invalid input.", then where is error_message.author and error_message.date going to come from? Am I the ‘author’? And error_message.date: is that the date that the code was run, or the date it was written? I don’t understand what error_message.true.opinion would even mean.

Is this thought akin to how some document processors work, insofar as you can see such details in the metadata?
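
To make the question concrete, here is a rough sketch of a wrapper that carries such metadata next to the text. The `AnnotatedText` class, its field names, and the values are all hypothetical; the point is only that the metadata has to be supplied explicitly by whoever creates the object, since a bare string literal has nowhere to get it from.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AnnotatedText:
    """Hypothetical container: text plus explicitly supplied metadata."""
    text: str
    author: str
    written: date


# The author and date below are placeholder values, not real data.
msg = AnnotatedText("Invalid input.", author="example-author", written=date(2000, 1, 1))
print(msg.text, msg.author, msg.written)
```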

I too thought this thread to be an April 1st prank, so @fungi, you’re not alone.

Maybe, but the truth is, we have many different types of airplanes. And we could only have one subtraction operator. I would say the two most plausible meanings would be “replace-with-nothing” and “removesuffix”, with the former being the preferred one, but there will be many people who have other opinions.

The trouble is, we have two equally logical expectations. One:

>>> s = "Hello "
>>> t = s + "world"
>>> assert t - "world" == s

And two:

>>> s = "Words, words, words"
>>> assert "," in s
>>> assert "," not in (s - ",")

If you subtract a string out of a string, obviously that string isn’t still in it, right? And equally obviously, adding a string and then subtracting it again gives back the original string, right?

In practical terms, I think “replace with empty string” is more useful, but “add then subtract returns you to where you were” is a more useful invariant. So, go figure.
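
A small sketch of why the two expectations cannot both hold. The helper names `sub_replace` and `sub_suffix` are made up for this illustration; they simply wrap `str.replace` and `str.removesuffix` (Python 3.9+).

```python
def sub_replace(s, t):
    # Candidate 1: "replace with empty string".
    return s.replace(t, "")


def sub_suffix(s, t):
    # Candidate 2: undo a concatenation.
    return s.removesuffix(t)


# Invariant A: (s + t) - t == s
s, t = "world, hello ", "world"
a_replace = sub_replace(s + t, t) == s
a_suffix = sub_suffix(s + t, t) == s

# Invariant B: t not in (s - t)
words = "Words, words, words"
b_replace = "," not in sub_replace(words, ",")
b_suffix = "," not in sub_suffix(words, ",")

print(a_replace, a_suffix, b_replace, b_suffix)
```

Here `replace` satisfies invariant B but breaks A (it also removes the earlier "world"), while `removesuffix` satisfies A but breaks B.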

You should understand this sentence you quoted not with an emphasis on the “There is”, but with an emphasis on the “one”. What is relevant here is that many operations can play the role of what could be analogous to a subtraction.

Strings form a monoid under the operation of concatenation, but they are not a group.

The “subtraction” that you want to denote by - is rather only a “cancellation”. This is the operation of deducing, from an equation $a \cdot c = b \cdot c$, the equation $a = b$. The monoid formed by strings and concatenation has the cancellation property. It is good for it to have a specific name, like removesuffix (or removeprefix), to distinguish those already two different operations from each other and from all others. There is a way to define a natural operation that would deserve to be called -, but the objects on which it operates are no longer the original strings (see Grothendieck group).
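
As a rough illustration (a finite check over short strings on an assumed two-letter alphabet, not a proof), the following sketch tests right cancellation and the lack of inverses for concatenation.

```python
from itertools import product

alphabet = "AV"
# All strings of length 0, 1 and 2 over the alphabet.
strings = ["".join(p) for n in range(3) for p in product(alphabet, repeat=n)]

# Right cancellation: a + c == b + c implies a == b.
cancellation_holds = all(
    a == b
    for a, b, c in product(strings, repeat=3)
    if a + c == b + c
)

# No inverses: nothing can be appended to a non-empty string to get "".
has_inverse = any(a + b == "" for a in strings if a for b in strings)

print(cancellation_holds, has_inverse)
```

Cancellation holds, but no non-empty string has an inverse, which is why strings under concatenation are a monoid and not a group.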

Thank you for clarifying that I meant “There is not only 1 possible ‘subtraction’ operation but many.” I gave 4 possible interpretations with the string being subtracted being a single letter. For each of those, there are multiple possible interpretations. If the single letter does not appear in an appropriate place in the original, raise ValueError versus do nothing and return the original string. If multiple letters appear in appropriate places in the original, remove just one occurrence versus remove all.
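
A sketch of two of those variants for the suffix case. Both helper names are hypothetical; neither exists in the standard library, where `str.removesuffix` removes at most one occurrence and returns the original string when there is no match.

```python
def remove_suffix_strict(s, suffix):
    # Variant: raise instead of silently returning s when the suffix is absent.
    if not s.endswith(suffix):
        raise ValueError(f"{s!r} does not end with {suffix!r}")
    return s[: len(s) - len(suffix)]


def remove_suffix_all(s, suffix):
    # Variant: remove every trailing occurrence rather than just one.
    while suffix and s.endswith(suffix):
        s = s[: len(s) - len(suffix)]
    return s


print(remove_suffix_strict("AV", "V"))
print(remove_suffix_all("AVVV", "V"))
print("AVVV".removesuffix("V"))
```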

If the string subtracted has multiple characters, we can interpret it as multiple single characters or one multiple character substring. Beginners often misinterpret the deletion argument of the strip functions as a substring rather than chars, and are hence surprised by:

>>> 'cbcaxxxxxabyc'.strip('abc')
'xxxxxaby'

The .removeprefix and .removesuffix methods were recently added to cover the remove exact substring case. We will not add a synonym for the latter.
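
A short sketch contrasting the two behaviors on one made-up input (`removeprefix` and `removesuffix` require Python 3.9 or later).

```python
text = "abcxxxcba"

# strip treats its argument as a set of characters to drop from both ends.
by_chars = text.strip("abc")

# removeprefix/removesuffix only remove the exact substring, at most once.
by_prefix = text.removeprefix("abc")
by_suffix = text.removesuffix("abc")  # text ends with "cba", so nothing is removed

print(by_chars, by_prefix, by_suffix)
```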

And then there is deletion by index position regardless of value.
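
Deletion by position is usually written with slicing. A minimal sketch, assuming non-negative indices; the helper name `delete_range` is made up for this example.

```python
def delete_range(s, start, stop):
    # Drop the characters at positions start..stop-1, whatever they are.
    return s[:start] + s[stop:]


print(delete_range("VAVAV", 1, 3))
```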

i’ve outlined a good idea, consistent with the substantial operator
overloading going on in Python already (**, *=, :=, //, [LIST], etc.)

good ideas persist. when someone who has a solid ROI for my
good idea discovers it, implementation will follow shortly thereafter.

i do like the Explicit over implicit rule of thumb, but for me
(Abstraction/Encapsulation) > Explicit > implicit

good luck all…

James