I just wanted to note that dedent() does not simply remove a common prefix from each line.
Maybe I misread your comment. You had just demonstrated that the non-regex approach outperforms the regex-based approach, so I didn't need to bring up the difference between dedent and removing a common prefix.
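To make the distinction concrete, here is a small sketch using today's textwrap.dedent (assuming the proposed str.dedent() would behave the same way):

```python
import textwrap

# dedent() computes the longest common *leading whitespace* across the
# lines, and lines consisting solely of whitespace are ignored when
# computing it -- so it is not equivalent to stripping a common string
# prefix from every line.
s = "    hello\n\n      world\n"
print(repr(textwrap.dedent(s)))   # 'hello\n\n  world\n'
```

The blank line does not defeat the dedent, and the deeper-indented second line keeps its extra two spaces; a naive "remove the common prefix" pass would behave differently on both counts.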
Another reason not to use regex is simplicity.
One goal of the str.dedent() method is compile-time processing (constant folding).
Users may write long multi-line SQL in triple-quoted strings. ...""".dedent() makes compile-time processing easy, so users can save both runtime cost and RAM usage.
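For comparison, this is what users have to write today; the query text and indentation level are just illustrative:

```python
import textwrap

# Today the indented literal is stored in the .pyc as-is, and
# textwrap.dedent() re-scans it at runtime every time this expression
# is evaluated, unless the caller hoists it to module level.
QUERY = textwrap.dedent("""\
    SELECT id, name
    FROM users
    WHERE active = 1
""")
print(QUERY)
```

With a builtin method, the compiler could in principle fold `"""...""".dedent()` into the already-dedented constant, so only the smaller string would ship in the code object and no work would happen at runtime.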
And using sre in a builtin str method is more complex than implementing a non-regex approach in C.
Jon Crall already implemented it in the pull request, so I want to reuse it for str.dedent().
I'm not sure whether adding a builtin method always needs a PEP.
FYI, adding this method was already decided once, back in 2013. See this comment.
Nick Coghlan gave reasons along these lines: it would end proposals for new syntax to do what triple-quoted strings already do, the runtime cost is small and can be optimized away, and it would be used more than some other string methods. [Python-ideas] Idea for new multi-line triple quote literal
A few things have changed in 10 years. For example, Guido is no longer in charge; the Steering Council is. But IMO this approval from the time machine is still relevant.
I think the most difficult point was deciding whether to accept only a string or also a tuple of strings, similar to startswith(). Well, read the PEP for details.
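For reference, this is the startswith() pattern being alluded to; it accepts either a single string or a tuple of alternatives:

```python
# startswith() is the existing precedent for "string or tuple of
# strings" arguments on str methods.
print("report.txt".startswith("report"))             # True
print("report.txt".startswith(("draft", "report")))  # True
print("notes.txt".startswith(("draft", "report")))   # False
```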
A PEP is a nice place to drop links to past discussions, summarize the rationale, etc.
I would like to register strong opposition to adding new text-like functionality to bytes. The following is a brief rant that superficially digresses from the topic (the issues I describe here oughtn't matter to the proposed functionality), but it motivates my opinion on the matter.
I'm mildly against adding new text-like stuff to bytes for all the
reasons Karl enumerates, but wanted to add a counterexample on the
topic of bytes looking a bit like text.
Karl writes:
To say nothing of the fact that privileging ASCII is just not actually
that useful for dealing with legacy data - considering that "legacy
data" exists worldwide and lots of systems used to blithely assume a
"native" encoding not recorded in metadata anywhere.
I'm writing a PDF parser at present, and PDF is the very image of a
binary data format masquerading as ASCII text, with raw binary byte
streams embedded in it, which can only be parsed correctly if you
parse/evaluate the dictionary which precedes them.
And it doesn't entirely pretend to be pure ASCII outside those streams;
the PDF specification of a name is nearly "a sequence of bytes which
doesn't include the ASCII whitespace code values". Gah!
Anyway, the format being binary, I've benefited enormously from the
existence of binary regexps.
If we're including "formats" in "protocols", PDF qualifies (I'm using
binary regexps on it in my own code). It reads like ASCII text that
allows higher byte values without specifying what character set they're
in, with some raw binary shoehorned in.
And isn't HTTP technically binary-with-ASCII-headers? I seem to recall
you can't treat it as ASCII (sorry, no citation).
To be fair, the code page system was a pretty decent kludge for its purpose, and it has the consequence that lots of things are at least ASCII-transparent. It's just that you still need to figure out what to do when you get a byte with the high bit set. And no, a "name" that contains bytes in an unknown encoding and with unknown semantics besides "not ASCII whitespace"... still doesn't really need to be thought of as "text".
I mean, you're just going to be comparing it for equality, right? Not, say, trying to uppercase it, when you don't even know which letter a given byte represents, let alone whether a Turkish locale should be assumed?
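Notably, bytes.upper() already takes this conservative position today; a quick demonstration:

```python
# bytes.upper() only maps the ASCII letters a-z; bytes with the high
# bit set are left untouched, because without a known encoding their
# meaning is undefined.
print(b"name\xe9".upper())   # b'NAME\xe9'
```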
For clarity, though, I don't think there's anything wrong with the concept of byte regexes per se. I just wish they didn't preserve text-like concepts like "word character" so strongly, and I wish they made it easier to, say, match a byte with a given value for the low nybble (or even specific bit patterns).
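A sketch of the workaround needed today: regex syntax has no notion of "any byte whose low nybble is 0xF", but the 16 qualifying byte values can be enumerated into an ordinary character class.

```python
import re

# Build a character class matching every byte whose low nybble is 0xF:
# 0x0f, 0x1f, ..., 0xff.
nybble_f = bytes(range(0x0F, 0x100, 0x10))
low_nybble_f = re.compile(b"[" + re.escape(nybble_f) + b"]")

data = bytes([0x1F, 0x20, 0x3F, 0x41, 0xFF])
print(low_nybble_f.findall(data))   # matches 0x1f, 0x3f and 0xff
```

It works, but it obscures the intent; a first-class "match these bit patterns" construct would say what it means.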
Based on recent precedent (e.g. str.removeprefix in PEP 616, fully qualified names in PEP 737), I think adding new builtin methods should go through a PEP, though the Steering Council hasn't said this explicitly.