Summary
I think it should be discussed whether shlex.quote should be changed to raise TypeError for falsey non-str data, by moving the type check to the top of the function, before the empty-string special-case check. It caught my eye when implementing #148846 and it sticks out as a (potentially bad) cause of silent failures, most considerably whether None should continue to be coerced to "''".
Current:
def quote(s):
if not s:
return "''"
if not isinstance(s, str):
raise TypeError(f"expected string object, got {type(s).__name__!r}")
...
Proposal to discuss (just swap the order):
def quote(s):
if not isinstance(s, str):
raise TypeError(f"expected string object, got {type(s).__name__!r}")
if not s:
return "''"
...
History:
PR #132036 addressed Issue #118761 to speed up the import time of shlex, but by doing so changed what exception was thrown when the wrong type of argument for s (anything but str) was passed to shlex.quote. Before commit 06a26fda of #132036 a TypeError was thrown by re.Pattern.search. In 06a26fda or newer shlex.quote tried to access the (str.)isascii attribute of s, which threw an AttributeError for s of non-str type.
This was observed in Issue #138804 and resolved by PR #138809 (commit 220c0a81), by adding an isinstance(s, str) check towards the top of the function which raises a TypeError early lest an AttributeError be raised by trying to access s.isascii, or perhaps even worse s.isascii() and ... being called on s of a type that implements isascii differently than str.
The isinstance check was originally going to go at the top of shlex.quote, but it was discussed in a review of #138809 that having shlex.quote raise TypeError for falsey non-str data would technically be a breaking change, because the existing behavior both before and after #132036 is to return "''" for when s is falsey, eg for None or False or 0, regardless of whether the type of s made sense to be passed to quote.
def quote(s):
if not s:
return "''"
...
Thus the implementation left the special case of returning "''" for falsey data at the top of the function, before the type check, even though really it is intended specifically to test whether the empty string of length 0 has been passed, which needs to be treated as a special case in string quoting; the boolean check of if not s is just a Pythonic shortcut around if len(s) == 0, the latter of which is certainly not intended (in my eyes) to silently fail for None or False or int data which do implement __bool__, but do not implement __len__.
The documentation for shlex.quote only mentions strings, and does not indicate that the coincidental behavior for non-str data is intended.
Compatibility
Since we have established that if not s was intended to check for the str of length 0 via if len(s) == 0, not just to check for any falsey argument s, it remains to ask âis there anyone intentionally relying on shlex.quote returning the string "''" (the string of length 2 consisting of two single quotes) for arbitrary falsey data?â.
I would guess no with one exception (None) that we will discuss later. For now we eliminate all other cases:
For s whose type is any class that doesnât implement __bool__, s is true-y, that is the if not s check always returns False, and TypeError is then raised as expected. We therefore inspect falsey builtins:
If one were to imagine that someone out there incorrectly thinks that shlex.quote is for another type proximal to str, eg bytes, surely they would have passed a non-zero length string of bytes to shlex.quote by now, which causes TypeError to be raised, which would prompt them to change to using str? If not then their code is purely a b"" to "''" machine which doesnât seem to have any real uses, observing the difference in types. Therefore I wouldnât feel bad about making a âbreaking changeâ of raising TypeError even for that empty bytes string.
I can imagine False being passed to quote only as a user error, eg if one wrote shlex.quote(s.islower()) instead of shlex.quote(s.lower()). However since the True case of str.islower would cause a TypeError, and shlex.quote("ABC".islower()) returns an "''" which one notices is not "abc", I donât think this is likely to be in anyoneâs working code.
Nobody is passing ints to shlex.quote right? Why should 0 â "''"?
Therefore the main point I can see that needs debating is shlex.quote(None)'s behavior of coercing None to "''".
Should None continue to be coerced to "''"?
Continued postâŚ