Coerce Path objects in shlex.join

At the moment, shlex.join raises

TypeError: expected string or bytes-like object

when given pathlib.Path objects. Since subprocess.run and the related subprocess functions have accepted path-like objects since Python 3.6 (3.8 on Windows), it seems intuitive to me that shlex.join should be able to handle paths as well. This has been briefly discussed before on the bug tracker, but with the caveat that it was the wrong venue and should be brought up on Python Ideas instead. I can’t find any record that it ever made it over here (or to the mailing list), and I am curious to get this community’s perspective.
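For anyone who wants to reproduce this, here is a minimal sketch of the failure and of the usual manual workaround. The argument list is made up for illustration, and PurePosixPath is used only so the result is the same on every platform. The exact wording of the error message differs between Python versions, so the snippet checks only the exception type.

```python
import os
import shlex
from pathlib import PurePosixPath

# Hypothetical command line mixing plain strings and a path object.
args = ["ls", "-l", PurePosixPath("/tmp/my dir")]

# Passing the path object straight through fails inside shlex.quote.
try:
    shlex.join(args)
except TypeError as exc:
    error_type = type(exc).__name__
else:
    error_type = None

# The manual workaround: convert path-like items before joining.
# os.fspath leaves str items untouched and unwraps os.PathLike objects.
joined = shlex.join(map(os.fspath, args))
print(error_type, joined)
```

The workaround produces `ls -l '/tmp/my dir'`, with the path quoted because it contains a space.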

First, I’d like to note that while this shares similarities with the hashed [1] (and rehashed [2], and rehashed [3], ad infinitum [4]) idea to coerce inputs of str.join to strings, I think this is a different issue. While I’d appreciate the convenience, I tend to agree that str.join and shlex.join should probably not implicitly coerce everything to strings. I’m only suggesting the coercion of os.PathLike objects to strings for shlex.join. (I recognize it may still get the same response, but I believe the subtlety might be an important distinction.)

My understanding of the arguments against is that shlex.join should not implicitly convert its arguments to strings and, more importantly, that Path objects are not special enough for an exception. However, subprocess takes the exact opposite approach, and has an explicit isinstance check to allow and convert os.PathLike objects. I’m assuming that some of the reasoning behind allowing path-to-string conversions there has to do with the prevalence of using paths as command line arguments (unlike integers, classes, or anything else you might want coerced for convenience). It seems to me like this could be rationale enough for why Path objects warrant an exception to the “don’t-coerce-inputs-to-strings” unwritten rule for shlex.join.
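To make the proposed behavior concrete, here is a rough sketch of a join that converts only os.PathLike arguments and leaves every other type to fail as it does today. The name join_pathlike is hypothetical, and this is not how shlex or subprocess is actually implemented; it only illustrates the narrow isinstance-based coercion being suggested.

```python
import os
import shlex
from pathlib import PurePosixPath


def join_pathlike(split_command):
    """Join arguments like shlex.join, unwrapping os.PathLike items first.

    Hypothetical helper for illustration only; not part of the stdlib.
    """
    parts = []
    for arg in split_command:
        if isinstance(arg, os.PathLike):
            # Only path-like objects are converted; ints and other
            # non-strings still reach shlex.quote unchanged and fail there.
            arg = os.fspath(arg)
        parts.append(shlex.quote(arg))
    return " ".join(parts)


command = join_pathlike(["cp", PurePosixPath("/tmp/a b.txt"), "/tmp/out"])
print(command)
```

One caveat with this sketch: os.fspath can return bytes for a path-like object whose `__fspath__` returns bytes, and shlex.quote would then still raise TypeError.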

Without getting into the specific details of implementation, I’d imagine that changing shlex.join to convert just os.PathLike objects to strings would not break much existing code. I find it rather unlikely (though I’ll admit I have no supporting evidence) that anyone is relying on shlex.join to catch path objects that should be strings; I’d bet instead that anyone using paths is already converting them to strings “manually” before passing them to shlex.join. The thing keeping this from sliding down the slippery slope of weak typing is the frequency with which paths are used in commands.

[Sorry, “new” user, so you don’t get real links]
1 --discuss.python.org/t/str-join-str-i-for-i-in-value-for-f-strings/23167
2 --discuss.python.org/t/allow-list-of-integers-in-str-join/23676
3 --discuss.python.org/t/why-not-make-str-join-coerce-the-items-in-its-iterables/24097
4 --github.com/python/cpython/issues/87701

I think pathlib is great and would like to see it supported in more places.

As a precedent, os.path.join coerces paths to strings using os.fspath, which could also be used here. Although I think I like the idea of coercing everything here to str, I can appreciate that this might be more controversial.
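The practical difference between the two coercion options can be shown in a few lines. This is only an illustration with made-up values: os.fspath accepts str, bytes, and os.PathLike objects and rejects everything else, while str accepts any object at all.

```python
import os
from pathlib import PurePosixPath

path = PurePosixPath("/srv/data")

# Both options give the same result for a path object.
via_fspath = os.fspath(path)
via_str = str(path)

# They differ for non-path objects: str converts anything...
number_as_str = str(42)

# ...while os.fspath refuses objects that are not str, bytes, or os.PathLike.
try:
    os.fspath(42)
except TypeError:
    fspath_rejects_int = True
else:
    fspath_rejects_int = False
```

So coercing with os.fspath keeps the existing TypeError for integers and similar mistakes, whereas coercing with str would silently accept them.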

I ran some quick and very simple tests to get an idea of the impact on performance, even though I doubt this is particularly performance sensitive. What I observed is that os.fspath had a 15% impact, while str had a nearly 20% impact. As a side note, there is already a function being called to safely quote each string before the actual join, and changing how it is called from a comprehension to map gave a 20% improvement, which seemed to more than offset the impact of either coercion method.
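The test script itself is not included above, so the following is only a guess at what such a comparison might look like, using timeit. The function names and the sample arguments are made up, and the timings will vary by machine and Python version, so no particular numbers should be expected from it.

```python
import os
import shlex
import timeit
from pathlib import PurePosixPath

# Made-up sample arguments; the second list swaps in a path object.
STR_ARGS = ["tar", "-czf", "backup archive.tar.gz", "/srv/data"]
PATH_ARGS = ["tar", "-czf", "backup archive.tar.gz", PurePosixPath("/srv/data")]


def join_comprehension(args):
    # Quote each argument with a generator expression, then join.
    return " ".join(shlex.quote(arg) for arg in args)


def join_map(args):
    # Same result, but quoting is driven by map.
    return " ".join(map(shlex.quote, args))


def join_fspath(args):
    # Coerce path-like items with os.fspath before quoting.
    return " ".join(map(shlex.quote, map(os.fspath, args)))


def join_str(args):
    # Coerce every item with str before quoting.
    return " ".join(map(shlex.quote, map(str, args)))


def compare(number=100_000):
    """Return total seconds per variant for `number` calls each."""
    variants = {
        "comprehension": (join_comprehension, STR_ARGS),
        "map": (join_map, STR_ARGS),
        "map + os.fspath": (join_fspath, PATH_ARGS),
        "map + str": (join_str, PATH_ARGS),
    }
    results = {}
    for name, (func, args) in variants.items():
        results[name] = timeit.timeit(lambda: func(args), number=number)
    return results


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>16}: {seconds:.3f}s")
```

All four variants produce the same command string for these inputs; only the time taken differs.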