When working with pathlib.Path objects, I have been recommending the use of open(path) over path.open(). It’s been brought to my attention that this recommendation is at odds with some linter rules, such as ruff’s PTH123, which recommends replacing open(path) with path.open().
I would like to know what folks think about updating the pathlib documentation to recommend one of these over the other (for an official stance on “there should be one and preferably only one obvious way”).
My reasoning for recommending open(path) over path.open() can be summed up as:
pathlib was added to Python in Python 3.4 (it was previously a third-party library)
The built-in open function did not accept pathlib.Path objects in Python 3.4 or Python 3.5
After PEP 519 was accepted, the built-in open function (and many other tools) began using the new __fspath__ protocol in Python 3.6, which lets the built-in open function accept pathlib.Path objects directly
Since Python 3.5’s EOL in 2020, pathlib.Path.open has been redundant, because the built-in open function accepts any path-like object in all supported Python versions
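To make the points above concrete, here is a minimal sketch, assuming Python 3.6 or later. The ConfigLocation class and the example.txt file name are made up for illustration and are not from the original post; the class only exists to show that the built-in accepts any object implementing __fspath__, not just pathlib.Path.

```python
import tempfile
from pathlib import Path


class ConfigLocation:
    """Hypothetical class (not from the thread) that implements os.PathLike."""

    def __init__(self, location: str) -> None:
        self._location = location

    def __fspath__(self) -> str:
        # PEP 519 hook: open() and os.fspath() call this to get the real path.
        return self._location


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"
    path.write_text("hello\n", encoding="utf-8")

    # The built-in open() accepts the Path object directly.
    with open(path, encoding="utf-8") as f:
        via_builtin = f.read()

    # The Path.open() method reads the same file.
    with path.open(encoding="utf-8") as f:
        via_method = f.read()

    # Any os.PathLike object works with the built-in, not only Path.
    with open(ConfigLocation(str(path)), encoding="utf-8") as f:
        via_pathlike = f.read()
```

For a file on disk, all three calls read the same content, which is the sense in which the two spellings are interchangeable here.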
Personally, I also prefer using the built-in open function because if I see Python code that includes the expression open(args.cf), I know that a file is being opened, but when I see args.cf.open() I may feel the need to find where args.cf was defined to make sure I understand what it is.
I would like to see the pathlib documentation take a stance on whether the built-in open function should be preferred over the open method (or the other way around).
Why use open(path):
Works on str and anything that is os.PathLike, providing good duck typing for any file that exists on disk.
Why use path.open:
Works on zipfile.Path and similar classes that point to “virtual” files. These new classes may not fully support the same keyword arguments as open or even require different ones.
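A small sketch of the “virtual” file case, assuming Python 3.9 or later (where zipfile.Path.open accepts a text mode and an encoding). The archive is built in memory and notes.txt is a made-up member name. zipfile.Path does not implement __fspath__, so the built-in open rejects it while the method form works.

```python
import io
import zipfile

# Build a small archive in memory; "notes.txt" is a made-up member name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as writer:
    writer.writestr("notes.txt", "inside the zip\n")

archive = zipfile.ZipFile(buffer)
member = zipfile.Path(archive, "notes.txt")

# zipfile.Path is not os.PathLike, so the built-in open() raises TypeError.
try:
    open(member)
except TypeError:
    builtin_open_failed = True
else:
    builtin_open_failed = False

# The method form works because the object knows how to open itself.
with member.open("r", encoding="utf-8") as f:
    content = f.read()

archive.close()
```

Code that only ever calls path.open() would handle both pathlib.Path and zipfile.Path here, which is the trade-off this post describes.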
For functions it’s often more idiomatic in Python to pass an open file stream than a file path anyway, but when you pass a path it probably depends on whether the parameter points to a physical file or could be any object that has the notion of “openable”.
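As a rough illustration of the stream-passing style, here is a sketch with a hypothetical count_lines helper (not from the thread). Because it takes an already opened text stream, the caller decides whether that stream came from open(path), path.open(), or something in memory.

```python
import io
from typing import TextIO


def count_lines(stream: TextIO) -> int:
    """Hypothetical helper: count the lines in any readable text stream."""
    return sum(1 for _ in stream)


# An in-memory stream stands in for a file opened by either spelling.
in_memory = io.StringIO("a\nb\nc\n")
line_count = count_lines(in_memory)
```

With this shape the open(path) versus path.open() question moves to the call site and out of the function’s signature.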
I think it should be left to the user’s judgement. And I think the ruff rule is misguided. Unless we plan on deprecating one of the options, or one is demonstrably inferior, either form is a valid choice.
Personally, I also prefer using the built-in open function because if I see Python code that includes the expression open(args.cf), I know that a file is being opened, but when I see args.cf.open() I may feel the need to find where args.cf was defined to make sure I understand what it is.
For me, these days it’s usually the opposite; when seeing open(fname), I switch to assuming it’s older code and fname might even be a str (and thus I spend an extra 5 seconds making sure it’s not).
And I think the ruff rule is misguided. Unless we plan on deprecating one of the options, or one is demonstrably inferior, either form is a valid choice.
Ruff’s lints are opinionated at times. This is not the first time I’ve seen them recommend one out of two equivalent choices despite official documentation not making any distinction; see, for example, pandas-use-of-dot-is-null (PD003).
FWIW, I think it makes sense, if only for consistency.
FWIW, the lint rule in question is adapted from an opinionated flake8 plugin (flake8-use-pathlib), whose description is literally “A plugin for flake8 finding use of functions that can be replaced by pathlib module.” In that context, the recommendation for preferring path.open() makes total sense: it’s literally doing what the plugin says it will do.
Agreed, and to be clear I have no particular problem with people choosing to use a rule like that to do what it says - find functions that can be replaced by pathlib methods.
What I consider “misguided” is presenting it as a general rule[1], and even more so presenting it as evidence that a recommendation is wrong.
[1] IMO, ruff has a bit of a tendency to obscure the difference between generally accepted recommendations, and more opinionated ones…
This is something we are working to address in an upcoming release with categorization of rules and a better set of default rules, which almost certainly won’t include this rule, so users would still need to intentionally select it in their projects.
We try to avoid having conflicting rules, as that causes issues when users enable entire groups of lint rules (or enable all). Instead we prefer an existing rule be generalized with some level of option/preference to set, but we also prefer to avoid proliferation of options specific to single rules. The situation is complicated by the current expectations of retaining compatible behavior with the upstream plugins that these rules originated from.