Pathlib and os.path: feature parity and code de-duplication

Hi all,

A quick rundown of some notable feature differences and duplications between pathlib and os.path, also showing changes in the last year or so.


Path.expanduser() and os.path.expanduser() [complete!]

These implementations were almost identical, save for some subtleties in how Windows home directories are guessed.

Addressed in PR 18841, which deleted pathlib’s implementation and made it call os.path.expanduser().
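To illustrate the overlap, here is a minimal sketch comparing the two entry points. The home directory is pinned through environment variables purely so the result is reproducible; the path /home/demo is a made-up value, not anything from the PR.

```python
import os
import os.path
from pathlib import Path

# Assumption for a reproducible demo: pin the home directory via environment
# variables (HOME is read on POSIX, USERPROFILE on Windows).
# The directory itself is hypothetical and does not need to exist.
demo_home = os.path.join(os.sep, "home", "demo")
os.environ["HOME"] = demo_home
os.environ["USERPROFILE"] = demo_home

# Procedural spelling: takes and returns a string.
via_os_path = os.path.expanduser("~/notes.txt")

# Object-oriented spelling: returns a new Path object.
via_pathlib = Path("~/notes.txt").expanduser()

print(via_os_path)
print(via_pathlib)
```

Both calls should point at the same location, differing only in return type (str versus Path).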


Path.resolve() and os.path.realpath() [complete!]

Only resolve() could raise an exception when it encountered a missing file or a symlink loop, whereas realpath() always appended the remaining path segments and returned without indicating an error.

This was addressed in PR 25264, which added a strict argument to realpath(), deleted pathlib’s own implementation and made it call realpath().


PurePath() and os.path.normpath() [pr available!]

PurePath automatically applies safe normalization to paths, e.g. redundant separators and . entries are removed. It does not collapse .. entries, as doing so cannot be done safely unless we also resolve symlinks along the way, which requires filesystem access.

Path objects provide a resolve() method that will safely resolve symlinks and .. entries simultaneously.

On the other hand, os.path.normpath() always naively collapses .. entries, which can change the meaning of paths involving symlinks. There’s no equivalent to PurePath's normalization.
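A quick sketch of the two behaviours on the same input. It uses the POSIX flavours (PurePosixPath and posixpath) so the result does not depend on the host OS; the path itself is arbitrary.

```python
import posixpath
from pathlib import PurePosixPath

raw = "a//b/./c/../d"

# PurePath: drops the redundant separator and the "." entry, keeps "..".
pure = str(PurePosixPath(raw))

# normpath: additionally collapses "c/..", which is only correct
# if "c" is not a symlink pointing somewhere else.
normed = posixpath.normpath(raw)

print(pure)    # a/b/c/../d
print(normed)  # a/b/d
```

If a/b/c were a symlink to some other directory, the two results would name different files, which is why PurePath leaves the .. entry alone.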

I’ve opened PR 26694 to add a strict argument to normpath().


PurePath.is_reserved() and os.path.??? [todo!]

There’s no equivalent to pathlib’s PurePath.is_reserved() in os.path. For full parity this should be added.


PurePath.as_uri() and os.path.??? [todo!]

There’s no equivalent to pathlib’s PurePath.as_uri() in os.path. For full parity this should be added.
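For reference, a short sketch of what as_uri() does today, using the pure path classes so it runs the same on any OS. The file names are arbitrary examples.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Characters such as spaces are percent-encoded.
posix_uri = PurePosixPath("/tmp/my file.txt").as_uri()

# Drive letters are kept as part of the URI path.
windows_uri = PureWindowsPath("c:/Windows").as_uri()

# as_uri() only accepts absolute paths; relative ones raise ValueError.
try:
    PurePosixPath("relative/file.txt").as_uri()
except ValueError:
    relative_rejected = True
else:
    relative_rejected = False

print(posix_uri)    # file:///tmp/my%20file.txt
print(windows_uri)  # file:///c:/Windows
```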


… and I think that’s everything!

With these changes in place, pathlib’s _Flavour abstraction is entirely vestigial and can be safely removed. By moving the OS-specific bits into the low-level ntpath + posixpath modules, we free pathlib from the burden of re-implementing OS path quirks. That in turn allows for some careful refactoring as proposed by @kfollstad here:

Any feedback/questions/concerns very welcome! Thanks for reading.

Cheers

The functionality you’re proposing exists in neither pathlib nor os.path presently, and so doesn’t seem relevant to this thread.

There is urllib.request.pathname2url which needs to be taken into consideration if you intend to unify this.

But one fundamental question I have is why os.path needs feature parity with pathlib in the first place. It makes perfect sense the other way around, since Path objects are preferred over string-form paths. It’s difficult to justify adding brand-new things to os.path; you couldn’t do it previously, so this is obviously new code. Why don’t you just use pathlib instead?

There is urllib.request.pathname2url which needs to be taken into consideration if you intend to unify this.

This looks perfect, thanks for the tip. I’ll play around with using it in pathlib. That obviates the need for an os.path.uri()-like function.

But one fundamental question I have is why os.path needs feature parity with pathlib in the first place.

Ultimately, it’s to solve bpo-24132, i.e. support subclassing pathlib.Path.

The pathlib internals are a bit of a mess, which greatly constrains work on that bug. A lot of things we want to say about the abstractions aren’t quite true, e.g.:

  • All OS access happens via _Accessor
  • All syntax manip happens in _Flavour

One of these is “all OS-specific functionality is implemented in posixpath or ntpath”. That statement is currently 95% true, e.g. all the OS-specific resolve() stuff is delegated to realpath(). The stragglers are the focus of this thread.

By slimming the PurePosixPath and PureWindowsPath classes down to almost nothing, and removing _Flavour, we make the sort of refactors @kfollstad has proposed feasible without breaking backwards compat.

As a secondary reason, I don’t think the existence of pathlib should mean we stop work on os.path. It’s not deprecated. The overlap of functionality between the modules is >90%, and to my mind the key difference is in approach: OOP or procedural. In that framework parity makes sense, because str vs Path is only a user choice.

Hello there,

In our code base we have the following function:

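from os.path import abspath, expanduser, expandvars, realpath

def _normalize(self, path):
    # Empty input stays empty; otherwise expand $VARS and ~, then resolve.
    if not path:
        return ''
    return realpath(abspath(expanduser(expandvars(path.strip()))))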

I would like to convert it to pathlib, but it seems os.path.expandvars has no pathlib counterpart. Maybe I’m wrong and it exists under another name, but a quick glance at the docs didn’t turn up anything “var”-related.

Another question that is a bit out of scope: I’m copy-pasting that function into various projects and using it as a complete “path sanitizer function”. I would love to see a pathlib equivalent so that I could just do “Path(user_provided_path).sanitize()”. The topic has been brought up on the forum here: Have a `.realpath` classmethod in pathlib.Path - #19 by sinoroc. I would like to know whether any decision was made about it.

The functionality in os.path.expandvars is unrelated to path handling (it can be used on arbitrary strings) so I don’t see any justification for adding it to pathlib. If it weren’t for the backward compatibility implications, I’d suggest renaming it as os.expandvars, but the disruption would be far too great to make that worth it.
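Since expandvars works on plain strings, it can simply be applied before handing the result to pathlib. Below is a rough sketch of the function above rewritten that way; the name normalize and the str return type are my own choices for illustration. Note that resolve() handles .. entries while resolving symlinks, whereas realpath(abspath(...)) collapses them lexically first, so the two can differ when symlinks are involved.

```python
import os
from pathlib import Path


def normalize(path: str) -> str:
    """Expand $VARS and ~, then return an absolute, symlink-free path."""
    if not path:
        return ''
    # expandvars is plain string substitution, so do it before building a Path.
    expanded = os.path.expandvars(path.strip())
    # expanduser() handles "~"; resolve() makes the path absolute and
    # resolves symlinks (non-strict, so the target need not exist).
    return str(Path(expanded).expanduser().resolve())


# Usage: DEMO_DIR is a hypothetical variable set only for this example.
os.environ["DEMO_DIR"] = os.getcwd()
print(normalize("  $DEMO_DIR/file.txt  "))
```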

That makes sense. Better to leave it that way, then.