Pathlib improvement proposal (backwards compatibility, normalization support, etc)

I am the current maintainer of PyKotor.

Working with file paths, and resolving and normalizing them, is incredibly important to my project. However, pathlib’s behavior here is not documented at all and, even worse, there are many discrepancies between Python versions.

Solution

I have created classes that override pathlib’s implementations that will:

  • Correctly normalize a path (mixed slashes, repeated slashes, the root, etc.)
  • Provide backwards compatibility across Python 3.7 through 3.12

Code can be found here:

Basic gist of what we do:

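from __future__ import annotations  # lets tuple[...], list[...] and X | Y annotations parse on Python 3.7+

import os
import pathlib
from typing import Union

PathElem = Union[str, os.PathLike]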

def override_to_pathlib(cls):
    if cls == PurePath:
        return pathlib.PurePath
    if cls == PureWindowsPath:
        return pathlib.PureWindowsPath
    if cls == PurePosixPath:
        return pathlib.PurePosixPath
    if cls == Path:
        return pathlib.Path
    if cls == WindowsPath:
        return pathlib.WindowsPath
    if cls == PosixPath:
        return pathlib.PosixPath
    return cls

class PurePathType(type):
    def __instancecheck__(cls, instance): # sourcery skip: instance-method-first-arg-name
        instance_type = type(instance)
        mro = instance_type.__mro__
        if cls in (pathlib.PurePath, PurePath):
            return BasePurePath in mro or override_to_pathlib(cls) in override_to_pathlib(instance_type).__mro__
        if cls in (pathlib.Path, Path):
            return BasePath in mro or override_to_pathlib(cls) in override_to_pathlib(instance_type).__mro__
        return cls in mro

    def __subclasscheck__(cls, subclass): # sourcery skip: instance-method-first-arg-name
        mro = subclass.__mro__
        if cls in (pathlib.PurePath, PurePath):
            return BasePurePath in mro or override_to_pathlib(cls) in override_to_pathlib(subclass).__mro__
        if cls in (pathlib.Path, Path):
            return BasePath in mro or override_to_pathlib(cls) in override_to_pathlib(subclass).__mro__
        return cls in mro

class BasePurePath(metaclass=PurePathType):
    """BasePath is a class created to fix some annoyances with pathlib, such as its refusal to resolve mixed/repeating/trailing slashes."""

    def __new__(cls, *args: PathElem, **kwargs):
        return args[0] if len(args) == 1 and isinstance(args[0], cls) else super().__new__(cls, *cls.parse_args(args), **kwargs)

    def __init__(self, *args, _called_from_pathlib=True):
        """Initializes a path object. This is used to unify python 3.7-3.11 with most of python 3.12's changes.

        Args:
        ----
            *args (os.PathLike | str): the path parts to join and create a path object out of.

        Returns:
        -------
            A constructed Path object

        Processing Logic:
        ----------------
            - Finds the next class in the MRO that defines __init__ and is not BasePurePath
            - Return immediately (do nothing here) if the next class with a __init__ is the object class
            - Gets the __init__ method from the found class
            - Parses args if called from pathlib and calls __init__ with parsed args
            - Else directly calls __init__ with passed args.
        """
        next_init_method_class = next(
            (cls for cls in self.__class__.mro() if "__init__" in cls.__dict__ and cls is not BasePurePath),
            self.__class__,
        )
        # Check if the class that defines the next __init__ is object
        if next_init_method_class is object:
            return

        # If not object, fetch the __init__ of that class
        init_method = next_init_method_class.__init__

        # Parse args if called from pathlib (Python 3.12+)
        if _called_from_pathlib:
            init_method(self, *self.parse_args(args))
        else:
            init_method(self, *args)

    @classmethod
    def parse_args(cls, args: tuple[PathElem, ...]) -> list[BasePurePath]:
        args_list = list(args)
        for i, arg in enumerate(args_list):
            if isinstance(arg, BasePurePath):
                continue  # do nothing if already our instance type
            formatted_path_str = cls._fix_path_formatting(cls._fspath_str(arg), cls._flavour.sep)  # type: ignore[attr-defined]

            # Create the pathlib class instance, ignore the type errors in super().__new__
            arg_pathlib_instance = super().__new__(cls, formatted_path_str)  # type: ignore[call-arg, reportGeneralTypeIssues]
            arg_pathlib_instance.__init__(formatted_path_str, _called_from_pathlib=False)  # type: ignore[misc]

            args_list[i] = arg_pathlib_instance

        return args_list  # type: ignore[return-value, reportGeneralTypeIssues]

    @classmethod
    def _create_instance(cls, *args, **kwargs):
        instance = cls.__new__(cls, *args, **kwargs)  # type: ignore  # noqa: PGH003
        instance.__init__(*args, **kwargs)
        return instance

    @staticmethod
    def _fspath_str(arg: object) -> str:
        """Convert object to a file system path string.

        Args:
        ----
            arg: Object to convert to a file system path string

        Returns:
        -------
            str: File system path string

        Processing Logic:
        ----------------
            - Check if arg is already a string
            - Check if arg has a __fspath__ method and call it
            - Raise TypeError if arg is neither string nor has __fspath__ method.
        """
        if isinstance(arg, str):
            return arg
        fspath_method = getattr(arg, "__fspath__", None)
        if fspath_method is not None:
            return fspath_method()
        msg = f"Object '{arg}' must be str or path-like object, but instead was '{type(arg)}'"
        raise TypeError(msg)

    # Call is_relative_to when using 'in' keyword
    def __contains__(self, other_path: os.PathLike | str) -> bool:
        return self.is_relative_to(other_path, case_sensitive=False)

    def __str__(self) -> str:
        """Call _fix_path_formatting before returning the pathlib class's __str__ result.
        In Python 3.12, pathlib's __str__ methods will return '' instead of '.', so we return '.' in this instance for backwards compatibility.
        """
        str_result = self._fix_path_formatting(super().__str__(), self._flavour.sep)  # type: ignore[attr-defined]  # _flavour exists in child classes
        return "." if str_result == "" else str_result

    def __fspath__(self) -> str:
        """Ensures any use of __fspath__ will call our __str__ method."""
        return str(self)
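
The excerpt above calls _fix_path_formatting without showing it. As a rough illustration of the kind of normalization described (mixed, repeated and trailing slashes), here is a minimal sketch. The name fix_path_formatting and its exact rules are assumptions made for this example, not the PyKotor implementation, and UNC paths (which start with two separators) are deliberately not handled.

```python
def fix_path_formatting(path_str: str, sep: str = "/") -> str:
    """Hypothetical helper: unify slashes, collapse repeats, drop a trailing separator."""
    if sep not in ("/", "\\"):
        raise ValueError(f"Unsupported separator: {sep!r}")

    # Convert both slash styles to the requested separator.
    unified = path_str.replace("\\", sep).replace("/", sep)

    # Collapse runs of separators into a single one.
    # NOTE: this would mangle UNC prefixes, which this sketch ignores.
    while sep * 2 in unified:
        unified = unified.replace(sep * 2, sep)

    # Drop trailing separators, but keep a bare root ("/") or drive root ("C:\").
    stripped = unified.rstrip(sep)
    if stripped and not stripped.endswith(":"):
        return stripped
    return unified
```

For instance, fix_path_formatting("a//b\\c/", "/") gives "a/b/c", while a bare "/" is returned unchanged.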

My question is: could any of this be implemented in pathlib itself, perhaps behind a global boolean or argument such as use_deprecated that lets users keep running older pathlib code? We went through a real hassle adding support for Python 3.12, and I can’t even imagine how much more difficult it will be for Python 3.13.

The code in path.py works for us, and we have many tests that ensure it produces the correct output across Python versions:
common/test_path_isinstance.py
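
For readers unfamiliar with the technique behind those isinstance tests, the metaclass hooks __instancecheck__ and __subclasscheck__ can be illustrated in isolation. This is a minimal sketch with made-up names (PathLikeMeta, AnyPurePath); it only shows the general mechanism and is much simpler than PurePathType above.

```python
import pathlib


class PathLikeMeta(type):
    """Hypothetical metaclass: customizes isinstance()/issubclass() for its classes."""

    def __instancecheck__(cls, instance):
        # isinstance(obj, cls) is answered by checking the object's type.
        return cls.__subclasscheck__(type(instance))

    def __subclasscheck__(cls, subclass):
        mro = subclass.__mro__
        # Real subclasses pass as usual; for the marker class itself,
        # any stdlib pure path type is also accepted.
        return cls in mro or (cls is AnyPurePath and pathlib.PurePath in mro)


class AnyPurePath(metaclass=PathLikeMeta):
    """Hypothetical marker class that stdlib path objects are treated as instances of."""


if __name__ == "__main__":
    print(isinstance(pathlib.PurePosixPath("a/b"), AnyPurePath))
    print(issubclass(pathlib.Path, AnyPurePath))
    print(isinstance("a/b", AnyPurePath))
```

Note that the hooks only run when the class on the right-hand side of isinstance() uses the metaclass, so a check against the stdlib classes themselves is not affected by this sketch.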


Hello, and welcome to this discussion group!

I’ve run into a similar problem once or twice, fixed by a tweak in my code, and I don’t even remember what the inconsistency was.

It might perhaps be more useful for the casual reader if you documented the inconsistencies with tiny examples: “This expression has this value in Python 3.8 but this value in Python 3.12 and this value on Windows.”
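
As a rough sketch of what such documentation could start from, a small script along these lines could be run under each interpreter and on each platform, and the outputs compared side by side. The SAMPLES list and the describe helper are made up for illustration; the pure path classes are used so that both flavours can be inspected on any operating system.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical sample inputs covering mixed, repeated and trailing slashes.
SAMPLES = ["", ".", "./", "/.", "a//b", "a/b/", "a\\b/c", "C:/a\\b"]


def describe(samples=SAMPLES):
    """Return (raw, posix_str, windows_str) rows for each sample input."""
    return [
        (raw, str(PurePosixPath(raw)), str(PureWindowsPath(raw)))
        for raw in samples
    ]


if __name__ == "__main__":
    # Print the interpreter version first so outputs can be diffed between runs.
    print(sys.version)
    for raw, posix_str, windows_str in describe():
        print(f"{raw!r:12} posix={posix_str!r:12} windows={windows_str!r}")
```

Diffing the printed tables from two interpreters would then show exactly which expressions changed.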

Thanks for bringing this up, this is potentially very valuable.


/cc @barneygale


Hi @th3w1zard1, please could you provide a reproduction case for the problem(s) you’re hitting? I haven’t managed to reproduce the problem with __fspath__() behaviour that you mentioned.

Python 3.12.1 (v3.12.1:2305ca5144, Dec  7 2023, 17:23:38) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, pathlib
>>> os.fspath(pathlib.Path())
'.'
>>> os.fspath(pathlib.Path(''))
'.'

> Hi @th3w1zard1, please could you provide a reproduction case for the problem(s) you’re hitting? I haven’t managed to reproduce the problem with __fspath__() behaviour that you mentioned.

Hi @barneygale ,
Hmm, yes, I am getting the same in my environment. However, I know that I was experiencing a problem and that I implemented this for a reason. The last line in this code block is what I’m referring to:

    def __str__(self) -> str:
        """Call _fix_path_formatting before returning the pathlib class's __str__ result.
        In Python 3.12, pathlib's __str__ methods will return '' instead of '.', so we return '.' in this instance for backwards compatibility.
        """
        str_result = self._fix_path_formatting(super().__str__(), self._flavour.sep)  # type: ignore[attr-defined]  # _flavour exists in child classes
        return "." if str_result == "" else str_result

I don’t see this edge case outlined in my test. Perhaps this was changed in a patch update? I really can’t recall at the moment. I’ll look through our commit history at some point when I find more time and see if I can figure out how to reproduce.

EDIT: I was not able to reproduce.

Python 3.12.0 (tags/v3.12.0:0fb18b0, Oct  2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathlib import PurePath
>>> PurePath("/")
PureWindowsPath('/')
>>> p = PurePath("/")
>>> str(p)
'\\'
>>> p = PurePath(" ")
>>> str(p)
' '
>>> p = PurePath("  ")
>>> str(p)
'  '
>>> p = PurePath(".")
>>> str(p)
'.'
>>> p = PurePath("...")
>>> str(p)
'...'
>>> p = PurePath("/.")
>>> str(p)
'\\'
>>> p = PurePath("./")
>>> str(p)
'.'
>>> p = PurePath(PurePath("."))
>>> str(p)
'.'
>>> p = PurePath(PurePath(""))
>>> str(p)
'.'

The tests in OP outline the remaining discrepancies.

> I’ve run into a similar problem once or twice, fixed by a tweak in my code, and I don’t even remember what the inconsistency was.

This was part of the reason I did not document the exact changes between Python versions. The code and tests were a real hassle to put together, so my main goal was to keep switching between Python versions and operating systems until I achieved some sort of unified result. But this is a good suggestion nonetheless; I’ll see what I can do.

I have implemented the class CaseAwarePath, which can resolve case-insensitive file and folder paths on Unix operating systems. The class relies on pathlib. If it is at all useful to this discussion, that code can be found here:

PyKotor/Libraries/PyKotor/src/pykotor/tools/path.py at master · NickHugi/PyKotor (github.com)
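
To give a feel for the general idea without reading the full source, here is a minimal sketch of case-insensitive resolution on a case-sensitive filesystem. The function resolve_case_insensitive is a made-up name for this example and is not how CaseAwarePath is implemented; it simply walks the path one component at a time and, when a component is missing, looks for a sibling whose name matches ignoring case.

```python
import os
from pathlib import Path


def resolve_case_insensitive(path):
    """Hypothetical helper: return an existing path matching `path` ignoring case, if any."""
    path = Path(path)
    # Start from the root/drive for absolute paths, or the current directory otherwise.
    current = Path(path.anchor) if path.anchor else Path()
    parts = path.parts[1:] if path.anchor else path.parts

    for part in parts:
        candidate = current / part
        if not candidate.exists() and current.is_dir():
            wanted = part.lower()
            # Pick the first entry (in sorted order) whose name matches ignoring case.
            match = next(
                (name for name in sorted(os.listdir(current)) if name.lower() == wanted),
                None,
            )
            if match is not None:
                candidate = current / match
        current = candidate

    # Components that could not be matched are kept as given.
    return current
```

If several entries differ only by case, this sketch picks the first in sorted order; a real implementation would need a more deliberate tie-breaking rule.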