Protocol for virtual filesystem paths

ISTM that os.PathLike is doing double-duty:

  1. Majority usage: a local path with a string/bytes representation
    • e.g. when passed to io.open(), os.mkdir(), etc
  2. Minority usage: any kind of path with a string/bytes representation
    • e.g. when passed to os.path.join(), os.path.split(), etc

In some situations the distinction might matter. For example, it’s perfectly legitimate to take the posixpath.basename() of a zip file member path, but much less advisable to call posixpath.islink() in the same context. Yet the function signatures don’t reflect the difference. Another such case is pathlib.PurePath, which “provides path-handling operations which don’t actually access a filesystem” according to its docs, yet provides a __fspath__() method that enables those filesystem-accessing operations:

>>> Path('...').unlink()        # success
>>> PurePath('...').unlink()    # error: no such method.
>>> os.unlink(Path('...'))      # success
>>> os.unlink(PurePath('...'))  # success... wait PurePath supports I/O?!

IMHO the last case here should (eventually) be an error, but to do that we’d need to deprecate/remove PurePath.__fspath__() (and make it a Path-only thing), and if we did that, then purely lexical functions like os.path.join() would begin rejecting PurePath objects for no good reason.
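For anyone who wants to check the current behaviour without touching a disk, here is a minimal sketch using only the stdlib. It uses PurePosixPath and posixpath so the results are the same on every platform; the path itself is a made-up example.

```python
import os
import posixpath
from pathlib import PurePosixPath

p = PurePosixPath("some/file.txt")  # illustrative path, never accessed on disk

# PurePath has no I/O methods of its own...
has_unlink = hasattr(p, "unlink")

# ...but it implements __fspath__(), so it counts as os.PathLike,
# which is the only thing functions like os.unlink() look for.
is_pathlike = isinstance(p, os.PathLike)
as_str = os.fspath(p)

# Purely lexical functions accept it through the very same protocol.
joined = posixpath.join(p, "child")

print(has_unlink, is_pathlike, as_str, joined)
```

So the one dunder is what lets the object into both the lexical functions and the I/O functions.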

This suggests to me we need something like os.fspath() / os.PathLike / __fspath__() but for pure/virtual paths. I humbly suggest os.vfspath() / os.VirtualPathLike / __vfspath__(), where VirtualPathLike is a superclass of PathLike. The os.path functions listed in the first half of this GH issue could be made to call os.vfspath(), as they do only lexical work.
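To make the idea concrete, here is a rough sketch of how it might look. Nothing below exists in the stdlib: vfspath(), VirtualPathLike, __vfspath__() and the ZipMemberPath example class are all hypothetical, and the fallback to __fspath__() is just my reading of "VirtualPathLike is a superclass of PathLike".

```python
import abc
import os
import posixpath
from pathlib import PurePosixPath


class VirtualPathLike(abc.ABC):
    """Hypothetical ABC: anything with __vfspath__() or __fspath__()."""

    @abc.abstractmethod
    def __vfspath__(self):
        raise NotImplementedError

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is VirtualPathLike:
            # Assumption: every os.PathLike is also virtual-path-like.
            if hasattr(subclass, "__vfspath__") or hasattr(subclass, "__fspath__"):
                return True
        return NotImplemented


def vfspath(path):
    """Hypothetical counterpart to os.fspath() for lexical-only use."""
    if isinstance(path, (str, bytes)):
        return path
    path_type = type(path)
    if hasattr(path_type, "__vfspath__"):
        result = path_type.__vfspath__(path)
    elif hasattr(path_type, "__fspath__"):
        result = path_type.__fspath__(path)
    else:
        raise TypeError(f"expected str, bytes or virtual path, not {path_type.__name__}")
    if not isinstance(result, (str, bytes)):
        raise TypeError(f"expected str or bytes from {path_type.__name__}")
    return result


class ZipMemberPath:
    """Hypothetical virtual path: usable lexically, rejected by os.fspath()."""

    def __init__(self, name):
        self._name = name

    def __vfspath__(self):
        return self._name


member = ZipMemberPath("dir/member.txt")
base = posixpath.basename(vfspath(member))    # lexical work is fine
local_ok = vfspath(PurePosixPath("x/y"))      # PathLike objects still accepted

try:
    os.fspath(member)                         # what os.unlink() etc. would do
    rejected = False
except TypeError:
    rejected = True
```

The point of the split is in the last few lines: the virtual path can be handed to lexical code via vfspath(), while anything that goes through os.fspath() refuses it.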

I bring this up because, in the pathlib ABCs I’m developing (discussion, pre-PEP), I declare JoinablePath.__str__() as an abstract method. I can’t use __fspath__() because this type shouldn’t be accepted by os.unlink() etc. But __str__() doesn’t feel quite right either - for one thing, users might want a string representation that isn’t just a path - perhaps they’d like to use a URL instead for example. I believe what I really need is a __vfspath__() dunder to serve as an abstract method in JoinablePath.

Thank you for reading. I’d love to hear your thoughts/concerns/ideas!

P.S. Deprecating PurePath.__fspath__() in favour of PurePath.__vfspath__() might not be practical even if we agree it’s desirable from a typing perspective (and I’m not taking that for granted either!). I bring it up as a class that will be familiar to many readers; my immediate concern is with JoinablePath though.


I agree that lexical and filesystem paths are being conflated: PurePath.__fspath__() allowing I/O is indeed misleading. A separate os.vfspath() could help, but how would it handle cases where a path can be both lexical and filesystem-backed, such as symlink resolution in virtual filesystems?

that is:

    @abstractmethod
    def __str__(self):
        """Return the string representation of the path, suitable for
        passing to system calls."""
        ...

I don’t think this can stay. “Passing to system calls” is only applicable to local filesystem paths. A JoinablePath does not, in general, have a representation you can pass to system calls.
You can “downcast” to a local filesystem path, and that could be implicit – “we trust the programmer to only call __str__ if it makes sense”[1]. It would be good to be explicit about where the downcast happens, conceptually, between the JoinablePath and the syscall. Does __str__ do that (as the docstring above suggests), or do os.unlink/open do it?

In my mental model, it happens when an os function interprets its argument. That is:

>>> os.unlink(PurePath('...'))  # success... wait PurePath supports I/O?!

No, PurePath does not support I/O. You’re calling os.unlink, which interprets its input as a path on the real local filesystem.
If you do open(secrets.token_hex(), "w"), the token doesn’t support I/O either.
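A small sketch of that point, run inside a temporary directory so nothing is left behind (the temporary directory and the os.path.join() are my additions for tidiness; the original one-liner would write into the current directory instead):

```python
import os
import secrets
import tempfile

token = secrets.token_hex()  # a random hex string; it knows nothing about files

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, token)
    # open() is what decides to treat the string as a local filesystem path.
    with open(target, "w") as f:
        f.write("data")
    created = os.path.exists(target)

print(type(token).__name__, created)
```

The interpretation lives in open(), not in the str.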


it’s perfectly legitimate to take the posixpath.basename() of a zip file member path

Yes, but taking the os.path.basename of it would be, conceptually, wrong.[2]
The reason you can use posixpath.basename() is that you know an extra detail about zip files: their member names behave enough like POSIX paths. You can’t use the function if you are working with truly generic JoinablePaths.
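To illustrate why the choice of parser module matters, here is a sketch comparing two flavours on the same string. The member name is made up, with a backslash deliberately included; ntpath stands in for "whatever os.path happens to be on some other platform".

```python
import ntpath
import posixpath

member = "docs/notes\\draft.txt"  # made-up zip member name containing a backslash

# With POSIX rules only '/' separates components.
posix_name = posixpath.basename(member)

# With Windows rules both '/' and '\\' separate components.
nt_name = ntpath.basename(member)

print(repr(posix_name), repr(nt_name))
```

The two answers differ, so os.path.basename() on a zip member name would give platform-dependent results, while posixpath.basename() always applies the same rules.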

Sounds like zip paths can use __str__ as defined above. Then, zip paths would be special (relative to JoinablePath) in that their __str__ gives you a posix-path-shaped string.
Or maybe they should get a __posixpath__ method, to make this explicit for type-checkers (and brains trying to construct a mental model).
But I don’t think it would help to add a generic __vfspath__: that sounds like it would give you a string you can use with a specific “parser” module, but it would erase info about which module to use.

Or did you mean __vfspath__ to remain tied to the path’s parser module? The class itself would pass it to its parser, but the user shouldn’t call it (since they shouldn’t use the path’s parser directly, there’s nothing they can do with an internal string representation)?


  1. This often works well – if your path is just an alphanumeric ASCII filename, you can treat it as any flavor of path, and implicit conversions simplify the work!

  2. For what’s conceptually correct, it helps to think about a hypothetical platform whose os.path uses, say, $ for path separators. (And assuming all the places where Python assumes / & \ were properly ported.)


Thanks Petr. Indeed, the JoinablePath.__str__() docstring is misleading. I’m sorry I didn’t spot this before posting! I guess it should be along these lines:

    @abstractmethod
    def __str__(self):
        """Return the string representation of the path, suitable for
        passing to lexical path functions like os.path.join()."""
        ...

open(secrets.Token(), "w") is presumably a TypeError though?

I don’t have good answers for these, but thank you for giving me things to ponder :)


AttributeError, since secrets.Token() doesn’t exist.
I wanted to point out that right now, any string can be interpreted as a local path.

I wonder if a segments abstract property might help? We could define _JoinablePath.__str__() as a stub method like:

from abc import ABC, abstractmethod


class _JoinablePath(ABC):
    """Abstract base class for pure path objects."""
    __slots__ = ()

    @property
    @abstractmethod
    def parser(self):
        """Implementation of pathlib.types.PathParser used for low-level
        path parsing and manipulation.
        """
        raise NotImplementedError

    @property
    @abstractmethod
    def segments(self):
        """Sequence of raw path segments supplied to the path initializer.
        """
        raise NotImplementedError

    @abstractmethod
    def with_segments(self, *pathsegments):
        """Construct a new path object from any number of path segments.
        Subclasses may override this method to customize how new path
        objects are created from methods like `iterdir()`.
        """
        raise NotImplementedError

    def __str__(self):
        """Return the string representation of the path."""
        if not self.segments:
            return ''
        return self.parser.join(*self.segments)

Might just be moving the problem around.
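For what it’s worth, here is how a concrete subclass might look under that design. This is only a sketch: the base class is a trimmed copy of the one above so the example runs on its own, ZipMemberPath is a hypothetical class (not part of pathlib or zipfile), and the posixpath module stands in for a PathParser implementation.

```python
import os
import posixpath
from abc import ABC, abstractmethod


class _JoinablePath(ABC):
    """Trimmed copy of the sketch above, just enough to run."""
    __slots__ = ()

    @property
    @abstractmethod
    def parser(self):
        raise NotImplementedError

    @property
    @abstractmethod
    def segments(self):
        raise NotImplementedError

    @abstractmethod
    def with_segments(self, *pathsegments):
        raise NotImplementedError

    def __str__(self):
        if not self.segments:
            return ''
        return self.parser.join(*self.segments)


class ZipMemberPath(_JoinablePath):
    """Hypothetical concrete path whose names follow POSIX rules."""
    __slots__ = ('_segments',)
    parser = posixpath  # assumption: a module with join() is enough here

    def __init__(self, *pathsegments):
        self._segments = pathsegments

    @property
    def segments(self):
        return self._segments

    def with_segments(self, *pathsegments):
        return type(self)(*pathsegments)


p = ZipMemberPath("docs", "notes", "draft.txt")
text = str(p)                 # built from segments via the parser
empty = str(ZipMemberPath())  # no segments -> empty string

try:
    os.fspath(p)              # no __fspath__(), so os functions reject it
    accepted_by_os = True
except TypeError:
    accepted_by_os = False
```

Here __str__() is derived rather than abstract, and the class never opts in to os.PathLike, so os.unlink() and friends would raise TypeError. A user could still pass str(p) by hand, which is probably the "moving the problem around" part.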

Why is this? Mac / Linux zipfiles support symbolic links.

Seems like we need to move the problem around if we don’t want a load-bearing __str__, and there’s a choice to make between __vfspath__ and segments.

posixpath.islink() checks for a link on the local filesystem, so it would be an error to pass in a zip file member path (ZipInfo.filename) and expect it to query the zip file data.

Still not getting it. You’re discussing how you’re updating Python to work in the future, right?
help(posixpath.islink) says

Help on function islink in module genericpath:

islink(path)
    Test whether a path is a symbolic link

so a future (3.14) zipfile library could implement this behind the os.path interface, instead of something like

  # Check if the file is a symlink (using Unix permissions)
  is_symlink = (zipinfo.external_attr >> 16) & 0o120000 == 0o120000
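As a runnable version of that check, here is a sketch that builds a small archive in memory and tests each entry with stat.S_ISLNK(). The helper name zipinfo_is_symlink() and the member names are made up for illustration, and it assumes the archive was written with Unix-style mode bits in the high 16 bits of external_attr.

```python
import io
import stat
import zipfile


def zipinfo_is_symlink(info):
    """Hypothetical helper: read the Unix file type from the ZipInfo."""
    mode = info.external_attr >> 16
    return stat.S_ISLNK(mode)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    link = zipfile.ZipInfo("link")
    link.create_system = 3  # 3 = Unix
    link.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(link, "target.txt")     # link data is the target path
    zf.writestr("target.txt", "hello")  # ordinary member

with zipfile.ZipFile(buf) as zf:
    results = {info.filename: zipinfo_is_symlink(info) for info in zf.infolist()}

print(results)
```

Note that the answer comes from the archive metadata; nothing here asks the local filesystem about a file called "link".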

I thought the discussion was about future implementations of the pathlib interface (because that’s what’s being designed as extensible). Adding extra implementations of os.path functions isn’t in the scope of the changes @barneygale is making (as I understand it).
