Expose file size, mode, etc from `pathlib.Path.info`

To date we’ve not been able to add seemingly-useful methods like pathlib.Path.size() and mode(), because they would wastefully throw away most of the os.stat_result. That was just the nature of the Path interface - none of its methods cache their results. Instead, I’ve encouraged users to call Path.stat() and pull out os.stat_result fields.
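For reference, a minimal sketch of that existing approach: one `stat()` call, with every value pulled from the same `os.stat_result`. The helper name `describe` and the choice of UTC for the timestamp are assumptions made for illustration only.

```python
import stat
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe(path: Path) -> dict:
    """Read size, permission bits and mtime with a single stat() call."""
    st = path.stat()  # one system call; every field below comes from it
    return {
        "size": st.st_size,
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only, no file type
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }


# Usage: write five bytes to a scratch file and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example.txt"
    example.write_bytes(b"hello")
    details = describe(example)
    print(details["size"], oct(details["mode"]), details["modified"])
```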

But in Python 3.14 we have a new Path.info attribute (discussion), which is an annex for Path objects that caches filesystem information. For example, calling p.info.is_dir() twice will incur at most one stat(). Under the bonnet, info is mostly a wrapper around a stat result, except for paths generated by scanning directories where we wrap an os.DirEntry instead.

Path.info has the following methods: exists(), is_file(), is_dir(), is_symlink(). These return booleans and suppress all OSError exceptions.

My suggestion is to add the following methods:

  • Path.info.size() - content size in bytes (int)
  • Path.info.mode() - 12-bit POSIX file mode (int)
  • Path.info.access_time() - time of last read (datetime)
  • Path.info.modify_time() - time of last write (datetime)
  • Path.info.create_time() - time of birth (datetime)

These would accept follow_symlinks as an optional keyword-only argument and cache their results.
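To make the caching behaviour concrete, here is a rough sketch of how such methods might share one cached stat result per `follow_symlinks` value. `CachedInfo` and its `stat_calls` counter are hypothetical names, not pathlib API, and returning timezone-aware UTC datetimes is an assumption the proposal does not settle.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


class CachedInfo:
    """Hypothetical stand-in for the proposed Path.info additions."""

    def __init__(self, path):
        self._path = path
        self._stats = {}  # follow_symlinks (bool) -> os.stat_result
        self.stat_calls = 0  # only here to make the caching observable

    def _stat(self, follow_symlinks):
        if follow_symlinks not in self._stats:
            self.stat_calls += 1
            self._stats[follow_symlinks] = os.stat(
                self._path, follow_symlinks=follow_symlinks)
        return self._stats[follow_symlinks]

    def size(self, *, follow_symlinks=True):
        return self._stat(follow_symlinks).st_size

    def mode(self, *, follow_symlinks=True):
        return stat.S_IMODE(self._stat(follow_symlinks).st_mode)

    def modify_time(self, *, follow_symlinks=True):
        mtime = self._stat(follow_symlinks).st_mtime
        return datetime.fromtimestamp(mtime, tz=timezone.utc)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.bin")
    with open(target, "wb") as f:
        f.write(b"x" * 10)
    info = CachedInfo(target)
    sizes = (info.size(), info.size())
    info.mode()
    info.modify_time()
    calls = info.stat_calls  # all four method calls share one os.stat()
```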

A few things I’m not sure about:

  • Should we suppress exceptions like the existing methods? If not, do we cache and re-raise exceptions, e.g. what happens if a user calls exists() and then size() on a non-existing file?
  • Is it alright for mode() to return an int, or would we prefer enum.IntFlag / similar?
  • Is “mode” a good name, or would we prefer “posix_permission” / similar? (I picked it mainly for consistency with the chmod() and mkdir() arguments)

Any other questions/criticism/ideas welcome of course.

Thanks all!

25 Likes

This would be most welcome, Barney.

For your bulleted questions:

  • I think exceptions must be re-raised, in a similar vein as for next() on an exhausted iterator
  • IntFlag would definitely be my preference
  • mode() seems brief and clear to me

For the methods, any reason to exclude birth_time and (Windows) file_attributes?

4 Likes

Can we add Path.info.stat() as well? It could be useful to get a raw timestamp, or some OS-specific info.

Maybe re-raise without caching, i.e. the next attempt will call stat() again? That seems easy to explain and implement.
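A small sketch of that "re-raise without caching" idea, assuming a hypothetical `RetryingInfo` class: successful stat results are stored, failures are not, so a later call gets another chance.

```python
import os
import tempfile


class RetryingInfo:
    """Hypothetical info object that caches successes but never failures."""

    def __init__(self, path):
        self._path = path
        self._stat_result = None

    def size(self):
        if self._stat_result is None:
            # If os.stat() raises, nothing is stored, so the next call
            # simply tries again.
            self._stat_result = os.stat(self._path)
        return self._stat_result.st_size


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "late.txt")
    info = RetryingInfo(target)
    try:
        info.size()
        first_attempt = "ok"
    except FileNotFoundError:
        first_attempt = "missing"
    with open(target, "wb") as f:
        f.write(b"abc")
    second_attempt = info.size()  # stat() runs again and now succeeds
```

The trade-off is that a path which never exists costs one system call per query, which may or may not matter for the intended use cases.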

Yeah, IMO it’s fine for mode, and…

that might be a good name for a future addition that returns some rich object, if we need an object that isn’t compatible with int.

2 Likes

Thanks both!

I’ll experiment with using enum.IntFlag for mode().

Good point on birth_time() - I’ll add that to the interface.

It’s difficult to know where to draw the line, but I reckon file_attributes() is perhaps too OS-specific and low-level for most pathlib users. I’ve excluded a wrapper of st_flags for the same reason.

I’m a little reluctant because we already have Path.stat(), and that method doesn’t throw away information like Path.exists() does.

Perhaps we could look at adding it later, if there’s demand?

I appreciate the suggestions nonetheless :slight_smile:

4 Likes

Connecting a similar parallel thread around adding os.statx (Linux system call): gh-83714: Use statx on Linux 4.11 and later in os.stat by jbosboom · Pull Request #136334 · python/cpython · GitHub

Would be nice to expose the newer statx data and/or some wrapper which makes stat, statx, etc. all look/feel more normalized. Not sure how likely it is that more kernels will expose something statx-like though (can request subsets of information and some choices around how cached the data can be)

2 Likes

Maybe, but I could say the same about mode :slightly_smiling_face: Generally I find that the Unix-focused file mode is too simplistic to use on Windows, but it’s an “attractive nuisance” for people who are looking for a quick answer.

I’m not seriously suggesting we omit mode, of course, but I do think your argument against file_attributes is weak. (A stronger argument, that I can’t really dispute, is that the file_attributes fields in the stat object are rarely used.)

1 Like

Playing around with using enum.IntFlag for the mode() return value, and the result is pretty nice I think:

>>> from pathlib import Path
>>> path = Path('README.rst')
>>> path.info.mode()
<Mode.USER_READ|USER_WRITE|GROUP_READ|GROUP_WRITE|OTHER_READ: 436>
>>> from pathlib.types import Mode
>>> path.chmod(path.info.mode() | Mode.USER_EXECUTE)
>>> path = Path(path)  # discard info cache
>>> path.info.mode()
<Mode.USER_READ|USER_WRITE|USER_EXECUTE|GROUP_READ|GROUP_WRITE|OTHER_READ: 500>
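For anyone wanting to try this at home, a flag type along these lines can be approximated with the `stat` constants. This is only a guess at the shape: the member names are copied from the REPL output, only the nine permission bits are covered, and it is not the actual experimental `pathlib.types.Mode`.

```python
import enum
import stat


class Mode(enum.IntFlag):
    """Hypothetical flag names modelled on the REPL output above."""
    USER_READ = stat.S_IRUSR
    USER_WRITE = stat.S_IWUSR
    USER_EXECUTE = stat.S_IXUSR
    GROUP_READ = stat.S_IRGRP
    GROUP_WRITE = stat.S_IWGRP
    GROUP_EXECUTE = stat.S_IXGRP
    OTHER_READ = stat.S_IROTH
    OTHER_WRITE = stat.S_IWOTH
    OTHER_EXECUTE = stat.S_IXOTH


before = Mode(0o664)
after = before | Mode.USER_EXECUTE  # still an int, so chmod() accepts it
print(int(before), int(after))
```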
6 Likes

Love the idea.

But, about the mode, isn’t this bit-array just a detail of the underlying representation? Why not just expose all of the individual flags as Boolean methods? I think that will make the client code that uses this information much more legible (which is very important) at the cost of making pathlib slightly longer (which is much less important).

4 Likes

Are the timestamps lossy seconds-floats or exact nanoseconds-ints? (pathlib doesn’t have any means to set times except for Path.touch(), so maybe roundtripping is out-of-scope. The recent multigrain timestamp changes in Linux 6.13 allow timestamps with granularity better than a jiffy; not sure if it’s good enough for float imprecision to fool make-like scripts.)
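To illustrate the precision concern, here is plain arithmetic on a made-up nanosecond value of the kind `st_mtime_ns` reports. Converting it to float seconds (the shape of `st_mtime`) and back does not recover the original integer; a `datetime` would lose even more, since it only has microsecond resolution.

```python
# A made-up nanosecond timestamp, as os.stat_result.st_mtime_ns would report it.
exact_ns = 1_700_000_000_123_456_789

# Convert to float seconds and back again.
as_float_seconds = exact_ns / 1e9
round_tripped_ns = round(as_float_seconds * 1e9)

# The difference is small but non-zero: a double cannot hold 19 digits.
lost_ns = abs(round_tripped_ns - exact_ns)
print(lost_ns)
```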

Do you plan to include other types of files (FIFOs, block and character devices, Unix sockets)? At least according to MSDN GetFileType documentation, Windows has “disk”, character device and pipe files.

Regarding generalizing statx: Windows kinda supports querying a subset of information, but AIUI it’s limited to choosing between different structs; there’s no obvious way (besides failing the call) to indicate that some information is not available. I also couldn’t find anything analogous to statx’s force-sync/don’t-sync flag, which I find surprising given Windows’s long-term focus on network filesystems in large corporate intranets. Unless someone else finds something, I don’t see anything to do here.

Looking at the currently defined Windows attribute bits, macOS/BSD flags, and statx attribute bits, there are some commonalities:

| Windows | Mac | Linux | meaning |
|---|---|---|---|
| FILE_ATTRIBUTE_HIDDEN | UF_HIDDEN | filename starts with ‘.’ | only shown when user asks for them |
| FILE_ATTRIBUTE_COMPRESSED | UF_COMPRESSED | STATX_ATTR_COMPRESSED | storage size < nominal size |
| FILE_ATTRIBUTE_ENCRYPTED | ? | STATX_ATTR_ENCRYPTED | may require special permissions to read or modify |
| FILE_ATTRIBUTE_READONLY (sort of) | UF_IMMUTABLE | STATX_ATTR_IMMUTABLE | requires special permissions to modify |
| FILE_ATTRIBUTE_ARCHIVE (inverted, sort of) | UF_NODUMP | STATX_ATTR_NODUMP | exclude from backups that bother to check this flag |
| IO_REPARSE_TAG_MOUNT_POINT | statfs f_mntonname != f_mntfromname | STATX_ATTR_MOUNT_ROOT | what it says on the tin |

Hiddenness seems user-relevant enough to expose in a cross-platform way. Mount-root is useful on Linux because the traditional st_dev test does not detect bind mounts on the same filesystem and openat2 with RESOLVE_NO_XDEV may trigger automounts and auditing[1] – but I’m not sure that generalizes in a way useful to implement, e.g., rm --one-file-system on Windows or macOS too. (Also, mount points and mount roots are technically not the same thing.)
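As a rough illustration of what a cross-platform hiddenness check could look like, the sketch below combines the three conventions from the first table row. The function name `is_hidden` is hypothetical, and whether the leading-dot rule should also apply on Windows is a design question left open here.

```python
import os
import stat
import tempfile


def is_hidden(path):
    """Hypothetical cross-platform check built from the first table row."""
    st = os.stat(path, follow_symlinks=False)
    # Windows: st_file_attributes exists only there.
    if getattr(st, "st_file_attributes", 0) & stat.FILE_ATTRIBUTE_HIDDEN:
        return True
    # macOS/BSD: st_flags exists only on platforms with file flags.
    if getattr(st, "st_flags", 0) & stat.UF_HIDDEN:
        return True
    # Linux and other Unix: a naming convention rather than a flag.
    return os.name != "nt" and os.path.basename(path).startswith(".")


with tempfile.TemporaryDirectory() as tmp:
    visible = os.path.join(tmp, "notes.txt")
    dotfile = os.path.join(tmp, ".secret")
    for name in (visible, dotfile):
        open(name, "w").close()
    visible_is_hidden = is_hidden(visible)
    dotfile_is_hidden = is_hidden(dotfile)
```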

(For the rest: the storage size of a file can differ from the nominal size also if the file has holes[2]; getting encryption keys or clearing the immutable bit requires OS-level interaction anyway; approximately zero programs check the nodump flag.)


  1. and isn’t (yet?) available to Python

  2. and when considering the total storage of a set of files, also due to extent sharing

2 Likes

I’m torn.

Explaining how permission bits work in Unix to new hires at work is always cumbersome. The use of IntFlag is really slick and explains the concept nicely imo. However, the explicitness of having 9 distinct properties for getting and setting each bit removes the burden of learning the bits completely.

It’s an interesting point. I suppose the main advantage of a bit array is compatibility with os.chmod(), os.mkdir(), and other such functions that accept a mode. Also the ability to manipulate the value before passing it back to chmod() as in my previous example.

If we dropped the mode int, presumably we’d have methods like path.info.executable() instead. I think that could be a little misleading, as there are other factors that can prevent the execution of a file beside its mode bits.
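The distinction can be sketched with two small helpers (both names are hypothetical): one only tests the mode bits, the other asks the OS via `os.access()`, which takes the calling user into account. Even `os.access()` is only an approximation, since mount options and similar factors can still block execution.

```python
import os
import tempfile


def has_execute_bit(path):
    """True if any of the owner/group/other execute bits is set."""
    return bool(os.stat(path).st_mode & 0o111)


def can_execute(path):
    """Ask the OS whether the current process may execute the file."""
    return os.access(path, os.X_OK)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "run.sh")
    with open(script, "w") as f:
        f.write("#!/bin/sh\necho hi\n")
    os.chmod(script, 0o644)
    bit_before = has_execute_bit(script)
    os.chmod(script, 0o755)
    bit_after = has_execute_bit(script)
    allowed = can_execute(script)  # may still differ from the bit test
```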

1 Like

Did you also consider something like this? (Can probably still be an int subclass)

>>> path.info.mode()
Modes(user=<Mode.READ|WRITE|EXECUTE: 7>, group=<Mode.READ|WRITE: 6>, other=<Mode.READ: 4>)
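One way such a grouped value might be built, as a sketch only: a three-bit `IntFlag` per class of user, wrapped in a named tuple. The `from_int` helper and the `__int__` conversion are assumptions added for illustration; the sketch does not subclass int.

```python
import enum
from typing import NamedTuple


class Mode(enum.IntFlag):
    READ = 4
    WRITE = 2
    EXECUTE = 1


class Modes(NamedTuple):
    """Hypothetical grouping of a POSIX mode into three triplets."""
    user: Mode
    group: Mode
    other: Mode

    @classmethod
    def from_int(cls, mode):
        # Split the low nine bits into user, group and other triplets.
        return cls(Mode((mode >> 6) & 7), Mode((mode >> 3) & 7), Mode(mode & 7))

    def __int__(self):
        # Recombine into a plain int suitable for os.chmod().
        return (int(self.user) << 6) | (int(self.group) << 3) | int(self.other)


modes = Modes.from_int(0o764)
print(modes)
```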
2 Likes

It seems to me that making the parts of the mode attribute into methods is a bad idea, in the sense that we should add methods based on use cases, not just wrapping some bit tests (which aren’t sufficient on their own to address the underlying use cases). If you wanted to add a path.info.is_executable() method, I’d be fine with that, as long as it did what it says, and checks whether a file is executable. As you noted, though, that’s not the same as just checking a mode bit (not even on Unix, and much less so on Windows).

The use case for having a mode attribute is that “people want access to the POSIX mode” - which is a bit circular, but not worth worrying about given the huge historical precedent. But if we move away from providing just the mode and letting people do their own thing, I think we have to do it properly, and not just wrap a bit of bit checking in a method.

And personally, while I’d like accurate is_readable(), is_writeable() and is_executable() methods, I don’t think it’s worth the significant amount of effort needed to provide correct cross-platform implementations of them.

4 Likes

Just for clarity, I suggest a couple dataclasses for the mode with some useful factories for the most common cases (saves users from having to look things up):

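from __future__ import annotations  # lets FilePermissions appear in its own annotations

from dataclasses import dataclass


@dataclass
class BasicPermission:
    read: bool
    write: bool
    execute: bool


@dataclass
class FilePermissions:
    owner: BasicPermission
    group: BasicPermission
    others: BasicPermission

    def as_int(self) -> int:
        # One possible implementation: pack the nine rwx bits, e.g. 0o644.
        mode = 0
        for who in (self.owner, self.group, self.others):
            mode = (mode << 3) | (who.read << 2) | (who.write << 1) | who.execute
        return mode

    @classmethod
    def from_int(cls, mode: int, /) -> FilePermissions:
        # One possible implementation: unpack the owner, group and others
        # triplets from the low nine bits.
        triplets = [
            BasicPermission(bool(bits & 4), bool(bits & 2), bool(bits & 1))
            for bits in ((mode >> 6) & 7, (mode >> 3) & 7, mode & 7)
        ]
        return cls(*triplets)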

    @classmethod
    def public_file(cls) -> FilePermissions:
        """
        644 → rw-r--r--
        Owner can read/write, group and others can read. Common for text 
        files, configs, documents.
        """
        return FilePermissions.from_int(0o644)

    @classmethod
    def private_file(cls) -> FilePermissions:
        """
        600 → rw-------
        Only owner can read/write. Used for private files (SSH keys,
        credentials).
        """
        return FilePermissions.from_int(0o600)

    @classmethod
    def shared_file(cls) -> FilePermissions:
        """
        664 → rw-rw-r--
        Owner and group can read/write, others can only read. Used in
        collaborative environments.
        """
        return FilePermissions.from_int(0o664)

    @classmethod
    def executable_script(cls) -> FilePermissions:
        """
        755 → rwxr-xr-x
        Owner can read/write/execute, others can read/execute. For executable
        scripts.
        """
        return FilePermissions.from_int(0o755)

    @classmethod
    def system_directory(cls) -> FilePermissions:
        """
        755 → rwxr-xr-x
        Owner can read/write/enter, others can read/enter but not write. Very
        common for system dirs like /usr, /bin.
        """
        return FilePermissions.from_int(0o755)

    @classmethod
    def private_directory(cls) -> FilePermissions:
        """
        700 → rwx------
        Only owner can access. Used for private directories like ~/.ssh.
        """
        return FilePermissions.from_int(0o700)

    @classmethod
    def shared_directory(cls) -> FilePermissions:
        """
        775 → rwxrwxr-x
        Owner and group have full access, others can read/enter. Used in
        shared group dirs.
        """
        return FilePermissions.from_int(0o775)

    @classmethod
    def temporary_directory(cls) -> FilePermissions:
        """
        777 → rwxrwxrwx
        Everyone has full access. Rare and generally unsafe, but sometimes
        used for temporary dirs like /tmp.
        """
        return FilePermissions.from_int(0o777)

Then, we could allow users to use the FilePermissions in place of integers:

Path.chmod(self, mode: int | FilePermissions, ...) -> None: ...
Path.mkdir(self, mode: int | FilePermissions, ...) -> None: ...

This allows users to sidestep the (what I feel are) anachronistic octal values and code in an arguably more Pythonic, straightforward interface. E.g.,

if path.info.permissions().owner.execute: ...
path.chmod(FilePermissions.executable_script())
path.mkdir(FilePermissions.private_directory())
2 Likes

This already has a standard name: stat.S_IRUSR

IMO, it’d be best to expose all of stat(), and/or give the mode as int. Those have existing infrastructure around them (even if it is clunky and decades old), and are easy to wrap if someone wants something better.

That method doesn’t cache, though. If 3 different libraries build 3 different easy-to-use wrappers around Path.info.stat(), they all get cached info. With only Path.stat(), caching becomes complicated even for a single library.
That’s why I think this should expose the “source” data, even if the format is friendlier to the (Unix) kernel than to users. Adding more complicated representations would be a bonus if we get them right, and useless if we don’t.

2 Likes

I would prefer a more structured object over the stat interface and permissions int, as that would be easier to adhere to in implementations of Path on non-posix “filesystems”. If you intend only to add info to PosixPath, then I guess this point is moot

2 Likes

If you’re thinking of virtual filesystems here (archives, cloud storage, etc) then I think an int is an OK choice. .tar, .zip[1] and .iso files store POSIX permissions as-is without generalising into another format. IIRC Git only supports setting the executable bit, but still uses octal permissions in e.g. git fast-import, perhaps elsewhere. For those formats at least, implementing a PathInfo.mode() method is simplest if it returns an int.


  1. shifted left by 16 bits
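The zip case can be demonstrated with the standard `zipfile` module: the POSIX mode sits in the high 16 bits of `ZipInfo.external_attr`. The archive below is built in memory purely for illustration.

```python
import io
import stat
import zipfile

# Build a one-member archive in memory with POSIX mode 0o640 recorded.
member = zipfile.ZipInfo("notes.txt")
member.external_attr = (stat.S_IFREG | 0o640) << 16  # mode lives in the high 16 bits

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(member, b"hello")

# Read it back and recover the permission bits as a plain int.
with zipfile.ZipFile(buffer) as archive:
    stored = archive.getinfo("notes.txt")
    mode = stat.S_IMODE(stored.external_attr >> 16)

print(oct(mode))
```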

For some prior art, Rust’s Path offers a similar concept via Path::metadata. Notably, Rust’s Path type has no direct stat() equivalent precisely because of the platform-specific nature of it. Instead, its Metadata API only exposes the cross-platform bits:

  1. is_dir() → bool
  2. is_file() → bool
  3. is_symlink() → bool
  4. len() → int [1] – file size
  5. modified() → datetime [1][2] – Unix mtime or Windows ftLastWriteTime
  6. accessed() → datetime [1][2] – Unix atime or Windows ftLastAccessTime
  7. created() → datetime [1][2] – Unix statx.btime (kernel ≥ 4.11)/birthtime or Windows ftCreationTime
  8. permissions() → Permissions – a struct with a single cross-platform bit of information, readonly() → bool. On Windows, this checks FILE_ATTRIBUTE_READONLY, and on Unix, it returns true if none of the owner/group/other write bits are set.
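The readonly() rule in the last item translates to Python fairly directly. The sketch below works on an `os.stat_result`; the function name `is_readonly` is hypothetical, and it approximates the described semantics rather than reproducing Rust's implementation.

```python
import os
import stat
import tempfile


def is_readonly(st):
    """Approximate the readonly() rule described above from an os.stat_result."""
    attributes = getattr(st, "st_file_attributes", None)
    if attributes is not None:
        # Windows: consult the read-only attribute.
        return bool(attributes & stat.FILE_ATTRIBUTE_READONLY)
    # Unix: read-only when no owner/group/other write bit is set.
    return (st.st_mode & 0o222) == 0


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "report.txt")
    open(target, "w").close()
    os.chmod(target, 0o444)
    locked = is_readonly(os.stat(target))
    os.chmod(target, 0o644)  # restore write access so cleanup succeeds
    unlocked = is_readonly(os.stat(target))
```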

Because pathlib.Path already provides stat(), I agree with @encukou that exposing a cached stat() via Path.info makes sense. If Path lacked stat(), I’d be against adding it to PathInfo.[3]

However, I’m not convinced about a dedicated PathInfo.mode(). I’d rather see a higher-level, cross-platform object akin to Rust’s Permissions.

One possible design, which aligns with the fact that Path.info already returns different types at runtime (_WindowsPathInfo on Windows, _PosixPathInfo on POSIX), would be:

  • Path.info.permissions always returns a cross-platform Permissions object with only universally supported bits (like readonly() -> bool).
  • _PosixPathInfo.permissions returns a PosixPermissions object with full POSIX mode information.
  • _WindowsPathInfo.permissions returns a WindowsPermissions object with Windows-specific flags.

Users can then match/isinstance to get the platform-specific API. This makes it so that platform-specific API is an explicit opt-in and avoids polluting a cross-platform object with platform-specific methods.[4]
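A sketch of how that opt-in might look from the caller's side. All three classes, their constructors and the `file_attributes()` method are hypothetical stand-ins; only the layering (a portable base with platform subclasses) is taken from the design described here.

```python
import stat


class Permissions:
    """Cross-platform part: only what every platform can answer."""

    def __init__(self, readonly):
        self._readonly = readonly

    def readonly(self):
        return self._readonly


class PosixPermissions(Permissions):
    def __init__(self, mode):
        super().__init__((mode & 0o222) == 0)
        self._mode = mode

    def mode(self):
        return self._mode


class WindowsPermissions(Permissions):
    def __init__(self, attributes):
        super().__init__(bool(attributes & stat.FILE_ATTRIBUTE_READONLY))
        self._attributes = attributes

    def file_attributes(self):
        return self._attributes


def describe(permissions):
    # Portable code uses only readonly(); platform detail is an explicit opt-in.
    summary = "read-only" if permissions.readonly() else "writable"
    if isinstance(permissions, PosixPermissions):
        summary += f", mode {permissions.mode():o}"
    elif isinstance(permissions, WindowsPermissions):
        summary += f", attributes {permissions.file_attributes():#x}"
    return summary


posix_summary = describe(PosixPermissions(0o644))
windows_summary = describe(WindowsPermissions(stat.FILE_ATTRIBUTE_READONLY))
```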

Alternatively, if you think the number of $Type, Windows$Type, and Posix$Type is getting ridiculous, we can just put methods directly on .info instead of .info.permissions.


  1. I’ve deliberately written these signatures using the closest Python types to make them more comprehensible.

  2. In Rust, these time methods can fail and return an error; I’ve left that detail out to keep the comparison straightforward.

  3. Although, personally, I would prefer to see Path.stat() deprecated from Path and available only on PosixPath, I am aware this might be unrealistic at this point, so we have to live with it.

  4. Rust itself uses a similar pattern for platform-specific metadata: std::fs::Metadata provides the cross-platform bits, while std::os::unix::fs::MetadataExt, std::os::windows::fs::MetadataExt, and std::os::unix::fs::PermissionsExt add OS-specific bits through its trait system rather than separate classes.

6 Likes

Very nice, thanks @Monarch. Quite tempting to lift the rust API almost as-is!

I like that readonly() gets its own method, as the one piece of cross-platform information. The Unix-specific PermissionsExt has mode(), set_mode() and from_mode() methods but nothing more specific, which tbh I also like.

Rust’s chmod()-equivalent set_permissions() accepts a Permissions object. We’re at something of a disadvantage in that our os.chmod() and os.mkdir() functions (which are used by pathlib) accept integer permissions, even on Windows. So perhaps our WindowsPathPermissions class needs a mode() function, but it’s faked entirely from the readonly() data, very roughly:

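import os


class PathPermissions:
    # Assumed common base class, implied by the isinstance() check below.
    pass


class WindowsPathPermissions(PathPermissions):
    def __init__(self, readonly):
        self._readonly = readonly

    def readonly(self):
        return self._readonly

    def mode(self):
        # Faked entirely from the read-only flag.
        return 0o444 if self._readonly else 0o666


class PosixPathPermissions(PathPermissions):
    def __init__(self, mode):
        self._mode = mode

    def readonly(self):
        # Read-only when no owner/group/other write bit is set.
        return (self._mode & 0o222) == 0

    def mode(self):
        return self._mode


class Path:
    def chmod(self, mode):
        # Accept either a permissions object or a plain int.
        if isinstance(mode, PathPermissions):
            mode = mode.mode()
        os.chmod(self, mode)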

edit: alternatively we put it in a utility function that’s called from Path.chmod() and Path.mkdir():

def _get_posix_mode(obj):
    if isinstance(obj, PosixPathPermissions):
        return obj.mode()
    if isinstance(obj, PathPermissions):
        return 0o444 if obj.readonly() else 0o666
    if isinstance(obj, int):
        return obj
    raise TypeError(...)

On stat():

Path.info is currently documented as a pathlib.types.PathInfo object. That protocol will eventually be helpful for people building Path-like classes for virtual filesystems, and asking those folks to create an os.stat_result-ish object is annoying/borderline nonsensical.

So we’ll need to change the documentation to describe the actual return type[1], probably exposed as pathlib.PathInfo (though that name conflicts with the protocol’s). I guess I’ll move the fledgling pathlib.types.PathInfo protocol documentation to a new page to avoid similar information appearing twice. All this is fine I think. I’ll try to get a PR together.


  1. more likely, a non-protocol base type that can’t be instantiated by users
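To show why a stat-free protocol suits virtual filesystems, here is a sketch of a protocol carrying only the four boolean methods listed for Path.info, plus an implementation for an imaginary archive member. `PathInfoLike` and `ArchiveMemberInfo` are hypothetical names, and the keyword-only `follow_symlinks` parameters are an assumption about the signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class PathInfoLike(Protocol):
    """Hypothetical protocol with the four methods listed for Path.info."""

    def exists(self, *, follow_symlinks: bool = True) -> bool: ...
    def is_dir(self, *, follow_symlinks: bool = True) -> bool: ...
    def is_file(self, *, follow_symlinks: bool = True) -> bool: ...
    def is_symlink(self) -> bool: ...


class ArchiveMemberInfo:
    """Info for a member of an imaginary archive; there is no stat result."""

    def __init__(self, is_directory: bool):
        self._is_directory = is_directory

    def exists(self, *, follow_symlinks: bool = True) -> bool:
        return True

    def is_dir(self, *, follow_symlinks: bool = True) -> bool:
        return self._is_directory

    def is_file(self, *, follow_symlinks: bool = True) -> bool:
        return not self._is_directory

    def is_symlink(self) -> bool:
        return False


member = ArchiveMemberInfo(is_directory=False)
print(isinstance(member, PathInfoLike), member.is_file())
```

Nothing in the implementation needs to fabricate an os.stat_result, which is the point being made about keeping stat() off the protocol.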

1 Like