On Linux you could check the inode of each file and verify that they are the same (i.e. the data is at the same location on the filesystem). In Python this is os.stat(file_path).st_ino, or the inode() method of an os.DirEntry object.
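As a minimal sketch, a helper that checks whether two paths are hard links to the same data (the function name is mine; the comparison includes st_dev as well, since inode numbers are only unique per filesystem):

```python
import os

def same_data(path_a: str, path_b: str) -> bool:
    """True if the two paths are hard links to the same data:
    same inode on the same device. Comparing st_ino alone is not
    enough, because inode numbers are only unique per filesystem."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_ino, a.st_dev) == (b.st_ino, b.st_dev)
```

This is essentially what os.path.samefile() does for you in the standard library.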
Okay, here is a demonstration of two venvs created with uv venv and then uv pip install numpy:
$ stat venv1/lib/python3.13/site-packages/numpy/__init__.py | grep node
Device: 811h/2065d Inode: 17715891 Links: 4
$ stat venv2/lib/python3.13/site-packages/numpy/__init__.py | grep node
Device: 811h/2065d Inode: 17715891 Links: 4
They both have the same inode, and the link count of 4 shows that this file is shared in 4 places.
Here is another venv created with python -m venv:
$ stat venv3/lib/python3.13/site-packages/numpy/__init__.py | grep node
Device: 811h/2065d Inode: 26872598 Links: 1
That’s a different inode, and it only has 1 link, so it is not shared anywhere else.
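The same check can be done from Python via st_nlink, which reports the hard link count shown by stat (the function name here is just for illustration):

```python
import os

def is_hardlink_shared(path: str) -> bool:
    """True if the file has more than one hard link, i.e. the same
    data is reachable under at least one other name."""
    return os.stat(path).st_nlink > 1
```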
uv will use either hard links or reflinks by default, depending on platform. Both of these options save space.
It would be awesome if uv also hard linked or reflinked pyc files. For one, this could mean faster startup in a lot of scenarios. Currently uv does not compile pyc on installs at all (pip does, which is one important way uv’s benchmarks against pip are not apples-to-apples). See also Speed up pyc compilation · Issue #2637 · astral-sh/uv · GitHub
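If you want the pyc files anyway, you can compile an environment yourself after installation with the standard library’s compileall module (the site-packages path below is hypothetical; point it at your own venv). I believe uv also has a --compile-bytecode option for uv pip install these days, but check the docs for your version:

```python
import compileall

# Hypothetical path: replace with your venv's site-packages directory.
site_packages = "venv1/lib/python3.13/site-packages"

# Compile every .py under the tree to __pycache__/*.pyc,
# so the first import doesn't pay the compilation cost.
compileall.compile_dir(site_packages, quiet=1)
```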
Awesome! Thanks for explaining, and for linking the relevant issue.
Looks like the situation is already pretty great, and only getting better
ls -l shows the link count in the second column. For an ordinary file,
if the link count is greater than 1, then there is another hard link to
the file somewhere.
You can also use ls -i to see the inode numbers. Two hard links to the
same file will have the same inode number. (But if two files have the
same inode number, they’re not necessarily the same file – they could
be on different file systems. You can use df to find out what file
system a file is on.)
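The same bookkeeping can be scripted: a sketch (function name is mine) that walks a tree and groups files by (device, inode), mirroring what ls -i shows, so hard-linked groups fall out directly:

```python
import os
from collections import defaultdict

def hardlink_groups(root: str) -> dict:
    """Group regular files under `root` by (device, inode).
    Keying on the device too avoids conflating files on different
    filesystems that happen to share an inode number."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            groups[(st.st_dev, st.st_ino)].append(path)
    return groups
```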
You can also disable writing the bytecode to disk entirely if it’s adding up that much. There’s a relatively small hit to performance on first import, which is more noticeable where Python is used for scripting or other short-lived tasks than for programs that run for a while once started. (I’ve got PYTHONDONTWRITEBYTECODE=1 set in my dev environment, though not in production.)
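The same switch is visible from inside the interpreter as sys.dont_write_bytecode, which is set when PYTHONDONTWRITEBYTECODE is in the environment or when Python is started with -B:

```python
import sys

# Reflects PYTHONDONTWRITEBYTECODE / the -B flag for this process.
print(sys.dont_write_bytecode)

# It can also be flipped at runtime, before the relevant imports happen:
# compilation still occurs in memory, but no __pycache__/*.pyc is written.
sys.dont_write_bytecode = True
```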