Possibly invalid symlinks with relative paths in shutil.copytree destinations

Symlink pointing outside of the source directory in shutil.copytree

The issue

Recently, I have been poking around with file systems and operations with directories more. In a discussion with my friend, we identified a potential issue: symlinks with relative paths that point outside the source directory.

Imagine a source folder, something like /folder1/folder2/source which contains a symlink /folder1/folder2/source/link pointing to a relative path ../../folder3/file.py, the expansion of which would result in /folder1/folder3/file.py.

That’s okay and a valid symlink, as long as we don’t try copying it over to another folder as-is. Let’s say we want to copy the contents of our folder /folder1/folder2/source into /folder1/dest. This would result in a symlink /folder1/dest/link with the same relative path ../../folder3/file.py, which would, however, try to expand into a path of /file.py, which does not have to be present.

The root of the FS is an extreme example; the same could happen in any other part of the tree, e.g., you want to copy into /folder4/folder5/folder6 which would result in a symlink /folder4/folder5/folder6/link expanding into /folder4/file.py which doesn’t have to be present, or can be a completely different file.

Proposed change

I propose checking the paths of these symlinks and replacing them with the absolute path if they point outside the source folder with a relative path.

In the above examples, this would result in symlinks pointing to /folder1/folder3/file.py as an absolute path, even in the destination directories.

A function I came up with (pasted from my project, edited to be usable outside of the class):

    def symlink_in_source(source_path_abs, symlink_path) -> str:
        """Check if a symlink is pointing inside the source folder.

        Returns absolute path in case symlink points outside of source folder.

        Args:
            source_path: path to source folder
            symlink_path: path to symlink

        Returns:
            str: path to which the symlink should point
        """
        symlink_path_abs = os.path.abspath(os.path.join(source_path, symlink_path))

        logging.debug(f"Symlink absolute path: {symlink_path_abs}")

        if os.path.abspath(source_path) == os.path.commonpath(
            [os.path.abspath(source_path), symlink_path_abs]
        ):  # alternatively `os.path.abspath(source_path) in symlink_path_abs[:len(os.path.abspath(source_path))]`
            return symlink_path
        else:
            logging.debug(
                f"Symlink path leads outside of source folder, using absolute path: {symlink_path_abs}"
            )
            return symlink_path_abs

Backwards compatibility

I believe this change is backward compatible.

How can this be? Wouldn’t the exact same code given the same file structure containing a sym link outside the root result in different files afterwards?

The current behavior has the benefit of being simple, stupid and obvious. What you are describing is just one of a few “clever” solutions I can think of (raise an exception, materialize the symlink, copy the file even if outside of root). I am not sure if it’s a good idea to settle on one. Maybe this could be made configurable?

2 Likes

Maybe this could be made configurable?

It is already configurable, via the symlinks and ignore_dangling_symlinks parameters.

This might be more suited to a GitHub issue than a discourse thread, but could the OP give a specific example of the behaviour they consider buggy? I.e. python code, directory structure, actual and expected results.

In particular, the value of the symlinks and ignore_dangling_symlinks parameters change the desired and expected behaviour in this scenario significantly, so it is difficult to discuss without concrete examples that specify those.

Note there is an existing issue #91205, which might potentially be impacting you here.