I came across something that was quite unexpected in my eyes.
# Code running on Windows 10, Python version 3.11.6
import os
import shutil
directory = r'W:\prod'
src = r"W:\test\fill_icon.png"
for file in os.scandir(directory):
    dest = os.path.join(r"W:\test", file)
    shutil.copy(src, dest)
I expected this small piece of code to create, in the test location, a file for each prod file name, filled with the fill_icon.png content. To my surprise it instead overwrote the production data with fill_icon.png. I don’t understand how this is correct behaviour. Can anyone take a look and say whether it should instead raise an error when joining a directory entry this way?
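os.scandir yields DirEntry objects, and each one stands for the full path of the entry (absolute here, because the directory you passed in was absolute), not just the filename. os.path.join restarts from the last absolute path it is given, ignoring everything before it: os.path.join('/x/y', '/a/b/d') returns '/a/b/d', not '/x/y/d'.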
In other words your code worked as expected.
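Here is a small sketch of that rule. It uses the ntpath and posixpath modules directly so the Windows and POSIX behaviour can both be checked on any machine; the name data.bin is made up for illustration.

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules, importable on any platform

# A relative second argument is appended to the first.
appended = ntpath.join(r"W:\test", "fill_icon.png")

# An absolute second argument discards everything before it.
# "data.bin" is a hypothetical file name.
replaced = ntpath.join(r"W:\test", r"W:\prod\data.bin")

# The same rule with POSIX paths, as in the example above.
posix_replaced = posixpath.join("/x/y", "/a/b/d")

print(appended)
print(replaced)
print(posix_replaced)
```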
I suspect what you wanted is os.path.join(r"W:\test", os.path.basename(file))
When I’m writing code like this I first run it printing out the src and dest to check my logic is as I expect.
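A rough sketch of that workflow, assuming the goal was one copy of src per file name found in the prod directory. The helper copy_over_names and its dry_run flag are my own invention, not anything from the standard library, and the demo runs in a throwaway temporary directory with made-up file names rather than on W:.

```python
import os
import shutil
import tempfile


def copy_over_names(src, names_dir, dest_dir, dry_run=True):
    """Plan (and optionally perform) one copy of src per file name in names_dir."""
    planned = []
    with os.scandir(names_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            # entry.name is the bare filename, so the join stays inside dest_dir.
            dest = os.path.join(dest_dir, entry.name)
            planned.append((src, dest))
            if dry_run:
                print(f"would copy {src} -> {dest}")
            else:
                shutil.copy(src, dest)
    return planned


# Demo with hypothetical files in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    prod = os.path.join(root, "prod")
    test = os.path.join(root, "test")
    os.mkdir(prod)
    os.mkdir(test)
    with open(os.path.join(prod, "report.txt"), "w") as f:
        f.write("production data")
    src = os.path.join(test, "fill_icon.png")
    with open(src, "w") as f:
        f.write("icon placeholder")

    # First pass only prints; second pass really copies.
    plan = copy_over_names(src, prod, test, dry_run=True)
    copy_over_names(src, prod, test, dry_run=False)

    expected_dest = os.path.join(test, "report.txt")
    with open(os.path.join(prod, "report.txt")) as f:
        prod_after = f.read()
    with open(expected_dest) as f:
        test_after = f.read()
```

Running the dry run first makes it obvious whether dest points into the test directory or back into prod before anything is written.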
I have noticed this before, but in a slightly different context. Would it be possible to explain why it is logical that os.path.join('/x/y', '/a/b/d') == '/a/b/d'?
The context where it ‘confused’ me is Path("/a/b/c/") / "/d/e/f/" == Path('/d/e/f').
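For reference, a minimal check of the pathlib case. It uses the pure path classes so it behaves the same on any platform; the Windows file name data.bin is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# An absolute right-hand operand replaces the left-hand path.
posix_result = PurePosixPath("/a/b/c/") / "/d/e/f/"

# The same happens with Windows-style paths ("data.bin" is a made-up name).
windows_result = PureWindowsPath(r"W:\test") / r"W:\prod\data.bin"

# A relative right-hand operand is appended as usual.
relative_result = PurePosixPath("/a/b/c/") / "d/e/f"

print(posix_result)
print(windows_result)
print(relative_result)
```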
This is my guess at why it works this way.
There are a few possible behaviours when an absolute path appears in the middle of the arguments:
1. Start the join again, as os.path.join does
2. Raise an exception
3. Something else, which is likely to be some form of guess as to what the user intended
(1) has proved workable in practice, as it allows a default path to be overridden by a user-supplied absolute path, while a bare basename is simply joined onto the default path.
(2) requires a lot more logic in the caller.
(3) will likely lead to support/maintenance issues, as there is no obvious correct behaviour.
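To illustrate the override use case behind (1), here is a small sketch. The default directory and the resolve_log_path helper are hypothetical, and posixpath is used so the result does not depend on the platform.

```python
import posixpath

# Hypothetical default location for this illustration.
DEFAULT_DIR = "/var/log/myapp"


def resolve_log_path(user_value):
    """Join user input onto the default directory; an absolute input wins."""
    return posixpath.join(DEFAULT_DIR, user_value)


# A bare filename lands in the default directory.
from_basename = resolve_log_path("app.log")

# An absolute path from the user overrides the default entirely.
from_absolute = resolve_log_path("/tmp/debug.log")

print(from_basename)
print(from_absolute)
```

The caller needs no branching on whether the input is absolute, which is the extra logic that option (2) would push onto every call site.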
The thing that makes this one a bit more confusing is that the representation of an os.scandir entry only prints <DirEntry 'filename.ext'>, so it's easy to think it's just the filename and not a full path.
I was also confused by seeing only the name.
I had to convert it to a path by accessing the .path attribute of the DirEntry.
You can also use the os.fspath() function to get a file system path from an object that contains one.
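A quick sketch comparing what a DirEntry shows with what it converts to, run against a temporary directory containing one made-up file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a single empty file with a hypothetical name.
    open(os.path.join(folder, "filename.ext"), "w").close()

    with os.scandir(folder) as entries:
        entry = next(entries)
        shown = repr(entry)             # what print(entry) displays
        name = entry.name               # bare filename
        full_path = entry.path          # scandir argument joined with the name
        via_fspath = os.fspath(entry)   # same value as entry.path

print(shown)
print(name)
print(full_path == via_fspath)
```

Because os.path.join calls os.fspath on its arguments, passing the entry itself behaves like passing entry.path, not entry.name.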