I’m attempting to create a script to back up all directories and files from my Linux home folder to a USB drive, excluding hidden ones and those already saved on the USB. My code successfully copies any new folders and sub-folders from the source (src) directory to the destination (dst) directory, but it is not copying any files. I believe something’s not right in the “for file in files” section of the os.walk loop, but I can’t figure out what. Likely something simple, but this novice doesn’t see it! Thanks for any suggested fixes you may have!
Below is the current code:
'''Check if all subdirectories in a directory exist and then copy any new subdirectories and files
to usb drive while excluding hidden files
'''
import os
import shutil

def check_and_copy(src, dst):
    # Check if source and destination directories exist
    if not os.path.exists(src):
        raise FileNotFoundError(f"The source directory {src} does not exist.")
    if not os.path.exists(dst): #os.makedirs(dst)
        raise FileNotFoundError(f"The Destination USB drive {dst} does not exist.")
    # Walk through the source directory
    for root, dirs, files in os.walk(src):
        # Skip hidden directories
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        # Create corresponding directory in destination
        for d in dirs:
            src_dir = os.path.join(root, d)
            dst_dir = os.path.join(dst, os.path.relpath(src_dir, src))
            if not os.path.exists(dst_dir):
                os.makedirs(dst_dir)
                print(f"Creating directory {d}")
            else:
                print(f"Directory {d} already exists in destination, skipping.")
        # Copy files from source to destination
        for file in files:
            if not file.startswith('.'):
                # Check if the file does not already exist in the destination
                if not os.path.exists(src):
                    src_file = os.path.join(root, file)
                    dst_file = os.path.join(dst, os.path.relpath(src_file, src))
                    shutil.copy2(src_file, dst_file)
                    print(f"Copied file named {file}")
                else:
                    print(f"File {file} already exists in destination, skipping.")
    print(f"********** END OF BACKUP **********")
You might like to check up on the rsync tool that already implements all your backup requirements. It is likely already installed on your system. A web search for “rsync backup” should provide lots of examples of use.
When debugging a script like this I add lots of print() calls to show the flow of the code and what is in the variables.
For example, in the for loop, start by adding a print at the top of the loop showing the files and directories returned from os.walk.
Next, print what is in dirs after your filter line.
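A minimal sketch of that kind of tracing is below. The helper name trace_walk is hypothetical and not from the original script; it only prints what os.walk yields and copies nothing.

```python
import os


def trace_walk(src):
    """Print what os.walk yields at each step; nothing is copied.

    trace_walk is a hypothetical helper name, used for illustration only.
    Returns the list of directories that were visited.
    """
    visited = []
    for root, dirs, files in os.walk(src):
        # Show the raw values at the top of the loop
        print("root: ", root)
        print("dirs: ", dirs)
        print("files:", files)
        # Same filter as in the script: drop hidden directories in place
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        print("dirs after filter:", dirs)
        visited.append(root)
    return visited
```

Calling trace_walk on a small test directory first, rather than the whole home folder, keeps the output short enough to read.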
There’s an important difference here - if you mutate the directory list, it will reduce the number of directories that os.walk() traverses. This is a documented feature, as long as the walk is being done top-down (which is the default and is being done here).
Possibly to belabor the point here but: dirs is a reference to the list os.walk will be using to descend the subdirectories. The dirs[:] incantation is necessary to modify that list. A plain dirs = ... just repoints the local variable, and makes no change to the list os.walk is using.
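To see the difference concretely, here is a small sketch that walks a throwaway directory tree both ways. The function name walk_roots and the directory names are made up for the demonstration.

```python
import os
import tempfile


def walk_roots(src, in_place):
    """Return the directories os.walk visits under src, relative to src.

    in_place=True  filters with dirs[:] = ... (mutates the list os.walk holds)
    in_place=False filters with dirs = ...    (only rebinds the local name)
    """
    roots = []
    for root, dirs, files in os.walk(src):
        roots.append(os.path.relpath(root, src))
        visible = [d for d in dirs if not d.startswith('.')]
        if in_place:
            dirs[:] = visible
        else:
            dirs = visible  # os.walk never sees this new list
    return sorted(roots)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "docs"))
    os.makedirs(os.path.join(tmp, ".cache", "deep"))
    print(walk_roots(tmp, in_place=True))   # ['.', 'docs']
    print(walk_roots(tmp, in_place=False))  # also lists .cache and its subdirectory
```

With the slice assignment the hidden tree is never entered; with the plain assignment os.walk still descends into .cache, because the list it holds was never changed.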
Thanks for the pointer on rsync. I had recently come across that option, but did not attempt to implement anything as I wanted to develop a script which I could recycle later for saving specific files. Thought I’d start with all items in my home folder and if that worked I could alter it as needed in the future.
On the [:] after dirs: when the colon is removed only the 1st directory in the root is created but no sub-directories or files, and I get this error:
File "/home/[anonymized]/BackUpToUSB42.py", line 44, in <module>
check_and_copy('/home/[anonymized]', '/run/media/[anonymized]/')
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maviko/BackUpToUSB42.py", line 25, in check_and_copy
os.makedirs(dst_dir)
If I take that check out and tab everything else back, like this:
for file in files:
    if not file.startswith('.'):
        # Check if the file does not already exist in the destination
        src_file = os.path.join(root, file)
        dst_file = os.path.join(dst, os.path.relpath(src_file, src))
        shutil.copy2(src_file, dst_file)
        print(f"Copied file named {file}")
    else:
        print(f"File {file} already exists in destination, skipping.")
… it copies everything, even if a copy exists in the “dst” already.
Is this what you were suggesting?
Thanks to all who gave suggestions. I managed to find the problems. I restructured the file copying section similar to the structure in the directory section. Also, Chris, you were correct that it was also a problem of referencing the “src” and not the “dst”.
I also learned something about files in Python:
I had some folders whose names I began with an underscore (e.g., “_MyFolder”), which I used often and wanted at the top of my Home directory. When I ran this script I noticed that it was not giving me a “Copied” or “skipping” message about the files in those folders, though it was copying them. Not sure why that is. Anyone have an answer?
Here’s the new code:
'''Check if all subdirectories in a directory exist and then copy any new subdirectories and files
to usb drive while excluding hidden files
'''
import os
import shutil

def check_and_copy(src, dst):
    # Check if source and destination directories exist
    if not os.path.exists(src):
        raise FileNotFoundError(f"The source directory {src} does not exist.")
    if not os.path.exists(dst):
        #os.makedirs(dst)
        raise FileNotFoundError(f"The Destination USB drive {dst} does not exist.")
    # Walk through the source directory
    for root, dirs, files in os.walk(src):
        # Skip hidden directories
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        # Create corresponding directory in destination
        for d in dirs:
            src_dir = os.path.join(root, d)
            dst_dir = os.path.join(dst, os.path.relpath(src_dir, src))
            if not os.path.exists(dst_dir):
                os.makedirs(dst_dir)
                print(f"Creating directory {d}")
            else:
                print(f"Directory {d} already exists in destination, skipping.")
        # Copy files from source to destination
        for file in files:
            src_file = os.path.join(root, file)
            dst_file = os.path.join(dst, os.path.relpath(src_file, src))
            # Skip hidden files
            if not file.startswith('.'):
                # Check if the file does not already exist in the destination
                if not os.path.exists(dst_file):
                    shutil.copy2(src_file, dst_file)
                    print(f"Copied file named {file}")
                else:
                    print(f"File {file} already exists in destination, skipping.")
    print(f"********** END OF BACKUP **********")

check_and_copy('/home/[anon]', '/run/media/[anaon]/')
That sounds very odd. I have no idea at this stage what the problem is, so I would use my standard debugging technique: If In Doubt, Print It Out! For example, right at the top of the main for root, dirs, files loop, add: print("Walking", root) so that you can see the directories it’s checking. This should pair nicely with your “Creating directory” // “Directory already exists” lines; you should see the initial root directory, followed by the creation of any needed children, and then you walk into those subdirectories.
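One way to try this without touching the USB drive is a dry run that prints the same kind of trace but never writes anything. This is only a sketch: the name dry_run and its return value are my own invention, and it assumes the same hidden-file rule as the script above.

```python
import os


def dry_run(src, dst):
    """Walk src like the backup script, but only report; nothing is written.

    dry_run is a hypothetical name. Returns (would_copy, would_skip) as lists
    of file paths relative to src.
    """
    would_copy, would_skip = [], []
    for root, dirs, files in os.walk(src):
        print("Walking", root)
        # Same rule as the script: do not descend into hidden directories
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for file in files:
            if file.startswith('.'):
                continue  # hidden file, ignored
            rel = os.path.relpath(os.path.join(root, file), src)
            if os.path.exists(os.path.join(dst, rel)):
                print("  would skip", rel)
                would_skip.append(rel)
            else:
                print("  would copy", rel)
                would_copy.append(rel)
    return would_copy, would_skip
```

If a folder such as “_MyFolder” never shows up in a “Walking” line, the walk is not reaching it; if it does show up, the per-file lines underneath show which branch each file takes.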