I’m new to Python. I’m trying to create a program to check two
directories and remove all the files in one directory that have the
same size and modification date.
This is my attempt.
Looks basically fine to me. Since you’re only interested in the
intersection of the 2 directories, walking one is sufficient.
I have a few remarks, inline in the code:
import os
from os import path
This is usually written “import os.path”. What you have imports os.path
as the local name “path”, meaning you can call eg path.isdir(…). It is
more normal to either:
import os.path
os.path.isdir(...)
or:
from os.path import isdir
isdir(...)
import sys
print "Source " + sys.argv[1] + " Destination" + sys.argv[2]
Looks like you’re using Python 2. I strongly recommend using Python 3;
Python 2 is end of life. That would mean writing print() with brackets
as it is now a function call, not a statement:
print("Source " + sys.argv[1] + " Destination" + sys.argv[2])
You can do this in Python 2 by making sure that this:
from __future__ import print_function
is at the top of your script, ahead of any other code. Then print()
works as a function in both Python 2 and Python 3.
Also, print() accepts multiple arguments, so you can write this:
print("Source", sys.argv[1], "Destination", sys.argv[2])
path = sys.argv[1]
Usually we pull things off the command line once and don’t refer to
sys.argv afterwards. Eg:
cmd, srcpath, dstpath = sys.argv
print("Source", srcpath, "Destination", dstpath)
and likewise in the rest of the programme. More readable, easier to
debug.
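A rough sketch of that unpacking with a usage check added, since a bare
unpack fails with a confusing error when the argument count is wrong.
The function name parse_args is made up for illustration; it is not
from the original script.

```python
import sys

def parse_args(argv):
    """Return (srcpath, dstpath) from an argv-style list.

    parse_args is a hypothetical helper name, not part of the original script.
    Raises ValueError with a usage message if the argument count is wrong.
    """
    if len(argv) != 3:
        name = argv[0] if argv else "script"
        raise ValueError("usage: %s srcpath dstpath" % name)
    cmd, srcpath, dstpath = argv
    return srcpath, dstpath

# Typical use at the top of the script:
#   srcpath, dstpath = parse_args(sys.argv)
#   print("Source", srcpath, "Destination", dstpath)
```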
#print 'checking', path, os.path.isdir("path")
for root, dirs, files in os.walk(sys.argv[1]):
    for file in files:
        file1=root + os.sep + file
This is better written:
file1 = os.path.join(root, file)
        l=len(path)
        file2 = sys.argv[2] + root[l:] + os.sep + file
Probably better written:
file2 = os.path.join(dstpath, os.path.relpath(file1, srcpath))
The os.path module has lots of useful things for working with file
paths. Look it up.
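A minimal sketch of mapping a source file to its counterpart under the
destination, taking the path relative to the top of the source tree so
that subdirectories are preserved. The helper name dest_path is just
for illustration.

```python
import os.path

def dest_path(srcpath, dstpath, file1):
    """Return where file1 (somewhere under srcpath) would live under dstpath.

    dest_path is a hypothetical helper name used only for this example.
    """
    # Path of file1 relative to the top of the source tree, e.g. "sub/a.txt".
    rel = os.path.relpath(file1, srcpath)
    # Re-root that relative path under the destination tree.
    return os.path.join(dstpath, rel)
```

For example, dest_path("src", "dst", os.path.join("src", "sub", "a.txt"))
gives the same path as os.path.join("dst", "sub", "a.txt").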
        if os.path.isfile(file2):
Also check if file2 is a file.
            sst = os.stat(file1)
            dst = os.stat(file2)
            if sst.st_size == dst.st_size :
                if sst.st_mtime == dst.st_mtime:
                    pass
You’re not removing file2 yet. That is good. At least put a print()
statement here to make clear what will happen.
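One way to do that is a dry run which only reports what would be
removed. This is a sketch, assuming Python 3 and that file2 (the copy
under the destination) is the one to remove. The names find_candidates
and report are invented for the example.

```python
import os

def find_candidates(srcpath, dstpath):
    """Yield (file1, file2) pairs whose size and mtime match. Removes nothing.

    find_candidates is a hypothetical name; file1 is under srcpath and
    file2 is the file at the same relative path under dstpath.
    """
    for root, dirs, files in os.walk(srcpath):
        for name in files:
            file1 = os.path.join(root, name)
            file2 = os.path.join(dstpath, os.path.relpath(file1, srcpath))
            if not os.path.isfile(file2):
                continue
            sst = os.stat(file1)
            dst = os.stat(file2)
            if sst.st_size == dst.st_size and sst.st_mtime == dst.st_mtime:
                yield file1, file2

def report(srcpath, dstpath):
    """Print what would be removed, without touching any file."""
    for file1, file2 in find_candidates(srcpath, dstpath):
        print("would remove", file2, "(matches", file1 + ")")
```

Keeping the search separate from the action means the real removal can
be switched in later once the report looks right.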
ALSO, very very important, use os.path.samefile() to check you’re not
removing the original. What would happen if you went:
my-rmove-script.py dirpath dirpath
i.e. compare a directory with itself? A disaster!
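A small sketch of that guard, assuming Python 3. os.path.samefile()
raises OSError if either path cannot be stat()ed, so this wrapper
(is_same_file, a made-up name) treats that case as "not the same".

```python
import os.path

def is_same_file(file1, file2):
    """True if both paths exist and refer to the very same file.

    is_same_file is a hypothetical helper name for this example.
    os.path.samefile() compares device and inode numbers, so it also
    catches the same file reached via different spellings of the path.
    """
    try:
        return os.path.samefile(file1, file2)
    except OSError:
        # One of the paths is missing or unreadable: not provably the same.
        return False
```

Before any removal you would skip the pair whenever
is_same_file(file1, file2) is true. The same call works on the two
directories themselves, to refuse to run at all when they are the same.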
Finally, a size/mtime check is a fast way to check files, but really it
is only conclusive when the values differ. If they match, you still
need to compare the file contents to be sure the files are the same.
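For the content comparison, the standard library’s filecmp module can
do the work. A minimal sketch follows; same_contents is an invented
name. With shallow=False, filecmp.cmp() reads and compares the file
contents rather than trusting the os.stat() signature alone.

```python
import filecmp

def same_contents(file1, file2):
    """True if the two files have identical contents.

    same_contents is a hypothetical helper name for this example.
    shallow=False forces a comparison of the file contents instead of
    accepting a match on type, size and mtime.
    """
    return filecmp.cmp(file1, file2, shallow=False)
```

Running this only on pairs that already passed the size/mtime check
keeps the expensive read to the few files that look like duplicates.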
Cheers,
Cameron Simpson cs@cskk.id.au