I have this directory tree:
(top)–>(c , cpp, python)
(top/c)–>(other_courses)
(top/c/other_courses)–>(cpp,python)
(top/cpp)–>(other_courses)
(top/cpp/other_corses)–>(c,python)
(top)–>(python)
I want to find all directories by the name python. I do not want to use os.walk() and I do not want to use recursion as in the code commented out below. Also I want to print Not found in case folder does not exist ( my code prints not found many times!).
import os
def findFolder(rootDir,folder):
list_dirs = os.walk(rootDir)
for root, dirs, files in list_dirs:
for d in dirs:
p=os.path.join(root, d)
#print(p)
if folder in p:
print(p )
"""for lists in os.listdir(rootDir):
path = os.path.join(rootDir, lists)
if 'python' in path:
f=True
print (path)
if os.path.isdir(path):
findFolder(rootDir,folder)
if not f:
print('not')"""
findFolder('root','python')```
Anything that can be done with recursion, you can equivalently do by keeping a list of the things you still haven’t checked. But, as Stefan asks: why? Recursion is the simplest, clearest, and most logical way to spell this. Is there a special restriction that makes recursion impractical?
That’s a very reasonable concern, but do be aware that Python’s default recursion limit is one thousand. That’s a LOT of directory levels. If you really need more than that, it’s certainly possible to avoid recursion, but it’s far less clean and convenient.
Then I have some great news for you: today, you have stumbled upon the exercise that’s going to teach you how recursion works! Seriously though, this is a really superb lesson in recursion, because directory trees are inherently recursive (a directory can contain other directories, and walking a small part of a directory tree is the same as walking a large part of one, you just need to do more/less of the same work) and a recursive solution to the problem will make the most sense.
We’re programmers. When we’re faced with a problem, we learn new tools and strategies, and keep right on going!
def flattened(thing):
pending = [thing]
while pending:
obj = pending.pop()
... do whatever with obj ...
for child in obj.its_children():
if you_would_recurse_into(child):
pending.append(child)
So that you have a queue (pending) of things needing processing. Pull
the first thing off the queue, process it. If it has children you would
have processed by recursion, just appending them to the queue. Repeat
until the queue is empty.
For your walk() replacement, thing would be the top directory name.
The children can be obtained with os.listdir() and any children which
are directories can be appended to the queue.
Whether you pull from the start or end of the queue affects the order,
as does what order you append things to the queue. And in principle the
queue count be any container such as a set or heap as long as you can
pull a single item from it.
Thanks Christ but wait. It seems we are stuck to recursion because os.walk is recursive. So just for the fun of it can we come up with pure iterative solution without ever using os module?
thanks Cameron that is exactly what I did. I modified source code of os.walk and made it leaner and used as shown below. can please take a look and make better and also print not found when folder does not exist?
import os
import os.path as path
def mywalk(top,folder, topdown=True):#, onerror=None, followlinks=False):
islink, join, isdir = path.islink, path.join, path.isdir
names=os.listdir(top)
dirs, nondirs = [], []
for name in names:
if isdir(join(top, name)):
dirs.append(name)
else:
nondirs.append(name)
if topdown:
yield top, dirs, nondirs
for name in dirs:
new_path = join(top, name)
for x in mywalk(new_path,folder):#, topdown, onerror, followlinks):
yield x
#I NEED TO PRINT not found if folder not found at the end
#I don't know if folder not in dirs??
p = "./root"
rt=''
folder='python'
for root,d_names,f_names in mywalk(p,folder,topdown=False):
rt=root
if folder in rt:
print('....',rt)
rt=''
thanks Cameron that is exactly what I did. I modified source code of
os.walk and made it leaner and used as shown below. can please take a
look and make better and also print not found when folder does not
exist?
import os
import os.path as path
def mywalk(top,folder, topdown=True):#, onerror=None, followlinks=False):
islink, join, isdir = path.islink, path.join, path.isdir
You can do this in the import:
from os.path import islink, join, isdir
Personally I usually write this as:
from os.path import islink, joinpath, isdirpath
basicly because “join” is a very generic name.
names=os.listdir(top)
dirs, nondirs = [], []
for name in names:
if isdir(join(top, name)):
dirs.append(name)
else:
nondirs.append(name)
if topdown:
yield top, dirs, nondirs
for name in dirs:
new_path = join(top, name)
for x in mywalk(new_path,folder):#, topdown, onerror, followlinks):
This is a recursive call - you’re calling mywalk from within mywalk.
The iterative solution keeps a collection of outstanding directories to
walk and iterates over that.
The general approach is to set a flag, something like this:
found = False
for item in items:
if this_is_what_we_wanted(item):
found = True
# often you might only need to find the first matching item
# so we might break out of the loop at this point;
# if you need to do things to _all_ matching items,
# do not break
break
if not found:
print("no matching item found")