My problem here is that there doesn’t seem to be any way to find out what has gone wrong. I’m relatively new to programming by the way.
So basically, I have made a simple script that recursively processes all the files under a directory. What I’ve included below is a simplified version that reproduces the same problem; I’m happy to include the full script if that would be better. When I run the code below on almost any directory, it works perfectly. However, the script was designed to work on one specific directory, which is full of files nested inside a maze of subdirectories. It has worked flawlessly until now and has let me process thousands of files, but it seems to have encountered a specific file that causes a crash.
The details are as follows:
When I run my script, or when I run the snippet below, and import_path is set to this specific directory, the program crashes with nothing but the word ‘Killed’.
The problem directory has been working for a long time until now, leading me to believe that the problem is caused by a specific file.
I can’t think of a way to get the path of that file in order to delete it.
I’m running these programs from the command line on my Debian 12 PC.
import os

slots = 1
import_path = '/media/user/60/to_import'

def process2(address):
    print(address)

def process(address):
    if address.endswith(('.jpg', '.png', 'jpeg')):
        process2(address)
        global slots
        slots = slots - 1

def list_files_scandir(path):
    with os.scandir(path) as entries:
        for entry in entries:
            if slots > 0:
                if entry.is_file():
                    process(entry.path)
                elif entry.is_dir():
                    list_files_scandir(entry.path)
            else:
                break

list_files_scandir(import_path)
I would be grateful for any insight.
Edit: I have fixed a typo, and included the full script below:
from PIL import Image
import os
import sqlite3
import hashlib
import shutil

max_files = 100
import_path = '/home/user/sort-qimgv/import'
import_files = os.listdir(import_path)
imported = 0
skipped = 0
db_path = "/home/user/sort-qimgv/database.db"

def hash(filename):
    with open(filename, 'rb', buffering=0) as f:
        return hashlib.file_digest(f, 'sha256').hexdigest()

def thumbnail(file):
    WIDTH = 1920
    HEIGHT = 1280
    img = Image.open(file)
    img.thumbnail((WIDTH, HEIGHT))
    img.save(file)

if len(import_files) == 0:
    raise Exception("No files to import.")
else:
    print("Files found.")

# How many slots are free?
conn = sqlite3.connect(db_path)
cur = conn.cursor()
cur.execute("SELECT * FROM `index` WHERE location=0")
rows = cur.fetchall()
slots = max_files - len(rows)
conn.close()
print(str(slots) + ' slots.')

def import_file(file):
    hash_first = hash(file)
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("SELECT * FROM hash WHERE hash=?", (hash_first,))
    rows = cur.fetchall()
    conn.close()
    if len(rows) > 0:
        global skipped
        skipped = skipped + 1
        os.remove(file)
        return
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("insert into hash (hash) values (?)", (hash_first,))
    conn.commit()
    if os.path.getsize(file) > 1000000:
        thumbnail(file)
        hash_second = hash(file)
        conn = sqlite3.connect(db_path)
        cur = conn.cursor()
        cur.execute("insert into hash (hash) values (?)", (hash_second,))
        conn.commit()
    # find a suitable id
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("SELECT MIN(id) + 1 FROM `index` WHERE id + 1 NOT IN (SELECT id FROM `index`)")
    rows = cur.fetchall()
    conn.close()
    value = rows[0][0]
    if value is None:
        newid = 1
    else:
        newid = value
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("insert into `index` values (?, 0)", (newid,))
    cur.execute("insert into old_name values (?, ?)", (newid, str(file)))
    conn.commit()
    new_destination = "/home/user/sort-qimgv/sort/" + str(newid) + os.path.splitext(file)[1]
    shutil.move(file, new_destination)
    global imported, slots
    imported = imported + 1
    slots = slots - 1

def process(address):
    if address.endswith(('.jpg', '.png', 'jpeg')):
        print(address)
        import_file(address)

def list_files_scandir(path):
    with os.scandir(path) as entries:
        for entry in entries:
            if slots > 0:
                if entry.is_file():
                    process(entry.path)
                elif entry.is_dir():
                    list_files_scandir(entry.path)
            else:
                break

list_files_scandir(import_path)
print(str(imported) + ' imported, ' + str(skipped) + ' skipped.')
Edit 2:
I’ve realised that a lot of what I put in the snippet was extraneous. This is actually all I need to reproduce the bug:
user@C1:~/sort-qimgv$ cat test2.py
import os

import_path = '/media/user/60/to_import'

def list_files_scandir(path):
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                print(entry.path)
            elif entry.is_dir():
                list_files_scandir(entry.path)

list_files_scandir(import_path)
user@C1:~/sort-qimgv$ python3 test2.py
Killed
user@C1:~/sort-qimgv$
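For anyone who hits the same thing, here is a minimal sketch of how the failing path might be narrowed down. It is not something I ran against the broken state, and the function name `walk_logged` and its `log` parameter are my own inventions for illustration. The idea is to write each directory to stderr and flush immediately before scanning it, so that if the process is killed abruptly, the last "scanning:" line shows where it was. It also assumes a loop in the directory tree could be involved, so it skips any directory whose device and inode numbers have already been seen.

```python
import os
import sys

def walk_logged(root, log=sys.stderr):
    """Yield file paths under root, logging each directory before scanning it.

    Hypothetical helper: the name and the `log` parameter are illustrative.
    """
    stack = [root]   # explicit stack instead of recursion
    seen = set()     # (device, inode) pairs of directories already scanned
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)
        except OSError as exc:
            print(f"stat failed: {path}: {exc}", file=log, flush=True)
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Same directory reached twice: possible loop in the tree.
            print(f"already visited, skipping: {path}", file=log, flush=True)
            continue
        seen.add(key)
        # Flush right away so this line survives an abrupt kill.
        print(f"scanning: {path}", file=log, flush=True)
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        yield entry.path
        except OSError as exc:
            print(f"scandir failed: {path}: {exc}", file=log, flush=True)
```

It would be used in place of `list_files_scandir`, for example `for p in walk_logged(import_path): print(p, flush=True)`. Redirecting stderr to a file (`python3 test2.py 2> scan.log`) should leave the last directory reached at the end of that file.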
Edit 3:
Well, this is kind of embarrassing… What was reliably going wrong yesterday had miraculously fixed itself today, and I have done absolutely nothing except restart my computer.
So I’m totally nonplussed about what was going on, and am now unable to investigate further.
I think it must have had something to do with the way the drive was mounted, given that the problem spontaneously went away after a restart. The reason I ruled that out before was that none of the other directories on that same drive had the problem. I also mounted it this time in exactly the same way I always do, using Thunar. Anyway, that’s the end. Sorry for the anticlimax. I’m happy to answer any questions, but obviously I can’t do any testing unless it happens again.