Combining HTML file and associated folder into one directory

POST AT DISCUSS.PYTHON.ORG
I have downloaded many web pages, each saved as an HTML file plus an associated folder. The folder has the same name as the HTML file, with “_files” appended. For example:

sample.html
sample_files

I’d like to move both of these into a folder named sample.

I’d like to permanently associate the two by placing them in the same folder. A new folder should be created for each matching pair, named after the HTML file without its extension.
The HTML file and its _files directory must already sit in the same folder. If only one of the two is found in a folder, no new directory should be created.

I have a bash script that puts each HTML file in its own directory, but it does not move the associated folder:

find . -type f -name '*.html' -exec sh -c '
  for f; do mkdir -p -- "${f%.*}" && mv -v -- "$f" "${f%.*}" ; done' _ {} +

I am learning Python and find bash more difficult, so I would prefer a Python solution (but either is acceptable).

I can use os.walk to go through the top-level folder containing all the other directories and HTML files. But I get mixed up when it comes to placing the two objects in a new parent folder: I seem to be operating on two levels at once.

This is the code that didn’t work.

#!/usr/bin/env python
import os
import string
import shutil
from os.path import splitext
from pathlib import Path

##=## RUN FROM FOLDER WITH ITEMS TO BE COMPRESSED ##=##
for root, dirs, files in os.walk('/Volumes/HighSierra/Users/ericlindell/Documents/testTTS-combine/'):

    # CHECK ALL FILES IN FOLDER FOR MP3 EXTENSION
    for checkFileHTML in files:
      
        # IF IT IS AN HTML FILE
        if checkFileHTML.endswith('.HTML'):
            HTMLFile = checkFileHTML
            HTMLFileNoExt = checkFileHTML[:-5]

            # ITERATE OVER FOLDERS
            for checkDir in dirs:

                # FIND DIR ENDING _files
                if checkDir.endswith('_files'):
                    checkDirAbbrev = checkDir[:-6]
                    if checkDirAbbrev == HTMLFileNoExt:

                        os.makedirs(checkDirAbbrev)

                        # MOVE ZIP INTO NEWDIR
                        filePath = os.path.join(root, checkHTML)
                        shutil.move(filePath, HTMLFileNoExt)

                        # MOVE HTML FILE INTO NEWDIR
                        filePath = os.path.join(root, checkDirAbbrev)
                        shutil.move(filePath, checkDirAbbrev)

Any help much appreciated !!

This example code might help:

from os import scandir

files = set()
folders = set()

for entry in scandir('/Volumes/HighSierra/Users/ericlindell/Documents/testTTS-combine/'):
    # Note: endswith() is case-sensitive, so use '.html' throughout if that is how the files are named.
    if entry.is_file() and entry.name.endswith('.HTML'):
        # Collect the name of the HTML file without the '.HTML'.
        files.add(entry.name.removesuffix('.HTML'))
    elif entry.is_dir() and entry.name.endswith('_files'):
        # Collect the name of the folder without the '_files'.
        folders.add(entry.name.removesuffix('_files'))

# Which names occur in both the set of files and the set of folders?
common = files & folders

for name in common:
    file_name = name + '.HTML'
    folder_name = name + '_files'
    print(f'Found a file called {file_name} and a folder called {folder_name}')

The use of sets here is very helpful. It simplifies things.
I am trying to loop recursively over the top-level folder.
I think os.walk may be what I need to use.
scandir appears to list only the entries of the top-level directory.
Thanks for the suggestions !!
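
For the recursive case, the same matching idea can be applied once per directory that os.walk visits. This is only a minimal sketch: find_pairs is a hypothetical helper name, and it assumes the '.html' extension should be matched case-insensitively, which the posts above do not state. It only reports the pairs and moves nothing.

```python
import os

def find_pairs(top):
    """Yield (folder, html_file, files_dir) for each matching pair under top."""
    for root, dirs, files in os.walk(top):
        # Map each HTML file's name without extension to its full name.
        html = {}
        for file_name in files:
            stem, ext = os.path.splitext(file_name)
            if ext.lower() == '.html':  # assumption: ignore case
                html[stem] = file_name
        # Map each '_files' folder's name without the suffix to its full name.
        folders = {d.removesuffix('_files'): d for d in dirs if d.endswith('_files')}
        # Names present in both dicts are the pairs at this level.
        for name in sorted(html.keys() & folders.keys()):
            yield root, html[name], folders[name]
```

Each yielded root is the folder that holds both items, so os.path.join(root, html_file) and os.path.join(root, files_dir) give their full paths.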

A variation on that, which avoids having to reconstruct the file and folder names, is to use dicts:

from os import scandir

files = {}
folders = {}

for entry in scandir('/Volumes/HighSierra/Users/ericlindell/Documents/testTTS-combine/'):
    # Note: endswith() is case-sensitive, so use '.html' here if that is how the files are named.
    if entry.is_file() and entry.name.endswith('.HTML'):
        # Collect the file.
        files[entry.name.removesuffix('.HTML')] = entry.name
    elif entry.is_dir() and entry.name.endswith('_files'):
        # Collect the folder.
        folders[entry.name.removesuffix('_files')] = entry.name

# Which names occur in both the dict of files and the dict of folders?
common = files.keys() & folders.keys()

for name in common:
    print(f'Found a file called {files[name]} and a folder called {folders[name]}')

This code is very helpful in identifying the html file and its associated folder. The original question asked how to create the new folder and then put them both inside of it.

My issue is one of levels. I don’t know how to operate on the current level and that of the parent folder at the same time.

What would be the final command in this script to accomplish this? Thanks

Continuing on from my last post, try this (untested):

from os import mkdir, rename
from os.path import join

parent_folder = '/Volumes/HighSierra/Users/ericlindell/Documents/testTTS-combine/'

# common, files and folders come from the dict-based code in my last post.
for name in common:
    # Make the subfolder for the HTML file and associated folder.
    subfolder = join(parent_folder, name)
    mkdir(subfolder)

    # Move the HTML file from the parent folder into the subfolder.
    rename(join(parent_folder, files[name]), join(subfolder, files[name]))

    # Move the associated folder from the parent folder into the subfolder.
    rename(join(parent_folder, folders[name]), join(subfolder, folders[name]))
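
Putting the pieces together, the following is a self-contained sketch that walks the whole tree and moves every pair it finds. Several things here are assumptions rather than anything stated in the thread: the function name combine_pairs is made up, '.html' is matched case-insensitively, and a pair is skipped if something with the target folder's name already exists. The pairs are collected first, deepest folders first, so that moving items does not disturb the walk.

```python
import os
import shutil
from pathlib import Path

def combine_pairs(top):
    """Move each name.html / name_files pair under top into a new folder called name."""
    pairs = []
    # topdown=False visits subfolders before their parents (deepest pairs first).
    for root, dirs, files in os.walk(top, topdown=False):
        for file_name in files:
            stem, ext = os.path.splitext(file_name)
            if ext.lower() == '.html' and stem + '_files' in dirs:
                pairs.append((Path(root), stem, file_name))

    created = []
    for root, name, file_name in pairs:
        target = root / name
        if target.exists():
            continue  # assumption: leave pre-existing folders alone
        target.mkdir()
        # root is the "current level"; target is the new folder one level below it.
        shutil.move(str(root / file_name), str(target / file_name))
        shutil.move(str(root / (name + '_files')), str(target / (name + '_files')))
        created.append(target)
    return created
```

Calling combine_pairs on the top-level folder returns the list of folders it created, which is worth checking on a copy of the data before running it on the real files.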