Combining HTML file and associated folder into one directory

POST AT DISCUSS.PYTHON.ORG
I have downloaded many webpages, each saved as an HTML file plus an associated folder. The folder has the same name as the HTML file, with “_files” appended. For example,

sample.html
sample_files

I’d like to move both of these into a folder named sample.

I’d like to permanently associate these two objects with each other by placing them in the same folder. A new folder should be created for each matching pair, taking the same name as the HTML file without the extension.
The HTML file and its _files directory should end up in the same folder. If one is found in a folder but not the other, then no new directory should be created.

I have a bash script that puts each HTML file in its own directory, but not the associated folder, as follows:

find . -type f -name '*.html' -exec sh -c '
  for f; do mkdir -p -- "${f%.*}" && mv -v -- "$f" "${f%.*}" ; done' _ {} +

I am learning Python and find bash more difficult, so I would prefer a Python solution (but either is acceptable).

I can use os.walk to go through the top-level folder containing all the other directories and HTML files. But I get mixed up when it comes to placing these two objects in a parent folder. I seem to be operating on two levels at one time.

This is the code that didn’t work.

#!/usr/bin/env python
import os
import string
import shutil
from os.path import splitext
from pathlib import Path

##=## RUN FROM FOLDER WITH ITEMS TO BE COMPRESSED ##=##
for root, dirs, files in os.walk('/Volumes/HighSierra/Users/ericlindell/Documents/testTTS-combine/'):

    # CHECK ALL FILES IN FOLDER FOR MP3 EXTENSION
    for checkFileHTML in files:
      
        # IF IT IS AN HTML FILE
        if checkFileHTML.endswith('.HTML'):
            HTMLFile = checkFileHTML
            HTMLFileNoExt = checkFileHTML[:-5]

            # ITERATE OVER FOLDERS
            for checkDir in dirs:

                # FIND DIR ENDING _files
                if checkDir.endswith('_files'):
                    checkDirAbbrev = checkDir[:-6]
                    if checkDirAbbrev == HTMLFileNoExt:

                        os.makedirs(checkDirAbbrev)

                        # MOVE ZIP INTO NEWDIR
                        filePath = os.path.join(root, checkHTML)
                        shutil.move(filePath, HTMLFileNoExt)

                        # MOVE HTML FILE INTO NEWDIR
                        filePath = os.path.join(root, checkDirAbbrev)
                        shutil.move(filePath, checkDirAbbrev)

Any help would be much appreciated!

This example code might help:

from os import scandir

files = set()
folders = set()

for entry in scandir('/Volumes/HighSierra/Users/ericlindell/Documents/testTTS-combine/'):
    if entry.is_file() and entry.name.endswith('.HTML'):
        # Collect the name of the HTML file without the '.HTML'.
        files.add(entry.name.removesuffix('.HTML'))
    elif entry.is_dir() and entry.name.endswith('_files'):
        # Collect the name of the folder without the '_files'.
        folders.add(entry.name.removesuffix('_files'))

# Which names occur in both the set of files and the set of folders?
common = files & folders

for name in common:
    file_name = name + '.HTML'
    folder_name = name + '_files'
    print(f'Found a file called {file_name} and a folder called {folder_name}')
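
One thing to watch: endswith is case-sensitive, so a test for '.HTML' will not match a file named sample.html as in the question. A case-insensitive check could look roughly like the sketch below. The helper name html_stem is made up for illustration, and accepting '.htm' as well is an assumption about what the downloaded files might be called.

```python
from os.path import splitext


def html_stem(filename):
    """Return the name without its extension if it looks like an HTML file, else None."""
    stem, ext = splitext(filename)
    # Compare the extension in lowercase so .html, .HTML and .Html all match.
    # Accepting '.htm' too is an assumption, not something from the question.
    if ext.lower() in ('.html', '.htm'):
        return stem
    return None
```

With this, html_stem(entry.name) could replace the endswith/removesuffix pair when collecting the file names.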

The use of sets here is very helpful. It simplifies things.
I am trying to loop recursively over the top-level folder.
I think os.walk may be what I need to use.
scandir appears to list only the entries directly inside the top-level directory, without descending into subfolders.
Thanks for the suggestions!
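
For the recursive case, the set idea can probably be applied once per directory that os.walk visits, since os.walk already hands over the file names and folder names of each directory separately. This is only a sketch: the function name find_pairs and its parameters are invented for illustration, and str.removesuffix needs Python 3.9 or later.

```python
import os


def find_pairs(top, html_ext='.HTML', dir_suffix='_files'):
    """Yield (folder, base_name) for each HTML file that has a matching _files folder.

    find_pairs and its parameters are hypothetical names for this sketch.
    """
    for root, dirs, files in os.walk(top):
        # Base names of the HTML files found directly in this directory.
        file_names = {f.removesuffix(html_ext) for f in files if f.endswith(html_ext)}
        # Base names of the '_files' folders found directly in this directory.
        dir_names = {d.removesuffix(dir_suffix) for d in dirs if d.endswith(dir_suffix)}
        # Only names present in both sets form a complete pair.
        for name in sorted(file_names & dir_names):
            yield root, name
```

Each result carries the directory the pair was found in, so any later step can build full paths with os.path.join(root, ...) instead of relying on the current working directory.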

A variation on that, which avoids having to reconstruct the file and folder names, is to use dicts:

from os import scandir

files = {}
folders = {}

for entry in scandir('/Volumes/HighSierra/Users/ericlindell/Documents/testTTS-combine/'):
    if entry.is_file() and entry.name.endswith('.HTML'):
        # Collect the file.
        files[entry.name.removesuffix('.HTML')] = entry.name
    elif entry.is_dir() and entry.name.endswith('_files'):
        # Collect the folder.
        folders[entry.name.removesuffix('_files')] = entry.name

# Which names occur in both the dict of files and the dict of folders?
common = files.keys() & folders.keys()

for name in common:
    print(f'Found a file called {files[name]} and a folder called {folders[name]}')
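
Once a matching name is known, the move itself might look something like the following. It is a minimal sketch, not a tested tool: combine_pair and its parameters are made-up names, it assumes both items sit directly inside the given folder, and it assumes nothing other than a directory already exists at the target name. Trying it on a copy of the data first would be wise.

```python
import shutil
from pathlib import Path


def combine_pair(folder, name, html_ext='.HTML', dir_suffix='_files'):
    """Move name + html_ext and name + dir_suffix into a new folder called name.

    Returns the new folder's path, or None if either half of the pair is missing.
    combine_pair is a hypothetical helper for this sketch.
    """
    folder = Path(folder)
    html_file = folder / (name + html_ext)
    files_dir = folder / (name + dir_suffix)
    # Do nothing unless both the HTML file and its folder are present.
    if not (html_file.is_file() and files_dir.is_dir()):
        return None
    # Build the target from the containing folder, not the current working directory.
    target = folder / name
    target.mkdir(exist_ok=True)
    shutil.move(str(html_file), str(target / html_file.name))
    shutil.move(str(files_dir), str(target / files_dir.name))
    return target
```

Building every path from the containing folder is what avoids the "two levels at once" confusion: os.makedirs with a bare name creates the directory relative to wherever the script was started, which is not necessarily where the HTML file lives.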