Converting data from 324 text files into excel

I have two questions

  1. For below code, only data from first ten text files is copied into excel but data in 11th text is not copied into excel. What is the reason?
    from pathlib import Path

    import pandas as pd
    import xlwings as xw

    current_dir = Path(_file).parent if “file” in locals() else Path.cwd()

    input_dir = current_dir / r “Location1”
    output_dir = current_dir / r”Location2”
    output_dir.mkdir(exist_ok = True, parents = True)

    files = list(input_dir.rglob(“*.txt”))
    #print(*files)

    excel_template = current_dir / r “Location3”

    keys = set(““.join(file.stem.split(””)[:3]) for file in files)

    #print(keys)

    with xw.App(visible=False) as app:

    for key in keys:
    
        wb = app.books.open(excel_template)
    
        for file in files:
    
            if file.stem.startswith(key):
    
                     if file.stem.endswith("___1"):
                    
                        df = pd.read_csv(file, sep="\t", header=4)
                        wb.sheets("Pt_1").range((1,1)).options(index=False).value = df
    
                     elif file.stem.endswith("___2"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,3)).options(index=False).value = df
    
                     elif file.stem.endswith("___3"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,4)).options(index=False).value = df
    
                     elif file.stem.endswith("___4"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,5)).options(index=False).value = df
    
                     elif file.stem.endswith("___5"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,6)).options(index=False).value = df
    
                     elif file.stem.endswith("___6"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,7)).options(index=False).value = df
    
                     elif file.stem.endswith("___7"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,8)).options(index=False).value = df
    
                     elif file.stem.endswith("___8"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,9)).options(index=False).value = df
    
                     elif file.stem.endswith("___9"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,10)).options(index=False).value = df
    
                     elif file.stem.endswith("___10"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,11)).options(index=False).value = df
    
                     elif file.stem.endswith("___11"):
    
                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1,12)).options(index=False).value = df
                        
               
        wb.save(output_dir / f"{key}_Results.xlsm")
    

2. How do I automate using for loop or any iterative loops for text files from 2 to 34 by updating above code?

To condense your script so that it is not so verbose with the many if / elif conditional statements, you can use a dictionary. Analyzing your script, it appears that the only thing changing is the second number in the tuple depending on the ending string value of the file.stem object. Thus, we can organize these values as key / value pairs via a dictionary. Note that the special case when the string ends with “___1”, you’re not setting this value: usecols=[1], hence why it has its own dedicated if conditional statement.

To condense the script we make use of a dictionary and replace all of the the conditional statements starting from the second one. Note the key / value pairs. We also make use of a for loop to iterate through the dictionary keys corresponding to the file.stem object potential string values.


# Key / value pairs
range_keys = {"___2": 3, "___3": 4, "___4": 5, "___5": 6, "___6": 7,
              "___7": 8, "___8": 9, "___9": 10, "___10": 11, "___11": 12,}

for key in keys:

    wb = app.books.open(excel_template)

    for file in files:

        if file.stem.startswith(key):

            if file.stem.endswith("___1"):

                df = pd.read_csv(file, sep="\t", header=4)
                wb.sheets("Pt_1").range((1, 1)).options(index=False).value = df

            else:

                for r_keys in range_keys.keys():

                    if file.stem.endswith(r_keys):

                        df = pd.read_csv(file, sep="\t", header=4, usecols=[1])
                        wb.sheets("Pt_1").range((1, range_keys[r_keys)).options(index=False).value = df
                        break # break once match found


    wb.save(output_dir / f"{key}_Results.xlsm")

Note that you can easily extend the existing dictionary without modifying the script unless you have to also make changes to the body of the conditional statements or the key / value pairs.

1 Like

You can note that the range_keys entries follow a pattern, so you could build it like this:

range_keys = {f"___{i}": i + 1 for i in range(2, 12)}
1 Like

Yes, this is definitely more concise and elegant. However, being that the OP appears to be green still, sometimes the explicit form might be more telling and helpful.

@MRAB

By the way, once he understand how the dictionary selection works, this method would be preferred once options are increased significantly; say for 34. :slightly_smiling_face: