Combine or group excels sheets into new workbook

mg24 · January 16, 2022, 11:27pm

Hi Team,

Need help in grouping excels sheets into new workbooks as per Criteria.
I have 10 sheets in single input file, I want to create 4 workbooks out of it and save it.

Criteria If words contains below names club those sheets into new workbook.
US,IND,ENG,AU

Criteria to group.
us1,us2,us3 bring these sheets into US workbook
IND1, IND2,IND3 bring these sheets into IND Workbook
ENG1, ENG2 bring these sheets into ENG Workbook
AU1, AU2 bring these sheets into AU Workbook

thanks for your help in advance !
mg

CAM-Gerlach · January 17, 2022, 8:58am

In the future, I strongly suggest doing a quick google for terms of interest, like excel python sheets workbook, which for beginner problems like this can get you the answer in seconds, with far less time and effort for both you and others.

I suggest using pandas for this task, a ubiqutous library when working with tabular data; make sure to install the latest version of pandas (if you haven’t already) along with openpyxl. In particular, pandas.ExceltFile is what you want; it makes it very easy. The documentation explains everything you can do with it, but briefly, you can import pandas with:

import pandas as pd

Then, open an excel file with:

with pd.ExcelFile("path_to_file.xlsx") as source_file:
    # Read data from the file

Inside the with block, you can then read data from the file, either a single sheet:

    us1 = pd.read_excel(source_file, sheet_name="us1")

Or all the sheets, as a dict of sheet_name: sheet_contents.

    all_sheets = pd.read_excel(source_file, sheet_name=None)

You can then open an output file for writing:

with pd.ExcelWriter("us.xlsx") as xlsx_us:
    # Write your data

and write your data to a sheet with

    us1.to_excel(xlsx_us, sheet_name="us1")

You should be able to write your program with well under a dozen lines of code (about seven, by my count), if you’re a bit clever. Read all the sheets in one go into a dict, loop over a list with each prefix (US, IND, ENG, AU), and write an inner loop that writes each dict entry that .startswith() the prefix to a separate output file. If you do it correctly, your code should work with any file with any number of different prefixes and sheets with no changes except to the list of prefixes.

Best of luck!

mg24 · January 17, 2022, 12:28pm

Hi C.A.M Gerlach,

Thanks for your help really useful. I am new to python.
unable to add specific sheets into dictionary.

In a single workbook Contains 514 worksheets.
if Sheet name start with “US” add it to dictionary. how to achieve this. I am struggling here.

df = pd.read_excel(‘users.xlsx’, sheet_name = [‘User_info’,‘compound’])

Thanks in advance for your help.

Thanks
mg

mg24 · January 17, 2022, 7:06pm

Hi Team,

i have added 3 specific sheets into pandas dataframe. how to print 3 sheets as seperate workbooks.

Below is my attempted code

import pandas as pd
sheet_names = [‘US1’, ‘US2’, ‘US3’]
df = pd.read_excel(r’C:\Users\malle\OneDrive\Desktop\Input_File\Raw_File_for_combine_sheets.xlsx’, sheet_name = sheet_names)

Thanks
mg

CAM-Gerlach · January 18, 2022, 12:55am

As a reminder, this isn’t really a Python problem and more of a Pandas issue, but I can do my best to help.

Also, when posting code, make sure to put it inside three backticks (`) so it displays properly, like this:

```python
YOUR CODE HERE
```

You have a good start so far, but actually, you don’t want to read them all into the same dataframe, you want to keep them in separate dfs since you’ll be saving them to separate sheets. In fact, is that what the code you provided already does; it returns a dictionary of dataframes, one per workbook, not a single dataframe.

This is a somewhat different question from what you asked above, and from what your assignment states. However, it is a much simpler problem, which is a good idea to start with.

Besides renaming df to sheets to avoid confusion, since its a dict of sheet dfs rather than one df with all the sheets, all you need to do now is loop over the keys and values of sheets, and write each one to a separate excel file using pd.to_excel(). So something like:

for name, sheet in sheets.items():
    pd.to_excel(name, sheet_name=name)

I don’t use with pd.ExcelFile() or with pd.ExcelWriter here, but you will if you’re loading multiple batches of sheets from one workbook. See below for a more detailed description about how to solve the larger problem.

The document you shared above only has the dozen or so you need, so I assumed that was your input file and based my advice accordingly. I take it you’re talking about a different document? In that case, as you’ve correctly identified, that’s a whole lot to load at once, and it would be better to just load the ones you need into the dictionary. In that case its particularly important to use the with pandas.ExcelFile block I mention, so your excel file is only loaded once, and you can access the sheet names.

As the docs I linked mention, you can get the names of the sheets as a list with the .sheet_names attribute of the ExcelFile. As I hinted above, you can then get the names that start with "US" using the starts_with method looping over the list, preferably in a list comprehension. For example,

    sheet_names_us = [
        sheet_name for sheet_name in source_file.sheet_names
        if sheet_name.startswith("US")]

You then have a list of sheet names you can pass to sheet_name in read_excel (like the example you included above), which will return a dict of just the sheets you need. Since you have four different sheet names, instead of copying and pasting these lines once for every sheet prefix, you can just loop over a list of prefixes, and put the dicts containing the sheets from each prefix inside another dict, with the key being the prefix/output workbook name, which then makes writing them easy.

You’ll want to get a dictionary that looks like:

{
    "US": {
        "US1": <us1_sheet_df>,
        "US2": <us2_sheet_df>,
        ... # Etc
    },
    "IND": {
        "IND1": <ind1_sheet_df>,
        "IND2": <ind2_sheet_df>,
        ... # Etc
    },
    ... # Etc
}

To do so, you can do something like:

PREFIXES = ["US", "IND", "AU", "ENG"]  # The prefixes you want
sheets_by_prefix = {}  # Dict to put the dicts pandas outputs, one per prefix
with ... as source_file: # Open the input ExcelFile as above
    sheet_names = source_file.sheet_names
    for prefix in PREFIXES:
        sheet_names_prefix = ... # Get the sheet names that start with the prefix, as above
        sheets_prefix = ... # Load the sheets with those names with read_excel()
        sheets_by_prefix[prefix] = sheets_prefix  # Store the sheets for that prefix

Finally, you can write them out as I suggested before, except looping over the sheets_by_prefix dictionary instead of a hardcoded list. So you’d loop over the keys/values in the outer dictionary, which represent each output workbook, using the with statement I mention above to open the excel file for writing, then inside that loop use another nested loop inside the with block to looping over each key/value pair in the inner dictionary, which are each sheet. That would look something like this:

for workbook_name, sheets in sheets_by_prefix.items():
    with ... as output_file: # Open the output file
        for sheet_name, sheet in sheets.items():
            ... # Write each sheet as mentioned above

Best of luck!