As a reminder, this isnât really a Python problem and more of a Pandas issue, but I can do my best to help.
Also, when posting code, make sure to put it inside three backticks (`
) so it displays properly, like this:
```python
YOUR CODE HERE
```
You have a good start so far, but actually, you donât want to read them all into the same dataframe, you want to keep them in separate dfs since youâll be saving them to separate sheets. In fact, is that what the code you provided already does; it returns a dictionary of dataframes, one per workbook, not a single dataframe.
This is a somewhat different question from what you asked above, and from what your assignment states. However, it is a much simpler problem, which is a good idea to start with.
Besides renaming df
to sheets
to avoid confusion, since its a dict of sheet dfs rather than one df with all the sheets, all you need to do now is loop over the keys and values of sheets
, and write each one to a separate excel file using pd.to_excel()
. So something like:
for name, sheet in sheets.items():
pd.to_excel(name, sheet_name=name)
I donât use with pd.ExcelFile()
or with pd.ExcelWriter
here, but you will if youâre loading multiple batches of sheets from one workbook. See below for a more detailed description about how to solve the larger problem.
The document you shared above only has the dozen or so you need, so I assumed that was your input file and based my advice accordingly. I take it youâre talking about a different document? In that case, as youâve correctly identified, thatâs a whole lot to load at once, and it would be better to just load the ones you need into the dictionary. In that case its particularly important to use the with pandas.ExcelFile
block I mention, so your excel file is only loaded once, and you can access the sheet names.
As the docs I linked mention, you can get the names of the sheets as a list with the .sheet_names
attribute of the ExcelFile
. As I hinted above, you can then get the names that start with "US"
using the starts_with
method looping over the list, preferably in a list comprehension. For example,
sheet_names_us = [
sheet_name for sheet_name in source_file.sheet_names
if sheet_name.startswith("US")]
You then have a list of sheet names you can pass to sheet_name
in read_excel
(like the example you included above), which will return a dict of just the sheets you need. Since you have four different sheet names, instead of copying and pasting these lines once for every sheet prefix, you can just loop over a list of prefixes, and put the dicts containing the sheets from each prefix inside another dict, with the key being the prefix/output workbook name, which then makes writing them easy.
Youâll want to get a dictionary that looks like:
{
"US": {
"US1": <us1_sheet_df>,
"US2": <us2_sheet_df>,
... # Etc
},
"IND": {
"IND1": <ind1_sheet_df>,
"IND2": <ind2_sheet_df>,
... # Etc
},
... # Etc
}
To do so, you can do something like:
PREFIXES = ["US", "IND", "AU", "ENG"] # The prefixes you want
sheets_by_prefix = {} # Dict to put the dicts pandas outputs, one per prefix
with ... as source_file: # Open the input ExcelFile as above
sheet_names = source_file.sheet_names
for prefix in PREFIXES:
sheet_names_prefix = ... # Get the sheet names that start with the prefix, as above
sheets_prefix = ... # Load the sheets with those names with read_excel()
sheets_by_prefix[prefix] = sheets_prefix # Store the sheets for that prefix
Finally, you can write them out as I suggested before, except looping over the sheets_by_prefix
dictionary instead of a hardcoded list. So youâd loop over the keys/values in the outer dictionary, which represent each output workbook, using the with
statement I mention above to open the excel file for writing, then inside that loop use another nested loop inside the with
block to looping over each key/value pair in the inner dictionary, which are each sheet. That would look something like this:
for workbook_name, sheets in sheets_by_prefix.items():
with ... as output_file: # Open the output file
for sheet_name, sheet in sheets.items():
... # Write each sheet as mentioned above
Best of luck!