Pandas, xlsxwriter, how to get rid of first row of numbers? (Headers?)

c-rob · July 1, 2024, 5:26pm

I’m using Pandas and Xlsxwriter to write a spreadsheet to a .XLSX file. Here’s my incomplete code.

    '''datadict = has all my data with keys.
    options = a class holding different data to pass between functions.
    options.headersxl = ['Hdr1', 'hdr2', 'hdr3', 'hdr4']
    '''
    import inspect
    import os
    import sys

    import pandas
    import xlsxwriter
    procname = str(inspect.stack()[0][3]) + ":"

    fn = os.path.join(options.progdir, options.datafn + ".xlsx")
    outdata = []
    sheetname = 'Sheet1'
    outdata.append(options.headersxl) # Column headers as a list.
    for key, val in datadict.items(): 
        # tlist = [key.split(',') for key in datadict] # Split one row
        t = datadict[key]
        tlist = t.split(',') # Convert one row to a list. 
        outdata.append(tlist) # Save one row
        
    df=pandas.DataFrame(outdata) # Make a dataframe from the list of lists. 
    (numrows, numcols) = df.shape
    try: 
        writer = pandas.ExcelWriter(fn, engine='xlsxwriter', date_format='MM/DD/YYYY')
    except PermissionError as e:
        print(f"{procname} ERROR: Please close the spreadsheet before running this.")
        sys.exit()
    except Exception as e:
        sys.exit(f"{procname} ERROR on pandas.ExcelWriter: {e}")
    
    try: 
        df.to_excel(writer, sheet_name=sheetname, header=options.headersxl, index=False, freeze_panes=(1,0) ) # Save to file.
    except Exception as e:
        sys.exit(f"{procname} ERROR on to_excel: {e}")
    
    # strings_to_numbers = convert all strings to floats/numbers where possible.
    # wbook  = writer.book(fn, {'strings_to_numbers': True, 'default_date_format': 'dd/mm/yy'})
    workbook = writer.book
    worksheet = writer.sheets[sheetname]

    hdrformat = workbook.add_format({'bold':True, 'text_wrap':True, 'valign':'top', 'align':'left'})
    dlrformat = workbook.add_format({'num_format': '0.00', 'align':'right'})
    dateformat = workbook.add_format({'num_format': 'mm/dd/yy'})
    writer.sheets[sheetname].set_column('A:A', 25) 
    writer.sheets[sheetname].set_column('B:B', 15, dateformat) 
    writer.sheets[sheetname].set_column('D:E', 25) 
    writer.sheets[sheetname].set_column('F:G', 12, dlrformat) 
    writer.sheets[sheetname].set_column('J:K', 12, dlrformat)
    writer.close() # Save the workbook to disk.

In my Excel file the first two rows look like this:

0     1     2     3    4...
Hdr1  hdr2  hdr3  hdr4 ...

I want to get rid of the first row which says “0 1 2 3 4”.

If I add my options.headersxl as the first row of the dataframe I still get the first row as numbers even if I do this:

df.to_excel(writer, sheet_name=sheetname, header=False, index=False, freeze_panes=(1,0) ) # Save to file.

Any idea where I’m going wrong here?

I’ve read the docs on xlsxwriter and pandas .to_excel about headers and I still don’t see what I’m doing wrong. Examples on the internet often don’t work either. Some of the code above is from internet examples so it’s possible it’s not correct, in the wrong order, etc.

Thank you.

EDIT: My data is row-based, that is, it is constructed row-by-row, as if I were writing one row at a time to a .CSV file or text file.

dg-pb · July 1, 2024, 6:59pm

One way that should work:

outdata = []
# Remove this line:
outdata.append(options.headersxl) # Column headers as a list.
# And pass the headers as the column names instead:
df=pandas.DataFrame(outdata, columns=options.headersxl)
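For reference, here is a self-contained sketch of that change. The sample rows and the `save_xlsx` helper are made up for illustration, and the writer call assumes xlsxwriter is installed:

```python
import pandas as pd

headers = ["Hdr1", "hdr2", "hdr3", "hdr4"]  # stands in for options.headersxl

# Hypothetical sample data shaped like datadict: one comma-separated string per row.
datadict = {
    "row1": "alpha,07/01/2024,x,1.50",
    "row2": "beta,07/02/2024,y,2.25",
}

# Build the rows WITHOUT appending the headers first.
outdata = [val.split(",") for val in datadict.values()]

# Pass the headers as column labels instead of as a data row.
df = pd.DataFrame(outdata, columns=headers)
print(df)


def save_xlsx(frame, path, sheetname="Sheet1"):
    """Hypothetical helper: write the frame with one header row and no index column."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        frame.to_excel(writer, sheet_name=sheetname, index=False, freeze_panes=(1, 0))
```

Calling something like `save_xlsx(df, "out.xlsx")` should then give a sheet whose first row is the header titles, with no row of numbers above it.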

onePythonUser · July 1, 2024, 7:18pm

Hi,

the issue has to do with how you’re setting up your data for the dataframe.

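The following example demonstrates how you are currently setting up your dataframe. Note that the first element of outdata (index 0) is the list of header titles, as the print statements show. Pandas therefore treats the titles as an ordinary data row and generates its own integer column labels (0, 1, 2, ...).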

import pandas as pd

class Options:  # simulate your class definition

        headersxl = ['Hdr1', 'hdr2', 'hdr3', 'hdr4']

outdata = []

outdata.append(Options.headersxl)
print(outdata)  # Demonstrates that column titles are 0th index value

# Simulates your data since it is being accessed via keys
data = {'one': 'first', 'two': 'second', 'three': 'third', 'four': 'fourth'}

for key, val in data.items():

        t = data[key]
        tlist = t.split(',')   # Convert one row to a list.
        outdata.append(tlist)  # Save one row

print(outdata)  # First index is the list with column titles

df = pd.DataFrame(data = outdata)
print(df)
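To see where the numbers come from, it may help to inspect the column labels directly. The sketch below uses made-up rows. Slicing the header row off, as in the last lines, is one possible alternative rather than the only fix:

```python
import pandas as pd

rows = [
    ["Hdr1", "hdr2", "hdr3", "hdr4"],  # headers stored as an ordinary row
    ["a", "b", "c", "d"],
    ["e", "f", "g", "h"],
]

df_bad = pd.DataFrame(rows)
print(list(df_bad.columns))     # default integer labels generated by pandas
print(df_bad.iloc[0].tolist())  # the header titles sit in the first data row

# Alternative: use the first row as column labels and the rest as data.
df_ok = pd.DataFrame(rows[1:], columns=rows[0])
print(list(df_ok.columns))
```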

In the following two examples, each header title is assigned data:

# Example 1
""" First row does not include column numbers when printed """
import pandas as pd

# Keys are used as column titles
data = {'col1': [1, 2],
        'col2': [3, 4],
        'col3': [5, 6]}

df = pd.DataFrame(data = data)
print(df)

# Example 2
""" Same as in example one but with added data per column """
import pandas as pd

data = {'col1': [[1, 2], ['a', 'b'], 'one'],
        'col2': [[3, 4], ['c', 'd'], 'string'],
        'col3': [[5, 6], ['e', 'f'], 'example']}

df = pd.DataFrame(data)
print(df)

So, what you have to do is assign data to each column header rather than append the headers as you would to a regular list. Based on the examples above, a potential solution is to use a dictionary instead of a list and assign the data per column, as in the following example (this can be done in a loop for simplicity, but it is shown explicitly here for demo purposes):

data['Hdr1'] = data1  # data1 holds all the values for this column, e.g. a list
data['hdr2'] = data2  # each column needs the same number of values
data['hdr3'] = data3  # etc.
data['hdr4'] = data4  # etc.
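For row-based data, the loop mentioned above could look roughly like this. The rows are invented sample values, and `zip(*rows)` is just one way to turn rows into columns:

```python
import pandas as pd

headers = ["Hdr1", "hdr2", "hdr3", "hdr4"]
rows = [
    ["a", "b", "c", "d"],
    ["e", "f", "g", "h"],
]

# zip(*rows) regroups the values by column; pair each column with its header.
data = {}
for name, column in zip(headers, zip(*rows)):
    data[name] = list(column)

df = pd.DataFrame(data)
print(df)
```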

FYI, … convention states that the first letter of a class name should be capitalized.
Use intermediate print statements when debugging.

c-rob · July 2, 2024, 9:51am

That worked! I knew it was something simple I was missing. Thank you!