When reading Spreadsheet, how to KEEP leading zeros

c-rob · August 20, 2024, 1:45pm

I have Python 3.12 on Windows 10.

Problem

I have to read an Excel spreadsheet, XLSX format, with 1000s of lines. One column has a shipping tracking number which I want to read and keep any leading zeros. But when I read the tracking number with Pandas, Pandas is stripping off the leading zeroes.

Ex: Data in Excel shows “001234583833” but what pandas reads in Python is “1234583833”. This tracking number will be part of selection criteria when I search for it in a database so it must be accurate.

How I read the spreadsheet.

    try: 
        df = pandas.read_excel(filename, index_col=None) # Read all columns to help with debugging.
    except Exception as e:
        writeerr(f"{procname} ERROR reading Excel file. {e}")

How I read each row.

    for index, rowdata in df.iterrows():
        rowlen = len(rowdata)
        tracknum = df.iloc[index,options.ssfedextracknumcol]
        fedexcharge = df.iloc[index,options.ssfedexchargecol]
        job = df.iloc[index,options.ssfedexorigref2col] # Orig ref 2 from SS.

What I tried

In Excel I tried selecting the whole column (except for the header) and formatting it as “Text”. That didn’t fix my problem.
The length of the tracking number ight vary, so reformatting the data with leading zeros after it’s read in Python may not work on every record.
Doing a Google search for web pages shows how to ADD leading zeros but not keep them when I read them.

Do you have any ideas how to do this?

Thank you!

franklinvp · August 20, 2024, 1:52pm

See the argument converters, to specify reading the specific column as a string.

JamesParrott · August 20, 2024, 1:52pm

Have you tried setting dtype={'X' : object} for column X ?

https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html#pandas-read-excel

c-rob · August 20, 2024, 3:27pm

I got it working. Thanks. Here’s the line I used:

df = pandas.read_excel(filename, index_col=None, dtype={options.ssfedextracknumcolname: object})

dtype is a dictionary of keys and values.