Pandas.read_sas with chunk size option: IndexError

axelle · January 28, 2020, 9:38am

Good morning,

I have read the rules about posting, but I cannot attach a sample of my data or reproduce the entire error message, as the data I am working on is located on a server without internet access. I apologise for the inconvenience. I’ll try to provide most of what is requested below.

I am working with very large SAS files (data on each job, hence millions of rows) and got a memory error when simply trying to read them (strangely, they open fine in R or Stata). I therefore searched and found the chunksize option of pandas.read_sas, which reads the data in chunks. My code is now the following:

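import pandas as pd

# With chunksize set, read_sas returns a reader that yields DataFrames
df_chunk = pd.read_sas(r'file.sas7bdat', chunksize=500)

chunk_list = []  # collects each chunk as it is read
for chunk in df_chunk:
    chunk_list.append(chunk)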

At this point I get the following error (I am reproducing it here manually as I cannot copy and paste):

line 660, in _chunk_to_dataframe
if self.column_formats[j] in const.sas_date_formats:
IndexError: list index out of range

Looking deeper into the error message, the issue seems to be in the underlying function “_chunk_to_dataframe(self)”, on the following line: if self.column_formats[j] in const.sas_date_formats
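
One way to narrow this down, as a rough sketch only: walk the reader one chunk at a time and note the index of the first chunk that raises. The helper name first_failing_chunk is hypothetical (it is not part of pandas), and it assumes only that the reader is an ordinary Python iterator, which is how the chunked read_sas reader is used in the loop above.

```python
def first_failing_chunk(chunks):
    """Return (index, exception) for the first chunk that fails, or None.

    `chunks` is any iterator of chunks, for example the reader returned by
    pd.read_sas(path, chunksize=500). Each chunk is discarded after it is
    read, so only about one chunk is held in memory at a time.
    """
    iterator = iter(chunks)
    index = 0
    while True:
        try:
            next(iterator)
        except StopIteration:
            return None  # every chunk was read without error
        except Exception as exc:  # e.g. the IndexError shown above
            return index, exc
        index += 1


# Hypothetical usage on the server (file name as in the post):
# import pandas as pd
# reader = pd.read_sas(r'file.sas7bdat', chunksize=500)
# print(first_failing_chunk(reader))
```

If the result is (0, IndexError(...)), the failure happens on the very first chunk, which would suggest a problem with the file's column metadata rather than with a particular block of rows.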

Many thanks for your help,
Axelle

mattip · January 28, 2020, 9:57am

You have reached a general Python forum. You may have better luck reporting this pandas-specific issue on the pandas issue tracker: https://github.com/pandas-dev/pandas/issues/new. Be sure to fill in all the fields you can there, especially the version of pandas you are using.
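
For gathering the version details, pandas itself provides pd.show_versions(), which prints the versions of pandas and its dependencies. As a minimal standard-library sketch that also works when output has to be retyped by hand, something like the following could be used; the function name environment_report and the default package list are illustrative assumptions, not part of any library.

```python
import platform
import sys
from importlib import metadata  # standard library, Python 3.8+


def environment_report(packages=("pandas", "numpy")):
    """Build a short text report of the Python, OS and package versions."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


print(environment_report())
```

The output is only a few lines long, so it is practical to copy by hand from a machine without internet access.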

axelle · January 28, 2020, 10:02am

Thanks, I’ll try there as well!
