How to convert a CSV file to Parquet without the RLE_DICTIONARY encoding error?

rcmv · September 5, 2022, 9:01am

Hi!
I’ve already tested three ways of converting a CSV file to a Parquet file. You can find them below. All three created the Parquet file. I tried to view its contents using “Apache Parquet Viewer” on Windows, and every time I got the following error message:

“encoding RLE_DICTIONARY is not supported”

Is there any way to avoid this, perhaps by using a different type of encoding? Here is the code:

1º Using pandas:

import pandas as pd
df = pd.read_csv("filename.csv")
df.to_parquet("filename.parquet")

2º Using pyarrow:

from pyarrow import csv, parquet
table = csv.read_csv("filename.csv")
parquet.write_table(table, "filename.parquet")

3º Using dask:

from dask.dataframe import read_csv
dask_df = read_csv("filename.csv", dtype={'column_xpto': 'float64'})
dask_df.to_parquet("filename.parquet")

vbrozik · September 5, 2022, 1:04pm

After some quick googling:

With pyarrow, wouldn’t the use_dictionary=False or compression='none' parameter help?

https://arrow.apache.org/docs/python/parquet.html#compression-encoding-and-file-compatibility