I am a beginner in Python. I am trying to read tables from a PDF with the read_pdf function of tabula.
I have read the data with read_pdf and written it to a CSV file with the DataFrame.to_csv method.
My PDF has nested columns (several sub-columns under a single parent column), and the values in those columns are not placed properly.
My requirement is to read the data for each column and save it into the corresponding column of my database table.
My plan was to convert the table to CSV and then read the CSV into a data array.
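If the CSV file is only an intermediate step, it may be possible to skip it: pandas can write a DataFrame straight into a database table with `DataFrame.to_sql`, and `DataFrame.to_dict(orient="records")` gives a list of row dictionaries if an in-memory array is needed. A minimal sketch, assuming an SQLite database (in memory here) and a hypothetical table name `marks` and helper `save_table`; other databases would typically need an SQLAlchemy engine instead of a `sqlite3` connection.

```python
import sqlite3

import pandas as pd


def save_table(df: pd.DataFrame, conn: sqlite3.Connection, table: str) -> list:
    """Hypothetical helper: append df to `table` and return its rows as dicts."""
    # if_exists="append" creates the table on first use and adds rows afterwards;
    # index=False keeps the DataFrame's row numbers out of the database.
    df.to_sql(table, conn, if_exists="append", index=False)
    # One dict per row, keyed by column name: a simple "data array".
    return df.to_dict(orient="records")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Made-up data standing in for a table extracted by tabula
    df = pd.DataFrame({"Name": ["Ann", "Bob"], "Marks_Math": ["90", "70"]})
    print(save_table(df, conn, "marks"))
    print(conn.execute("SELECT Name, Marks_Math FROM marks").fetchall())
```

This only works cleanly once every column has a single flat name, which is why the nested headers need to be resolved first.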
How can I get the nested column values into the data array correctly using tabula?
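With `lattice=True`, a nested header often comes out as two rows: the parent label sits in the first of the columns it spans, the other cells of that merged area come back empty, and the sub-column labels are in the row below. One way to handle this, sketched under that assumption, is to read the table without a header (for example `pandas_options={'header': None}` together with `multiple_tables=False`) and build flat names such as `Marks_Math` yourself. `combine_header_rows` is a hypothetical helper, not part of tabula or pandas, and the sample table is made up.

```python
import pandas as pd


def _is_blank(cell) -> bool:
    # Merged or empty cells come back as None/NaN or as empty strings.
    if isinstance(cell, str):
        return cell.strip() == ""
    return bool(pd.isna(cell))


def combine_header_rows(raw: pd.DataFrame, n_header_rows: int = 2, sep: str = "_") -> pd.DataFrame:
    """Use the first n_header_rows rows of a header-less table as column names.

    Assumes a label spanning several columns appears only in the leftmost of
    them, with blank cells to its right.
    """
    rows = raw.iloc[:n_header_rows].values.tolist()
    # Forward-fill spanning labels to the right in every header row except the
    # last, so each sub-column inherits its parent label.
    for row in rows[:-1]:
        last = None
        for i, cell in enumerate(row):
            if _is_blank(cell):
                row[i] = last
            else:
                last = cell
    names = []
    for col_cells in zip(*rows):
        parts = []
        for cell in col_cells:
            if _is_blank(cell):
                continue
            label = str(cell).strip()
            # Skip a label that is simply repeated in the row below.
            if not parts or parts[-1] != label:
                parts.append(label)
        names.append(sep.join(parts))
    # Everything below the header rows is data.
    body = raw.iloc[n_header_rows:].reset_index(drop=True)
    body.columns = names
    return body


if __name__ == "__main__":
    raw = pd.DataFrame([
        ["Name", "Marks", None, "Total"],
        [None, "Math", "Science", None],
        ["Ann", "90", "85", "175"],
        ["Bob", "70", "60", "130"],
    ])
    print(combine_header_rows(raw))
```

A plain column whose label sits only in the second header row would wrongly inherit the parent label on its left, so it is worth checking the result against the PDF.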
I have attached an image of the PDF structure. The columns in the red-marked area are not read properly.
Here is the code I used:
    import tabula
    import pandas as pd

    infile = "demo.pdf"

    # read_pdf returns a list of DataFrames (tabula-py 2.x); [0] takes the first table
    df_data = tabula.read_pdf(
        infile,
        pages="1",
        multiple_tables=False,
        lattice=True,  # detect cells from the ruling lines drawn in the PDF
        # pandas_options={'skiprows': 1},
        # pandas_options={'header': [0, 1]},
    )[0]

    df_data.to_csv("filename_2.csv")
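The commented-out `pandas_options={'header': [0, 1]}` is close to what is needed. As far as I can tell, with `multiple_tables=False` tabula-py passes `pandas_options` on to `pandas.read_csv`, and `header=[0, 1]` then produces a two-level `MultiIndex` for the columns, where blank header cells are named `Unnamed: <col>_level_<level>`. A database table needs flat column names, so the sketch below joins the two levels and gives a blank parent cell the label to its left. `flatten_columns` is a hypothetical helper, and it is demonstrated on made-up CSV text with a two-row header rather than on the actual PDF.

```python
import io

import pandas as pd


def flatten_columns(df: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Turn two-level (parent, sub) column labels into single flat names."""
    flat = []
    last_top = ""
    for top, sub in df.columns:
        top, sub = str(top), str(sub)
        # A blank parent cell belongs to the spanning label on its left.
        if top.startswith("Unnamed:"):
            top = last_top
        else:
            last_top = top
        # Columns without a sub-label (or repeating the parent) keep the parent name.
        if sub.startswith("Unnamed:") or sub == top:
            flat.append(top)
        else:
            flat.append(f"{top}{sep}{sub}" if top else sub)
    out = df.copy()
    out.columns = flat
    return out


if __name__ == "__main__":
    # "Marks" spans the "Math" and "Science" sub-columns
    csv_text = "Name,Marks,,Total\n,Math,Science,\nAnn,90,85,175\nBob,70,60,130\n"
    df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])
    print(flatten_columns(df).columns.tolist())
```

Applied to the real table, the same function would run on `df_data` after reading with `pandas_options={'header': [0, 1]}`, before `to_csv` or `to_sql`.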