Table extraction using Tabula

While reading a PDF file with `df = tabula.read_pdf(pdf_file, pages='all')`, all tables from all pages are displayed.

But when converting the result into a pandas DataFrame with

`tables = pd.DataFrame(tabula.read_pdf(pdf_file, pages='all', lattice=True)[0])`

only the table on the first page is displayed.

  1. `[0]` selects the first element of a sequence, so your code explicitly picks only the first DataFrame in the list.

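  2. `tabula.read_pdf` already returns a list of DataFrames, one per detected table, so wrapping it in `pd.DataFrame` is unnecessary. You can join them into a single DataFrame with `pd.concat`, or combine them with more advanced logic if the tables have different columns.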
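A minimal sketch of point 2, assuming the tables share the same column headers (if they don't, `pd.concat` aligns columns by name and fills the gaps with NaN). `combine_tables` and `read_all_tables` are hypothetical helper names; only `tabula.read_pdf` and `pd.concat` are real library calls, and `lattice=True` is carried over from the question.

```python
import pandas as pd


def combine_tables(tables: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack the list of DataFrames returned by tabula.read_pdf into one frame.

    Assumes the tables share the same column headers; any column missing
    from one table is filled with NaN by pd.concat.
    """
    if not tables:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead
        return pd.DataFrame()
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each table's own index
    return pd.concat(tables, ignore_index=True)


def read_all_tables(pdf_file: str) -> pd.DataFrame:
    """Read every table in the PDF and combine them (needs tabula-py and Java)."""
    import tabula  # imported lazily so combine_tables works without tabula installed

    # read_pdf returns a list of DataFrames, one per detected table; no [0] here
    tables = tabula.read_pdf(pdf_file, pages="all", lattice=True)
    return combine_tables(tables)
```

Usage: `df = read_all_tables("report.pdf")` (the filename is a placeholder). If you would rather keep the tables separate, skip the concatenation and index the list directly, e.g. `tabula.read_pdf(pdf_file, pages="all")[1]` for the second table.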