FPDF encoding problem

jam75 · December 13, 2023, 9:28pm

Hi,
in the following program

#! python3
# coding: utf-8

# Python program to convert
# text file to pdf file
 
 
from fpdf import FPDF

    
racine = '/media/jam/HDDW10/'
dir_init = racine + 'Vidéos/'
nom = 'f1'
nom = 'liste1-8'
  
# save FPDF() class into 
# a variable pdf
pdf = FPDF()   
  
# Add a page
pdf.add_page()
  
# set style and size of font 
# that you want in the pdf
pdf.set_font("Arial", size = 15)
 
# open the text file in read mode
f = open(dir_init + nom, "r")
 
# insert the texts in pdf
for x in f:
    pdf.cell(200, 10, txt = x, ln = 1, align = 'L')
  
# save the pdf with name .pdf
pdf.output(dir_init + nom + ".pdf")

I get the following error on pdf.output(dir_init + nom + ".pdf"):

Traceback (most recent call last):
  File "/media/jam/HDDW10/python3/prog/fichiers/test.py", line 35, in <module>
    pdf.output(dir_init + nom + ".pdf") 
  File "/home/jam/.local/lib/python3.10/site-packages/fpdf/fpdf.py", line 1065, in output
    self.close()
  File "/home/jam/.local/lib/python3.10/site-packages/fpdf/fpdf.py", line 246, in close
    self._enddoc()
  File "/home/jam/.local/lib/python3.10/site-packages/fpdf/fpdf.py", line 1636, in _enddoc
    self._putpages()
  File "/home/jam/.local/lib/python3.10/site-packages/fpdf/fpdf.py", line 1170, in _putpages
    p = self.pages[n].encode("latin1") if PY3K else self.pages[n] 
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 685: ordinal not in range(256)

Thanks for your help

kknechtel · December 13, 2023, 10:01pm

The PDF library that you are trying to use is very old. The code on GitHub has not been updated since 2017 and there has not been an official release since 2015. The documentation says that it only supports Python up until version 3.4.

The error message tells us that the library tried to use the Latin-1 encoding for some text. Files, whether plain text, PDF or anything else, cannot actually contain text; they only contain raw data, and an encoding specifies the rules for interpreting that data as text.
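To make the bytes-versus-text point concrete, here is a small illustration (my own sketch, nothing to do with FPDF itself): the same two bytes turn into different text depending on which encoding is used to read them.

```python
# The two raw bytes that UTF-8 uses to store the character "é" (U+00E9).
raw = b"\xc3\xa9"

# Interpreted as UTF-8, the two bytes form a single character.
as_utf8 = raw.decode("utf-8")

# Interpreted as Latin-1, every byte is its own character, so we get two.
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

The bytes never change; only the rule for reading them does.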

Your file contained a “smart quote” ’, which is not a character that the “Latin-1” text encoding can represent. It looks like you need to pass uni=True when you do pdf.set_font, to tell the library that it is a “unicode font” (this is not a real distinction, but a lot of programmers are imprecise when they talk about text handling because they do not properly understand it). Then it will use the UTF-8 encoding instead, which is fully general (it can represent every Unicode character, by design).
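The failure can be reproduced without the PDF library at all. This is a minimal sketch using only the standard library; the variable names are just for illustration.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the character named in the traceback.
quote = "\u2019"

try:
    quote.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError as exc:
    # Latin-1 only covers code points 0 to 255, and U+2019 is outside that range.
    latin1_ok = False
    print("latin-1 failed:", exc)

# UTF-8 can encode any Unicode character; this one takes three bytes.
utf8_bytes = quote.encode("utf-8")
print("utf-8 bytes:", utf8_bytes)
```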

But you really should consider looking for a more modern PDF library.

jam75 · December 14, 2023, 9:42am

Thanks for the answer.
I tried conversions (UTF-8, …) without success.
It was my first test with PDF under Python.
After this problem, I considered other solutions, but I have not settled on one so far.
Best regards

jam75 · December 14, 2023, 9:45am

… I tried uni=True in set_font, but "uni" is not recognized as a valid argument…
Best regards

kknechtel · December 14, 2023, 10:18am

Sorry, I misread: uni=True belongs in add_font, not set_font. But please actually check the documentation link I gave in order to understand it properly.
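Roughly, the corrected program might look like the sketch below. It is untested against your setup and makes several assumptions: that you have a TrueType font file available (the font_path argument), that the text file is saved as UTF-8, and that your installed version of the library accepts uni=True on add_font (newer forks may treat that argument differently). The function name text_to_pdf and the font label "DejaVu" are made up for the example.

```python
def text_to_pdf(text_path, pdf_path, font_path):
    """Write each line of a UTF-8 text file into a PDF using a TrueType font."""
    # Imported inside the function so this sketch loads even without fpdf installed.
    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()

    # uni=True goes on add_font, not set_font. "DejaVu" is only a label chosen
    # here; it must match the family name passed to set_font below.
    pdf.add_font("DejaVu", "", font_path, uni=True)
    pdf.set_font("DejaVu", size=15)

    # Assumption: the input file is UTF-8. Stating the encoding explicitly
    # avoids depending on the platform default.
    with open(text_path, "r", encoding="utf-8") as f:
        for line in f:
            # Drop the trailing newline before handing the text to the cell.
            pdf.cell(200, 10, txt=line.rstrip("\n"), ln=1, align="L")

    pdf.output(pdf_path)
```

You would call it with your own paths, for example text_to_pdf(dir_init + nom, dir_init + nom + ".pdf", "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"), where the font location is a guess that depends on your system.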

jam75 · December 14, 2023, 11:50am

Thanks a lot.
I must admit that I tend to write programs without reading the related documentation.
I start from an example (and FPDF was the only one I found with a clear example of converting txt to pdf).
The program above is such an example.
Of course I have other ways to solve the conversion (launch a CLI utility, for example, or try another library, …).
I will read the document you recommended, but it seems you think there are more up-to-date libraries.
Best regards