Pygame 2.1, Python 3.9, Windows 10: can't display non-ASCII characters correctly

Hello,
I’m developing a game with Pygame 2.1 and Python 3.9. I’m working on the Windows distribution and I have a problem displaying non-ASCII characters in the window.
I’ve seen a lot of similar questions on other forums, but none of the answers I found have worked for me…

I have strings with characters like "ù à é è ä" etc.
I create a Pygame font object with:

font = pygame.font.Font("myfontfile.ttf", size)
# or with SysFont, which makes no difference to this problem
font = pygame.font.SysFont("arial", size)

Then I create the rendered text surface with:

surf = font.render("some string with àéèù...", antialiasing, color)
# With a SysFont, this shows garbled characters in place of àéè etc., or just blank gaps if I use my custom TTF file.

I have tried things like passing my text as bytes, or re-encoding and re-decoding it:

import sys

text = "some string with àéèù..."

surf = font.render(text.encode(), *args)  # doesn't work, even worse
# or
surf = font.render(text.encode(sys.stdout.encoding,'replace'), *args) # doesn't work either
# or
surf = font.render(text.encode(sys.stdout.encoding,'replace').decode(), *args) # doesn't work either
# or
surf = font.render(text.encode(sys.stdout.encoding,'replace').decode("utf-8"), *args) # doesn't work either
# or
surf = font.render(bytes(text, "utf-8"), *args) # doesn't work either

I tried pretty much all possible combinations of the above tricks, all with the same result (except that passing bytes shows even more garbled characters, as if each special character were split into several).

On Linux everything works fine, no matter what font I use.
On Windows it never works.

What am I missing?

Thank you very much for any help!

Odds are it’s trying to treat those strings as
CP-1252/ISO-8859-1/Latin1 rather than UTF-8. Python on Windows
assumes a different default text encoding than on POSIX platforms
like GNU/Linux or Darwin/MacOS. Starting with Python 3.7, a new -X utf8 command-line option is available, as well as a PYTHONUTF8
environment variable you can use to toggle default UTF-8 encoding
for all platforms including Windows[*].
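To see what that kind of mismatch does to text without Pygame involved, here is a minimal sketch that decodes UTF-8 bytes with the wrong codec. It assumes the Windows default is cp1252 (the usual ANSI code page on Western-European systems; yours may differ):

```python
original = "ù à é è ä"

# How a UTF-8 text file stores the string on disk
utf8_bytes = original.encode("utf-8")

# Decoding those bytes with the wrong codec (assumed here: cp1252) garbles them
wrong = utf8_bytes.decode("cp1252")

# Decoding with the codec the file was actually written in restores the text
right = utf8_bytes.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Each accented letter becomes two characters such as "Ã©", which matches the "weird characters" symptom.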

Those are quick workarounds you should be able to try in order to
confirm the problem, but ideally you would make sure in your
code that all string representations are encoded/decoded
consistently when being read and written or transmitted, regardless
of platform. You seem to be trying to do that already, but I’m
neither a Windows user nor familiar with PyGame, so I’m afraid I
can’t offer much help with that part.
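As a minimal sketch of that advice in plain Python (not Pygame-specific): check what the platform default would be, then pass the same explicit encoding on every write and read so the default never matters. The file name and data here are made up for illustration:

```python
import json
import locale
import os
import sys
import tempfile

# What open() would use if no encoding were given (varies by platform/locale)
print("default text encoding:", locale.getpreferredencoding(False))
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))

data = {"greeting": "àéèù φχψω"}

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; any path works
    path = os.path.join(tmp, "translations.json")

    # Write with an explicit encoding...
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)

    # ...and read back with the same explicit encoding
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print("round trip ok:", loaded == data)
```

Note that json.dump defaults to ensure_ascii=True, which writes non-ASCII characters as \u00e9-style escapes; such a file is pure ASCII and reads back correctly under any ASCII-compatible default. ensure_ascii=False is used here so that the encoding actually matters.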

If you’re interested in the details, the sordid history of PEP
597[**] has a lot of them. There was a push to make UTF-8 mode the
default, but eventually the PEP was revised to simply add an
optional warning if encodings aren’t specified and a system default
is relied on. The hope is that this will be a stepping stone to one
day have Python assume UTF-8 everywhere, even on Windows, but
getting there without adverse impact to existing software will take
time.

[*] hxxps://docs.python[dot]org/3/using/windows.html#utf-8-mode
[**] hxxps://www.python[dot]org/dev/peps/pep-0597/

[Sorry for the defanged URLs above, for some reason Discourse says
I’m not allowed to post links to Python’s own documentation!]


According to the source of the render() method, a text string gets encoded as UTF-8 internally. In turn, it should call an SDL TTF function that’s implemented for UTF-8, such as TTF_RenderUTF8_Solid(). The SDL TTF implementation apparently looks up font glyphs by Unicode ordinal.
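To illustrate the distinction in plain Python (no Pygame): a str holds Unicode code points, and encoding it to UTF-8 produces the bytes a UTF-8 rendering function receives. If the str is already correct at this point, the round trip is lossless, which suggests any garbling has to happen before render() is called:

```python
text = "àéèù φχψω"

# Unicode code points: what a glyph lookup by ordinal works with
code_points = [f"U+{ord(ch):04X}" for ch in text]

# UTF-8 bytes: the form the str is converted to before a UTF-8 render call
utf8_bytes = text.encode("utf-8")

print(code_points)
print(utf8_bytes)

# Decoding the same bytes as UTF-8 gives back exactly the original str
assert utf8_bytes.decode("utf-8") == text
```

Each of these letters is a single code point but two UTF-8 bytes, so 9 characters become 17 bytes.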

The following minimal example with Latin and Greek letters works for me in Windows 11 with Python 3.9.8 and Pygame 2.1.2:

import sys
import pygame

pygame.init()
screen = pygame.display.set_mode((400, 400))
clock = pygame.time.Clock()

text_font = pygame.font.SysFont('arial', 15)
text = text_font.render("àéèù φχψω", True, (0,0,0))

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.quit()
            sys.exit()

    screen.fill((255, 255, 255))
    screen.blit(text, (40, 40))

    pygame.display.flip()
    clock.tick(60)

Thank you for those answers!
I tried os.environ["PYTHONUTF8"] = "1" without success.

@eryksun, your example also works for me, so I don’t quite understand what is different in my implementation; it is pretty much the same thing… I’ll try to debug it and find what breaks my string into garbled characters. I’m surely doing something weird somewhere, but I can’t see what…

Thanks for your help !
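One way to find where the string breaks is to test it right before calling render(). Below is a heuristic sketch, assuming the culprit is UTF-8 text that was decoded as cp1252; looks_like_mojibake is a hypothetical helper, not part of Pygame or the standard library:

```python
def looks_like_mojibake(s: str, wrong_codec: str = "cp1252") -> bool:
    """Heuristic: True if s looks like UTF-8 text that was decoded with wrong_codec."""
    try:
        # Undo the suspected wrong decode, then decode properly as UTF-8
        repaired = s.encode(wrong_codec).decode("utf-8")
    except UnicodeError:
        # Either s has characters wrong_codec can't represent, or the bytes
        # aren't valid UTF-8: in both cases it is probably not this mojibake
        return False
    # Pure ASCII round-trips unchanged, so only report an actual difference
    return repaired != s


print(looks_like_mojibake("àéèù"))  # False
print(looks_like_mojibake("àéèù".encode("utf-8").decode("cp1252")))  # True
```

It can report false positives for legitimate text that happens to contain sequences like "Ã©", so treat it as a debugging aid only.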

It turns out the problem was that I loaded my strings from a translation file but did not open the file with an explicit encoding argument. On Linux the default is usually UTF-8, but on Windows it is the locale’s ANSI code page (e.g. cp1252).
So I solved my problem simply by adding the encoding argument:

with open("my_trads.json", encoding="utf-8") as f:
    translations = json.load(f)

It’s always better to open with an explicit encoding when the file encoding is known. That said, this problem would have been resolved by enabling UTF-8 mode. The PYTHONUTF8 environment variable has to be defined before Python starts. You can also use the -X utf8 command-line option.
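This also explains why setting os.environ inside the script had no effect: UTF-8 mode is decided when the interpreter starts, so changing the environment afterwards only affects child processes. A minimal sketch that checks this with child interpreters (the probe command is just for illustration):

```python
import os
import subprocess
import sys

# UTF-8 mode is fixed at interpreter startup; changing os.environ later
# does not change it for the running process.
before = sys.flags.utf8_mode
os.environ["PYTHONUTF8"] = "1"  # too late for this interpreter
after = sys.flags.utf8_mode
print("this process:", before, "->", after)

# A tiny child program that reports whether UTF-8 mode is active
probe = "import sys; print(sys.flags.utf8_mode)"

# Option 1: define PYTHONUTF8 before the child interpreter starts
env = dict(os.environ, PYTHONUTF8="1")
via_env = subprocess.run(
    [sys.executable, "-c", probe],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()

# Option 2: pass the -X utf8 command-line option
via_flag = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", probe],
    capture_output=True, text=True, check=True,
).stdout.strip()

print("child with PYTHONUTF8=1:", via_env)
print("child with -X utf8:", via_flag)
```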
