Tokenize identifies $ as an operator. Will it stay that way in the future?

Hi there,

I found that in Python 3.12 and 3.13, tokenize identifies $ as an operator.
Here is the test code:

#! python3
import platform
import tokenize
import io

print("Python {}".format(platform.python_version()),)

text = r"A$B"
## text = r"A\B"

fs = io.StringIO(text)
tokens = tokenize.generate_tokens(fs.readline)

for type, string, start, end, line in tokens:
    print(f"{start}--{end} {type:d}\t{string!r}")

Output:

Python 3.12.7
(1, 0)--(1, 1) 1	'A'
(1, 1)--(1, 2) 55	'$' <-- tokenize.OP (55)
(1, 2)--(1, 3) 1	'B'
(1, 3)--(1, 4) 4	''
(2, 0)--(2, 0) 0	''

In Python 3.11, the output is what I expected:

Python 3.11.9
(1, 0)--(1, 1) 1	'A'
(1, 1)--(1, 2) 60	'$' <-- tokenize.ERRORTOKEN(60)
(1, 2)--(1, 3) 1	'B'
(1, 3)--(1, 4) 4	''
(2, 0)--(2, 0) 0	''

Is this just a bug, or does it suggest something for the future?

According to the docs:

The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error:

$ ? `

I tried your code with the other symbols and got the same result you did. Anyway, if you try to run A $ B, you get a SyntaxError, as stated in the docs.
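This can be double-checked with compile(): the $ only becomes an error once the source is actually parsed, regardless of how tokenize labels the token. A minimal sketch:

```python
# '$' outside string literals is rejected when the source is compiled,
# even though tokenize in 3.12+ labels it as an OP token.
try:
    compile("A $ B", "<test>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```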

(Side note: If you want a more readable display of the token type, you can use tokenize.tok_name to translate it to a string - for example, tokenize.tok_name[55] == 'OP' in 3.12.)
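For instance, the loop from the original snippet could print symbolic names instead of raw numbers, which stays readable even when the numeric values shift between versions:

```python
import io
import tokenize

# tokenize.tok_name maps numeric token types to their symbolic names,
# so the display shows OP or ERRORTOKEN instead of 55 or 60.
text = "A$B"
for tok in tokenize.generate_tokens(io.StringIO(text).readline):
    print(f"{tok.start}--{tok.end} {tokenize.tok_name[tok.type]}\t{tok.string!r}")
```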

This is interesting. The first place I’d go for this - since it’s a change between 3.11 and 3.12 - is the What’s New for 3.12. And the tokenize module IS mentioned:

although the relevant details are further down:

What’s New In Python 3.12 — Python 3.13.0 documentation (scroll to the part about the tokenize module). What you’re seeing is listed as a “minor behavioral change”.

It’s worth noting that the module itself carries a warning:

It’s not meant to be used on input that’s invalid. This is a bit surprising and a bit confusing, but it means the module doesn’t guarantee anything about how it behaves on illegal code.


Thank you very much for your advice and the links, @Rosuav and @Lucas_Malor.
I forgot to check “What’s new in Python 3.12”.
I understand now that this is not a bug but an intended minor change, and why the warning exists:

tokenize — Tokenizer for Python source — Python 3.12.7 documentation
… The behavior of the functions in this module is undefined when providing invalid Python code and it can change at any point.
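Given that disclaimer, code that needs to spot these characters probably shouldn’t rely on whether they come back as ERRORTOKEN (3.11) or OP (3.12+). One version-robust approach (sketched here with a hypothetical find_illegal_chars helper) is to check the token strings against the characters the language reference lists as unconditional errors:

```python
import io
import tokenize

# Printing ASCII characters the language reference says are an
# unconditional error outside string literals and comments.
ILLEGAL = {"$", "?", "`"}

def find_illegal_chars(source):
    """Yield (row, col, char) for tokens that can never be valid Python,
    regardless of whether tokenize calls them ERRORTOKEN or OP."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.string in ILLEGAL:
            yield tok.start[0], tok.start[1], tok.string

print(list(find_illegal_chars("A$B")))   # → [(1, 1, '$')]
```

Note that this only covers these three characters; for arbitrary invalid input, compiling the source and catching SyntaxError is the only authoritative check.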


I believe 3.12 is when the tokenize module was rewritten on top of the C tokenizer (the PEG parser itself dates back to 3.9). It operates differently from the previous implementation, and some error detection seems to have been moved to a different stage of processing.

I noticed this when testing 3.12 preview releases and reported it:

As you can read from the issue, this was an expected side effect caused by parser changes.
