Consistent token type int values across Python versions

Dear friends,

Python’s built-in `token` module assigns an integer value to each token type. However, these numeric values are not guaranteed to be stable across Python versions.
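A quick way to see this in the interpreter: the int behind a token type constant is version-specific, but `token.tok_name` always maps it back to a stable name. A minimal check (the printed number depends on the running interpreter):

```python
import token

# The int value behind a token type constant depends on the
# Python version: for example, token.OP is 54 on 3.10/3.11
# but 55 on 3.12 and later.
print("OP =", token.OP)

# token.tok_name maps the version-specific int back to a stable name.
print(token.tok_name[token.OP])  # always "OP"
```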

The following table shows the differences among the currently supported Python versions:

| token type       | Python 3.10 | Python 3.11 | Python 3.12 | Python 3.13 | Python 3.14 |
|------------------|-------------|-------------|-------------|-------------|-------------|
| ENDMARKER        | 0           | 0           | 0           | 0           | 0           |
| NAME             | 1           | 1           | 1           | 1           | 1           |
| …                | …           | …           | …           | …           | …           |
| EXCLAMATION      |             |             | 54          | 54          | 54          |
| OP               | 54          | 54          | 55          | 55          | 55          |
| AWAIT            | 55          | 55          | 56          |             |             |
| ASYNC            | 56          | 56          | 57          |             |             |
| TYPE_IGNORE      | 57          | 57          | 58          | 56          | 56          |
| TYPE_COMMENT     | 58          | 58          | 59          | 57          | 57          |
| SOFT_KEYWORD     | 59          | 59          | 60          | 58          | 58          |
| FSTRING_START    |             |             | 61          | 59          | 59          |
| FSTRING_MIDDLE   |             |             | 62          | 60          | 60          |
| FSTRING_END      |             |             | 63          | 61          | 61          |
| TSTRING_START    |             |             |             |             | 62          |
| TSTRING_MIDDLE   |             |             |             |             | 63          |
| TSTRING_END      |             |             |             |             | 64          |
| COMMENT          |             |             | 64          | 62          | 65          |
| NL               |             |             | 65          | 63          | 66          |
| ERRORTOKEN       | 60          | 60          | 66          | 64          | 67          |
| N_TOKENS         | 64          | 64          | 68          | 66          | 69          |
| NT_OFFSET        | 256         | 256         | 256         | 256         | 256         |

This can force extra work to align token type values in projects that involve tokenizing, such as linters, formatters, and code analyzers.

I propose keeping token type values consistent in future Python versions: existing values from ENDMARKER to N_TOKENS would stay the same as in Python 3.14; values for removed token types (as happened with ASYNC / AWAIT) would remain reserved; and new token types would be appended after the existing ones rather than inserted before any of them, with the exception of the special type NT_OFFSET.

I know that the token types are used as an enum internally in the CPython source. Fixed values may be trivial for CPython development itself, but they would be helpful for external projects such as grammar tools, linters, and formatters.

I have started a project that extracts the relevant code of the built-in `tokenize` module from the upcoming Python 3.14 source into a standalone tokenizer compatible with all active Python versions (3.10 ~ 3.14rc). The token types it generates are fixed to the Python 3.14 values and stay consistent across versions.

I am looking forward to hearing your opinions! :grinning_face_with_smiling_eyes:

This is why the `token` module exists: so you can use more human-readable and stable names. If you need to export tokens, do it by name, not by integer value.
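As a minimal sketch of that advice, one can record token types by their `token.tok_name` string rather than by raw int, so the output stays comparable across versions:

```python
import io
import token
import tokenize

src = "x = 1\n"

# Export tokens by name: token.tok_name turns the version-specific
# int type into a stable, human-readable string.
names = [
    token.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(src).readline)
]
print(names)
```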


I am very grateful to get this advice from a core developer!

I am currently working on a grammar-related project that uses the built-in `tokenize` module, which generates tokens whose types are specific to the running version; some token types are missing entirely (like the TSTRING_* types, only available in the upcoming 3.14). I have to do a lot of version checks and `token` module queries.

Besides, comparing an int value is more efficient than querying the current version’s `token` module each time. I think fixed values would indeed help cross-version library development involving the tokenizer.
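For the version checks mentioned above, one possible shim (a hedged sketch; the helper name is mine, not from any library) is to resolve token types with `getattr` and fall back to a sentinel when the running interpreter does not define them:

```python
import token

# Sentinel for token types the running interpreter does not define
# (e.g. FSTRING_START before 3.12, TSTRING_START before 3.14).
_MISSING = -1

def tok_type(name: str) -> int:
    """Hypothetical helper: look up a token type by its stable name."""
    return getattr(token, name, _MISSING)

OP = tok_type("OP")                        # defined on every version
TSTRING_START = tok_type("TSTRING_START")  # _MISSING before Python 3.14
```

Code can then compare against these module-level ints, keeping the fast int comparison while doing the name-based lookup only once at import time.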