Addendum for PEP 722 to use TOML

Any Python script may assign a variable named __pyproject__ to a multi-line double-quoted string containing a valid TOML document.

This regular expression may be used to parse the metadata:
(?ms)^__pyproject__ *= *"""$(.+?)^"""$

The following is an example of how to read the metadata on Python 3.11 or higher.

import re, tomllib

def read(script: str) -> dict | None:
   match = re.search(r'(?ms)^__pyproject__ *= *"""$(.+?)^"""$', script)
   return tomllib.loads(match.group(1)) if match else None

I'm not sure whether this is relevant to the discussion, but the suggested regex does not seem to cover all strings allowed by the current state of the PEP. For example, if I run the following file with python3.11 myfile.py:

# myfile.py

__pyproject__ = """
[project]
readme.text = \"""
Hello, this is my awesome single-file utility script.
Try running::

    pipx run myfile.py
\"""
dependencies = ['tomli; python_version < "3.11"']
"""

import re, sys
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib


def read(script: str) -> dict | None:
    match = re.search(r'(?ms)^__pyproject__ *= *"""$(.+?)^"""$', script)
    return tomllib.loads(match.group(1)) if match else None


if __name__ == '__main__':
    with open(__file__, 'r', encoding="utf-8") as fp:
        script = fp.read()
    match = re.search(r'(?ms)^__pyproject__ *= *"""$(.+?)^"""$', script)
    print("---\nmatch.group:\n", match.group(1), "\n---\n")
    print("---\npyproject:\n", tomllib.loads(__pyproject__), "\n---\n")
    print(read(script))

I get a parsing error:

$ python3.11 myfile.py
---
match.group:

[project]
readme.text = \"""
Hello, this is my awesome single-file utility script.
Try running::

    pipx run myfile.py
\"""
dependencies = ['tomli; python_version < "3.11"']

---

---
pyproject:
 {'project': {'readme': {'text': 'Hello, this is my awesome single-file utility script.\nTry running::\n\n    pipx run myfile.py\n'}, 'dependencies': ['tomli; python_version < "3.11"']}}
---

Traceback (most recent call last):
  File "/tmp/myapp/myfile.py", line 32, in <module>
    print(read(script))
          ^^^^^^^^^^^^
  File "/tmp/myapp/myfile.py", line 23, in read
    return tomllib.loads(match.group(1)) if match else None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tomllib/_parser.py", line 102, in loads
    pos = key_value_rule(src, pos, out, header, parse_float)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tomllib/_parser.py", line 326, in key_value_rule
    pos, key, value = parse_key_value_pair(src, pos, parse_float)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tomllib/_parser.py", line 369, in parse_key_value_pair
    pos, value = parse_value(src, pos, parse_float)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/tomllib/_parser.py", line 649, in parse_value
    raise suffixed_err(src, pos, "Invalid value")
tomllib.TOMLDecodeError: Invalid value (at line 3, column 15)

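As far as I understand, myfile.py follows the specification in the current state of the PEP draft.
The failure happens because the regex captures the raw source text, including the backslash in \""", which is not valid TOML, while Python itself evaluates the escape and assigns a valid TOML document to __pyproject__ (as the second print shows). In other words, to find the string literal assigned to the __pyproject__ variable, tools need some understanding of Python syntax (at minimum, how multi-line double-quoted strings can be written and escaped).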

It might be worth modifying the PEP to restrict the allowed syntax and so rule out such edge cases, or at least to state more explicitly whether the current PEP text already covers this scenario. (It may well cover it already and I am just misreading it, but I guess other people might stumble over the same thing.)
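As one example of what such a restriction might look like, here is a rough sketch of a stricter reader that keeps the suggested regex but rejects any block containing a backslash. In a regular (non-raw) triple-quoted string, backslash escapes are the main way the raw source text can differ from the value Python assigns, so without them the two agree. To be clear, this rule is purely my assumption, not anything in the PEP, and the name extract_strict is made up.

```python
from __future__ import annotations

import re

# Same pattern as the regex suggested in the addendum.
PYPROJECT_RE = re.compile(r'(?ms)^__pyproject__ *= *"""$(.+?)^"""$')


def extract_strict(script: str) -> str | None:
    """Return the raw TOML text, or None if there is no __pyproject__ block.

    Hypothetical rule: raise ValueError when the block contains a
    backslash, because then the raw text may not match the Python value.
    """
    match = PYPROJECT_RE.search(script)
    if match is None:
        return None
    raw = match.group(1)
    if "\\" in raw:
        raise ValueError("backslashes are not allowed in __pyproject__ under this rule")
    return raw
```

The trade-off is that this also forbids legitimate TOML escapes such as \n inside basic strings, so a real rule would probably need to be more careful than this.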
