PEP 829: Structured Startup Configuration Files

Coverage.py uses .pth files today to ensure subprocesses are fully measured, by importing and starting coverage before the subprocess starts Python. I think that will still be possible with the proposals flying by here, though it’s easy to miss details in the volume of replies.

My use of .pth files for “before first line execution” could also be completely replaced with an environment variable specifying a module to import before first line. Then I wouldn’t need to ship a suspicious .pth file at all.

I agree with @steve.dower that “define a new file format” vs “constraining the existing file format” is the design decision that needs the strongest justification here.

As long as there is a way to run code on startup, any proposal for complex startup logic doesn’t need to live in the core interpreter - it can live in a third party module that uses the core support to bootstrap itself, and then other mechanisms (such as entry points) to find further startup code. No such practice has ever become widespread, despite it long being possible, so demand for it is presumably low.

Even the TOML based processing proposed in this PEP could be implemented that way.

While being able to specify a callable reference rather than relying entirely on import-time side effects is a genuine advantage of the PEP’s proposal over merely disallowing semicolons in .pth file import lines, even that downside of the latter approach could be mitigated by finding a different way to spell import X; X.Y() (such as accepting from X import Y in .pth files, with the implication that Y will be called).
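To make the from X import Y idea concrete, here is a minimal sketch under a hypothetical rule where a .pth line of the form from X import Y means “import X, then call X.Y() with no arguments”. Nothing like this exists in site.py today, and the helper name run_from_import_line is made up for illustration.

```python
import importlib
import re

# Hypothetical .pth syntax: "from <dotted.module> import <name>" means
# "import the module, then call the named attribute with no arguments".
_FROM_IMPORT = re.compile(r"from\s+([\w.]+)\s+import\s+(\w+)")

def run_from_import_line(line):
    """Import the module named on a 'from X import Y' line and call Y()."""
    match = _FROM_IMPORT.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a 'from X import Y' line: {line!r}")
    module_name, func_name = match.groups()
    module = importlib.import_module(module_name)
    # The callable is looked up explicitly instead of relying on import side effects.
    return getattr(module, func_name)()
```

For example, run_from_import_line("from os import getcwd") imports os and returns the result of os.getcwd().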

When Python is built in debug mode, there are the python -X presite=module_name option and the PYTHON_PRESITE=module_name environment variable, which import module_name even before the site module is imported. It is used by python -m test --coverage to set up a “minimal hook for gathering line coverage of the standard library”: the test.cov module, which uses sys.monitoring. @ambv wrote that in Python 3.13.

Do you need something similar? If we add a new option, I would prefer using an entry point syntax (module:function), rather than only specifying a module name.

There is also PYTHONSTARTUP, but it’s less useful for your use case since it’s executed after importing the site module.

UPDATE: I created PR gh-148015 to add module_name:function_name format support to -X presite and PYTHON_PRESITE.

Yes, you’ll absolutely be covered. Coverage.py is about the only example we’re using of a “good” installable case (there are others, but they’re not better than Coverage.py).

Where the discussion is at is whether you need import <some package> or specifically import sys; arbitrary_code_here, and it sounds like you can work with the former (which is my contention).

Yeah, or we could keep parsing import X; X.Y() from .pth files? It could even be extended to X.Y("string literal"), which greatly increases the viability of things like import editables; editables.activate("my_package")[1], thereby avoiding having to generate specific importable modules each time.

As you say, “we need a new file format” is the part that needs strong justification.


  1. Entirely made up names here!

Definitely just an import of my own module will be fine: whatever could have been in “arbitrary code here” can instead be in “coverage/before_first_line.py”. If it’s a “module:function” syntax, then I move the code into a function, no problem.

A constrained form of the existing syntax rather than new syntax also has the virtue of having an easier migration path (since existing Python versions will already support it).

The way I would suggest phrasing it would be:

  • executable lines consist of an import statement, optionally followed by a semicolon-separated dotted attribute lookup and function call
  • argument values passed to the function call must be literals that can be evaluated with ast.literal_eval

Structured enough that a regex could pick out the details for auditing purposes (what gets imported, what gets called, what arguments are passed), while the actual parsing and execution in Python implementations could be AST filtering and modification based.
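A rough sketch of the AST-filtering side of those two rules, assuming a line is either import X or import X; X.attr(...) with only positional literal arguments (keyword arguments are rejected here purely to keep the sketch short). The function name parse_constrained_line is hypothetical, and it only validates and reports what would be imported and called; it does not execute anything.

```python
import ast

def _root_name(node):
    """Return the base name of a dotted lookup like a.b.c, or None."""
    if not isinstance(node, ast.Attribute):
        return None  # a bare X() is not a dotted attribute lookup
    while isinstance(node, ast.Attribute):
        node = node.value
    return node.id if isinstance(node, ast.Name) else None

def parse_constrained_line(line):
    """Return (module, call_target, args) for an allowed line, else raise ValueError."""
    body = ast.parse(line).body
    if not 1 <= len(body) <= 2:
        raise ValueError("expected 'import X', optionally followed by one call")
    first = body[0]
    if not (isinstance(first, ast.Import) and len(first.names) == 1
            and first.names[0].asname is None):
        raise ValueError("first statement must be a plain 'import some.module'")
    module_name = first.names[0].name
    if len(body) == 1:
        return module_name, None, ()
    second = body[1]
    if not (isinstance(second, ast.Expr) and isinstance(second.value, ast.Call)):
        raise ValueError("second statement must be a function call")
    call = second.value
    # 'import a.b' binds the name 'a', so the call must be rooted at that name.
    if _root_name(call.func) != module_name.partition(".")[0]:
        raise ValueError("call must be a dotted lookup on the imported module")
    if call.keywords:
        raise ValueError("keyword arguments are not supported in this sketch")
    # literal_eval raises ValueError for anything that is not a plain literal.
    args = tuple(ast.literal_eval(arg) for arg in call.args)
    return module_name, ast.unparse(call.func), args
```

With this, parse_constrained_line('import editables; editables.activate("my_package")') returns ("editables", "editables.activate", ("my_package",)), while import sys; sys.path.append(__file__) is rejected because __file__ is not a literal.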

Beyond that, adopt the same multi-pass processing for .pth files as the PEP proposes for its TOML files.

Thanks for the link. The pth file written by editables is in a safer security category in my mind, because it’s tightly controlled by known-good code. It isn’t arbitrary.

Because it will break things. I think PEP 829 gives us a workable migration story.

That’s a good point, and I will definitely add that to Security Implications. I also think that installers probably should refuse to install any site.toml that doesn’t also match a pth file in the same package.

Sure, but many of us have been trying to do this for years and it’s never happened. I just don’t think you can get there from here, because of the backward compatibility breaks. I’m not convinced it’s possible to change the pth format or semantics, which is why I think a new file format is required.

I don’t think the people who currently use .pth files will be convinced to migrate to PEP 829 until they are forced to. So the point where we finally remove .pth support in favour of PEP 829 will break things just as much, and probably even more (as a significant portion of current .pth usage is safe, and doesn’t need to migrate).

Whereas deprecating unsafe usage of .pth files, by issuing a warning when encountering an import line that doesn’t match the simple import some.name form, is visible to the user, targets only the projects with a real need to change, and can be fixed in a way that is portable to both old and new Python versions.
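One way the proposed warning could look, as a minimal sketch: it assumes a regex for the simple import some.name form and uses DeprecationWarning, but neither the exact pattern, the warning category, nor the function name check_pth_import_line comes from the PEP or from CPython.

```python
import re
import warnings

# A "simple" import line: the keyword 'import' followed by one dotted module name.
_SIMPLE_IMPORT = re.compile(r"import\s+[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*")

def check_pth_import_line(line, filename="<unknown>.pth"):
    """Return True for a simple import line; otherwise warn and return False."""
    stripped = line.strip()
    if _SIMPLE_IMPORT.fullmatch(stripped):
        return True
    warnings.warn(
        f"{filename}: executable line {stripped!r} is not of the form "
        "'import some.name'; move the code into a module and import that instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return False
```

The migration advice is built into the message: the fix works unchanged on older Python versions, since a plain import line has always been valid in a .pth file.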

My understanding is that people have been trying to remove all code execution from .pth files. I agree, that’s an incredibly hard sell, and I don’t think it’s likely to happen. But I’m not proposing to do that (and neither is PEP 829) - I’m just proposing to add exactly the same requirement as PEP 829, that code executed on startup is in a separate file.

:partying_face:

I’m certain that pth files predate editable installs. I can’t remember exactly why they were added, but it was probably some Zope thing ages ago! I believe @nedbat described how the executable feature is used in coverage. @vstinner described some uses he found on his Fedora system, and while it’s highly likely the -nspkg.pth files could be replaced by PEP 420 namespaces, the others are still existing use cases.

Despite its problems, people still complain about backward compatibility breaks whenever we’ve proposed just getting rid of them. I think at this point, it’s a thing we have to live with, but we can do better. Thus my suggestions for future work to split pre-execution control from -S and some kind of policy mechanism to further control what gets run (or doesn’t).

It seems, based on other comments in this thread, that we can’t without breaking things. I’m trying to stay out of the packaging tools’ way and just focus on what we can do for interpreter startup.

Can you explain your reasoning? The PEP explains why I made this choice.

I was trying to give an explicit form of the implicit relative directory case, and to provide an extension hook for the future. I’m not super committed to this aspect though, nor to the {placeholder} syntax itself. I considered $-strings too, but their popularity outside of i18n isn’t very strong and you’d still need to handle escaping (although maybe that’d be much less common).

Let me think about that one. The PEP wouldn’t lose much without it, and it could be a natural extension some time in the future if we find it useful.

For sure, and installers definitely could do that. I’m trying to stay out of their way though, so only defining the behavior of the interpreter. There would still be the case of a broken hand-written site.toml file so I think it’s still important to define how the interpreter will handle it.

Because I don’t want to have to fight that fight, right now? :winking_face_with_tongue:

I think it will be a long deprecation process either way. I’d love to hear what @brettcannon thinks about deprecating pths in this PEP rather than a future one.

Seems like we have a path toward making tomllib faster, but if that doesn’t pan out, this could be a reasonable way forward.

In my reference implementation, it’s str.replace() during the processing phase, and if that fails then sys.path is simply not extended.
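A minimal sketch of that substitution step, assuming a hypothetical {sitedir} placeholder (the real placeholder names are whatever the PEP draft defines) and treating any leftover brace as a failed substitution. This naive check also rejects real paths that contain braces, which is the escaping problem mentioned earlier in the thread.

```python
import os
import sys

# Hypothetical placeholder name, used only for this illustration.
PLACEHOLDER = "{sitedir}"

def expand_path_entry(template, sitedir):
    """Expand one path entry, returning None if substitution fails."""
    expanded = template.replace(PLACEHOLDER, sitedir)
    # Any brace left over means an unknown placeholder: treat it as a failure.
    if "{" in expanded or "}" in expanded:
        return None
    # Relative entries are resolved against the site directory, as .pth lines are.
    return os.path.normpath(os.path.join(sitedir, expanded))

def extend_sys_path(templates, sitedir):
    """Append each successfully expanded entry to sys.path, skipping failures."""
    for template in templates:
        entry = expand_path_entry(template, sitedir)
        if entry is not None and entry not in sys.path:
            sys.path.append(entry)
```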

Although, we’re talking about the site-packages directory here. Not that I’m opposed to some bikeshedding, or the suggested name, but I think it’s probably unlikely to collide accidentally.

Yes.

I think I’m just going to relax this whole constraint on the prefix to the .site.toml suffix. As has been pointed out in previous messages in this thread, tying that part to the package name is unnecessary and does break some use cases, so I think this part of the first draft is just going to get deleted.

(I plan to change all occurrences of <package>.site.toml to <name>.site.toml so there’s no direct connection to a package name.)

Maybe. I’m not sure it’s better than just doing the rpartition(':') and importlib.import_module() + getattr(), which is what the reference implementation does. But we can hash all that out in the PR!
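For comparison, the technique described here for the reference implementation (rpartition(':'), then importlib.import_module() plus getattr()) can be sketched like this. It is an illustration of that technique, not the actual PR code; resolve_entry_point and run_entry_point are made-up names, and mypkg.startup:init is just the example name used elsewhere in this thread.

```python
import importlib

def resolve_entry_point(spec):
    """Turn a 'module:function' string into the object it names."""
    # rpartition splits on the last colon; everything before it is the module.
    module_name, sep, attr = spec.rpartition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

def run_entry_point(spec):
    """Resolve the entry point and call it with no arguments."""
    return resolve_entry_point(spec)()
```

With a package installed that provides it, run_entry_point("mypkg.startup:init") would import mypkg.startup and call init(), without init() having to run as a side effect of the import.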

The PEP already does, and the next draft will be more specific about version compatibility.

:+1:

Yes.

I look forward to reading the competing PEP! 829 won’t go down this pth[1], so I’m going to leave any discussion along those lines alone.


  1. pun intended!

I think so too. When I get some time, I plan on testing this exact scenario.

I think I’ve been clear in the PEP and in my comments that I don’t think it’s possible to do this in a backward compatible way. My proof is that over all the years we’ve been talking about “fixing” pth files, we’ve never done it. I’m happy to be proven wrong, and if you think the PEP language itself needs further justification about that, please let me know (after the next draft is posted).

I’m not so sure. To be clear (and I know you know this), pth file functionality is implemented by site.py and there aren’t currently hooks to say “run this pre-first-line code before any other pre-first-line code” which is what I think a third party module would have to do.

Even if we promote -Xpresite/PYTHON_PRESITE to non-debug mode builds and add the entry point syntax, it would still require invoking the Python executable differently, which I think will generally not happen. I’m not seeing a way to hook all of this up with existing invocations of Python.

If you adopt a constrained form of pth processing, how do you not break existing pth files that use the more relaxed form? We can’t know all the ways pth files are used.

With a deprecation period, presumably?

Is this proposal intended to someday lead to a deprecation of pth files? If not, can it hope to fix the problems they present?

It’s not illegal to break existing .pth files, as long as there’s a reasonable deprecation period. Especially if (as in this case) there’s a simple workaround for affected users (move the code from the .pth file that’s being executed into a file that you then import from the .pth file).

Without a deprecation period, I can see no reason why people would choose to move from .pth files to PEP 829 files. There’s no benefit to them in doing so. And even with a deprecation period, I see no reason why they wouldn’t wait until the last minute, when .pth files finally get removed. Because the proposed way of working during the transition period involves creating both files (assuming you still want to support pre-PEP 829 versions of Python) which means duplicating the data.

My Ubuntu has 2 pth files in /usr/lib/python3/dist-packages, and does not have /usr/lib/python3.13/site-packages/. Both should be similar, yet you named the file <package>.site.toml, implying “dist” is excluded. Should we rename this to <package>.pth.toml for a more neutral name that supports both?

That’s a Debuntu-ism, so I don’t think so.

I think this is far from being proven. Consider the following approach, which fully conforms to Python’s backward compatibility policy (note that I’m not trying to preserve support for every existing .pth file - I don’t think that’s a reasonable constraint, and it’s not what the compatibility policy requires).

  1. Deprecate the use of anything other than a single module name on an import line in a .pth file. We can produce a warning when such lines are encountered, and the migration advice is simple - take the content of the import line, put it in an importable module in your package, and replace the line with import <your new module>.
  2. Complete the deprecation by rejecting import lines that don’t follow the new rules.
  3. Redefine the semantics of .pth files to work as if the “normal” lines are what is contained in the paths.dirs list in the .site.toml spec, and the “import” lines are handled like the entrypoints section - but with the action being to import the given module rather than execute the given entry point.
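Step 3 (which is also the multi-pass processing suggested earlier in the thread) might look roughly like this. It is a simplified sketch under stated assumptions: the caller supplies the file names in processing order, files are read as UTF-8, every import line is assumed to already satisfy the rule from step 2, and process_pth_files is a hypothetical name. The real site.py also handles details this ignores, such as how it orders the .pth files it finds and its encoding handling.

```python
import importlib
import os
import sys

def process_pth_files(sitedir, pth_names):
    """Batch-process .pth files: all path lines first, then all import lines."""
    path_entries = []
    import_lines = []
    # Pass 1: read every file and sort its lines into the two categories.
    for name in pth_names:
        with open(os.path.join(sitedir, name), encoding="utf-8") as f:
            for raw in f:
                line = raw.strip()
                if not line or line.startswith("#"):
                    continue  # blank lines and comments are ignored
                if line.startswith(("import ", "import\t")):
                    import_lines.append(line)
                else:
                    path_entries.append(os.path.join(sitedir, line))
    # Extend sys.path with every existing directory before any startup code runs.
    for entry in path_entries:
        if os.path.isdir(entry) and entry not in sys.path:
            sys.path.append(entry)
    # Pass 2: each import line is assumed to be exactly 'import some.name'.
    for line in import_lines:
        importlib.import_module(line.split(None, 1)[1])
```

Because every path line is applied before any import line runs, an import line in a.pth can now rely on a directory that only b.pth adds; with today’s file-by-file processing that import would run first and could fail. That is the kind of ordering difference worth checking for.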

The deprecation in step 1 can be extended as long as we like, but I see no reason why it should be any longer than a normal deprecation, given that the fix is so simple.

Unless there’s a flaw in PEP 829 (see below), I believe step 3 should result in the same behaviour as the current method of executing .pth files. But it allows the future policy model, fine-grained control, and error reporting improvements that PEP 829 offers, without needing a new file format.

I don’t see any significant benefit in using an entry point rather than an importable module - if anything, it has the disadvantage of allowing startup code to be shipped in the same file as normal runtime code, whereas using an importable module makes it clear that everything in that file is startup time code.

There’s no “built in” extensibility, unlike TOML, but frankly, do we even want to extend the functionality here? All of the discussions have been around reducing functionality, so an extensible format seems like an unnecessary luxury. And in any case, if we wanted to extend the .pth format, we could always add new “start of line” keywords. It’s a bit clumsy, but IMO it’s good enough for something that I believe we won’t need in any case :wink:

It’s possible that the switch from processing .pth files in turn, line by line, to a batch process of collecting all files, processing “path extension” lines first and then processing “import” lines could change behaviour. But I can’t come up with an example where this would happen that isn’t clearly artificial. If anyone can find such a case, please describe it, as it’s probably also a case where PEP 829 won’t act as a valid replacement for .pth files.

The one downside of this plan is that people can’t opt into the new semantics early. But given how little benefit they gain for themselves by doing so, I don’t think this is a major issue. I know, for example, that I will probably just ignore PEP 829 for all of my projects until forced to adopt it when CPython actually removes support for .pth files altogether.

Thoughts? I’d rather not have to write this up as a competing PEP - it’s a lot of work just to resolve a difference of opinion on the file format, and it forces the SC to judge between two almost-identical PEPs, something I’m sure they’d prefer not to do, if only because they would be bound to disappoint one group of people in that case. But I’m not sure I can support PEP 829 in its current form.

I don’t agree, but hey that’s okay.

To counter that, modules that execute code as a side effect of importing can cause problems in themselves, most notably if they get imported accidentally in contexts that you weren’t expecting. I think it’s perfectly clear and preferable to have something like mypkg.startup:init as an entry point.

Yes, no, maybe? My crystal ball isn’t working today :sweat_smile:

As I’ve asked before, how would you handle a pth-file that is supposed to be compatible with multiple versions of Python, some of which understand this new extension line and some that don’t? PEP 829 handles this forward-compatibility explicitly.

I’ve talked with @brettcannon and we agree that the next draft of PEP 829 will propose a 5-year deprecation period for pth files.

I think the semantics of this are complicated and nuanced enough that a competing PEP will be necessary. Don’t worry about the SC! It’s happened before, and to be explicit, I will of course abstain if these PEPs come before the SC while I’m on it.
