I don’t think there’s any way you can convince me that TOML (or anything else that isn’t effectively a minimal, purpose-built format) makes sense as part of interpreter startup.
I don’t really think that making it “more human readable” should be a goal here. I would think the goal should be constraining it to the simplest set of behaviors with an acceptable amount of breakage, and that we really want this to be written and parsed in a uniform manner.
Some of the suggestions about security auditing via ad hoc regexes (or the claim that regexes are a capable tool for it, not necessarily a suggestion) for a path forward that looks more like iterating on .pth files than replacing them are also disappointing to me. We should be able to provide a canonical function that parses this into a “more easily read” form, as well as the reverse, to be used for writing whatever might replace .pth files, if that is actually going to happen.
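A minimal sketch of such a canonical pair, assuming today's .pth rules as implemented in site.py: lines starting with # are comments, blank lines are ignored, lines starting with "import " or "import\t" are executed, and anything else is a path. PthEntry, parse_pth and format_pth are hypothetical names, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Literal

Kind = Literal["comment", "import", "path"]


@dataclass(frozen=True)
class PthEntry:
    kind: Kind
    value: str


def parse_pth(text: str) -> List[PthEntry]:
    """Parse .pth text into structured entries (hypothetical canonical reader)."""
    entries = []
    for line in text.splitlines():
        if line.startswith("#"):
            entries.append(PthEntry("comment", line[1:]))
        elif not line.strip():
            continue  # blank lines carry no meaning, as in site.py
        elif line.startswith(("import ", "import\t")):
            entries.append(PthEntry("import", line.rstrip()))
        else:
            entries.append(PthEntry("path", line.rstrip()))
    return entries


def format_pth(entries: List[PthEntry]) -> str:
    """Inverse of parse_pth: write entries back out in a normalized form."""
    lines = ["#" + e.value if e.kind == "comment" else e.value for e in entries]
    return "\n".join(lines) + "\n" if lines else ""
```

format_pth(parse_pth(text)) drops blank lines and trailing whitespace, so it yields a normalized form, and parsing that form gives back the same entries.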
I’m pretty neutral on this in general. I don’t really mess with pth files and I’m vaguely in favor of standardizing config on a single file format, so a TOML replacement appeals a little bit. But I don’t really understand the argument being made so far.
Are you just saying you don’t prefer this proposal overall, or that you don’t agree that Paul’s proposal is backwards compatible?
I don’t understand the difference between a side-effect caused by importing a pth-specified module and a side-effect caused by running a site.toml-specified entry point (which will also import a module). Both methods cause side effects if the module does something unexpected, but they both explicitly show the reader where to look.
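To make that concrete: both mechanisms end in an ordinary import, so any import-time side effect fires either way. A minimal sketch, assuming a throwaway module named startup_demo (hypothetical) placed on sys.path. exec() mirrors what site.py does with a .pth import line, and importlib.metadata.EntryPoint.load() is a common way to resolve a module:function entry point. The group name hypothetical.startup is made up, and PEP 829's actual loader may resolve entry points differently.

```python
import importlib
import importlib.metadata
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway module with an import-time side effect and a hook.
MODULE_SOURCE = """\
import os

# Import-time side effect: runs every time the module is freshly imported.
count = int(os.environ.get("STARTUP_DEMO_IMPORTS", "0"))
os.environ["STARTUP_DEMO_IMPORTS"] = str(count + 1)

def startup():
    return "explicit startup hook ran"
"""


def demo():
    """Import startup_demo via a .pth-style line, then via an entry point."""
    os.environ.pop("STARTUP_DEMO_IMPORTS", None)
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "startup_demo.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # 1. site.py exec()s .pth lines that start with "import ".
            exec("import startup_demo")
            sys.modules.pop("startup_demo", None)  # force a fresh import below

            # 2. Resolving "module:function" imports the module, then we call the hook.
            ep = importlib.metadata.EntryPoint(
                name="demo", value="startup_demo:startup", group="hypothetical.startup"
            )
            message = ep.load()()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("startup_demo", None)
    return int(os.environ["STARTUP_DEMO_IMPORTS"]), message
```

Calling demo() returns (2, 'explicit startup hook ran'): the side effect ran once per mechanism, and the entry point additionally called the explicit hook.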
Is it safe to assume that “first line” means the first line of the file or module under test? <frozen site> and <frozen importlib> run before any .pth files, and other .pth files might run before yours.
Coverage.py recommends a .pth file that runs import coverage; coverage.process_startup(). Two ways to run it are documented, and each can be replaced by a wrapper script:
coverage run something.py → __import__("coverage").process_startup(); __import__("runpy").run_path("something.py", globals(), "__main__")
coverage run -m amodule → __import__("coverage").process_startup(); __import__("runpy").run_module("amodule", globals(), "__main__")
Handling for sys.argv[0] can be added if necessary.
Instead of shipping a .pth file, you would ship e.g. site-packages/coverage/run_{path,module}.py, which would read the target from sys.argv or os.environ instead of hardcoding it as the examples above do.
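A sketch of what the run_path.py variant might look like, assuming the target script and its arguments arrive on the command line. The module path and CLI are hypothetical, not part of coverage.py; coverage.process_startup() is the real function named above, and it does nothing unless the COVERAGE_PROCESS_START environment variable is set. The ImportError fallback only keeps the sketch runnable without coverage installed.

```python
"""Hypothetical coverage/run_path.py wrapper replacing the .pth hook.

Illustrative usage: python -m coverage.run_path SCRIPT [ARGS...]
"""
import runpy
import sys


def main(argv=None):
    args = list(sys.argv[1:] if argv is None else argv)
    if not args:
        raise SystemExit("usage: run_path SCRIPT [ARGS...]")
    try:
        import coverage
    except ImportError:
        pass  # run the target unmeasured rather than failing
    else:
        # Starts measurement only if COVERAGE_PROCESS_START is set.
        coverage.process_startup()
    saved_argv = sys.argv
    sys.argv = args  # the target sees its own arguments; runpy sets argv[0] too
    try:
        runpy.run_path(args[0], run_name="__main__")
    finally:
        sys.argv = saved_argv


if __name__ == "__main__":
    main()
```

Unlike the one-liners above, this does not pass the wrapper's own globals() into the target, so the script starts from a clean __main__ namespace.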
I’m unconvinced that you can evolve the .pth file format in a way that guarantees both backward and forward compatibility without breaking existing code. I also don’t think there are use cases for .pth files that we’re okay with breaking (e.g. by constraining the .pth format in the future).
Sure, importing the module with the entry point function could have import-time side-effects. That’s just the way Python works. But it’s a bad idea (IMHO) in general and a bad idea to have startup code rely on that particular functionality.
I don’t think we can make things better without breaking some things; trying never to break anything is how you end up with JavaScript. I agree with @pf_moore that adding this without deprecating the old way will just mean that most people won’t use the new way. In particular, people with complicated .pth files won’t bother to figure out how to move away from them. Some people with simple .pth files may switch, but the gain in things like readability from that will be small if their .pth files were simple anyway.
To go a bit further, as I’ve said in earlier discussions, I lean towards the idea that it’s better to aim for evolutionary steps with higher costs and correspondingly higher benefits rather than small changes that add a little of this and a little of that without clearing out the cruft on which they are trying to improve. The latter adds to the cognitive load of what people have to understand to be conversant with Python code in the wild, which isn’t worth it for only a small benefit in a subset of cases.
Basically I think if we want to fix things we should fix them, not just add alternatives. (And if we think that is too much, then better to do nothing.)
No, the .pth file is to start coverage measurement for subprocesses created by the code under test. I don’t have control over how those subprocesses are created. The .pth file gives me a way to start coverage before the target code runs.
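For readers unfamiliar with the mechanism: at startup, site processes every .pth file in a site directory, exec()ing lines that start with "import " and appending other lines to sys.path if they exist. site.addsitedir() runs the same logic on demand, so the sketch below (using a throwaway directory and a made-up DEMO_PTH_RAN variable) shows the hook firing before any target code, which is why it also reaches subprocesses whose creation you don't control.

```python
import os
import site
import sys
import tempfile
from pathlib import Path


def demo_pth_hook():
    os.environ.pop("DEMO_PTH_RAN", None)
    with tempfile.TemporaryDirectory() as sitedir:
        extra = Path(sitedir, "extra")
        extra.mkdir()
        Path(sitedir, "demo.pth").write_text(
            "# comment lines are skipped\n"
            "import os; os.environ['DEMO_PTH_RAN'] = '1'\n"
            "extra\n"  # relative to the site dir; added only if it exists
        )
        saved_path = list(sys.path)
        try:
            site.addsitedir(sitedir)  # same .pth processing as interpreter startup
            ran = os.environ.get("DEMO_PTH_RAN") == "1"
            return ran, os.path.abspath(extra) in sys.path
        finally:
            sys.path[:] = saved_path
```

demo_pth_hook() returns (True, True): the import line was executed and the path line was added to sys.path.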
I’m struggling more and more to tell what problem we’re even trying to solve.
This PEP proposes to replace .pth with TOML, the former being in a format so trivial that you could read it with just:
for (code, comment, path) in re.findall(r"^(import[ \t].*)|(#.*)|(.+)", pth.read_text(), re.MULTILINE):
[1] – that’s less code than trying to validate tomllib.loads() output without a schema validator[2][3]
TOML will give us the extensibility that nobody needed. Hypothetically, it could be used for security-motivated disabling and path restrictions, which ultimately won’t provide any real security. It could also be used for non-security-motivated filtering, but to my knowledge there have been no unintentionally disruptive .pth files out there[4], so uhmm… YAGNI.[5]
Ultimately, the only material change here that isn’t just a reshuffling of what we already have is to move the code contents of .pth files into a regular module/function.
Paul’s alternative proposal essentially enforces just that move of code, albeit with the added restriction of relying on imports with side effects.
So is the goal here just to move full scripts out of .pth files, in favour of .pth files that simply load the same functionality from regular modules? Honestly, I don’t see the benefits of that either. If the answer is “people may not know to audit .pth files for malware”, then I don’t buy it: all you have to do to fool that mindset is mv malware.py malware.png and then have another module runpy.run_path() the PNG. All files deserve equal scrutiny, no matter their suffix.
That’s not much consolidation for anyone whose workflow involves pip install -e . or has coverage in their environment.
At least assuming we’re aiming a little higher than the lame TypeError: string indices must be integers, not 'str' diagnostics that come from trying to use a generic config file without schema validation or some aggressive ad-hoc validation. ↩︎
Generic config files actually increase the surface area for errors. ↩︎
Coverage’s, for example, very politely no-ops itself as cheaply as possible when it knows it’s not needed. ↩︎
Even if a package did do something you didn’t like in a startup entrypoint, would you really just shrug and say “no problem, I’ll just use -X disable-entrypoint=xyz” rather than either getting it changed or ditching the dependency? ↩︎
I guess folks have mostly been unhappy with the current implementations being kind of a “hack” over the initially intended purpose.
Yeah, the new format can be used for other things, but without any concrete plans, I feel like it’s a difficult bargain to justify. I think it’s a much easier sell coupled with the other changes that are limited by the current format. I don’t think there’s any rush to make this change until the proposal actually provides functional improvements over the current system. These are just my 2 cents, though.
Another thing I think may be worth pointing out is that we are only planning to replace .pth files, not ._pth files. That introduces yet another file format, and with it complexity, without actually getting rid of the old undesirable format. I don’t think this carries particularly strong weight; I just thought it may also be worth taking into consideration.
If the argument is that it’s feasible to eventually remove .pth files entirely, why wouldn’t we be OK with instead narrowing them to a subset of their current syntax?
Consider the coverage startup invocation:
import coverage; coverage.process_startup()
Continuing to spell it that way is necessarily going to offer an easier migration path than switching to a different format stored in a different file.
More generally, with the “constrained syntax” approach, the only .pth file usage that would need to change is that which didn’t abide by the imposed syntactic restrictions. If the file format itself changes, every usage needs to change, even the ones we consider reasonable.
Oh, I thought I responded with an update, but maybe not. I talked it over with @brettcannon and he’s good with a 5-year deprecation of .pth files, so that’s what the next draft of 829 will propose.
Maybe I’m missing something, but you can’t just narrow the current syntax without breaking code. I know some people have argued that it’s okay to break those other use cases, but I don’t believe that is acceptable under Python’s backward compatibility guarantees. It doesn’t matter whether the projects you know of use those wider features. We all know that if it’s possible, someone is using it, and that someone’s code will break.
So, if you want to propose narrowing pth file acceptable syntax, then I think you need a transition period, where the wider syntax is still accepted, you emit warnings, and then eventually remove that old syntax.
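A sketch of what such a transition could look like, under an assumed restricted subset: only "import mod" statements followed by zero-argument "mod.func()" calls, separated by semicolons (which covers the coverage line). The subset, the helper names and the choice of DeprecationWarning are all hypothetical; no such check exists in site.py today. Non-conforming lines keep working but warn, so the eventual removal only breaks code that ignored the warnings.

```python
import ast
import warnings
from typing import Optional


def _dotted_name(node: ast.expr) -> Optional[str]:
    """Return "a.b.c" for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        return None
    parts.append(node.id)
    return ".".join(reversed(parts))


def is_restricted_form(line: str) -> bool:
    """True if line is only `import mod` plus zero-argument `mod.func()` calls."""
    try:
        tree = ast.parse(line)
    except SyntaxError:
        return False
    imported = set()
    for stmt in tree.body:
        if isinstance(stmt, ast.Import):
            if any(alias.asname for alias in stmt.names):
                return False  # `import x as y` falls outside the assumed subset
            imported.update(alias.name for alias in stmt.names)
        elif (
            isinstance(stmt, ast.Expr)
            and isinstance(stmt.value, ast.Call)
            and not stmt.value.args
            and not stmt.value.keywords
        ):
            target = _dotted_name(stmt.value.func)
            # The call must be an attribute of a module imported earlier on the line.
            if target is None or target.rpartition(".")[0] not in imported:
                return False
        else:
            return False
    return bool(imported)


def process_import_line(line: str) -> None:
    """Transition-period handling: warn on non-conforming lines but still run them."""
    if not is_restricted_form(line):
        warnings.warn(
            f".pth import line outside the restricted subset is deprecated: {line!r}",
            DeprecationWarning,
            stacklevel=2,
        )
    exec(line)  # executed the same way site.py does today
```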
With 829 + advertised pth deprecation, existing usages can take the next 5 years to switch over to site.toml file uses, continue to ship pth files until the minimum supported version is 3.15 (let’s say) and then stop shipping them. Nothing breaks, and there’s a clean migration story.
OK, so you are willing to deprecate .pth files with complete removal in 5 years in favour of PEP 829, but you don’t consider it acceptable to deprecate some problematic constructs in .pth files for 5 years before removing just those constructs?
Fair enough, but I honestly don’t see how you reconcile those two positions.