Add folder for python customizations: __sitecustomize__

Following the conversation in https://bugs.python.org/issue33944, I wanted to discuss support for a __sitecustomize__ folder that holds Python scripts to be executed at startup. It is similar to sitecustomize.py, but allows the different stakeholders of a Python installation to each add their own.

This is basically a “supported way” of doing what pth files are currently abused for: adding startup code.

How will this be useful?

  • Let administrators add multiple “sitecustomize.py” files. As an example, today we basically append to sitecustomize.py whenever we need some additional behaviour.
  • Support for library owners that need some startup customization like betterexceptions.
  • Support for tools that ship an interpreter, like virtualenv or PyOxidizer, by allowing them to customize the interpreter they expose to users.

It basically offers a better alternative to the currently abused feature of code execution in pth.

  • Support for library owners that need some startup customization like betterexceptions.

Having a structured __sitecustomize__ directory would allow all 3rd parties which benefit from pth hacks to have a way of offering client code a proper customization, so I’m +1 on this.

This would add considerable dir listing times to Python’s startup.

Better would be to have an explicit list of such files in an env
var and a smart caching mechanism, so that stat calls are not
(always) necessary.

Hi Mario,

Can you explain how this proposal will avoid the problems with pth
files? If all you are doing is just changing the name from “spam.pth” to
“__sitecustomize__/spam.py” I don’t see how that’s going to fix the
problems with pth files.

Maybe I’m missing something obvious (wouldn’t be the first time) but you
should explain in detail what makes this a better alternative than pth
files, not just state that it is better.

You say that this will be useful, but you don’t say how this will
provide the features you say it will. For example, your second item:

"""
Support for library owners that need some startup customization like
betterexceptions.
"""

How does this work? I’m the owner of the library “spam.py”, and I need
to customize startup to import “eggs.py”. What do I do? How do I avoid
clashing with other libraries? What makes this different from, and
better than, sitecustomize?

What are the specs here? When do the files under __sitecustomize__ run?
Do they run instead of, before, after site.py and sitecustomize.py? How
do we disable this if needed? Where can I place my sitecustomize?

By the way, you will probably get much more feedback on the
Python-Ideas mailing list rather than here on Discuss. (Both positive
and negative.)

Indeed! The thing with pth files is that, AFAIK, they were never really meant to contain code. Their main purpose is to add additional paths. If you want to inject any customization code, you basically need to hack around that by creating a pth file like:

import time;print("Code goes here");print("If you want multiple lines, use ;")

You cannot just create a “proper” file to be executed. Additionally, this makes debugging harder, as you need to go through all pth files (most of them legitimate) and check whether any of them is actually executing code. The __sitecustomize__ proposal allows you to “move those pth files there”, but also to write them in a sensible manner, as you’d write any other Python script.
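To make the hack concrete, here is a minimal sketch of how `site` treats such a line today. It writes a throwaway `demo.pth` into a temporary directory and asks `site.addsitedir` to process it, which is the same routine `site` uses for real site directories. The file name and the `PTH_DEMO_RAN` environment variable are made up for illustration only.

```python
import os
import site
import sys
import tempfile
from pathlib import Path


def run_pth_demo() -> str:
    """Process a one-line .pth file and report whether its code ran."""
    os.environ.pop("PTH_DEMO_RAN", None)
    original_path = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            # Lines in a .pth file that start with "import" are exec'ed by
            # site, so the whole payload must be squeezed onto one line.
            Path(tmp, "demo.pth").write_text(
                "import os; os.environ['PTH_DEMO_RAN'] = 'yes'\n"
            )
            # Adds tmp to sys.path and processes every *.pth file inside it.
            site.addsitedir(tmp)
    finally:
        # Undo the sys.path change so the demo leaves no trace behind.
        sys.path[:] = original_path
    return os.environ.get("PTH_DEMO_RAN", "no")


if __name__ == "__main__":
    print(run_pth_demo())
```

Nothing in the pth file name or location hints that it runs code, which is the debugging problem described above.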

Maybe I’m missing something obvious

You are not. I did not explain this properly, as I came with previous context from the issue. Apologies for that.

How does this work? I’m the owner of the library “spam.py”, and I need
to customize startup to import “eggs.py”. What do I do?

They will be able to just install a custom script into __sitecustomize__ rather than using this pth file.

How do I avoid clashing with other libraries?

Great question! We can probably advise naming them after the package. If there is interest in the general idea we can discuss the details further, but I think this won’t be a blocker.

What makes this different from, and better than, sitecustomize?

There is only one sitecustomize per Python installation (+usercustomize). The folder approach allows for many of these. Think of it as the typical init.d of your OS.

When do the files under __sitecustomize__ run?

Open for discussion as well. I’d suggest the end of site.py, just after all other site customization, as that allows all general path changes to happen before trying to find the __sitecustomize__ folders.
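As a rough sketch of what such a hook at the end of site.py could look like: the helper below runs every script found in a `__sitecustomize__` folder of one site directory. `run_sitecustomize_dir` is a hypothetical name, nothing like it exists in site.py today, and the alphabetical ordering is only an assumption for the sake of the example.

```python
import runpy
import tempfile
from pathlib import Path


def run_sitecustomize_dir(site_dir: str) -> list[str]:
    """Run every *.py file in <site_dir>/__sitecustomize__, sorted by name.

    Hypothetical helper; the execution order is an assumption, not
    something the proposal has settled.
    """
    folder = Path(site_dir, "__sitecustomize__")
    if not folder.is_dir():
        return []
    executed = []
    for script in sorted(folder.glob("*.py")):
        runpy.run_path(str(script))  # each script gets a fresh namespace
        executed.append(script.name)
    return executed


def demo() -> list[str]:
    """Create two illustrative scripts and run them through the helper."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp, "__sitecustomize__")
        folder.mkdir()
        # File names follow the "name it after your package" suggestion.
        (folder / "spam.py").write_text("import json\n")
        (folder / "betterexceptions.py").write_text("ENABLED = True\n")
        return run_sitecustomize_dir(tmp)


if __name__ == "__main__":
    print(demo())
```

Unlike the pth one-liner, each script here is an ordinary multi-line Python file that can be read and debugged on its own.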

How do we disable this if needed?

I’d suggest starting with the already existing -S but we could add some further switch if you think it’s worth it.
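For reference, a small sketch of how the existing switch can be observed: it starts a child interpreter with `-S` and reads back `sys.flags.no_site`. The function name is made up; whether `__sitecustomize__` would honour this flag is part of the proposal, not current behaviour.

```python
import subprocess
import sys


def no_site_flag_with_dash_s() -> bool:
    """Start a child interpreter with -S and report sys.flags.no_site."""
    result = subprocess.run(
        [sys.executable, "-S", "-c", "import sys; print(sys.flags.no_site)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "1"


if __name__ == "__main__":
    print(no_site_flag_with_dash_s())
```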

Where can I place my sitecustomize?

That is indeed another thing I’d be happy to discuss. There seem to be two proposals: anything in the Python path, or in a site directory. sitecustomize.py does the former, pth files do the latter. I personally find the latter slightly more discoverable, as users can rely on just importing it to see where files could be placed (for site, you can also just import site though :man_shrugging:).

This would add considerable dir listing times to Python’s startup.

Indeed, I have to acknowledge I had not thought about this. If we go with the namespace-like approach, this might come down to something like a single import attempt when there is no __sitecustomize__. For installations that actually have customizations, I guess it’s fair to pay the price. There is also always -S to disable site entirely. If the idea goes forward, we should indeed do some measurements to see whether it is even worth it.
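A minimal sketch of that namespace-like lookup, assuming `__sitecustomize__` is treated like a namespace package (a plain folder without `__init__.py`). `find_sitecustomize_dirs` is a hypothetical helper, and this is only one of the options under discussion. It relies on `importlib.util.find_spec`, which returns `None` when nothing with that name is on `sys.path`.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def find_sitecustomize_dirs() -> list[str]:
    """Return every __sitecustomize__ directory visible on sys.path."""
    spec = importlib.util.find_spec("__sitecustomize__")
    if spec is None or spec.submodule_search_locations is None:
        return []  # nothing installed: the cost is one failed lookup
    return list(spec.submodule_search_locations)


def demo() -> list[str]:
    """Put one __sitecustomize__ folder on sys.path and look it up."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "__sitecustomize__").mkdir()
        sys.path.append(tmp)
        importlib.invalidate_caches()  # make the finders notice the new dir
        try:
            return find_sitecustomize_dirs()
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    print(demo())
```

How cheap the "nothing installed" case really is would still need to be measured, as noted above.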

OK, given that there is not much pushback against this, let’s do something more formal.

I’ve written up a more formal proposal in a PEP-like format: https://gist.github.com/mariocj89/36daffd1e157b93e5c697bbadcb05806. I’ll update it as the conversation in this thread evolves.

This is not really true. You can add an a.py file to the root of site-packages, plus an a.pth file that does import a. This is what virtualenv does today. That is two file reads, instead of the one list and one read of the alternative. I guess the only counterargument to this proposal is that there is already a mechanism to achieve this today. AFAIK pth files are well defined and supported by everyone.

It’s not a two read operation; it’s read+list+read: import a needs to look for many filenames (e.g. a.py, a.pyc, a.so, a.cpython-*.so, etc.), and importlib will read and cache the entire directory listing rather than query for each of those names individually.
Not that it matters. As with most micro-optimizations, implementing this and measuring will be easier than discussing/analyzing it theoretically ;)
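To give an idea of how many names are involved, this small sketch lists the file suffixes the import machinery recognises on the running interpreter, using `importlib.machinery.all_suffixes()`. The helper `candidate_filenames` is made up for illustration, and package directories (`a/__init__.py` and friends) are not included.

```python
import importlib.machinery


def candidate_filenames(module_name: str) -> list[str]:
    """List the file names the path-based finder may match for a module."""
    return [
        module_name + suffix
        for suffix in importlib.machinery.all_suffixes()
    ]


if __name__ == "__main__":
    for name in candidate_filenames("a"):
        print(name)
```

The exact list depends on the platform and Python build, which is one more reason to measure rather than reason about it.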

Yeah, I mean, I still think customizing startup like that is not a “feature of Python”, but let me add it to be 100% transparent. I’ve updated the gist to mention it in the pth section.

Can you explain why? It’s a direct effect of how pth files and the import system work. Making a statement like this feels like (and I’m exaggerating) saying that mixing for loops and if statements is not a feature of Python :thinking:

Also I would like in the proposal some mention on the guarantee of execution order of these site customize scripts :thinking:

Let me explain that sentence. What I meant is that it does not seem to be a behaviour of Python that the developers want to publicise as a feature, document and maintain.

From https://bugs.python.org/issue24534, you can see the following comment from Eric Snow:

From what I understand .pth file weren’t meant to be used in the way they are. Some folks are just really good at exploiting unintended behaviors based on an implementation. :slight_smile: We’ve been stuck with the “feature” ever since.

In this case the implementation happened to be a bit lax in how it evaluated each line. It should have been strict about allowing only single import statements. Instead it just made sure the line started with “import” and then exec’ed it. A check for “;” would have been sufficient.

That said, I don’t fault Martin (or whoever actually wrote it) at all. The implementation doesn’t really bother me. Sure, it could have been more careful, but honestly how could anyone be expected to anticipate the consequence here.

Ultimately, folks that looked closely enough at the source to figure out the hack would have had enough context to know the intent. They should have opened a bug report rather that take advantage of the loophole. If their need was sufficient they could have easily proposed an explicit mechanism to get what they wanted.

(bolding is mine, just to point out that’s what I’m doing hehe)
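The difference Eric describes can be sketched as follows. `lax_is_executable` mirrors, in simplified form, the prefix test that site.py applies to each pth line, while `strict_is_single_import` is a hypothetical stricter check built on `ast`. Neither function name exists in site.py; both are illustrations only.

```python
import ast


def lax_is_executable(line: str) -> bool:
    """Simplified form of the current check: only the prefix is inspected."""
    return line.startswith(("import ", "import\t"))


def strict_is_single_import(line: str) -> bool:
    """Accept a line only if it is exactly one plain import statement."""
    try:
        tree = ast.parse(line)
    except SyntaxError:
        return False
    return len(tree.body) == 1 and isinstance(tree.body[0], ast.Import)


if __name__ == "__main__":
    hack = 'import time;print("Code goes here")'
    print(lax_is_executable(hack), strict_is_single_import(hack))
    print(lax_is_executable("import a"), strict_is_single_import("import a"))
```

With the strict variant, the `;` trick is rejected while a plain `import a` still passes.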

Also I would like in the proposal some mention on the guarantee of execution order of these site customize scripts

That was present in this section: https://gist.github.com/mariocj89/36daffd1e157b93e5c697bbadcb05806#order-of-execution. If anything there is unclear, I’m happy to reword it.

Just one thing, in the beginning, you’re saying:

misusing pth files

I think the word misuse is not the right one here because it implies they are using it in a way that is not allowed. This was never explicitly disallowed, so perhaps it’s more appropriate to word it as exploiting/abusing it past its initial intent.

Funnily enough, I initially had abusing, but a native English speaker suggested moving to misuse. Exploiting is probably a nice alternative though, so I’ll update it to exploit. Thanks!
