Method to refresh os.environ

eryksun · June 13, 2024, 1:16pm

That’s a significant change because os.environ would no longer be isolated from changes to the environment made outside of the Python runtime. I prefer for Python code to only see changes made by other Python code, unless the application explicitly requests to sync with the underlying C environ.

pf_moore · June 13, 2024, 1:28pm

Agreed. But it’s not good to make changes before we have either, which is how we got in this situation.

To be clear, I don’t care about what we do to the os module here, the only thing I have an issue with is being more careful to follow the normal process (get consensus before making a change).

yoavdw · June 13, 2024, 2:23pm

I don’t think I’m protecting a hypothetical person. I think the question link I provided shows this mindset, and if I didn’t make it clear yet, this is something I used to think. I’m not sure why you think it’s unreasonable, but as a beginner it seemed plausible to me that, after I edited environment variables in the Windows UI, the process could somehow get the new values without a restart.

You are completely right in saying that thinking this way is wrong, and that learning that things don’t work like that is something users will have to do anyway at some point.

However, I think this public API will simply confuse a lot more users than it will help. Even if the documentation is good.

I think the problem presented is going to be much, much, less common than the problem beginners will have (which I don’t expect Python to solve). I don’t have any way to measure it, but judging from how this is the first time (I think) this was brought up, and that no one else mentioned they had this problem before, I don’t think it’s very common.

In my worldview, an API called os.environ.refresh() should solve the problem most associated with the question “How to refresh environment variables?”. If that is not actually a problem that can be solved, that API shouldn’t exist.

vstinner · June 13, 2024, 2:59pm

Even if these functions are modified to use os.environ, it doesn’t solve the initial use case. The environment can be modified in the same process outside Python by many means. Tcl was given as an example, but basically everything which changes the environment outside Python will not be reflected into os.environ because of its design.
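A minimal sketch of that point, assuming a POSIX system where ctypes can reach the C library's setenv() and getenv(). The variable name OUTSIDE_PYTHON_DEMO is made up for illustration and assumed to be unset beforehand. A change made through the C library is visible at the C level but not in os.environ:

```python
import ctypes
import os

NAME = "OUTSIDE_PYTHON_DEMO"  # hypothetical variable name, assumed unset

c_view = py_view = None
if os.name == "posix":
    # CDLL(None) exposes symbols already loaded in the process, including libc.
    libc = ctypes.CDLL(None)
    libc.setenv.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
    libc.setenv.restype = ctypes.c_int
    libc.getenv.argtypes = [ctypes.c_char_p]
    libc.getenv.restype = ctypes.c_char_p

    # Change the environment behind Python's back, as a C library might.
    libc.setenv(NAME.encode(), b"set-by-c", 1)

    c_view = libc.getenv(NAME.encode())  # what C code in this process sees
    py_view = os.environ.get(NAME)       # what the os.environ mapping holds
```

On such a system, c_view should hold the new value while py_view stays None, because os.environ is a mapping built earlier and not a live view.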

os.environ.refresh() is a simple solution to this problem. How is it overkill?

The current documentation says:

The os.environ.refresh() method updates os.environ with changes to the environment made by os.putenv(), by os.unsetenv(), or made outside Python in the same process.

Do you think that it’s not clear enough?
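To make the documented behaviour concrete, here is a small sketch. It assumes the variable REFRESH_DEMO (a made-up name) is not already set, and it guards the refresh() call with hasattr() because the method only exists on Python versions that ship it:

```python
import os

NAME = "REFRESH_DEMO"  # hypothetical variable name, assumed unset

# os.putenv() changes the process environment without touching os.environ.
os.putenv(NAME, "from-putenv")
before = os.environ.get(NAME)  # the mapping has not noticed the change

# Guarded: refresh() is only available where this feature has landed.
if hasattr(os.environ, "refresh"):
    os.environ.refresh()
after = os.environ.get(NAME)
```

Where refresh() is available, after should pick up the value set by os.putenv(); elsewhere it stays None, just like before.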

I didn’t see a use case about the “parent process”. Apparently, the main use case is to refresh “system PATH” on Windows. It was discussed previously, and no solution was provided.

By the way, yesterday I installed LLVM with “choco install llvm” on Windows which modified the system PATH. After that, I couldn’t get the “clang” command in my SSH connection; I had to reboot Windows. After a reboot, “clang” was available in my SSH connection! The SSH server doesn’t reload the system PATH apparently.

will_f · June 13, 2024, 9:08pm

I like the idea of an os.environ.refresh method, and am also interested in the possibility of implementing os.environ.__call__ as an avenue for refresh that also returns the object.

I didn’t get a chance to read the entire thread thoroughly, but I’m also interested in warnings for dynamic use of os.environ. For example if two threads use the same variable with different expectations.

My apologies for the lateness of my input.

gerardw · June 14, 2024, 2:47am

I’m with Yoav. This is a bad idea. Loading libraries and calling functions is pretty esoteric stuff, and any developer who needs to do that should own the consequences. os.environ.refresh() isn’t going to do what people expect, and the terse documentation won’t be understood unless a user understands process trees and how each process has its own environment.

What happens with threads? Will manipulating the environment in one thread require a process lock just in case another thread might be calling refresh()?

csm10495 · June 14, 2024, 4:40am

I personally would have used this feature as intended at least a few times over the years.

If the name is confusing, maybe a synchronize() name would make more sense? Or it could be more specific: synchronize_with_process().

The latter is about as clear as possible at least to me.

vstinner · June 14, 2024, 11:36am

I created PR gh-120494 to add a new os.get_user_default_environ() function. It can be used to update os.environ to the latest user and system environment variables, such as the PATH variable. Example:

import os
os.environ.update(os.get_user_default_environ())

I’m not sure about the function name. I chose this name based on CreateEnvironmentBlock() documentation:

Retrieves the environment variables for the specified user. This block can then be passed to the CreateProcessAsUser function.

I would prefer to call the function get_system_default_environ(), but apparently, environment variables can be set per user. So two users can get a different default environment.

vstinner · June 14, 2024, 11:57am

Example:

Run Python REPL.
Open the environment GUI editor in Windows Settings. Add a user variable “TEST” equal to “value”.
In Python, get os.environ["TEST"]: you should get an error.
In Python, run os.environ.update(os.get_user_default_environ())
In Python, get os.environ["TEST"]: you should get "value" as expected.

Example:

vstinner@WIN C:\victor\python\main>python
>>> # Add TEST variable in the GUI environment editor
>>> import os
>>> os.environ['TEST']
KeyError: 'TEST'
>>> os.environ.update(os.get_user_default_environ())
>>> os.environ['TEST']
'value'

yoavdw · June 14, 2024, 3:32pm

Maybe, for consistency, we can replace refresh with a get_process_environ (available on both platforms) and get_user_environ for Windows only?

And going a bit wild, a get_current_environ that tries get_user_environ if it’s Windows, otherwise get_process_environ?

yoavdw · June 14, 2024, 3:40pm

Exposing these as separate functions instead of refresh can also give more control as to not override values you changed manually.

For example if my program changed the value but an external entity also changed the value, I would be able to easily compare them.
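A rough sketch of that kind of comparison. Plain dicts stand in for os.environ and for whatever a getter function would return, and diff_environ() is a hypothetical helper, not a proposed API:

```python
def diff_environ(mine, theirs):
    """Return (added, removed, changed) going from `mine` to `theirs`."""
    added = {k: theirs[k] for k in theirs.keys() - mine.keys()}
    removed = {k: mine[k] for k in mine.keys() - theirs.keys()}
    changed = {
        k: (mine[k], theirs[k])
        for k in mine.keys() & theirs.keys()
        if mine[k] != theirs[k]
    }
    return added, removed, changed


# Stand-ins: `mine` is what my program set, `theirs` is the external view.
mine = {"PATH": "/opt/tool/bin", "MODE": "debug"}
theirs = {"PATH": "/usr/bin", "LANG": "C"}

added, removed, changed = diff_environ(mine, theirs)
```

With separate getters, the program can inspect added, removed and changed before deciding whether to apply anything to os.environ.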

vstinner · June 14, 2024, 5:52pm

The use cases are very different. refresh() is safe and has no known issue, whereas os.environ.update(os.get_user_default_environ()) overrides user changes on many variables and doesn’t remove variables deleted in os.get_user_default_environ().
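Both effects can be seen with plain dicts standing in for os.environ and for the result of the proposed function (the values are made up for illustration):

```python
# Stand-in for os.environ after the program customised PATH.
environ = {"PATH": "/my/custom/bin", "STALE": "1"}

# Stand-in for the user/system defaults; STALE was deleted there.
user_defaults = {"PATH": "/usr/bin"}

# update() overwrites keys present in both and never removes anything.
environ.update(user_defaults)
```

After the update, the custom PATH is gone and STALE is still present, even though it no longer exists in the defaults.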

yoavdw · June 14, 2024, 6:18pm

Ah right.

I think this only creates a problem with the last sentence of my message, to add get_current_environ() as this would be inconsistent across different operating systems, so consider that withdrawn.

I don’t see how that’s a problem with switching refresh() to get_process_environ() though. I mean, yes, it doesn’t do the same thing as get_user_environ(), but that’s okay, they’re two different functions.

Yes, most uses of get_process_environ() are going to be os.environ.update(os.get_process_environ()), but that’s probably also true for get_user_environ(), and in addition to consistency, it solves the original problem with refresh(), which was caused by its name.

The only problem I see with this is that you’d have to call os.environ.clear() first (sometimes, though most of the time it won’t matter), like refresh() does for you, to also pick up environment deletions, and that is 2 lines instead of 1. But given this is a rare use case anyway, I don’t feel like it’s very important to ensure it’s short.

This could also be solved ^[1] by making refresh(...) accept a required argument (and refresh would just clear() then update(...)), which would force the user to think about the type of refresh they’re making.

[1] Taking inspiration from Eryk’s earlier proposal.
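A sketch of what a refresh with a required argument could look like. refresh_from() is a hypothetical free function written for illustration, and a plain dict stands in for os.environ:

```python
from collections.abc import Mapping, MutableMapping


def refresh_from(environ: MutableMapping, source: Mapping) -> None:
    """Make `environ` match `source`: clear everything, then update."""
    snapshot = dict(source)  # copy first, in case `source` aliases `environ`
    environ.clear()
    environ.update(snapshot)


# Stand-in for os.environ; DELETED_ELSEWHERE no longer exists in the source.
environ = {"KEEP": "old", "DELETED_ELSEWHERE": "1"}
refresh_from(environ, {"KEEP": "new"})
```

Because the source is an explicit argument, the caller has to say which environment they want to sync with, and deletions are picked up as well as changes.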

yurivict · June 14, 2024, 7:34pm

It can’t be made thread safe, this is one issue with it.

If other threads are setting environment values at the same time, its behavior would be undefined or it would cause crashes.

Python’s interpreter creation has the same issue because of os.environ caching.

This limitation should be documented.

yoavdw · June 15, 2024, 5:22pm

Sorry if I’m just repeating myself, but I think the best course of action here is:^[1]

Add os.get_process_environ() on Windows and Unix
Add os.get_user_environ() on Windows only.
Remove os.environ.refresh().

This has a few advantages in my opinion:

It solves all use cases mentioned in this thread.
It forces the user to think and explicitly state what they need. It doesn’t assume a “default” type of refresh which may cause confusion.
It lets the user choose what to do with the values: the refresh() implementation does clear() and then update(...), which can cause threading problems, as mentioned. Sometimes you just want to use a specific value, compare changes, etc. There’s no reason to force a full and potentially unsafe update.

To address possible concerns:

I don’t think we need to worry too much about the verbosity of this (it requires clear() and then update(...)). This is a pretty niche use case and it’s okay if it’s not as short as possible for the sake of being explicit. We can show the recipe of clearing and updating in the docs and mention thread-safety, but I personally don’t think that’s necessary.

Happy to hear any feedback on this, but specifically @yurivict and @csm10495 mentioned they would actually use refresh(), so please share if this API (instead of refresh()) would cause you any trouble in using it.

[1] Though “don’t add anything” is still a close second in my opinion.

csm10495 · June 15, 2024, 6:48pm

get_process_environ() makes sense to me. get_user_environ() doesn’t really make sense to me.

Does it get the whole environ for the user (i.e. what the edit environment variables panel shows after submit), or is it just the user portion of that panel?

That being said: I don’t have a use for the get_user_environ() but get_process_environ() is basically what refresh() would allow me to get, so it would be useful for me in either of those forms.

vstinner · June 17, 2024, 9:26am

get_user_environ() gets environment variables which can be managed by this GUI editor (screenshot below). There are “user” environment variables and “system” environment variables. get_user_environ() gets both, combined:

(The screenshot is truncated; there are a few more environment variables after Path. But it should give you an idea of what we are talking about.)

vstinner · June 17, 2024, 9:32am

Replacing os.environ.refresh() with os.get_process_environ() does not sound convenient. To reimplement refresh(), you must not use os.environ.update(os.get_process_environ()) but:

os.environ.clear()
os.environ.update(os.get_process_environ())

So this API is more error-prone, I don’t see the advantage over refresh().

vstinner · June 17, 2024, 9:37am

Environment variables are process-wide. Would you mind elaborating on which operation is “undefined”?

Do you mean that thread A calls refresh(), which clears the variable TEST, while thread B sets os.environ['TEST'] = 'value'? By design, depending on whether refresh() completes first or not, you may or may not have a variable TEST. But I don’t see how it’s different from the existing code where thread A can call os.environ.pop('TEST', None).

If you want (or need) consistency, you must introduce a lock around accessing environment variables. Again, that’s nothing new to refresh().
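A minimal sketch of such a lock, assuming every thread in the application goes through these helpers. The names _env_lock, set_env() and snapshot_env() are made up for illustration, and the lock does nothing about non-Python code changing the environment:

```python
import os
import threading

_env_lock = threading.Lock()  # hypothetical application-level lock


def set_env(name, value):
    """Set one variable while holding the lock."""
    with _env_lock:
        os.environ[name] = value


def snapshot_env():
    """Return a consistent copy of os.environ taken under the lock."""
    with _env_lock:
        return dict(os.environ)


# Several threads write different variables through the locked helper.
threads = [
    threading.Thread(target=set_env, args=(f"LOCK_DEMO_{i}", str(i)))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

snap = snapshot_env()
```

A call to refresh(), or to clear() followed by update(), would be wrapped in the same lock so that readers never observe the half-cleared state.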

os.environ is created while Python has a single thread. Would you mind elaborating? Are you referring to an application embedding Python with multiple threads running, where threads are setting different environment variables?

steve.dower · June 17, 2024, 2:29pm

I know a lot of discussion has happened, but I don’t particularly like the idea of “read environment settings from the registry and overwrite my current environment” being easy, and it probably doesn’t belong in the stdlib.

If refresh() is just invalidating the environ cache, why not rename it to invalidate_cache()? The problem is that there is a cache that isn’t being invalidated, so anyone who encounters this is going to very quickly figure out that there’s a cache involved.

Environment variables being inherited by processes and then remaining relatively static is well known and consistent across all modern OSes. Users who modify their Bash profile and expect changes to be immediately available will be equally surprised until they learn about process creation, and then they’ll never be surprised again. Let’s not try too hard to cover up the OS’s intentional design.
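The inheritance model can be seen with a short sketch using subprocess. The variable names INHERIT_DEMO and CHILD_ONLY are made up for illustration:

```python
import os
import subprocess
import sys

os.environ["INHERIT_DEMO"] = "parent"  # hypothetical variable name

# The child prints what it inherited, then changes only its own environment.
child_code = (
    "import os\n"
    "print(os.environ.get('INHERIT_DEMO'))\n"
    "os.environ['CHILD_ONLY'] = '1'\n"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_saw = result.stdout.strip()         # value copied at process creation
leaked_back = "CHILD_ONLY" in os.environ  # child changes never reach the parent
```

The child receives a copy of the parent's environment when it is created, and nothing it does afterwards flows back, which is the same reason a running process does not see later edits made in a GUI editor or a shell profile.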