Method to refresh os.environ

To add to this, I asked some co-workers today what they expect an os.environ.refresh() would do. Everyone I asked was confused as to what that would mean (they were Linux users) until I explained the use case of running another program in the same process. This goes back to what I said earlier that this API doesn’t do what people think of when you say “refresh environment variables”.


What happens if one thread reads an environment variable between the clear() and the update(...) of a refresh() running in a second thread? Because the data dict is not emptied and refilled atomically, the reader could see every variable as missing.
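A minimal sketch of that window, not CPython's actual code: CachedEnv is a hypothetical stand-in for os.environ, and the between callback simulates a second thread being scheduled in the middle of the refresh. It contrasts clearing in place with building a new dict and rebinding it.

```python
class CachedEnv:
    """Hypothetical stand-in for os.environ: a dict cache over a 'real' store."""

    def __init__(self, source):
        self._source = source        # plays the role of the C-level environment
        self._data = dict(source)    # the Python-side cache

    def get(self, key, default=None):
        return self._data.get(key, default)

    def refresh_in_place(self, between=None):
        # Two separate steps: a reader that runs between them sees an empty cache.
        self._data.clear()
        if between is not None:
            between()                # simulates another thread running here
        self._data.update(self._source)

    def refresh_by_swap(self, between=None):
        # Build the new dict first, then rebind the attribute in one step.
        fresh = dict(self._source)
        if between is not None:
            between()                # the old cache is still fully populated here
        self._data = fresh


env = CachedEnv({"HOME": "/home/alice"})
seen = []
env.refresh_in_place(between=lambda: seen.append(env.get("HOME")))
env.refresh_by_swap(between=lambda: seen.append(env.get("HOME")))
print(seen)  # [None, '/home/alice']
```

The swap variant only changes what a reader of the Python-side cache can observe. It does nothing about threads that modify the real environment concurrently.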

Environment variables are process-wide. Would you mind elaborating on which operation is “undefined”?

If multiple threads set environment variables while other threads are reading them, this can lead to crashes. It is generally undefined behavior.

os.environ is created while Python has a single thread.

This can never be guaranteed. The Python interpreter can be initialized in the middle of the process when other threads are already running.

This is a general limitation of the Python interpreter that exists today.

Also, I think I didn’t make this clear: I am not in any way asking for this to be in the standard library. I initially mentioned it just to note what new users would think refresh() does, and I thought it had been suggested as a PR to address my concern with refresh(). I don’t have any particular preference for os.get_user_default_environ() (or variants) to exist in the stdlib, and reading back through the thread, it doesn’t seem like anyone else does either, so I apologize for not clarifying that earlier.

I know you didn’t, but Victor has proposed a PR that adds it. I probably should’ve linked to it, but generally nothing in “Ideas” has PRs because this is for pre-discussion.

I fail to see how os.environ.refresh() is a bad move. For me, it solves a real, long-standing issue for many users. Solving Windows-specific problems is more complicated (see my “get_default_user_environ()” PR), but I don’t see how adding refresh() causes more confusion than it helps users.

Anyway, since multiple people dislike refresh(), I propose PR gh-120790 to remove it.

I also fail to see why this useful function is controversial.

It’s just a bad name. Call it os.environ.invalidate_cache() and nobody will complain.

To me, refresh() looks more correct than invalidate_cache(). It does not invalidate the cache. It recreates the cache with the current actual values; it refreshes it. I do not know a better name for this. It looks like update(), but different, so you know that it makes os.environ more “fresh”, but you need to look at the documentation if you need details.

The only flaw of this method is that it is not thread-safe. But none of the APIs that manipulate environment variables are thread-safe, and this cannot be fixed.
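The stale cache that this method is meant to refresh can be reproduced from pure Python, because os.putenv() is documented as changing the process environment without updating os.environ. This is only an illustrative sketch: DEMO_STALE_VAR is a made-up variable name, and the refresh() call is guarded because the method discussed in this thread may not exist in the interpreter running the code.

```python
import os

NAME = "DEMO_STALE_VAR"  # hypothetical variable name, used only for this demo

# os.putenv() changes the real process environment but bypasses the os.environ cache.
os.putenv(NAME, "set-behind-the-cache")

# os.environ was filled at startup, so it does not know about the new variable.
stale_value = os.environ.get(NAME)
print(stale_value)  # None

# The method discussed in this thread; it may be absent in the running Python.
if hasattr(os.environ, "refresh"):
    os.environ.refresh()
    print(os.environ.get(NAME))

os.unsetenv(NAME)  # clean up the real environment again
```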

Perhaps, but given the number of things refresh could mean on environ, it’s very unlikely that it’ll look correct to everyone, or even to enough people to make it worthwhile.

invalidate_cache() at least implies that it isn’t going to change anything “real”. If you don’t know there’s a cache involved, you can easily find the docs to explain it.

synchronize() is another possible name, though that doesn’t imply which direction the environ is synchronized (does it make the real one match the dict? or the other way?).

I think update_from_real_environment() was suggested (I might’ve added “real”). It’s wordy, but it’s also about as clear as we can get in the name. If you were likely to type this regularly at the interactive prompt, I’d be against a name this long, but since this is a fairly special function that you’re only likely to call as part of a significant application, I don’t have a problem with the length.

Just rename it to ‘invalidate_cache’ instead of removing the method altogether.

Ok, here is a PR to rename the method: PR gh-120808, “gh-120057: Rename os.environ.refresh() to invalidate_cache()”.

I searched the stdlib: importlib and zipimport both have invalidate_caches(), so the name is not entirely new, and it’s good to reuse it. We just drop the final S here (since there is a single cache).

It is good to reuse a name if the meaning is similar. Otherwise using the same name can be more confusing than helpful.

I would expect that invalidate_cache clears the cache but does not refill the cache with new values. I would expect that refresh updates all values in the cache to the current values from the underlying store. The difference between these two behaviours is what happens if you modify before reading:

import os

os.environ.refresh()
# do other stuff
# that possibly changes environment variables
print(os.environ['VAR'])

If the “cache was invalidated” then I would expect environ['VAR'] to trigger a fresh lookup for the OS value of the environment variable at the time of access. If environ was “refreshed” then I expect it to store the value from the time at which it was refreshed.
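A small sketch of the two expectations side by side. Both classes are hypothetical and neither is how os.environ is implemented; the plain dict named store stands in for the operating system's view of the environment.

```python
class RefreshedEnv:
    """Hypothetical 'refresh' semantics: copy everything at the time of the call."""

    def __init__(self, store):
        self._store = store
        self._data = {}

    def refresh(self):
        self._data = dict(self._store)   # snapshot taken now

    def __getitem__(self, key):
        return self._data[key]           # never consults the store again


class InvalidatedEnv:
    """Hypothetical 'invalidate' semantics: drop the cache, look up lazily."""

    def __init__(self, store):
        self._store = store
        self._data = {}

    def invalidate_cache(self):
        self._data.clear()               # nothing is re-read yet

    def __getitem__(self, key):
        if key not in self._data:
            self._data[key] = self._store[key]   # fresh lookup at access time
        return self._data[key]


store = {"VAR": "old"}                   # stand-in for the OS-level environment
refreshed = RefreshedEnv(store)
lazy = InvalidatedEnv(store)

refreshed.refresh()
lazy.invalidate_cache()
store["VAR"] = "new"                     # "other stuff" changes the variable

snapshot_value = refreshed["VAR"]
lazy_value = lazy["VAR"]
print(snapshot_value, lazy_value)  # old new
```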

refresh_cache() then?

We just need to avoid environ.refresh because “refresh the environment” is already a heavily overloaded concept.

Would os.environ.reload() be better?

I don’t dislike reload, but it may suggest that it’s going to reload from the registry/profile rather than just from changes made by non-Python code in the current process.

I am no expert here, but it seems to me that if “reloading from the registry/profile” is sensible at all and there is even a slight possibility that it will be implemented in the future, reload should be reserved for it. Put another way, the name should be more specific here. If there is no chance of that at all, I don’t see why reload cannot be used.

Also, I am not sure about “cache”. It is a cache only to the same extent that any dictionary is a cache. Strictly speaking it is one, but the word is usually reserved for things that are “cache-oriented”, so it seems like a bit of an overextension here.

Maybe reload_local if not going with pure reload?

How about redesigning os.environ as a typical write-through cache? If a variable doesn’t exist in the _data cache, try to fetch it from builtin getenv() (in the posix or nt module), which would call POSIX getenv(). Initially the cache would be pre-loaded based on C environ. Then we can keep the wording “invalidate” without having to go back to the problematic words “refresh” or “reload”, which apparently lead some people to think the cache gets reloaded from data that’s outside of the current process.

The downside is that an actual caching design would expose Python code to new environment variables added by libraries that are outside of the control of the application. In contrast, the upside of a refresh() method is that Python code is explicitly agreeing to see those updates.
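A rough sketch of that design, under loud assumptions: ReadThroughEnv is a hypothetical class, and its lookup argument is a stand-in for the builtin getenv() mentioned above, which is not an existing function. A plain dict plays the role of the C-level environment.

```python
class ReadThroughEnv(dict):
    """Hypothetical cache: pre-loaded, falling back to a lookup on a miss."""

    def __init__(self, initial, lookup):
        super().__init__(initial)   # pre-load from the initial environment
        self._lookup = lookup       # stand-in for a C-level getenv()

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is not cached.
        value = self._lookup(key)
        if value is None:
            raise KeyError(key)
        self[key] = value           # remember it for later reads
        return value

    def invalidate_cache(self):
        self.clear()                # the next read of each key goes to the lookup


real = {"PATH": "/bin"}             # stand-in for the real process environment
env = ReadThroughEnv(real, real.get)

real["ADDED_BY_LIBRARY"] = "1"      # e.g. set by non-Python code in the process
real["PATH"] = "/usr/bin"           # an existing variable is changed

added_value = env["ADDED_BY_LIBRARY"]   # cache miss, so the new variable is visible
path_before = env["PATH"]               # cache hit, so the old value is returned
env.invalidate_cache()
path_after = env["PATH"]                # re-read after invalidation
print(added_value, path_before, path_after)  # 1 /bin /usr/bin
```

In this sketch a newly added variable shows up without the application opting in, which is the downside described above, while a changed variable stays stale until invalidate_cache() is called. Membership tests such as `key in env` do not trigger the fallback here; a fuller design would have to decide how those behave.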

Reloading based on persistent values stored on disk will not be added because there is no way on POSIX to implement an equivalent to WinAPI CreateEnvironmentBlock().

This is how it already works (I’m not sure if it reads through for unknown variables, but the bigger problem is changed variables, since a missing one is most likely going to be a failure).

It needs to read through getenv() all the time, which I personally don’t think is unreasonable, but it’s a performance regression for sure.

There’s also no way on Windows to correctly handle variables set by a shell profile or by the user. What happens when PYTHONPATH is cleared by a reload? Do we have to remove entries from sys.path? Or do we just have inconsistent state?

I don’t think reloading is at all viable. At best, we could have a function like the one Victor proposed, which fills a new dict with the variables that would’ve been there if everything restarted with no customisations applied, but that’s so specialised that I can’t imagine it getting used correctly.