Method to refresh os.environ

dg-pb · June 21, 2024, 10:44am

Then I don’t see why having a slightly shorter and simple reload is a big issue.

To me this case is similar to importlib.reload. Unless you understand things a bit you don’t know what it is exactly doing anyway. And naming is not going to help much.

eryksun · June 21, 2024, 10:45am

Currently Python has no builtin getenv() function. The _data cache is preloaded from C environ and can only be reloaded from C environ, not invalidated.
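To make the "preloaded" behaviour concrete, here is a minimal sketch. The variable name `DEMO_REFRESH_VAR` is made up for this illustration; the only calls used are `os.putenv()`, `os.getenv()`, and the `os.environ` mapping.

```python
import os

# Hypothetical variable name, used only for this illustration.
NAME = "DEMO_REFRESH_VAR"

os.environ.pop(NAME, None)      # start from a clean state
os.putenv(NAME, "from-putenv")  # changes the C-level environment only

# os.environ was filled in at startup, so the mapping does not see the change,
# and os.getenv() reads from that same mapping.
seen_in_mapping = NAME in os.environ
seen_by_getenv = os.getenv(NAME)

print(seen_in_mapping, seen_by_getenv)
```

Both lookups miss the value because they go through the mapping rather than the native environment.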

steve.dower · June 21, 2024, 10:48am

Yes, sorry, I was using it there to imply the native-level function (i.e. GetEnvironmentVariable on Windows).

eryksun · June 21, 2024, 10:55am

For me the biggest concern with an actual write-through cache design is the inconsistent state when a library outside of the application’s control adds one or more new environment variables while also changing the value(s) of preexisting variable(s). The values of the new variables may be inconsistent with the cached value of one or more existing variables. The Python application will be stuck in this inconsistent state until it calls invalidate_cache(). Currently this cannot happen because the _data mapping is preloaded and can only be refreshed, all or nothing.
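The inconsistency described above can be sketched with a toy model. Everything here is hypothetical: `LazyEnvCache` and the `APP_*` variable names are invented, and a plain dict stands in for the real process environment. It is not how `os.environ` is implemented.

```python
class LazyEnvCache:
    """Toy model: a read falls through to `native` on a miss and is then cached."""

    def __init__(self, native):
        self._native = native  # stands in for the real process environment
        self._cache = {}

    def get(self, key):
        if key not in self._cache and key in self._native:
            self._cache[key] = self._native[key]
        return self._cache.get(key)

    def invalidate_cache(self):
        self._cache.clear()


native = {"APP_MODE": "v1"}
env = LazyEnvCache(native)
env.get("APP_MODE")  # the old value is now cached

# A third-party library changes the environment behind the cache's back:
# it updates an existing variable and adds a new, related one.
native["APP_MODE"] = "v2"
native["APP_V2_OPTION"] = "on"

# Stale cached value mixed with a freshly read new variable.
mixed = (env.get("APP_MODE"), env.get("APP_V2_OPTION"))

env.invalidate_cache()
consistent = (env.get("APP_MODE"), env.get("APP_V2_OPTION"))

print(mixed, consistent)
```

With an all-or-nothing snapshot, the application would instead see neither change until it refreshed, which is stale but self-consistent.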

dg-pb · June 21, 2024, 10:56am

Also, I don’t think cache makes it much clearer to me.

The question remains:

Is it reloading the cache from “the registry/profile” or from “current process modifications by non-Python code”?

eryksun · June 21, 2024, 11:03am

What’s under discussion is how to update os.environ to include changes made by os.putenv(), os.unsetenv(), and other libraries in the current process, and how to name whatever method implements this capability. On Windows, there’s the additional problem that _w_environ (the wide-character C environment) is itself a cache of the actual process environment in the PEB, which gets returned by GetEnvironmentStringsW() and used by WinAPI GetEnvironmentVariableW() and SetEnvironmentVariableW().
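As a rough illustration of the gap being discussed, the sketch below reads a variable straight from the C library and compares it with `os.environ`. It assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's `getenv`; on Windows the equivalent read would go through `GetEnvironmentVariableW()` instead. `native_getenv` and `DEMO_NATIVE_VAR` are names invented for this example.

```python
import ctypes
import os


def native_getenv(name):
    """Read a variable from the C environment, bypassing os.environ (POSIX only)."""
    libc = ctypes.CDLL(None)
    libc.getenv.argtypes = [ctypes.c_char_p]
    libc.getenv.restype = ctypes.c_char_p
    value = libc.getenv(os.fsencode(name))
    return None if value is None else os.fsdecode(value)


# Hypothetical variable name, used only for this illustration.
NAME = "DEMO_NATIVE_VAR"

os.environ.pop(NAME, None)
os.putenv(NAME, "native-value")  # same effect as a C library calling putenv/setenv

native_value = native_getenv(NAME)  # the process environment has the value
in_mapping = NAME in os.environ     # the Python-level mapping does not

print(native_value, in_mapping)
```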

steve.dower · June 21, 2024, 11:05am

Ultimately, that question is only ever going to be answered by documentation.

The reason invalidate_cache is better than refresh or reload is that someone is more likely to realise they don’t know what the first one does and will go look it up. The latter two seem more obvious, or at least more tempting if you know that you need to “refresh” environment variables (i.e. reload them from the user’s profile).

A slightly facetious suggestion/alternative: add both reload() (which reloads from the current process) and reload_from_profile() (which always raises a “not supported” error). The presence of the second one makes it obvious that the first one doesn’t load from the profile, and the error for the second makes clear that you don’t really want to do it.

Of course, a function that always throws an error is a terrible idea, but I think it illustrates why we put so much value on names when they’re going to appear in isolation.
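For readers who want to see the shape of that suggestion, here is a minimal sketch. The `Environ` class and its `read_process_env` callback are hypothetical stand-ins, not the real `os.environ` implementation, and neither method exists in the standard library under these names.

```python
class Environ(dict):
    """Hypothetical stand-in for os.environ, for illustration only."""

    def __init__(self, read_process_env):
        # read_process_env is an assumed callback returning the current
        # process environment as a dict.
        super().__init__(read_process_env())
        self._read_process_env = read_process_env

    def reload(self):
        """Replace the contents with the current process environment."""
        fresh = self._read_process_env()
        self.clear()
        self.update(fresh)

    def reload_from_profile(self):
        """Exists only to signal that reload() does not read the profile."""
        raise NotImplementedError(
            "reloading from the user's profile is not supported"
        )


process_env = {"A": "1"}  # stands in for the native environment
env = Environ(lambda: dict(process_env))

process_env["B"] = "2"    # a change made outside the mapping
env.reload()

try:
    env.reload_from_profile()
    profile_error = None
except NotImplementedError as exc:
    profile_error = str(exc)

print(dict(env), profile_error)
```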

dg-pb · June 21, 2024, 11:43am

I was only thinking about naming. I have not yet encountered a need for such functionality myself. Well, it did happen a few times that I was frustrated that putenv was “not working”, so I would be happy with this addition.

Although I see why some might not, I actually like this.

It leaves a nice short name. It provides a contrast, which gives much more clarity than a cache suffix. And in case the situation ever changes so that reload_from_profile can be implemented, there is a good premeditated place for it.

yoavdw · June 21, 2024, 11:44am

In that spirit, why not reload_from_process()?

I also want to throw out: recreate_from_process(), or sync_with_process(), but I like those less.

yoavdw · June 21, 2024, 11:49am

That’s the reason I don’t like it. If we ever implement reloading from the profile, it would make loading it externally (get_profile_environment()), which makes more sense, a less consistent option. It sort of binds us on how to implement an API that doesn’t exist.

dg-pb · June 21, 2024, 12:10pm

OK, I see. Not the best solution then, I guess.

This isn’t bad. But what is lacking to me is that it is not very clear that the target is the current process. The question arises: “which process?”

I don’t like this because it is not doing the same thing as what is done on initial “creation”. Or is it?

This is not bad. It is somewhat clear to me that environment is not going to be “synced” with environment of another process. So I think this is more accurate than reload_from_process.

If not reload, then I like sync_with_process most so far.

But I still like simple reload most. I know this is the environment of current process and I am reloading it. I think one big factor to me is naming of adjacent methods: it is not put_to_process_env, but simply putenv. While the same question can be asked: “does it put to process or profile?”

To me, anything (much) more than reload falls out of context a bit.

dg-pb · June 21, 2024, 12:17pm

What about simply sync? or synchronize?

dg-pb · June 21, 2024, 12:24pm

I take it back. sync on its own is not good at all.

csm10495 · June 21, 2024, 2:46pm

synchronize_with_current_process()

It syncs with the current process’s environment according to the current records reported by the operating system.

It’s long but descriptive, and not really something that needs to be short.

vstinner · June 24, 2024, 1:41pm

For me, refresh() describes exactly what it does. You keep the same object, the contents get updated, you don’t lose data by calling it, but you get fresher data. Web browsers have a “refresh” button which works the same way.
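A small sketch of that behaviour, assuming an interpreter where `os.environ.refresh()` is available (it was added in Python 3.14); on older versions the `hasattr` guard simply skips the call. `DEMO_REFRESH_VAR` is a made-up name.

```python
import os

# Hypothetical variable name, used only for this illustration.
NAME = "DEMO_REFRESH_VAR"

os.environ.pop(NAME, None)
env_before = os.environ
os.putenv(NAME, "fresh")  # invisible to os.environ until it is refreshed

if hasattr(os.environ, "refresh"):
    os.environ.refresh()  # re-read the process environment in place
    refreshed_value = os.environ.get(NAME)
else:
    refreshed_value = None  # no refresh() on this Python version

# The mapping object itself is unchanged; only its contents are updated.
same_object = os.environ is env_before

print(same_object, refreshed_value)
```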

dg-pb · June 24, 2024, 3:34pm

I am +1 on refresh.

When it is initialised it inherits from parent.

When it is putenved, refreshed, etc it does so within the process environment.

As long as it is well documented, I don’t see why inventing new naming convention is needed.

I think the underlying issue is that it is not obvious (maybe not very well documented?) that once a Python process has started, environment interactions are contained within it.

I don’t think naming of refresh is the right place to address more general education issue.
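The containment described above can be checked with a short sketch: a child process inherits the parent's environment at startup, and whatever the child changes afterwards stays in the child. The variable names `DEMO_PARENT_VAR` and `DEMO_CHILD_VAR` are invented for this example.

```python
import os
import subprocess
import sys

# Hypothetical variable names, used only for this illustration.
os.environ["DEMO_PARENT_VAR"] = "parent"  # also updates the native environment
os.environ.pop("DEMO_CHILD_VAR", None)

child_code = (
    "import os\n"
    "print(os.environ['DEMO_PARENT_VAR'])\n"  # inherited from the parent
    "os.environ['DEMO_CHILD_VAR'] = 'child'\n"  # stays inside the child
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_saw_parent_value = result.stdout.strip() == "parent"
parent_sees_child_value = "DEMO_CHILD_VAR" in os.environ

print(child_saw_parent_value, parent_sees_child_value)
```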

yurivict · June 24, 2024, 3:36pm

+1 on refresh().

It’s simple and it describes the meaning well.

csm10495 · June 24, 2024, 9:24pm

tbh I don’t care about the name at this point. There seems to be a circle here with one side saying one name, the other side saying use a different name, and another side saying remove it. I’m +1 on the functionality existing, but couldn’t care less about the name atm.

… maybe it’s time for a core vote or something since it’s just circles at this point.

yoavdw · June 24, 2024, 9:41pm

My main concern, and the reason I keep pursuing this discussion, is that this change was merged before we fully discussed the name. Letting the discussion go stale now would not produce the usual outcome (a change isn’t merged until there’s an agreed name); instead, it’s the people who dislike the name who need to push to change it. This is a weird situation to be in and I’m not sure how to handle it, so I’m sorry if I’m repeating myself in this thread too much. I just wish we had this discussion before merging.

steve.dower · June 24, 2024, 9:57pm

That’s the normal progression for things in the Ideas category, yeah. They should only very rarely go straight to PR, and this one is clearly complex enough to deserve discussion in Core Development before merging (though the Ideas discussion should inform that initial post so that not everything has to be discussed over again).