Method to refresh os.environ

yurivict · June 2, 2024, 1:43am

os.environ doesn’t reflect environment changes that are made outside of Python.

For example, this code sets the environment variable XXX through the C library, but os.environ doesn’t see the change.

This program:

import os

import ctypes
import ctypes.util

clib = ctypes.util.find_library("c")
dll = ctypes.CDLL(clib)

setenv = dll.setenv
setenv.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]

getenv = dll.getenv
getenv.argtypes = [ctypes.c_char_p]
getenv.restype = ctypes.c_char_p

# set XXX outside of Python's os.environ
setenv(b"XXX", b"Hello", 2)

print(f'real XXX value: {str(getenv(b"XXX"))}')
print(f'XXX is in os.getenv: {os.getenv("XXX")}')
print(f'XXX is in os.environ: {"YES" if "XXX" in os.environ else "NO"}')

prints:

real XXX value: b'Hello'
XXX is in os.getenv: None
XXX is in os.environ: NO

There should be a method, for example os.environ_refresh, that would re-read the process environment into os.environ.

I hit this in a real-world case where Python didn’t see environment variables set by a TCL script executed in the same process.

csm10495 · June 2, 2024, 1:54am

How was the TCL script run in the same process? Normally you would need to use subprocess or something similar to shell out to run a different script. That would then result in the child process’ environment variables changing.

os.environ is cached on import of the os module. So currently changes made outside of python (including in C extensions) wouldn’t be noticed by os.environ, or interestingly by os.getenv() either.

yurivict · June 2, 2024, 2:09am

In my case, both Python and TCL were invoked from the top-level C++ application.

However, TCL can also be called using a Python module written in C that binds TCL runtime and executes TCL scripts in the same process.

csm10495 · June 2, 2024, 2:16am

Got it. This is actually interesting since there are probably other cases where this type of thing matters directly in pure python itself:

If someone uses 2 sub-interpreters inside the same process, they could change the environ in one… and it would technically change in the other (since they’re the same process) … but the other wouldn’t see the change.

Maybe a solution while keeping the normal caching could be to add a param to os.getenv to force it to fetch from the OS instead of the cache, with it defaulting to False to keep today’s behavior by default.

Of course another, bigger thought is we could get rid of the fact that os.environ is a cache, but I don’t know the legacy reasons for why it is that way to begin with.
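
The force-fetch idea can be prototyped outside the stdlib. This is only a rough sketch, assuming a POSIX system where ctypes.CDLL(None) gives access to the C library's getenv(); the name getenv_live is made up for illustration and is not an existing os function:

```python
import ctypes
import os

# Assumption: POSIX. CDLL(None) gives a handle to the symbols already loaded
# into the process, which includes the C library's getenv().
_libc = ctypes.CDLL(None)
_libc.getenv.argtypes = [ctypes.c_char_p]
_libc.getenv.restype = ctypes.c_char_p


def getenv_live(name, default=None):
    """Hypothetical helper: look up `name` in the C environment, bypassing
    the mapping cached in os.environ."""
    raw = _libc.getenv(os.fsencode(name))
    return default if raw is None else os.fsdecode(raw)
```

With something like this, os.getenv(name) would keep today's cached behavior and only callers that explicitly ask for the live value would pay for the C call.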

Rosuav · June 2, 2024, 2:30am

Notably, Tkinter runs Tcl code. So this sort of thing could happen by setting up a GUI and having code execute in that context.

eryksun · June 2, 2024, 2:47am

Currently, this issue doesn’t require anything special as far as extension modules go. Just use os.putenv(). For example, on Linux:

>>> getenv = ctypes.CDLL(None).getenv
>>> getenv.restype = ctypes.c_char_p
>>> os.putenv('SPAM', 'EGGS')
>>> 'SPAM' in os.environ
False
>>> getenv(b'SPAM')
b'EGGS'

It’s possible to wrap os.putenv() to work around this case in particular.
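
A rough sketch of such a wrapper, not how the stdlib does it. The name synced_putenv is hypothetical, and it relies on the private os.environ._data dict plus the encodekey/encodevalue attributes of the os.environ object, which are implementation details of CPython's os module:

```python
import os

_builtin_putenv = os.putenv  # the C-level setter


def synced_putenv(name, value):
    """Hypothetical wrapper: set the variable at the C level and mirror
    the change into os.environ's cached mapping."""
    _builtin_putenv(name, value)
    env = os.environ
    # Assumption: _data is a private implementation detail of os.environ.
    # fsdecode() lets the helper accept bytes as well as str on POSIX.
    key = env.encodekey(os.fsdecode(name))
    env._data[key] = env.encodevalue(os.fsdecode(value))
```

The wrapper deliberately does not assign through os.environ[name], because that assignment itself calls putenv() and could recurse if the wrapper were installed as os.putenv.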

csm10495 · June 2, 2024, 9:31pm

I’m not sure I follow. This only helps one direction. You update the var from Python and it updates the real value.

If you update from an extension, you can’t fetch it from Python without using ctypes, etc.

It seems like a bit of a gap.

eryksun · June 3, 2024, 8:01am

Charles, if you’re replying to me, the last sentence in my post was addressing a possible argument that the standard library should only worry about ensuring that its own operations keep os.environ updated, excluding ctypes and tkinter. In that case, wrapping builtin os.putenv() is sufficient.

For a related example in another programming language, consider that the C runtime library on Windows maintains its own _wenviron and _environ environments, respectively for Unicode and multibyte strings [1]. Modifying either of these environments directly is strongly discouraged. An application that uses the C runtime should set and unset environment variables via _wputenv[_s]() or _putenv[_s](). These functions keep the two C runtime environments in sync with each other, and they also call WinAPI SetEnvironmentVariableW() to keep the process environment in sync. There is no function to reinitialize _[w]environ from the process environment.

[1] Initially only one of the two environments is initialized, depending on whether the application uses the wmain() entry point with Unicode arguments or the main() entry point with multibyte-string arguments. The other environment is initialized on demand.

vstinner · June 3, 2024, 1:42pm

IMO it’s worth it to attempt to implement an os.environ.refresh() method and test if it does fix your use case. I can help to implement it.

yoavdw · June 3, 2024, 8:50pm

I know this is obvious to people here, but to a lot of beginners, an os.environ.refresh() is not going to do what they expect.

For example, from this StackOverflow question, which is first when googling “python refresh environment variables”, the idea of refreshing environment variables is not necessarily those updated out of Python in the same process, but those updated in another process:

This was also my first thought when reading this thread’s title. I quickly realized that’s not possible, but not everyone will.

eryksun · June 3, 2024, 10:13pm

Here’s a ctypes prototype that’s implemented for glibc on Linux and ucrt on Windows. The way to access the environ array via ctypes FFI depends on the C runtime, so this code is limited to the platforms that I can currently test. I don’t use macOS or BSD.

import os
import sys
import ctypes

if sys.platform == 'linux':
    def _get_env_array(*, lib=ctypes.CDLL(None)):
        return ctypes.POINTER(ctypes.c_char_p).in_dll(lib, 'environ')
elif sys.platform == 'win32':
    def _get_env_array(*, lib=ctypes.CDLL('ucrtbase')):
        p = ctypes.CFUNCTYPE(ctypes.POINTER(ctypes.POINTER(ctypes.c_wchar_p)))
        return p(('__p__wenviron', lib))()[0]
else:
    raise ImportError('refresh_environ() supports only Linux (glibc) and Windows (ucrt)')

def refresh_environ():
    uppercase_names = sys.platform == 'win32'
    _env_array = _get_env_array()
    if isinstance(_env_array[0], bytes):
        equals = b'='
    else:
        equals = '='
    c_environ = {}
    for entry in _env_array:
        if entry is None:
            break
        name, value = entry.split(equals, 1)
        if uppercase_names:
            c_environ[name.upper()] = value
        else:
            c_environ[name] = value
    os.environ._data.clear()
    os.environ._data.update(c_environ)

[Edited: I remembered how os.environ is implemented, which makes the refresh much simpler to implement.]

For example:

>>> os.putenv('SPAM', 'EGGS')
>>> 'SPAM' in os.environ
False
>>> refresh_environ()
>>> os.environ['SPAM']
'EGGS'

csm10495 · June 4, 2024, 1:07am

Is there a reason os.environ can’t just fetch on use / iteration and put on modification? I don’t think it has to be a real dict, just follow the MutableMapping ‘interface’.

If we removed the concept of it being a cache, it could simplify things for users.

If we needed a way to have a static copy of it, dict(os.environ). If we needed a one shot update, they could still call os.environ.update(..) to update it like a dict.
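
To make the uncached idea concrete, here is a rough sketch of a mapping that goes to the C library on every access. It assumes Linux with glibc (the 'environ' symbol lookup is platform-specific), the class name LiveEnviron is made up, and it ignores thread safety entirely:

```python
import ctypes
import os
from collections.abc import MutableMapping

# Assumption: Linux/glibc, where these symbols are reachable via CDLL(None).
_libc = ctypes.CDLL(None)
_libc.getenv.argtypes = [ctypes.c_char_p]
_libc.getenv.restype = ctypes.c_char_p
_libc.setenv.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
_libc.unsetenv.argtypes = [ctypes.c_char_p]


class LiveEnviron(MutableMapping):
    """Hypothetical uncached view of the C environment (str keys/values)."""

    def __getitem__(self, key):
        raw = _libc.getenv(os.fsencode(key))
        if raw is None:
            raise KeyError(key)
        return os.fsdecode(raw)

    def __setitem__(self, key, value):
        # Third argument 1 means "overwrite an existing value".
        _libc.setenv(os.fsencode(key), os.fsencode(value), 1)

    def __delitem__(self, key):
        if _libc.getenv(os.fsencode(key)) is None:
            raise KeyError(key)
        _libc.unsetenv(os.fsencode(key))

    def __iter__(self):
        # Re-read the NULL-terminated C 'environ' array on every iteration.
        array = ctypes.POINTER(ctypes.c_char_p).in_dll(_libc, "environ")
        names = []
        i = 0
        while array[i] is not None:
            names.append(os.fsdecode(array[i].split(b"=", 1)[0]))
            i += 1
        return iter(names)

    def __len__(self):
        return sum(1 for _ in self)
```

dict(LiveEnviron()) then gives the static copy, and update() comes for free from MutableMapping.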

vstinner · June 4, 2024, 5:00pm

I created issue gh-120057 and a pull request to add a new os.environ.refresh() method.
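
For anyone reading later: the method shipped as os.environ.refresh() in Python 3.14. A small usage sketch, where the variable name SPAM_DEMO is arbitrary and the hasattr() guard keeps the snippet harmless on older versions:

```python
import os

# os.putenv() changes the C environment but not os.environ's cache.
os.putenv("SPAM_DEMO", "EGGS")
print("SPAM_DEMO" in os.environ)

# os.environ.refresh() is only available on Python 3.14 and later.
can_refresh = hasattr(os.environ, "refresh")
if can_refresh:
    os.environ.refresh()
    print(os.environ["SPAM_DEMO"])
```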

eryksun · June 4, 2024, 11:14pm

Charles, I presume the main reason for the local environment is historical. The design predates a lot of the magic that’s possible in modern Python. That said, another reason could be that it isolates Python code from changes to the environment by other shared libraries and language runtimes in the process. Calling a refresh() method would be explicitly requesting to see those changes. In the case of Windows, I think it should refresh based on the C runtime’s _wenviron array, not based on the process environment from WinAPI GetEnvironmentStringsW(). Maybe there could be an option on Windows to also refresh the C runtime environment based on the latter.

yoavdw · June 5, 2024, 8:04am

Is there maybe a different name we can have for this? I still feel like this is going to be a very common “gotcha”.

Here’s another person that refers to refreshing environment variables as getting an out-of-process update:

eryksun · June 5, 2024, 10:48am

How about refresh_cache() or reload_cache()? It could also accept an optional mapping to use instead of reloading from C environ, e.g. os.environ.refresh_cache(some_environ).
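
Roughly what I have in mind, as a free function rather than a method. This is a sketch under assumptions: the default loader only works on Linux with glibc, and it writes to the private os.environ._data dict. Note that passing a mapping only replaces Python's cache; it doesn't touch the C environ.

```python
import ctypes
import os


def _read_c_environ():
    """Return the C 'environ' array as a str -> str dict (Linux/glibc only)."""
    lib = ctypes.CDLL(None)
    array = ctypes.POINTER(ctypes.c_char_p).in_dll(lib, "environ")
    result = {}
    i = 0
    while array[i] is not None:
        name, _, value = array[i].partition(b"=")
        result[os.fsdecode(name)] = os.fsdecode(value)
        i += 1
    return result


def refresh_cache(mapping=None):
    """Hypothetical: replace os.environ's cache with `mapping`, or with the
    current C environ when no mapping is given."""
    if mapping is None:
        mapping = _read_c_environ()
    env = os.environ
    # Encode the way os.environ stores entries internally.
    new_data = {env.encodekey(k): env.encodevalue(v)
                for k, v in mapping.items()}
    # Assumption: _data is a private implementation detail of os.environ.
    env._data.clear()
    env._data.update(new_data)
```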

In the linked question, it seems that the value of the environment variable was changed in the OS via some graphical user interface. On Windows, it’s pretty easy to get a new environment from the values persisted across various keys in the registry. Unfortunately the required values to build an environment block are scattered all over the user and machine hives in the registry. Fortunately, the Windows API provides CreateEnvironmentBlock() to hide the messy details. Just call it with a reference to the access token of the current process. Then parse the environment block as a dict. For example:

import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
userenv = ctypes.WinDLL('userenv', use_last_error=True)

TOKEN_DUPLICATE = 0x0002
TOKEN_QUERY = 0x0008

kernel32.GetCurrentProcess.restype = wintypes.HANDLE

kernel32.CloseHandle.restype = wintypes.BOOL
kernel32.CloseHandle.argtypes = (
    wintypes.HANDLE, # hObject
)

kernel32.OpenProcessToken.restype = wintypes.BOOL
kernel32.OpenProcessToken.argtypes = (
    wintypes.HANDLE,  # ProcessHandle
    wintypes.DWORD,   # DesiredAccess
    wintypes.PHANDLE, # TokenHandle
)

userenv.CreateEnvironmentBlock.restype = wintypes.BOOL
userenv.CreateEnvironmentBlock.argtypes = (
    ctypes.POINTER(wintypes.PWCHAR), # lpEnvironment
    wintypes.HANDLE,                 # hToken
    wintypes.BOOL,                   # bInherit
)

userenv.DestroyEnvironmentBlock.restype = wintypes.BOOL
userenv.DestroyEnvironmentBlock.argtypes = (
    wintypes.PWCHAR, # lpEnvironment
)

def OpenCurrentProcessToken(access=TOKEN_DUPLICATE|TOKEN_QUERY):
    ht = wintypes.HANDLE()
    hp = kernel32.GetCurrentProcess()
    if not kernel32.OpenProcessToken(hp, access, ctypes.byref(ht)):
        raise ctypes.WinError(ctypes.get_last_error())
    return ht

def create_environ():
    environ = {}
    p = wintypes.PWCHAR()
    htoken = OpenCurrentProcessToken()
    try:
        if not userenv.CreateEnvironmentBlock(ctypes.byref(p),
                                              htoken, False):
            raise ctypes.WinError(ctypes.get_last_error())
        try:
            i = 0
            while True:
                if p[i] == '\0':
                    break
                j = i + 1
                while True:
                    if p[j] == '\0':
                        break
                    j += 1
                # Skip names that begin with '='.
                if p[i] != '=':
                    name, value = p[i:j].split('=', 1)
                    environ[name] = value
                i = j + 1
        finally:
            if not userenv.DestroyEnvironmentBlock(p):
                raise ctypes.WinError(ctypes.get_last_error())
    finally:
        kernel32.CloseHandle(htoken)
    return environ

Here’s an example that stores a user environment variable in the registry and then creates a new environment mapping that includes it:

>>> import winreg
>>> HKCU = winreg.HKEY_CURRENT_USER
>>> hkey = winreg.OpenKey(HKCU, 'Environment', access=winreg.KEY_SET_VALUE)
>>> winreg.SetValueEx(hkey, 'SPAM42', 0, winreg.REG_SZ, 'EGGS42')
>>> environ = create_environ()
>>> environ['SPAM42']
'EGGS42'

yoavdw · June 5, 2024, 11:09am

Yes, on Windows I also saw Chocolatey has this functionality using the refreshenv command.

I assume if it did that then it would actually be refreshing the environment variables, so the name would be fine.

How would that work on Unix though?

eryksun · June 5, 2024, 12:41pm

Sorry, I don’t know how to implement the equivalent of CreateEnvironmentBlock() on POSIX. Linux has “/etc/environment” and “/etc/security/pam_env.conf” for system-wide environment variables, but per-user support for “~/.pam_environment” is deprecated and going away due to a fundamental security issue. Some system-wide variables are from the shell scripts “/etc/profile”, “/etc/bash.bashrc”, and “/etc/profile.d/*.sh”. User environment variables come from the shell scripts “~/.profile” and “~/.bashrc”. The situation on Windows is also a mess, but it’s only because of the way values are scattered around the system and user registry hives. It’s a manageable mess.

yoavdw · June 5, 2024, 12:53pm

Maybe we should just have a Windows-only method that does create_environment, and then you can do os.environ |= create_environment()?
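
One thing to keep in mind with a plain merge is that |= only adds or overwrites entries, so a variable that was deleted from the new environment would linger in os.environ. A small sketch of both behaviors, where apply_environ and its remove_missing flag are hypothetical names and new_env stands in for whatever mapping gets built:

```python
import os


def apply_environ(new_env, *, remove_missing=False):
    """Hypothetical helper: merge a str -> str mapping into os.environ.

    With remove_missing=True, variables absent from `new_env` are deleted
    first, so os.environ ends up matching `new_env`.
    """
    if remove_missing:
        # Copy the names first: os.environ must not change size while
        # it is being iterated.
        for name in list(os.environ):
            if name not in new_env:
                del os.environ[name]
    # Equivalent to os.environ.update(new_env); each assignment also
    # calls putenv(), so the C environment is updated too.
    os.environ |= new_env
```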

eryksun · June 5, 2024, 2:19pm

It’s unlikely that any capability that’s specific to Windows will get added unless it’s important for security or something that’s useful for a wide range of applications. But, as I suggested previously, maybe the refresh() method could take an optional mapping, and just default to using C environ. That would at least provide a supported way to extend it with Windows specific support. A script could use something like the above create_environ() function, based on ctypes, or use PyWin32 to create the environment. For example:

import win32api
import win32profile
import win32security

def create_environ():
    hprocess = win32api.GetCurrentProcess()
    access = win32security.TOKEN_DUPLICATE | win32security.TOKEN_QUERY
    htoken = win32security.OpenProcessToken(hprocess, access)
    return win32profile.CreateEnvironmentBlock(htoken, False)