Global variables shared across modules

luc · June 25, 2022, 2:04pm

Hello to all Pythonians here.

I have run into some strange behavior involving the global keyword and modules that I cannot understand.

Module test1: Variable a is created
Module test2: Module test1 is imported, and function f is created, which modifies variable a through the global keyword
Module test3: Modules test1 and test2 are imported, f is called, and a is printed. Its value is the original, unmodified one.

See the files below. I am using Python 3.9.12. Could you please help?

Regards, Luc.

File `test1.py`

a = 100

File `test2.py`

from test1 import a

def f():
  global a
  a = 2 * a

File `test3.py`

from test1 import a
from test2 import f

f()
print(a) # prints 100, not 200

kpfleming · June 25, 2022, 3:49pm

The global keyword in Python doesn’t do what you think it does.

It makes a name global within the module in which it is declared, but that name still belongs to that module.

vbrozik · June 25, 2022, 6:03pm

The culprit is not the global statement but the way you imported the variable. To get the result you originally expected, change the imports this way:

File `test2.py`

import test1

def f():
  test1.a = 2 * test1.a

File `test3.py`

import test1
from test2 import f

f()
print(test1.a)

Explanation

In test2, when you do from test1 import a, you are creating a global variable a in the current module (test2.a) bound to the same object as the variable a in the module test1 (test1.a), but these two are different variables. Assigning to a in test2 changes only what test2.a is bound to; test1.a is still bound to the original object 100.

When you do import test1 instead, you can refer to the original variable as test1.a and re-bind it.

The concept of binding between variable names and objects is very important for good understanding of Python. It is explained for example here: 4. Execution model — Python 3.10.5 documentation or here: Assignment in Python – Real Python (the course video requires payment).
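A minimal, self-contained sketch that reproduces the question’s setup, assuming it is fine to write test1.py and test2.py into a temporary directory and put that directory on sys.path. It shows that f rebinds test2’s own global a, while rebinding the attribute on the shared module object is what other importers see:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the question's test1.py and test2.py in a throwaway directory.
tmp = Path(tempfile.mkdtemp())
(tmp / "test1.py").write_text("a = 100\n")
(tmp / "test2.py").write_text(
    "from test1 import a\n"
    "\n"
    "def f():\n"
    "    global a\n"
    "    a = 2 * a\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

test1 = importlib.import_module("test1")
test2 = importlib.import_module("test2")

test2.f()
print(test2.a)  # 200: f rebound test2's own global name "a"
print(test1.a)  # 100: test1.a still refers to the original int

# Rebinding the attribute on the shared module object is what every importer sees.
test1.a = 2 * test1.a
print(sys.modules["test1"].a)  # 200
```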

luc · June 25, 2022, 8:43pm

Dear Kevin, dear Václav,

Thanks for the explanation. It is now quite clear.

I had not spotted the “4. Execution model” page that Václav mentioned. It makes everything clear.

I did not know about the module.variable naming possibility. It is quite convenient.

One suggestion: link to this page from section 7.12, “The global statement”, which is the one I was looking at.

Best regards,

Luc.

mlgtechuser · June 25, 2022, 11:16pm

Yes, that was a super explanation from Václav.
I ran it just to have a look and put the example in my “training” library.

It’s hard to tell who has coding experience and who is venturing into coding with Python, but…

Luc, you may recognize this as an example of namespaces in action.

It seems possible that you’re fairly far along if you’re importing and linking modules that you wrote. Thanks for inspiring the lesson! Questions like this are what make discuss.python.org a great place to spend time!!

EDIT: I just ran across a topic on global vs. local namespace HERE. The scope of that discussion is functions within a single module but that topic is closely related to this one.

mlgtechuser · June 26, 2022, 1:27am

Summarizing the Execution model from Python 3.10.5 documentation:

The import statement of the form from ... import * binds all names defined in the imported module, except those beginning with an underscore.
⋮
Each assignment or import statement occurs within […] a class or function definition or at the module level (the top-level code block).
⋮
If a name is bound in a block, it is a local variable of that block, unless declared as nonlocal or global. If a name is bound at the module level, it is a global variable.

In the interest of clarity, can someone confirm that…

If a name is bound at the module level, it is a global variable.

…means that the scope of “global” can never be higher than the immediate module, where “immediate” means where the execution thread is currently located?

Not to mix terminology glibly, but is it reasonable or at least useful to apply the analogy of “casting the namespace” in this line? (After all, the purpose of explicitly invoking the namespace with namespace.func() or namespace.var is to pierce/escape the default scope.) This would explain why the import alone didn’t bind ‘a’ at a level above the module without the out-of-scope reference to module test1.

This line…

from ... import * binds all names defined in the imported module

…seems to overstate the case with “all”, because Luc’s code tried to bind ‘a’ with only the import.

In other words: a module cannot bind namespace for variables from a module that it imports. Is this correct? @steven.daprano @CAM-Gerlach @cameron?

mlgtechuser · June 26, 2022, 1:47am

#this works (gives '200'):
import ModuleTest1
from ModuleTest2 import f

f()
print(ModuleTest1.a)
---------------------------------
#this doesn't work (gives '100'):
from ModuleTest1 import a
from ModuleTest2 import f

f()
print(a)
---------------------------------
#this also does NOT work:
from ModuleTest1 import a as a   #just being thorough;
from ModuleTest2 import f        #...not a serious attempt.

f()
print(a)
---------------------------------
#nor does this:
from ModuleTest1 import *  #I did halfway expect this to work,
                           #since it's a way to combine module namespaces
from ModuleTest2 import f

f()
print(a)

cameron · June 26, 2022, 4:29am

By Leland Parker via Discussions on Python.org at 26Jun2022 01:37:

Summarizing the Execution model from Python 3.10.5 documentation:

The import statement of the form from ... import * binds all names defined in the imported module, except those beginning with an underscore.
⋮
Each assignment or import statement occurs within […] a class or function definition or at the module level (the top-level code block).
⋮
If a name is bound in a block, it is a local variable of that block, unless declared as nonlocal or global. If a name is bound at the module level, it is a global variable.

In the interest of clarity, can someone confirm that…

If a name is bound at the module level, it is a global variable.

…means that the scope of “global” can never be higher than the immediate module, where “immediate” means where the execution thread is currently located?

It means that the term “global”, in Python, means a name bound at the
module level. So that when you go:

x = 1

def f(y):
    z = x + y
    return z

x is a global, and y and z are function locals. So we mean what
you’d naively expect of a global variable without thinking about modules
at all eg for a flat script.

There isn’t really any “higher” namespace.

Not to mix terminology glibly, but is it reasonable or at least useful to apply the analogy of “casting the namespace” in this line?

No? I have no idea what that’s supposed to mean. To my mind, “cast” is a
term I learnt with C, and largely means a type conversion, particularly
with pointers. I’ve seen people talk about things like:

s = "1"
i = int(s)

as a “cast”, and I hate it. int() is like any other class
instantiation. It just “looks” like a type conversion. And it does
effectively a very similar thing. But a C style cast is a compiler level
thing.

[1]

Well, no? To use the above, namespace must be a name in the default
scope.

This would explain why the import alone didn’t bind ‘a’ at a level
above the module without the out-of-scope reference to module test1.

I’ve lost track of what import you’re talking about. An import binds
names in the current scope. It is just a special purpose assignment
statement. A module level import assigns in the module namespace, one
inside a function binds in the function namespace:

import csv # binds "csv" as a module-level aka global name

def f():
    import json # binds "json" as a function-local name
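A small sketch of that point: an import inside a function binds a local name only, while the module object itself is still cached interpreter-wide in sys.modules. The function name f is just for illustration:

```python
import sys


def f():
    import json  # binds "json" as a name local to f
    return "json" in locals()


print(f())                    # True
print("json" in globals())    # False: the module namespace was not touched
print("json" in sys.modules)  # True: the module object is cached interpreter-wide
```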

This line…

from ... import * binds all names defined in the imported module

…seems to overstate the case with “all” because Luc’s code tried to bind ‘a’ with only the import.

In other words: a module cannot bind namespace for variables from a module that it imports. Is this correct?

Well, not a namespace. It’s an assignment. Consider:

Module A:

x = {'a': 2}

In the interpreter:

>>> import A    # binds the module A to the local name "A"
>>> print(A.x['a'])
2
>>> from A import x  # local "x" now a reference to the dict in A
>>> print(x['a'])
2
>>> x['a'] = 3
>>> print(x['a'])
3
>>> print(A.x['a'])
3
>>> x = {'a': 9}  # "x" now bound to a _new_ dict
>>> print(x['a'])
9
>>> print(A.x['a'])  # A.x has not been rebound
3
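The same binding semantics apply to the star-import tried earlier in the thread. A minimal sketch, assuming a hypothetical stand-in module named star_demo registered directly in sys.modules instead of a real file:

```python
import sys
import types

# Hypothetical stand-in module, registered directly in sys.modules.
mod = types.ModuleType("star_demo")
mod.a = 100
mod._private = "skipped"
sys.modules["star_demo"] = mod

# With no __all__, this binds every name not starting with "_" in *this* module.
from star_demo import *

# Roughly the same bindings, built by hand:
copied = {k: v for k, v in vars(mod).items() if not k.startswith("_")}

mod.a = 200          # rebind the attribute on the module object...
print(a)             # 100 (this module's "a" is a separate binding)
print(copied["a"])   # 100
```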

Cheers,
Cameron Simpson cs@cskk.id.au

[1] (After all, the purpose of explicitly invoking the namespace with
namespace.func() or namespace.var is to pierce the default scope.)

steven.daprano · June 26, 2022, 6:19am

Cameron Simpson said:

“There isn’t really any “higher” namespace.”

Sure there is: builtins, which are global to the entire interpreter session.

Here’s a trick to make something global to your entire application all at once:

import builtins

builtins.thing = 42

And now you can refer to thing from any module, anywhere, and it will return 42 (unless there is a local or module global variable of the same name).

But there is no equivalent to the global declaration to force a name into builtins, so although you can write:

print(thing)  # looks up the builtin name and prints 42

there is no way to make this work:

# Will never work, it must always be written builtins.thing = 999

thing = 999

That will always create a new local (or module global) variable “thing” with the value 999, shadowing the builtin variable “thing”.
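A short sketch of the builtins fallback and shadowing described above. The name thing comes from the post; show() is a hypothetical helper, and the demo deletes builtins.thing at the end so it does not leak into the rest of the session:

```python
import builtins

builtins.thing = 42  # visible from every module through the builtins fallback


def show():
    return thing  # not a local, not (yet) a module global: found in builtins


print(show())          # 42

thing = 999            # creates a module global; builtins.thing is untouched
print(show())          # 999 (the module global now shadows the builtin)
print(builtins.thing)  # 42

del thing              # remove the module global...
print(show())          # 42 (...and the builtin is visible again)

del builtins.thing     # tidy up the interpreter-wide name
```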

cameron · June 26, 2022, 8:41am

By Cameron Simpson via Discussions on Python.org at 26Jun2022 04:40:

x is a global, and y and z are function locals. So we mean what
you’d naively expect of a global variable without thinking about modules
at all eg for a flat script.

There isn’t really any “higher” namespace.

Actually, I suppose there’s builtins.

Cheers,
Cameron Simpson cs@cskk.id.au

cameron · June 26, 2022, 10:36am

By Steven D’Aprano via Discussions on Python.org at 26Jun2022 06:35:

Cameron Simpson said:
“There isn’t really any “higher” namespace.”

Sure there is: builtins, which are global to the entire interpreter session.

As I remembered some minutes later. Oh the embarrassment.

Here’s a trick to make something global to your entire application all
at once:

import builtins
builtins.thing = 42

Yah, I’ve done that for my X() debug function occasionally, whose
cs.x module has:

if os.environ.get('CS_X_BUILTIN', ''):
  try:
    import builtins
  except ImportError:
    pass
  else:
    builtins.X = X

Cheers,
Cameron Simpson cs@cskk.id.au

steven.daprano · June 26, 2022, 11:02am

“my X() debug function”

Tell us more!

cameron · June 26, 2022, 11:53am

By Steven D’Aprano via Discussions on Python.org at 26Jun2022 11:17:

“my X() debug function”

Tell us more!

Well, I’m a lazy typist. And I debug a lot with print(), or more often
my X() function instead. X() has 4 primary features:

easy to type (1 letter, and both X and ( hold down the shift key)
can output in an ANSI colour - I use yellow in my terminals for it, so
it stands out amongst the normal green
write to stderr by default, but can also write directly to the
terminal (or to a logger or be discarded, which I pretty much never
use)
be controlled with environment variables

Here’s a screengrab of a dev terminal with some X() debug output in
the output:

So, the module I’m debugging will usually have:

from cs.x import X

like this (uncommitted code which emits the message in the image above):

CSS[~/hg/css-solar(hg:solar)]fleet2*> diff
+ exec hg -R /Users/cameron/hg/css-solar diff
diff --git a/lib/python/cs/timeseries.py b/lib/python/cs/timeseries.py
--- a/lib/python/cs/timeseries.py
+++ b/lib/python/cs/timeseries.py
@@ -102,6 +102,8 @@ from cs.resources import MultiOpenMixin
 from cs.result import CancellationError
 from cs.upd import Upd, UpdProxy, print  # pylint: disable=redefined-builtin

+from cs.x import X
+
 __version__ = '20220606-post'

 DISTINFO = {
@@ -2730,6 +2732,7 @@ class TimeSeriesMapping(dict, MultiOpenM
           `column_name_map.get(column_name,column_name)`
     '''
     pd = import_extra('pandas', DISTINFO)
+    X("READ_CSV %r\npd_read_csv_kw=%s", csvpath, pformat(pd_read_csv_kw))
     df = pfx_call(pd.read_csv, csvpath, **pd_read_csv_kw)
     # prepare column renames
     renamed = {}

So, easy to type print flavour debugging.

BUT…

One thing I particularly rely on is its “tty” mode, which writes
directly to the current terminal when active. This is particularly handy
when debugging a test suite. For example, pytest intercepts stderr
and … prints it at the end, or drops it on the floor or something. If
I’m running my (failing) tests that way, the X() output still comes up
immediately in bright yellow. Ideally, usefully just before the test
explodes.

So my dev environment (these days, usually configured with direnv)
sets these modes with environment variables. Using direnv, these
.envrc files:

CSS[~/hg/css-solar(hg:solar)]fleet2*> cat .envrc
source_env ..
export SPLINK_DATADIR=$PWD/spd
export SPLINK_FETCH_SOURCE=solar-lan:/cygdrive/c/Users/CSKK/SP-LINK/CSKK
CSS[~/hg/css-solar(hg:solar)]fleet2*> cat ../.envrc
export CS_X_VIA_TTY=1
export CS_X_COLOUR=yellow
export CS_X_BUILTIN=1

so locally there’s some SPLINK* envvars associated with the code I’m
working on, and the parent dir has CS_X_* envvars setting my usual
debug modes, common to all my checkouts. The 3 above:

make X() write to /dev/tty (bypassing any stderr interception,
such as test suites and command line redirections)
make X() write bright yellow messages, easy to see in the output
stuff the name X into the builtins namespace, which means I can
just put X(...) calls in other modules without bothering with an
import

And that last one is the tie-in to this namespace discussion.
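For illustration, a hypothetical minimal X()-style helper driven by the three environment variables above. This is a sketch, not the cs.x implementation: the colour table, the fallback to stderr when /dev/tty cannot be opened, and the exact behaviour are assumptions.

```python
import io
import os
import sys

# Hypothetical colour table: ANSI escape codes for a couple of bright colours.
_COLOURS = {"yellow": "\033[93m", "red": "\033[91m"}
_RESET = "\033[0m"


def X(msg, *args, file=None):
    """Print a debug message; a sketch in the spirit of cs.x.X, not its real code."""
    if args:
        msg = msg % args
    colour = _COLOURS.get(os.environ.get("CS_X_COLOUR", ""))
    if colour:
        msg = colour + msg + _RESET
    if file is not None:
        # An explicit file always wins.
        print(msg, file=file)
        return
    if os.environ.get("CS_X_VIA_TTY"):
        # Write straight to the controlling terminal, bypassing any
        # stderr capture (e.g. by a test runner).
        try:
            with open("/dev/tty", "w") as tty:
                print(msg, file=tty)
            return
        except OSError:
            pass  # no controlling terminal: fall back to stderr
    print(msg, file=sys.stderr)


# Optionally publish X interpreter-wide, so other modules can call it unimported.
if os.environ.get("CS_X_BUILTIN"):
    import builtins

    builtins.X = X

# Usage: send one message to an in-memory buffer to inspect the exact text.
buf = io.StringIO()
X("a=%r b=%s", 1, "x", file=buf)
print(repr(buf.getvalue()))
```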

The cs.x module is available on PyPI if you care. Or source here:
https://hg.sr.ht/~cameron-simpson/css/browse/lib/python/cs/x.py

github.com/cameron-simpson/css/blob/main/lib/python/cs/x.py

#!/usr/bin/python
#
# Just my X debugging function.
#   - Cameron Simpson <cs@cskk.id.au>
#

'''
X(), for low level debugging.

X() is my function for low level ad hoc debug messages.
It takes a message and optional format arguments for use with `%`.
It is presented here in its own module for reuse:

    from cs.x import X
    ...
    X("foo: x=%s, a=%r", x, a)

It normally writes directly to `sys.stderr` but accepts an optional
keyword argument `file` to specify a different filelike object.

[…]

Cheers,
Cameron Simpson cs@cskk.id.au

cameron · June 26, 2022, 12:02pm

By Steven D’Aprano via Discussions on Python.org at 26Jun2022 11:17:

“my X() debug function”
Tell us more!

Well, I did. But Discourse cut it off part way through. I’ve fixed the
version on the forum, which you’ll need to visit. The emailed version
will be cut off, sorry.

Cheers,
Cameron Simpson cs@cskk.id.au

steven.daprano · June 27, 2022, 1:04am

I don’t think Discuss truncated your post. I got everything starting with “Well, I’m a lazy typist” and ending with your signature “Cheers Cameron” and everything in between.

I haven’t run diff over the email I received and the updated version on the web UI, but I can’t see any obvious missing text.

cameron · June 27, 2022, 1:37am

By Steven D’Aprano via Discussions on Python.org at 27Jun2022 01:19:

I don’t think Discuss truncated your post. I got everything starting
with “Well, I’m a lazy typist” and ending with your signature “Cheers
Cameron” and everything in between.

I haven’t run diff over the email I received and the updated version on
the web UI, but I can’t see any obvious missing text.

Yeah, me too, this morning. But I received the very truncated version
last night. Today it is complete.

I don’t know about you, but I’ve got the mutt remove-duplicates hook
enabled. I’m wondering if Discourse sent out a revised version with the
same message id after I edited it?

Cheers,
Cameron Simpson cs@cskk.id.au

CAM-Gerlach · June 27, 2022, 9:53am

Like the other two Pythonistas much wiser than I, I am similarly confused by what specifically you are asking here. However, something important to keep in mind is that Python does not really have “variables” in the sense used in C and many other languages, where a variable designates a specific memory location rather than just a name, and assigning to that variable directly modifies that underlying memory.

Instead (and I assume you’re familiar with the basics, but just to review), Python has names which can be bound to objects, which may help make this more understandable (depending on what specifically you are asking about). Each object has an identity, given by id() (in CPython, its memory address); two objects that exist at the same time with the same id are in fact the same object (object_1 is object_2), even though they may have different names. Conversely, the same name may refer to different objects (i.e. with different id()s) in different scopes, or at different points in time within the same scope.

Moreover, “assigning” to a name actually binds that name to a new object; it doesn’t touch the original object or its memory, so any other names (or other references) to the original object still point to it. The only way to change the object itself, so that every reference to it reflects the change, is to mutate it in place (for objects that support mutation) rather than rebinding a name.
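A minimal sketch of the difference between identity, mutation and rebinding, using a list because ints cannot be mutated in place:

```python
a = [1, 2, 3]
b = a                  # a second name bound to the same list object
print(a is b)          # True
print(id(a) == id(b))  # True

b.append(4)            # mutation: the single shared object changes in place
print(a)               # [1, 2, 3, 4]

b = b + [5]            # rebinding: b now refers to a brand-new list
print(a)               # [1, 2, 3, 4]  (a still refers to the original)
print(a is b)          # False
```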

So, putting this all together:

In the OP’s original example, from test1 import a is roughly equivalent to a = __import__("test1").a, i.e. binding the object referenced by the module-level attribute test1.a directly to the name a in the test3 module.
However, (under normal conditions) modules are executed only once, on the first import, and then cached in sys.modules for subsequent imports from other files in the same interpreter.
Therefore, initially, all names bound using from test1 import a refer to the same object: the one created in the test1 module and bound to its module-level global name a.
As such, if the object bound to a is mutated (e.g. by appending to a list), then all names that refer to it will reflect the change.
However, if a name is instead rebound to a new object (which is what a = 2 * a does to test2’s global a inside f), that has no effect on the existing object or on the names a in the other module namespaces.
Thus, the code in the OP’s post shows no change to the object bound to the name a in test3, because that object was not modified; rather, a new object was created and bound to the name a in test2.
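A sketch of the same behaviour across module boundaries, assuming a hypothetical stand-in module named shared_state registered in sys.modules (which is where a real import would cache it):

```python
import sys
import types

# Hypothetical stand-in for a real module, cached where a real import would be.
mod = types.ModuleType("shared_state")
mod.items = []
mod.count = 0
sys.modules["shared_state"] = mod

# Roughly: items = sys.modules["shared_state"].items, and likewise for count.
from shared_state import count, items

items.append("x")   # mutation: visible through every reference to the list
print(mod.items)    # ['x']

count = count + 1   # rebinding: only this module's name "count" changes
print(count)        # 1
print(mod.count)    # 0
```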

For the example in @vbrozik 's post:

import test1 is roughly equivalent to test1 = __import__("test1"), i.e. binding the module object created by executing the test1 module to the name test1 at the global scope of the test3 module.
Normally, modules are only executed once and cached in sys.modules after the first import, so later imports from other files in the same interpreter reuse that cached module.
Therefore, while the names bound to it may differ in each module, every import refers to the same underlying module object.
In turn, if that module object is mutated, which includes modifying or rebinding its attributes (test1.a = 2 * test1.a), then all names referencing that object (i.e. every importer’s reference to the module) reflect the change.
Thus, the code in @vbrozik 's post works as you expect: the modified module-level attribute is visible to test3, and to any other module that imports test1, for any code that executes after f() and looks the attribute up through the module object.
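A quick check of the caching described above, using the standard json module; example_attr is a throwaway hypothetical attribute that the demo removes again:

```python
import importlib
import json
import sys

# A second import returns the cached module object; json is not executed again.
json_again = importlib.import_module("json")
print(json_again is json)           # True
print(sys.modules["json"] is json)  # True

# Setting an attribute through one reference is visible through all of them.
json.example_attr = "hello"         # hypothetical attribute, for illustration only
print(json_again.example_attr)      # hello
del json.example_attr               # undo the demo change
```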

However, this kind of shared global state is hard to understand and reason about, and it makes it easy to introduce confusing, hard-to-debug errors. What if f() gets executed twice? What if other modules using it don’t expect the change? Therefore, this sort of global state should be minimized or avoided whenever practical, and state should be modified and shared as narrowly as possible.
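One way to keep state narrow, sketched under the assumption that the caller owns the value: have the function return a new value instead of rebinding a global in another module. The name doubled is hypothetical.

```python
def doubled(a):
    """Return a new value instead of rebinding a global in some other module."""
    return 2 * a


a = 100
a = doubled(a)  # the caller decides which name gets rebound, and where
print(a)        # 200
```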

Right, but to quote Jurassic Park, “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”
