PEP 558: Defined semantics for locals()

ncoghlan · December 31, 2019, 4:10am

After a break of several months, I finally made the time to update the PEP 558 reference implementation to use “independent snapshot” semantics for locals() invocations at function scope. With the changes successfully implemented, I went ahead and merged the related update to the PEP itself.

PEP 558 recap

To resolve assorted bugs in CPython’s tracing and debugging machinery, PEP 558 replaces the existing PyFrame_LocalsToFast() batch write-back mechanism with a new transparent write-through proxy as the frame.f_locals attribute. This only affects optimised frames (i.e. functions, coroutines, generators, comprehensions), with class and module behaviour remaining unchanged.
That change to frame.f_locals was not controversial back in the May 2019 discussions, so the key questions were around how locals() itself should behave. Nathaniel made a very strong case that the two most credible options were to expose the write-through proxy directly or to start returning true independent snapshots. I ruled out exposing the write-through proxy directly on backwards compatibility grounds (covered in the PEP), but agreed that true independent snapshots would be an improvement over both the status quo and the May 2019 iteration of the PEP.
The latest iteration of the PEP now covers the “each locals() call at function scope returns an independent snapshot” proposal, but implementing that behaviour revealed a subtle problem at the public C API level: because it returns a borrowed reference, PyEval_GetLocals() cannot accommodate the new locals() semantics. It therefore still returns a reference to a shared dynamic snapshot (with all the associated “spooky action at a distance” that Nathaniel pointed out as being problematic back in May), and a new C API function, PyEval_GetPyLocals(), is added to exactly reflect the new locals() builtin behaviour.

The rendered PEP is available on peps.python.org (“PEP 558 – Defined semantics for locals()”), and I’ve also included a pandoc-converted markdown version below (note: the apparently duplicated footnotes arise from the format conversion; they’re multiple references to the same footnote in the original ReST document).

PEP: 558
Title: Defined semantics for locals()
Author: Nick Coghlan
BDFL-Delegate: Nathaniel J. Smith
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-08
Python-Version: 3.9
Post-History: 2017-09-08, 2019-05-22, 2019-05-30, 2019-12-30

Abstract

The semantics of the locals() builtin have historically been
underspecified and hence implementation dependent.

This PEP proposes formally standardising on the behaviour of the CPython
3.8 reference implementation for most execution scopes, with some
adjustments to the behaviour at function scope to make it more
predictable and independent of the presence or absence of tracing
functions.

Rationale

While the precise semantics of the locals() builtin are nominally
undefined, in practice, many Python programs depend on it behaving
exactly as it behaves in CPython (at least when no tracing functions are
installed).

Other implementations such as PyPy are currently replicating that
behaviour, up to and including replication of local variable mutation
bugs that can arise when a trace hook is installed^[1].

While this PEP considers CPython's current behaviour when no trace
hooks are installed to be largely acceptable, it considers the current
behaviour when trace hooks are installed to be problematic, as it causes
bugs like the one reported in ^[2] without even reliably enabling the desired functionality
of allowing debuggers like pdb to mutate local variables^[3].

Review of the initial PEP and the draft implementation then identified
an opportunity for simplification of both the documentation and
implementation of the function level locals() behaviour by updating it
to return an independent snapshot of the function locals and closure
variables on each call, rather than continuing to return the
semi-dynamic snapshot that it has historically returned in CPython.

Proposal

The expected semantics of the locals() builtin change based on the
current execution scope. For this purpose, the defined scopes of
execution are:

- module scope: top-level module code, as well as any other code
  executed using exec() or eval() with a single namespace
- class scope: code in the body of a class statement, as well as any
  other code executed using exec() or eval() with separate local
  and global namespaces
- function scope: code in the body of a def or async def
  statement, or any other construct that creates an optimized code
  block in CPython (e.g. comprehensions, lambda functions)

We also allow interpreters to define two "modes" of execution, with
only the first mode being considered part of the language specification
itself:

- regular operation: the way the interpreter behaves by default
- tracing mode: the way the interpreter behaves when a trace hook has
  been registered in one or more threads via an implementation
  dependent mechanism like sys.settrace (^[4]) in CPython's sys
  module or PyEval_SetTrace (^[5]) in CPython's C API

For regular operation, this PEP proposes elevating most of the current
behaviour of the CPython reference implementation to become part of the
language specification, except that each call to locals() at
function scope will create a new dictionary object, rather than caching
a common dict instance in the frame object that each invocation will
update and return.

For tracing mode, this PEP proposes changes to CPython's behaviour at
function scope that make the locals() builtin semantics identical to
those used in regular operation, while also making the related frame API
semantics clearer and easier for interactive debuggers to rely on.

The proposed tracing mode changes also affect the semantics of frame
object references obtained through other means, such as via a traceback,
or via the sys._getframe() API.

New `locals()` documentation

The heart of this proposal is to revise the documentation for the
locals() builtin to read as follows:

Return a mapping object representing the current local symbol table,
with variable names as the keys, and their currently bound references
as the values.

At module scope, as well as when using exec() or eval() with a
single namespace, this function returns the same namespace as
globals().

At class scope, it returns the namespace that will be passed to the
metaclass constructor.

When using exec() or eval() with separate local and global
namespaces, it returns the local namespace passed in to the function
call.

In all of the above cases, each call to locals() in a given frame of
execution will return the same mapping object. Changes made through
the mapping object returned from locals() will be visible as bound,
rebound, or deleted local variables, and binding, rebinding, or
deleting local variables will immediately affect the contents of the
returned mapping object.

At function scope (including for generators and coroutines), each call
to locals() instead returns a fresh snapshot of the function's
local variables and any nonlocal cell references. In this case,
changes made via the snapshot are not written back to the
corresponding local variables or nonlocal cell references, and
binding, rebinding, or deleting local variables and nonlocal cell
references does not affect the contents of previously created
snapshots.

There would also be a versionchanged note for Python 3.9:

In prior versions, the semantics of mutating the mapping object
returned from locals() were formally undefined. In CPython
specifically, the mapping returned at function scope could be
implicitly refreshed by other operations, such as calling locals()
again, or the interpreter implicitly invoking a Python level trace
function. Obtaining the legacy CPython behaviour now requires explicit
calls to update the originally returned snapshot from a freshly
updated one.

For reference, the current documentation of this builtin reads as
follows:

Update and return a dictionary representing the current local symbol
table. Free variables are returned by locals() when it is called in
function blocks, but not in class blocks.

Note: The contents of this dictionary should not be modified; changes
may not affect the values of local and free variables used by the
interpreter.

(In other words: the status quo is that the semantics and behaviour of
locals() are formally implementation defined, whereas the proposed
state after this PEP is that the only implementation defined behaviour
will be that associated with whether or not the implementation emulates
the CPython frame API, with the behaviour in all other cases being
defined by the language and library references.)

Module scope

At module scope, as well as when using exec() or eval() with a
single namespace, locals() must return the same object as globals(),
which must be the actual execution namespace (available as
inspect.currentframe().f_locals in implementations that provide access
to frame objects).

Variable assignments during subsequent code execution in the same scope
must dynamically change the contents of the returned mapping, and
changes to the returned mapping must change the values bound to local
variable names in the execution environment.

The semantics at module scope are required to be the same in both
tracing mode (if provided by the implementation) and in regular
operation.

To capture this expectation as part of the language specification, the
following paragraph will be added to the documentation for locals():

At module scope, as well as when using exec() or eval() with a
single namespace, this function returns the same namespace as
globals().

This part of the proposal does not require any changes to the reference
implementation - it is standardisation of the current behaviour.
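A minimal sketch of the module scope rule follows. It uses exec() with a single namespace so that the behaviour can be observed from the calling code. The variable names are illustrative only. Current CPython releases already behave this way, which is why this part of the proposal is pure standardisation.

```python
# exec() with a single namespace runs the code with module scope semantics
ns = {}
exec(
    "import inspect\n"
    "same = locals() is globals()\n"                               # one mapping for both
    "frame_same = inspect.currentframe().f_locals is globals()\n"  # and for the frame
    "x = 1\n"
    "locals()['y'] = x + 1\n"                                      # bind a name via the mapping
    "z = y * 10\n",                                                # ordinary code sees the binding
    ns,
)
print(ns["same"], ns["frame_same"], ns["y"], ns["z"])  # True True 2 20
```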

Class scope

At class scope, as well as when using exec() or eval() with separate
global and local namespaces, locals() must return the specified local
namespace (which may be supplied by the metaclass __prepare__ method
in the case of classes). As for module scope, this must be a direct
reference to the actual execution namespace (available as
inspect.currentframe().f_locals in implementations that provide access
to frame objects).

Variable assignments during subsequent code execution in the same scope
must change the contents of the returned mapping, and changes to the
returned mapping must change the values bound to local variable names in
the execution environment.
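When exec() is given separate global and local namespaces, locals() is simply the local mapping that was passed in. Every call returns that same object, and writes through it are visible as name bindings. This is a minimal sketch; g and l are illustrative names.

```python
# exec() with separate namespaces: locals() is the local mapping passed in
g, l = {}, {}
exec(
    "first = locals()\n"
    "second = locals()\n"
    "locals()['bound_via_mapping'] = 42\n"   # mutate through the mapping...
    "seen = bound_via_mapping\n",            # ...and read it back as a name
    g,
    l,
)
print(l["first"] is l, l["second"] is l)  # True True
print(l["seen"], "seen" in g)             # 42 False
```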

The mapping returned by locals() will not be used as the actual
class namespace underlying the defined class (the class creation process
will copy the contents to a fresh dictionary that is only accessible by
going through the class machinery).

For nested classes defined inside a function, any nonlocal cells
referenced from the class scope are not included in the locals()
mapping.

The semantics at class scope are required to be the same in both tracing
mode (if provided by the implementation) and in regular operation.
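The class scope rules above can be sketched with a metaclass whose __prepare__ method records the namespace it hands out. RecordingNamespace, Meta and make_class are hypothetical names. The sketch checks three things: locals() in the class body is that exact mapping, a nonlocal read from the enclosing function does not appear in it, and the finished class was built from a copy.

```python
class RecordingNamespace(dict):
    """Hypothetical namespace type, used only to make object identity visible."""


class Meta(type):
    captured = None  # the namespace most recently returned by __prepare__

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        ns = RecordingNamespace()
        mcls.captured = ns
        return ns


def make_class():
    outer = "closure value"

    class Example(metaclass=Meta):
        seen_outer = outer                          # reads a nonlocal cell
        ns_is_prepared = locals() is Meta.captured  # the __prepare__ mapping itself
        outer_in_locals = "outer" in locals()       # nonlocal cells are not included

    return Example


Example = make_class()
print(Example.ns_is_prepared, Example.outer_in_locals)  # True False

# The class was built from a copy, so later changes to the namespace don't leak in
Meta.captured["late"] = 2
print(hasattr(Example, "late"))  # False
```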

To capture this expectation as part of the language specification, the
following two paragraphs will be added to the documentation for
locals():

When using exec() or eval() with separate local and global
namespaces, [this function] returns the given local namespace.

At class scope, it returns the namespace that will be passed to the
metaclass constructor.

This part of the proposal does not require any changes to the reference
implementation - it is standardisation of the current behaviour.

Function scope

At function scope, interpreter implementations are granted significant
freedom to optimise local variable access, and hence are NOT required to
permit arbitrary modification of local and nonlocal variable bindings
through the mapping returned from locals().

Historically, this leniency has been described in the language
specification with the words "The contents of this dictionary should
not be modified; changes may not affect the values of local and free
variables used by the interpreter."

This PEP proposes to change that text to instead say:

At function scope (including for generators and coroutines), each call
to locals() instead returns a fresh snapshot of the function's
local variables and any nonlocal cell references. In this case,
changes made via the snapshot are not written back to the
corresponding local variables or nonlocal cell references, and
binding, rebinding, or deleting local variables and nonlocal cell
references does not affect the contents of previously created
snapshots.

This part of the proposal does require changes to the CPython
reference implementation, as CPython currently returns a shared mapping
object that may be implicitly refreshed by additional calls to
locals(), and the "write back" strategy currently used to support
namespace changes from trace functions doesn't comply with the new text either (and
causes the quirky behavioural problems mentioned in the Rationale).
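The proposed function scope text can be illustrated with the sketch below. Because this PEP is still a draft, each locals() call is wrapped in dict() so that it yields an independent snapshot whether or not the proposed change is present. With no trace hook installed, the sketch shows three things: writes to a snapshot are not written back, rebinding or deleting a local leaves an earlier snapshot untouched, and a fresh snapshot reflects the current state. demo() is a hypothetical helper.

```python
def demo():
    x = 1
    snap = dict(locals())   # independent snapshot (explicit copy)
    snap["x"] = 99          # writes to the snapshot are not written back...
    observed = x            # ...so the local variable still holds 1
    x = 2                   # rebinding does not touch the earlier snapshot
    del x                   # nor does deleting the local
    later = dict(locals())  # a fresh snapshot reflects the current state
    return snap, observed, later


snap, observed, later = demo()
print(snap, observed, "x" in later)  # {'x': 99} 1 False
```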

CPython Implementation Changes

Resolving the issues with tracing mode behaviour

The current cause of CPython's tracing mode quirks (both the side
effects from simply installing a tracing function and the fact that
writing values back to function locals only works for the specific
function being traced) is the way that locals mutation support for trace
hooks is currently implemented: the PyFrame_LocalsToFast function.

When a trace function is installed, CPython currently does the following
for function frames (those where the code object uses "fast locals"
semantics):

1. Calls PyFrame_FastToLocals to update the dynamic snapshot
2. Calls the trace hook (with tracing of the hook itself disabled)
3. Calls PyFrame_LocalsToFast to capture any changes made to the
   dynamic snapshot

This approach is problematic for a few different reasons:

- Even if the trace function doesn't mutate the snapshot, the final
  step resets any cell references back to the state they were in
  before the trace function was called (this is the root cause of the
  bug report in ^[6])
- If the trace function does mutate the snapshot, but then does
  something that causes the snapshot to be refreshed, those changes
  are lost (this is one aspect of the bug report in ^[7])
- If the trace function attempts to mutate the local variables of a
  frame other than the one being traced (e.g.
  frame.f_back.f_locals), those changes will almost certainly be
  lost (this is another aspect of the bug report in ^[8])
- If a locals() reference is passed to another function, and that
  function mutates the snapshot namespace, then those changes may be
  written back to the execution frame if a trace hook is installed
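The first problem can be reproduced with a toy model of the snapshot/hook/write-back cycle. This is not CPython's code. Plain dicts stand in for the closure cell and the frame's dynamic snapshot, and every function name is hypothetical. The hook only reads, but a concurrent rebinding of the shared cell is still reverted by the final write-back step.

```python
# Toy model (not CPython code) of the pre-PEP 558 write-back cycle that runs
# around every Python level trace hook call on a function frame.
cells = {"counter": 0}  # stands in for a closure cell shared with a nested function
snapshot = {}           # stands in for the frame's dynamic f_locals snapshot


def fast_to_locals():
    # PyFrame_FastToLocals: copy current values into the snapshot
    snapshot.update(cells)


def locals_to_fast():
    # PyFrame_LocalsToFast: copy snapshot values back, changed or not
    cells.update(snapshot)


def call_trace_hook(hook):
    fast_to_locals()
    hook(snapshot)
    locals_to_fast()


def read_only_hook(f_locals):
    # The hook only reads; meanwhile another thread or generator rebinds the cell
    _ = f_locals["counter"]
    cells["counter"] = 1


call_trace_hook(read_only_hook)
print(cells["counter"])  # 0: the concurrent rebinding to 1 was reverted
```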

The proposed resolution to this problem is to take advantage of the fact
that whereas functions typically access their own namespace using the
language defined locals() builtin, trace functions necessarily use the
implementation dependent frame.f_locals interface, as a frame
reference is what gets passed to hook implementations.
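The frame-based access path can be sketched with sys.settrace. In this sketch a debugger-style hook rebinds a local in the very frame it is tracing, working only through frame.f_locals. target and hook are hypothetical names. With no other tracer active, this is expected to print 101 both under the historical mechanism (the change is copied back when the hook returns) and under a write-through proxy.

```python
import sys


def target():
    a = 1
    b = a + 1  # the hook rebinds `a` just before this line runs
    return b


def hook(frame, event, arg):
    # Trace hooks receive a frame, so they see the function's locals via f_locals
    if frame.f_code is target.__code__ and event == "line":
        f_locals = frame.f_locals
        if "a" in f_locals and "b" not in f_locals:
            f_locals["a"] = 100  # request a rebinding from outside the function
    return hook


sys.settrace(hook)
try:
    result = target()
finally:
    sys.settrace(None)
print(result)  # 101
```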

Instead of being a direct reference to the dynamic snapshot returned by
locals(), frame.f_locals will be updated to instead return a
dedicated proxy type (implemented as a private subclass of the existing
types.MappingProxyType) that has two internal attributes not exposed
as part of either the Python or public C API:

- mapping: an implicitly updated snapshot of the function local
  variables and closure references, as well as any arbitrary items
  that have been set via the mapping API, even if they don't have
  storage allocated for them on the underlying frame
- frame: the underlying frame that the snapshot is for

__getitem__ operations on the proxy will read directly from the stored
snapshot.

The stored snapshot is implicitly updated whenever the f_locals attribute
is retrieved from the frame object, and individual keys are also
updated by mutating operations on the proxy itself. This means that if a
reference to the proxy is obtained from within the function, the proxy
won't implicitly pick up name binding operations that take place as the
function executes: the f_locals attribute on the frame will need to
be accessed again in order to trigger a refresh.

__setitem__ and __delitem__ operations on the proxy will affect not
only the dynamic snapshot, but also the corresponding fast local or
cell reference on the underlying frame.

After a frame has finished executing, cell references can still be
updated via the proxy, but the link back to the underlying frame is
explicitly broken to avoid creating a persistent reference cycle that
unexpectedly keeps frames alive.

Other MutableMapping methods will behave as expected for a mapping with
these essential method semantics.
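The proxy semantics described above can be modelled in pure Python. This is a toy, not the proposed implementation. For brevity it subclasses collections.abc.MutableMapping rather than the private types.MappingProxyType subclass the PEP describes. It also omits closure cells and the link-breaking that happens after the frame finishes. ModelFrame and WriteThroughProxyModel are hypothetical names.

```python
from collections.abc import MutableMapping


class ModelFrame:
    """Hypothetical stand-in for an optimised frame: `fast` plays the role of
    the fast locals array (a missing key means unbound) and `varnames` lists
    the names that have storage on the frame."""

    def __init__(self, varnames, **bound):
        self.varnames = tuple(varnames)
        self.fast = dict(bound)


class WriteThroughProxyModel(MutableMapping):
    """Toy model of the proposed frame.f_locals semantics (not CPython code)."""

    def __init__(self, frame):
        self._frame = frame  # link back to the underlying frame
        self._mapping = {}   # stored snapshot, plus any extra keys
        self.refresh()

    def refresh(self):
        # Models the implicit update when frame.f_locals is retrieved
        for name in self._frame.varnames:
            if name in self._frame.fast:
                self._mapping[name] = self._frame.fast[name]
            else:
                self._mapping.pop(name, None)

    def __getitem__(self, key):
        return self._mapping[key]  # reads come from the stored snapshot

    def __setitem__(self, key, value):
        self._mapping[key] = value
        if key in self._frame.varnames:  # write through when storage exists
            self._frame.fast[key] = value

    def __delitem__(self, key):
        del self._mapping[key]
        if key in self._frame.varnames:
            self._frame.fast.pop(key, None)

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)


frame = ModelFrame(["x", "y"], x=1)
proxy = WriteThroughProxyModel(frame)
proxy["y"] = 2        # written through to the frame's storage
proxy["extra"] = 3    # no storage on the frame: kept in the mapping only
frame.fast["x"] = 10  # the running function rebinds x itself
print(proxy["x"])     # 1: the stored snapshot is stale until refreshed
proxy.refresh()       # like accessing frame.f_locals again
print(proxy["x"], frame.fast["y"], "extra" in frame.fast)  # 10 2 False
snapshot = dict(proxy)  # what locals() would hand back at function scope
```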

Making the behaviour at function scope less surprising

The locals() builtin will be made aware of the new fast locals proxy
type, and when it detects it on a frame, will return a fresh snapshot of
the local namespace (i.e. the equivalent of dict(frame.f_locals))
rather than returning the proxy directly.

Changes to the public CPython C API

The existing PyEval_GetLocals() API returns a borrowed reference,
which means it cannot be updated to return the new independent snapshots at
function scope. Instead, it will return a borrowed reference to the
internal mapping maintained by the fast locals proxy. This shared
mapping will behave similarly to the existing shared mapping in Python
3.8 and earlier, but the exact conditions under which it gets refreshed
will be different. Specifically:

- accessing the Python level f_locals frame attribute
- any call to PyFrame_GetPyLocals() or
  PyFrame_GetLocalsAttribute() for the frame
- any call to PyEval_GetLocals(), PyEval_GetPyLocals() or the
  Python locals() builtin while the frame is running

A new PyFrame_GetPyLocals(frame) API will be provided such that
PyFrame_GetPyLocals(PyEval_GetFrame()) directly matches the semantics
of the Python locals() builtin, returning a shallow copy of the
internal mapping at function scope, rather than a direct reference to
it.

A new PyEval_GetPyLocals() API will be provided as a convenience
wrapper for the above operation that is suitable for inclusion in the
stable ABI.

A new PyFrame_GetLocalsAttribute(frame) API will be provided as the C
level equivalent of accessing pyframe.f_locals in Python. Like the
Python level descriptor, the new API will implicitly create the
write-through proxy object for function level frames if it doesn't
already exist, and update the stored mapping to ensure it reflects the
current state of the function local variables and closure references.

The PyFrame_LocalsToFast() function will be changed to always raise
RuntimeError, explaining that it is no longer a supported operation,
and affected code should be updated to use PyFrame_GetPyLocals(frame)
or PyFrame_GetLocalsAttribute(frame) instead.

Additions to the stable ABI

The new PyEval_GetPyLocals() API will be added to the stable ABI. The
other new C API functions will be part of the CPython specific API only.

Design Discussion

Changing `locals()` to return independent snapshots at function scope

The locals() builtin is a required part of the language, and in the
reference implementation it has historically returned a mutable mapping
with the following characteristics:

- each call to locals() returns the same mapping object
- for namespaces where locals() returns a reference to something
  other than the actual local execution namespace, each call to
  locals() updates the mapping object with the current state of the
  local variables and any referenced nonlocal cells
- changes to the returned mapping usually aren't written back to
  the local variable bindings or the nonlocal cell references, but
  write backs can be triggered by doing one of the following:
  - installing a Python level trace hook (write backs then happen
    whenever the trace hook is called)
  - running a function level wildcard import (requires bytecode
    injection in Py3)
  - running an exec statement in the function's scope (Py2 only,
    since exec became an ordinary builtin in Python 3)

Originally this PEP proposed to retain the first two of these
properties, while changing the third in order to address the outright
behaviour bugs that it can cause.

In ^[9] Nathaniel Smith made a persuasive case that we could make the
behaviour of locals() at function scope substantially less confusing
by retaining only the second property and having each call to locals()
at function scope return an independent snapshot of the local
variables and closure references rather than updating an implicitly
shared snapshot.

As this revised design also made the implementation markedly easier to
follow, the PEP was updated to propose this change in behaviour, rather
than retaining the historical shared snapshot.

Keeping `locals()` as a snapshot at function scope

As discussed in ^[10], it would theoretically be possible to change the
semantics of the locals() builtin to return the write-through proxy at
function scope, rather than switching it to return independent
snapshots.

This PEP doesn't (and won't) propose this as it's a backwards
incompatible change in practice, even though code that relies on the
current behaviour is technically operating in an undefined area of the
language specification.

Consider the following code snippet:

def example():
    x = 1
    locals()["x"] = 2
    print(x)

Even with a trace hook installed, that function will consistently print
1 on the current reference interpreter implementation:

>>> example()
1
>>> import sys
>>> def basic_hook(*args):
...     return basic_hook
...
>>> sys.settrace(basic_hook)
>>> example()
1

Similarly, locals() can be passed to the exec() and eval()
builtins at function scope (either explicitly or implicitly) without
risking unexpected rebinding of local variables or closure references.
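With no trace hook installed, the following sketch shows that exec() at function scope rebinds names only in the namespace it is given, whether that namespace is the implicit default or an explicit locals() result. run() is a hypothetical helper.

```python
def run():
    x = 1
    exec("x = 2")                 # implicit locals(): only that mapping changes
    ns = locals()
    exec("x = 3", globals(), ns)  # explicit snapshot: same outcome
    return x, ns["x"]


print(run())  # (1, 3)
```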

Provoking the reference interpreter into incorrectly mutating the local
variable state requires a more complex setup where a nested function
closes over a variable being rebound in the outer function, and due to
the use of either threads, generators, or coroutines, it's possible for
a trace function to start running for the nested function before the
rebinding operation in the outer function, but finish running after the
rebinding operation has taken place (in which case the rebinding will be
reverted, which is the bug reported in ^[11]).

In addition to preserving the de facto semantics which have been in
place since PEP 227 introduced nested scopes in Python 2.1, the other
benefit of restricting the write-through proxy support to the
implementation-defined frame object API is that it means that only
interpreter implementations which emulate the full frame API need to
offer the write-through capability at all, and that JIT-compiled
implementations only need to enable it when a frame introspection API is
invoked, or a trace hook is installed, not whenever locals() is
accessed at function scope.

Returning snapshots from locals() at function scope also means that
static analysis for function level code will be more reliable, as only
access to the frame machinery will allow mutation of local and nonlocal
variables in a way that's hidden from static analysis.

What happens with the default args for `eval()` and `exec()`?

These are formally defined as inheriting globals() and locals() from
the calling scope by default.

There isn't any need for the PEP to change these defaults, so it
doesn't.

However, usage of the C level PyEval_GetLocals() API in the CPython
reference implementation will need to be reviewed to determine which
cases need to be changed to use the new PyEval_GetPyLocals() API
instead.

These changes will also have potential performance implications,
especially for functions with large numbers of local variables (e.g. if
these functions are called in a loop, calling locals() once before the
loop and then passing the namespace into the function explicitly will
give the same semantics and performance characteristics as the status
quo, whereas relying on the implicit default would create a new snapshot
on each iteration).
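The pattern recommended above can be sketched as follows. A single namespace is taken before the loop and passed to eval() explicitly, so no new snapshot is created per iteration. One caveat: the namespace will not see locals bound after it was taken. evaluate_all is a hypothetical helper.

```python
def evaluate_all(exprs):
    a, b = 2, 3
    ns = locals()  # take a single namespace snapshot before the loop
    results = []
    for expr in exprs:
        # Passing the namespace explicitly avoids a new snapshot per iteration
        results.append(eval(expr, globals(), ns))
    return results


print(evaluate_all(["a + b", "a * b", "b - a"]))  # [5, 6, 1]
```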

(Note: the reference implementation draft PR has updated the locals()
and vars() builtins to use PyEval_GetPyLocals(), but has not yet
updated the default local namespace arguments for eval() and
exec()).

Changing the frame API semantics in regular operation

Earlier versions of this PEP proposed having the semantics of the frame
f_locals attribute depend on whether or not a tracing hook was
currently installed - only providing the write-through proxy behaviour
when a tracing hook was active, and otherwise behaving the same as the
historical locals() builtin.

That was adopted as the original design proposal for a couple of key
reasons, one pragmatic and one more philosophical:

- Object allocations and method wrappers aren't free, and tracing
  functions aren't the only operations that access frame locals from
  outside the function. Restricting the changes to tracing mode meant
  that the additional memory and execution time overhead of these
  changes would be as close to zero in regular operation as we can
  possibly make them.
- "Don't change what isn't broken": the current tracing mode
  problems are caused by a requirement that's specific to tracing
  mode (support for external rebinding of function local variable
  references), so it made sense to also restrict any related fixes to
  tracing mode.

However, actually attempting to implement and document that dynamic
approach highlighted the fact that it makes for a really subtle runtime
state dependent behaviour distinction in how frame.f_locals works, and
creates several new edge cases around how f_locals behaves as trace
functions are added and removed.

Accordingly, the design was switched to the current one, where
frame.f_locals is always a write-through proxy, and locals() is
always a snapshot, which is both simpler to implement and easier to
explain.

Regardless of how the CPython reference implementation chooses to handle
this, optimising compilers and interpreters also remain free to impose
additional restrictions on debuggers, such as making local variable
mutation through frame objects an opt-in behaviour that may disable some
optimisations (just as the emulation of CPython's frame API is already
an opt-in flag in some Python implementations).

Historical semantics at function scope

The current semantics of mutating locals() and frame.f_locals in
CPython are rather quirky due to historical implementation details:

- actual execution uses the fast locals array for local variable
  bindings and cell references for nonlocal variables
- there's a PyFrame_FastToLocals operation that populates the
  frame's f_locals attribute based on the current state of the fast
  locals array and any referenced cells. This exists for three
  reasons:
  - allowing trace functions to read the state of local variables
  - allowing traceback processors to read the state of local
    variables
  - allowing locals() to read the state of local variables
- a direct reference to frame.f_locals is returned from locals(),
  so if you hand out multiple concurrent references, then all those
  references will be to the exact same dictionary
- the two common calls to the reverse operation,
  PyFrame_LocalsToFast, were removed in the migration to Python 3:
  exec is no longer a statement (and hence can no longer affect
  function local namespaces), and the compiler now disallows the use
  of from module import * operations at function scope
- however, two obscure calling paths remain: PyFrame_LocalsToFast is
  called as part of returning from a trace function (which allows
  debuggers to make changes to the local variable state), and you can
  also still inject the IMPORT_STAR opcode when creating a function
  directly from a code object rather than via the compiler

This proposal deliberately doesn't formalise these semantics as is,
since they only make sense in terms of the historical evolution of the
language and the reference implementation, rather than being
deliberately designed.

Implementation

The reference implementation update is in development as a draft pull
request on GitHub (^[12]).

Acknowledgements

Thanks to Nathaniel J. Smith for proposing the write-through proxy idea
in ^[13] and pointing out some critical design flaws in earlier
iterations of the PEP that attempted to avoid introducing such a proxy.

References

[1] Broken local variable assignment given threads + trace hook + closure (https://bugs.python.org/issue30744)
[2] Broken local variable assignment given threads + trace hook + closure (https://bugs.python.org/issue30744)
[3] Updating function local variables from pdb is unreliable (https://bugs.python.org/issue9633)
[4] CPython's Python API for installing trace hooks (https://docs.python.org/dev/library/sys.html#sys.settrace)
[5] CPython's C API for installing trace hooks (https://docs.python.org/3/c-api/init.html#c.PyEval_SetTrace)
[6] Broken local variable assignment given threads + trace hook + closure (https://bugs.python.org/issue30744)
[7] Updating function local variables from pdb is unreliable (https://bugs.python.org/issue9633)
[8] Updating function local variables from pdb is unreliable (https://bugs.python.org/issue9633)
[9] Nathaniel's review of possible function level semantics for locals() (https://mail.python.org/pipermail/python-dev/2019-May/157738.html)
[10] Nathaniel's review of possible function level semantics for locals() (https://mail.python.org/pipermail/python-dev/2019-May/157738.html)
[11] Broken local variable assignment given threads + trace hook + closure (https://bugs.python.org/issue30744)
[12] PEP 558 reference implementation (https://github.com/python/cpython/pull/3640/files)
[13] Broken local variable assignment given threads + trace hook + closure (https://bugs.python.org/issue30744)

Copyright

This document has been placed in the public domain.

ncoghlan · January 1, 2020, 3:52pm

When updating the PEP to have locals() return independent snapshots at function scope, I missed a section that mentioned the old behaviour: https://github.com/python/peps/pull/1266/files

steve.dower · January 1, 2020, 3:56pm

Bikeshedding, but can we have the new C API be GetLocalsSnapshot instead of GetPyLocals? Just to save the documentation lookup to figure out what’s so “Py” about it?

ncoghlan · January 1, 2020, 4:12pm

[Edit: after reading the initial version of this, I realised ReadLocalsInto would be a more explicit name than just ReadLocals]

The new C API doesn’t always return a snapshot, though - it’s only a snapshot at function scope, and a direct reference to the locals namespace otherwise. The “GetPyLocals” name is meant to be a mnemonic for “get Python builtin locals() result”.

If we did want to offer a GetLocalsSnapshot style C API (that always copied the state regardless of scope), I wouldn’t actually define it as a function returning a dictionary - instead, I’d suggest we add a PyEval_ReadLocalsInto(target) API that accepted an existing mapping to update, rather than allocating a new one.

It would be kind of nice to have an API like that available at the Python layer as well (maybe as sys.read_locals_into()), so you could easily recreate the Python 3.8 and earlier semantics explicitly:

def f():
    ns = locals() # Make initial snapshot
    for item in iterable:
        sys.read_locals_into(ns) # Update snapshot from current function state

Rather than having to do:

def f():
    ns = locals() # Make initial snapshot
    for item in iterable:
        ns.update(locals()) # Makes a pointless copy under the current PEP 558 proposal

However, it’s also something we’re free to add later if we decide we really want it - the underlying machinery to do it all exists either way, it’s just a question of whether or not to offer a public API to access it.

steve.dower · January 1, 2020, 5:52pm

Eh, this isn’t going to be such a high frequency operation that you’d want to do that. Just return a new dict (other advantage is we can optimise its creation based on knowing the final size, if we want).

As for sometimes returning the “real” locals, that seems unlikely to matter. You’ll need to detect the context when calling it to know what to do with it, and if you know the context you can get a settable version of it some other way. Better not to overload this API like that, so that callers never have to think about it, even if it might be useful sometimes (I’m mostly thinking about debugging and tracing applications here, as I’m not sure what else I’d be using the API for). So I’d still copy the locals into a new dict and keep it that simple.

steve.dower · January 1, 2020, 5:54pm

And your examples aren’t strictly correct in the case where locals are deleted. Just getting a new snapshot each time is significantly easier to get right, and if you want to do something more complicated (such as detecting changes from the last snapshot) then you’ll want the new copy anyway.
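A small pure-Python illustration of the deletion problem: refreshing an existing mapping with update() only adds or overwrites keys, so a name deleted after the first snapshot lingers, while a fresh snapshot reflects the deletion. The function name is made up for the example; the behaviour shown holds both with the old shared-dict locals() and with independent snapshots.

```python
def compare_refresh_strategies():
    x = 1
    y = 2
    ns = dict(locals())     # initial snapshot: {'x': 1, 'y': 2}
    del y
    ns.update(locals())     # update() adds and overwrites keys, never removes them
    fresh = dict(locals())  # a new snapshot only contains names that are still bound
    return ns, fresh


stale, fresh = compare_refresh_strategies()
print("y" in stale)  # True: the deleted name is still present
print("y" in fresh)  # False
```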

ncoghlan · January 2, 2020, 1:42pm

To be clear on your rationale for adding the extra public API, does it run something like this:

the author of Python code knows whether that code is running at function scope or not, so the behavior of locals() is predictable even when it is scope dependent
the author of C code does not know which kind of Python scope is active, so the proposed “GetPyLocals” API is going to be tricky to use correctly in many cases, and those authors would be better served by a pair of APIs where “PyEval_GetLocals()” never makes an independent snapshot, while “PyEval_GetLocalsSnapshot()” always makes one (regardless of scope).

That seems like a pretty solid rationale for the extra API to me.
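To make the first point concrete: at Python level the scope is visible in the source, so code can rely on the difference. In this sketch (names invented for illustration, behaviour as observed in CPython), writing through locals() in a class body lands in the class namespace, while writing to the locals() result inside a function does not rebind the local variable.

```python
class Example:
    # At class scope, locals() is the namespace being populated,
    # so writes through it become class attributes.
    locals()["injected"] = 42


def function_scope():
    x = 1
    # At function scope, locals() gives a snapshot (or, before CPython 3.13,
    # a shared cache dict): mutating it does not rebind the real variable.
    locals()["x"] = 99
    return x


print(Example.injected)  # 42
print(function_scope())  # 1
```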

steve.dower · January 2, 2020, 3:51pm

Yeah, that’s essentially it. The C API can only really be called from within a C function, which means its own frame is never the one you’re reading locals from.

My experience is adding bias here for sure, but I’d only ever expect to see it used in a trace callback (and if not “only”, then certainly as the most common use that could modify it).

But I think YAGNI also applies - your side-by-side comparison of the two functions makes that clearer.

Not so much in “many” cases as in the edge cases. I think the extra work you’d have to do to determine whether you got a modifiable result or not would outweigh the benefit of getting it so easily.

ncoghlan · January 2, 2020, 10:33pm

After thinking on this a bit more, I realised the snapshot API would just be shorthand for:

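PyObject *locals = PyEval_GetLocals(); /* borrowed reference */
if (locals == NULL)
    return NULL;
/* PyDict_Copy() returns a new reference to a shallow copy. It requires an
   actual dict, so a non-dict class namespace would need PyDict_New() plus
   PyDict_Update() instead. */
return PyDict_Copy(locals);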

However, there would also be documentation benefits in having the explicit snapshot API defined:

makes it clearer that the existing PyEval_GetLocals() API does not return independent snapshots
allows the PyEval_GetPyLocals() API to be documented as equivalent to PyEval_GetLocalsSnapshot() at function/coroutine/generator scope, and PyEval_GetLocals() everywhere else.

encukou · January 3, 2020, 10:21am

Another bit of bikeshedding: the name PyFrame_GetLocalsAttribute seems to be based on the corresponding Python API (the f_locals attribute). Consider naming the operation itself instead, e.g. PyFrame_GetMutableLocals/PyFrame_GetLocalsProxy.

ncoghlan · January 5, 2020, 12:44pm

OK, based on the discussion so far, it seems folks are generally OK with the proposed Python level behaviour for both locals() and the frame API, but there are several concerns around making sure that the C API is both clear and usable.

On that front, I’m wondering if the following might be clearer than what is currently in the PEP.

New stable C API/ABI functions

PyObject * PyLocals_Get(): Returns a new reference to the target namespace mapping at module and class scope, and when using exec() or eval(). Returns a new namespace snapshot at function/coroutine/generator scope. Directly equivalent to the Python locals() builtin.
PyObject * PyLocals_GetView(): Returns a new read-only mapping proxy instance for the current locals namespace. Updated for all local variable changes at module and class scope, and when using exec() or eval(). Updated at implementation-dependent times at function/coroutine/generator scope (accessing any of the PyLocals_Get* APIs or the existing PyEval_GetLocals() API will always force an update).
PyObject * PyLocals_GetSnapshot(): Returns a new dict instance populated from the current locals namespace. Roughly equivalent to dict(locals()) in Python code, but avoids the double copy in the case where locals() already returns a snapshot.
bool PyLocals_IsSnapshot(): Returns true if PyLocals_Get() in the currently running scope will return an independent snapshot on each call, and false if it returns a direct reference to the target namespace.
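One way to see how the four proposed functions relate is a toy Python model of a single running scope. Everything here is hypothetical (ToyScope and its method names are invented for illustration, and this is not how CPython implements anything), and it simplifies the view: the proposal only promises view updates at implementation-dependent times for optimized scopes, whereas this toy view is always current.

```python
from types import MappingProxyType


class ToyScope:
    """Hypothetical model of one running scope, illustrating the proposed
    PyLocals_* semantics only."""

    def __init__(self, namespace, optimized):
        self.namespace = namespace  # the real namespace (or the fast locals state)
        self.optimized = optimized  # True at function/coroutine/generator scope

    def get(self):
        # Models PyLocals_Get(): like locals(), a fresh copy at optimized
        # scopes and the namespace itself everywhere else.
        return dict(self.namespace) if self.optimized else self.namespace

    def get_view(self):
        # Models PyLocals_GetView(): read-only access to the namespace.
        return MappingProxyType(self.namespace)

    def get_snapshot(self):
        # Models PyLocals_GetSnapshot(): always a new, independent dict.
        return dict(self.namespace)

    def is_snapshot(self):
        # Models PyLocals_IsSnapshot().
        return self.optimized


module_scope = ToyScope({"a": 1}, optimized=False)
function_scope = ToyScope({"a": 1}, optimized=True)

print(module_scope.get() is module_scope.namespace)      # True
print(function_scope.get() is function_scope.namespace)  # False
print(function_scope.is_snapshot())                      # True
```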

Documentation updates for existing stable C API/ABI functions

PyObject * PyEval_GetLocals(): keeps existing behaviour in CPython (mutable locals at class and module scope, shared dynamic snapshot otherwise). However, update the documentation to note that the conditions under which the shared dynamic snapshot gets updated have changed.

Also update the documentation to recommend replacing usage of this API with whichever of the new APIs is most appropriate for the use case:

Use PyLocals_Get() to exactly match the semantics of the Python level locals() builtin.
Use PyLocals_GetView() for read-only access to the current locals namespace.
Use PyLocals_GetSnapshot() for a regular mutable dict that contains a copy of the current locals namespace, but has no ongoing connection to the active frame.
Query PyLocals_IsSnapshot() explicitly to implement custom handling (e.g. raising a meaningful exception) for scopes where PyLocals_Get() would return a snapshot rather than granting read/write access to the target namespace
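A Python-level analogue of that last recommendation: a helper that needs write access to its caller's namespace can check whether the calling frame is optimized and raise a clear error instead of silently writing into a snapshot. This is a sketch under assumptions: set_in_caller_namespace() is a made-up name, and it uses the CPython-specific sys._getframe() plus the CO_OPTIMIZED code flag (exposed as inspect.CO_OPTIMIZED) as a stand-in for PyLocals_IsSnapshot().

```python
import inspect
import sys


def set_in_caller_namespace(name, value):
    """Hypothetical helper: bind *name* in the calling scope's namespace."""
    frame = sys._getframe(1)
    # Optimized frames (functions, coroutines, generators) are the scopes
    # where locals() would hand back a snapshot rather than the namespace.
    if frame.f_code.co_flags & inspect.CO_OPTIMIZED:
        raise RuntimeError(
            f"cannot bind {name!r}: caller is running at function scope"
        )
    # Module and class scopes: f_locals is the real namespace mapping.
    frame.f_locals[name] = value


class Config:
    set_in_caller_namespace("answer", 42)


def inside_function():
    try:
        set_in_caller_namespace("answer", 42)
    except RuntimeError as exc:
        return str(exc)
    return "bound"


print(Config.answer)      # 42
print(inside_function())  # cannot bind 'answer': caller is running at function scope
```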

New additions to the CPython frame API

The additions to the CPython frame API would then mostly be those that CPython required to actually implement the proposed new PyLocals_* API and to keep PyEval_GetLocals() working after the fast locals proxy object is introduced:

PyObject * PyFrame_GetLocals(frame): Returns a new reference to the target namespace mapping at module and class scope, and when using exec() or eval(). Returns a new namespace snapshot at function/coroutine/generator scope. Underlying API supporting PyLocals_Get().
PyObject * PyFrame_GetLocalsView(frame): Returns a new read-only mapping proxy instance for the frame’s local namespace. Updated for all local variable changes at module and class scope, and when using exec() or eval(). Updated at implementation-dependent times at function/coroutine/generator scope (if the frame is still running, accessing any of the PyFrame_GetLocals* APIs or the _PyFrame_BorrowLocals() API will always force a view update, as will an explicit call to PyFrame_RefreshLocalsView()). Underlying API supporting PyLocals_GetView().
PyObject * PyFrame_GetLocalsSnapshot(frame): Returns a new dict instance populated from the current locals namespace. Underlying API supporting PyLocals_GetSnapshot().
bool PyFrame_LocalsIsSnapshot(frame): returns true if PyFrame_GetLocals(frame) for the given frame will return an independent snapshot on each call, false if it returns a direct reference to the target namespace. Underlying API supporting PyLocals_IsSnapshot().
PyObject * _PyFrame_BorrowLocals(frame): Underlying API supporting PyEval_GetLocals(). Underscore prefix to discourage use and because code using it is unlikely to be portable across implementations, but documented and visible to the linker because the dynamic snapshot stored inside the write-through proxy is otherwise completely inaccessible from C code (in the draft reference implementation, the struct definition for the fast locals proxy itself is deliberately kept private to the frame implementation, so not even the rest of CPython can see it).

However, there would also be one new frame-only API addition to make it easy to force snapshot refreshes when accessing the f_locals attribute on the frame struct directly:

int PyFrame_RefreshLocalsView(frame): force an update of any selectively updated views previously returned by PyFrame_GetLocalsView(frame). Currently also needed in CPython when accessing the f_locals attribute directly and it is not a plain dict instance (otherwise it may report stale information).

(that last API would replace the PyFrame_GetLocalsAttribute proposal in the PEP, as the attribute retrieval API mainly exists in the current PEP to offer an easy way to force a refresh of the view inside the fast locals proxy instance, similar to the way that accessing frame.f_locals refreshes it in CPython)

ncoghlan · January 5, 2020, 1:03pm

Note: after reviewing the API proposal in the previous post, I noticed it was missing an obvious way to force view refreshes when accessing the C level f_locals attribute directly, so I added the PyFrame_RefreshLocalsView(frame) API.

steve.dower · January 6, 2020, 2:30pm

Those sound good to me. And yes, I’m happy with the Python semantics.

ncoghlan · February 2, 2020, 1:36pm

I finally made it back to implementing the redesigned API, and discovered that while we’ve been using <stdbool.h> for a while in several implementation modules, we’ve never used bool instances as part of the public API.

Rather than pushing to change that, I think I’ll just stick with the Py_IsInitialized convention and return an int from PyLocals_IsSnapshot() and PyFrame_LocalsIsSnapshot().

I’m also thinking I should change the name of the latter to PyFrame_GetLocalsReturnsSnapshot(). The issue I see with my original name is that while the PyLocals_Get()/PyLocals_IsSnapshot() pairing is pretty unambiguous, PyFrame_LocalsIsSnapshot() looks like it is referring to the f->f_locals field, not the result of calling PyFrame_GetLocals().

ncoghlan · February 2, 2020, 2:00pm

I updated the PEP reference implementation to use the revised API design: https://github.com/python/cpython/pull/3640

It’s pretty close to the version I posted above, with just the two changes I mentioned (returning ints from the snapshot checking functions, and using the less ambiguous name PyFrame_GetLocalsReturnsSnapshot()).

Since they’re a possible return type from public API functions, the PR also exposes _PyFastLocalsProxy_Type and _PyFastLocalsProxy_CheckExact as part of the full CPython C API.

ncoghlan · February 2, 2020, 2:01pm

Next step will be to update the PEP itself to propose this version of the C API - I currently expect to get to that some time next weekend (Feb 8th/9th).

steve.dower · February 2, 2020, 8:24pm

This name makes it sound like “returns” is a noun. “Get locals as snapshot” would be better if you don’t want it to be “get locals snapshot”.

ncoghlan · February 7, 2020, 4:11pm

The function to force a snapshot is PyFrame_GetLocalsSnapshot(f).

The naming issue is with the query API that lets you find out whether PyFrame_GetLocals() (the one that matches the Python locals() builtin) will return a snapshot or not.

ncoghlan · February 8, 2020, 11:26am

In going to update the PEP, I became less and less happy with using the term “snapshot” as a shorthand for “shallow copy”, especially as the related dict API is PyDict_Copy(), not PyDict_Snapshot().

With that substitution made, and the query API tweaked again, we would end up with the following API:

// Stable API/ABI
PyObject *PyLocals_Get(void);
int PyLocals_GetReturnsCopy(void); // 0 == False, non-zero == True
PyObject *PyLocals_GetCopy(void);
PyObject *PyLocals_GetView(void);

// Full frame API
PyObject *PyFrame_GetLocals(PyFrameObject *f);
int PyFrame_GetLocalsReturnsCopy(PyFrameObject *f); // 0 == False, non-zero == True
PyObject *PyFrame_GetLocalsCopy(PyFrameObject *f);
PyObject *PyFrame_GetLocalsView(PyFrameObject *f);
PyObject *_PyFrame_BorrowLocals(PyFrameObject *f);
int PyFrame_RefreshLocalsViews(PyFrameObject *f); // 0 == operation OK, non-zero == error

The above also pluralizes “refresh locals() views”, as I’m expecting each PyFrame_GetLocalsView(f) call to produce a new mapping proxy instance (just like class __dict__ descriptors), with only the underlying data store being shared between the views.
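The class __dict__ precedent is easy to observe from Python: in CPython, each access to a class’s __dict__ builds a new read-only mappingproxy, and all of them read through to the same underlying dict. The snippet only demonstrates that existing behaviour (the class name Sample is invented); whether PyFrame_GetLocalsView(f) would behave the same way is the proposal above, not something this code shows.

```python
from types import MappingProxyType


class Sample:
    attr = 1


first = Sample.__dict__
second = Sample.__dict__

print(type(first) is MappingProxyType)  # True
print(first is second)                  # False: a new proxy per access

Sample.added = 2  # changes the shared underlying class dict
print(first["added"], second["added"])  # 2 2

try:
    first["attr"] = 3
except TypeError:
    print("views are read-only")
```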

ncoghlan · February 8, 2020, 2:30pm

WIP draft of the PEP updates is at https://github.com/python/peps/pull/1302

This is the first edit covering the revised C API discussed above.

I still need to do a top-to-bottom proof-read to ensure everything is self-consistent.