I think it’s worth remembering that the OP was interested in static analysis, rather than optimization.
Although “purity” was used as a term, it was imprecise and didn’t match their intent.
I don’t want to shut down discussion of optimization, but I think some of the original questions aren’t addressed.
Preventing Mutation of Inputs
OP showed a case of wanting to forbid a function or any function it calls from appending to a list.
Typing already provides protocols which allow for a limited version of this analysis.
If you use Sequence[int] rather than list[int], you declare that you will not call mutating methods.
The effect of using a restrictive protocol is viral within a codebase. Nothing in a chain of calls can treat a Sequence as a list so long as you avoid using ignores or casts to violate the basic contracts your types declare. And this does not introduce function coloring problems.
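As a rough sketch of how this plays out (the names `total` and `record` are made up for illustration), a type checker such as mypy or pyright would be expected to reject both the direct mutation and the attempt to hand a `Sequence` to something that wants a `list`. Nothing is enforced at runtime, which is why the offending lines are left as comments.

```python
from collections.abc import Sequence


def record(values: list[int]) -> None:
    # Hypothetical helper that mutates its argument.
    values.append(0)


def total(values: Sequence[int]) -> int:
    # values.append(0)  # a type checker should flag this: Sequence declares no append
    # record(values)    # also flagged: Sequence[int] is not accepted where list[int] is required
    return sum(values)


print(total([1, 2, 3]))  # a list is accepted, since list is a Sequence
print(total((1, 2, 3)))  # so is a tuple
```

The second commented-out line is the "viral" part: once `total` has promised `Sequence`, it cannot quietly pass the value on to a mutating callee.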
The built-in suite of protocols covers many container types, and you can write your own protocols for bespoke cases.
Protocols are just naming conventions, but that’s typically all we really need.
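For a bespoke case, something like the following might do. It is only a sketch: `ReadableCounter`, `Counter`, and `report` are hypothetical names, and the idea is simply that the protocol declares the non-mutating method and nothing else.

```python
from typing import Protocol


class ReadableCounter(Protocol):
    # Hypothetical read-only view: only the non-mutating method is declared.
    def current(self) -> int: ...


class Counter:
    def __init__(self) -> None:
        self._n = 0

    def current(self) -> int:
        return self._n

    def increment(self) -> None:
        self._n += 1


def report(counter: ReadableCounter) -> str:
    # counter.increment()  # a type checker should reject this: not part of ReadableCounter
    return f"count={counter.current()}"


c = Counter()
c.increment()
print(report(c))
```

`Counter` never mentions `ReadableCounter`; it satisfies the protocol structurally, so existing classes can be given a read-only view without being modified.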
Preventing Side Effects
Trying to prevent side effects requires a definition of “side effects we care about”. Examples of typically non-semantic side effects are too numerous to count, but include:
- importing a module for the first time writes to sys.modules and may create pyc files
- calling a function repeatedly may change its bytecode as the interpreter adapts
- memory utilization of the current process changes if ephemeral objects are created and disposed, as does GC info
- a function with a cache gets faster (!)
- time passes
Consider whether or not you care about the side effects here:
import logging
log = logging.getLogger(__name__)

def add(x, y):
    if __debug__:
        log.debug("adding")
    return x + y
There’s no question that emitting logs is a side effect. But for many applications it is not useful to consider it one.
This goes to the point raised earlier about cache_info(). Yes, it exists and violates any reasonable definition of function purity. But it easily falls into the category of “side effects which I typically don’t care about”. That might not be a useful definition for tool-driven analysis, but it’s a very useful one for reading code. I’m not sure what kinds of formalisms exist for describing such things.
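To make the cache_info() point concrete, here is a small sketch using `functools.lru_cache` (the function `square` is just an illustrative stand-in). The return value is the same on every call, but the cache statistics change, which is exactly the kind of side effect most readers would choose to ignore.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def square(x: int) -> int:
    # Hypothetical function; the result depends only on x.
    return x * x


before = square.cache_info()
first = square(4)
second = square(4)
after = square.cache_info()

print(first == second)          # the return value is stable across calls
print(before.hits, after.hits)  # the cache statistics are not
```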
For the sort of work I do, not being able to log would be problematic. So being able to support the notion that “logging is exempt from side effect analysis” is probably necessary for me.
Side Effects and Observability
Suppose I write to a module scoped global variable.
Simple example:
# mymod.py
_CACHE = {}
# uses _CACHE
def myfunc(...): ...
Whether or not there are meaningful side effects depends a bit on where you try to observe from. Within mymod.py, there’s a relatively obvious side effect. But from outside of it, if your code’s contract forbids direct use of _CACHE, there isn’t meaningfully any side effect.
On the flip side, code which itself doesn’t have any side effects can observe impact from side effects elsewhere in the program. e.g.,
_mylist = [1]

def get_list():
    return _mylist

def the_sum():
    return sum(_mylist)
Because get_list always returns the same list object, and lists are mutable, any caller can change what the_sum returns by mutating that list.
The code above doesn’t have side effects, but its impurity is highly impactful.
Also, these examples use modules, but the same reasoning holds for other containers of state. The analogy to a class or an instance is straightforward.
In summary, purity may be only an element of the original post. OP’s needs/desires are legitimate, but are centered around contracts (Protocols) and mutability.