That is indeed the typical case for cached_property in the many usages I’ve seen.
There isn’t. The worst-case scenario for the lock-free version is that the cached_property getter function runs more than once for the same instance, and the latest one “wins” and gets to insert its result value in the instance dictionary.
You should be able to make it so the latest one loses, though. The work will still be done twice, which is a change from the existing behaviour, but there doesn’t have to be two different observable values for the property. (Only a gut feel, but I suspect the latter would be more disruptive.)
If setdefault isn’t sufficient, I wouldn’t be opposed to a _PyDict_CompareAndSetItem API (public and/or Python API left for future work).
This is a good point. setdefault() should be all that’s needed to do this. Outside of a race it will make no difference at all. In a race it avoids ever having two different values for the cached property visible, which is a nice property.
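For illustration, a minimal sketch of what the lock-free getter could look like with `setdefault` (the class name and the trimming of error handling are mine; this is not the actual stdlib implementation):

```python
_NOT_FOUND = object()

class cached_property_setdefault:
    """Sketch: lock-free cached_property where the first stored value wins."""
    def __init__(self, func):
        self.func = func
        self.attrname = None
        self.__doc__ = func.__doc__

    def __set_name__(self, owner, name):
        if self.attrname is None:
            self.attrname = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__dict__
        val = cache.get(self.attrname, _NOT_FOUND)
        if val is _NOT_FOUND:
            computed = self.func(instance)
            # setdefault only stores if no value is present yet, so a thread
            # that loses the race discards its result and returns the winner's.
            val = cache.setdefault(self.attrname, computed)
        return val
```

In a race the work may still be done twice, but all threads observe the same value.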
Although it avoids having two values visible, it does mean that in the likely common case where the values differ because some other attribute on the instance changed, you permanently get the out-of-date cached value instead of the more up-to-date one. This isn’t wrong behavior, since it was only an accident of timing that prevented you from having the out-of-date value cached anyway, but it does make me slightly nervous to diverge from the behavior of the existing widely-used implementations like this.
Yes, I agree it would be nice to address these limitations. I think they are orthogonal to the locking question at hand in this thread, though, so I won’t go into them further here.
Thanks for highlighting this! If we go with option 1, I will reach out to Kombu (and any other users I can find via code search) to discuss the impending change with them directly.
I would consider this rather a variant of option 1, since it shares the key feature that functools.cached_property becomes lock-free by default without a deprecation process.
If we want to do option 1, but also keep a locking version available in the stdlib, I think it’s slightly nicer to keep it under a new name (locking_cached_property) rather than using a kwarg.
Given that evidence suggests most users don’t need locking, and it has overhead, even presuming someone implements per-instance locking I don’t see why we would ever go through the disruption of a deprecation process to change the default to one that is probably less desirable for most users.
Option 5: Shared lock, but only lock during cache management. Release the lock during the computation. You might say this is really option 3 in disguise, but it’s a bit leaner.
Here’s a shot at it. I think it deadlocks on recursion, but then I’m not sure what the expected behaviour, if any, is in that case.
Edit: On closer examination, this approach doesn’t work, disregard.
from threading import Condition, Lock

_NOT_FOUND = object()
_PENDING = object()

class cached_property:
    def __init__(self, func):
        self.func = func
        self.attrname = None
        self.__doc__ = func.__doc__
        self.lock = Lock()
        self.cond = Condition(self.lock)

    def __set_name__(self, owner, name):
        if self.attrname is None:
            self.attrname = name
        elif name != self.attrname:
            raise TypeError(
                "Cannot assign the same cached_property to two different names "
                f"({self.attrname!r} and {name!r})."
            )

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.attrname is None:
            raise TypeError(
                "Cannot use cached_property instance without calling __set_name__ on it.")
        try:
            cache = instance.__dict__
        except AttributeError:  # not all objects have __dict__ (e.g. class defines slots)
            msg = (
                f"No '__dict__' attribute on {type(instance).__name__!r} "
                f"instance to cache {self.attrname!r} property."
            )
            raise TypeError(msg) from None
        val = cache.get(self.attrname, _NOT_FOUND)
        if val is _NOT_FOUND or val is _PENDING:
            while True:
                with self.cond:
                    # check if another thread filled cache while we awaited lock
                    val = cache.get(self.attrname, _NOT_FOUND)
                    if val is _NOT_FOUND:
                        try:
                            cache[self.attrname] = _PENDING
                        except TypeError:
                            msg = (
                                f"The '__dict__' attribute on {type(instance).__name__!r} instance "
                                f"does not support item assignment for caching {self.attrname!r} property."
                            )
                            raise TypeError(msg) from None
                        break
                    elif val is _PENDING:
                        self.cond.wait()
                    else:
                        return val
            val = self.func(instance)
            with self.cond:
                cache[self.attrname] = val
                self.cond.notify_all()
        return val
The current (and expected) behavior in case of recursion is RecursionError, unless there is some recursion-breaker built into the getter function, in which case it should work. I don’t think deadlock is acceptable.
Agreed that this might be a lower-cost variant of option 3 that would work fine for some users, but I’m not confident that it still wouldn’t introduce problematic levels of lock contention for some users with very high numbers of instances of a single type and high numbers of threads.
The difficulty of evaluating such tradeoffs in the abstract is IMO an argument for why locking belongs in the user domain, not in cached_property itself.
That potential lock contention is already there in the current implementation. I admit it may be slightly aggravated by having two lock acquisitions instead of one. But it’s not that big an issue as long as it’s only on the first access: if you weren’t intending to access it repeatedly, you wouldn’t make it a cached_property to begin with. Avoiding repeated computation is the important thing.
You can’t lock in the user domain unless you are willing to lock on every access, which cached_property currently manages to avoid. It seems an efficient threadsafe version would need a completely separate implementation.
RecursionError is a more user friendly outcome than deadlock, and I agree that a shared lock solution should issue the exception. Calling it expected behaviour is a stretch, though. It’s not documented and it doesn’t look like there’s a test for it.
I’m hoping for a threadsafe outcome, because that would be very useful: With that you can make effectively immutable objects with parts that are lazily computed. Without thread safety, that abstraction is too leaky.
For sure, but that’s the problem we have to solve, and I’m not sure whether any solution that still shares a lock across all instances solves it fully.
I’m confused by this. Nothing is stopping a user getter function decorated by @cached_property from doing whatever form of locking it wants to do internally (it can also re-check the instance dict within the critical section.) This doesn’t “lock on every access” – once the dict is populated with a value, all future accesses are normal lock-free attribute access. It only locks on every access that hits the descriptor getter (i.e. until the first access finishes up and the dict gets populated), which is the same behavior any locking solution must have.
Too leaky for whom? Lots of people have been using non-threadsafe cached_property to make “effectively immutable objects with parts that are lazily computed” for 15+ years by now. If you aren’t using threads, there is no problem. Even if you are using threads, if you occasionally get a race where the computation happens twice for one instance, for an idempotent cached property of an object that is “effectively immutable” this is not a problem either, unless both the computation is very expensive and this race condition happens frequently. If you are in this situation, you can add locking in your getter.
For many years already and still today, very widely used frameworks like Django and Pyramid have provided non-thread-safe cached_property, and these are the most widely used cached_property variant. So the argument that non-thread-safe cached_property is inadequate has a lot of empirical counter-evidence to contend with.
Thanks, I’m doing the same thing in my local in-memory cache library, and I finally decided to add a lock option to let users decide how to use it. After thinking about it carefully, I admit it’s hard to make it perfectly thread-safe and efficient. There are always corner cases. But simple setdefault and threading.Event should work in most cases.
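For reference, here is one hypothetical sketch of the setdefault-plus-threading.Event approach mentioned above (the class name is mine, and this assumes cached values are never themselves Event instances; it can still deadlock on recursion, like any wait-for-the-winner scheme):

```python
import threading

_NOT_FOUND = object()

class event_cached_property:
    """Sketch: first thread installs an Event and computes; others wait on it."""
    def __init__(self, func):
        self.func = func
        self.attrname = None

    def __set_name__(self, owner, name):
        if self.attrname is None:
            self.attrname = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__dict__
        val = cache.get(self.attrname, _NOT_FOUND)
        if val is _NOT_FOUND:
            event = threading.Event()
            placed = cache.setdefault(self.attrname, event)
            if placed is event:
                # We won the race: compute, publish the value, wake the waiters.
                result = self.func(instance)
                cache[self.attrname] = result
                event.set()
                return result
            if isinstance(placed, threading.Event):
                # Another thread is computing; wait for it to publish.
                placed.wait()
                return cache[self.attrname]
            return placed  # a value was already cached
        if isinstance(val, threading.Event):
            val.wait()
            return cache[self.attrname]
        return val
```

Unlike the bare setdefault version, this guarantees single execution of the getter, at the cost of blocking the losing threads.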
Me. I only just discovered cached_property by reading this thread, which was a pleasant discovery because I often write code that could benefit from it. But if it’s not thread safe, then it’s off the table.
I feel like a kid that got a present only to have it yanked away the next second, if all locking is removed.
I know, that’s the kind of trade-off I also make from time to time. But I would like to avoid it: It’s kind of a trap to lay for a future maintainer, who may unwittingly change something to rely on it only getting executed once, and then there’s a heisenbug. Having an efficient caching mechanism that guarantees single execution would be a major plus.
Example:
class C:
    def __init__(self):
        self.lock = threading.Lock()

    def use_p(self):
        with self.lock:
            print(self.p)

    @cached_property
    def p(self):
        return expensive_computation()
How do you propose to rewrite this so that it only locks on the first use_p call? While still using cached_property?
class C:
    def __init__(self):
        self.lock = threading.Lock()

    def use_p(self):
        print(self.p)

    @cached_property
    def p(self):
        with self.lock:
            if "p" not in self.__dict__:
                return expensive_computation()
            return self.p
That said, I agree that if we can get efficient per-object locking, that’s better.
That’s sad! I don’t want to take presents away from anyone.
But (assuming you don’t want locking across the entire class), the present you are looking for doesn’t exist in the standard library today. I think it would be great if it existed for your use as functools.locking_cached_property, if we agree on a good implementation of it. (And I look forward to seeing your PR!)
But I agree with most commenters in this thread that it is better for most users if functools.cached_property is the simple, lowest-possible-overhead, lock-free cached_property that large swathes of the Python community have been happily using for many years.