Sometimes it is quite useful to attach your own attribute to a “foreign” object, i.e. an object not created/controlled by the application itself. It’s not a very clean style, but practical and often the most readable solution by far.
The main problem is that the attribute name must not be used by anybody else. Adding attributes with general names like .name or .extra is a bad idea. Even using ‘unlikely’ names cannot rule out a name clash completely. A change in a future library version or another combination of libraries might create a problem.
My proposal is to create a naming convention. Just a documented recommendation, no language change.
To start the discussion, I was thinking about reserving a unique prefix:
x__<packagename>_varname # two underscores after the x
X__<packagename>_CONSTNAME
where <packagename> is equal to the __package__ string. My working name for this is “xunder”.
The rules are simple and predictable:
A class definition should avoid using names starting with x__ (also in upper case).
If an application adds its attribute to a foreign object, it should follow the name format shown above.
Notes
Inspired by the X- prefix rule used in email header names for decades
One underscore is not enough. Names like .x_offset, .y_offset are common when working with coordinates.
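To make the format concrete, here is a minimal sketch of what following the convention could look like. The helper xunder_name, the ForeignThing class and the package name "myapp" are made up for illustration; replacing dots in a dotted package name is my own assumption, the proposal does not say how to handle them.

```python
def xunder_name(package: str, varname: str) -> str:
    """Build an attribute name in the proposed x__<packagename>_varname format."""
    # Dots from a dotted package name are not valid inside an identifier.
    # Replacing them with underscores is an assumption, not part of the proposal.
    return f"x__{package.replace('.', '_')}_{varname}"


class ForeignThing:
    """Stand-in for an object created by some other library (hypothetical)."""


PACKAGE = "myapp"  # real code would use the module's __package__ string

thing = ForeignThing()
attr = xunder_name(PACKAGE, "request_id")
setattr(thing, attr, 42)

print(attr)
print(thing.x__myapp_request_id)
```

In day-to-day code you would probably just write thing.x__myapp_request_id = 42 directly; the helper only shows how the name is put together.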
Personally I’d only ever do this with particular foreign objects, such as functions or instances for class C from library xyz. In those cases you can check what names are already in use.
If you insist on doing this with a generic foreign object, I think you can use double-leading-underscore fairly safely. When a class body uses a double-leading-underscore name, the name gets “mangled” (technical term) to include the class name, whereas a name you set from outside any class body is stored exactly as written. If you name your attribute __{packagename}_{varname} (or some variation thereof), you’re only going to get conflicts if another library decides to use the same trick you do, and (for reasons unknown) decides to use your library name as part of their attribute naming scheme.
I don’t see how you’re going to get more safety than that. Even if it’s documented they shouldn’t, another library could easily create a class that does use attribute name x__{your packagename}_{some varname}. But if you use double leading underscore, you can only get into conflict with other libraries that use the same obscure trick, and use your library name as part of their naming scheme (for reasons unknown).
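A small sketch of why the two names cannot collide. Foreign and the "mylib" name are hypothetical stand-ins, and the code has to run at module level (outside any class body) for the second assignment to stay unmangled.

```python
class Foreign:
    """Stand-in for a class from some third-party library (hypothetical)."""

    def __init__(self):
        # Inside a class body this name is mangled to _Foreign__mylib_tag.
        self.__mylib_tag = "owned by Foreign"


obj = Foreign()

# Outside any class body no mangling happens, so this is stored
# literally as "__mylib_tag" and does not touch the class's own attribute.
obj.__mylib_tag = "owned by mylib"

for name, value in sorted(vars(obj).items()):
    print(name, "=", value)
```

Both values survive side by side in the instance dict, under different keys.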
attrs puts the __attrs_attrs__ attribute onto classes it decorates, and I’ve never heard of that causing problems.
That’s a rule you can turn off, no?
I’m imagining you’re using this trick in an isolated library, or in just a few files. In the first case you can add the exemption to the pyproject.toml/ruff.toml/equivalent, in the latter case you can put #noqa {linter}: W0212 or something in that direction at the top of the files.
Frankly you are doing something naughty, so it is appropriate that the linter warns you. But that doesn’t mean you definitely shouldn’t do it, just that you should think carefully whether this is something you really want to do, and whether there is no better alternative.
Something I discovered through the pandas code base is their accessor system. It’s a pattern that allows 3rd parties to register new namespaces on top of DataFrames/Series/Indexes, in the form of a class. Each instance of the accessor class receives, in its __init__, the object through which the namespace was accessed.
It’s a useful little pattern for things like this. Give the accessor namespace a meaningful name and bury the extra attributes/methods on it.
I’ve used this pattern to reduce circular dependencies in a code base. These were cases where module Y depended on module X. Certain classes in module X had accessors registered upon import of module Y, making for a nicer API.
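Here is a rough, pure-Python sketch of the accessor idea, so it can be tried without pandas installed. It is not how pandas implements it; register_accessor, Table and StatsAccessor are names I made up. In pandas itself the equivalent entry point is pandas.api.extensions.register_dataframe_accessor.

```python
class _AccessorDescriptor:
    """Builds the accessor class around whichever instance it is accessed on."""

    def __init__(self, accessor_cls):
        self._accessor_cls = accessor_cls

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: hand back the accessor class.
            return self._accessor_cls
        return self._accessor_cls(obj)


def register_accessor(target_cls, name):
    """Class decorator that registers an accessor namespace on target_cls."""

    def decorator(accessor_cls):
        if hasattr(target_cls, name):
            raise AttributeError(f"{target_cls.__name__}.{name} is already taken")
        setattr(target_cls, name, _AccessorDescriptor(accessor_cls))
        return accessor_cls

    return decorator


class Table:
    """Stand-in for a class owned by another library (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows


@register_accessor(Table, "stats")
class StatsAccessor:
    def __init__(self, table):
        # The object the namespace was accessed through arrives here.
        self._table = table

    def row_count(self):
        return len(self._table.rows)


print(Table([10, 20, 30]).stats.row_count())
```

Note that this sketch creates a fresh accessor instance on every access, so the accessor is a place for methods and derived views rather than for per-instance state.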
There is actually an interesting point in this proposal that would imply a language change, or at least a change for the better in the ecosystem.
The matter is that attaching “3rd party attributes” to objects, while allowed by the language semantics, is all but forbidden by the static type-checking mechanisms. A lot of workarounds, often involving casting and protocols, are needed so that type-checked code bases don’t error on attaching a new attribute to an existing instance.
Such a proposal, while a convention for coding (which I find convenient), could be respected by type-checkers, thus allowing a (mostly) safe mechanism for 3rd party attribute creation, along with a practical way to correctly approach type-checking.
I am +1 on the proposal - and I’d expand it to cover type checking on further elaboration.
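To illustrate the kind of workaround I mean, here is a sketch using a Protocol plus cast. Foreign, HasMyappTag and attach_tag are hypothetical names, and this is just one of several possible workarounds.

```python
from typing import Protocol, cast


class Foreign:
    """Stand-in for a class owned by another library (hypothetical)."""


class HasMyappTag(Protocol):
    """Describes the one extra attribute that 'myapp' attaches."""

    x__myapp_tag: str


def attach_tag(obj: object, tag: str) -> HasMyappTag:
    # setattr() with a string name avoids the checker's attribute error;
    # cast() then asserts the attribute exists. Nothing is verified at runtime.
    setattr(obj, "x__myapp_tag", tag)
    return cast(HasMyappTag, obj)


tagged = attach_tag(Foreign(), "blue")
print(tagged.x__myapp_tag)
```

The declared return type is only HasMyappTag, so the checker no longer sees Foreign's own attributes on the result. That is part of why I call these workarounds.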
Expanding - regarding @Melendowski and @RonnyPfannschmidt’s answers:
I think these examples clearly demonstrate there is a demand for safely attaching 3rd party attributes to instances - and these two high profile projects have gone (lots of) extra miles to add a safe way to allow so - and divergent miles at that.
This is allowed in Python since its inception, right?
And the lack of a convention actually prevents this from being used, which is why projects have to go such a long way to build a proper “right way to do it”. A convention might just clear such usage.
This has 0 to do with stdlib - since there are no language changes required, “out of stdlib experimentation” doesn’t even have any meaning.
Currently objects are fully justified in assuming the contents of self.__dict__ are completely under their control - patterns in __reduce__ functions or reset functions that call self.__dict__.clear() very much exist.
Objects are also allowed to use __slots__, disallowing this kind of usage.
What exactly are the use cases for this? IMO this is a pretty weird style, incompatible with both OOP and FP conventions.
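A short sketch of both failure modes. Resettable and Slotted are hypothetical classes, and x__mylib_note is just an example name in the proposed format.

```python
class Resettable:
    """Assumes it owns its whole instance dict (hypothetical example)."""

    def __init__(self):
        self.value = 1

    def reset(self):
        # Wipes everything, including attributes attached by outsiders.
        self.__dict__.clear()
        self.__init__()


class Slotted:
    """No instance dict at all, so no room for extra attributes."""

    __slots__ = ("value",)

    def __init__(self):
        self.value = 1


r = Resettable()
r.x__mylib_note = "attached from outside"
r.reset()
survived_reset = hasattr(r, "x__mylib_note")

s = Slotted()
try:
    s.x__mylib_note = "attached from outside"
    slotted_accepts = True
except AttributeError:
    slotted_accepts = False

print(survived_reset, slotted_accepts)
```

A naming convention alone does not help with either case: the name is unique, but the attribute is silently dropped in the first case and rejected in the second.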
If external libraries want to keep data for arbitrary objects, they should use a weakref dictionary instead of attaching it to objects directly.
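For reference, a minimal sketch of the weakref approach using weakref.WeakKeyDictionary from the standard library. Foreign and the _extra table are hypothetical names.

```python
import gc
import weakref


class Foreign:
    """Stand-in for an object owned by another library (hypothetical)."""


# Side table keyed by the foreign object. An entry disappears
# automatically once its key object is garbage collected.
_extra = weakref.WeakKeyDictionary()

obj = Foreign()
_extra[obj] = {"request_id": 42}
stored = _extra[obj]["request_id"]

del obj
gc.collect()  # CPython frees the object on del; this covers other implementations
remaining = len(_extra)

print(stored, remaining)
```

The catch is that keys must be hashable and must support weak references, so this does not work for instances of int, str or tuple, nor for classes that define __slots__ without a __weakref__ slot.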
attrs and dataclasses both use this, to store information about the fields that is used to create the __init__. And to set the value of __slots__. The “object” in those cases is a class. (Which I understand to be within the scope of the OP.)
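The dataclasses half of this is easy to check with the standard library alone. Point is a made-up example class; the attrs side (the __attrs_attrs__ attribute mentioned earlier in the thread) would need the third-party package installed, so it is left out here.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int


# The decorator stores its field metadata on the class under a dunder name.
has_marker = "__dataclass_fields__" in vars(Point)

# dataclasses.fields() is the public way to read that metadata back.
field_names = [f.name for f in fields(Point)]

print(has_marker, field_names)
```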
Aha. Arguably, those classes are very much “owned” by attrs. I think instead the ctypes style convention of _sunder_ should be used for those kinds of things - those are also unlikely to conflict with anything natural and are imitations of _dunder_ names for purposes similar to builtins but without being reserved by the language itself.
Yes, but “sunders” alone won’t avoid name clashes.
Anyway, _{projectname}_attrname_ is just as good as x__{projectname}_{attrname} at this point in the idea. I’d have a personal preference for x__, I guess.
I still don’t think that name clashes are a relevant problem. As I said, stuff like attrs is very much in a position to claim names on the relevant classes, and dataclasses just uses dunders (arguably, attrs could maybe reuse those dunder names, but that’s off-topic).
So again, where is this a problem? What kinds of libraries add attributes to objects they don’t have a strong control over?
I think @MegaIng understands perfectly well what’s being said.[1] But “why?” is the question.
If I want to decorate a library provided object, I typically reach for containment. I define a wrapper which has the relevant library object, imitates as passthrough the parts of the API I care about, and provides whatever extra bits I need. In recent years, with the addition of Protocols, I can even implement the library-defined protocol and know where it’s safe to pass my wrapper.
That’s a classic OOP-style solution. Python also lets you do fancy dynamic versions of this, though type checkers won’t understand it.
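Roughly what I mean, as a sketch. LibraryClient is a hypothetical stand-in for the library object, and the two wrapper classes are made-up names for the explicit and the dynamic variant.

```python
class LibraryClient:
    """Stand-in for an object provided by a library (hypothetical)."""

    def fetch(self, key):
        return f"value-for-{key}"

    def close(self):
        return "closed"


class TaggedClient:
    """Explicit containment: forwards only the parts of the API we care about."""

    def __init__(self, client, tag):
        self._client = client
        self.tag = tag  # the extra data lives on our own object

    def fetch(self, key):
        return self._client.fetch(key)


class DynamicTaggedClient:
    """Dynamic variant: forwards anything we do not define ourselves."""

    def __init__(self, client, tag):
        self._client = client
        self.tag = tag

    def __getattr__(self, name):
        # Only called when normal lookup fails, so 'tag' is found first.
        return getattr(self._client, name)


explicit = TaggedClient(LibraryClient(), "blue")
dynamic = DynamicTaggedClient(LibraryClient(), "green")

print(explicit.tag, explicit.fetch("a"))
print(dynamic.tag, dynamic.fetch("b"), dynamic.close())
```

The explicit version is the one type checkers can follow; the __getattr__ version saves typing but hides the forwarded API from them.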
What is the actual problem being solved here such that traditional solutions aren’t applicable?
EDIT: sorry, I shouldn’t speak for others. I think that was poor form. But my impression is that we’re clear that for attrs and similar, attribute ownership is clear.
the objects in question are clearly under someone else’s control (so not attrs or similar)
adding attributes/methods is significantly simpler than the alternatives (primarily keeping a weakref dict)
the verbosity of the attribute names doesn’t matter (meaning this isn’t a question of improved readability; none of the suggested options that include the full package name qualify as readable IMO)
Pandas has .attrs for storing arbitrary metadata, but it doesn’t work very well and in all this time has had little work done to make it fully functional.
There are also libraries outside pandas which utilize the pandas accessor system I linked above. Namely, GeoPandas and hvPlot register .geo and .hvplot, respectively. There are probably more too - at one point there was a page in the Pandas documentation that listed all the 3rd party accessors and extension arrays.
@sirosen I believe the issue with your solution is that when the object you want to wrap has a large API you care about, such as pandas, doing that much passthrough is a lot of work.
Granted, with all this said, the stuff done through the pandas accessor system could be functions that take in dataframes. There is added convenience, however, in the case of hvPlot, where you can change the plotting backend through Pandas. So instead of accessing .hvplot to get hvPlot figures, you change the backend, keep using .plot and it works seamlessly.
But stuff like .geo and .hvplot are clearly there to be readable and usable by end users, no? The proposed convention of .x__geopandas_geo isn’t exactly a user-friendly alternative[1], so they clearly are not the target market for OP’s proposal.
And Pandas needs to be aware of these attributes since they want to copy them - if this is expected behavior, far more than just a simple naming convention needs to be specified. Whether .attrs is a good solution IDK, but this proposal does not help it, right?