… but if `ModuleType` has a `__call__` method, then every module shows as callable, even those that don’t have a `__call__`.
This seems solvable, but as @encukou said, it’s a nasty little detail to get right.
Yes. `ModuleType.__call__` is what `callable(x)` looks at, and it’s very far from trivial to change that. (Spoiler: you’ll meet CPython’s `tp_call` slot as a miniboss in this rabbit hole!)
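A short sketch of why that lookup rule matters here: `callable()` and the call operator consult the *type*, not the instance, so a `__call__` on `ModuleType` itself would make every module report as callable. The class names `Quiet`/`Loud` below are invented for illustration:

```python
class Quiet:
    pass

q = Quiet()
q.__call__ = lambda: 42   # instance attribute: the call protocol never sees it
print(callable(q))        # False - callable() consults type(q), not the instance

class Loud:
    def __call__(self):
        return 42

print(callable(Loud()))   # True - __call__ lives on the type
```

This is exactly the tension: a per-module opt-in has to somehow defeat the type-level lookup that `tp_call` implements.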
What if I want the static method to be overridable by subclasses?
What if it is actually a class method?
My point is that you can turn a function into a class without breaking your users, but you cannot turn a module into a class, so by using `__call__` you reduce your future design possibilities.
How does documentation work with this proposal? Let’s take the `pprint` example. Assuming the `pprint` module becomes callable, yet exposes other APIs besides its primary calling point, what does `help(pprint)` do?
Like others I think this would create more confusion than clarity.
Thanks. I guess I’m overall neutral on this. I doubt I’d find much use for it myself, but others seem to like it. Issues like `callable()` need sorting out, and as some people have pointed out, it can (slightly) restrict your future design options compared to exporting a single function/class. But it’s a tool people can use, and like any tool, if it doesn’t suit your needs, don’t use it.
The PEP might be more persuasive with a few more examples, but don’t worry about it on my account - I’ve read the examples here and formed my view.
I assume, by the way, that it’s obvious to everyone that deprecating (for example) `from glob import glob` in favour of `import glob` is a bad idea? Add the new approach if you want, but there’s no practical value to breaking all existing uses of the module. And there are lots of people maintaining code that has to support multiple versions - we don’t want to force them to write something like (untested)
```python
import warnings

with warnings.catch_warnings():
    # Suppress the deprecation warning - does this need to be more precise?
    warnings.simplefilter("ignore")
    try:
        # Use the old approach if it's still available
        from glob import glob
    except ImportError:
        import glob

# Phew! That was way too hard...
files = glob("*.py")
```
At a first glance, I feel excited about this.
If I’m understanding it correctly, would this help alleviate the boilerplate in `.py` files that one would like to treat as both a script and a module, so that the script functionality can be imported and used elsewhere?
I tend to agree that the feature would likely cause more confusion than do good.
It is often not obvious what calling a particular module would actually do, and such uncertainty usually results in hard-to-find errors: e.g. you forget to add the API name and accidentally call the module object instead of the API you really want to call inside the module. Python would happily accept this, but your application could exhibit unexpected behavior.
I also wonder how introspection would work on such modules:
I had experimented with making modules real class instances (with all the associated features, including making them callable) in Python in the early 2000s and used this to implement lazy imports. While the logic worked well and Python was indeed capable of handling module classes without changes or major problems, the idea never really took off.
I later replaced the logic with a direct implementation of lazy modules, not relying on custom importers and used that in e.g. mxDateTime.
I like this proposal, because I think it would indeed be useful for micropackages that expose a single function.
But if successive PEPs add `__getattr__`, `__call__`, `__setattr__`, and maybe more, could it make sense to introduce some sort of annotation that simply marks an individual module as “class-like”, so that everything is covered in one go?
Call me +0, I guess.
I’d be happy enough to not mess up `import datetime` vs `from datetime import datetime` again, but chances are I’d just mess up an `isinstance(x, datetime)` later on and wouldn’t be much happier. (Arguably I shouldn’t do the `isinstance`, but I seem to be serializing/deserializing often enough that it’s usually justified.)
Documentation I’m not concerned about - I assume API designers care about their users and will write the documentation they need, so `help(module)` still just returns `module.__doc__`.
However, we probably need a clarification on style, specifically, capitalisation, but also verbiage. Mainly because the style-enforcers I’m concerned about are the ones who blindly follow rules, and if we don’t give them rules then they’ll invent them and try to impose them anyway. Module naming tends to follow different rules from functions and classes, which means callers are likely to know they’re calling a module, when really they should only be calling a callable without being concerned as to its type.
For example, if we did this to a `thread` module,[1] would `thread(...)` be calling `start_new_thread()` or instantiating `Thread`, and why doesn’t the name `thread` give me any hints? Is this just a case where I shouldn’t consider making the module callable? Or should I consider renaming it? Or is it going to be considered Good Design going forward for all-lowercase nouns to start something?
(Note that I’m not suggesting PEP 8 updates, except as necessary for stdlib implementers. I’m suggesting a section in here that provides some guidance on how to name callable modules, which I hope will look like “name them like regular modules unless you need to name them differently”.)
Technically, I’m really not concerned. This should be straightforward enough, and introspection looking for “callable” before “module” will see it as callable and will need to adapt.
There are tricks you can do to make `import foo` “return” an instance of your own class rather than a true module (with only the default importers running), but it’s a bit rough. I believe the module `__getattr__` came out of discussions to make it easier, as it was the only case found to be important enough to justify that level of metaprogramming.
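For reference, that module-level `__getattr__` (PEP 562) lets a plain module customize failed attribute lookups without any `__class__` tricks. A minimal sketch, using a dynamically created module so it runs standalone; the `demo`/`answer` names are invented:

```python
import types

# Build a throwaway module object; in real code this would be a .py file.
mod = types.ModuleType("demo")

def _module_getattr(name):
    # PEP 562: consulted only when normal lookup in the module's __dict__ fails.
    if name == "answer":
        return 42
    raise AttributeError(f"module 'demo' has no attribute {name!r}")

mod.__getattr__ = _module_getattr  # stored in the module's __dict__

print(mod.answer)      # 42, served by __getattr__
print(mod.__name__)    # 'demo' - normal lookup still takes priority
```

In a real module file you would simply define `def __getattr__(name): ...` at top level.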
Doesn’t really exist, but blend `threading` and `_thread` in your mind. ↩︎
Slightly OT: Do we want to encourage micropackages though? (by @mitsuhiko)
A couple of really simple examples of this are `decimal` and `fractions`. The module is called `decimal`, but the type (class) is called `Decimal`. And that’s (currently) matching the style guides that say modules should be lowercase and classes capitalised. What’s the “correct” style for a callable `decimal` module? And `fractions` is even worse, as there’s the question of singular vs plural.
As I said above, I’m a strong -1 on using this feature in any existing stdlib module, but it’s useful to think about stdlib cases to understand the design concerns the proposal brings up.
```python
from pprint import pprint
from types import ModuleType
import sys

class _CallableModule(ModuleType):
    def __call__(self, *args, **kwargs):
        return pprint(*args, **kwargs)

sys.modules[__name__].__class__ = _CallableModule
```
This can already be done to a certain degree.
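A self-contained variant of the same trick, runnable without a separate file, registers a synthetic module object directly in `sys.modules` (the `demo` module name is invented for the example):

```python
import sys
import types
from pprint import pprint

class _CallableModule(types.ModuleType):
    def __call__(self, *args, **kwargs):
        return pprint(*args, **kwargs)

mod = _CallableModule("demo")
mod.greeting = "hello"       # modules still hold attributes as usual
sys.modules["demo"] = mod

import demo                  # import binds whatever sys.modules holds
demo({"works": True})        # prints {'works': True}
print(demo.greeting)         # attribute access is unaffected
```

This shows the key property under discussion: the object is still a module (attributes, `__name__`, etc.) while also being callable.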
Yes, aside from the occasional minor convenience, mostly this will just mess with people’s mental model of Python. Currently, there is a huge and easy-to-explain difference between `import pprint` and `from pprint import pprint`.
This PEP will put code reviewers in the awkward position of having to remember which modules have the call capability and in which version of Python that capability was added. For example, when is this code correct: `import pprint; pprint(dir())`?
Also, the premise that modules have only one principal capability is dubious. A module may start that way but can grow over time.
There is also the matter of spelling. We typically capitalize class names while lower-casing function names. This is a problem for cases like the `graphlib` module that only features `TopologicalSorter`. We really don’t want instantiation with `ts = graphlib(*args)`. That would appear too much like a function call.
No, this is unrelated. “Treating as a script” means running code on import (i.e. `import foo` automatically executes top-level code from `foo`). “Making a module callable” means running code when the imported module is called (i.e. `import foo` merely imports the module; you still have to run `foo()` to actually call it).
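The boilerplate the earlier question refers to is the usual main-guard pattern, which stays exactly the same regardless of this proposal; a minimal sketch (file and function names invented):

```python
# script_and_module.py - usable both as a script and as an importable module
def main():
    # The reusable entry point that other modules can import and call.
    print("doing the work")

if __name__ == "__main__":   # true only when this file is run as a script
    main()
```

Importing this file gives access to `main()` without running it; executing it as a script runs `main()` once. A callable module would not remove the need for this guard.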
I think saying that callable modules are going to confuse developers and/or IDEs is disingenuous. No one who has spent any time with the language is confused that a class may contain attributes (even as part of its API) in addition to being callable, or that instances of that class also contain instance attributes, retain access to class attributes, and can themselves be callable again.
So far the attention has been on callable top level modules, which I personally believe could be useful, but callable submodules will also be possible with this and I think be an extra tool in any developer’s refactoring tool belt, especially as a single module package begins to outgrow its first file.
Thinking in terms of submodules also shows how using a callable module isn’t really any different from any other API: you have to actually read the documentation. If they had always been part of the language yet you had never before used `datetime`, what would you think `datetime.datetime` was? A submodule, a class, or maybe a factory function? The real answer is probably the one you would guess last, based on convention. Yet it doesn’t really matter: the docs said to call it and pass these arguments, and that’s about all you care about as a consumer.
It is usually incorrect to suggest that someone is being “disingenuous” in discussions like these even if it does seem that way to you at the time that you say it. It is better to take what others are saying in good faith and try to understand their perspective.
I already see a lot of confusion, particularly for beginners, around the distinction between a module and an object imported from a module. These two things are sometimes interchangeable but not always. For example in

```python
import a.b.c
```

each of `a`, `b` and `c` needs to be a package/module. However in

```python
from a.b import c
```

it is ambiguous whether `c` is a module or an attribute that is defined in the `b` module. It is easy to get mixed up about what sort of object you have here, especially if the `c` module contains an object whose name is also `c`. Personally I think it is unfortunate that this ambiguity in the import statement was allowed, especially since there is a clearer alternative for the case where it is a module:

```python
import a.b.c as c
```
To consider a concrete hypothetical example that has already been mentioned, currently you should choose between these two:

```python
# approach 1
import datetime
t = datetime.datetime(2000, 1, 1)

# approach 2
from datetime import datetime
t = datetime(2000, 1, 1)
```

You need to know whether you are importing the module or importing the name from the module, and mixing them up mostly gives a clear error:

```python
import datetime
t = datetime(2000, 1, 1)  # TypeError: 'module' object is not callable
```
Following the suggested approach here you could make the `datetime` module callable so that this works. It would still be necessary to choose between `import datetime` and `from datetime import datetime` though, because the other attributes of the module/class would only be available in one case rather than the other. More confusingly, they do have some attributes in common (`date`, `time`, `tzinfo`), so it could be easy to mess things up.
This suggestion also presumes that it is clear what callable someone would typically want when using `datetime`: here I presumed the `datetime` constructor, but in practice I am more likely to use other methods like `datetime.now` or `datetime.fromtimestamp`. For those I would still need to do `from datetime import datetime`, or otherwise we are faced with an awkward choice for exactly what callable should be the “default” function of the module (in the face of ambiguity…)
Altogether I think that the suggestion of making modules callable with `__call__` offers very little benefit and has the potential to introduce confusion by mixing up what should be a clear distinction between modules and the objects found in modules. I see no particularly compelling use case, because anything that can be done with this can also be done just by importing a function and calling it, which is less prone to confusion. There might be very esoteric cases where it is particularly useful to do this, but for those maybe patching `sys.modules` is good enough.
In a parallel thread this proposal is being conflated with a suggestion to be able to define `__setattr__` for modules. I think conflating these two proposals is unfortunate, because that other proposal is precisely about preventing users from getting confused and accidentally setting attributes on a module when they should be importing an object from the module and setting attributes on that. The intention is to help users to see when they have mixed up their imports. This proposal, on the other hand, is precisely aimed at blurring the distinction between modules and attributes of modules, which I don’t think is particularly helpful.
Yes, I should have phrased that better. Existing dev tools like `help()` or autocomplete or IDE tooltips shouldn’t have any harder a job with callable modules than they already do working with any current object that is callable yet also provides attributes, methods, or even indexing at the same time.
it is ambiguous whether `c` is a module or an attribute that is defined in the `b` module. It is easy to get mixed up about what sort of object you have here, especially if the `c` module contains an object whose name is also `c`. Personally I think it is unfortunate that this ambiguity in the import statement was allowed, especially since there is a clearer alternative for the case where it is a module:
This is a good point and is probably why I am personally fine with callable modules, because this ambiguity already exists.
The `datetime` module is probably a poor example (one that I perpetuated), as there is not only one thing it does. `pprint` is stronger IMO in that, while it provides a half dozen functions and a class, there is one that is wanted nearly every time. It’s not about packages that already provide a callable under the same name, but about packages with one obvious primary task.
This can already be done to a certain degree.
This was going to be my comment – the `__class__` assignment trick is definitely more obscure, but OTOH it works today, plus it allows users to use any dunder or descriptor on modules, not just `__call__`. This isn’t necessarily a blocker – we still added the special `__getattr__` and `__setattr__` support despite those already being possible through `__class__` assignment. But it changes the question – not “are callable modules worth supporting at all?”, but “do the benefits of making them easier to discover/use justify adding a second way to do things?”. (And it would also be nice if we came out of the discussion with more general principles about which module dunders are worth special-casing and which aren’t.)
This isn’t necessarily a blocker – we still added the special `__getattr__` and `__setattr__` support despite those already being possible through `__class__` assignment.
Is there `__setattr__` support?
There is a parallel thread proposing to add that:
> Since CPython 3.5 it’s possible to customize setting module attributes by setting the `__class__` attribute. Unfortunately, this comes with a measurable speed regression for attribute access:
>
> ```
> $ cat b.py
> x = 1
> $ python -m timeit -r11 -s 'import b' 'b.x'
> 5000000 loops, best of 11: 48.8 nsec per loop
> $ cat c.py
> import sys, types
> x = 1
> class _Foo(types.ModuleType): pass
> sys.modules[__name__].__class__ = _Foo
> $ python -m timeit -r11 -s 'import c' 'c.x'
> 2000000 loops, best of 11: 131 nsec per loop
> ```
>
> For r…
As mentioned there, one downside of setting `__class__` is that it slows down all attribute access, and reading attributes from a module is very common (`np.sin(...)` etc).
it would also be nice if we came out of the discussion with more general principles about which module dunders are worth special-casing and which aren’t.
In ordinary Python code modules are namespaces and their interface is only expected to provide attributes. The `__getattr__` proposal (PEP 562) has a clear motivation around accessing deprecated attributes. PEP 562 also added `__dir__`, which is for listing attributes. The `__setattr__` proposal in the other thread is motivated by wanting to disallow (or warn about) setting attributes in cases where it could be a likely user error to do so. I don’t see why any special support should be added to encourage defining modules that have unusual features besides attribute access: most other operations should generally be expected to give `TypeError`.
Is there `__setattr__` support?
No, I misremembered
As mentioned there, one downside of setting `__class__` is that it slows down all attribute access, and reading attributes from a module is very common (`np.sin(...)` etc).
Yeah, that is unfortunate. Maybe we can fix that, though? In principle there’s no reason why a no-op subclass has to have slower attribute access. (IIRC it’s because we currently have a special-case fast path for `ModuleType`, and otherwise we go through the full regular lookup chain. So we’d want some extra cleverness to apply that fast path to `ModuleType` subclasses that don’t mess with attribute lookup, e.g.) And the nice thing is that a pure optimization is much simpler to land than a new public/supported API.