TensorFlow is a library with an interesting solution for this. TensorFlow's folder structure and its import structure have very little correspondence. The documentation refers to the public API (the imports users are expected to write), and many of the modules that are intended to be imported don't have a matching file in the folder structure at all. Instead, TensorFlow has a utility function, tf_export, that is applied as a decorator to the implementation and lists the public names it should be exposed under.
And then you import and use it as tf.sqrt or tf.math.sqrt, even though tensorflow/math.py may not exist as a file in the codebase at all. The real file/package structure is treated as an internal detail and may change in a new version without any notice in the release notes. Only the documented name, which is part of the docs generated from tf_export, is stable. Almost everything actually lives in tf.python.stuff, but I don't think the existence of tf.python is even part of the public API.
it pollutes the external interface with the core symbol (which often needs to be deleted),
there’s redundancy in main_entry_point, which causes unnecessary churn if it’s changed, and
you may want to be able to import main_entry_point endogenously, but not exogenously.
All this proposal would do is remove the assignment to __all__. If that warrants new syntax, your tolerance for a bit of redundancy is a lot lower than mine is…
You’re absolutely right about the top level proposal.
However, I took that proposal a bit further to try to maintain a concise external interface. Please look at the lengths to which major libraries go to do that. They are creating an entire parallel structure of files with just the external interface. Or the tensorflow idea that Mehdi linked with a decorator and some machinery to synthesize the external interface. If we’re going to look at this seriously, we shouldn’t stop at just creating __all__.
That seems like the crux of it…a serious version of this might go all-in on public, private, extern, export, etc. (or some version of those keywords and their behavior). There are good reasons for a language to explicitly define all that stuff, and some nice languages are built on that model. But Python has never done that and I can’t imagine it starting now.
So what? Seriously, in a language like Python which is based on the “consenting adults” principle, why is it important to do this?
As I said, “If that warrants new syntax, your tolerance for a bit of redundancy is a lot lower than mine is…”
I have no idea what that means.
Well, I’m afraid I disagree with that design choice. Python’s design deliberately reflects the file structure in the package structure. Creating an artificial structure feels very “non-Pythonic” to me. And the fact that it’s complex to do reinforces the idea that it’s not the intended approach. I routinely use the package structure to locate the source of a function in the source - if a.b.c.foo isn’t a function in a/b/c.py, then it’s much harder to find it, and as a result, there’s a maintenance burden and you’ve made it harder for users who need to look at the implementation to do so.
I know there’s a bigger push these days for projects to want to hide implementation details, make things “really private”, and lock down code so that the user can only interact with it in certain specific ways. And I guess for commercial projects where support contracts are involved, that might make sense. But it’s not the style of Python code that I like to use or maintain, and it’s not the open model (“consenting adults”) that Python’s success was built on.
If projects want to do that, fine. They can do so. But it’s not a useful goal to make it easy to do things that are not recommended.
The reason is that users will reach into libraries, and the libraries are then, whether they like it or not, obligated to preserve those attributes. There’s a comment I read somewhere in the Jax codebase that says “remove this when we can convince internal users to stop using it”. The attribute wasn’t exported in the interface (neither in the docs nor in __all__). People just want to get their job done, and instead of filing an issue, they reached in for the symbol they wanted. That’s probably a part of the Jax team’s motivation to hide everything in _src.
Another reason is that it aids in discoverability with a UI that expands when the attribute operator is typed. A beautiful example of this is the new Numpy Array API (numpy.array_api), which has a very well thought-out and minimal interface (especially compared with Numpy itself).
What I mean is that your library may want to import a symbol from itself (using, say, a relative import), but you don’t want users of your library to be able to import that symbol.
I’m not motivating my idea with the possibility of having a different external and internal structure. I motivated with the ability of having an external interface that is narrower than the internal interface. And there’s plenty of precedent in popular Python libraries that go to great lengths to make that happen.
I routinely use the package structure to locate the source of a function in the source
Sure, but libraries are primarily for users—not for maintainers. Libraries like Numpy decide on an external interface that doesn’t reflect their internal structure because it makes life easier for their plethora of users even if it’s slightly harder for their developers.
(Also, if you’re having trouble finding symbols you may want to look into improving your IDE. I just switched to NeoVim, and Spectre is pretty great for navigation.)
Exactly. Well put. So this comes down to a perennial dialectic between progressivism (favoring progress based on current experience) versus conservatism (favoring preservation and limiting change in light of past experience). Both have their merits.
I think that making life easier for commercial projects to do what they’re doing anyway doesn’t put the “open model that Python’s success was built on” at risk. But I can understand your counterpoint.
“Not recommended”? Is there an admonition somewhere in the docs or PEPs against keeping interfaces minimal (through symbol deletion or parallel structure)?
Sorry, but I don’t agree. They can deprecate and remove the attributes if they can’t simply remove them. And if the functionality is important to the users, they can replace it with a form they are willing to support. I speak from experience - the whole of pip is marked as “internal only”, and yet people still write code that imports pip. It’s not ruined our ability to maintain pip, although we do have to work on setting user expectations, and occasionally making changes more slowly than we’d prefer. I’m not saying the Jax project are wrong, just that we shouldn’t normalise the sort of adversarial relationship they seem to have with their users.
Why not put the actual code in an internal module, and keep the user-facing modules clean, with nothing but re-exports of the documented API? Isn’t that what people do now, and the only complaint there is “having to list everything in __all__ is annoying”?
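A minimal sketch of that layout, assuming a throwaway package built in a temporary directory. The names mylib_demo, _internal, core, area, and helper are hypothetical and chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "mylib_demo"
    (pkg / "_internal").mkdir(parents=True)

    # The real code lives in an internal module.
    (pkg / "_internal" / "__init__.py").write_text("")
    (pkg / "_internal" / "core.py").write_text(
        "def area(w, h):\n"
        "    return w * h\n"
        "\n"
        "def helper():\n"
        "    return 'not part of the documented API'\n"
    )

    # The user-facing module contains nothing but re-exports.
    (pkg / "__init__.py").write_text(
        "from ._internal.core import area\n"
        "__all__ = ['area']\n"
    )

    sys.path.insert(0, tmp)
    try:
        mylib_demo = importlib.import_module("mylib_demo")
        public_names = list(mylib_demo.__all__)
        result = mylib_demo.area(2, 3)
        has_helper = hasattr(mylib_demo, "helper")
    finally:
        sys.path.remove(tmp)

print(public_names, result, has_helper)
```

The redundancy under discussion is visible in the generated __init__.py: area is named once in the import and once more in __all__.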
Sorry, I’m old-school here. IMO the external interface is what the documentation says it is. If you care about your users restricting themselves to your supported interface, then document it well. Oh, and design it well - in my experience, people reach for the source code and look at internal details only when using the documented API fails them somehow.
I was speaking as a user. I’ll often read a library’s source to get insights into how it works. That’s a large part of how I learned to write code, and how I became an open source contributor. Reading the code is IMO something we want to encourage users to do more of. Reading the code isn’t the problem, it’s thinking that “if I can see it, I’m allowed to use it” - which leads to the sort of adversarial relationship between users and maintainers that I believe open source should be discouraging, not encouraging.
Libraries like Numpy have incredibly good documentation. I wouldn’t look at their source code as a user. But nor would I ever use an undocumented or internal function, precisely because I wouldn’t read the source code. Which in turn is because I’d be using the documented functions.
I’ve used Java in the past. And I’ve worked with people who claimed to be “Java experts” but were lost outside Eclipse. Don’t ever suggest to me that “a better IDE” is the way to solve code management and maintenance problems.
More seriously, I often look at code in very limited environments - web pages, on servers with limited toolsets available. And I’ve had to deliver results - it wasn’t just casually browsing. In those environments, “improve your IDE” isn’t an option, and the suggestion definitely isn’t helpful.
Fair. Maybe my age is showing.
The problem is that commercial projects often (nearly always!) depend on open source libraries. And they have reasonable arguments for wanting their dependencies to work consistently with their model - which leads to pressure to add these features to smaller open source libraries. (Yes, I’m still stressed from dealing with “please add type hints to your library” requests…)
I meant trying to enforce privacy. And while there’s no formal document, I was specifically referring to the “consenting adults” principle, which is widely accepted enough that I think it counts as “not recommending” enforced privacy.
(I speak as someone who’s bad at documentation, so to be clear, I’m not setting the bar very high here.)
I think you’ve misunderstood the problem in my example:
An internal attribute is exposed.
Users start using that attribute.
The internal attribute needs to be removed, but can’t be because users are using it.
Yes, of course, you can now deprecate it, but this means leaving dead code in a large library. And as other internal things change, there are ongoing support costs to maintaining this deprecated function.
The solution that major libraries are taking is not to expose internal attributes in the first place, which mitigates the problem from the start.
Like I said, this is a question of progressivism versus conservatism. I understand that you don’t want to normalize their interface minimization based on your personal values about what is Pythonic.
The complaint is that it’s a maintenance burden because it’s not just one internal module. It’s an entire parallel structure of internal packages and modules. Please have a look at the Jax codebase: https://github.com/google/jax/tree/main/jax/_src .
I understand, but there are other users who do that. It makes sense that your preferred solution is to document things well because you don’t use undocumented functions.
Yes, I know that that’s how you feel. I’ve also noticed that I tend to be on the other side of our pleasant discussions: I tend to be drawn to attractive changes despite their burdens (e.g., type hints). I think your voice definitely reminds me of a lot of users who share your way of thinking. You definitely raise important points.
It’s unclear to me that pip itself doesn’t also take this approach of having a codebase structure that tries to hide the implementation. Otherwise, why is almost all of pip’s non-vendored code written inside a folder called _internal? If a documented public API is sufficient, then why not avoid the internal folder structure and have a more direct pip/commands.py, pip/install_command.py, etc.? For pip there’s also the special circumstance that it has no programmatic public API at all. Other libraries commonly have some public API and want something comparable to pip’s use of _internal to make it clear that the implementation is not intended for use.
CPython core itself also has some maintenance burden from this issue. There are many undocumented internal methods in the standard library. If a method has some mild usage in other open source codebases, there is commonly reluctance, or a debate, about removing it without a deprecation period. There is a conflict between following the principle that internal, undocumented APIs are free to be refactored and need not be maintained, and minimizing issues for your user base. I don’t think this is an open source vs. commercial dichotomy either. The usual reasons not to simply remove an undocumented CPython method come from looking at how other PyPI packages use it, or from a grep-like scan of open source code (Sourcegraph).
Because naming conventions are also a way to express intent. We’ve got drawn down the “document the interface” path, but “give stuff names with underscores if you want to signal to users not to use them” is just as valid. And just as likely to trigger the same debates.
The reason for pip’s _internal module is because we were being particularly heavy-handed about getting our message across.
Agreed, absolutely. My only point here is that the burdens aren’t sufficient to warrant a new language construct, that’s all. And how much burden something is, is inherently subjective. If the projects we’re discussing here were actively asking for a feature like this (rather than us speculating that it might be useful for them) that would make a significant difference to the balance.
I’m mostly questioning your use of the term “exposed”. If the attribute is undocumented, and maybe even named with an initial underscore, how “exposed” is it? The users need to look at the source code, infer the purpose of the attribute from its usage, and choose to use it in spite of it not being in the documented interface.
Personally, I only do that if I can’t find a better (i.e., documented) way of doing what I want. And I’d be well aware that I was using something I wasn’t intended to use. Yes, I may come to rely on that attribute, and I may have significant problems if it were removed or changed. But I certainly wouldn’t be able to delude myself that it wasn’t always a possibility that this could happen. And if I didn’t ask the developers for a supported way of doing what I wanted, then that’s also on me.
But again, maybe I’m focused too much on the open source ethos as I see it, where anything a library developer provides should be viewed as a gift with no hidden commitments. But when a user is under a project deadline, and the library they are using isn’t doing what they want, it’s hard to work to that model - sure, you might provide your library as a no-commitment gift, but I have paying customers with a support contract.
Phew! I’m glad that at least I don’t come across as a complete PITA.
In my personal projects and interests (both Python and otherwise), I’m actually much more progressive than I come across here. But all of my Python work is completely hobby-based, and as such I tend to be rather defensive of the principle that people shouldn’t assume commitments that don’t exist - not least because the truth is that I do care a lot, and as a result I’m continually in danger of spending too much time on things I should simply drop or scale back.
(But having said that, I’ve never found enforced privacy mechanisms in other languages like Java or C++ to be particularly beneficial, so on this specific topic, I think Python’s existing informal approach is fundamentally better).
@public is a simple solution that requires no changes to Python. The fact that it’s a decorator is a plus, because it keeps the export intentions close to the code. And for things that can’t be decorated, there’s good functional support.
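A minimal sketch of how such a decorator can work, assuming it only needs to handle objects that carry __name__ and __module__. This is not the implementation of the public package; the module name demo_lib and the functions parse and _tokenize are hypothetical.

```python
import sys
import types


def public(obj):
    """Add obj.__name__ to the __all__ of the module that defines obj."""
    module_globals = sys.modules[obj.__module__].__dict__
    exported = module_globals.setdefault("__all__", [])
    if obj.__name__ not in exported:
        exported.append(obj.__name__)
    return obj  # the decorated object is returned unchanged


# Build a throwaway module so the sketch is self-contained.
demo = types.ModuleType("demo_lib")
sys.modules["demo_lib"] = demo
demo.public = public

source = '''
@public
def parse(text):
    return _tokenize(text)

def _tokenize(text):
    return text.split()
'''
exec(source, demo.__dict__)

print(demo.__all__)
```

The export decision sits directly on the definition of parse, so renaming or deleting the function cannot leave a stale entry in a hand-written __all__ list.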
I actually floated the idea of just adding it as a built-in several years ago. There were two criticisms, which I think have some merit, but personally aren’t enough to quash the idea for me.
__all__ is good enough and it’s only there for from import * anyway, which no one should use outside of an interactive interpreter session.
There’s a small import-time overhead for executing the decorator.
Also, using names prefixed by an underscore has been mentioned in this thread as an alternative to defining __all__. Note that it’s not just a convention. It’s part of the language specification. If __all__ isn’t defined, then * imports exclude names that start with an underscore.
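That rule can be checked with a small sketch. The module name demo_mod and the functions visible and _hidden are hypothetical; the module is built in memory so that no files are needed.

```python
import sys
import types

# Build an in-memory module with one plain name and one underscore name.
mod = types.ModuleType("demo_mod")
exec("def visible():\n    return 1\n\ndef _hidden():\n    return 2\n", mod.__dict__)
sys.modules["demo_mod"] = mod

# Without __all__, a star import skips names that start with an underscore.
ns = {}
exec("from demo_mod import *", ns)
names_without_all = sorted(n for n in ns if n != "__builtins__")

# Once __all__ is defined, it alone decides what a star import binds.
mod.__all__ = ["_hidden"]
ns2 = {}
exec("from demo_mod import *", ns2)
names_with_all = sorted(n for n in ns2 if n != "__builtins__")

print(names_without_all, names_with_all)
```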
I believe you’re mistaken. Have you actually tested it?
Create a directory c, and populate it with an empty b.py, and an __init__.py that reads:
from .b import *
Now, in a local program or Python shell, do:
import c
print(c.b) # <module 'c.b' from '.../c/b.py'>
Yes, perfectly reasonable, but major libraries don’t consider this approach to be good enough. Plenty of global symbols are created that don’t have underscores including the namespace pollution from relative imports.
Sorry, I wasn’t thinking through the entire problem. For example, from multiprocessing import shared_memory doesn’t add a local “multiprocessing” name, but it does add “shared_memory” to the multiprocessing namespace. So of course from .b import * adds b to the package namespace when b gets imported.
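The behaviour described above can be reproduced with a small sketch that builds the same layout in a temporary directory. The package name c_demo is a hypothetical stand-in for the directory c from the earlier post.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "c_demo"
    pkg.mkdir()
    (pkg / "b.py").write_text("")                       # empty submodule
    (pkg / "__init__.py").write_text("from .b import *\n")

    sys.path.insert(0, tmp)
    try:
        c_demo = importlib.import_module("c_demo")
        # Importing the submodule binds it as an attribute of the package.
        submodule_bound = hasattr(c_demo, "b")

        # With no __all__ in the package, a star import picks that name up too.
        star_ns = {}
        exec("from c_demo import *", star_ns)
        b_in_star_import = "b" in star_ns
    finally:
        sys.path.remove(tmp)

print(submodule_bound, b_in_star_import)
```

This is the namespace pollution from relative imports mentioned earlier in the thread: b has no leading underscore, so it becomes visible even though the package never assigned it explicitly.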
Thanks for the public package. I would very much like something along those lines in the standard library, and I think it makes any new syntax unnecessary. That we do need a good way for tighter control of visible symbols is probably obvious by now to anyone who writes library packages (Hyrum’s law and all that).
I remember some people weren’t fond of the syntax for non-function variables. I’m ok with it, honestly.
My biggest issue is with __init__.py files where you have to specify __all__ because a subset of your users will use from <yourlib> import *. This problem cannot be solved by a custom decorator or a contextmanager because type checkers such as pyright will not support it and will mark symbols from my __init__.py files as “not exported”.
P.S. I personally love public and would settle on it if it received tooling support but I doubt pyright and tools like it will make an exception for public which is why I want something like a syntax.
That’s a very worrying, but unfortunately not unreasonable, justification for dedicated syntax. The fact that static analysis has enabled the sort of IDE support that we see nowadays for Python is absolutely awesome, but the pressure it puts on features to be “built into the language” or at least “part of the stdlib” in order to get support is concerning, because putting stuff into the language/stdlib effectively freezes the design, and makes it really difficult to correct mistakes or continue to innovate.
I don’t consider syntax a requirement for typing/static analysis. Typing PEPs commonly introduce new features without syntax. Being part of standard library is generally expected though. I also think new typing syntax generally only makes sense when there’s a way to do it without syntax that is commonly used and serves as evidence that promoting it to syntax is worthwhile.
The deprecated and override decorators come from two fairly recent PEPs that use a decorator for typing purposes. A new public decorator sounds reasonable. The typing community would need to be aware of the expectation, and it may need a PEP with a few paragraphs explaining how type checkers should treat the new decorator. Normally new typing features/standards are covered in a PEP and then discussed here + typing-sig.