Code is slightly easier to read if imports of certain packages use conventional names. E.g., NumPy is often imported as np.
Problem
Linters like Ruff can already optionally warn when imports don’t use conventional names, but they rely on the importing project to specify what those conventional names are. This requires the importing project to contain a listing of conventional names, each of which has to be researched. E.g., scipy.special is conventionally imported within Scipy examples as sc, which is not obvious without digging through Scipy’s documentation.
Suggestion
Somewhere in pyproject.toml, add a mapping from sub-packages to conventional import names; that is, a string-to-string mapping. E.g., Jax might define:
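Something along these lines (the table name and the aliases below are just a sketch of the idea, not an existing standard):

```toml
# Hypothetical table in jax's pyproject.toml; the table name and
# aliases are illustrative only.
[tool.conventional-import-names]
"jax.numpy" = "jnp"
"jax.random" = "jr"
"jax.scipy.special" = "jss"
```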
This would allow linters like Ruff to use this mapping as an additional source of conventional import names that spares the importing library from having to determine and maintain this mapping.
Firstly, how do you suppose that this information from jax’s pyproject.toml reaches ruff in an environment where jax is installed?
The pyproject.toml file from the jax sdist does not literally get installed when you pip install jax. The typical process by which some other metadata from pyproject.toml becomes available for an installed package is that the build backend takes the metadata from pyproject.toml and moves it into the wheel metadata, and then the installer moves that metadata into site-packages. The metadata at each stage is specified by various versioned standards and PEPs (see the packaging specifications). It might sound simple to add this piece of metadata to the pipeline, but I think it would require updates to standards and much tooling, and I suspect that the use case here would not be deemed worth the effort.
A much simpler option that does not require changing core metadata, all build backends, etc. would be to just say that jax could add an in-package file like jax/import_names.txt and then ruff could check that. At that point, though, you don’t really need any standard as such, because it is just a question of ruff supporting this and jax adding the file. The place to suggest this would be the ruff/jax issue trackers rather than here.
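For example, the file could just list one module and alias per line (the format here is made up for illustration):

```
jax.numpy jnp
jax.random jr
jax.scipy.special jss
```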
In general though what you are proposing is that the behaviour of a linter would be different depending on which packages are installed. That is typically not a desired behaviour for a linter because we want to have reproducible results across different developer machines, CI etc.
I would say that this is better handled as something that could be included in per-project configuration. By per-project I don’t mean that the jax project would add this, but rather that you, as a maintainer of a project downstream from jax, could configure ruff (or another linter) somehow to do this.
I think it would be nicer to specify everything in pyproject.toml rather than scanning through disparate files whenever things change.
However, you could have the build backend generate a file like import_names.txt from the metadata in pyproject.
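As a rough sketch, a build step could read the mapping (the hypothetical [tool.conventional-import-names] table from above) and write the in-package file:

```python
# Rough sketch of a build step that turns the hypothetical
# [tool.conventional-import-names] table into an in-package file.
import tomllib  # Python 3.11+
from pathlib import Path

with open("pyproject.toml", "rb") as f:
    pyproject = tomllib.load(f)

aliases = pyproject.get("tool", {}).get("conventional-import-names", {})

# One "module alias" pair per line, e.g. "jax.numpy jnp".
lines = [f"{module} {alias}" for module, alias in aliases.items()]
Path("jax/import_names.txt").write_text("\n".join(lines) + "\n")
```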
I don’t see how this applies. Plenty of linter options depend on specifications within libraries. Typing, for example, depends on specifications within a library. If a library marks something as final, then that affects how the linter lints code that uses that library. It’s the same thing here.
And logically, I think this kind of setting does belong on the imported library and not on the importing library for the reasons I mentioned in the post, namely that every importing library has to research and maintain the conventional import names, which is tedious.
Suppose you have jax installed and I don’t. I run the linter and there are no errors but you run the linter and you see errors. I push to my pull request and CI fails because of errors that I don’t see on my machine. As a linter user I am frustrated by the fact that my pull request fails but I can’t reproduce the failure locally.
I see. You’re saying the case where you have Jax imports in your code, but you don’t have Jax installed.
Yes, but I think that should be a linter error, and I think that should fail CI.
“As a linter user I am frustrated by the fact that my pull request fails but I can’t reproduce the failure locally.”
Ideally, you should be testing by first installing something like pip install -e .[test], which should give you the same environment as the CI. Or, if you’re using a build tool, it might be something like uv sync --extra test.
I don’t think this is unique. You will already get different type errors depending on which libraries you have installed. You should be syncing to the CI environment before testing, linting, and type-checking.
A bit off-topic, but as the author of a static analysis tool, I wish this convention would disappear. Looking at your example: xpx, eqx, it, jnp, jr, jss, nx, npt, optx, onp, pg, sc, sns… these variable names are the opposite of what best practices recommend regarding readability.
I understand the will to reduce repetition in code, like not having to write jax.scipy.special.thing1, jax.scipy.special.thing2 over and over again, but IMO aliasing to very short variable names should not be encouraged, and left as a user decision. I would myself import it like this: from jax.scipy import special as jax_special, or just special (no aliasing) if that variable name is not ambiguous in the context of my module.
(By the way, is jax.scipy.special just an indirection to scipy.special? Another convention I would like to see disappear, as it’s even worse for static analysis tools.)
I too expose public API from a top-level module, griffe, but I would never tell users to do import griffe as g. IMO numpy is short enough, pandas is short enough, and if you have to repeat it more than you tolerate, then you can always import objects instead of using the namespace: from numpy import this, that, from pandas import this, that.
Interestingly, this convention seems to mostly (only?) be used in the science community, and I’m not sure why. It feels like a way to conform to the style of your peers rather than an informed decision. I’m very ignorant on this matter though: do these libraries provide documentation on why they recommend their users to alias their namespaces with very short variable names?
To get back to the topic at hand, and given the above, I would oppose encouraging such a practice further, especially through standards. Happy to elaborate. And sorry if that sounds like a rant: it comes from frustration at having to handle this without clearly understanding the benefits.
For me, I think the benefit of everyone using the same shorthand names is that you can copy/paste code from examples/stackoverflow/other pieces of code without having to find+replace np ↔ numpy. That problem would never have existed though if we’d just stuck with import numpy.
I don’t see this as something that should be standardized. If communities naturally settle on an import convention, linters can add rules for them. This will also keep the bar high for new import aliases, which it should be, as any new rule will have to go through a review from the linter’s maintainers.
Allowing project authors to define arbitrary import aliases sounds like it will fracture things more. I could publish a project called foobar and solely decide on my own that it should be imported as fbr. And now linters will have to enforce it on everyone. Essentially, we are letting arbitrary third-party projects add arbitrary rules to your linter solely because you happened to install it. This also means that you can introduce lint errors by updating dependencies.
The numpy. clutters the mathematical expression. It is just easier to look at something like this and compare it with a properly rendered equation from somewhere:
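For example (just an illustrative expression):

```python
import numpy
from numpy import sin, cos

x = numpy.linspace(0, 2 * numpy.pi, 100)

# the numpy. prefix drowns out the maths:
y1 = numpy.sin(x) ** 2 + numpy.cos(x) ** 2
# without it, the code reads almost like the rendered equation sin²x + cos²x:
y2 = sin(x) ** 2 + cos(x) ** 2
```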
The problem though is having lots of functions like sin and cos from different modules that are not compatible. A constant source of user confusion comes from mixing up things like math.sin vs numpy.sin vs sympy.sin etc. So then you don’t want to recommend
from numpy import sin, cos
because much experience has shown that this leads to significant novice confusion. Using import numpy as np is a compromise position: the namespace has to be there but let’s make it as short as possible if we are going to have to repeat it many times in one line and throughout much of our code:
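Something like this (again just an illustration):

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) ** 2 + np.cos(x) ** 2   # short prefix, but still clearly numpy
```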
The convention is widespread enough now that you can show a code snippet with np.cos etc. and everyone knows what you mean without needing to spell out the import numpy as np.
Then many other libraries end up intentionally providing a very similar interface to numpy and there is a desire to make usage look similar e.g. jax’s jnp. Just similar enough and just different enough that you can understand it by analogy with np while remembering clearly that it is not numpy. I’m not sure all of the short-hands are as reasonable though.
That’s your opinion. I would say that whether or not long names are more readable is context-dependent. If you have to repeat the same long name many times, that can easily make the code harder to read. It also depends very much on how often a particular name is used. Long descriptive names are better for things that are rarely used.
When something is used all the time, it makes more sense to expect that a reader will be able to remember the meaning of a short and otherwise cryptic name. No, np is not readable to the uninitiated, but then neither is numpy. You have to learn what these things are before they have any meaning.
Maybe we could use fully descriptive names and do away with the cryptic operators like * etc so people don’t have to learn what those mean either:
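Something like this, say (a purely illustrative expression):

```python
import numpy

# deliberately spell everything out: descriptive names and no operators
angle_measured_in_radians = numpy.linspace(0, numpy.multiply(2, numpy.pi), 100)
sum_of_squared_sine_and_squared_cosine = numpy.add(
    numpy.power(numpy.sin(angle_measured_in_radians), 2),
    numpy.power(numpy.cos(angle_measured_in_radians), 2),
)
```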
There we go, that’s much more readable. I even followed best practice and used an autoformatter, because it is well known that those will always make everything readable no matter how long all those variable and function names are. Alternatively, maybe it would be more readable if we split this expression into several assignments over many lines, with suitably long names for the intermediate variables. I am only partly joking: I actually see a lot of code that looks like this and seems completely unreadable to me, but which is presumably considered “good practice” because it uses “readable” names.
There is a reason that equations often use single letter names for variables: their purpose is to express relationships. Long names make it easier to intuit what the different variables are but harder to see what the relationships are between those variables. A simple equation like
profit = income - outgoings
can happily use full words for the variables. If you don’t shorten things though then even fairly simple mathematical formulae become unreadable. Ideally we would do away with np.sin(x) and just have sin(x) but unfortunately that has already been shown to be too confusing.
Squirt. Spray. You take a number and split it into lots of tiny pieces (“atomize” it), and then spread them out over some distance. Once the area covered equals the original number, the distance that you had to spread them out is called that number’s “squirt”. For example, the squirt of nine is three, because if you squirt particles out three meters, they will cover an area of nine square meters.
Thanks @oscarbenjamin, that makes sense. Not all science-related libraries have this level of repetition/overlap though.
I suppose this convention also comes from the tools commonly used in the field, like Jupyter notebooks: you want to have all the functions/classes available at once, without having to update and re-execute the cell containing imports all the time. In short, this convention plays well with the “exploration” use-case.
I don’t think this kind of thing needs to be standardized. By their nature these are conventions, not part of the actual API of the library.
That seems like overkill to me and I would prefer that linters stop doing that rather than that we support it with additional metadata. The fact that I can import a library under any alias I want is a feature of Python; it means people can choose the names they want to use. If people want to create conventions around that in certain cases that’s fine, but it’s still fine for someone to use other names if they prefer and no proscription should be baked into the library metadata.
This seems like a perfectly cromulent lint rule for a group to specify within their own project, and not at all something that needs to be standardized across all projects.
I don’t want my co-workers/collaborators to come up with their own creative abbreviations for common modules. Our style guide should specify the convention we all adhere to, and a linter can be configured to enforce it. But that convention doesn’t need to be shared with anyone else.
It is a user decision. Did you think anyone was suggesting otherwise?
No. It’s a version of scipy.special that works with Jax’s array type.
Conforming to the style of your peers is a good idea. It makes your code easier to read by your peers. Unless you have a very good reason, you should be writing code in the most idiomatic way possible so that it’s easy to read. But I’m not interested in convincing you of this. Please keep writing code however you want to.
Sorry, but I don’t think it makes any sense to discourage people from writing idiomatic code that matches their peers.
In your own projects, you can write code however you like. You don’t have to enable linter settings that you don’t like.
If someone wants their project to have uniform imports, that’s their decision. And if they can’t turn on a linter rule, that means contributors to that project end up having their submissions corrected in code review. That is more work for everyone: it’s more work for the reviewer, and it’s more work for the contributor, who would have discovered the error more quickly by running the linter.
Trying to stand in the way of other people having access to linter rules they want doesn’t make any sense to me. It’s also so far afield and totally irrelevant to this idea.
The rules already exist in two linters. The point of this feature is to make it easier to collect the standard import names, which the linters can’t realistically maintain.
No, linters don’t have to do anything: linter rules are optional. And even if someone decides that they want to turn a rule on, any user who doesn’t like your decision can simply override it in their project by adding a line: foobar = "foo" (or whatever they like).
Yes, exactly, it’s a convention. But somehow, the user has to find out the convention. And that’s why I suggested that the library should just tell you what it is to save you the trouble of finding it out and specifying it.
As for it being part of the API, I agree with you. That’s why I didn’t suggest putting it in any module. I suggested putting into pyproject.toml along with other project metadata. However, as @oscarbenjamin pointed out, there isn’t (yet) a clear path for that metadata to be transmitted from the project metadata to the installed library.
If only we had some kind of metadata.toml that was produced by the build system from pyproject.toml. This metadata.toml could also contain an entry like typed = true to replace the py.typed file. And who knows, it could contain other kinds of project metadata?
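Just to sketch what I mean (nothing like this exists today; all keys are hypothetical):

```toml
# hypothetical metadata.toml placed into site-packages by the installer
typed = true

[conventional-import-names]
"jax.numpy" = "jnp"
"jax.scipy.special" = "jss"
```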
Then you don’t have to use it. Linter rules are optional.
What do you care which linter rules other people enable? You can do whatever you like in your projects. In large projects with dozens of contributors, uniformity is a huge benefit.
I don’t understand preventing other people from having access to linter rules that you don’t like.
Sorry, but I don’t see what you’re getting at here. You can write whatever code you like. No one is stopping you.
The purpose of sharing how your modules should be imported is to save the users who care about consistent imports (and only those users) the trouble of finding it out. Everyone is free to write whatever code they like. Uniformity is optional. What this feature does is make this optional uniformity easier.
Two different codebases might use different conventions for a given import, though. It’d be unconventional but it can happen, e.g. the normal abbreviation is already taken by a more-relevant import, or two packages define the same abbreviation.
It just seems like something that a project would want to configure in their own linter, and there’s not that great a need to codify it in the project metadata, where it will only ever be a suggestion.
Projects can override the setting in their own linter options already. This would save them the trouble in 99% of cases. It is already extremely unlikely for there to be collisions since the existing conventions tend to gravitate away from collisions.
Thanks for your replies. My initial reaction wasn’t very open, my apologies. Let me try again.
What would it look like for a user wanting to enable a conventional import name rule for one or more specific packages? Let’s say they import from Jax and use Ruff to lint their code. Something like this?
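Perhaps something along these lines in the downstream project’s pyproject.toml, if I’m reading Ruff’s flake8-import-conventions settings correctly (the aliases are just examples):

```toml
[tool.ruff.lint]
extend-select = ["ICN"]   # enable the flake8-import-conventions rules

[tool.ruff.lint.flake8-import-conventions.extend-aliases]
"jax.numpy" = "jnp"
"jax.scipy.special" = "jss"
```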