Conventional import name specification

NeilGirdhar · April 6, 2025, 2:46pm

The way Ruff enables rules is through the select mapping:

[tool.ruff.lint]
select = ["unconventional-import-alias"]

Then you use [tool.ruff.lint.flake8-import-conventions.extend-aliases] to add aliases.

The proposal in this OP is that you would not need to specify extend-aliases because it would gather all the aliases from the packages, thus saving you from a maintenance burden. You should not have to do anything like conventional-import-names = [ "jax"].

pf_moore · April 6, 2025, 4:03pm

This all seems like it’s purely a tool UI issue, not something that should be (or needs to be) standardised.

If you did want to pursue it as a standard, though, some questions come to mind:

How would I enable this check in flake8? In pycodestyle? Or is this just a ruff feature?
How would I disable the check for one or more particular modules?
How would I override the check for a small section of my code (for example, if I had a local variable called np in one of my functions, so I chose to use the full numpy name in that module)?

You could argue that all of these are tool UI matters, but that’s my point - this whole proposal feels like a tool UI matter to me.

Taking this as purely a matter of what could be standardised, I can imagine a standard for storing tool-specific data in the installed metadata for a project. This might take the form of a tool subdirectory under the project’s .dist-info directory. That would be a plausible proposal, but it would need motivating examples to justify it. You could start with this case, if you have support from the ruff maintainers for using it, and support from projects like numpy, Pandas and jax stating that they would be willing to publish such metadata. But one use case is unlikely to be enough to justify a new feature, so what other tools would find this useful to allow them to implement features that they would like to add?

NeilGirdhar · April 6, 2025, 4:42pm

Yes, all of those questions are tool UI matters that would be determined by particular linters.

Yes, “exported metadata” would be awesome. Thanks for providing the expert details for what that would look like.

Yes, that makes perfect sense.

Yes, I 100% agree with you. When Oscar made the point that there is currently no “exported metadata” (the feature you’re talking about), I realized that my proposal would probably be unlikely to justify it. So, we’re all agreed.

The one use case for project metadata I could see is declaring which packages are typed. E.g., Scipy might specify typed = ["scipy.special"], and Numpy might say typed = true (everything). This would replace the py.typed file that we currently have. It’s still not enough justification for exported metatada though.

I think we should defer this proposal for the time being, and wait to see if other uses of exported metadata come up over the next few years. (You’re a bit of an expert in this domain, maybe you can come up with some good ones?) If we gather enough use cases of exported metadata, we can reconsider this proposal. What do you think?

pf_moore · April 6, 2025, 5:11pm

Nope, my point here is that I don’t think there are sufficient use cases to justify this. It’s barely possible it might be added on the basis that giving tools the option could be an enabling feature, but that’s unlikely to happen except as part of some other PEP dealing with files in .dist-info. There’s currently the SBOM data PEP, but I don’t think there’s the appetite there for a broader overhaul of project metadata files.

Agreed, with the proviso that the “we” in that sentence is doing a lot of work. No-one is likely to collect such use cases without a goal in mind, and I don’t see anyone being particularly motivated to push this forward for its own sake. But if someone does want to take this on, that’s the way to do it.

NeilGirdhar · April 6, 2025, 5:22pm

Yes, that’s what I’m saying.

oscarbenjamin · April 6, 2025, 6:32pm

Specifically what use cases would be needed for is for storing non-packaging data in the .dist-info directory rather than the top level import package in site-packages. Bear in mind that these things do not necessarily have the same name and are not even in one-to-one correspondence. A project can be called e.g. jaxlib on PyPI and have a jaxlib-version.dist-info directory while providing a top-level import package called jax so you would pip install jaxlib but import jax.

The .dist-info directory is typically only looked at by packaging tools whereas other tools typically only look in the import package. You are proposing to put non-packaging metadata into the .dist-info directory and the obvious question is: why not put it in the import package?

The reason that py.typed is in the import package is because that is where a type checker looks to go find the imported module. It makes sense that the type checker looks there for the py.typed file rather than scanning all the .dist-info directories to see which one refers to a given importable name. The type checker doesn’t need to care about anything to do with packaging or installation: it just knows that some modules are imported and knows how to find them like the interpreter does using sys.path.

I suggested before that you could have a jax/import_names.txt: why would the import conventions for the jax module be anywhere other than in the import package?

pf_moore · April 6, 2025, 7:57pm

This is a very good point, and having framed it like that I agree that making this into packaging metadata is probably not the best approach. Instead, I’d suggest that this would be better handled by ruff recognising a particular named file in the import package. That can be purely a ruff convention, unless other linters decide it’s of value, at which point the various linters could agree on a shared convention and format. That way, it doesn’t need to be something that any of the standards bodies (language or packaging) need to be involved in. And conversely, if it can’t get traction as an informal linter convention, it’s unlikely to get much support as a standard.

Pierre-Sassoulas · April 6, 2025, 9:23pm

As far as I know this feature was never requested in pylint. If the standard abbreviation is standardized somewhere official (i.e. maintainer can put that information in the metadata, or the user’s project’s metadata contain information about import aliases), then surely this will ends up being implemented in pylint because it will be an official python standard that pylint would follow. But we didn’t have any demands for that yet.

The convention of pd for pandas and np for numpy are well established, and it’s hard to code without it because numpy or pandas code is often math with complex formulas that become unreadable very fast with full import, so it makes sense for linter to enforce it. I don’t know if it’s a good convention for other libraries where you should not want to be using an import 8 times in a single line. But If someone were to open an issue for that in pylint we would label it enhancement, realize that we need to make it generic and user configurable, and wait for someone to implement it.

What would not make sense though would be for the pandas team to be able to change from pd to p in a release, then to pan in the next through packaging metadata and for linters to blindly follow that and have different results based on the version of pandas installed for the exact same linter’s version. Then having to add a user option of their own because users want to override the default and import xxxxxxxxxxx as x forever even if their dependency maintainers change their mind.

Imo the default (numpy => np, pandas => pd) should be set in the linter for a very few package where it’s an accepted community convention, and never ever change. So, I think if a configuration to override the convention were to be added, it should either be in the user’s pyproject.toml “project.import-aliases” entry (or whatever is the best name for it), or in the linter’s own section (so nothing to do, and everything happens in each linters as Paul Moore suggested).

chepner · April 7, 2025, 1:18pm

I consider it a feature that there’s at least something forcing you to look at the code you are copying instead of just blindly assuming it will work.

NeilGirdhar · April 7, 2025, 2:57pm

I don’t think that’s likely to happen.

But if, in some odd universe, something like that does happen, then it’s just like any API change where client code that wants to adapt is free to adapt, and linter tools help client code that wants to adapt to do so.

Ultimately there’s a balance that different users will choose between setting fewer options versus setting more options.

The issue is that there are more conventions than just “a very few”. If you start using the new array_api_extra, then the convention is xpx. Unless you’re familiar with the package, you’ll either have to research that, or just guess what you like. I’d rather just have my linter tell me what to do since I don’t care what the convention is; I just want to follow it.

And more importantly, I don’t want to commit all these abbreviations to memory. I just want to guess anything and have the linter correct me. The linter acts as guard rails so that I can focus on the coding decisions that matter.

sirosen · April 8, 2025, 6:51am

In that case, why not implement a linter which enforces these rules for you? It can be done quite easily using only AST.
If you’re looking for prior art to copy, I’ll plug my own flake8-typing-as-t, which implements this kind of rule as a flake8 plugin.

ruff isn’t the only linter. It’s just a very popular one. If these naming rules are important to you, why not write something which works for your needs and see if it gains popularity?

(And in case speed is going to come up, as it so often does with ruff, I’ll note that such an AST lint is quite fast on codebases under 100K SLOC and can be run under pre-commit for even lower impact. So please consider the practical impacts of speed differences here, not just the theoretical truism that faster is better.)

I get that it would be nice if existing tools did this for you, but it’s not like you’re unable to act on this if you feel strongly. The ability to implement your own tools has long been one of the joys of Python – let’s not lose sight of that.

There’s another issue at play in this thread, which is that various libraries probably should not be involved in the linting rules which apply to their users, where that’s avoidable. Or at least, I think so and others seem to think so as well, possibly for different reasons.

Having every installed package contribute to your linting is a lot of exposure to unpredictable behaviors. For example, consider what will happen if a user has a distro-packaged version of numpy installed – it contributes to lint rules, but it’s always available and at a fixed version. If your project uses a different numpy version but you don’t install numpy when linting, you’ll get a set of rules, but not the right ones. That sort of thing isn’t a big deal to experienced users but it’s completely baffling for novices.

This is already the case for type annotations, and one of the implications of that is that when running a type checker, you must have all of your dependencies installed. And if you support multiple major versions of a dependency, you actually need to type check against each version, which quickly gets messy. Applying that same requirement to a broader class of lints is a pretty significant issue.

Once we consider how much cost there is to having libraries publish their own lint rules, do we see a set of benefits which counterbalance those costs? I don’t.

NeilGirdhar · April 8, 2025, 7:28am

Sorry, but this suggestion has nothing to do with the post at all. The OP is about libraries providing their own metadata to mitigate having to configure them. The linting part is already handled by existing tools.

This was already discussed above. And, it’s already the case that linter rules can depend on installed dependencies. So, yes, you should already be installing the development dependencies when running linters. That’s not special for this issue.

This is already the case for linting and type checking. Nothing has changed with that.

That’s not an accurate description of what I’m asking for. I’m asking for libraries to publish metadata that is useful for linting—just like they publish metadata that is useful for typing.

And I’ve already accepted that it may not be worth the cost for this feature alone.

sirosen · April 8, 2025, 8:16am

I meant that suggestion very much in earnest. So calling it ridiculous feels at least a tiny bit insulting. Perhaps you are significantly overestimating how hard it is to write an AST linter?

flake8-typing-as-t is about 80 lines of code and handles version dispatch for typing_extensions imports gracefully – meaning it’s more sophisticated than some of the common cases.

You can do the simplest version of “numpy as np” linting in about 10 lines of code. Add in a nice CLI and it will grow a bit in size.

To be blunt though, I’m not that interested in continuing this conversation since my input is apparently unwelcome.

NeilGirdhar · April 8, 2025, 8:18am

I’m sorry, but did you even read the post? The linter is already available since Ruff does it.

This post is about libraries providing metadata that is useful to linting.

Your input is always welcome when you make an effort to read some of the thread before participating. I think that reading the post before contributing would make sense, right?