Adding a global config to specify package indexes

I’d like to propose introducing a standardized config for users to specify package indexes. Right now, each tool must have its own config, which is not optimal. Most users will have their custom indexes in pip.conf, but pypa/build, for example, does not have any knowledge of this. Having a shared standard config that all tools/components of the ecosystem could read would be great. This is something that you almost always want to be configured globally, so having a global config would make sense.

Related to: “Isolated venv does not inherit PIP index” (Issue #270, pypa/build on GitHub)

Yes, someone please write a PEP for this. pip config files have been somewhat of a pain to deal with now that we have so many workflow tools that use pip under the hood, each of them having their own configuration mechanism. A standard configuration would be very useful. (Not sure what “global” means in the title; does it mean global to all package management tools, or global to all Python environments on a machine?)

I’d like to add: think of the credentials as well if possible.

For example pip and poetry seem to both use keyring but use different keys or whatever (it was quite a while ago that I looked at it, I am fuzzy on the details now).

I could take a stab at writing a PEP on this, although I haven’t done that before. In this case I feel it’s very important to bring in input from the different backends that e.g. read pip.conf today. The security aspect is going to be important too.

Some questions:

Doesn’t this also call for being able to configure on multiple levels: interpreter version-global, interpreter version-specific, project-global and project-specific?

Potentially with the ability that a setting can be overridden; project → project-global → interpreter version → interpreter version-global?

I’m thinking this config could be used for other things down the line too, not only indexes and credentials. Would it therefore be good to split this into two PEPs: one for the ability to configure, and one specific to specifying indexes with credentials?

Would it be file-based, or would it e.g. utilize operating system specifics (such as keyring)?
I’m in favor of file-based.

Would such a config be file-based, perhaps in the TOML format?

Would credentials use e.g. base64 encoding so as to prevent storing them in clear text, or is encryption something to strive for?

Should this indexes/credential config also support ssh keys?

Agreed, I would love to see a standardised format that is used by all tools, for things like package indexes, network settings (proxy, certificates), credentials etc.

I assume “global” means “used by all tools”.

The question of “global to all environments” vs “per-environment settings” is a different matter, and something that should be considered (for example, do we want pip to retain the ability to set an index for a specific environment and/or via environment variables? If we do, and the standard solution doesn’t offer that, how would the two mechanisms interact?)

Whoever drafts a PEP for this, please remember to include the extra index URL functionality.

Possibly related: Twine and flit use .pypirc to specify package indexes and (optionally) credentials:

http://packaging.python.org/specifications/pypirc.html

Ok, so to draft a PEP around this, the idea needs to be vetted (according to PEP 1). Should I start by posting the idea in the Ideas discussion, or should we try to have this thread moved there?

For packaging PEPs, the initial discussion can happen on the Packaging category, so that’s not needed.

I’d like to make sure the file location is correctly specified, so perhaps we can do it together. I do the file location and discovery part, and you do the file format. How does that sound?

Absolutely, I’m going to need a lot of input and help on this.

Please think about the order. I provide an internal package manager via Artifactory for all kinds of artifacts (JARs for Maven and Gradle, gems, npm packages and Python wheels), and I would be really annoyed if a single dependency of a project tried to reach out to the internet.
So please let a user-specified configuration (i.e. one in their home directory or one passed as a parameter during invocation) override anything provided in a package or defined globally.

Some ideas… I’m thinking this requires two PEPs:

  1. A common tools config
  2. A package index config standard

A common tools config (PEP #1)

How about using a file in one or more of these locations:

  • /etc/python3/tools.toml - “system” settings (on Windows, maybe under %APPDATA%?)
  • $HOME/.config/python/tools.toml - “user” settings
  • $SCM_PROJECT/.python/tools.toml - “project” settings
  • environment variable PYTHON_TOOLS_CONFIG=/path/to/tools.toml ?

We could then define a precedence scheme where “project” overrides “user” and “user” overrides “system”. If we decide we want the environment variable, it would override “project”.
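
The layering described above could be sketched roughly as follows. This is only an illustration: the function name `merge_layers` and the example keys (`index`, `url`, `timeout`) are hypothetical, and the dictionaries stand in for the already-parsed contents of the three files.

```python
from typing import Any, Dict, List


def merge_layers(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge config layers; later layers override earlier ones, key by key."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are tables: merge them instead of replacing wholesale.
                merged[key] = merge_layers([merged[key], value])
            else:
                merged[key] = value
    return merged


# Hypothetical parsed contents, lowest precedence first.
system = {"index": {"url": "https://pypi.org/simple", "timeout": 15}}
user = {"index": {"url": "https://mirror.example.invalid/simple"}}
project = {"index": {"timeout": 60}}

effective = merge_layers([system, user, project])
```

With this shape, a different policy (for example the request earlier in the thread that user settings win over project settings) is just a different order of the list passed in.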

All of these tools.toml files could have the ability to differentiate settings per interpreter version. I guess this would be delegated to each tool.

Thoughts

  • The above caters not only for package indexes and could be used by any tool.
  • I’m not super fond of the file naming above (e.g. “tools.toml”)… but I figured this could get the discussion going.
  • All of the above means we are not using any existing config file/format, such as the .pypirc from packaging.
  • Can a Python package on PyPI be developed that parses these files and returns the final config? Or should something be included in the standard library here, to help the tools fetch the config?

The structure of this tools.toml needs to primarily dictate how we separate the different tools’ configs. Perhaps something like:

[sometool]

  [sometool.server1]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [sometool.server2]
  ip = "10.0.0.2"
  dc = "eqdc10"

Here we’d have to reserve a tool name for “internal” or “pep-defined” tools, so as to avoid having a community PyPI package trying to claim the internal one.

Package index config (PEP #2)

Ideas

I think this one is a bit more difficult, and here we need to get a lot of input from the community on what the different tools’ needs are.

I believe, at least with pip, you can define as many --extra-index-url options as you like.

At work, we develop a couple of packages which have names that today have no equivalent on PyPI.org. We want to make really sure that the package indexes are not set up in such a way that we by mistake start downloading/installing a package from PyPI with the same name, if one such package is created on PyPI.

So in this case, I would put such internal repos first in the priority order. But then comes the caveat of having to deal with timeouts or 404 errors for the tools, when they try to download all packages from that repo first…

It may be worth thinking further about whether there should be an order of priority in which the indexes are used.
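
To make the priority idea concrete, here is a minimal sketch of a first-match lookup. It illustrates the behaviour being asked for, not how any existing tool resolves packages; `pick_index`, the index names and the package names are all made up.

```python
from typing import Dict, List, Optional, Set


def pick_index(
    package: str,
    ordered_indexes: List[str],
    contents: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the first index, in priority order, that offers the package."""
    for index in ordered_indexes:
        if package in contents.get(index, set()):
            return index
    return None


# Hypothetical situation: a same-named package has appeared on PyPI.
contents = {
    "internal": {"acme-billing"},
    "pypi": {"requests", "acme-billing"},
}
order = ["internal", "pypi"]
```

Here `pick_index("acme-billing", order, contents)` selects the internal index, while `requests` falls through to PyPI. The cost mentioned above is visible too: every package that lives only on PyPI is looked up on the internal index first.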

One idea is also to require clear-text passwords to be entered as base64-encoded strings?
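
One thing to keep in mind when weighing this: base64 is an encoding, not encryption, so it only hides a password from a casual glance. A short sketch with a made-up password:

```python
import base64

password = "s3cret-example"  # made-up value, for illustration only

# Encoding needs no key...
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")

# ...and neither does decoding, so anyone who can read the file can recover it.
decoded = base64.b64decode(encoded).decode("utf-8")
```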

Existing config files

.pypirc

This is what the $HOME/.pypirc today can look like, according to packaging:

[distutils]
index-servers =
    pypi
    testpypi
    private-repository

[pypi]
username = __token__
password = <PyPI token>

[testpypi]
username = __token__
password = <TestPyPI token>

[private-repository]
repository = <private-repository URL>
username = <private-repository username>
password = <private-repository password>

Keyring can be used to then save API tokens and passwords securely:

keyring set https://upload.pypi.org/legacy/ __token__
keyring set https://test.pypi.org/legacy/ __token__
keyring set <private-repository URL> <private-repository username>
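
For reference, a file in the .pypirc layout shown above can be read with the standard library’s `configparser`. The sketch below is only an illustration: `read_index_servers` is a hypothetical helper, and the URL, username and secrets are placeholders.

```python
import configparser

PYPIRC = """\
[distutils]
index-servers =
    pypi
    private-repository

[pypi]
username = __token__
password = example-token

[private-repository]
repository = https://example.invalid/simple
username = alice
password = example-password
"""


def read_index_servers(text: str) -> dict:
    """Map each listed index server to its repository URL and username."""
    # RawConfigParser avoids '%' interpolation, which could mangle passwords.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    names = parser.get("distutils", "index-servers").split()
    servers = {}
    for name in names:
        servers[name] = {
            "repository": parser.get(name, "repository", fallback=None),
            "username": parser.get(name, "username", fallback=None),
        }
    return servers
```

A missing `repository` key comes back as `None` here; what default a tool then applies is up to that tool.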

Twine

Twine can read the .pypirc file, either in your home directory, or provided with the --config-file option. It also has Keyring support.

Poetry

Poetry has two configuration files:

  • ~/.config/pypoetry/config.toml - all poetry configuration, including package indexes
  • ~/.config/pypoetry/auth.toml - clear-text credentials, unless keyring is used

Private repositories can be set up in the config.toml:

[repositories]

    [repositories.myrepo1]
    url = "https://foo.bar/simple"

Any custom repository will have precedence over PyPI by default, but this can be changed on a per-repository level.

And the clear text credentials are stored in the auth.toml:

[http-basic]

    [http-basic.myrepo1]
    username = "foo"
    password = "bar"

According to the Poetry docs on repository management, you can add certificates and define which repository is primary or secondary. You can also disable the use of PyPI.org.

What should this “PEP #2” cover?

When looking at the .pypirc and Poetry, it seems to me that whatever we decide on here needs to be agreed with a lot of popular tools for them to really want to adopt a pre-defined set of configuration settings.

I have a feeling that over time, this PEP will need to be updated, to add certain settings which will be needed by the tools. I am not familiar enough with this, but can a PEP be updated down the road?

But just by looking at the above, I guess we should try to fit the following into the spec:

  • Custom package index (equivalent of --extra-index-url)
    • Priority order
    • Take precedence over PyPI.org (true/false)
    • Certificates (values should be filepaths)
    • Proxy server?
    • Credentials
      • Clear text username
      • Base64-encoded password
      • Token instead of username/password
      • Keyring
      • Filepath to another .toml file which will hold the credentials data
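
To get a feel for what these settings might look like once parsed, here is a rough data-model sketch. All field names, the rule that a lower `priority` number is consulted first, and the `lookup_order` helper are assumptions for discussion, not a proposed spec.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexSettings:
    name: str
    url: str
    priority: int = 0                       # assumption: lower is consulted first
    before_pypi: bool = True                # "take precedence over PyPI.org"
    certificate: Optional[str] = None       # file path to a certificate
    proxy: Optional[str] = None
    credentials_file: Optional[str] = None  # path to a separate credentials file


def lookup_order(indexes: List[IndexSettings]) -> List[str]:
    """Order custom indexes by priority and place PyPI between the two groups."""
    ordered = sorted(indexes, key=lambda index: index.priority)
    before = [index.name for index in ordered if index.before_pypi]
    after = [index.name for index in ordered if not index.before_pypi]
    return before + ["pypi"] + after


# Hypothetical indexes with placeholder URLs.
example = [
    IndexSettings("mirror", "https://mirror.example.invalid/simple",
                  priority=2, before_pypi=False),
    IndexSettings("internal", "https://internal.example.invalid/simple",
                  priority=1),
]
```

For this example, `lookup_order(example)` gives `["internal", "pypi", "mirror"]`.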

One other suggestion: keep credentials outside of the config.toml, but reference repositories by an id in another file which only contains the credentials.
This way, providers of enterprise package managers can easily supply a configuration file that can be updated on its own, and people can more easily change their passwords/credentials.
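
A minimal sketch of that split, assuming both files have already been parsed into dictionaries keyed by repository id. The helper name `attach_credentials` and the `auth` key are hypothetical; the values reuse the Poetry example from earlier in the thread.

```python
from typing import Any, Dict


def attach_credentials(
    indexes: Dict[str, Dict[str, Any]],
    credentials: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, Any]]:
    """Join index settings with credentials that share the same repository id."""
    resolved = {}
    for index_id, settings in indexes.items():
        entry = dict(settings)  # copy, so the shared config is left untouched
        entry["auth"] = credentials.get(index_id)  # None means anonymous access
        resolved[index_id] = entry
    return resolved


# Shared file, e.g. distributed by whoever runs the package manager.
indexes = {"myrepo1": {"url": "https://foo.bar/simple"}}
# Per-user file holding only credentials, keyed by the same id.
credentials = {"myrepo1": {"username": "foo", "password": "bar"}}

resolved = attach_credentials(indexes, credentials)
```

Because the id is the only link between the two files, the index file can be replaced without touching anyone’s credentials, and vice versa.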

I think this could be all in one PEP. It would:

  • Introduce a config directory and define its discovery
  • Define a unified config file for tools
  • Define a config file for package indexes

Maybe the first point can be split into its own PEP, but IMO it should not be needed. Future PEPs could define more config files, similar to the metadata PEPs.

Sounds good, I added this to my previous post.

It’s fine with me to put it all into one single PEP, but I see a clear boundary which could be drawn if we want that. Maybe we can get some input from core devs here on if the scope is too big or if it’s fine to do just one PEP with all of it.

Why does this need to be standardised?

I understand standardising on the package index configuration, but why does the tool-specific configuration need to be standardised like this?

The tool-specific configuration is +0 for me: we don’t need to do it, but it would be nice. The config directory discovery needs to be standardized because its users need to know where it is.

As a pip developer, I have no idea yet what this means for pip, so I can’t answer that question. Are you asking us to remove all of pip’s mechanisms for specifying package indexes (command line options like --index-url, environment variables like $PIP_INDEX_URL, the hierarchy of locations for pip.ini) in favour of this new mechanism? That sounds like a pretty major change for us (and our users) all by itself…

If not, then how are you expecting this change to affect pip?
