Sorry to sound so negative - I’m genuinely confused
as to what this is trying to offer.
No worries, it didn’t come across as negative at all; you brought up
points that we should have addressed, but writing is hard since we don’t
know what readers expect to know, hence this topic here.
It’s a “curated” index - who is doing the curation,
and what are the criteria?
Someone you trust, and the criteria you deem fit. To make it
less confusing, let’s refer to the tool chain as IPWHL and the sample
index (e.g. git.sr.ht/~cnx/ipwhl-data) as floating cheeses.
Floating cheeses’ policies are not yet as strict as we would like;
for the moment they include:
- The project is valid, e.g. not a typo squatting attempt.
- The built distribution is built, or can be verified to have been built,
from the version-controlled source (if only reproducible wheels
were common in the wild!).
- The diff against the previous version does not contain
any suspicious change.
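To make the second rule concrete, here is a minimal sketch of how one might check a published wheel against a local rebuild. It assumes the build is bit-for-bit reproducible, which is often not the case; the function names are hypothetical and this is not what the IPWHL tooling actually does.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_rebuild(published: Path, rebuilt: Path) -> bool:
    """True if the published wheel is byte-identical to our own rebuild.

    Hypothetical helper: only meaningful when the build is reproducible.
    """
    return sha256_of(published) == sha256_of(rebuilt)
```

When the two digests differ, that alone does not prove tampering; it may only mean the build embeds timestamps or other non-deterministic data.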
A curated index presumably works on the basis that users trust
the curation - how is that trust established here?
I see no immediate reason why I should trust this index.
Trust is inherently social and should be established accordingly,
e.g. you and I have some level of mutual trust because we have interacted
extensively before (although the level should not be very high
since we have not communicated regularly for almost two years),
Huy is an IRL friend of mine so you two may (or may not) trust
each other, and the web of trust can also expand to people who
trust either of you.
I hear you, this is way too naïve, but so is choosing to trust
(an uploader of) an upstream library. Moreover, we humans
cannot keep track of many others. Distributions like Debian,
FreeBSD or floating cheeses narrow the number of identities to trust
from tens or hundreds of thousands to one or a few.
I understand that by only allowing “approved” packages,
certain exploits are avoided, but unless the approvers can be trusted,
other exploits are possible (for example, a malware-infected copy
of requests could be placed in the index, if the checks applied
to ensure that only the requests authors can provide the code
for requests are inadequate).
To put it briefly, right now users should not trust
floating cheeses: we maintainers are not security experts
(we hardly know what we are doing for rule 3) and rule 2 has only
recently been applied, i.e. most of the wheels were not vetted.
We have tried to get publicity for months, hoping that more experienced
folks would chime in.
Our ultimate goal is the adoption of IPWHL, so that once trust
is established, distributing packages securely and efficiently is trivial.
why the weird protocol? It appears that I need to have software
installed on my PC to talk to the repository, rather than just
using the standard https protocol. If this repository is only intended
for people who have already bought into whatever IPFS is, it would be
useful to make that clear up front.
Our security measures in no way try to be perfect,
but a content-addressable delivery mechanism like IPFS is important
in a few ways:
- It should be easy to verify and modify an index, e.g. this wheel
really is from here, and I can replace it with my patched version there
while sharing the same CDN for the other ones. As with BitTorrent,
more people using the same thing should make things faster
rather than demand more infrastructure.
- From one hash, e.g. QmQESYddXAEFiLUofuiNqFs7KdmNWY67NJwmme51y4pmux,
one can obtain the hash of every wheel in that index version
(like all the previous hashes of a Git commit). This is why the IPFS
node should be run by someone you trust, ideally locally, the same way
TLS is handled in the browser or pip computes the hash client-side.
An organization can share a single node, but a compromised public one
shows no signs.
Since Git also uses a Merkle DAG, this analogy might help:
with HTTPS you can make sure the repository you clone is the same
as the remote one, but if the remote is compromised, only the commit
hash can verify it’s the one you want. (Hash collisions exist,
but with SHA-256 they are not yet an attack vector.)
Similar efforts toward content-addressable distribution are also being
experimented with for Nix and Guix (although not with IPFS).
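The idea that one hash commits to every wheel in an index version can be sketched with a toy two-level hash. This is only an illustration of the principle: IPFS uses its own chunking, DAG layout and CID encoding, and the `index_id` function and the listing format below are made up for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of some bytes."""
    return hashlib.sha256(data).hexdigest()


def index_id(wheels: dict[str, bytes]) -> str:
    """Derive one ID covering every wheel (toy scheme, not a real IPFS CID).

    Each wheel is hashed, the "filename hash" lines are sorted and joined,
    and the listing itself is hashed. Changing any wheel changes the ID.
    """
    listing = "\n".join(
        f"{name} {digest(content)}" for name, content in sorted(wheels.items())
    )
    return digest(listing.encode())


# Placeholder file names and contents, for illustration only.
original = {"foo-6.9-py3-none-any.whl": b"foo", "bar-4.20-py3-none-any.whl": b"bar"}
patched = {**original, "foo-6.9-py3-none-any.whl": b"foo, patched"}

print(index_id(original) == index_id(patched))  # a patched wheel yields a new ID
```

Whoever knows the ID they want can recompute it from what a node serves, so a compromised node cannot silently swap a wheel; the node only has to be trusted if the check is delegated to it.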
Also what does “singly-versioned” mean?
Since it’s not possible to import multiple versions of a package,
which version gets included alongside a collection of other packages
should be determinable in advance.
Will it not be possible to get requests 2.27.0 from the index?
What will happen to requests 2.28.0 when 2.29.0 is released?
Each version of requests, if needed, will be on a different
index release. People usually don’t pin a version because they
like the number, but because it’s the first one they know to work.
Optimally, they should be provided with the best (often implying
updated) one that works.
Will I be unable to pin my dependencies, or use packages
that don’t support the new version of requests for some reason?
As mentioned earlier, instead of pinning some or every single package,
you pin the whole index. Back to the issue of trust, it’s a lot easier
to bump (or roll back) an index instead of each package individually.
From here, we wish to encourage upstream to not pin dependencies
in package definitions and facilitate reusability among packages.
While upstream can have an index version for development,
downstream can test across a wider range of releases. Say bar
depends on foo, maintainers of bar may pin bar 4.20 against foo 6.9,
but when foo 6.10 comes out floating cheeses can push
the update to a testing index for users who like to live on the edge.
More importantly, if baz also depends on foo, and foo 6.10 is
an important (e.g. security) fix, downstream can push the update,
simultaneously effective for both bar and baz.
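As a rough model of the scenario above, an index release can be pictured as a mapping from each package to exactly one version, and an update as a new release derived from the old one. The `bump` helper and the baz version are hypothetical; IPWHL's real data format is not shown here.

```python
def bump(index: dict[str, str], package: str, version: str) -> dict[str, str]:
    """Return a new index release with one package moved to another version.

    The original release is left untouched, so rolling back simply means
    pointing at the previous release again.
    """
    if package not in index:
        raise KeyError(f"{package} is not in this index")
    return {**index, package: version}


# One version per package; the baz version is invented for the example.
stable = {"foo": "6.9", "bar": "4.20", "baz": "1.0"}

# foo 6.10 lands in a testing release; bar and baz both pick it up at once.
testing = bump(stable, "foo", "6.10")
print(testing)
```

Users of bar and baz who follow the testing release get foo 6.10 together, while anyone pinned to the stable release keeps foo 6.9 until that release is bumped.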
Collective testing is not even possible with
a warehouse like PyPI. When foo 6.10 comes out, PyPI users have to
find out for themselves whether any incompatibility exists,
in which case bar’s support channels will be flooded with reports.
In short, IPWHL is a set of tools to, from a collection of bdists,
generate a single ID from which an index can be collaboratively
distributed and modified in a (hopefully more) secure and efficient way.