Introducing asaman: A tool to build reproducible wheels

asaman: Amra Saman

This is a tool to build reproducible wheels for your Python project or for all of your dependencies. This means that if you use the same operating system version and the same system-level dependencies, you will always get the same wheel. This gives us a bit more protection from supply-chain attacks: any user of the wheels can verify that they are using the correct build from the exact source by rebuilding the wheels themselves.

Why do we need a reproducible wheel?

A few different positive points:

  • If we build the wheels from a known source (via pinned hashes in a requirements file), we can also verify that we are using the correct wheels built from it.
  • Any user/developer can rebuild the wheels from the pinned source and should get the exact same wheel as output. Thus, if anything gets into the build process (say in CI), or the wheel is actually built from a different source, automated tools can detect it.

One negative point is that for any package with compiled extensions, the tool creates a platform-dependent wheel (for example linux_x86_64), not a manylinux one, which means those wheels cannot be shared via PyPI.
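One way to spot such wheels before trying to upload them is to look at the platform tag, which is the last dash-separated field of a wheel file name. The following is a minimal sketch of that check; the helper names platform_tag and is_plain_linux are made up for illustration and are not part of asaman.

```python
def platform_tag(wheel_filename):
    """Return the platform tag, the last dash-separated field of a wheel name."""
    stem = wheel_filename.rsplit(".", 1)[0]  # drop the ".whl" suffix
    return stem.rsplit("-", 1)[-1]


def is_plain_linux(wheel_filename):
    """True for wheels tagged linux_<arch> rather than manylinux or any."""
    return platform_tag(wheel_filename).startswith("linux_")


if __name__ == "__main__":
    for name in (
        "cryptography-35.0.0-cp39-cp39-linux_x86_64.whl",
        "click-8.0.1-py3-none-any.whl",
    ):
        print(name, platform_tag(name), is_plain_linux(name))
```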

How to install?

python3 -m pip install asaman

How to build reproducible wheels?

asaman --help
Usage: asaman [OPTIONS]

  Tool to build reproducible wheels.

Options:
  -s, --source FILE          A single source tarball or zip file.
  -d, --directory DIRECTORY  A directory containing all source tarballs and
                             zips.
  -o, --output DIRECTORY     The output directory to store all wheel files.
                             Default: ./wheels
  -r, --requirement FILE     Path to the requirement.txt file which contains
                             all packages to build along with hashes.
  --sde TEXT                 Custom SOURCE_DATE_EPOCH value.
  --help                     Show this message and exit.

To build a reproducible wheel for a given source tarball:

asaman -s dist/yourpackage_4.2.0.tar.gz

By default, the freshly built wheel is stored in the ./wheels/ directory. You can select a different directory with the -o or --output option.

To build reproducible wheels for all the sources in a directory:

asaman -d path/to/sources/

Or, you can point to a requirements file which contains all the dependencies along with hashes.

asaman -r requirements.txt

How to generate a requirements file with hashes from the reproducible wheels?

asaman-generate requirements.txt

The asaman-generate command will help you to create a fresh verified-requirements.txt, which
will contain the hashes from reproducible wheels. You can use the -o/--output option to set a
custom file name.

asaman-generate --help
Usage: asaman-generate [OPTIONS] REQUIREMENT

  Tool to build verified requirements file from reproducible wheels.

Options:
  -o, --output FILE       The output file. Default: verified-{requirement}.txt
  -w, --wheels DIRECTORY  The directory with reproducible wheels.
  -s, --skip TEXT         The packages we don't want in our final requirement
                          file.
  --help                  Show this message and exit.
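Conceptually, each line of the verified file pins a package version to the SHA-256 of a locally built wheel, in the --hash=sha256:... form that pip's hash-checking mode accepts. Here is a minimal sketch of that idea. It is not asaman-generate's actual implementation, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def requirement_line(wheel_name, content):
    """Build one pinned requirement line from a wheel file name and its bytes."""
    # Wheel file names start with "<distribution>-<version>-".
    name, version = wheel_name.split("-")[:2]
    digest = hashlib.sha256(content).hexdigest()
    return f"{name}=={version} --hash=sha256:{digest}"


def verified_requirements(wheels_dir, skip=()):
    """Return requirement lines for every wheel in wheels_dir, minus skipped names."""
    skipped = {name.lower() for name in skip}
    lines = []
    for wheel in sorted(Path(wheels_dir).glob("*.whl")):
        if wheel.name.split("-")[0].lower() in skipped:
            continue
        lines.append(requirement_line(wheel.name, wheel.read_bytes()))
    return lines


if __name__ == "__main__":
    # Prints one line per wheel found in ./wheels (nothing if the directory is empty).
    print("\n".join(verified_requirements("./wheels")))
```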

How to create a requirement file with hashes from PyPI or your personal index?

Use the pip-tools project.

pip-compile --generate-hashes --allow-unsafe --output-file=requirements.txt requirements.in

Please make sure that you note down all the build dependencies of every dependency; otherwise, during the build process, pip will download them from PyPI and install them in the build environment. If you are building from a requirements file, you can see in the output, while each source tarball is downloaded and extracted, whether the dependency has any build time dependencies. Otherwise, you can look up the build time dependencies manually.

For example, in the following output you can find a few packages with build time dependencies.
Look at the lines with "Getting requirements to build wheel".

Collecting build==0.7.0
  Using cached build-0.7.0.tar.gz (15 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting click==8.0.1
  Using cached click-8.0.1.tar.gz (327 kB)
Collecting packaging==21.0
  Using cached packaging-21.0.tar.gz (83 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting pep517==0.11.0
  Using cached pep517-0.11.0.tar.gz (25 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
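Scanning such output by eye gets tedious for long requirements files. As a rough sketch, assuming the log has the shape shown above, the packages with build time dependencies can be picked out like this (the function name is made up for illustration):

```python
def packages_with_build_requirements(log_text):
    """List packages whose log section has a 'Getting requirements to build wheel' line."""
    found = []
    current = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Collecting "):
            # Remember which package the following lines belong to.
            current = stripped[len("Collecting "):]
        elif stripped.startswith("Getting requirements to build wheel"):
            if current is not None and current not in found:
                found.append(current)
    return found


if __name__ == "__main__":
    sample = """Collecting build==0.7.0
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
Collecting click==8.0.1
  Using cached click-8.0.1.tar.gz (327 kB)
"""
    print(packages_with_build_requirements(sample))
```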

Bootstrapping the build environment

For any production use, you should also bootstrap the build environment: create the initial virtual environment and build all dependencies in that environment only. You can store the wheels anywhere you want (S3 or git-lfs, for example) and start from there when creating the environment next time.

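In the following commands, we will create a set of wheels for such a bootstrap environment, where the build requirements are listed in bootstrap.in. The first line below is the content of bootstrap.in; the remaining lines are the commands to run.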

asaman >=0.1.0
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install pip-tools # This is coming directly from PyPI
pip-compile --generate-hashes --allow-unsafe --output-file=bootstrap.txt bootstrap.in
asaman -r bootstrap.txt

This will create all the wheels in the ./wheels directory.
From next time, one can install them from the ./wheels directory.

But first, we will create a new requirements file with only the hashes from our reproducible wheels. The output file name will be verified-bootstrap.txt.

asaman-generate bootstrap.txt

Now we can use this file to create the environment.

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --no-index --find-links ./wheels --require-hashes --only-binary :all: -r verified-bootstrap.txt 

Need feedback

This is a really early release. I know the idea behind the tool works, as in SecureDrop we have been using reproducible wheels for some time now. I wrote a blog post about the same before.

Please play with the tool and let me know what you think.

NOTE: still missing Windows support.


Thanks to the feedback I changed the project name to a simpler one.


Thanks @kushaldas for sharing! This looks super interesting from a supply-chain security perspective.

Some high-level questions from me before I get a chance to experiment with this:

  • How does it work? What does this do differently than running python setup.py bdist_wheel, aside from the extra UX around requirements files?
  • Are there any situations where a wheel still might not be reproducible, even when using this tool?
  • Any future plans or improvements that you or SecureDrop has for the project?

uh, nice!

Additional questions:

  • does the tool also create reproducible builds for binary wheels?

This builds via build into a predefined path (the default build path is /tmp/pip-wheel-build; I chose this because SecureDrop is already using it) and uses a predefined SOURCE_DATE_EPOCH. The default value is, again in memory of Aaron Swartz, the time when he started writing the SecureDrop project.
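To make the two ingredients concrete, here is a rough sketch of how a build could be launched with a pinned timestamp from a fixed source path. It is not asaman's actual code: the function name is hypothetical, EXAMPLE_EPOCH is a placeholder rather than the tool's real default, and it assumes the sources have already been unpacked into /tmp/pip-wheel-build and that the build package is installed.

```python
import os
import sys

# Default build path mentioned in the post.
BUILD_DIR = "/tmp/pip-wheel-build"
# Placeholder timestamp for illustration only; NOT asaman's real default.
EXAMPLE_EPOCH = 1_600_000_000


def reproducible_build(source_dir=BUILD_DIR, out_dir="./wheels", epoch=EXAMPLE_EPOCH):
    """Return (command, environment) for a wheel build with a pinned timestamp."""
    env = dict(os.environ)
    # Backends that honor SOURCE_DATE_EPOCH use it instead of the current time.
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    command = [sys.executable, "-m", "build", "--wheel", "--outdir", out_dir, source_dir]
    return command, env


if __name__ == "__main__":
    command, env = reproducible_build()
    print("SOURCE_DATE_EPOCH=" + env["SOURCE_DATE_EPOCH"], " ".join(command))
    # To actually run the build:
    # import subprocess; subprocess.run(command, env=env, check=True)
```

Building from the same absolute path every time matters because compiled extensions can embed that path in the resulting binaries.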

Reproducibility breaks if there is any change in the underlying build environment. For example: a dependent library on the system was updated, or a different version of Rust is used (for any extension written in Rust).

My first plan is to identify all the corner cases and work on those. Step 2 would be to integrate this into the SecureDrop project, but that will depend on the rest of the team and what they think. SecureDrop's internal build scripts are very much dependent on the workflow SecureDrop uses, and include OpenPGP and multiple other steps.

Maybe before step 2, I will try to work on the top projects from PyPI and see if we can build them reproducibly. Maybe also figure out how to run a service to build everything in a reproducible manner.

Yes, including cryptography if you use the same Rust environment :smiley:


For example, two builds on a Fedora 34 system:

sha256sum ./wheels/cryptography-35.0.0-cp39-cp39-linux_x86_64.whl second/cryptography-35.0.0-cp39-cp39-linux_x86_64.whl 
04fb50f874d1d0796f4bb8da53815c7ee16224dfb75c915add9ea4bead2887ee  ./wheels/cryptography-35.0.0-cp39-cp39-linux_x86_64.whl
04fb50f874d1d0796f4bb8da53815c7ee16224dfb75c915add9ea4bead2887ee  second/cryptography-35.0.0-cp39-cp39-linux_x86_64.whl
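The same comparison can be automated for a whole directory of wheels. This is a small sketch, assuming two independent builds were written to two separate directories; the function names are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_builds(first_dir, second_dir):
    """Map each wheel name found in both directories to True if the hashes match."""
    results = {}
    for wheel in sorted(Path(first_dir).glob("*.whl")):
        other = Path(second_dir) / wheel.name
        if other.is_file():
            results[wheel.name] = sha256_of(wheel) == sha256_of(other)
    return results


if __name__ == "__main__":
    # Stand-in files so the sketch runs without real wheels.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for directory in (a, b):
            (Path(directory) / "demo-1.0-py3-none-any.whl").write_bytes(b"same bytes")
        print(compare_builds(a, b))
```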

The OP appears to mix asaman and amrasaman, which as far as I can tell are the same package. Probably the former is the official spelling and the latter was a transient name?


Thank you for pointing out the old reference, fixed it now.


@joerick, do you think that it’s a good idea to use this in cibuildwheel in the near future?

Probably hard to use this on cibuildwheel until this restriction is removed.


@joerick, do you think that it’s a good idea to use this in cibuildwheel in the near future?

Perhaps. One way that might be convenient for everyone would be if asaman could be integrated into the modern build toolchain as a build backend as per PEP 517 - would that be possible? And then we’d have to check if auditwheel/delocate are fully deterministic too.


Or, alternatively, could we integrate these features into existing build backends for wheels?

FWIW, I think pushing PEP 517 backends to support reproducibility is definitely a worthwhile effort.

The mechanism for this tool is using SOURCE_DATE_EPOCH and having a deterministic build directory. Making PEP 517 backends respect SOURCE_DATE_EPOCH will basically do the needful here. Both setuptools and flit respect that variable (which is why this tool works).
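To illustrate why honoring SOURCE_DATE_EPOCH matters: a wheel is a zip archive, and every zip entry carries a timestamp, so two otherwise identical builds differ unless the timestamp is pinned. The sketch below shows the general technique with the standard zipfile module. It is not how setuptools or flit implement it, and build_archive is a name invented for this example.

```python
import io
import time
import zipfile


def build_archive(files, epoch):
    """Pack files (name -> bytes) into a zip whose metadata depends only on epoch."""
    # Zip entries store a (year, month, day, hour, minute, second) tuple;
    # the format cannot represent dates before 1980.
    date_time = time.gmtime(epoch)[:6]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed file permissions
            archive.writestr(info, files[name])
    return buffer.getvalue()


if __name__ == "__main__":
    sources = {"pkg/__init__.py": b"x = 1\n"}
    # Placeholder epoch for illustration; a real build would read SOURCE_DATE_EPOCH.
    first = build_archive(sources, 1_600_000_000)
    second = build_archive(sources, 1_600_000_000)
    print("identical:", first == second)
```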


@kushaldas I’m curious – have you explored whether Poetry based projects are built deterministically?