Hi everyone,
I’d like to share a draft proposal that could eventually become a Python Enhancement Proposal (PEP) if there’s enough community interest and support. The idea is to integrate automatic Artifact Dependency Graph (ADG) generation into Python’s packaging workflow. This post contains the full draft specification, and I welcome any feedback, suggestions, and discussion before proceeding with a formal PEP submission.
(Note: This is not yet a formal PEP, so no PEP number is assigned.)
Abstract
This idea proposes integrating automatic Artifact Dependency Graph (ADG) generation into Python’s packaging workflow. Specifically, it introduces content-based ADG creation during the bdist_wheel
build process managed by setuptools. The ADG is a machine-readable JSON file (ADG.json
) embedded in the final wheel file (.whl
), enumerating all files included in the distribution and their cryptographic hashes, plus a list of declared dependencies.
This approach aims to improve software supply chain security and vulnerability management by enabling verifiable content-based references to artifacts rather than relying solely on extrinsic and mutable metadata. The ADG can be consumed by vulnerability scanners, SBOM tools, and future Python infrastructure to rapidly identify which distributions contain known-vulnerable components.
Motivation
Current vulnerability management often relies on extrinsic identifiers (CPE, purl, SWID) that depend on naming conventions and mutable metadata. Security teams often struggle to quickly identify which specific wheels contain vulnerable files when a new CVE emerges.
By integrating ADG generation into the build process:
- Each built wheel will contain a verifiable record (
ADG.json
) of its contents, including exact file hashes and a dependency map. - Security scanners can use this intrinsic reference to match known-bad hashes and instantly identify impacted wheels, improving response times.
- This fosters greater transparency and trust in the Python software supply chain.
- ADGs complement and enhance existing SBOM formats, providing intrinsic IDs that can be correlated with external security data.
This approach aligns with modern cybersecurity practices, government directives (e.g., Executive Order (EO) 14028), and supply chain security frameworks (NIST SSDF).
Specification
New ADG Generation Capability:
- Opt-In Configuration: Developers enable ADG generation via
pyproject.toml
:[tool.setuptools.adg] enabled = true output = “ADG.json”
enabled
: If true, ADG generation occurs at wheel build time.output
: The filename for the generated ADG (default “ADG.json”).If not enabled, the build process proceeds as before with no ADG generated.
- When ADG is Generated: The ADG is created at the end of the
bdist_wheel
command, after all files to be included in the wheel are finalized but before writing the finalRECORD
file. This ensures all files and dependencies are known and stable. - ADG Contents:
ADG.json
contains:
- name: The project distribution name
- version: The distribution version
- build: A dictionary with:
timestamp
: The build time (ISO-8601)environment
: Optional description of the build environment
- files: A list of objects, each with:
path
: File path relative to the wheel roothash
: Cryptographic hash (e.g. sha256-…) of the file content
- dependencies: A list of normalized dependency strings (PEP 508 compatible)Example:
{ “name”: “mypackage”, “version”: “1.0”, “build”: { “timestamp”: “2025-01-01T12:00:00Z”, “environment”: “CPython 3.14 on Linux x86_64” }, “files”: [ {“path”: “mypackage/init.py”, “hash”: “sha256-abc123…”}, {“path”: “mypackage/module.py”, “hash”: “sha256-def456…”} ], “dependencies”: [ “requests>=2.0,<3.0”, “otherlib==1.2.3” ] }
- **Integration in
bdist_wheel.py
:**In the current setuptools codebase,bdist_wheel.py
is responsible for producing the final wheel artifact. After reviewing the code provided, we will:
- Hook into the
bdist_wheel.run()
method after the wheel build is completed and before finalizingRECORD
. - Evaluate all included files (using the already-known distribution files from the build steps).
- Compute file hashes (using SHA-256 or another cryptographic hash).
- Retrieve dependency lists from the distribution metadata (already available via
dist.py
anddist.distribution.metadata.install_requires
anddist.distribution.metadata.extras_require
). - Serialize the ADG to
ADG.json
in the{project}-{version}.dist-info
directory. - Update
RECORD
accordingly, ensuringADG.json
is recorded.The logical place for this integration is near whereWHEEL
andMETADATA
files are written (inbdist_wheel.write_wheelfile
or just before finalizing the wheel inbdist_wheel.run
). This ensures the ADG sees the final set of files and that all dynamic build steps are complete.Example pseudocode (not final):# After wheel is built but before RECORD finalization in bdist_wheel.run()if adg_enabled: dist_info_dir = os.path.join(self.bdist_dir, distinfo_dirname) adg_data = generate_adg_data(self.distribution, archive_root) adg_path = os.path.join(dist_info_dir, output_filename) write_json(adg_path, adg_data) # RECORD will pick up ADG.json on finalize
- Determining Dependencies and Hashes:
- Dependencies are obtained from
distribution.install_requires
anddistribution.extras_require
as per existing logic indist.py
where requirements are normalized. - Files are enumerated from the wheel build process (already collected by setuptools commands like
egg_info
andbuild_py
). - Each file is read, and a SHA-256 hash is computed (using
hashlib.sha256
). - The ADG is then written out as UTF-8 JSON.
- Performance Considerations:
- Hashing is fast and only done once per file.
- The feature is optional, minimizing impact for users who do not enable ADG.
- Future enhancements may support caching or incremental builds if needed.
- Interoperability:
- The ADG can be correlated with external SBOM formats.
- Tools can parse
ADG.json
to quickly identify vulnerable wheels if they have a known-bad hash database. - This proposal does not alter wheel format standards; it just adds an extra file.
Backwards Compatibility
- The ADG generation is opt-in. If disabled, behavior is unchanged.
- If enabled, it adds an extra file (
ADG.json
) to the.dist-info
directory. - Existing tooling that ignores unknown files in
.dist-info
will remain unaffected.
Security Implications
- Embedding file hashes improves trust and traceability.
- Attackers tampering with wheel contents would need to alter
ADG.json
andRECORD
, which is detectable. - Future work may include digital signatures for even greater authenticity.
How To Teach This
- Document in setuptools user guide: a new section on enabling ADG.
- Show example
pyproject.toml
and howADG.json
looks. - Tutorials on how security teams can query ADG data for quick vulnerability checks.
Reference Implementation
A reference implementation will be provided in a fork of setuptools:
- Modifications to
setuptools/command/bdist_wheel.py
to:- After
wheel_path
creation, compute file hashes for included files. - Gather dependencies from
dist.py
finalized distribution. - Write
ADG.json
intodist_info_dir
.
- After
- Add logic to
dist.py
or a utility function to extract and normalize dependency data and project name/version. - Add tests in
setuptools/tests/test_bdist_wheel.py
verifyingADG.json
presence, correctness of file hashes, and dependencies recorded.