Why isn't source distribution metadata trustworthy? Can we make it so?

pganssle · January 23, 2020, 5:15pm

This is because the input isn’t reliably deterministic. Consider the extreme example from Dustin’s blog post on this:

from setuptools import setup
import random

setup(
  name="paradox",
  version="0.0.1",
  description="A non-deterministic package",
  install_requires=[random.choice(["Dep1", "Dep2"])]
)

The much more common scenario is one where the dependencies are computed from the platform that's building from the sdist. Environment markers have since replaced this use case, though most people don't know about them:

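import sys
from setuptools import setup

install_requires = ["Dep1"]
# Evaluated on whichever machine runs setup.py, so the result varies by build environment
if sys.version_info < (3, 7):
    install_requires.append("importlib-metadata")

setup(...,
    install_requires=install_requires
)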
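For comparison, here is a minimal sketch of the same dependency expressed with an environment marker. The condition moves into the requirement string (the PEP 508 marker syntax), so the list is identical on every platform and the installer decides whether the marker applies:

```python
# Sketch of a setup.py fragment: the version check is part of the
# requirement string instead of being an `if` that runs at build time.
install_requires = [
    "Dep1",
    'importlib-metadata; python_version < "3.7"',
]

# This list would be passed to setup() unchanged:
# setup(..., install_requires=install_requires)
```

Here the list is a plain literal. In the imperative version it is whatever the `if` produced on the build machine.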

By the time it gets to setuptools, it’s just a list, and we don’t know if it was generated dynamically or not. If the dependencies are specified in setup.cfg, we know they are reliable, and there’s an open issue to fix this. As others in the thread have mentioned, we can almost certainly parse setup.py into an AST and, in many basic cases, determine whether the dependencies are deterministic or not.
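A rough sketch of what such an AST check could look like, using only the standard library `ast` module. The function name `install_requires_is_literal` is made up for illustration and the heuristic is deliberately conservative: it only answers "yes" when the value passed to `setup()` is a literal, and treats everything else as unknown.

```python
import ast
from typing import Optional


def install_requires_is_literal(setup_py_source: str) -> Optional[bool]:
    """Inspect setup.py source without executing it.

    Returns True if setup() receives install_requires as a literal,
    False if the value is computed (a variable, a call, ...), and
    None if no setup(..., install_requires=...) call is found.
    """
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both `setup(...)` and `setuptools.setup(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == "install_requires":
                try:
                    # literal_eval accepts an AST node and rejects anything
                    # that is not a plain Python literal.
                    ast.literal_eval(kw.value)
                except (ValueError, TypeError):
                    return False
                return True
    return None
```

Note that a `False` here means "can't tell", not "non-deterministic": passing a variable that was assigned a literal list a few lines earlier would also be reported as `False` by this sketch, and a real implementation would need to follow such assignments.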

Most of the options for “banning dynamic metadata” are not great: they have the potential to break stuff that would probably just work in most scenarios today. But if we decided the cost was worth paying, I’m curious to know whether we would be stymied by legitimate use cases that deterministic metadata implementations won’t be able to support in a realistic time frame.

I’m also curious to know if this is just install_requires or if there are other places where the metadata is being set “dynamically”. The one use case I know of / have for that is that dateutil does a search-and-replace in README.rst during the build, because PyPI doesn’t support .. doctest::. It’s still deterministic, but it would be difficult to detect that it’s deterministic through heuristics.
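To illustrate the kind of transformation meant here, a minimal sketch follows. It is not dateutil's actual code: the function name `pypi_safe_readme` and the choice of `.. code-block:: python` as the replacement directive are assumptions made for the example.

```python
def pypi_safe_readme(readme_text: str) -> str:
    """Swap the Sphinx-only ``.. doctest::`` directive for a plain code block.

    The output depends only on the input text, so the resulting long
    description is deterministic even though it is computed at build time.
    """
    return readme_text.replace(".. doctest::", ".. code-block:: python")
```

The result is a pure function of README.rst, but a tool looking at setup.py only sees "long_description is the return value of some function", which is why a heuristic would struggle to classify it as static.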