Insights into how poetry.lock works cross platform

groodt · August 1, 2022, 12:22pm

Does anyone understand how Poetry is able to produce a lockfile that works cross-platform? It seems to violate my understanding of, and assumptions about, how distribution package dependencies work in Python.

In theory, it should only be possible to resolve dependencies for a specific interpreter (or environment). Does Poetry work because it makes assumptions that tend to hold in general, at the cost of some packages it can’t support at all? Are there specific examples of packages where Poetry simply cannot be used?

pf_moore · August 1, 2022, 12:36pm

I’m afraid I don’t. But as I’ve recently been looking at the lockfile standardisation thread, my first instinct is to ask what you even mean by “works”. I thought Poetry lockfiles supported dependencies that are only available in source form, and given that it’s valid to have a package with setup.py containing

    import random
    install_requires = [random.choice(["requests", "httpx", "urllib3"])]

it’s difficult to even be clear how you’d lock that at all.

I imagine the answer is that Poetry makes some assumptions and takes the practical attitude of not worrying about weird edge cases. How that translates into cross-platform lockfiles, I don’t know, but I’d be interested in the answer here as well.

pradyunsg · August 1, 2022, 12:38pm

I’ve flagged this on the Poetry Discord, in case the maintainers don’t pay attention to this forum.

groodt · August 1, 2022, 1:35pm

Surprisingly to me, Poetry produces a lockfile format that captures markers and can be installed on multiple sys_platform values and Python versions. It also doesn’t appear to be sdist-only: it seems to use both sdists and bdists (wheels).

Here is a fragment from the lockfile produced by python -m poetry add ipython, showing dependencies that differ by sys_platform:

[[package]]
name = "ipython"
version = "8.4.0"
description = "IPython: Productive Interactive Computing"
category = "main"
optional = false
python-versions = ">=3.8"

[package.dependencies]
appnope = {version = "*", markers = "sys_platform == \"darwin\""}
backcall = "*"
colorama = {version = "*", markers = "sys_platform == \"win32\""}
decorator = "*"
jedi = ">=0.16"
matplotlib-inline = "*"
pexpect = {version = ">4.3", markers = "sys_platform != \"win32\""}
pickleshare = "*"
prompt-toolkit = ">=2.0.0,<3.0.0 || >3.0.0,<3.0.1 || >3.0.1,<3.1.0"
pygments = ">=2.4.0"
stack-data = "*"
traitlets = ">=5"

Here is a fragment from the lockfile for a package that is only distributed as bdists (wheels), produced by python -m poetry add torch:

[[package]]
name = "torch"
version = "1.12.0"
description = "Tensors and Dynamic neural networks in Python with strong GPU acceleration"
category = "main"
optional = false
python-versions = ">=3.7.0"

[package.dependencies]
typing-extensions = "*"

Interestingly, when I “export” it seems to bring along most of the markers:

.venv ❯ python -m poetry export
appnope==0.1.3; sys_platform == "darwin" and python_version >= "3.8" \
    --hash=sha256:265a455292d0bd8a72453494fa24df5a11eb18373a60c7c0430889f22548605e \
    --hash=sha256:02bd91c4de869fbb1e1c50aafc4098827a7a54ab2f39d9dcba6c9547ed920e24
...
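The exported format packs three things into each logical line: a name==version pin, an optional environment marker after the semicolon, and --hash options joined with backslash continuations. A minimal stdlib sketch that splits it back apart; parse_export and ExportEntry are hypothetical names, and it assumes every requirement is pinned with == and that --hash is the only option present, which matches this output but not the full requirements-file grammar:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExportEntry:
    name: str
    version: str
    marker: str | None
    hashes: list[str] = field(default_factory=list)


def parse_export(text: str) -> list[ExportEntry]:
    """Parse `name==version; marker --hash=...` entries (hypothetical helper)."""
    # Join backslash continuations so each requirement becomes one logical line.
    logical = text.replace("\\\n", " ")
    entries = []
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Everything from the first " --hash=" onwards is options, not the requirement.
        req, _, opts = line.partition(" --hash=")
        options = ("--hash=" + opts).split() if opts else []
        spec, _, marker = req.partition(";")
        name, _, version = spec.strip().partition("==")
        entries.append(ExportEntry(
            name=name,
            version=version,
            marker=marker.strip() or None,
            hashes=[opt.removeprefix("--hash=") for opt in options],
        ))
    return entries


# The first entry of the export above.
SAMPLE = """\
appnope==0.1.3; sys_platform == "darwin" and python_version >= "3.8" \\
    --hash=sha256:265a455292d0bd8a72453494fa24df5a11eb18373a60c7c0430889f22548605e \\
    --hash=sha256:02bd91c4de869fbb1e1c50aafc4098827a7a54ab2f39d9dcba6c9547ed920e24
"""

for entry in parse_export(SAMPLE):
    print(entry.name, entry.version, entry.marker, len(entry.hashes))
```

The marker comes through as an unevaluated string, so deciding whether appnope applies on a particular machine is still a separate step.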

pf_moore · August 1, 2022, 1:57pm

Markers are a standard feature and they are supplied in the metadata of the package being installed, not by poetry. See here. So I don’t think that’s related to what poetry does.

dstufft · August 1, 2022, 1:57pm

I’m pretty sure the answer is that they interpret markers but ignore the fact that sdists are dynamic, so the lockfile isn’t guaranteed to be accurate.
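Interpreting a marker for a foreign platform doesn’t require running on that platform: a marker is just a boolean expression over a few environment variables, so a locker can evaluate it against a hand-written target environment. The sketch below handles a small subset of PEP 508 (and/or, the six basic comparison operators, string literals) by borrowing Python’s own parser, since that subset happens to be valid Python syntax. It is an illustration, not a substitute for packaging.markers: it doesn’t support in/not in, ~=, ===, extras, or pre-release versions, and the target environments are made up.

```python
import ast
import operator

# Marker variables compared as versions rather than as plain strings.
VERSION_VARS = {"python_version", "python_full_version", "implementation_version"}

OPS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def _as_version(value: str, width: int) -> tuple:
    # Naive: "3.8" padded to width 3 -> (3, 8, 0); no pre-release support.
    parts = [int(p) for p in value.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def evaluate_marker(marker: str, env: dict) -> bool:
    """Evaluate a simple PEP 508 marker against a (possibly foreign) environment."""

    def ev(node):
        if isinstance(node, ast.BoolOp):
            values = (ev(v) for v in node.values)
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare) and len(node.ops) == 1:
            op = OPS.get(type(node.ops[0]))
            if op is None:
                raise ValueError("unsupported marker operator")
            left, right = node.left, node.comparators[0]
            lhs, rhs = ev(left), ev(right)
            if {n.id for n in (left, right) if isinstance(n, ast.Name)} & VERSION_VARS:
                width = max(lhs.count("."), rhs.count(".")) + 1
                lhs, rhs = _as_version(lhs, width), _as_version(rhs, width)
            return op(lhs, rhs)
        if isinstance(node, ast.Name):
            return env[node.id]  # look the variable up in the *target* environment
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
        raise ValueError(f"unsupported marker syntax: {ast.dump(node)}")

    return bool(ev(ast.parse(marker, mode="eval").body))


# Conditional dependencies of ipython 8.4.0, copied from the lockfile fragment.
IPYTHON_DEPS = {
    "appnope": 'sys_platform == "darwin"',
    "colorama": 'sys_platform == "win32"',
    "pexpect": 'sys_platform != "win32"',
}

for platform in ("darwin", "win32", "linux"):
    env = {"sys_platform": platform, "python_version": "3.10"}
    print(platform, [n for n, m in IPYTHON_DEPS.items() if evaluate_marker(m, env)])
# darwin ['appnope', 'pexpect']
# win32 ['colorama']
# linux ['pexpect']
```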

groodt · August 1, 2022, 2:20pm

Right, but I guess there must be limitations because it doesn’t actually “install” on all platforms to grab this metadata.

I was able to produce a “lockfile” and export it to a requirements.txt that, by my approximation, appears to be “installable” on multiple platforms.

/tmp/lockfile-demo
.venv ❯ python -m poetry export
appnope==0.1.3; sys_platform == "darwin" and python_version >= "3.8" \
    --hash=sha256:265a455292d0bd8a72453494fa24df5a11eb18373a60c7c0430889f22548605e \
    --hash=sha256:02bd91c4de869fbb1e1c50aafc4098827a7a54ab2f39d9dcba6c9547ed920e24
asttokens==2.0.5; python_version >= "3.8" \
    --hash=sha256:0844691e88552595a6f4a4281a9f7f79b8dd45ca4ccea82e5e05b4bbdb76705c \
    --hash=sha256:9a54c114f02c7a9480d56550932546a3f1fe71d8a02f1bc7ccd0ee3ee35cf4d5
backcall==0.2.0; python_version >= "3.8" \
    --hash=sha256:fbbce6a29f263178a1f7915c1940bde0ec2b2a967566fe1c65c1dfb7422bd255 \
    --hash=sha256:5cbdbf27be5e7cfadb448baf0aa95508f91f2bbc6c6437cd9cd06e2a4c215e1e
colorama==0.4.5; python_version >= "3.8" and python_full_version < "3.0.0" and sys_platform == "win32" or sys_platform == "win32" and python_version >= "3.8" and python_full_version >= "3.5.0" \
    --hash=sha256:854bf444933e37f5824ae7bfc1e98d5bce2ebe4160d46b5edf346a89358e99da \
    --hash=sha256:e6c6b4334fc50988a639d9b98aa429a0b57da6e17b9a44f0451f930b6967b7a4
decorator==5.1.1; python_version >= "3.8" \
    --hash=sha256:b8c3f85900b9dc423225913c5aace94729fe1fa9763b38939a95226f02d37186 \
    --hash=sha256:637996211036b6385ef91435e4fae22989472f9d571faba8927ba8253acbc330
executing==0.9.1; python_version >= "3.8" \
    --hash=sha256:4ce4d6082d99361c0231fc31ac1a0f56979363cc6819de0b1410784f99e49105 \
    --hash=sha256:ea278e2cf90cbbacd24f1080dd1f0ac25b71b2e21f50ab439b7ba45dd3195587
ipython==8.4.0; python_version >= "3.8" \
    --hash=sha256:7ca74052a38fa25fe9bedf52da0be7d3fdd2fb027c3b778ea78dfe8c212937d1 \
    --hash=sha256:f2db3a10254241d9b447232cec8b424847f338d9d36f9a577a6192c332a46abd
jedi==0.18.1; python_version >= "3.8" \
    --hash=sha256:637c9635fcf47945ceb91cd7f320234a7be540ded6f3e99a50cb6febdfd1ba8d \
    --hash=sha256:74137626a64a99c8eb6ae5832d99b3bdd7d29a3850fe2aa80a4126b2a7d949ab
matplotlib-inline==0.1.3; python_version >= "3.8" \
    --hash=sha256:a04bfba22e0d1395479f866853ec1ee28eea1485c1d69a6faf00dc3e24ff34ee \
    --hash=sha256:aed605ba3b72462d64d475a21a9296f400a19c4f74a31b59103d2a99ffd5aa5c
parso==0.8.3; python_version >= "3.8" \
    --hash=sha256:c001d4636cd3aecdaf33cbb40aebb59b094be2a74c556778ef5576c175e19e75 \
    --hash=sha256:8c07be290bb59f03588915921e29e8a50002acaf2cdc5fa0e0114f91709fafa0
pexpect==4.8.0; sys_platform != "win32" and python_version >= "3.8" \
    --hash=sha256:0b48a55dcb3c05f3329815901ea4fc1537514d6ba867a152b581d69ae3710937 \
    --hash=sha256:fc65a43959d153d0114afe13997d439c22823a27cefceb5ff35c2178c6784c0c
pickleshare==0.7.5; python_version >= "3.8" \
    --hash=sha256:9649af414d74d4df115d5d718f82acb59c9d418196b7b4290ed47a12ce62df56 \
    --hash=sha256:87683d47965c1da65cdacaf31c8441d12b8044cdec9aca500cd78fc2c683afca
prompt-toolkit==3.0.30; python_full_version >= "3.6.2" and python_version >= "3.8" \
    --hash=sha256:d8916d3f62a7b67ab353a952ce4ced6a1d2587dfe9ef8ebc30dd7c386751f289 \
    --hash=sha256:859b283c50bde45f5f97829f77a4674d1c1fcd88539364f1b28a37805cfd89c0
ptyprocess==0.7.0; sys_platform != "win32" and python_version >= "3.8" \
    --hash=sha256:4b41f3967fce3af57cc7e94b888626c18bf37a083e3651ca8feeb66d492fef35 \
    --hash=sha256:5c5d0a3b48ceee0b48485e0c26037c0acd7d29765ca3fbb5cb3831d347423220
pure-eval==0.2.2; python_version >= "3.8" \
    --hash=sha256:01eaab343580944bc56080ebe0a674b39ec44a945e6d09ba7db3cb8cec289350 \
    --hash=sha256:2b45320af6dfaa1750f543d714b6d1c520a1688dec6fd24d339063ce0aaa9ac3
pygments==2.12.0; python_version >= "3.8" \
    --hash=sha256:dc9c10fb40944260f6ed4c688ece0cd2048414940f1cea51b8b226318411c519 \
    --hash=sha256:5eb116118f9612ff1ee89ac96437bb6b49e8f04d8a13b514ba26f620208e26eb
six==1.16.0; python_version >= "3.8" and python_full_version < "3.0.0" or python_full_version >= "3.3.0" and python_version >= "3.8" \
    --hash=sha256:8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254 \
    --hash=sha256:1e61c37477a1626458e36f7b1d82aa5c9b094fa4802892072e49de9c60c4c926
stack-data==0.3.0; python_version >= "3.8" \
    --hash=sha256:aa1d52d14d09c7a9a12bb740e6bdfffe0f5e8f4f9218d85e7c73a8c37f7ae38d \
    --hash=sha256:77bec1402dcd0987e9022326473fdbcc767304892a533ed8c29888dacb7dddbc
torch==1.12.0; python_full_version >= "3.7.0" \
    --hash=sha256:3322d33a06e440d715bb214334bd41314c94632d9a2f07d22006bf21da3a2be4 \
    --hash=sha256:2568f011dddeb5990d8698cc375d237f14568ffa8489854e3b94113b4b6b7c8b \
    --hash=sha256:e3e8348edca3e3cee5a67a2b452b85c57712efe1cc3ffdb87c128b3dde54534e \
    --hash=sha256:349ea3ba0c0e789e0507876c023181f13b35307aebc2e771efd0e045b8e03e84 \
    --hash=sha256:13c7cca6b2ea3704d775444f02af53c5f072d145247e17b8cd7813ac57869f03 \
    --hash=sha256:60d06ee2abfa85f10582d205404d52889d69bcbb71f7e211cfc37e3957ac19ca \
    --hash=sha256:a1325c9c28823af497cbf443369bddac9ac59f67f1e600f8ab9b754958e55b76 \
    --hash=sha256:fb47291596677570246d723ee6abbcbac07eeba89d8f83de31e3954f21f44879 \
    --hash=sha256:abbdc5483359b9495dc76e3bd7911ccd2ddc57706c117f8316832e31590af871 \
    --hash=sha256:72207b8733523388c49d43ffcc4416d1d8cd64c40f7826332e714605ace9b1d2 \
    --hash=sha256:0986685f2ec8b7c4d3593e8cfe96be85d462943f1a8f54112fc48d4d9fbbe903 \
    --hash=sha256:0399746f83b4541bcb5b219a18dbe8cade760aba1c660d2748a38c6dc338ebc7 \
    --hash=sha256:7ddb167827170c4e3ff6a27157414a00b9fef93dea175da04caf92a0619b7aee \
    --hash=sha256:2143d5fe192fd908b70b494349de5b1ac02854a8a902bd5f47d13d85b410e430 \
    --hash=sha256:44a3804e9bb189574f5d02ccc2dc6e32e26a81b3e095463b7067b786048c6072 \
    --hash=sha256:844f1db41173b53fe40c44b3e04fcca23a6ce00ac328b7099f2800e611766845 \
    --hash=sha256:63341f96840a223f277e498d2737b39da30d9f57c7a1ef88857b920096317739 \
    --hash=sha256:201abf43a99bb4980cc827dd4b38ac28f35e4dddac7832718be3d5479cafd2c1 \
    --hash=sha256:c0313438bc36448ffd209f5fb4e5f325b3af158cdf61c8829b8ddaf128c57816 \
    --hash=sha256:5ed69d5af232c5c3287d44cef998880dadcc9721cd020e9ae02f42e56b79c2e4
traitlets==5.3.0; python_version >= "3.8" \
    --hash=sha256:65fa18961659635933100db8ca120ef6220555286949774b9cfc106f941d1c7a \
    --hash=sha256:0bb9f1f9f017aa8ec187d8b1b2a7a6626a2a1d877116baba52a129bfa124f8e2
typing-extensions==4.3.0; python_version >= "3.7" and python_full_version >= "3.7.0" \
    --hash=sha256:25642c956049920a5aa49edcdd6ab1e06d7e5d467fc00e0506c44ac86fbfca02 \
    --hash=sha256:e6d2677a32f47fc7eb2795db1dd15c1f34eff616bcaf2cfb5e997f854fa1c4a6
wcwidth==0.2.5; python_full_version >= "3.6.2" and python_version >= "3.8" \
    --hash=sha256:beb4802a9cebb9144e99086eff703a642a13d6a0052920003a230f3294bbe784 \
    --hash=sha256:c4d647b99872929fdb7bdcaa4fbe7f01413ed3d98077df798530e5b04f116c83

I guess it could grab static metadata directly from wheels. For sdists, all bets are off, although it may be able to “parse” out the common patterns, given that quite a lot of sdist metadata is static these days…
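For wheels, the dependencies sit in a static .dist-info/METADATA file inside the zip, so they can be read without building or installing anything. A minimal sketch using only the standard library, assuming a well-formed wheel with a single top-level .dist-info directory; wheel_requires_dist is a hypothetical helper, and the "demo" wheel is built in memory rather than downloaded:

```python
import io
import zipfile
from email.parser import Parser
from typing import BinaryIO, List, Union


def wheel_requires_dist(wheel: Union[str, BinaryIO]) -> List[str]:
    """Return the Requires-Dist entries from a wheel's .dist-info/METADATA."""
    with zipfile.ZipFile(wheel) as zf:
        metadata_names = [
            n for n in zf.namelist()
            if n.count("/") == 1 and n.endswith(".dist-info/METADATA")
        ]
        if len(metadata_names) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA")
        text = zf.read(metadata_names[0]).decode("utf-8")
    # Core metadata uses RFC 822-style headers, so the email parser can read it.
    msg = Parser().parsestr(text, headersonly=True)
    return msg.get_all("Requires-Dist") or []


# Hypothetical in-memory wheel for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n"
        'Requires-Dist: pexpect>4.3; sys_platform != "win32"\n'
        "Requires-Dist: traitlets>=5\n",
    )
print(wheel_requires_dist(buf))
# ['pexpect>4.3; sys_platform != "win32"', 'traitlets>=5']
```

Note that the markers arrive as part of each Requires-Dist string, which is why a locker can read them once and reason about every platform afterwards.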

ofek · August 1, 2022, 7:16pm

I think that’s what Cargo does: it just puts Literally Everything inside the lock file.

njs · August 2, 2022, 7:51am

I believe they assume that an sdist always has the same requirements, so you can build it locally once to get them, and builds on other platforms will come out the same.

And for environment markers, IIUC the idea is that they try to be conservative and in the lock file record “all the versions that might be relevant, regardless of whether the markers are true or false”. Then when installing on a particular platform, they do another run of their resolution algorithm while taking markers into account, but restricted to just the versions that appear in the lock file.
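The install-time half of that can be sketched as a walk over the locked graph that follows a dependency edge only when its marker is true on the machine doing the install, so every candidate comes from the lockfile and nothing is looked up on the index. This is a simplification, not Poetry’s installer: it assumes one locked version per package (so no real re-resolution is needed), and it writes markers as Python callables instead of marker strings. The versions are taken from the export above; platform_is, platform_is_not and install_set are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, str]
Marker = Callable[[Env], bool]
# Hypothetical lock shape: name -> (locked version, {dependency: marker}).
Lock = Dict[str, Tuple[str, Dict[str, Marker]]]


def always(env: Env) -> bool:
    return True


def platform_is(name: str) -> Marker:
    return lambda env: env["sys_platform"] == name


def platform_is_not(name: str) -> Marker:
    return lambda env: env["sys_platform"] != name


# Versions from the export above; markers written as callables for brevity.
LOCK: Lock = {
    "ipython": ("8.4.0", {
        "appnope": platform_is("darwin"),
        "colorama": platform_is("win32"),
        "pexpect": platform_is_not("win32"),
        "decorator": always,
    }),
    "appnope": ("0.1.3", {}),
    "colorama": ("0.4.5", {}),
    "decorator": ("5.1.1", {}),
    "pexpect": ("4.8.0", {"ptyprocess": always}),
    "ptyprocess": ("0.7.0", {}),
}


def install_set(lock: Lock, roots: List[str], env: Env) -> Dict[str, str]:
    """Walk the locked graph, following only edges whose marker holds in env."""
    chosen: Dict[str, str] = {}
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in chosen:
            continue
        version, deps = lock[name]  # candidates come only from the lockfile
        chosen[name] = version
        stack.extend(dep for dep, marker in deps.items() if marker(env))
    return chosen


print(install_set(LOCK, ["ipython"], {"sys_platform": "win32"}))
```

On win32 this selects ipython, colorama and decorator; on linux it picks up pexpect and ptyprocess instead, from the same lock.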

What I don’t know is how the “all the versions that might be relevant” part is done. I’m not aware of any simple correct algorithm for that, and the brute force approach requires solving an exponential number of NP-complete problems, some of which might fail, but you don’t even know whether the failures are relevant until you try to use the lockfile later. I don’t think they do this, but idk.
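To put a number on the brute-force approach: naively, a fully portable lock needs one resolution per distinct target environment, and the count of environments is the product of the number of values for each marker variable. A toy count with itertools.product; the axes and values are hypothetical (not every combination is even a real platform), and real markers have more variables, each of which multiplies the total again:

```python
import itertools

# Hypothetical target axes; real markers have more variables than this.
AXES = {
    "sys_platform": ["linux", "darwin", "win32"],
    "python_version": ["3.7", "3.8", "3.9", "3.10", "3.11"],
    "platform_machine": ["x86_64", "aarch64"],
    "implementation_name": ["cpython", "pypy"],
}


def target_environments(axes):
    """Yield one environment dict per combination of axis values."""
    keys = list(axes)
    for values in itertools.product(*(axes[k] for k in keys)):
        yield dict(zip(keys, values))


envs = list(target_environments(AXES))
# 3 * 5 * 2 * 2 = 60 environments, i.e. 60 separate resolutions under brute force.
print(len(envs))  # 60
```

Adding one more two-valued axis doubles the count, which is the exponential growth referred to above.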

FRidh · August 2, 2022, 9:11am

From the FAQ: it tries to use the metadata available via the PyPI JSON API. That metadata is not always present for every package, in which case it downloads the packages and inspects them, using several methods. If there is a PKG-INFO file, it uses the pkginfo package from PyPI. Otherwise, it falls back to using PEP 517 hooks or even reading setup.cfg.
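That discovery order can be sketched as a chain of increasingly expensive fallbacks. This is not Poetry’s code: discover_requires and its arguments are hypothetical, and putting setup.cfg before a PEP 517 build is my assumption about the ordering (a build is the expensive last resort, so it’s represented only by a None return meaning “needs a build”). Note also that an empty result is ambiguous: in older metadata, “no dependencies” and “not recorded” look the same.

```python
import configparser
import json
from email.parser import Parser
from typing import List, Optional


def discover_requires(pypi_json: Optional[str], pkg_info: Optional[str],
                      setup_cfg: Optional[str]) -> Optional[List[str]]:
    """Return dependency strings from the cheapest static source, or None.

    None means no static source answered and a PEP 517 build would be needed.
    """
    # 1. PyPI JSON API: info.requires_dist may be null (unknown or no deps).
    if pypi_json:
        requires = json.loads(pypi_json)["info"].get("requires_dist")
        if requires is not None:
            return requires
    # 2. PKG-INFO from the downloaded sdist (RFC 822-style headers).
    if pkg_info:
        requires = Parser().parsestr(pkg_info, headersonly=True).get_all("Requires-Dist")
        if requires:
            return requires
    # 3. Static setuptools configuration: [options] install_requires.
    if setup_cfg:
        cfg = configparser.ConfigParser(interpolation=None)
        cfg.read_string(setup_cfg)
        raw = cfg.get("options", "install_requires", fallback="")
        requires = [line.strip() for line in raw.splitlines() if line.strip()]
        if requires:
            return requires
    # 4. Nothing static found: the caller would have to run the PEP 517 hooks.
    return None
```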

groodt · August 2, 2022, 9:51am

That info.py does seem to be the place where dependency discovery happens.

From my quick skim, it runs a number of heuristics and fallbacks to sniff the metadata and essentially discover the dependencies of an sdist. Seems very thorough! I guess the key assumption that could lead to incorrect results comes into play when an sdist actually needs to be built: as @njs already mentioned, it assumes the sdist has the same dependencies on all platforms.

That’s probably not a bad call, I think. The number of situations where it would need to build an sdist (without being able to sniff static metadata somewhere else) is probably quite low.

uranusjr · August 2, 2022, 10:33am

It also assumes that all wheels of the same version have the same dependencies on all platforms. This is not always true,^[1] but it’s also quite a reasonable call.

[1] I remember a case several years ago where a quasi-important package had different dependencies across wheels, and its maintainers refused a request to fix it because they wanted to support old pip/setuptools versions that predate environment markers. The situation is likely a lot better these days, though.

radoering · August 4, 2022, 3:24pm

Afaik, that’s true. In some cases where dependencies are determined dynamically in setup.py, Poetry has no chance. In simple cases, Poetry can even extract the information by parsing setup.py, but of course that’s not always possible (e.g. in your example). However, most projects seem to define their dependencies statically nowadays.

Markers are not evaluated to True/False during locking, because environment information is irrelevant when generating an environment-independent lock file.

Basically (simplified), by finding a solution without considering markers. Of course, that is not possible if there are multiple-constraints dependencies (the same dependency declared several times with different constraints and markers). In that case there are multiple (partial) resolutions, one for each combination of constraints (basically the Cartesian product of all multiple-constraints dependencies). To reduce the number of resolutions, some of these combinations are discarded by checking whether the intersection of their markers is empty (e.g. the intersection of sys_platform == "linux" and sys_platform != "linux" is empty).
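A toy model of that pruning step: to keep the emptiness check trivial, each marker here is represented as the set of sys_platform values it allows, over a small hypothetical universe of three platforms, whereas Poetry’s real marker algebra handles arbitrary marker expressions. Two hypothetical multiple-constraints dependencies, foo and bar, have two alternatives each, giving 2 × 2 = 4 combinations, and the two whose markers contradict each other are discarded:

```python
import itertools

# Hypothetical universe of platforms; a marker = the set of platforms it allows.
PLATFORMS = frozenset({"linux", "darwin", "win32"})


def on(*platforms):
    """Marker like sys_platform == "linux"."""
    return frozenset(platforms)


def not_on(*platforms):
    """Marker like sys_platform != "linux"."""
    return PLATFORMS - frozenset(platforms)


# Hypothetical multiple-constraints dependencies: (version constraint, marker).
DEPS = {
    "foo": [("<2", on("linux")), (">=2", not_on("linux"))],
    "bar": [("<1", on("linux")), (">=1", not_on("linux"))],
}


def partial_resolutions(deps):
    """Yield each combination of alternatives whose markers can hold together."""
    names = list(deps)
    for combo in itertools.product(*(deps[n] for n in names)):
        allowed = PLATFORMS.intersection(*(marker for _, marker in combo))
        if allowed:  # empty intersection: no environment needs this resolution
            yield dict(zip(names, (c for c, _ in combo))), allowed


for constraints, platforms in partial_resolutions(DEPS):
    print(constraints, sorted(platforms))
# {'foo': '<2', 'bar': '<1'} ['linux']
# {'foo': '>=2', 'bar': '>=1'} ['darwin', 'win32']
```

Each surviving combination would then be resolved without looking at markers, and the results merged into one lock.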

PS: I answered to the best of my knowledge, but there are more experienced members than me in the Poetry team who may know some things better.

remram44 · August 5, 2022, 1:06pm

Another issue I’ve run into with Poetry is that different bdists for the same package+version can have different requirements (e.g. foobar-1.2.3-cp310-win32.whl can depend on numpy==2.0 while foobar-1.2.3-cp310-macosx.whl depends on numpy==2.5). Poetry doesn’t deal with that situation; it assumes dependencies are consistent.

IIRC this is an issue with the opencv-python-headless package.
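Checking whether a given release breaks that assumption is cheap once each wheel’s Requires-Dist is in hand (for example, read from each wheel’s METADATA). A sketch; inconsistent_requirements is a hypothetical helper, and the input reuses the hypothetical foobar example above rather than real opencv-python-headless data. Requirements are compared as raw strings, so purely cosmetic differences such as extra whitespace would also be flagged:

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def inconsistent_requirements(per_wheel: Dict[str, List[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group wheel files by their Requires-Dist set.

    Returns {} when every wheel agrees; otherwise one entry per distinct set,
    meaning the 'same dependencies for every distribution' assumption is broken.
    """
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for filename, requires in per_wheel.items():
        groups[frozenset(requires)].append(filename)
    return dict(groups) if len(groups) > 1 else {}


# Hypothetical per-wheel metadata, mirroring the foobar example above.
FOOBAR = {
    "foobar-1.2.3-cp310-win32.whl": ["numpy==2.0"],
    "foobar-1.2.3-cp310-macosx.whl": ["numpy==2.5"],
}
for requires, wheels in inconsistent_requirements(FOOBAR).items():
    print(sorted(requires), wheels)
```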

pradyunsg · August 5, 2022, 1:09pm

Yup, Poetry’s resolution logic has a baked-in assumption that all distributions for a package+version will have the exact same dependencies.

IIUC, that’s based on the flawed data representation from PyPI’s JSON API, which only exposes dependencies from the first wheel that it sees during a package’s upload.

dstufft · August 5, 2022, 1:21pm

first wheel that it sees during a package’s upload.

The first distribution file altogether, which can mean it gets nothing if that’s an sdist^[1]!

[1] Though maybe sdists include dependencies nowadays; I haven’t looked recently.

njs · August 5, 2022, 5:57pm

tbh I think in this case poetry is right and the package is buggy – projects should be using environment markers, not this. Is that something we have written down anywhere? (Is it something we even have consensus on?)

pf_moore · August 5, 2022, 6:45pm

I agree, for what it’s worth.

I don’t think we do have a consensus as such, just a sort of communal weary feeling that things would be so much easier if only people wouldn’t do things like this.

The Packaging User Guide would probably be a good place for a statement like this, but I’m not sure how the PUG authors would feel about maintaining such a document. On the other hand, it’s only really setuptools that is flexible enough to even allow it, so arguably this belongs in the setuptools docs, where it reaches the right audience (and where it’s clear that the setuptools maintainers are OK with it).

PEP 643 is the standard that leads towards formalising this type of requirement. But it’s waiting on PyPI to accept metadata 2.2 before we can even start implementing it in tools. And until it’s in wide use, saying that a tool won’t support dependencies that are marked as dynamic isn’t feasible.
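For reference, PEP 643 makes the distinction machine-readable: from metadata version 2.2, an sdist’s PKG-INFO lists under Dynamic any field a backend may fill in at build time, and a field not listed there must come out the same in every wheel built from that sdist. A sketch of how a locking tool could exploit that; static_requires_dist is a hypothetical helper, and comparing field names case-insensitively is my assumption, consistent with core metadata’s email-header format:

```python
from email.parser import Parser
from typing import List, Optional


def static_requires_dist(pkg_info: str) -> Optional[List[str]]:
    """Return Requires-Dist if PEP 643 guarantees it is static, else None."""
    msg = Parser().parsestr(pkg_info, headersonly=True)
    version = tuple(int(p) for p in msg.get("Metadata-Version", "1.0").split("."))
    if version < (2, 2):
        return None  # pre-PEP 643 metadata: no guarantee either way
    dynamic = {name.strip().lower() for name in msg.get_all("Dynamic") or []}
    if "requires-dist" in dynamic:
        return None  # the build backend may compute dependencies at build time
    # Not dynamic: every wheel built from this sdist must carry exactly these
    # (possibly zero) requirements, so one read is valid for all platforms.
    return msg.get_all("Requires-Dist") or []
```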

dstufft · August 5, 2022, 8:43pm

I don’t think I agree?

Like, if your dependency difference can be expressed as an environment marker, then that’s a good thing to do and you should do it. But afaik wheel tags are more expressive than environment markers, or at least differently expressive, and in that case having different dependencies serves as a useful escape hatch.

njs · August 6, 2022, 2:15am

Yeah, wheel tags are sort of weirdly orthogonal to environment markers; you could have completely different requirements for a manylinux_2_17 and manylinux_2_18 wheel, which isn’t something that environment markers can express. (I’m not sure how that would even work; some sort of glibc_version marker?) You could even have different environment markers inside those two wheels, depending on python version or whatever.
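To make the orthogonality concrete: a wheel filename’s platform tag can carry distinctions, such as the manylinux glibc version, that no PEP 508 marker variable exposes. The sketch parses the standard name-version[-build]-python-abi-platform filename layout and maps a manylinux_X_Y platform tag to the marker values an installer would see. marker_view is a hypothetical, deliberately partial mapping (single manylinux_X_Y tags only, no compressed tag sets), and the foobar filenames are made up:

```python
import re
from typing import Dict, NamedTuple

# name-version[-build]-python-abi-platform.whl
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    m = WHEEL_RE.match(filename)
    if m is None:
        raise ValueError(f"not a wheel filename: {filename}")
    return WheelTags(m["name"], m["version"], m["python"], m["abi"], m["platform"])


def marker_view(platform_tag: str) -> Dict[str, str]:
    """What environment markers can 'see' of a manylinux platform tag (partial)."""
    m = re.fullmatch(r"manylinux_(\d+)_(\d+)_(\w+)", platform_tag)
    if m is None:
        raise ValueError(f"only manylinux_X_Y tags handled here: {platform_tag}")
    # The glibc version (X.Y) is dropped: no PEP 508 marker variable exposes it.
    return {"sys_platform": "linux", "platform_machine": m[3]}


a = parse_wheel_filename("foobar-1.0-cp310-cp310-manylinux_2_17_x86_64.whl")
b = parse_wheel_filename("foobar-1.0-cp310-cp310-manylinux_2_18_x86_64.whl")
print(a.platform != b.platform)  # True: two distinct wheels...
print(marker_view(a.platform) == marker_view(b.platform))  # True: ...identical to markers
```

Requires-Dist in those two wheels could differ, and no marker written inside either one could tell the installer which set applies.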

So… you’re right that it adds some expressivity. I’m not convinced it’s useful expressivity, though :-). By which I mean: maybe there are cases, and if someone has them I’d love to see them, but for all I know there might be literally zero packages on PyPI right now that actually benefit from this.

And the cost of supporting this is pretty stark: it basically makes it impossible to support cross-platform lockfiles. Reasoning about foreign environment markers is non-trivial, but it’s at least a well-defined problem that you can apply cleverness to. (Apparently Poetry has a whole pile of code for logical reasoning about markers, including conversion to conjunctive normal form, symbolic simplification, all kinds of stuff. tbh it seems like overkill to me, but I do appreciate that with environment markers it’s at least possible.) I don’t even know how to start on reasoning about a mix of ABI tags + markers within those tags. And having to fetch every wheel’s metadata before you can make a lockfile is kind of gratuitous.

And worst case: if you have an sdist, and want to make a portable lock file, you have to build the sdist for every possible target platform before you can know what their requirements are. So saying that different wheels for the same release are free to have arbitrarily different requirements, effectively means that every locking tool needs to have a full cross-compiler setup for every possible target, which is obviously not going to happen. Being able to build the sdist once for the current platform, and then use the resulting metadata to reason about other platforms, is absolutely crucial.