Should packaging understand platforms?

brettcannon · March 23, 2024, 9:40pm

Typically this is either documented as such or one only provides wheels for the supported platforms. Otherwise falling back on the sdist is best effort and something you can opt out of by only installing from wheels.

There is no roadmap, just what people put their volunteer time into to try and drive to become a standard. This is why people keep saying that if you would like to see a change you will need to look at the state of things, come up with a proposal, discuss your proposal with folks, get a sponsor for a PEP, write the PEP, and then try to get it accepted.

petersilva · March 23, 2024, 9:42pm

I understand the approach, but I find it odd that packages are expected to document the platform restrictions of their dependencies. Instead of C having “platform != ‘windows’” in its own metadata, your suggestion is that every developer whose package (B) depends on C puts a conditional dependency in theirs. Why isn’t specifying it in C enough?

fwiw, my portable package uses xattr on Linux/Mac and alternate data streams on Windows, and used conditional dependencies for a while. I ended up ripping them all out, making all the platform-dependent features “extras” that could be installed explicitly, and adding a layer to the app itself that inventories which deps are actually present and tells the user what functionality is available as a result.

effigies · March 23, 2024, 9:51pm

I think the answer is because you can’t trust that every C specifies it correctly. We have to start with the world as we find it.

If I am packaging B, then the responsible thing for me to do is to claim to work on environments I’ve tested on. I have worked on a number of projects where we did not set out to write something platform-dependent, but we had no Windows machines to test it on (or insufficient motivation to do it). In those cases, we’ve stated in the classifiers that it works on the systems we test on, and added to the docs that contributors are welcome to submit a fix to get it to work on Windows.

Now with Windows CI so readily available, it’s pretty easy to add Windows testing at the earliest stages of development, and make platform independence an explicit goal. And the way for me to manage that, with the packaging landscape as it is, is to use environment markers to make conditional dependencies.

This makes sense to me, too.

To be clear, I don’t think anybody thinks this is an ideal state, but I don’t really see how you would enforce platform tags to match actual capabilities, and without that, it’s hard to see how the task doesn’t ultimately fall on the downstream packagers.
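The environment-marker approach described above boils down to the installer deciding, per requirement, whether it applies to the current platform. Below is a deliberately simplified sketch of that decision: it only understands `sys_platform == "…"` and `sys_platform != "…"`, whereas the real PEP 508 marker grammar is much richer and real tools use the `packaging` library rather than anything like this. `requirement_applies` is a hypothetical helper; the requirement strings are borrowed from sarracenia’s `install_requires`.

```python
import re
import sys

# Simplified: only handles markers of the form  sys_platform == "x"  or  sys_platform != "x".
_MARKER = re.compile(r"""^\s*sys_platform\s*(==|!=)\s*["']([^"']*)["']\s*$""")


def requirement_applies(requirement: str, platform: str = sys.platform) -> bool:
    """Return True if `requirement` should be installed on `platform`."""
    _, sep, marker = requirement.partition(";")
    if not sep:
        return True  # no marker: install everywhere
    match = _MARKER.match(marker)
    if match is None:
        raise ValueError(f"marker not supported by this sketch: {marker.strip()!r}")
    op, value = match.groups()
    return platform == value if op == "==" else platform != value


requirements = [
    "watchdog",
    'xattr ; sys_platform!="win32"',
    'python-magic; sys_platform!="win32"',
    'python-magic-bin; sys_platform=="win32"',
]

for plat in ("linux", "win32"):
    chosen = [r.split(";")[0].strip() for r in requirements if requirement_applies(r, plat)]
    print(plat, chosen)
```

The point of the design is that B, not C, carries the marker: C never has to know about Windows at all.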

kknechtel · March 23, 2024, 9:52pm

I think this leaves the question of why environment markers are apparently standardized even though the platform string is not; and also the question of what/where the specification actually is.

Presumably one could also rely on a fake dependency designed to communicate the platform information in the same way - analogous to the unsupported-python package described in another thread.

The model here assumes that building is a legitimate component of installation, when given something unbuilt to install.

There are two possibilities:

There is no feasible alternative to C on Windows. In this case, the dependency on C causes B not to work on Windows; therefore, of course B should be expected to specify that it doesn’t work on Windows - because it doesn’t.
There is a feasible alternative. In this case, the build process for B needs to communicate to Pip what to use (C) on non-Windows, and what to use (some other D) on Windows. It’s not as if Pip can just know the Windows equivalent to C, after all. And that’s, well, exactly what a conditional dependency is. (This includes cases where the “feasible alternative” is “nothing; just give a runtime error if the user tries to do impossible-on-Windows things on Windows”. In these cases, Pip needs to know to install C when appropriate, and not try to install it otherwise.)
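The “nothing; just give a runtime error” variant of the second case might look like this inside B, assuming B declares `xattr ; sys_platform != "win32"` as its conditional dependency. `set_metadata` is a hypothetical function, and the call to the PyPI xattr package’s module-level `setxattr(path, name, value)` is an assumption to check against the version you use.

```python
import sys

# B declares  xattr ; sys_platform != "win32"  so xattr is only installed off Windows.
try:
    import xattr
except ImportError:
    xattr = None  # feature unavailable on this platform


def set_metadata(path: str, name: str, value: bytes) -> None:
    """Store `value` in an extended attribute of `path`, or fail clearly."""
    if xattr is None:
        raise RuntimeError(
            f"extended attributes are not available on {sys.platform}; "
            "this feature needs the optional 'xattr' dependency"
        )
    # Assumes the PyPI xattr package's module-level setxattr(path, name, value).
    xattr.setxattr(path, name, value)
```

Installation succeeds everywhere; only the impossible-on-Windows operation fails, with a message instead of a traceback from deep inside a missing module.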

pf_moore · March 23, 2024, 9:53pm

Yes, sdists are a fallback that expose people to the “try building it and see what happens” approach. We’ve been considering making it opt-in to use sdists in pip for some time now. It’s a big enough backward compatibility break that we were hoping to get some funded resource to work on it, but that seems to have stalled.

Really? Watch me run configure; make; make install at my Windows Powershell prompt.

That’s the typical “install from source” invocation for Unix software. You’re right that installing from source isn’t particularly friendly for a non-expert user. I’ve already conceded that.

I’d rather people didn’t get the idea that “publishing sdists is bad” from this thread, though. It feels contrary to the spirit of open source to me (even if source is still available, making it awkward and inconsistent to find it feels like a step backwards). Just because pip’s history means that installing from source is a long-established fallback doesn’t mean having source distributions is bad - it just means that pip’s defaults were established 15 years ago when the user community was very different.

Out of curiosity, why? Conditional dependencies (by which I assume you mean environment markers) are designed precisely for this situation, so if you ended up not using them, why didn’t they work for you?

brettcannon · March 23, 2024, 9:53pm

Correct.

Because the specs are not written for that approach. You can find it “odd” all you want, but that doesn’t change how things work (and have for years, maybe a decade or more). Once again, your next step if you would like to see change is to come up w/ a proposal on what you want to see change and how to handle the transition.

pitrou · March 23, 2024, 10:04pm

Well, to put things another way: while I have multiple times seen failed package installs (for example because some bundled C or C++ code does not compile), I’ve never seen a single one of those failures lead to a broken Python environment.

So if you really have a package that leaves things in a broken state when it fails installing, then clearly something is wrong with that package.

BrenBarn · March 23, 2024, 10:16pm

Yes, that is the problem.

Well, heh, but I guess what we disagree on is that, to me “configure/make/make install” is not comparable to pip install and pip install is not an “install from source invocation”. I mean, it is, but that’s the problem: in 99% of cases that’s not what users intend when they do a pip install. It’s a footgun.

Publishing sdists isn’t necessarily bad. But I think we should make a distinction between:

1. making the source available
2. making the source available on PyPI
3. making the source available on PyPI and making pip automatically try to use it

To me, option 1 is perfectly compatible with the spirit of open source, as long as the “availability” is genuine (e.g., not some arduous “email for a time-limited link” thing). PyPI already lets you put a link to the source repository and that seems totally adequate to me.

Option 1 is fine enough with me that I don’t see that Option 2 is even necessarily a good idea. I think this is related to some of the issues that came up in the other thread about patched sdists; a lot of these would be less of an issue if we didn’t think that sdists published on PyPI were supposed to be both a potential install target for end users and the base for patches applied by distro maintainers. To me those audiences are just too different. (Rgommers mentioned this quite a while ago in his pypackaging-native discussion.)

If we do want PyPI or some official Python entity to host things for the latter case, I think that should be separated from what pip searches for install. In other words it’s Option 3 that is the biggest problem, and that is the situation we are in.

I think there is already a pretty good proposal:

Aka this issue. I think it’s a good idea.

This may be another situation where many people in this discussion come at it with expert knowledge about sdists and build process and how they work, and that may obscure what things are like for the vast majority of Python users who aren’t at that level. A breakage caused by moving to binary-only can be a good thing; it may nudge people who could be publishing pure-Python cross-platform wheels to do so. I’ll also venture to add that conda’s approach involves a clear separation between build and install and I think that contributes to the relative smoothness of things in that ecosystem.

petersilva · March 23, 2024, 10:25pm

Out of curiosity, why? Conditional dependencies (by which I assume you mean environment markers) are designed precisely for this situation, so if you ended up not using them, why didn’t they work for you?

well that’s a rabbit hole… There are threads here:

github.com/MetPX/sarracenia: Dependency Management Strategies... (opened by petersilva 10:38PM, 03 Aug 23 UTC; closed 03:21PM, 16 Aug 23 UTC)

# The Problem

Sarracenia uses a lot of other packages to provide functionality. These are called *dependencies*. In its native environment (Ubuntu Linux) most of these dependencies are easily resolved using the built-in Debian packaging tools (apt-get), but in many other environments it is more complex, like: https://xkcd.com/1987/

Even in environments where dependencies are installed *somewhere*, it is not always clear which ones are available to a given program. On Red Hat 8, for example, there does not seem to be a wide variety of Python packages available in the operating system repositories; only the minimal packages needed for the OS’s own use of Python seem to be available. This makes it challenging to install on Red Hat, as one now has to package many dependencies as well as the main package. The typical approach is to hunt for individual dependencies in different third-party repositories, or rebuild them from source... This is a bit haphazard, and in some cases, like watchdog or dateparser, the package itself has dependencies and one ends up having to create dozens of Python packages.

On Red Hat, as in many other environments, it seems more practical to use Python-native packaging rather than the incomplete OS packages, as it does dependency resolution and all the dependencies can be brought in using pip. The result of this, if done system-wide, is a mix of distro packages and pip-provided packages, which complicates auditing and patching. System administrators may also object to the use of pip packages in the base operating system.

Windows is another example of an environment where pre-existing package availability is unclear. On Windows, the natural distribution format would be a self-extracting EXE, but how plugins would work with such a method is unclear, and all the dependencies need to be packaged within it.

People also install Python *distributions* (ActiveState, Anaconda, or the more traditional CPython), and those will each have their own installation methods.

The complications mostly arise from dependencies such as xattr, python3-magic, watchdog, etc., that is, packages that are wrappers around C libraries or use C libraries as part of their implementation. In these cases, pure Python packaging often fails, as more environmental support is needed. For example, the python-magic Python package requires the C library libmagic1 to be installed. With OS packages this is just an additional dependency, no problem, but with pip the install simply fails, and the user needs to find the OS package, install it, and then try installing the Python package again.

Another complication is that, because all these platforms have different installation methods, it is not obvious what advice to give users when a dependency is missing: "pip install? conda install? apt install? yum install?" ... The package naming conventions vary by distribution, and differ from the module names used to test for their presence.

## Approaches to Dependency Management

### Manual Tailoring

For HPC (which runs Red Hat 8.x) a few dependencies are brought in from EPEL packages, some are built from source, and some had to be left out. When building packages on Red Hat, the setup.py file is typically hand-edited to work around packages that are not available. After the RPM is generated, it is tested on another system, as a different user, to see whether it runs (the local user doing the build may have pip packages that provide deps not available to others).

Implementation: manual editing of setup.py to remove dependencies.

### (Mostly) Silent Disable

Looking at xattr, the *import* is in a try/except, and if it fails, storing metadata in extended file attributes is disabled. There is a loss of functionality, or a different behaviour, on these systems as a result. There is no way to query the system for which *degrades* are active, and nothing prompts the user about what to do to address it, if they want to.

Implementation in filemetadata.py:

```
try:
    import xattr
    supports_extended_attributes = True
except ImportError:  # only a missing module should disable the feature
    supports_extended_attributes = False
```

There are also tests in sarracenia/__init__.py for the code to degrade/understand when dependencies are missing:

```
import importlib.util
import logging

logger = logging.getLogger(__name__)

extras = {
    'amqp':      {'modules_needed': ['amqp'], 'present': False,
                  'lament': 'will not be able to connect to rabbitmq brokers'},
    'appdirs':   {'modules_needed': ['appdirs'], 'present': False,
                  'lament': 'will assume linux file placement under home dir'},
    'ftppoll':   {'modules_needed': ['dateparser', 'pytz'], 'present': False,
                  'lament': 'will not be able to poll with ftp'},
    'humanize':  {'modules_needed': ['humanize'], 'present': False,
                  'lament': 'humans will have to read larger, uglier numbers'},
    'mqtt':      {'modules_needed': ['paho.mqtt.client'], 'present': False,
                  'lament': 'will not be able to connect to mqtt brokers'},
    'filetypes': {'modules_needed': ['magic'], 'present': False,
                  'lament': 'will not be able to set content headers'},
    'vip':       {'modules_needed': ['netifaces'], 'present': False,
                  'lament': 'will not be able to use the vip option for high availability clustering'},
    'watch':     {'modules_needed': ['watchdog'], 'present': False,
                  'lament': 'cannot watch directories'},
}

for x in extras:
    extras[x]['present'] = True
    for y in extras[x]['modules_needed']:
        try:
            if importlib.util.find_spec(y):
                #logger.debug(f'found feature {y}, enabled')
                pass
            else:
                logger.debug(f"extra feature {x} needs missing module {y}. Disabled")
                extras[x]['present'] = False
        except ImportError:  # find_spec raises if a parent package (e.g. paho) is missing
            logger.debug(f"extra feature {x} needs missing module {y}. Disabled")
            extras[x]['present'] = False
```

### Demotion to Extras

The Python packaging tools have a concept of extras, sort of the inverse of *batteries included*... In setup.py one can declare extras that become available when additional dependencies are installed:

```
import itertools

extras = {
    'amqp': ["amqp"],
    'filetypes': ["python-magic"],
    'ftppoll': ['dateparser'],
    'mqtt': ['paho.mqtt>=1.5.1'],
    'vip': ['netifaces'],
    'redis': ['redis'],
}
extras['all'] = list(itertools.chain.from_iterable(extras.values()))
```

### Platform Dependent Deps

One can add dependencies that vary depending on the platform we are installing on.

```
install_requires=[
    "appdirs", "humanfriendly", "humanize", "jsonpickle", "paramiko",
    "psutil>=5.3.0", "watchdog",
    'xattr ; sys_platform!="win32"',
    'python-magic; sys_platform!="win32"',
    'python-magic-bin; sys_platform=="win32"'
],
```

(this is in the [v03_issue721_platdep](https://github.com/MetPX/sarracenia/tree/v03_issue721_platdep) branch)

## What do we do?

So all of the approaches above (and perhaps others?) are used in the code; someone using an installation will have a subset of functionality available, and sr3 has no way of reporting what is available or not. There is a branch, https://github.com/MetPX/sarracenia/pull/738, that provides an example report of the available modules using an *sr3 extras* command. Should we at least report what is working, and what isn’t?

An additional problem is that configured plugins may have additional dependencies. The mechanism in the pull request also provides a way for plugins to *register* those, so they show up in the inventory command. Is this a reasonable/advisable approach?

github.com/MetPX/sarracenia: Cannot Run Sarracenia due to Missing Magic (opened by gc-nrcan-michael 05:39PM, 20 Jul 23 UTC; closed 06:55PM, 29 Aug 23 UTC)

Hi Peter, I saw there were some updates related to Magic: https://github.com/MetPX/sarracenia/pull/698 When I run Sarracenia, it gives me an error that it cannot load Magic/libmagic: ![image](https://github.com/MetPX/sarracenia/assets/109979542/adfe00b3-f531-4377-ac8a-0f9497a675bf) I looked through the code and I see the loader attempts to search for the appropriate library. What is the best way to get this module working again?

One complication: I was using pynsist on Linux to build Windows packages, so the run-time platform is different from the one where the package is built.
Another issue: different distros name the same package differently (file-magic vs. magic), and I couldn’t figure out how to differentiate between Red Hat and Ubuntu using conditionals.
There are also different packages with the same name (xattr on Red Hat vs. xattr on Ubuntu).
You literally have to test the APIs of the routines to figure out which one you have… or, if you know the name, install the right one, again using distro-specific conditions.

That’s all I remember for now… there is likely other stuff.
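The “test the APIs to figure out which one you have” workaround can be sketched as a small probe. The attribute names checked are assumptions to verify against the installed versions: python-magic provides `from_file()`, while the bindings shipped with libmagic (“file-magic”) provide `detect_from_filename()`, which returns an object with a `mime_type` attribute. `magic_flavour` and `mime_type` are hypothetical helpers, and fake modules stand in for the real bindings so the probe runs anywhere.

```python
import types


def magic_flavour(module: object) -> str:
    """Guess which distribution provided the importable `magic` module.

    Assumption: python-magic exposes from_file(); file-magic exposes
    detect_from_filename(). Recent python-magic releases may also offer
    detect_from_filename() for compatibility, so from_file() is checked first.
    """
    if hasattr(module, "from_file"):
        return "python-magic"
    if hasattr(module, "detect_from_filename"):
        return "file-magic"
    return "unknown"


def mime_type(path: str) -> str:
    """Return the MIME type of `path` with whichever binding is installed."""
    import magic  # may come from either distribution

    flavour = magic_flavour(magic)
    if flavour == "python-magic":
        return magic.from_file(path, mime=True)
    if flavour == "file-magic":
        return magic.detect_from_filename(path).mime_type
    raise RuntimeError("unrecognised 'magic' module; cannot detect file types")


# Fake modules stand in for the real bindings so the probe can be tried anywhere.
fake_python_magic = types.SimpleNamespace(from_file=lambda path, mime=False: "text/plain")
fake_file_magic = types.SimpleNamespace(detect_from_filename=lambda path: None)
print(magic_flavour(fake_python_magic), magic_flavour(fake_file_magic))
```

Probing behaviour rather than distribution names sidesteps the per-distro naming problem, at the cost of depending on API details that may change between releases.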

mwichmann · March 23, 2024, 10:27pm

I’ve noticed recently that a few packages are declining to upload an
sdist if they provide platform-specific wheels, I’m guessing that this
is to avoid the inevitable failed-to-build message on platforms that
aren’t really meant to be supported (that is, Windows).

pf_moore · March 23, 2024, 10:45pm

As does almost everyone who’s discussed it - including the pip maintainers. All that is needed to make it happen is for someone to come up with the necessary funds or resources to do the needed project management and UI/UX design to ensure that the transition isn’t a complete disaster. The skills needed are specialised, though, and not readily available via volunteer contributions, which is why this is stalled waiting for someone to fund it.

There really isn’t much more to say here. No amount of discussion will move this forward, it just needs (the right) resources.

kknechtel · March 23, 2024, 10:49pm

… to change the default sense of a command-line option, for behaviour that is already implemented?

pf_moore · March 23, 2024, 11:01pm

Go and read the issue. There’s a lot of detail in there about the potential considerations.

For a start, all projects on PyPI that don’t publish wheels will stop being installable by default. And teams whose workflows revolve around sharing sdists (but not wheels) on a local index server will be broken by default. We have no way of knowing how much impact that will cause - it could shut down businesses completely. It doesn’t matter that the fix is easy, it matters that we broke our users. And yes, people do install the latest version of pip in production without testing it before doing so (ask me how I know…)

How would you assess the potential impact of this change? Remember, 90%+ of your user base is completely inaccessible to you, and probably unaware of any publicity you might issue.

The technical side is easy. The project management side is huge.

kknechtel · March 24, 2024, 1:19am

… Then how have we ever managed to deprecate anything at all?

But the current “latest version of Pip” didn’t always exist. Before that, some other version was latest. And business keeps going during that period; therefore, users use each new version of Pip as it comes out. Which means, there are versions that have an opportunity to present a warning message about future changes. Again, that’s just deprecation as it normally works, and I don’t understand why this change would be different, or more difficult than things that have been done before.

Nor do I understand what sorts of “specialized skills” would be relevant here, or what funding would be used for - what it could be used for, in principle. It doesn’t make sense to me that Pip would be doing “project management” work for users of Pip, in any circumstance; maybe I understand the term differently from you. The UI proposed is a command-line flag and I just can’t see that there are that many decision points involved.

Edit (sorry @BrenBarn): I’ve read through most of the GitHub thread now. You were right, actually, that there is a lot to discuss, simply in that there are other possible approaches to the problem that I hadn’t considered. However, it comes across that the UI/UX expertise you’re thinking of soliciting, is for stuff along the lines of phrasing error messages, and, well… honestly, if we’re going to that extent, there’s a lot of other stuff in Pip that would benefit from the same level of care and attention. On the other hand, if we just want something that’s as usable as the rest of Pip, I don’t see why it couldn’t be accomplished with the same sort of resources that produced the rest of Pip.

A picture is coalescing in my head of an example scenario that should capture the most important use cases and user/developer perspectives on a feature like this, along with a fairly solid idea of how it should work. I’m thinking I should start a new thread for that.

ofek · March 24, 2024, 2:43pm

That would be a terrible experience for maintainers everywhere. I don’t want to define a package as a dependency and then experience at runtime the package not being available.

layday · March 24, 2024, 3:04pm

xattr doesn’t publish Python-only wheels, and in this particular instance it would probably be sufficient to do a simple sys.platform check in setup.py or the FFI module builder and error with a human-friendly message, instead of whatever inscrutable error is thrown after attempting to include a header file that doesn’t exist on Windows. It’s not standards-based, but it works.
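A minimal sketch of that early check, as it might sit at the top of a setup.py. The function name and the message text are hypothetical, not taken from xattr’s actual build scripts; raising `SystemExit` with a string prints the message and exits with a non-zero status, which the build front end then reports.

```python
import sys


def check_supported_platform(platform: str = sys.platform) -> None:
    """Fail early, with a readable message, on platforms we cannot build for."""
    if platform == "win32":
        raise SystemExit(
            "xattr is not supported on Windows: it relies on the POSIX "
            "extended-attribute API. Projects that depend on it can use an "
            "environment marker such as  xattr; sys_platform != 'win32'."
        )


# In setup.py, call this before setup() so users see the message above
# instead of a compiler error about a missing header:
#
#     check_supported_platform()
#     setup(...)
```

The check costs nothing on supported platforms and turns the “try building it and see what happens” failure into an explicit statement of non-support.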

petersilva · March 24, 2024, 3:43pm

That would be a terrible experience for maintainers everywhere. I don’t want to define a package as a dependency and then experience at runtime the package not being available.

It is not a run-time check; this is all at install time… The idea is that when trying to install package A, the installer finds a dependency (on package C), and C fails to install because C’s metadata says it can’t be installed on that platform. So the installation of package A fails (unable to satisfy the dependency on C).

Both suggestions work that way. But in one case, we are saying that B’s metadata is supposed to say C is only for Windows, whereas I’m asking: why can’t C, on its own, say that it only runs on Windows?
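The behaviour being asked for, where C itself declares the platforms it supports and installing A fails because C is ruled out, can be modeled as a toy resolver. This is a sketch of the *proposed* semantics only: the `supported_platforms` field is hypothetical and not part of any current packaging standard, and the resolver ignores versions and backtracking entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Package:
    name: str
    requires: list[str] = field(default_factory=list)
    # HYPOTHETICAL metadata: platforms this package says it supports.
    # None means "no restriction declared".
    supported_platforms: set[str] | None = None


class UnsatisfiableError(Exception):
    """Raised when some package in the dependency tree rules out the platform."""


def resolve(root: str, index: dict[str, Package], platform: str) -> list[str]:
    """Return an install order (dependencies first) for `root`, or raise
    UnsatisfiableError if any package it pulls in excludes `platform`."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(name: str, chain: tuple[str, ...]) -> None:
        if name in seen:
            return
        pkg = index[name]
        path = chain + (name,)
        if pkg.supported_platforms is not None and platform not in pkg.supported_platforms:
            raise UnsatisfiableError(
                f"cannot install {root}: {name} does not support {platform} "
                f"(required via {' -> '.join(path)})"
            )
        seen.add(name)
        for dep in pkg.requires:
            visit(dep, path)
        order.append(name)

    visit(root, ())
    return order


# A depends on B, B depends on C, and only C states where it runs.
INDEX = {
    "A": Package("A", requires=["B"]),
    "B": Package("B", requires=["C"]),
    "C": Package("C", supported_platforms={"linux", "darwin"}),
}

print(resolve("A", INDEX, "linux"))  # ['C', 'B', 'A']
try:
    resolve("A", INDEX, "win32")
except UnsatisfiableError as exc:
    print(exc)
```

In this model neither A nor B repeats C’s restriction; the installer discovers it by walking the tree, which is exactly the difference from the environment-marker approach.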

petersilva · March 24, 2024, 3:47pm

+1 that this is very deeply related to “Speculative: --only-binary by default?” (pypa/pip issue #9140 on GitHub).
The --only-binary default might prevent the problem here (building on an unsupported platform) from occurring in the vast majority of cases.

pf_moore · March 24, 2024, 4:00pm

You’re reaching the point where you are repeating yourself and not adding anything new to the conversation. As has been explained a number of times, it’s not a matter of “why can’t it?” What you’re suggesting is possible, but it needs someone to make a proposal, persuade people it’s worth implementing, and then actually implement it. Just saying over and over that you wish it worked like that won’t help.

For what it’s worth, until very recently (a few weeks ago, when Metadata 2.2 started to be allowed on PyPI) sdists provided no reliable metadata that didn’t involve running a build step. So there was literally no way of knowing anything about a sdist until you tried to build it. As a result, a sdist was always a possible way of installing a project, on any platform, and in any environment. The responsibility, for better or worse, was entirely on the project author to fail with a helpful error if asked to build on an unsupported platform, and to document clearly what was and was not supported. Most project authors, fairly reasonably, didn’t bother - so a build failure (which often involved a very user unfriendly error/traceback) was the norm for packages to signal “I don’t work on this system”.

No-one is saying this is a good situation, but equally, no-one had stepped up to fix it, so it’s what we had.

Now, with Metadata 2.2, it’s possible to determine some information about a sdist just by inspecting it. Not all build backends support this yet (and in particular I don’t think setuptools does) but in due course they probably will. This opens up the possibility for better checking and reporting. But it still won’t happen until someone steps up to do the work. Everyone involved in Python packaging is a volunteer. No-one is paid to pick up user requests and make them happen.
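As a rough illustration of what Metadata 2.2 makes possible: an sdist’s PKG-INFO file uses the email-header format of core metadata, and since Metadata 2.2 (PEP 643) any field not listed under `Dynamic` must match what a wheel built from that sdist would contain. The sketch below, using only the standard library’s `email.parser`, separates the fields a tool could trust without building from those it could not. `static_fields` and the sample PKG-INFO are hypothetical.

```python
from __future__ import annotations

from email.parser import Parser


def static_fields(pkg_info: str) -> dict[str, list[str]]:
    """Return the PKG-INFO fields that can be trusted without running a build."""
    msg = Parser().parsestr(pkg_info)
    version = tuple(int(part) for part in msg.get("Metadata-Version", "1.0").split("."))
    if version < (2, 2):
        # Older sdists make no promise that their metadata matches the built wheel.
        return {}
    dynamic = {name.lower() for name in msg.get_all("Dynamic", [])}
    fields: dict[str, list[str]] = {}
    for key, value in msg.items():
        if key.lower() == "dynamic" or key.lower() in dynamic:
            continue  # only known after a build step
        fields.setdefault(key, []).append(value)
    return fields


SAMPLE = """\
Metadata-Version: 2.2
Name: example-pkg
Version: 1.0
Requires-Python: >=3.8
Requires-Dist: xattr; sys_platform != "win32"
Dynamic: Requires-Dist
"""

print(static_fields(SAMPLE))
```

Here the dependency list is marked `Dynamic`, so a tool could trust the name, version, and `Requires-Python` but would still have to build the sdist to learn its dependencies.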

I think I’m done with this discussion. Nothing new is getting proposed here, and the conversation now just seems to be going in circles. I’ll wait to see if a PEP comes out of it, or if it just dies down with no useful outcome.