Should there be a new standard for installing arbitrary data files?

Here’s what I think is necessary to make this discussion move forward constructively:

  • Is there any prior art in language-specific package managers allowing for something like this?
  • What are the kinds of data that we want to allow users to place in distro-specific locations?
    • Notably, why does this data need to be provided on a per-package basis?

Some of these questions might seem obvious, but they’re not to me, and it’ll help to have someone write these down so that we have a baseline for what we’re trying to do here.

To these I’d add the broader question:

  • How do cross-platform issues impact these answers (for example, man pages are useful on Linux, but not on Windows)? My specific concern is whether we’re trying to impose an inherently platform-specific concept (“distro-specific locations”) onto a platform-independent format (wheels).

Would it be practical to consider a mechanism where the wheel could offer various types of files, presumably with some categorization or tagging, and the tool installing the wheel could use that information to decide whether (and how) it wants to install those files, if at all?

I can definitely see the value in a package providing, for example, systemd service/socket unit files for those who would like to use them, but installation of them should be opt-in only, in my opinion.
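
To make the idea concrete, here is a rough sketch of how an installer might filter categorized files. Everything in it is hypothetical: the category names, the `WheelFile` type and the `select_files` function are made up for illustration and are not part of the wheel format or of pip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WheelFile:
    """A file shipped in a wheel, tagged with a hypothetical category."""
    path: str
    category: str


# Categories the installer always installs (a hypothetical default).
DEFAULT_CATEGORIES = frozenset({"purelib", "scripts"})


def select_files(files, opted_in=frozenset()):
    """Return the files whose category is a default or explicitly opted in."""
    allowed = DEFAULT_CATEGORIES | set(opted_in)
    return [f for f in files if f.category in allowed]


files = [
    WheelFile("mypkg/__init__.py", "purelib"),
    WheelFile("mypkg.service", "systemd_service"),
    WheelFile("mypkg.1", "man"),
]

# Without opting in, only the Python code is selected.
print([f.path for f in select_files(files)])
# Opting in to systemd units adds the unit file but still skips man pages.
print([f.path for f in select_files(files, opted_in={"systemd_service"})])
```

The point of the sketch is only that the wheel declares categories and the installing tool, not the package, decides what to do with them.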

I don’t know. How would “the tool installing the wheel” (which could be a user program doing a simple unzip, at the moment) know whether the wheel would work without the files? How would the installer know where to put the files - would the user have to specify? Would the installer need to maintain information about various possible targets?

All of this is possible, but someone needs to define the details, and do so in a way that doesn’t assume any prior knowledge on the part of the installer maintainer. (As an example, I don’t even know what a “systemd service/socket unit file” is, other than it’s something to do with Linux, and yet I maintain pip…)

Whether it’s practical depends largely on how hard it is to define those details, which does make it a bit of a chicken and egg question, I know :slight_smile:


As far as I am aware, no current language packaging tooling resolves this properly right now.

As I said above, I think the only way to make this work properly is to make the Python interpreter aware of certain keys (e.g. docs, systemd_service, udev_rule, dbus_service, polkit_rule, etc.) and add a mechanism for distributors to specify how to install those files.
I don’t see any other way. Things work differently on different systems, and not all cases are a simple file install; some need to trigger external commands (a reload, for example).
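
As a rough sketch of what such a distributor-provided mechanism could look like (the `DISTRO_RULES` table and `plan_install` function are hypothetical, and the target directories are only illustrative values a distributor might choose):

```python
from pathlib import PurePosixPath

# Hypothetical distributor-provided configuration: for each category key,
# where the files go and which command (if any) must run afterwards.
DISTRO_RULES = {
    "docs": {"target": "/usr/share/doc/{name}", "post_command": None},
    "systemd_service": {
        "target": "/usr/lib/systemd/system",
        "post_command": ["systemctl", "daemon-reload"],
    },
}


def plan_install(name, category, filenames, rules=DISTRO_RULES):
    """Return (copies, command) for one category without touching the system.

    Unknown categories yield an empty plan, so their files are skipped
    rather than installed to a guessed location.
    """
    rule = rules.get(category)
    if rule is None:
        return [], None
    target = PurePosixPath(rule["target"].format(name=name))
    copies = [(fn, str(target / fn)) for fn in filenames]
    return copies, rule["post_command"]


copies, command = plan_install("mypkg", "systemd_service", ["mypkg.service"])
print(copies)   # planned (source, destination) pairs
print(command)  # external command the distributor wants run afterwards
```

This only builds a plan; actually copying files or running the command is deliberately left out.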


Perhaps the data files are a better example than the documentation. I understand tensorflow, for example, comes with a huge amount of those. Licenses might be another. On the Ubuntu system I’m using, most programs’ /usr/share/doc directories contain a README, a copyright and a changelog.Debian.gz, which is not heavier than what we already put in .dist-info.

The simplest way to indicate that a category was not needed at runtime would be to tell the installer not to record the installed prefix for particular categories: put the files in RECORD but not in a category: prefix mapping needed for the program to find its files at runtime.
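
A simplified sketch of that bookkeeping, assuming a hypothetical `record_install` helper and a made-up set of runtime categories (real RECORD entries also carry hashes and sizes, which are omitted here):

```python
# Hypothetical: categories whose install location the program needs at runtime.
RUNTIME_CATEGORIES = {"data"}


def record_install(installed):
    """Split installer bookkeeping into RECORD entries and a prefix mapping.

    `installed` maps a category name to (prefix, [relative paths]).
    Every file is listed in the record so it can be uninstalled, but only
    runtime categories get their prefix stored for later lookup.
    """
    record = []
    prefixes = {}
    for category, (prefix, paths) in installed.items():
        record.extend(f"{prefix}/{p}" for p in paths)
        if category in RUNTIME_CATEGORIES:
            prefixes[category] = prefix
    return record, prefixes


record, prefixes = record_install({
    "data": ("/usr/share/mypkg", ["model.bin"]),
    "man": ("/usr/share/man/man1", ["mypkg.1"]),
})
print(record)    # both files are tracked for uninstall
print(prefixes)  # only the "data" prefix is remembered
```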

@nhatkhai

  • Users of those packages are also users of tools like pip. They’re running pip install ... on their end.
  • No one said the problem isn’t worth solving or whatever. The whole discussion is about how we should.
  • I encourage you to read this excellent article before engaging further here.

Anyway, I have other things to spend my weekend on. Expect this to be my last response to your comments here.


For examples of categories that a wheel might want to use, autotools provides a number of categories with descriptions. I personally would love to be able to look up man pages for scripts I install.

As for how installers handle this, appdirs provides a good example for how to do this for user data (wrapping each OS’s layout with a number of standard, OS-independent categories). Have the modified PEP specify how to define which category a file belongs to, and have an informative (rather than normative) section which covers the categories supported by an example implementation (in the same way packaging acts for other PEPs). This then avoids bikeshedding about which categories are “valid” or the spelling of the categories.
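
This is not appdirs itself, but the standard library’s sysconfig module already does something comparable for install locations, which may help picture the approach. The `resolve` helper below is hypothetical; `sysconfig.get_path` and `sysconfig.get_path_names` are the real stdlib functions.

```python
import os
import sysconfig

# sysconfig maps a handful of OS-independent category names
# ("purelib", "scripts", "data", ...) to concrete per-platform directories.
for name in ("purelib", "scripts", "data"):
    print(f"{name:8} -> {sysconfig.get_path(name)}")


def resolve(category, relative_path):
    """Hypothetical helper: place a file under the directory for a category."""
    if category not in sysconfig.get_path_names():
        raise KeyError(f"unsupported category: {category}")
    return os.path.join(sysconfig.get_path(category), relative_path)


print(resolve("data", os.path.join("share", "man", "man1", "mypkg.1")))
```

A category such as "man" is not among the names sysconfig knows today, so `resolve("man", ...)` raises `KeyError`; which extra categories to support is exactly the part an example implementation would document.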


As I understand it, wheel packages are simply unzipped. So anything cross-platform that is not allowed to execute code on the package side during pip install would require pip install to fully implement every corner case itself. That in turn requires a new language, macro system or configuration format to guide pip install’s logic. Implementing this in the wheel style would be an extreme burden on pip install.

The deprecated .egg format, by contrast, meant pip install did not need to know much. The details of how and where things go were taken care of by the package being installed. Being able to execute code during installation helps handle the platform it runs on. The package itself has much better knowledge of what needs to be done on each platform.

So deprecating .egg just because some people think it is a security risk, and switching to an unzip-only format like wheel, is basically an architectural problem.

And yet a lot of package makers have raised these issues many times, in many places, and still got nothing except headaches and a struggle to find a way forward. So I don’t think I should waste more energy describing yet another struggle of mine here. I was not made welcome to begin with anyway. I’m using the wrong tool, for sure.

Wheel is very similar to egg. They are both .zip archives with a very similar layout. We even include an egg to wheel conversion script, but .egg had many problems which I tried to address in wheel. For example, too much would break when you started using a new Python version and the necessary eggs weren’t available. And .egg didn’t imagine the proliferation of so many different Python implementations and platforms, so it was easy to download an egg that didn’t actually work on your machine.

I don’t remember that .egg had post-install hooks. That feature would come from having setup.py install do the actual file copying.

Yep - one can override/extend the behavior of install from setup.py, which includes adding Python post-installation code.

Yes - but wheel is more like a passive unzip, while .egg is more like an active unzip that executes the installation from the unzipped content.
I would think this is the cause of the problems. However, the problem does not come from egg. It comes from package makers’ poorly designed setup.py files. And since there is no standard or Python package to help out, everyone has to come up with their own way.

So I like the clean idea of wheel. But we need a solution for extended installation code in the kinds of packages that wheel does not currently support, or wheel has to go against its own idea to support this feature.

From what I understand from this thread, there’s basically zero chance for post-install hooks to happen in wheels, since the fact wheels don’t run arbitrary code in a package is a fundamental design goal and feature. So if we are going to do it, it’ll need to take the form of something between source and wheel, “not entirely built from source code but offer users to perform magical steps”.

But before we try to do anything, I would like someone to actually describe the use case. I have not read any concrete reason why a post-install hook is needed that cannot be achieved by an initialisation step triggered once when a package is executed/imported the first time. All the usages I can think of are outside of Python packaging’s feature scope (i.e. if you need to do them, your package shouldn’t have been distributed as a Python package to begin with).
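
For reference, a minimal sketch of the kind of first-run initialisation step I have in mind. The `ensure_initialised` function and the marker file name are hypothetical, and a real package would use a per-user state directory rather than the throwaway one in the demo.

```python
import tempfile
from pathlib import Path


def ensure_initialised(state_dir, setup):
    """Run `setup()` once, the first time this is called for `state_dir`.

    A marker file records that setup has already happened. Returns True
    if setup ran during this call and False if it was skipped.
    """
    state_dir = Path(state_dir)
    marker = state_dir / ".initialised"
    if marker.exists():
        return False
    state_dir.mkdir(parents=True, exist_ok=True)
    setup()
    marker.touch()  # written only after setup succeeds
    return True


# Demo using a throwaway directory instead of a real per-user location.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_initialised(tmp, lambda: calls.append("setup"))
    second = ensure_initialised(tmp, lambda: calls.append("setup"))
print(first, second, calls)
```

A package could call something like this from its entry point or on import, so the setup work happens on first use instead of at install time.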


Wheel is not really passively unzipped. This is probably bad writing in the wheel specification. It can be, but that is usually only for debugging. The difference is that in .egg the metadata directory always has the same name, so you have to rename that directory if you happen to unzip two of them into the same target. I think, @nhatkhai, that your concerns are not related to the original question in this thread. RPM, for example, can install a data file without relying on post-install hooks.

I don’t think I could do any better, then. I have seen enough and read enough, and I get it too. I can see it clearly just by looking at package source code - no need to tell me.

Sure - but RPM isn’t cross-platform. It knows its own system very well, so it can have every hook it needs without post-installation. That still does NOT hold for a Windows installer. I think we are comparing apples and oranges here.

One person says that if you need post-install, use RPM. Now another compares pip with RPM. Isn’t this pointing fingers at two different kinds of thing?

Please keep in mind that when adding such mechanisms to an ecosystem as big as Python, we need to design them in a way that limits, as much as reasonably possible, the number of ways users can screw up or abuse the system.

Your current proposal gives free rein to users. I am pretty confident that it will result in a large number of poorly written post-install hooks that make bad assumptions, and in some users abusing post-install hooks to circumvent the packaging system. This will make pip install a potentially system-breaking command. I already see this kind of behavior in the wild with do-it-yourself build systems such as Make and CMake; not allowing it is, IMO, the best thing about meson as an alternative.
Even if you don’t care about this resulting in a bad UX, just the fact pip install could break your system should be a deal-breaker – it is a huge one for me.

As @uranusjr asked, please provide use-cases that are not addressed by the proposals on this thread – I am particularly interested how mine would do – and we will try to come up with a reasonable alternative. I cannot think of anything.


I don’t understand what you are trying to propose.

Restricting what packages can install is not architecturally sound, and it requires more time from pip developers, as I mentioned earlier. It creates more work on both sides to solve the issues they need solved. So it would be fun. I have had this fun too. I was just trying to see why we are going down this path. My plan for myself is to be prepared to use pip and either develop tools or use another tool to do the other part, the so-called arbitrary code. That includes writing some instructions for users to run some commands.