Thinking about it: these days we need to do a lot of automation on many unusual platforms.
Having pip support eggs, letting it unzip them and run setup.py install, is the simplest way to solve corner cases that pip has no idea about.
Removing this feature would cause endless requests and discussions about the need for arbitrary code execution from wheel packages.
Indeed, this is a great list.
However, as noted earlier, it's still unclear which of these we need to have and how we'd handle this on platforms where some of these concepts don't make sense.
As a heads up, I'm gonna be unsubscribing from this discussion now, since I don't see it going in a constructive direction in the short term.
Here's what I can find, though I cannot speak to the language maintainers' feelings on each solution:
- Ruby/rubygems (Specification Reference - RubyGems Guides): Gemspec files have a "files" array that takes relative file paths and outputs those same files in the gem. Gem files are installed in their own namespace, meaning that you can put things anywhere, including the root directory, without fear of clashing with other gems (the exception to this is the lib/ directory which might get added to LOAD_PATH). Typically you would load a data file using a path relative to your current file, and it's considered bad practise to load any files outside the lib directory of the gem. I've seen some examples of gems including man pages (see guard or ruby-irb), which repackagers can then easily move to a distro-specific location. There's lots of examples of including licenses and readmes, but this might come from the fact that there are no sdists/wheels, only gems. You can't install to anywhere outside the gem's namespace.
- Lua/LuaRocks (Rockspec format · luarocks/luarocks Wiki · GitHub): Despite being geared towards being a build system, it has a copy_directories rule where specified directories are "copied to the rock installation prefix as-is". They give typical use cases such as "installing documentation and other files such as samples and tests". Similarly to Ruby, you can't install to anywhere outside the rock's prefix.
- Rust/cargo (The Manifest Format - The Cargo Book): Cargo is very geared towards statically compiled binaries, so it's probably not a useful example. If it's not documentation or source code, cargo gives you no help. You can only include or exclude files for packaging, similar to MANIFEST.in. There is a mechanism for running arbitrary commands as part of a build script, but again it's geared towards compiling things. You can however do whatever you want and write things to wherever you want (for example see the build.rs file of zlib-sys), but the only output is binaries and therefore a repackager couldn't make use of what was written out.
- Go: Very similar to Rust. The expectation seems to be to embed files in a binary, or use a tool other than the go package manager.
- Haskell/cabal (7. Package Description - Cabal 3.4.0.0 User's Guide): Arbitrary data files can be included. They get installed to /usr/share/<package_name>/ on Linux. Cabal even provides utilities for finding these data files so that packages can get installed to any location and still function.
I don't think we need to support distro-specific locations, only OS specific ones. We could enable package maintainers to help out repackagers (if this is indeed one of the reasons that we want to continue supporting these sorts of files) by allowing the pip install step of repackaging to place more files in their correct locations for "Linux in general", but I don't think it's unreasonable to expect repackagers to have to do a certain amount of work. Especially when it's work that is specific to the distro they support.
I also don't think it's realistic to expect package maintainers to support more than even a couple of distros. Plus repackagers will be more in touch with where files are supposed to go for their distro than a package maintainer is.
More broadly, I think even having the package declare which of these extra files should get installed helps our repackagers a lot because currently they have to dig through repos to find what can be installed and there's the opportunity to miss things.
Is there another use case involving docker images? If I'm installing packages into a docker image it would be nice to have things like documentation or .desktop files for those packages included in the image, but I wouldn't want to have to repackage everything I'm installing to make that happen.
That's assuming that the system Python is being used.
I don't think I understand this question. Was the purpose of asking this to get us to more carefully consider all operating systems when providing possible solutions?
As mentioned above, I don't think we should go so far as to consider distro specific files. But these are very good examples of files that we could be including in packages, and in fact these files are what I was seeing most often included in Ruby gems when I was researching the above answer.
Ruby gems aren't in a good position to install things like .desktop files because they're installed into a namespaced directory, but I did see man pages get installed by a gem and then repackagers taking that file and moving it into /usr/share/man. Python packages are in a position where they can skip the middleman (middleperson?) and go straight to a location under sys.prefix without the repackager needing to seek out those files and do anything extra with them.
The distro is the operating system. Linux distros generally implement the Posix standard, which is what I assume would be the "OS" here based on the value of os.name, but there are a lot of degrees of freedom they can take.
Taking one of your examples, .desktop files, they are not even specified in the Posix standard, they are defined by the freedesktop.org spec. So, it wouldn't be correct to install these files to the Posix OS location. Even getting past that, the freedesktop.org spec does not specify file location; it is usually used as an extension to FHS (Filesystem Hierarchy Standard), which itself makes a lot of locations valid.
Taking another one, documentation. The location for documentation to be installed isn't specified by Posix. Most modern Linux distros implement the FHS standard, which is where that location comes from.
I don't want to be pedantic here, but I think this does have a very significant impact here. In your proposal, how should a .desktop file be handled? What do Python packagers need to specify for its distribution, and what would installers do? How would documentation be handled? There are several standards distros might implement, and lots of ways they could implement them.
All this is then ignoring the fact that almost all modern distros assume they will be the only ones touching /usr, only letting components touch their own paths. An example of this is /usr/lib/python3.9/site-packages.
Having pip install to /usr/share/doc or /usr/share/man behind the distro's back will result in issues, causing a bad UX for users, and possibly unstable systems in other situations.
The proposal may sound good at first glance (let's just let users package their own arbitrary data) but it fails to consider the impacts that it has on an OS/distro level. pip is not the system package manager, and it shouldn't be used in that way. There are, however, valid use cases where someone might want to install extra data, so we need to come to a middle ground. The solution should be to integrate Python installers in the OS, not making Python installers an OS-level installer.
Ah this makes sense. Thanks for the clarification.
I don't yet have a proposal really. I started this thread to collect the information that we would need to come up with a good solution for the problem. One of my goals for my initial post was to figure out use cases that could drive this, but so far it's been easier to discount use cases than it has been to come up with valid ones.
I think the melding of @dholth's suggestion and yours is the closest thing we have to a proposal at the moment. So that would be having categories of directories that an installer can install to a platform specific default location (if the platform even has a default) but also allowing overrides to these defaults at install time for distro packagers to make use of. It'll be difficult to know if this is a decent solution to the problem when we don't know how it'll get used, but it will at least encourage users to do the right thing whilst still giving them the power to do what they want.
If this sounds like a decent starting place and we think we're as far into this discussion as we can go without proposing a solution, then I can flesh this out into a more detailed proposal? I'm still unclear what categories we would want to support because there's been yeas and nays for anything beyond a ${root}, as your questions point out.
To try and answer your questions though, I think it depends on which situations we want to fully support.
A .desktop file in a virtualenv isn't really useful for example, but I don't like the idea of installing a file or not depending on whether it's in a virtualenv. So for virtualenvs, I don't think that a default location for a "{desktop_files}" (or whatever it's called) category matters too much.
If we're trying to help repackagers, then we can get away with providing a default value for the category, like "{root}/share/applications", because the repackager is going to specify a default value for the location of the category anyway.
For the case of pip installing to the system when in a docker container, I think it's a similar situation but the author of the Dockerfile acts like the repackager and it's up to them to provide a real default for the value.
I think it's a similar situation for other files. It probably doesn't matter too much where they go, so long as they can be overridden at install time and potentially also so that the installed Python application has a way of querying them back at run time? Querying back wouldn't be useful for every category, but presumably an application would want to query where the file that they put in ${datadir} ended up.
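To make the "defaults plus install-time overrides" idea concrete, here is a minimal sketch. The category names, the default templates and the resolve_category helper are all hypothetical; nothing like this is standardised or implemented by any installer.

```python
from pathlib import PurePosixPath

# Hypothetical categories and defaults, for illustration only.
DEFAULT_CATEGORIES = {
    "datadir": "{root}/share",
    "desktop_files": "{root}/share/applications",
}


def resolve_category(name, root, overrides=None):
    """Return the directory for a category, preferring install-time overrides."""
    overrides = overrides or {}
    # An override (e.g. from a repackager or Dockerfile author) wins over the default.
    template = overrides.get(name, DEFAULT_CATEGORIES[name])
    return PurePosixPath(template.format(root=root))


print(resolve_category("datadir", "/opt/venv"))
print(resolve_category("desktop_files", "/opt/venv",
                       {"desktop_files": "/usr/share/applications"}))
```

The same function could serve the run-time "query it back" side, provided the installer recorded the overrides it was given somewhere the application can read.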
Doing this with wheels forces pip to act like a god tool (it has to know everything) because of the rule that disallows arbitrary code execution.
What would typically work for everyone is arbitrary code execution (pip once supported this with egg packages, < 20.*).
Here are more cases of packages that ask the user to perform some additional installation step, to run code that can no longer be run by pip:
- GitHub - mhammond/pywin32: Python for Windows (pywin32) Extensions
- GitHub - tensorflow/tensorflow: An Open Source Machine Learning Framework for Everyone
- GitHub - pyusb/pyusb: USB access for Python
Say I have projA, another person has projB, and we both want to automate everything. Since the package maker cannot implement arbitrary code, each project now needs to check its full dependency tree and write code to detect which version of the above packages is used, in order to run the arbitrary code that installs the correct version in the correct way. This work is repeated as many times as there are projects using the above packages.
This kind of issue has been discussed many times over the years and, guess what, we are still talking about a similar thing here. It is simply the architecture yelling at us. The solution has also already been there. Now it is gone from pip and about to be reinvented in some form.
Pywin32 has been using arbitrary code to solve the issue I mention above. Here is the evidence
My own package also does a similar thing as a total automation solution. It is now no longer useful.
What is the point of including files in the wheel at all if you're not going to install them into an arbitrary location that standard tools don't recognise and the installed Python code isn't going to use (files the Python code will use should probably be installed as package data files)? Expecting users to manually hunt in the Python installation directory for (for example) documentation files is a pretty bad UI (I recall old setuptools-based installs doing this, and honestly they might as well not have supplied documentation for all the use it was).
I'd like to see an actual use case where this was genuinely the UI that people wanted…
I'm certain that I do not know what you are looking for, or how many examples would count as enough. I have tried, but I guess I'm failing you.
My initial post was the collection of all related use cases requested by users over the years in various discussions about data_files that I could find. The only one we haven't discussed so far is my own, very specific, use case. You could generalise that use case to mean "any integrations with any applications that have some sort of search path mechanism and understand virtualenvs". But that's not particularly useful as something to drive a new design of data_files.
The only other use case that I can find is Jupyter's extension mechanism (Distributing Jupyter Extensions as Python Packages - Jupyter Notebook 5.7.6 documentation), and coincidentally it seems to fit the above description. Jupyter looks for both configuration files and data files in directories created using data_files (Jupyter Paths priority order - General - Jupyter Community Forum). Data files are usually javascript files that form the main content of an extension. Both configuration files and data files go into specific directories that Jupyter searches to find extensions.
However the shortcoming of data files means that packages also need to provide an additional installation step that is used when the data files don't end up in the expected place. This additional installation step requires the data files to be duplicated as package data so that the package can locate them to report to jupyter. This also extends jupyter's search paths with an additional directory per extension.
For the above reason, plus the fact that data_files are considered deprecated in setuptools and not supported by any other packaging backends yet, there has been a proposal to replace the current extension mechanism to not rely on data_files (RFP for successor to data_files-based extension discovery? · Issue #351 · jupyter-server/jupyter_server · GitHub). However there don't seem to have been any ideas that address the same problems that the additional install command has, and there's still a preference for data_files to stick around. Perhaps @minrk or someone else from the Jupyter team can speak more on this and possibly even on what they would want to see in a redesign of data_files.
Sorry. I glossed over those because I thought the discussion had moved on, but they do deserve comment. See below.
I think the consensus is pretty clear at this point that "absolutely anywhere on the machine" isn't acceptable. In particular, I agree with @uranusjr's comment:
I'm therefore going to limit myself to proposals that only allow "arbitrary data files" to be installed under sys.prefix.
Looking at your use cases (I'll defer the first one for now, as it's your specific case and you noted that hasn't been discussed yet):
- .desktop files. You said yourself they are platform specific and only useful if installed into system locations, i.e. not under sys.prefix. So the solutions we're looking at won't work for these.
- Manual pages. Same as .desktop files, platform specific and need to be in a system location. I've already called these out specifically as a case where my experience is that placing them under sys.prefix where OS utilities can't see them is pointless.
- Files in /etc - again system locations, and not supported by solutions that install under sys.prefix.
- Binaries - these work right now (there's a "scripts" location that things get installed to, and console scripts go there so we know this works). I don't understand your comment about not working in a virtualenv. Unless you mean "install to /usr/local/bin" or something like that, but we're back to locations outside sys.prefix again.
So all of your use cases basically require "arbitrary locations" and I believe that we already have a general feeling that this isn't acceptable. If you want to still argue for that behaviour, I suggest that you make a specific proposal that describes what you want. But be prepared for it to be rejected - I will definitely vote -1 on it as I've already said.
I'm not entirely sure about your VFX use case, but it seems like your solution using virtualenv with an application-specific plugin is a reasonably good approach, so I'm not sure anything more is needed. And if you want to install the files to a location that's already on MAYA_SCRIPT_PATH, you're back to installing outside of sys.prefix…
So to be more specific, I'm looking for examples of use cases where it would be necessary to be able to install "arbitrary data files" into particular locations under sys.prefix other than the ones that are already covered by the existing sysconfig locations.
Perhaps @minrk or someone else from the Jupyter team can speak more on this and possibly even on what they would want to see in a redesign of data_files.
Jupyter extension packages typically include files that should go in one or both of these:
- configuration files to enable extensions / specify config (in $prefix/etc)
- static resources in $prefix/share (javascript extension sources, html templates, kernel specification files for discovery, etc.)
We never expect to write outside sys.prefix as part of package installation. Ideally, these are staged into the right place using data_files (or whatever its replacement should be) at install time, but there are some exceptions where that can't/won't work, so we provide our own jupyter ... install commands which stage files into $prefix/etc or $prefix/share after installation time. This two-step install has led to lots of mixed-up installations, as removing or upgrading a package is no longer associated with removing or changing other files associated with it. data_files installs work great for this today.
Critically for Jupyter, none of the Jupyter specs are Python-specific, and many of these things are not part of Python packages at all, so we standardize on evaluating paths relative to $PREFIX rather than something python specific (we generally use sys.prefix for this), and we don't want to assume that everything comes from a Python package.
For install time, all we really want is a reliable way to write {sys.prefix}/share/ (or prefix/etc) and a way at runtime that returns the same {sys.prefix}/share|etc that should work for installs:
- in venv
- not in env
- --user
It's the lack of a symmetrical "where would you have put it?" API that's been a challenge for us. We encourage data_files as the easiest way to do this which works almost all of the time (very reliably in venvs and conda envs), but we are aware of custom distutils install schemes like system Pythons where it can get weird, because sys.prefix is not actually where installed files end up.
Cool, that makes a lot of sense to me. So if sysconfig added a new path for "share" (or "etc"), that would satisfy this use case? The wheel spec already covers the installation side of this, as the rule in the PEP applies to any scheme key that exists in sysconfig. So all you need is the lookup side, which sysconfig would provide.
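As a rough illustration of the lookup side, the sketch below uses the existing "data" path from sysconfig, since no "share" or "etc" key exists today. The share_dir helper is hypothetical, and it assumes the installer wrote files relative to the same "data" path, which, as noted above, may not hold under custom install schemes.

```python
import sysconfig
from pathlib import Path


def share_dir(app_name):
    """Best guess at where share/<app_name> files ended up (hypothetical helper)."""
    # "data" is an existing sysconfig path name; "share" is appended by hand
    # because sysconfig has no dedicated key for it.
    return Path(sysconfig.get_path("data")) / "share" / app_name


print(share_dir("jupyter"))
```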
Fedora Python maintainer here. I need to pitch in as somebody who fundamentally disagrees that desktop files or manual pages "don't work under sys.prefix" or that installing stuff to an arbitrary location under sys.prefix has an increased potential to create file-conflicts. It is a matter of perspective. Let me describe a matrix of the following things I could think of:
Python modules in {sys.prefix}/.../site-packages
Python modules: sys.prefix is /usr
- The modules work naturally because they are by definition in sys.path.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and distro-package manager.
Python modules: sys.prefix is /usr/local or ~/.local
- The modules work naturally because they are by definition in sys.path.
- There is a potential of file-conflict between different Python packages.
Python modules: sys.prefix is within Python virtual environment
- The modules work naturally because they are by definition in sys.path.
- There is a potential of file-conflict between different Python packages, albeit the chances are very limited, as virtual environments tend to be one-purpose and the package set usually does not grow without bounds.
Python modules: sys.prefix is another arbitrary location including Windows
- The modules work naturally because they are by definition in sys.path.
- There is a potential of file-conflict with different Python packages.
Commands (scripts) in {sys.prefix}/bin
Commands: sys.prefix is /usr
- The commands work naturally because /usr/bin is (almost) always on $PATH.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and distro-package manager.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install scripts into this location, or manually created content.
Commands: sys.prefix is /usr/local
- The commands usually work because /usr/local/bin tends to be on $PATH.
- If needed, users can extend their $PATH easily to make it work.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install scripts into this location, or manually created content.
Commands: sys.prefix is ~/.local
- The commands usually work because ~/.local/bin tends to be on $PATH on modern distros.
- If needed, users can extend their $PATH easily to make it work.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install scripts into this location, or manually created content.
Commands: sys.prefix is within Python virtual environment
- The commands don't work unless the virtual environment is activated: the activate script adds the directory to $PATH.
- Users can add symbolic links to scripts in a virtual environment to directories on their $PATH.
- There is a potential of file-conflict between different Python packages, albeit the chances are very limited, as virtual environments tend to be one-purpose and the package set usually does not grow without bounds.
Commands: sys.prefix is another arbitrary location
- The commands don't work unless the user modifies their $PATH.
- Users can add symbolic links to scripts in arbitrary locations to directories on their $PATH.
- There is a potential of file-conflict with anything else.
Commands: Windows
- This is handled differently on Windows and it seems to work.
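Whether commands "work" in the cases above comes down to whether the scripts directory is on $PATH. A minimal sketch of that check follows; the scripts_dir_on_path helper is hypothetical and only compares normalised path strings, so it does not resolve symbolic links.

```python
import os
import sysconfig


def _norm(path):
    # Normalise so that trailing slashes and case (on Windows) do not matter.
    return os.path.normcase(os.path.normpath(path))


def scripts_dir_on_path(environ=None):
    """Return True if the current scheme's scripts directory is listed in PATH."""
    environ = os.environ if environ is None else environ
    scripts = _norm(sysconfig.get_path("scripts"))
    entries = environ.get("PATH", "").split(os.pathsep)
    return any(_norm(entry) == scripts for entry in entries if entry)


print(scripts_dir_on_path())
```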
Manual pages in {sys.prefix}/share/man
Manpages: sys.prefix is /usr
- The manual pages work naturally because /usr/share/man is (almost) always in manpath.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and distro-package manager.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install manual pages into this location, or manually created content.
Manpages: sys.prefix is /usr/local
- The manual pages usually work because /usr/local/share/man tends to be in manpath.
- If needed, users/distros can extend their config easily to make it work.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install manual pages into this location, or manually created content.
Manpages: sys.prefix is ~/.local
- The manual pages usually work because ~/.local/share/man tends to be on manpath on modern distros.
- If needed, users/distros can extend their config easily to make it work.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install manual pages into this location, or manually created content.
Manpages: sys.prefix is within Python virtual environment
- The manual pages don't work out of the box.
- Users can make them work by setting/extending $MANPATH.
- If deemed useful, the activate script could be improved to set/extend $MANPATH.
- Users can add symbolic links to manual pages in a virtual environment to directories on their manpath.
- There is a potential of file-conflict between different Python packages, albeit the chances are very limited, as virtual environments tend to be one-purpose and the package set usually does not grow without bounds.
Manpages: sys.prefix is another arbitrary location
- The manual pages don't work out of the box.
- Users can make them work by setting/extending $MANPATH.
- Users can add symbolic links to manual pages in arbitrary locations to directories on their manpath.
- There is a potential of file-conflict with anything else.
Manpages: Windows
- The manual pages are not relevant there but they don't hurt anything.
Desktop files in {sys.prefix}/share/applications
(This also applies to their icons in {sys.prefix}/share/icons or {sys.prefix}/share/pixmaps.)
Desktop files: sys.prefix is /usr
- The desktop files work naturally because /usr/share/applications is used by default.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and distro-package manager.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install desktop files into this location, or manually created content.
Desktop files: sys.prefix is /usr/local or ~/.local
- The desktop files work naturally because /usr/local/share/applications and ~/.local/share/applications are used by default.
- There is a potential of file-conflict between different Python packages.
- There is a potential of file-conflict between pip-installed packages and other language stack package managers that would also install desktop files into this location, or manually created content.
Desktop files: sys.prefix is within Python virtual environment
- The desktop files don't work.
- Users might be able to make them work by some configuration (I have not explored this).
- Users can add symbolic links to desktop files in a virtual environment to the directories that work with desktop files.
- There is a potential of file-conflict between different Python packages, albeit the chances are very limited, as virtual environments tend to be one-purpose and the package set usually does not grow without bounds.
- If somebody wants to explore new ideas, we can have a concept of "activating a virtual environment for your desktop environment", but I don't think it would be that useful.
Desktop files: sys.prefix is another arbitrary location
- The desktop files don't work.
- Users might be able to make them work by some configuration (I have not explored this).
- There is a potential of file-conflict with anything else.
Desktop files: Windows
- The desktop files are not relevant there but they don't hurt anything.
Static application data in {sys.prefix}/share/{app_name}
(E.g. Jupyter kernels.)
- They always work regardless of sys.prefix.
- There is a potential of file-conflict with anything else that uses app_name.
tl;dr
- The stuff works quite fine for many values of sys.prefix; the degree of "works out of the box" varies, but there is some potential for improvement as well.
- The potential for file-conflicts is not worse than the existing potential (you can nuke a system by installing a bash or sh script quite fine already even without data_files).
- Many files are useless on Windows but I consider that OK.
Thanks, @AWhetter for looping Jupyter in, and @minrk for the thorough summary of our challenges!
To the various venv/not venv/--user cases, I'd add:
- uninstall leaving the file system "unharmed"
- a key "can't/won't" work case: pip install -e (and whatever flit and poetry do). I don't know how this can be solved in a cross-platform way, but our experiments in going from 4 search paths to hundreds (via entry_points) were… not encouraging, and would require non-python Jupyter components to shell out to get such a list.
Semi-related: having at least some warning (which could ideally be elevated to an error with e.g. an environment variable) when two packages try to write to the same file in {sys.prefix}/etc or {sys.prefix}/share would be helpful… today we kinda wait for downstreams (or users) to find these issues. Our flagship first-party consumers of etc have well-known (well, at least documented) {sys.prefix}/etc/jupyter_*_config.d/ folders that have made this more robust, but one (well, usually two) badly-behaved packages can make a right mess of things depending on the order of installation.
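A conflict check of this kind could be sketched as follows. This is not something pip does today; find_conflicts is a hypothetical helper that takes an already-collected mapping of package name to installed paths and reports every path claimed by more than one package.

```python
from collections import defaultdict


def find_conflicts(files_by_package):
    """Map each path claimed by more than one package to the sorted package names."""
    owners = defaultdict(set)
    for package, paths in files_by_package.items():
        for path in paths:
            owners[path].add(package)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


# Made-up package names and paths, purely for illustration.
example = {
    "pkg_a": ["etc/jupyter/example.json", "share/pkg_a/static.js"],
    "pkg_b": ["etc/jupyter/example.json"],
}
print(find_conflicts(example))
```

For already-installed packages, the input could plausibly be built from importlib.metadata.distributions() and each distribution's files attribute (which can be None when no file list was recorded), but that only detects a clash after the second package has overwritten the first.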
In practice, this is somewhat of a side issue. The question of whether desktop files or manual pages "work under sys.prefix" is only significant if someone is arguing that wheels need to support installation of files to locations outside of sys.prefix - and I don't think anyone is still trying to make that argument.
What we're left with now, as far as I can see, is that the wheel spec supports custom install locations for any named path that is supported by sysconfig. Currently, sysconfig supports 8 paths (from the Python docs):
- stdlib : directory containing the standard Python library files that are not platform-specific.
- platstdlib : directory containing the standard Python library files that are platform-specific.
- platlib : directory for site-specific, platform-specific files.
- purelib : directory for site-specific, non-platform-specific files.
- include : directory for non-platform-specific header files.
- platinclude : directory for platform-specific header files.
- scripts : directory for script files.
- data : directory for data files.
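The paths listed above can be inspected on any given interpreter with the standard sysconfig module. A short sketch, where the output depends on the platform and install scheme:

```python
import sysconfig

# Print every path name in the default install scheme with its resolved location.
for name, path in sysconfig.get_paths().items():
    print(f"{name:12} {path}")
```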
Adding extra install locations is as simple as getting some extra locations added to the sysconfig module. It's also exactly as hard as doing that - doing anything outside of the stdlib still requires getting agreement on locations for all platforms, supporting virtualenvs, handling ways of letting distros customise the locations, etc. I suspect people have a view that there's an "easier" way than getting sysconfig changed, but honestly, I doubt that's the case in practice.
So I think the "new standard" here might simply be a matter of requesting new install locations for the sysconfig module.
We will also have to deal with getting KeyError when a wheel's .data/ directory has a name not in sysconfig. I've wanted to just leave the data directory in site-packages in that case and possibly print() a warning.
That should be the case right now. The PEP doesn't insist that wheels stick to a list of names, so tools have to be prepared for KeyError already.
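The fallback described above, leaving the files where they are and warning, might look roughly like this. The resolve_data_key helper is hypothetical, it uses warnings.warn rather than print(), and it does not reflect what any existing installer actually does.

```python
import sysconfig
import warnings


def resolve_data_key(key, fallback_dir):
    """Return the install directory for a wheel .data/<key> subdirectory."""
    try:
        return sysconfig.get_paths()[key]
    except KeyError:
        # Unknown scheme key: keep the files in the fallback location and warn.
        warnings.warn(f"unknown wheel data key {key!r}; leaving files in {fallback_dir}")
        return fallback_dir


print(resolve_data_key("purelib", "site-packages/example-1.0.data/purelib"))
```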
I'd be fine with someone checking what existing installers do (pip, wheel, installer, poetry, …?) and if they all have a common behaviour, adding that to the PEP as a clarification. If they do different things, we'd have to say it's currently implementation-defined and if we want to define a specific behaviour, that would be a proper PEP update.
@AWhetter, @minrk, looking at pySerial and digging deeper into it, I realize the answer to our problem may lie in the "Conda" tool (Conda for data scientists - conda 4.10.1.post2+b6d32c8d7 documentation)