Custom build steps / Moving Bokeh off setup.py

Unfortunately a Bokeh sdist is useless and unusable without the BokehJS runtime component bundled inside it, and BokehJS itself is built from Typescript. I understand your POV but when other languages and transpilations start to enter the mix, things get more complicated. Building the Typescript requires an entire toolchain that is not common in Python projects, and also not really supported by existing build tools. Add to that: we 100% want to be the only source of truth for BokehJS. I would quit OSS before subjecting myself to the support burden of users building and running subtly different BokehJS builds, all claiming to be the same version, by accident. :slight_smile:

I’m not sure why the JS files are not picked up after declaring them in MANIFEST.in, perhaps a bug with the version of setuptools-git-versioning in CI.

This seems surprising, setuptools-git-versioning just computes a version from git describe AFAIK, I am not sure what it would have to do with selection of files to include.

Usually an environment mismatch, either versions are different or tools/libraries are not present in CI (including pip and system libraries).

Isn’t this what build isolation is supposed to cover? Genuine question, I thought that was the intent. Both local and CI specify setuptools >=64 which I think only leaves setuptools 64 and 65 at the moment. I guess it’s possible that there is a discrepancy in that one version. I will certainly check versions more closely.

I did notice that the working wheel produced by python -m build -s -w . has these warnings (in CI but not locally):

  check.warn(importable)
/tmp/build-env-puuw3nrz/lib/python3.8/site-packages/setuptools/command/build_py.py:202: SetuptoolsDeprecationWarning:     Installing 'bokeh.server.static.js.lib.models.callbacks' as data is deprecated, please list it in `packages`.
    !!
    ############################
    # Package would be ignored #
    ############################
    Python recognizes 'bokeh.server.static.js.lib.models.callbacks' as an importable package,

I don’t undestand this, the directories are absolutely not importable modules. They are data, and I thought we had them covered by this line in our MANIFEST.in:

graft bokeh/server/static

is there something else we need to configure? We need to recursively include everything under bokeh/server/static as data that is part of the package.

OK! Well, mystery solved. The local sdist worked because the JS files happened to already be copied into the tree before anything at all happened. I tried overloading sdist instead of build but even that command is not early enough apparently. Forgoing any custom commands, and simply running the coe to install bokehjs unconditionally at the start, works:

build_or_install_bokehjs()

setup()

This includes the required files in both the sdist, and the wheel, and also gets rid of the warnings above (but I have no idea why that is the case).

You might be encountering package data in subdirectory causes warning · Issue #3340 · pypa/setuptools · GitHub — I think setuptools is going too far there, and data files in directories should be discussed in a bigger forum (like here).

@merwok thank you for the reference! I will definitely need to come up to speed on that discussion (though for now the warnings seem to have disappeared with the last change, but I still don’t know why)

I ended up silencing that warning by unnecessarily adding the
offending directories as additional packages:

[tool.setuptools]
# Silence a warning about namespace packages included as data
# because we ship subdirectories inside the foo package tree
packages = ["foo", "foo.tests", "foo.tests.fixtures"]

Hi @bryevdv, bokeh.server.static.js.lib.models.callbacks is an importable package in Python.
You can verify that affirmation by running the following snippet:

# docker run --rm -it python:3.10.6 bash
python -m venv /tmp/.venv
/tmp/.venv/bin/python -m pip install 'bokeh==3.0.0.dev13'
cat <<EOF | /tmp/.venv/bin/python -
import bokeh.server.static.js.lib.models.callbacks
print(bokeh.server.static.js.lib.models.callbacks)
print("__path__ = ", bokeh.server.static.js.lib.models.callbacks.__path__)
print("__package__ = ", bokeh.server.static.js.lib.models.callbacks.__package__)
print(dir(bokeh.server.static.js.lib.models.callbacks))
EOF

Which will output the following:

<module 'bokeh.server.static.js.lib.models.callbacks' (<_frozen_importlib_external._NamespaceLoader object at 0x7f9e9a7fff10>)>
__path__ =  _NamespacePath(['/tmp/.venv/lib/python3.10/site-packages/bokeh/server/static/js/lib/models/callbacks'])
__package__ =  bokeh.server.static.js.lib.models.callbacks
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']

This is probably related to the fact that now you are creating the files before the “core” of setuptools reads its configuration. This way, when the tool.setuptools.packages.find configuration is expanded, bokeh.server.static.js.lib.models.callbacks is automatically scanned and listed.

I’ve sent you a PR, just as food for though.

@merwok, probably this deserves a separated thread, but I agree with you that the concept of directory considered as data (instead of package) is something that needs to be discussed in forum different that setuptools issue tracker (probably the discourse, maybe a different category). Personally, I am not that invested, so I will recommend anyone that believes in this difference to pursue that.

While there is no change in Python’s implementation, or PEP clarifying the subject, we can summarise the situation as the following:

  • To the eyes of Python, any directory corresponds to a package and can be imported[1].
  • Setuptools has a dedicated configuration parameter for any package that the user wants to include in the final distribution: [option] packages= in setup.cfg, [tool.setuptools] packages= in pyproject.toml and setup(..., packages=...) in setup.py.
    • The expecation is that if a package is not listed in this configuration, it should not be included in the final distribution.
    • include_package_data=True and similar configs are currently broken, because they will add packages to the distribution that should not be there.
    • This bug becomes even more obvious when you consider something like:
      setup(
         ...,
         packages=find_namespace_packages(exclude=["pkg.tests*"]),
         include_package_data=True
      )
      
  • The interpretation that directories without Python files are not packages seems to be a opinion.
    • This opinion is not backed by the behaviour of the Python interpreter (or PEP as far as I know, please correct me if I am wrong).
    • In fact, the opposite can be experimentally verified.
  • If we consider that directories should be treated as data instead of packages, we reach a conceptual deadlock when dealing with the find_...(exclude=...), include_package_data=True use case.
    • Without any other development on standardization or in the Python core, I believe that the best chance we have to solve this problem is to honour the packages= configuration.
    • The warning was added to raise awareness about Pythons’ behaviour and give time for the users to prepare.

Please feel free to open a different topic for discussion so we avoid hijacking this thread.


  1. For the time being setuptools warning is not issued when the directory name is not a valid identifier, because people can argue that this makes the directory not importable. We can however argue that the directory can be imported via importlib.import_module. ↩︎

3 Likes