How to use pip install to build some scientific packages from sources with custom build arguments

Great thread, thanks to all for the constructive discussion. I’ve upgraded my build Dockerfile for SciPy and NumPy, which was modelled on the same article (the one from 5 years ago) and targets Amazon Linux 2 (a “layer” for a packaged AWS Lambda microservice). Building from source lets it fit within the AWS Lambda size constraints, which I appreciate is a bit of a misuse of the purpose, but you’ve got to do what you’ve got to do to deploy! :smiley_cat:

Recent version updates brought a bit of a cascade of build dependency requirements (cmake, gcc, etc.), so a significant portion of the dependencies had to be built from source in the Docker image.

I hope nobody minds if I share this here, where it might be visible to others facing a similar challenge, or for comparison against the solution above. There were a few other places describing this problem (e.g. here) which trailed off without a clear indication of whether they solved it or not.

I haven’t found the magic combination of options that works yet, but at least I think you’ve set me on the right path now, thanks.
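
For comparison with the pip-based route in the thread title, here is a minimal sketch of how a source build with custom compiler flags could be driven from Python. The helper names `pip_wheel_command` and `build_env` are hypothetical, and the sketch assumes the package's build backend honours `CFLAGS` from the environment; it only assembles the command and environment, it does not run the build.

```python
import os
import sys


def pip_wheel_command(name, version, wheel_dir, python=sys.executable):
    """Assemble a `pip wheel` command that builds one package from its sdist.

    --no-binary <name> forbids prebuilt wheels for that package only,
    --no-deps skips building wheels for its dependencies.
    """
    return [
        python, "-m", "pip", "wheel",
        "--no-binary", name,
        "--no-deps",
        "--wheel-dir", wheel_dir,
        f"{name}=={version}",
    ]


def build_env(cflags, base=None):
    """Copy an environment mapping and set CFLAGS in the copy."""
    env = dict(os.environ if base is None else base)
    env["CFLAGS"] = cflags
    return env
```

To actually run the build, pass both results to `subprocess.run(cmd, env=env, check=True)`, for example with `pip_wheel_command("numpy", "1.24.3", "dist")` and `build_env("-g0 -DNDEBUG -Os")`.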

Basic setup and Yum packages installed before build:

FROM mlupin/docker-lambda:python3.10-build AS build

USER root

WORKDIR /var/task

# https://towardsdatascience.com/how-to-shrink-numpy-scipy-pandas-and-matplotlib-for-your-data-product-4ec8d7e86ee4
ENV CFLAGS="-g0 -Wl,--strip-all -DNDEBUG -Os -I/usr/include:/usr/local/include -L/usr/lib64:/usr/local/lib64:/usr/lib:/usr/local/lib"

RUN yum install -y wget curl git nasm openblas-devel.x86_64 lapack-devel.x86_64 python-dev file-devel make Cython libgfortran10.x86_64 openssl-devel

# Build CMake in /tmp
WORKDIR /tmp

ENV CMAKE_VERSION=3.26.4

# Download and install CMake
RUN wget https://github.com/Kitware/CMake/releases/download/v${CMAKE_VERSION}/cmake-${CMAKE_VERSION}.tar.gz
RUN tar -xvzf cmake-${CMAKE_VERSION}.tar.gz
RUN cd cmake-${CMAKE_VERSION} && ./bootstrap && make -j4 && make install

# Clean up temporary files
RUN rm -rf /tmp/cmake-${CMAKE_VERSION}
RUN rm /tmp/cmake-${CMAKE_VERSION}.tar.gz

WORKDIR /var/task
RUN pip install --upgrade pip

RUN pip --version

# Specify the version to use for numpy and scipy
ENV NUMPY_VERSION=1.24.3
ENV SCIPY_VERSION=1.10.1

# Download the numpy source distribution (SciPy is cloned from git further down)
RUN pip download --no-binary=:all: numpy==$NUMPY_VERSION

# Upgrade GCC to version 8 for SciPy Meson build system
RUN wget https://ftp.gnu.org/gnu/gcc/gcc-8.4.0/gcc-8.4.0.tar.gz && \
    tar xf gcc-8.4.0.tar.gz && \
    rm gcc-8.4.0.tar.gz && \
    cd gcc-8.4.0 && \
    ./contrib/download_prerequisites && \
    mkdir build && \
    cd build && \
    ../configure --disable-multilib && \
    make -j$(nproc) && \
    make install && \
    cd ../.. && \
    rm -rf gcc-8.4.0

# Set environment variables
ENV CC=/usr/local/bin/gcc
ENV CXX=/usr/local/bin/g++
ENV FC=/usr/local/bin/gfortran

# Verify GCC version
RUN gcc --version
RUN /usr/local/bin/gfortran --version

# Extract the numpy package and build the wheel
RUN pip install Cython
RUN ls && tar xzf numpy-$NUMPY_VERSION.tar.gz
RUN ls && cd numpy-$NUMPY_VERSION && python setup.py bdist_wheel build_ext -j 4

ENV BUILT_NUMPY_WHEEL=numpy-$NUMPY_VERSION/dist/numpy-$NUMPY_VERSION-*.whl

RUN ls $BUILT_NUMPY_WHEEL
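
The `ls` check above only confirms that some file matches the glob. As a rough sketch of a stricter check, the snippet below splits a wheel filename into the fields defined by the wheel filename convention (name, version, optional build tag, Python tag, ABI tag, platform tag) and matches the project name case-insensitively. The function names `parse_wheel_name` and `find_wheels` are hypothetical, and the example filenames in the usage note are assumptions about what the build produces.

```python
from pathlib import Path


def parse_wheel_name(filename):
    """Split a wheel filename into its dash-separated components."""
    base = Path(filename).name
    if not base.endswith(".whl"):
        raise ValueError(f"not a wheel file: {base}")
    parts = base[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {base}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def find_wheels(dist_dir, name, version):
    """Return wheels in dist_dir whose name (case-insensitive) and version match."""
    matches = []
    for path in sorted(Path(dist_dir).glob("*.whl")):
        info = parse_wheel_name(path.name)
        if info["name"].lower() == name.lower() and info["version"] == version:
            matches.append(path)
    return matches
```

Calling `find_wheels("numpy-1.24.3/dist", "numpy", "1.24.3")` returns an empty list when the build did not produce the expected wheel, which is easier to turn into a hard failure than eyeballing `ls` output.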

NumPy and SciPy build (for simplicity, I installed a prebuilt NumPy wheel of the same version I was building from source; that wheel is there purely for building SciPy):

# Install a prebuilt NumPy of the same version rather than the wheel built above (it's a SciPy build dependency)
RUN pip install numpy==$NUMPY_VERSION
RUN python -c "import numpy"

# Install build dependencies for the SciPy wheel
RUN pip install pybind11 pythran

# Extract the SciPy package and build the wheel
# RUN wget https://github.com/scipy/scipy/archive/refs/tags/v$SCIPY_VERSION.tar.gz -O scipy-$SCIPY_VERSION.tar.gz
RUN git clone --recursive https://github.com/scipy/scipy.git scipy-$SCIPY_VERSION && \
    cd scipy-$SCIPY_VERSION && \
    git checkout v$SCIPY_VERSION && \
    git submodule update --init

RUN cd scipy-$SCIPY_VERSION && python setup.py bdist_wheel build_ext -j 4

ENV BUILT_SCIPY_WHEEL=scipy-$SCIPY_VERSION/dist/SciPy-*.whl
RUN ls $BUILT_SCIPY_WHEEL

# Install the wheels with pip
# (Note: previously this used --compile but now we already did the wheel compilation)
RUN pip install --no-compile --no-cache-dir \
  -t /var/task/np_scipy_layer/python \
  $BUILT_NUMPY_WHEEL \
  $BUILT_SCIPY_WHEEL

RUN ls /var/task/np_scipy_layer/python

# Clean up the numpy sdist and the source trees (including the built wheels in dist/)
RUN rm numpy-$NUMPY_VERSION.tar.gz
RUN rm -r numpy-$NUMPY_VERSION scipy-$SCIPY_VERSION

# Uninstall the prebuilt numpy now that the SciPy wheel has been built
RUN pip uninstall numpy -y

RUN cp /var/task/libav/avprobe /var/task/np_scipy_layer/ \
    && cp /var/task/libav/avconv /var/task/np_scipy_layer/

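# Make sure the layer's lib directory exists before copying the shared libraries into it
RUN mkdir -p /var/task/np_scipy_layer/lib \
    && cp /usr/lib64/libblas.so.3.4.2 /var/task/np_scipy_layer/lib/libblas.so.3 \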
    && cp /usr/lib64/libgfortran.so.4.0.0 /var/task/np_scipy_layer/lib/libgfortran.so.4 \
    && cp /usr/lib64/libgfortran.so.5.0.0 /var/task/np_scipy_layer/lib/libgfortran.so.5 \
    && cp /usr/lib64/liblapack.so.3.4.2 /var/task/np_scipy_layer/lib/liblapack.so.3 \
    && cp /usr/lib64/libquadmath.so.0.0.0 /var/task/np_scipy_layer/lib/libquadmath.so.0 \
    && cp /usr/lib64/libmagic.so.1.0.0 /var/task/np_scipy_layer/lib/libmagic.so.1 \
    && cp /usr/local/lib/libmp3lame*.so* /var/task/np_scipy_layer/lib \
    && cd /var/task/np_scipy_layer  \
    && zip -j9 np_scipy_layer.zip /var/task/np_scipy_layer/avconv \
    && zip -j9 np_scipy_layer.zip /var/task/np_scipy_layer/avprobe \
    && zip -r9 np_scipy_layer.zip magic  \
    && zip -r9 np_scipy_layer.zip python  \
    && zip -r9 np_scipy_layer.zip lib
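
Since the whole point of the source build is to fit within the Lambda size constraints, it can help to measure the finished archive. The following is a minimal sketch that sums the uncompressed member sizes of a zip file and compares the total against a limit. The 250 MB figure is an assumption based on the quota AWS documents for the unzipped function plus all its layers, so check the current quota before relying on it. The function names are hypothetical.

```python
import zipfile

# Assumption: AWS documents a 250 MB quota for the unzipped deployment
# package, counting the function code and all attached layers together.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def unzipped_size(zip_path):
    """Return the total uncompressed size in bytes of all members of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def fits_in_lambda(zip_path, limit=LAMBDA_UNZIPPED_LIMIT_BYTES):
    """Return True if the archive's unzipped size is within the given limit."""
    return unzipped_size(zip_path) <= limit
```

For the archive produced above, `fits_in_lambda("/var/task/np_scipy_layer/np_scipy_layer.zip")` gives a quick yes or no. The layer shares the quota with the function code and any other layers, so passing a smaller `limit` leaves headroom for those.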