Hi friends,
I’m working with PySpark and trying to solve symbolic equations using the sympy library. To use sympy in a distributed Spark environment, I need to package the dependencies into a .zip file and submit it along with my script.
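For context, here is a simplified sketch of how the zip is meant to be attached and used on the executors (the app name and the toy equations are just placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sympy-on-spark").getOrCreate()
spark.sparkContext.addPyFile("dependencies.zip")  # ship the zip to every executor

def solve_for_x(expr_str):
    # import inside the task so the module is resolved on the executor, not the driver
    import sympy
    x = sympy.Symbol("x")
    return [str(root) for root in sympy.solve(sympy.sympify(expr_str), x)]

equations = spark.sparkContext.parallelize(["x**2 - 4", "x + 3"])
print(equations.map(solve_for_x).collect())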
I installed sympy with:
pip install --target=dependencies sympy
and here is the structure of my dependencies folder:
dependencies/
├── mpmath/
├── mpmath-1.3.0.dist-info/
├── sympy/
├── sympy-1.13.3.dist-info/
├── __pycache__/
├── bin/
└── share/
Initially, I attempted to package only the sympy and mpmath directories using the following command:
zip -r ../dependencies.zip sympy/* mpmath/*
However, when running my PySpark job, I encountered the following error:
ImportError: cannot import name 'make_mpc' from 'mpmath' (unknown location)
When I packaged the entire dependencies directory without excluding any files or folders, like this:
zip -r ../dependencies.zip .
the error disappeared and everything worked fine.
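In case it is relevant, a quick way to see how the two archives differ is to list their top-level entries (the second path below is a made-up name for the sympy/mpmath-only zip):

import zipfile

def top_level_entries(zip_path):
    # collect the unique top-level names stored in the archive
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist()})

print(top_level_entries("dependencies.zip"))       # full copy of the folder
print(top_level_entries("dependencies_slim.zip"))  # hypothetical name for the sympy/mpmath-only archive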
I’m trying to understand why packaging only the sympy and mpmath directories caused the import error. The dependencies folder also contains other directories such as mpmath-1.3.0.dist-info, sympy-1.13.3.dist-info, __pycache__, and share. Could the metadata or other files in those folders be necessary for sympy or mpmath to work correctly?
Any insights into why the import fails when only sympy and mpmath are packaged, and why including everything resolves the problem, would be appreciated. I’m particularly interested in whether the files in the dist-info, share, or bin directories are necessary for the libraries to function properly.
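If it helps narrow things down, here is a rough, illustrative sketch of a check that should show where the executors actually resolve mpmath and sympy from:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("import-location-check").getOrCreate()
spark.sparkContext.addPyFile("dependencies.zip")

def report_locations(_):
    # runs on an executor; __file__ should point inside the shipped zip
    import mpmath
    import sympy
    yield (mpmath.__file__, sympy.__file__)

print(spark.sparkContext.parallelize([0], numSlices=1).mapPartitions(report_locations).collect())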
Thanks for your help!