Hi friends,
Some background: I’m currently on a data science project where we use Python to perform a series of matrix operations and numerical integrations for thermodynamic temperature calculations. However, I ran into a performance bottleneck and had to switch to Rust, which runs much faster.
I understand that Python should be used in a Pythonic way rather than fought like a puzzle, but I still want to bring this up for discussion, because I’d like to understand the language and its performance characteristics better.
The input data for the thermal model is quite simple, something like:
epoch_time, current, duration
1, xxxxx, 0.23
2, xxxxx, 0.3
3, xxxxx, 0.5
The physical model has several surfaces per inductor [R1, R2, R3, R4…] and a matrix of initial parameters. We need to integrate over time (the heat generated by the current over each interval), where each step depends on the value computed in the previous step.
The calculation (the matrix operations) was written with numpy, using scipy for the integration, and avoids dynamic lists/arrays (list.append, np.insert, etc.). Even so, once the dataset grew extremely large (over 10 billion entries), it still ran very slowly.
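To make the structure concrete, here is a minimal sketch of the kind of per-timestep loop I mean. The model itself (a single thermal node, the parameter names, the heating term) is a simplified placeholder, not our actual equations:

```python
import numpy as np

def simulate(current, duration, R_th=2.0, C_th=50.0, T_amb=25.0):
    """Toy single-node thermal model with rectangular integration.

    current/duration are the columns from the input file; R_th, C_th
    and T_amb are made-up placeholder parameters, not our real ones.
    """
    n = current.size
    temps = np.empty(n)              # preallocated, no appends
    T = T_amb
    for k in range(n):               # each step depends on the previous one
        heat = current[k] ** 2 * R_th * duration[k]   # Joule heating over the interval [J]
        T += (heat - (T - T_amb) / R_th * duration[k]) / C_th
        temps[k] = T
    return temps
```

The real version carries a vector of surface temperatures and a parameter matrix per inductor, but the dependency pattern is the same.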
The code was then rewritten in Rust (using the rectangle rule for the integration), and it ran about a hundred times faster than the Python version.
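For what it’s worth, the rectangle rule itself vectorizes fine in numpy when the intervals don’t feed back into each other; assuming the same toy heating term as above, the cumulative heat is a one-liner. It’s the step-to-step dependence on the previously computed temperature that forces the Python-level loop:

```python
import numpy as np

# Rectangle rule, fully vectorized: cumulative Joule heat, valid only
# when each interval is independent of the previously computed state.
def cumulative_heat(current, duration, R=2.0):   # R is a placeholder
    return np.cumsum(current ** 2 * R * duration)
```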
I’m quite confused about why the performance difference is so large. numpy’s core is implemented in C (with Fortran BLAS/LAPACK underneath for the linear algebra), and the scipy routines should also be well optimized. Is this gap just the cost of Python being interpreted rather than compiled, or something else?
The goal is still to write the entire project in Python so the team can easily understand it, and we do appreciate Python for data science.
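One thing I’ve been wondering about, in case it’s relevant to the discussion: if the step-to-step dependence is linear (I’m not yet sure ours reduces to that), a first-order recurrence can apparently be evaluated in C via scipy.signal.lfilter instead of a Python loop. A sketch with made-up coefficients:

```python
import numpy as np
from scipy.signal import lfilter

# Evaluate y[k] = c * y[k-1] + d * x[k] without a Python loop.
# lfilter computes a[0]*y[n] = b[0]*x[n] - a[1]*y[n-1] - ...,
# so this recurrence maps to b = [d], a = [1, -c].
c, d = 0.99, 0.01                   # placeholder coefficients
x = np.random.default_rng(0).random(1_000_000)
y = lfilter([d], [1.0, -c], x)
```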
Any thoughts and questions are welcome, and I can provide more details if needed. Has anyone else done similar work or run into similar issues?