Is Python a stable language?

I have been trying to learn Python, specifically with respect to applying K-Means Analysis to my data. I found the following tutorial and copied the code into Jupyter notebook to run it.

K-Means Clustering with scikit-learn

I received three different errors, one of which I resolved (by changing “size” to the revised term “height” in the code).

The first was a warning raised by two instances of the following code, one for the training set and one for the test set.

# Fill missing values with mean column values in the train set

train.fillna(train.mean(), inplace=True)

FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
train.fillna(train.mean(), inplace=True)

The second was for multiple instances of slight variants of the following code:

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=600,
       n_clusters=2, n_init=10, n_jobs=1, precompute_distances='auto',
       random_state=None, tol=0.0001, verbose=0)

TypeError: __init__() got an unexpected keyword argument 'n_jobs'

I don’t know how to fix the incorrect code, which, I assume, was correct when the tutorial was written.

My question is this: is it worth learning Python if code becomes obsolete so quickly that, in this case, it no longer runs after four years?

What you’re seeing here isn’t the stability of Python as a language, but the validity of that particular tutorial on a third-party library. It seems that the n_jobs parameter of KMeans was deprecated in scikit-learn 0.23 and removed in the release originally planned as 0.25, which shipped as 1.0.

In general, something shipped as version 0 cannot be considered stable; scikit-learn is now shipping version 1.1.3, so if you build your software for this version, you can expect a lot more stability now.

Unfortunately, the tutorial you linked to doesn’t seem to pay any heed to the fact that it’s been written for a pre-stable version of the library, so there’s no indication that the tutorial can easily go out of date.
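For what it's worth, both messages quoted above can probably be dealt with by small changes. The following is only a sketch, assuming a recent pandas and scikit-learn 1.x: the column names and values are made up (the tutorial's data isn't shown in this thread), and the KMeans call simply drops the arguments that newer releases no longer accept.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the tutorial's training set:
# one text column and two numeric columns with missing values.
train = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],
    "Age": [22.0, None, 35.0, 58.0],
    "Fare": [7.25, 71.28, None, 26.55],
})

# Ask for the mean of numeric columns only, so pandas never has to
# silently drop the text column (the cause of the FutureWarning).
train.fillna(train.mean(numeric_only=True), inplace=True)

# Same settings as in the tutorial's call, minus n_jobs and
# precompute_distances (removed from KMeans) and algorithm='auto'
# (also no longer accepted by newer releases).
# random_state is fixed here only to make the sketch repeatable.
X = train[["Age", "Fare"]].to_numpy()
kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10,
                max_iter=600, tol=1e-4, random_state=0)
kmeans.fit(X)
print(kmeans.labels_)
```

The same numeric_only=True change would apply to the test set line as well.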

Python itself, however, is very stable; code written for Python 3.0 (released 2008) will almost certainly run on the current version of Python, and even a lot of Python 2 code will still work just fine. Obviously new features get added in each new version, but something that doesn’t use those features should be able to run equivalently on any suitable version.

Thank you for the clarification and explanation. Very much appreciated.

Python the language is more stable than, say, Julia; but less stable than, say, Java. As a general rule, Python is likely to add new features and unlikely to remove them.

The standard libraries that ship with Python tend to change a little more rapidly than the core language, but not much more.

Third-party libraries that ship independently are out of our control. They will change at whatever speed their authors want, and follow whatever policy towards breaking changes their authors prefer.

That includes scikit-learn. Generally, anything with a 0.x version number is probably experimental and unstable.

Machine learning is a relatively new and active field, and so software related to it may change more rapidly than other software.

All bets are off for serious security issues, where a fix may have to break backward compatibility (and there may be disagreements as to what counts as “serious”).

Thank you for the reassurance.

Machine learning APIs change pretty rapidly regardless of the language. Five-digit paper citation counts and billions of dollars in investment speak for themselves. TensorFlow, a C++ library, went through a massive API redesign not too long ago.

Pure Python scripts from 14 years ago are 99% guaranteed to still work today, all other things being equal. But very few people code in pure Python, and that’s why dependency management tools are important for bigger projects.
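As a rough illustration of the kind of check a dependency management tool does for you, here is a small sketch using only the standard library (Python 3.8 or later). The helper names are made up for this example, and the version parsing is deliberately naive; a real project would pin versions in a requirements file or lock file instead.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric parts of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_pre_stable(text):
    """True for 0.x versions, which by convention promise no stable API."""
    return parse_version(text)[0] == 0


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def tutorial_may_be_stale(package, written_for):
    """Compare major versions; None means the package is not installed."""
    current = installed_version(package)
    if current is None:
        return None
    return parse_version(current)[0] != parse_version(written_for)[0]
```

For example, is_pre_stable("0.19.1") is True and is_pre_stable("1.1.3") is False, while tutorial_may_be_stale("scikit-learn", "0.19.1") depends on what is installed on your machine.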