Having read ‘Infinity’ constant in Python, I saw a mention towards the end that if NaN and Inf were made singletons, users would write something like the following:
x is NaN
or
x is Inf
and that this would be a bad test.
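For anyone following along, here is a minimal sketch of why that test is fragile today, using only the standard library. The variable names are just for illustration, and the identity result is what I would expect on CPython:

```python
import math

# Two NaNs created separately are two distinct float objects.
nan_a = float("nan")
nan_b = float("nan")

# Identity fails because there is no NaN singleton.
identity_check = nan_a is nan_b

# Equality fails too: NaN never compares equal, even to itself.
equality_check = nan_a == nan_a

# math.isnan is the reliable way to test for NaN.
isnan_check = math.isnan(nan_a) and math.isnan(nan_b)

# Infinity behaves better: it compares equal to itself.
inf = float("inf")
inf_check = math.isinf(inf) and inf == math.inf

print(identity_check, equality_check, isnan_check, inf_check)
```

So `x is NaN` only works if every NaN in the program is the same object, which is exactly what a singleton would have to guarantee.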
I would like to explain why I think this isn’t so bad from the data science perspective. While NaN and Inf are floating-point values, there is a current trend in the data science community to use NaN in particular to indicate a missing value. This is most notable in NumPy and Pandas.
The issue with using NaN as a missing value is this: if a DataFrame column has an integer type, introducing a NaN casts the column to float, as it should. However, many people do not actually want that behavior, so Pandas has added arrays and dtypes to handle it. The API is called ExtensionArray and ExtensionDtype, and from it they have built objects such as IntegerArray and BooleanArray, which can hold both values of the type in their name and missing values.
Now, to my naïve understanding, the way this works is that they keep a logical mask alongside the underlying NumPy array that indicates whether each value is missing or not.
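To make the mask idea concrete, here is a rough pure-Python sketch. The class `MaskedIntArray` and its methods are hypothetical names I made up for illustration, this is not how Pandas actually implements it, and I use None as the input marker for a missing entry purely for convenience:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MaskedIntArray:
    """Hypothetical illustration of values plus a missing-value mask."""

    values: List[int]  # always integers; 0 is a placeholder where missing
    mask: List[bool]   # True where the entry is missing

    @classmethod
    def from_optional(cls, items: List[Optional[int]]) -> "MaskedIntArray":
        # Store a placeholder integer so the value storage stays integer-only.
        values = [0 if item is None else item for item in items]
        mask = [item is None for item in items]
        return cls(values, mask)

    def isna(self) -> List[bool]:
        # The mask, not the stored value, says whether an entry is missing.
        return list(self.mask)

    def sum(self) -> int:
        # Skip masked entries so placeholders never leak into results.
        return sum(v for v, missing in zip(self.values, self.mask) if not missing)


arr = MaskedIntArray.from_optional([1, None, 3])
print(arr.values, arr.isna(), arr.sum())
```

The point is that the stored values never leave the integer type; only the separate mask knows which entries are missing.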
According to [1], NumPy is trying to get around to supporting missing values natively. R already has this.
So my point is that a large part of the community currently does
x is NaN
or
x is Inf
as a check, in a sense (the former far more often than the latter).
It would be very nice to have a built-in that could be used universally as a missing value indicator. Having said that, I realize None is more or less supposed to fill this role. I also realize that the reason None is not used in Pandas or NumPy is that it would turn their dtypes into the object dtype, which makes things considerably slower.
So perhaps what I am asking for could never happen anyway, and the entire burden falls on NumPy and Pandas. However, if it could happen, it would be quite amazing in my opinion. I feel like I may have just wasted my time typing this up, but on the off chance that it might happen, and that core Python isn’t aware of this use case in data science, it will have been worth it.
Thank you for your time and patience.
[1] Stack Overflow: [NumPy or Pandas: Keeping array type as integer while having a NaN value]
PS:
I tried to add more links for ease of understanding but I am being limited to 2 as of right now : (