How to replace `nan` without external libraries when strings are present?

ryan-duve · July 2, 2024, 1:37pm

Given a dictionary whose keys are strings and whose values are a combination of strings, floats, integers and math.nan:

import math

d = {"name": "Ryan", "value": 123.456, "other_value": math.nan}

I am trying to convert the nan values to an empty string like this:

{'name': 'Ryan', 'value': 123.456, 'other_value': ''}

The following gives an error:

{k: "" if math.isnan(v) else v for k, v in d.items()}
TypeError: must be real number, not str

What is the best way to check for nan that doesn’t break on non-numeric types? The only options I could come up with are:

Using an if statement that checks if each value is a float:
{k: v if not (isinstance(v, float) and math.isnan(v)) else '' for k,v in d.items()}
Utilizing the json module:
json.loads(json.dumps(d), parse_constant={"NaN":""}.get)
Relying on every other value being equal to itself:
{k: v if v==v else "" for k,v in d.items()}

There are ways to do this with dependencies like Numpy/Pandas, but they are not available at the time this code is executed. I am wondering whether there is a standard, idiomatic way to do this with just Python and the standard library before I pick from one of the above.
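
For comparison, here is a minimal sketch that runs the three options above on the sample dictionary. The variable names (by_isinstance, by_json, by_equality) are only illustrative. Two assumptions are worth noting for the json variant: every value has to be JSON-serializable, and because the lookup uses .get, Infinity and -Infinity would come back as None.

```python
import json
import math

d = {"name": "Ryan", "value": 123.456, "other_value": math.nan}

# Option 1: explicit float check before calling math.isnan
by_isinstance = {
    k: "" if isinstance(v, float) and math.isnan(v) else v
    for k, v in d.items()
}

# Option 2: round-trip through JSON; parse_constant receives the string "NaN"
by_json = json.loads(json.dumps(d), parse_constant={"NaN": ""}.get)

# Option 3: rely on NaN being the only value here that is not equal to itself
by_equality = {k: v if v == v else "" for k, v in d.items()}

print(by_isinstance)
print(by_json)
print(by_equality)
```

On this input all three produce the same dictionary; they only start to differ on values such as infinities or objects with unusual equality.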

franklinvp · July 2, 2024, 1:46pm

Is it ensured that it is only going to be math.nan and never, for example, a float('nan') or a NaN created in some other way?

JamesParrott · July 2, 2024, 1:47pm

{k: "" if isinstance(v, float) and math.isnan(v) else v 
 for k, v in d.items()
}

ryan-duve · July 2, 2024, 1:52pm

@franklinvp – It is currently a mixture. They are made with

float('nan')
json.loads('{"key": NaN}')
math.nan

However, we control this and can standardize to make it simpler if necessary.

JamesParrott · July 2, 2024, 1:58pm

Any nan that is not handled correctly by math.isnan is a bug.

>>> math.isnan(math.nan)
True
>>> math.isnan(float('nan'))
True
>>> import json
>>> math.isnan(json.loads('{"key": NaN}')['key'])
True

franklinvp · July 2, 2024, 2:04pm

The set of NaNs is larger than the one value of math.nan.

Replacing all NaNs is a different question from replacing only math.nan.
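
A small sketch of that distinction, using only the standard library. It assumes that "only math.nan" would mean an identity test (v is math.nan), which is one possible reading; whether a freshly created NaN is the same object as math.nan is an implementation detail, so the identity column is printed rather than relied on.

```python
import json
import math

# NaNs produced in several different ways
candidates = {
    "math.nan": math.nan,
    "float('nan')": float("nan"),
    "json NaN": json.loads('{"key": NaN}')["key"],
    "inf - inf": math.inf - math.inf,
}

# Identity asks "is this the math.nan object itself?"
same_object = {label: v is math.nan for label, v in candidates.items()}

# math.isnan asks "is this any NaN at all?"
any_nan = {label: math.isnan(v) for label, v in candidates.items()}

for label in candidates:
    print(label, same_object[label], any_nan[label])
```

math.isnan reports every one of these as NaN, so it is the test to use when the goal is to replace all NaNs.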

JamesParrott · July 2, 2024, 2:22pm

Sure. If only specific NaNs are wanted, then a more specific test is needed.

I was reading between the lines and took an educated guess about what OP meant to say they wanted, not what they actually said they wanted, i.e. general data cleaning and fewer bugs.

Why would anyone want to keep one type of NaN, and not another (we already know they’re being replaced with an empty string, not a particular NaN option)?

litlighilit · July 2, 2024, 3:55pm

Using a self-equality check like if v != v instead of if math.isnan(v) is fine.

According to IEEE-754: nan != nan.

litlighilit · July 2, 2024, 4:04pm

I tested the numpy.float* types: they all follow nan != nan, and scipy.nan is just an alias for float('nan').

The only exception is sympy.nan, where you have to use not sympy.Eq(v, v) to check whether v is sympy.nan.

So if you do not use sympy, then v != v just works fine!

franklinvp · July 2, 2024, 4:59pm

You have the implication “isnan(x) implies x != x”.

The reverse implication is not true.

class X:
  def __eq__(s, o):
    return False

x = X()
x != x  # is True, even though x is not a NaN.

Therefore, the two conditions are not equivalent.

You could use x != x, as long as you can ensure you don’t have a case in which they are not testing the same thing.
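
To make the difference concrete, here is a minimal sketch that puts the two tests side by side. The helper names is_float_nan and not_self_equal are hypothetical, chosen only for this illustration. In Python 3 the default __ne__ inverts the result of __eq__, so an object whose __eq__ always returns False is reported as unequal to itself.

```python
import math


class X:
    # Deliberately non-reflexive: never equal to anything, itself included
    def __eq__(self, other):
        return False


def is_float_nan(v):
    # Explicit test: only real float NaNs qualify
    return isinstance(v, float) and math.isnan(v)


def not_self_equal(v):
    # Self-inequality test: anything that is not equal to itself qualifies
    return v != v


for v in (math.nan, X(), "Ryan", 123.456):
    print(type(v).__name__, is_float_nan(v), not_self_equal(v))
```

The two tests agree on the NaN, the string and the ordinary float, and disagree only on the X instance.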

litlighilit · July 2, 2024, 5:02pm

I assume __eq__ is reflexive for every other normal type.

In a dict like this one, every value's __eq__ is reflexive except for nan.

kknechtel · July 2, 2024, 6:00pm

I agree with @JamesParrott - this is the way.

Rosuav · July 2, 2024, 6:04pm

Yep. That’s the standard idiom for nan testing. If there are any other values that aren’t equal to themselves, deal with it when you find it; you’ll probably find that those would be considered to be nan by various other tools too. There’s really no reason to force a float check first.
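
One standard-library illustration of a value that is not equal to itself and is also treated as NaN elsewhere is a quiet decimal.Decimal("NaN"). This is a sketch under that assumption (a signaling NaN behaves differently), and the name dec_nan is just for the example.

```python
import math
from decimal import Decimal

dec_nan = Decimal("NaN")  # quiet NaN from the decimal module

# The self-inequality test catches it
fails_self_equality = dec_nan != dec_nan

# An isinstance(v, float) guard would skip it, since it is not a float
is_float = isinstance(dec_nan, float)

# math.isnan converts its argument to a float first and also sees a NaN
seen_by_isnan = math.isnan(dec_nan)

print(fails_self_equality, is_float, seen_by_isnan)
```

So for this value, v != v and math.isnan(v) agree, while a float check placed in front would leave it unreplaced.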

micky388 · July 8, 2024, 1:48pm

To replace nan without external libraries when strings are present, you can iterate through your data and check each element. Use a conditional statement to replace ‘nan’ strings with a desired value, ensuring data integrity and consistency throughout your dataset.

ryan-duve · July 8, 2024, 2:52pm

Hey @micky388 and welcome to Python discussions! I’m actually really new here myself, but you seem to be newer so I thought I’d offer the welcome.

I appreciate you trying to help me, but I don’t see how I can take your comment and put it to action. The original post lists three ways to do what you suggest, but you don’t seem to identify which of them (if any) to go with. Let me know if I’m missing something from your post.

For everyone else, I went with

{k: "" if isinstance(v, float) and math.isnan(v) else v for k, v in d.items()}

because I think it’s more clear for code readers. I like how succinct if v==v else "" is, but I think the explicit type check is more clear to someone newer to Python (which is expected in the business context of this module). Thanks to everyone who answered!
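
For anyone who wants to reuse the chosen approach, it could be wrapped in a small helper along these lines. The function name blank_out_nans and its replacement parameter are hypothetical, not something from the original module, and the sketch assumes a flat dictionary with no nested containers.

```python
import math


def blank_out_nans(record, replacement=""):
    """Return a copy of record with float NaN values replaced."""
    return {
        # The isinstance check keeps math.isnan away from strings
        k: replacement if isinstance(v, float) and math.isnan(v) else v
        for k, v in record.items()
    }


d = {"name": "Ryan", "value": 123.456, "other_value": float("nan")}
cleaned = blank_out_nans(d)
print(cleaned)
```

The original dictionary is left unchanged, and passing a different replacement (for example None) swaps in that value instead of the empty string.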