Add a default option to int() to use if conversion fails

MarkCBell · October 29, 2020, 11:43pm

Many times I have found myself writing code to convert strings containing user input, regex matches etc. to other types such as integers. To handle edge cases in which int(x) raises a ValueError, this usually involves having to write code like:
x = int(user_input) if user_input else 0
or

import re

matches = re.match(r'(.*):(\w*)', data)
# group(0) is the whole match; the captured fields are groups 1 and 2
try:
    y = int(matches.group(1))
except ValueError:
    y = 1
try:
    z = int(matches.group(2))
except ValueError:
    z = 10

What do people think of adding a keyword argument default= to int() which is used if the input would raise a ValueError? This would allow these examples to then be expressed as x = int(user_input, default=0) and y = int(matches.group(1), default=1).

In the case of CPython, it looks like this could be achieved by adding a single extra if statement around here which, if reached with default set, would return the default.

If this is a reasonable suggestion, what about adding a default option to other builtin constructors that can raise a ValueError such as float() or complex()?
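For comparison, the proposed semantics can be approximated today without touching the builtins. The sketch below is only an illustration: `convert_or_default` is a hypothetical name, not an existing function, and it assumes the only error worth catching is `ValueError`, as in the proposal.

```python
def convert_or_default(converter, value, default):
    """Hypothetical helper: call converter(value), fall back to default.

    Only ValueError is caught, mirroring the proposal; other
    exceptions (for example TypeError from int(None)) still propagate.
    """
    try:
        return converter(value)
    except ValueError:
        return default


x = convert_or_default(int, "", 0)          # empty input falls back to 0
y = convert_or_default(int, "42", 1)        # valid input is converted
f = convert_or_default(float, "n/a", 0.0)   # same idea for float()
```

Note that this catches every `ValueError`, so a typo such as "97w5" also becomes the default.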

uranusjr · October 30, 2020, 12:12am

Strong -1 from me. I’m not entirely against having such a convenience method, but it’s debatable whether it should be in the stdlib, and definitely does not belong in int().

steven.daprano · October 30, 2020, 1:17am

Mark Bell suggested:

x = int(user_input, default=0)

We might agree that converting the empty string to some default value is
(sometimes) a reasonable thing to do, but the empty string is not the
only input that raises a ValueError. Silencing arbitrary ValueErrors by
replacement with a default value is surely going to lead to “Garbage In,
Garbage Out” data processing.

I cannot imagine a scenario where I, as the user, would consider it
acceptable to use 0 because I entered “635.1” when an int was expected,
or because I typoed “97w5” when I meant 9735.

But what you do in your code is up to you. I’m not suggesting
that you shouldn’t be permitted to do whatever error handling you
prefer, including not handling it at all but merely replacing it with
some arbitrary default you plucked from thin air. Go for it, it’s your
code, I’m not judging.

But I would not want this raised up to an official supported and
recommended design, baked into the int and other builtins. That makes
it too much of an “attractive nuisance”, something that encourages
people to handle errors without thought, when they really do need to
think about it.

The other builtin functions that offer defaults tend to have these two
things in common:

- the error condition they support is only missing data, not malformed
  data; and
- it is hard, or inefficient, to “Look Before You Leap” and check
  for the error condition ahead of time.

For example, getattr(obj, name, default), dict.get(key, default),
next(iterator, default), unicodedata.name(char, default).

They aren’t intended to suppress arbitrary errors. (Historically, we’ve
had lots of problems with getattr, for example, suppressing errors
that it shouldn’t have.)
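A short illustration of the builtins listed above, as a sketch: in each call the default covers one narrow "nothing there" case. The `Point` class and the sample values are made up for the example.

```python
import unicodedata


class Point:
    x = 1


results = {
    # missing attribute
    "getattr": getattr(Point(), "y", None),
    # missing key
    "dict.get": {"a": 1}.get("b", 0),
    # exhausted iterator
    "next": next(iter([]), "empty"),
    # character with no Unicode name (NUL is an unnamed control character)
    "unicodedata.name": unicodedata.name("\x00", "unnamed"),
}
```

None of these calls inspects or repairs a malformed value; each only answers the question "what if there is no value at all?".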

Neither of those conditions is true for your suggestion. Catching any
ValueError and replacing it with a default is too greedy to build it
into the function. And LBYL on the empty string is trivially easy, not
hard or inefficient.

cameron · October 30, 2020, 10:01pm

I’m also -1 on this, both for the reason Steven provides (too general:
you usually don’t want 0 from 6.7.1, you want failure) and for the one
Tzu-ping Chung gives (this doesn’t belong on int()).

What you’re actually pining for is the regularly requested “inline
try/except” syntax, which is asked for in various forms.

It has not to date been accepted, and your post helps illustrate the
difficulties.

Your first example is this:

x = int(user_input) if user_input else 0

which tries to int() a nonempty str and use 0 otherwise. This is very
common (in fact I was looking at exactly such code last night) but it is
not your turn-any-ValueError-into-0 situation - it is turning a
specific well understood value (the empty string) into 0. It will
still raise a ValueError for other “non-int” strings. Most of us want
failure here - it indicates unknown input.
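To make the difference concrete, here is that expression wrapped in a function (the name `parse_count` is invented for the example). Only the empty string maps to 0; any other unparseable string still raises.

```python
def parse_count(user_input):
    # The empty string is the one well-understood special case.
    return int(user_input) if user_input else 0


empty = parse_count("")      # 0
valid = parse_count("12")    # 12

try:
    parse_count("97w5")
except ValueError as exc:
    # Unknown input is reported, not silently turned into 0.
    failure = exc
```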

The point here is that we in Python land want predictable and reliable
behaviour from our code; this isn’t PHP with its tendency to just
convert things into something similar when convenient (thinking
particularly of its dict/hash/mappings here, but the attitude in its
libraries is pervasive).

Instead, you should decide what should happen with unparseable data.
Normally, we expect that this is the wrong information entered at a
prompt (so turning it into a 0 and proceeding is misguided at best), or
incorrect presumptions in the programme about the expected data (for
example, grabbing item 3 from a list, expecting a well behaved int but
getting a different field, maybe a name or something).

Now, perhaps in your situation proceeding with 0 is a sensible thing.
IMO, it almost never is, but that is really a policy decision.

So you should implement that policy as a function:

def convert_int(s):
    try:
        i = int(s)
    except ValueError:
        i = 0
    return i

and use “convert_int(s)” in your code. Problem solved.

In particular, “problem solved” in your particular problem space.

The problem with a “default=” argument for int() is that it far too
easily swallows a whole host of possible bad input.

You are probably thinking of the dict.get() method and similar, where it
has a default for missing keys. That is a far more constrained
circumstance, much more analogous to your “0 for the empty string” first
example. It is a tightly constrained situation, not a wild card.

Were it me, I would write a function like my example above which
encapsulates this policy decision. But it would be even more overt:

from logging import warning

def convert_anything_to_int(s):
    ''' Convert any string `s` to an `int` value, return the value.

        This function returns the integer value of `int(s)`
        unless that raises a `ValueError`,
        in which case it returns `0`.
    '''
    if not s:
        # the empty string is expected: silently treat it as 0
        return 0
    try:
        i = int(s)
    except ValueError:
        # anything else unparseable is logged before becoming 0
        warning("convert_anything_to_int: converting invalid value %r into 0", s)
        i = 0
    return i

This has a number of features:

- a clumsy and annoying name, to remind the user that this function is
  quite zealous in accepting any string
- a docstring defining its behaviour
- silently converting a well defined expected empty string to 0
- but issuing a warning log entry whenever handed anything outside its
  expected domain (not empty and not a valid int)

A particular advantage of the warning is that you can review the logs to
see what kind of unexpected data were received. Interactively, it is
thrown in the user’s face that various pieces of garbage are being
treated as 0.

Note that you can avoid writing this function many times by keeping a
module with functions like this. And you can flag the behaviour at the
top of the code using it like this:

from policy_functions import convert_anything_to_int as atoi

and then use a concise “atoi(s)” call in your code thereon. Shorter
readable code results, and the import statement makes it clear what
“atoi” does.

Finally, you ask about float() and complex(). A lot of numerical stuff
has a notion of “not a number”, concretely expressed as the float NaN.
These come in a few flavours in IEEE floating point (quiet NaN, signalling NaN)
but generally have the property that you can continue to calculate with
them without exceptions - you just get NaN as a result of the
calculation.

This has the advantage that your “garbage in” situation alluded to by
Steven is identifiable later as “garbage out” because it is NaN. This
is arguably better than turning your “garbage in” into a 0, which is
then silently mixed into your calculations from then on, and not
identifiable later as “garbage out”.
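A minimal sketch of that difference, assuming a made-up list of readings and two hypothetical helpers (`float_or_nan` and `float_or_zero` are not existing functions):

```python
import math


def float_or_nan(s):
    """Hypothetical helper: unparseable input becomes NaN."""
    try:
        return float(s)
    except ValueError:
        return math.nan


def float_or_zero(s):
    """Hypothetical helper: unparseable input becomes 0.0."""
    try:
        return float(s)
    except ValueError:
        return 0.0


readings = ["1.5", "oops", "2.5"]

total_nan = sum(float_or_nan(r) for r in readings)
total_zero = sum(float_or_zero(r) for r in readings)

# The NaN total flags that something went wrong upstream;
# the zero-based total looks like an ordinary number.
garbage_detected = math.isnan(total_nan)
```

With the NaN policy the bad reading poisons the sum in a detectable way, whereas with the 0 policy the result is indistinguishable from one computed from clean data.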

Cheers,
Cameron Simpson cs@cskk.id.au