Add environment variable type casting to the os module

massover · July 15, 2022, 11:23pm

I copy and paste some small functions to get an environment variable and cast it to an int or a bool. There are more full-featured libraries, like environs and django-environ, but I haven’t actually used them.

One problem is this:

FOO_ENABLED = os.getenv("FOO_ENABLED")

if FOO_ENABLED:
    foo()

An experienced person will know that after export FOO_ENABLED=false, foo() still runs, because any non-empty string (including “false”) is truthy.
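A minimal reproduction of the first pitfall, setting the variable from inside Python only for the sake of the demo (foo() is left out; printing stands in for it):

```python
import os

# Stand-in for `export FOO_ENABLED=false` (affects this process only).
os.environ["FOO_ENABLED"] = "false"

FOO_ENABLED = os.getenv("FOO_ENABLED")
print(repr(FOO_ENABLED), bool(FOO_ENABLED))  # 'false' True

# Only the empty string is falsy; "0", "false" and "no" are all truthy.
print([bool(s) for s in ("", "0", "false", "no")])  # [False, True, True, True]
```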

Another problem is:

RETRY_COUNT = os.getenv("RETRY_COUNT", 10)

if num_retries <= RETRY_COUNT:
    retry()

An experienced person will know that after export RETRY_COUNT=10, this raises a TypeError: os.getenv returns the string "10", and an int cannot be compared to a str with <=.
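The second pitfall comes from getenv handing back the default unchanged when the variable is unset, but always a str when it is set, so the type depends on the environment. A small demonstration (num_retries gets an arbitrary value here, since the post doesn’t define it):

```python
import os

num_retries = 3  # arbitrary value for the demo

# Unset: getenv hands back the default exactly as given (an int here).
os.environ.pop("RETRY_COUNT", None)
unset_value = os.getenv("RETRY_COUNT", 10)
print(type(unset_value))  # <class 'int'>

# Set (like `export RETRY_COUNT=10`): the value is always a str.
os.environ["RETRY_COUNT"] = "10"
RETRY_COUNT = os.getenv("RETRY_COUNT", 10)
print(type(RETRY_COUNT))  # <class 'str'>

error = None
try:
    should_retry = num_retries <= RETRY_COUNT
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```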

I thought these cases might be common enough, and preventable enough with a reasonable implementation, to be worth adding something to the os module in the stdlib.

Two functions, os.getint and os.getbool, seem like they would be great and straightforward. Alternatively, adding a cast function parameter to the existing os.getenv might be nice. That way, people who have their own env var deserialization function can still use getenv. The pattern of “try to find something, try to deserialize it, otherwise fall back on this default” is always nice to have. I also think the stdlib should make a decision on how to parse a bool; if that doesn’t always suffice, people can use their own bool-casting function.

assert getenv2("TEST_INT", default=500, cast_to=int) == 500
assert getenv2("TEST_INT", cast_to=int) is None
os.environ["TEST_INT"] = "100"
assert getenv2("TEST_INT", default=500, cast_to=int) == 100

assert getenv2("TEST_BOOL", default=True, cast_to=bool) is True
assert getenv2("TEST_BOOL", cast_to=bool) is None
os.environ["TEST_BOOL"] = "True"
assert getenv2("TEST_BOOL", default=False, cast_to=bool) is True
os.environ["TEST_BOOL"] = "False"
assert getenv2("TEST_BOOL", default=True, cast_to=bool) is False

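import os


def boolenv(value):
    # Anything not in TRUE_VALUES (including typos) is treated as False.
    TRUE_VALUES = ("y", "yes", "t", "true", "on", "1")
    value = value.lower()
    return value in TRUE_VALUES


def getenv2(key, default=None, cast_to=None):
    try:
        result = os.environ[key]
    except KeyError:
        # Variable is unset: return the default as-is (no casting).
        return default

    if cast_to is None:
        return result

    # bool("False") would be True, so bool gets a dedicated string parser.
    if cast_to is bool:
        fn = boolenv
    else:
        fn = cast_to

    try:
        return fn(result)
    except ValueError:
        # Unparseable value (e.g. int("ten")): fall back to the default.
        return default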
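One behavioural detail of the getenv2 sketch: boolenv never raises, so an unrecognized value like “maybe” quietly becomes False instead of falling back to default, whereas int("ten") does fall back. If the fallback should apply to booleans too, one option is a stricter parser that raises ValueError. A self-contained variant, assuming that behaviour is wanted (boolenv_strict and FALSE_VALUES are hypothetical names, not part of the proposal):

```python
import os

TRUE_VALUES = ("y", "yes", "t", "true", "on", "1")
FALSE_VALUES = ("n", "no", "f", "false", "off", "0")  # hypothetical addition


def boolenv_strict(value):
    """Parse a bool-ish string, raising ValueError for anything unrecognized."""
    value = value.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"invalid truth value: {value!r}")


def getenv2(key, default=None, cast_to=None):
    try:
        result = os.environ[key]
    except KeyError:
        return default
    if cast_to is None:
        return result
    fn = boolenv_strict if cast_to is bool else cast_to
    try:
        return fn(result)
    except ValueError:
        # Unparseable values (now including unknown bools) use the default.
        return default


os.environ["TEST_BOOL"] = "maybe"
print(getenv2("TEST_BOOL", default=True, cast_to=bool))  # True (fallback)
```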

What do you think?

Rosuav · July 16, 2022, 6:05am

Why not just int(os.getenv("RETRY_COUNT", "10"))? The getenv function doesn’t have to do everything on its own.

massover · July 16, 2022, 1:48pm

getenv is just a wrapper around os.environ.get. I’d say this is less about it doing “everything” and more about enabling it to do two things that are likely to happen when reading an env var.

There’s nothing wrong with your suggestion. It becomes a little less of a one-liner in the Boolean case, but it’s certainly not awful. I think these cases are common enough to be worth a little improvement (along with better documentation). Either getint/getbool or a more flexible getenv with a cast parameter would be an improvement that people would use if it were there.
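For comparison, the Boolean case can still be a one-liner without any new API if you pick the accepted spellings yourself. The set below is an arbitrary choice for illustration:

```python
import os

os.environ["FOO_ENABLED"] = "Yes"  # demo value

# Unset falls back to "false"; anything not in the set counts as False.
FOO_ENABLED = os.getenv("FOO_ENABLED", "false").strip().lower() in {"1", "true", "yes", "on"}
print(FOO_ENABLED)  # True
```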

Rosuav · July 16, 2022, 2:58pm

The trouble with the boolean case is that it isn’t always consistent anyway, so you’d end up needing to have another parameter that defines what counts as “True”, “False”, and what to do with others, so you may as well do that part yourself.

I’m sure that a cast parameter would be used if it were there. But that’s not sufficient justification for it to grow one. Other than casting to bool, it’s literally just a single additional function call, so you gain nothing whatsoever by it; if you’re really doing that much intification of env vars, you can always write your own helper.

sirosen · July 16, 2022, 8:49pm

I think this is an extremely good idea. Hundreds of engineers are writing the same parsing code on loop. The stdlib is well-positioned to eliminate the drudgery (yay!) and sources of inconsistencies and errors (bigger yay!).

My only question is whether it belongs in the os module. Sometimes we’re reading data from a source other than env vars: a CLI argument via argparse, a line of a file, an INI file via configparser, etc.

What about inverting the relationship between potential parsing code and os?

# hypothetical "strparse.py" (name TBD) for the stdlib
# exact code here is shoddy and not parametrizable, just a demo
import os

def tobool(s: str) -> bool:
    truevals = ("yes", "y", "on", "true", "t", "1")
    falsevals = ("no", "n", "off", "false", "f", "0")
    if s.lower() in truevals: return True
    if s.lower() in falsevals: return False
    raise ValueError("ruh-roh")

def envtobool(env_var_name: str, default: bool) -> bool:
    value = os.getenv(env_var_name)
    if value is None: return default
    return tobool(value)
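The inversion means one parser can serve several sources. A rough sketch reusing a single str -> bool function for an environment variable and an argparse option (parse_bool and DEMO_VERBOSE are hypothetical names; argparse’s type= hook is the real mechanism):

```python
import argparse
import os


def parse_bool(s: str) -> bool:
    """Hypothetical strict str -> bool parser; raises ValueError on unknown input."""
    s = s.strip().lower()
    if s in ("yes", "y", "on", "true", "t", "1"):
        return True
    if s in ("no", "n", "off", "false", "f", "0"):
        return False
    raise ValueError(f"not a boolean: {s!r}")


# Source 1: an environment variable.
os.environ["DEMO_VERBOSE"] = "on"
env_verbose = parse_bool(os.environ["DEMO_VERBOSE"])

# Source 2: the command line. argparse calls `type` on the raw string and
# reports a ValueError as a usage error.
parser = argparse.ArgumentParser()
parser.add_argument("--verbose", type=parse_bool, default=False)
args = parser.parse_args(["--verbose", "off"])

print(env_verbose, args.verbose)  # True False
```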

If there’s enough buy-in from CPython maintainers to sponsor this, I’d be willing to write a sample implementation and try to submit a PEP to add a module to the stdlib.

steven.daprano · July 17, 2022, 2:03am

How is this proposed os.getint different from just calling int() on the result of getenv?

You don’t even have to give the default value as a string, since int() of an int returns it unchanged.

value = int(os.getenv('VALUE', 10))

It is harder and more work to remember whether the name of the function is spelled getint or get_int or getenv_int or os.getenv(cast=int) than to just call int.

That gives us ‘os.getfloat’ for free: just call float() on the result of os.getenv.
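The same point in code: int() and float() return an already-numeric default unchanged, so one expression covers both the unset and the set case (VALUE and RATIO are demo names):

```python
import os

os.environ.pop("VALUE", None)
default_value = int(os.getenv("VALUE", 10))  # unset: the int default passes through
print(default_value)  # 10

os.environ["VALUE"] = "42"
value = int(os.getenv("VALUE", 10))  # set: the string "42" is parsed
print(value)  # 42

os.environ["RATIO"] = "0.25"
ratio = float(os.getenv("RATIO", 1.0))
print(ratio)  # 0.25
```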

As for the proposed os.getbool, we have no way of knowing what values the programmer will expect their true and false envars to be. Obviously we want “Истинный” (Russian, roughly “true”) to return True, and “Ψεύτικος” (Greek, roughly “false”) to return False, that goes without saying. But what else might we want to support as bool strings?

Even in English, there is no way of guessing what values the programmer wishes to support, some or all of the following:

true/false
on/off
yes/no
y/n
0/1
enabled/disabled
active/inactive
open/closed

That goes double if you are reading the value from a config file, or directly from the user, say, using input().

Should we distinguish between these two cases?

the envar doesn’t exist;
the envar exists, but is set to the empty string.

What about upper and lower case? How should we deal with invalid values, raise an exception or display a warning and use the default?
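The unset versus empty-string distinction is visible through os.environ.get, which returns None only when the variable doesn’t exist. A sketch of one possible policy, assuming empty means “use the default” and invalid values raise (read_flag and this policy are illustrative choices, not a recommendation):

```python
import os


def read_flag(name, default):
    """Hypothetical policy: unset or empty -> default, unknown -> ValueError."""
    raw = os.environ.get(name)  # None only if the variable does not exist
    if raw is None or raw == "":  # the two cases could be split here
        return default
    lowered = raw.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean")


os.environ.pop("DEMO_FLAG", None)
unset_result = read_flag("DEMO_FLAG", False)  # False: unset
os.environ["DEMO_FLAG"] = ""
empty_result = read_flag("DEMO_FLAG", True)  # True: empty treated like unset
os.environ["DEMO_FLAG"] = "maybe"
try:
    read_flag("DEMO_FLAG", False)
except ValueError as exc:
    print(exc)  # DEMO_FLAG='maybe' is not a recognized boolean
```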

Sure, we might come up with a consistent set of features here, but individual programmers will surely have their own preferences and want their own rules. Any English-only solution will be chauvinistic to the 87% of the world that does not have English as their first, or any, language.

(Yes, foreigners have computers these days, and some of them may even expect to use their own language in their own envars. Shocking, I know, but once we allowed them to use electricity this was inevitable.)

Either we create a big, complex, over-engineered solution in an attempt to satisfy everybody, or a simple solution that will not satisfy most programmers.

Or we let the programmer write their own conversion function, which could be as simple as a one-liner:

def string_to_bool(s, default):
    return {'true': True, 'false': False}.get(s.strip().casefold(), default)

or as complex as the programmer needs it to be.
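One detail when pairing that one-liner with getenv: os.getenv returns None for an unset variable, and None has no .strip(), so pass a string default to getenv (or check for None first). A quick usage sketch (DEMO_ENABLED is a made-up variable):

```python
import os


def string_to_bool(s, default):
    return {'true': True, 'false': False}.get(s.strip().casefold(), default)


os.environ.pop("DEMO_ENABLED", None)
# getenv's "" default keeps .strip() safe; "" is not a key, so `default` wins.
enabled = string_to_bool(os.getenv("DEMO_ENABLED", ""), False)
print(enabled)  # False

os.environ["DEMO_ENABLED"] = " TRUE "
print(string_to_bool(os.getenv("DEMO_ENABLED", ""), False))  # True
```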

petersuter · July 17, 2022, 7:36am

I wonder how configparser.ConfigParser.getboolean (there’s also getint and getfloat) came up with this:

This method is case-insensitive and recognizes Boolean values from 'yes'/'no', 'on'/'off', 'true'/'false' and '1'/'0'

From the docs for configparser.ConfigParser.BOOLEAN_STATES:

You can override this by specifying a custom dictionary of strings and their Boolean outcomes:
custom_parser.BOOLEAN_STATES = {'sure': True, 'nope': False}
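For reference, a short sketch of how those documented configparser pieces behave, plus an aside reusing the default BOOLEAN_STATES mapping for an environment variable. That last part is just an idea, not something configparser advertises for env vars, and DEMO_DEBUG is a made-up name:

```python
import configparser
import os

parser = configparser.ConfigParser()
parser.read_string("[app]\ndebug = Yes\nverbose = off\n")
print(parser.getboolean("app", "debug"))    # True (case-insensitive)
print(parser.getboolean("app", "verbose"))  # False

# Per-instance override, as in the documentation example.
custom_parser = configparser.ConfigParser()
custom_parser.BOOLEAN_STATES = {'sure': True, 'nope': False}
custom_parser.read_string("[app]\ndebug = nope\n")
print(custom_parser.getboolean("app", "debug"))  # False

# Aside: the default mapping applied to an env var by hand.
os.environ["DEMO_DEBUG"] = "ON"
states = configparser.ConfigParser.BOOLEAN_STATES
debug = states.get(os.environ["DEMO_DEBUG"].lower())  # None if unrecognized
print(debug)  # True
```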

Commit in github.com/python/cpython by gvanrossum, committed 07:58PM - 04 Oct 01 UTC: “Apply modified SF patch 467580: ConfigParser.getboolean(): FALSE, TRUE.”

This patch allows ConfigParser.getboolean() to interpret TRUE, FALSE, YES, NO, ON and OFF instead just '0' and '1'. While just allowing '0' and '1' sounds more correct users often demand to use more descriptive directives in configuration files. Instead of forcing every programmer do brew his own solution a system should include the batteries for this. [My modification to the patch is a slight rewording of the docstring and use of lowercase instead of uppercase templates. The code is still case sensitive. GvR.]

Is there any kind of standard for how boolean environment variables work?

YAML 1.1 uses y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF

TOML and JSON use lowercase true and false only.

PHP’s filter_var() with FILTER_VALIDATE_BOOLEAN treats "1", "true", "on" and "yes" as true.

I’ve also seen enabled / enable / active / activated etc. elsewhere I think.

It seems quite arbitrary.
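The inconsistency is easy to see even within the stdlib. A small comparison of the JSON and configparser parsers mentioned above (only stdlib calls; the variable names are arbitrary):

```python
import configparser
import json

print(json.loads("true"))  # True
try:
    json.loads("True")  # JSON only accepts lowercase true/false
except json.JSONDecodeError:
    json_accepts_capitalized = False
else:
    json_accepts_capitalized = True

states = configparser.ConfigParser.BOOLEAN_STATES
print(sorted(states))
# configparser accepts "on" and "1" but not the YAML 1.1 short forms "y"/"n".
print("on" in states, "1" in states, "y" in states)  # True True False
```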

sirosen · July 17, 2022, 2:41pm

I understand the argument here: there’s no general solution that fits all cases, and it’s easy enough to write your own. Both are true, but I don’t agree that it needs to be over-engineered to death in order to do more good than harm.

What would be wrong with providing a function which is well-written, type annotated (which is hard for beginners), simple enough, and covers the vast majority of cases?

def str2bool(
    value: str,
    *,
    true_values: tuple[str, ...] = ("y", "yes", "t", "true", "on", "1"),
    false_values: tuple[str, ...] = ("n", "no", "f", "false", "off", "0"),
    lowercase: bool = True,
    strip_whitespace: bool = True,
) -> bool:
    if lowercase:
        value = value.casefold()
    if strip_whitespace:
        value = value.strip()
    if value in true_values:
        return True
    elif value in false_values:
        return False
    else:
        raise ValueError(f"invalid truth value: {value}")

To me, this is why it’s valuable for the stdlib to provide a solution. Otherwise, what happens is that I write it one way, my coworker writes it another way, and the behaviors of our libraries when combined become subtly inconsistent. (And I’m of the “just make a choice and document it” camp with respect to the specific question above.)

The English-first bias is already baked pretty deep into programming. Maybe this makes the situation worse, but I don’t know that it’s so cut-and-dried. It’s the modern lingua franca. I’m all for being inclusive, but most of the programmers I’ve met who are not native English speakers would still use "True", "on", "off" as bool-ish strings because they want their code to be intelligible to a broad audience – an audience which is already being forced to learn English to participate in discussions like this one.

The stdlib already knows how to parse strings to bools in several places: distutils.util.strtobool, to cite a package that is being removed right now.
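For anyone who relied on it: distutils was deprecated by PEP 632 and removed in Python 3.12. Below is a re-creation of distutils.util.strtobool based on its documented behaviour (it returns 1/0, not True/False, and raises ValueError for anything else); treat it as a sketch rather than the original source. Its true/false sets match the defaults in the str2bool proposal above.

```python
def strtobool(val: str) -> int:
    """Re-creation of distutils.util.strtobool: returns 1 or 0, not a bool."""
    val = val.lower()
    if val in ("y", "yes", "t", "true", "on", "1"):
        return 1
    if val in ("n", "no", "f", "false", "off", "0"):
        return 0
    raise ValueError(f"invalid truth value {val!r}")


print(strtobool("Yes"), strtobool("off"))  # 1 0
```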

I’m against expanding the stdlib without planning and forethought. os.getbool(...) seems to me like yet another place for strtobool to live, with yet another set of subtly different rules.

I’d rather see a dedicated space for very simple str -> <type> parsing, starting with bool and growing as necessary. I suggested a new module before, but that’s unnecessary. What about an addition to string?

massover · July 17, 2022, 3:31pm

I drop my original idea in favor of Stephen’s suggestion here. Most responses so far have included the one-liner for converting to an integer. If we can make an equivalent Boolean one-liner available through this type of function, I think that would also be a great improvement!

I’d rather see a dedicated space for very simple str -> <type> parsing, starting with bool and growing as necessary. I suggested a new module before, but that’s unnecessary. What about an addition to string?

I like this idea.

johnthagen · July 17, 2022, 4:58pm

For prior art, Pydantic’s settings management handles this in a very clean way (see “Settings management” in the pydantic docs).

P403n1x87 · July 22, 2022, 1:35pm

At work, we have started yet another env var module: envier, available on PyPI. The needs it is trying to address are:

declarative flavour: we want to declare the interface with the environment in a single place, rather than having access to it scattered everywhere
type checking: we want existing tools, like mypy, IDEs, to detect type issues, so we’ve added a mypy plugin
documentation: variables can be annotated with documentation information (in a future release) that can be used to auto-generate documentation with, e.g., Sphinx.
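To make the “declarative flavour” point concrete without guessing at envier’s actual API, here is a stdlib-only sketch: every variable is declared once as a typed dataclass field and loaded in one place. AppConfig, from_env, the APP_ prefix, and the converter table are all hypothetical:

```python
import os
from dataclasses import dataclass, fields


def _to_bool(s: str) -> bool:
    """Hypothetical strict bool parser used by the sketch."""
    s = s.strip().lower()
    if s in ("1", "true", "yes", "on"):
        return True
    if s in ("0", "false", "no", "off"):
        return False
    raise ValueError(f"invalid boolean: {s!r}")


# Maps each annotated field type to a converter for the raw string.
_CONVERTERS = {int: int, float: float, bool: _to_bool, str: str}


@dataclass(frozen=True)
class AppConfig:
    """The one place that declares this app's environment interface."""

    retry_count: int = 10
    foo_enabled: bool = False
    service_url: str = "http://localhost"

    @classmethod
    def from_env(cls, prefix: str = "APP_") -> "AppConfig":
        values = {}
        for f in fields(cls):
            # e.g. retry_count is read from APP_RETRY_COUNT
            raw = os.environ.get(prefix + f.name.upper())
            if raw is not None:  # unset variables keep the field default
                values[f.name] = _CONVERTERS[f.type](raw)
        return cls(**values)


os.environ["APP_RETRY_COUNT"] = "3"
os.environ["APP_FOO_ENABLED"] = "false"
config = AppConfig.from_env()
print(config)
```

Because the fields are ordinary annotations, tools like mypy see config.retry_count as an int; this sketch does not attempt the documentation-generation part.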

We plan to make the repository public once we are satisfied that we have achieved our goals, but as you can see a few releases are already on PyPI.