How to satisfy mypy Optional complaint?

Hey folks,
so I’ve seen a couple of threads with similar questions, but none quite answer the situation I am in.
First off, why I did things the way I did.
I have a global variable called HELPER, that I initially wanted to set like so:

#script2.py
HELPER = _HelperClass()
...
def run() -> None:
    ...

However, the _HelperClass is dependent on a config file, whose location is evaluated at runtime in another script:

#script1.py
import script2
...
Config.set_env(...)
...

As you can see, script2 is imported before the environment for the Config is set, so creating HELPER throws an error because _HelperClass() cannot determine the location of the Config file.

To ‘lazy load’ HELPER I handled it like so:

#script2.py
HELPER = None


def _initialize() -> None:
    global HELPER
    HELPER = _HelperClass()

...

def run() -> None:
    _initialize()
    ...

This works really well, except for one thing: mypy.
Since HELPER is only technically of type Optional[_HelperClass], I feel that it doesn’t make sense to unwrap the Optional each time it is used.
On each line I use HELPER though, mypy complains that

"error: Item 'None' of Optional[_HelperClass] has no attribute 'my_attribute' [union-attr]"

since HELPER is seen as an Optional. I really don’t like the idea of checking

if HELPER is not None:
    ...

every time HELPER is used, just to satisfy mypy.

I was wondering whether any of you have come across something similar or I’d just love to hear your input on how you would solve this situation in the most pythonic way, as I am still quite new to the language and programming as a whole! :slight_smile:

P.S: I have come across this PEP before and it seems like this would be exactly what I need. Sadly though, it is not implemented yet.

Thank you guys in advance!

One simple solution would be to have a class with a functools.cached_property, so the t.Optional is simply not necessary, even though the property is lazily initialized.

Thank you for the suggestion!
I’ve seen it thrown around before, but have never quite understood how to use it, how would a simple stub with cached_property look then? :slight_smile:

Here you go!

import functools

class Helper:
    @functools.cached_property
    def helper(self) -> _HelperClass:
        return _HelperClass()

HELPER = Helper()

def a_function() -> None:
    # HELPER.helper only gets initialized on first use
    HELPER.helper.do_something()

oh that makes sense! So it’s basically just a wrapper for the _HelperClass? :slight_smile:

It isn’t technically a “wrapper” because you actually get an instance of _HelperClass - if it were a wrapper, there’d be some other class involved there - but yes, that’s the right idea.

I think @TomRitchford’s answer can be simplified a bit, since the Helper class is not necessary. You can use this:

#script2.py
import functools

@functools.cache
def helper() -> _HelperClass:
    return _HelperClass()

#script1.py
import script2
...
Config.set_env(...)
...
# Just make sure to call helper() only after required configs are set
script2.helper().do_something()

Now whenever you need the helper, just call helper() instead of the HELPER you had before. This (like Tom’s answer) has the additional benefit that the _HelperClass instance won’t be created upon importing script2, but only the first time you actually need it.


Yes, even better answer!

I support a lot of small libraries that need to stay backward compatible with 3.8, so I tend to avoid functools.cache, which appeared in 3.9, and use the completely equivalent functools.lru_cache(None) instead.
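A minimal sketch of that 3.8-compatible variant, with _HelperClass as a stand-in for the real class:

```python
import functools


class _HelperClass:
    """Stand-in for the real helper class."""


@functools.lru_cache(maxsize=None)
def helper() -> _HelperClass:
    # The body runs on the first call; later calls return the cached instance.
    return _HelperClass()
```

Every call to helper() after the first returns the same object, so it behaves like the functools.cache version above.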

But 3.8 is pretty old and senile right now, so your answer is better.


that’s an interesting approach!
As it stands, the helper is only used within script2.py.
script1.py is only calling the script2.run() method pretty much.
Another approach I found out just today is not to set HELPER = None but rather just annotate it like so:
HELPER: _HelperClass.
This makes HELPER not be an Optional[_HelperClass] while functionally being the same, at least for this use case. :slight_smile:

Pedantically: it tells the type checker that a global variable named HELPER will be created, and that the corresponding value will be a _HelperClass; but it doesn’t actually create the global. There is no global until the first assignment to it (via a global-declared variable in a function).

Python doesn’t have global variable declaration. The globals are just a dict with special syntax tricks. In 3.x, locals are reserved based on static analysis, and UnboundLocalError (a subtype of NameError) occurs on an attempt to use a local variable before it’s been assigned, but there is still no explicit declaration (only an implicit one resulting from the fact that the code references the name). Type annotations on local variables in a function have no effect at runtime. Annotations on names inside a class only set the __annotations__ of the class, and annotations on parameters or return types of a function only set the __annotations__ of the function.

HELPER does not need to be explicitly defined before _initialize is called.


Wow thank you, I wasn’t aware of that. Good to know and very helpful to have learned some of the inner workings of python. :slight_smile:

Yeah sorry, I should have been more specific. While HELPER = None is not absolutely necessary,
my IDE (PyCharm) otherwise didn’t understand that a global variable existed, which made formatting and whatnot pretty annoying.
As @kknechtel pointed out, HELPER: _HelperClass apparently doesn’t actually create a variable, but it appears to be enough to satisfy the IDE.

I might end up going with what @griendt suggested, which is likely a cleaner approach. I do struggle with it a little, though, as the __init__() method takes a parameter. Ideally there would be a way to call helper(param) once to initialize the singleton and then call just helper() afterwards, so the param doesn’t have to be passed around everywhere.
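One possible shape for that, as a sketch only: split the work into an explicit initializer that takes the parameter and a parameterless accessor. The names init_helper and _helper are hypothetical, and _HelperClass is a stand-in whose single str parameter is an assumption.

```python
from typing import Optional


class _HelperClass:
    """Stand-in for the real class, whose __init__ takes one parameter."""

    def __init__(self, param: str) -> None:
        self.param = param


_helper: Optional[_HelperClass] = None


def init_helper(param: str) -> _HelperClass:
    """Create the shared instance; meant to be called once, e.g. in run()."""
    global _helper
    if _helper is not None:
        raise RuntimeError("helper is already initialized")
    _helper = _HelperClass(param)
    return _helper


def helper() -> _HelperClass:
    """Return the shared instance without needing the parameter again."""
    if _helper is None:
        raise RuntimeError("call init_helper(param) first")
    return _helper
```

run() would call init_helper(param) once, and everything else in the module would call helper(), which mypy sees as returning a plain _HelperClass.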

In any case, an OOP approach might have been best here, because I could just create an instance of my script, and that instance could have _HelperClass as one of its members, which would make global variables unnecessary.

It also lets your users pick exactly when they want to pay the cost of whatever initialization you’re doing—for instance, if they’re making a web service, they can run your initialization during startup instead of adding latency to the first HTTP request that needs to call your code.

I generally agree with what you’re saying, but given the context of the script this doesn’t seem quite necessary.

I am currently building the data pipeline for our data science project, and the script is called from within that pipeline, which is why it doesn’t need to be as flexible. Think of it like so:

#pipeline.py
import script2
from project.config import Config
#setup connection to cloud provider
...
# fetch and set config env
...
script2.run()
#execute scripts for other data sources
...

I felt that OOP might add layers of complexity here, because the pipeline is executed as something similar to a cron job and the config barely ever changes, but it’s still specified as a YAML file for readability and separation of concerns.

I tried giving the script2.py a functional approach, because that seems like a perfect fit for data pipelines, however due to my lack of experience it is likely not the best implementation you’ll find out there - though I can say that I did learn a lot. :slight_smile:
