Override the __all__ when importing

bobbyocean · May 13, 2024, 10:05pm

A common technique for implementing custom changes to python methods/functions is to import and override that functionality. For example,

from pathlib import Path 

class MyPath(Path):
    def iterdir(self, *args, **kargs):
        # make custom changes
        return super().iterdir(*args, **kargs)

This works well in customizing methods of a class. When a module contains functions that one wishes to customize, then this can be accomplished with “from module import *”. For example,

from yaml import * 

def load(stream, Loader):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    loader = Loader(stream)
    try:
        #
        # add custom functionality here
        # 
        return loader.get_single_data()
    finally:
        loader.dispose()

However, sometimes a module is built without classes and includes (unfortunately) an __all__ that prevents seamless overriding for custom behavior. For example,

from shutil import * 

def copyfile(src, dst, *, follow_symlinks=True):
    """Copy data from src to dst in the most efficient way possible.

    If follow_symlinks is not set and src is a symbolic link, a new
    symlink will be created instead of copying the file it points to.

    """

    # 
    # add custom functionality here
    #

    sys.audit("shutil.copyfile", src, dst)

    if _samefile(src, dst):
        raise SameFileError("{!r} and {!r} are the same file".format(src, dst))

    file_size = 0
    for i, fn in enumerate([src, dst]):
        try:
            st = _stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                fn = fn.path if isinstance(fn, os.DirEntry) else fn
                raise SpecialFileError("`%s` is a named pipe" % fn)
            if _WINDOWS and i == 0:
                file_size = st.st_size

    if not follow_symlinks and _islink(src):
        os.symlink(os.readlink(src), dst)
    else:
        with open(src, 'rb') as fsrc:
            try:
                with open(dst, 'wb') as fdst:
                    # macOS
                    if _HAS_FCOPYFILE:
                        try:
                            _fastcopy_fcopyfile(fsrc, fdst, posix._COPYFILE_DATA)
                            return dst
                        except _GiveupOnFastCopy:
                            pass
                    # Linux
                    elif _USE_CP_SENDFILE:
                        try:
                            _fastcopy_sendfile(fsrc, fdst)
                            return dst
                        except _GiveupOnFastCopy:
                            pass
                    # Windows, see:
                    # https://github.com/python/cpython/pull/7160#discussion_r195405230
                    elif _WINDOWS and file_size > 0:
                        _copyfileobj_readinto(fsrc, fdst, min(file_size, COPY_BUFSIZE))
                        return dst

                    copyfileobj(fsrc, fdst)

            # Issue 43219, raise a less confusing exception
            except IsADirectoryError as e:
                if not os.path.exists(dst):
                    raise FileNotFoundError(f'Directory does not exist: {dst}') from e
                else:
                    raise
    return dst

now raises an error. This is because the __all__ in shutil prevents the seamless importing of everything that is used in copyfile(). Hence, adding custom functionality to copyfile() requires editing all the code found in copyfile() (instead of just adding code where you want to customize). This is notably different than in the other cases and feels very non-pythonic. One could painstakingly identify every method/function that is used and manually import each one, for example:

from shutil import _WINDOWS, _fastcopy_sendfile, copyfileobj #etc.

But again, very non-pythonic. One could argue that shutil should be built with classes so that customization is seamlessly possible, but that is not the point of this inquiry. Instead we are wondering how one cleanly bypasses the __all__ attribute in Python, so that seamless customization can be accomplished. Should something like:

from shutil import **

be added to the python language? How would others approach customizing the copyfile() function (by only adding code, not editing the code already in the function) in shutil?
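A minimal sketch of why the copied function fails: `from shutil import *` binds only the names listed in `shutil.__all__`, so the underscore-prefixed helpers and the modules that shutil imports for its own use never reach the importing namespace. The star import is run inside a scratch dictionary here so the current module is left untouched; the names `namespace` and `hidden` are illustrative only.

```python
import shutil

# Run the star import inside a scratch namespace instead of this module.
namespace = {}
exec("from shutil import *", namespace)

# Public names listed in __all__ are bound by the star import ...
print("copyfile" in namespace)

# ... but everything else that lives in the module is not, including
# modules that shutil imported itself (os, sys, stat) and private helpers.
hidden = sorted(name for name in vars(shutil) if name not in namespace)
print("os" in hidden, "sys" in hidden)

# A few of the non-dunder names a copied function body might rely on.
print([name for name in hidden if not name.startswith("__")][:5])
```

Any of the names in `hidden` that the pasted body of copyfile() refers to would be undefined in the importing module, which is the error described above.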

cameron · May 13, 2024, 10:30pm

One could painstakingly identify every method/function that is used and
manually import each one, for example:
from shutil import _WINDOWS, _fastcopy_sendfile, copyfileobj #etc.

My opinion is that if you’re using these symbols you should explicitly
import them. You’re accessing unpublished innards of a module so you
should be up front about what you’re accessing.

But again, very non-pythonic.

It’s entirely pythonic.

One could argue that shutil should be built with classes so that
customization is seamlessly possible,

Your examples above do not show any of this, but I’m imagining that
you’re talking about monkey patching the method of a class defined in
the module (as opposed to the plain old subclass you seem to have
described). If that’s the case, please say so overtly. This isn’t a
great practice in general because you have no idea if some third party
use of the class relies on behavior you’re modifying, leading to
breakage.

from shutil import **
be added to the python language? How would others approach customizing
the copyfile() function (by only adding code, not editing the code
already in the function) in shutil?

Use of import * is somewhat discouraged because of its tendency to
pollute the importer’s namespace with uncontrolled names.

I think if you want to use names which the module’s chosen to omit from
__all__ then you should just import the module and access them that
way, eg:

 import shutil
 ..........
 if shutil._WINDOWS:
     ....

Again, so that the association of this name is clear.

I’m quite -1 on this suggestion.

kknechtel · May 13, 2024, 10:53pm

Yes. This is a deliberate form of encapsulation. If you want the other names, you have to know what they are, because the entire point is that they don’t form part of the stable API. A future version of Python could include a shutil that doesn’t define those names, or makes them mean something completely different, breaking your code.

When it makes sense to customize the behaviour of something provided in a library, it’s the library’s responsibility to provide a hook for doing so, and define and document the means for using that hook. Sometimes that’s an optional callback that you pass to something; sometimes it’s a class you can subclass; etc. But in the particular example it’s hard to imagine any customization that makes sense. (Just to make sure, did you check the other copy functions in the module to see if they do what you want, instead of whatever change you planned for copyfile?)
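As one concrete instance of a documented hook in this same module: `shutil.copytree` accepts a `copy_function` argument, so per-file behaviour can be customized without copying any of shutil's source. The sketch below is illustrative only; `logging_copy`, the `copied` list and the temporary directory layout are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path

copied = []  # records every file the hook was asked to copy


def logging_copy(src, dst):
    """Custom per-file copy: note the file, then defer to shutil.copy2."""
    copied.append(Path(src).name)
    return shutil.copy2(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "source")
    source.mkdir()
    (source / "a.txt").write_text("alpha")
    (source / "b.txt").write_text("beta")

    target = Path(tmp, "target")
    # copy_function is part of copytree's documented signature.
    shutil.copytree(source, target, copy_function=logging_copy)

    result = sorted(p.name for p in target.iterdir())

print(sorted(copied), result)
```

Whether a hook like this covers the change the original poster had in mind is not stated in the thread; it only shows what a supported customization point looks like.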

Who exactly might “we” be?

JamesParrott · May 13, 2024, 11:04pm

import shutil

copyfile = shutil.copyfile

and fill your boots from there?

loic-simon · May 13, 2024, 11:05pm

You could monkey-patch the __all__ attribute of the module you wish to import to get all its symbols imported:

import shutil
_all = shutil.__all__
shutil.__all__ = dir(shutil)
from shutil import *
shutil.__all__ = _all

I suppose you could abstract that into some helper, but that’s not very neat and definitely against the philosophy of __all__.

bobbyocean · May 13, 2024, 11:23pm

Thanks everyone for the helpful feedback. I like @loic-simon’s suggestion and had come up with something similar, which also motivated me to post this topic. It isn’t very pythonic looking, but it works.

To try and clarify some of the other comments. I agree that we could explicitly edit the function’s code, but this to me seems to detract from the spirit of preserving the original code. In very large corporate projects and the like, I have found dozens of times where customization is needed. Like in inheritance:

class NewClass(OldClass):
    def Method(self,*args,**kargs):
        # some custom stuff
        super().Method(*args,**kargs) # do previous code here
        # some other custom stuff

This seems efficient and pythonic. In the case of a non-class module function, one doesn’t have the luxury of self. It is common then (at least for myself over the years working on projects) to use a * import and copy the code instead. I agree that * importing is frowned upon generally, which is why I am only suggesting it be used in this type of customization of non-classes (which I have seen done many times).

from module import * 

def Method(*args,**kargs): 
    # some custom stuff
    # copy previous function code here
    # some other custom stuff

Yes we can manually edit the original code; but in extreme cases this can be quite time consuming. So much so, one might just opt to copy the entire shutil.py file instead (which works) in order to make the few changes that are of interest. Copying the whole file, while faster and easier, doesn’t feel pythonic. Writing out

from shutil import item, item, item, item, ....

also doesn’t feel very pythonic. And performing hundreds of edits at every usage, shutil.<method/attr> doesn’t seem very pythonic either. To be clear, shutil is just an example.

Rosuav · May 13, 2024, 11:36pm

Robert Bates:

To try and clarify some of the other comments. I agree that we could explicitly edit the function’s code, but this to me seems to detract from the spirit of preserving the original code. In very large corporate projects and the like, I have found dozens of times where customization is needed. Like in inheritance:
class NewClass(OldClass):
    def Method(self,*args,**kargs):
        # some custom stuff
        super().Method(*args,**kargs) # do previous code here
        # some other custom stuff 
This seems efficient and pythonic.

You can do the exact same thing with a plain function.

def copyfile(*args, **kwargs):
    # some custom stuff
    shutil.copyfile(*args, **kwargs)
    # some other custom stuff

Why copy in the code in one form and not in the other?
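A slightly fuller sketch of the wrapper idea, assuming the customization can run before and after the original call rather than in the middle of it. It passes the return value through and uses `functools.wraps` to keep the original's metadata; the `calls` list stands in for whatever custom behaviour is needed and is purely illustrative.

```python
import functools
import shutil
import tempfile
from pathlib import Path

calls = []  # stand-in for the "custom stuff"


@functools.wraps(shutil.copyfile)
def copyfile(src, dst, **kwargs):
    # custom code that runs before the original
    calls.append(("before", str(src)))
    # the original implementation, called unmodified
    result = shutil.copyfile(src, dst, **kwargs)
    # custom code that runs after the original
    calls.append(("after", str(result)))
    return result


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "in.txt")
    dst = Path(tmp, "out.txt")
    src.write_text("hello")
    returned = copyfile(src, dst)
    copied_text = dst.read_text()

print(copied_text, [stage for stage, _ in calls])
```

Nothing private is imported here, so the wrapper keeps working as long as the public signature of shutil.copyfile stays the same.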

bobbyocean · May 13, 2024, 11:41pm

@Rosuav Yes, in some circumstances that would work, but in others it would not (like editing the last part of a while loop). The same goes for class inheritance: sometimes you have to copy the code (instead of using super()) if your customization is fairly complicated. I don’t dispute that your solution works sometimes.

Rosuav · May 14, 2024, 1:02am

Yeah, which means classes don’t actually give you any advantage. If you’re going to copy the entire contents of a method so you can adjust it, it’s not giving you any help.

But either way, if you have to copy an entire function (stand-alone or method), you ARE reaching into internals, and IMO it is okay if the language doesn’t try to make this easy.

cameron · May 14, 2024, 4:29am

Just to be clear: you’re not monkey patching the source module, you’re
making an equivalent function/method using much of the mechanism of the
original with changes that need to be part of the function body?

How do you feel about this (untested):

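import sys

def import_all_all(that_module, this_module):
    that = sys.modules[that_module]
    this = sys.modules[this_module]
    for name in dir(that):
        # skip dunders so this module keeps its own __name__, __file__, etc.
        if name.startswith('__') and name.endswith('__'):
            continue
        # possibly a warning here if the name already exists
        setattr(this, name, getattr(that, name))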

which you’d use like:

 # the code's usual imports
 import whatever
 from shutil import *

 # the import everything incantation
 from your_utils import import_all_all
 import_all_all('shutil', __name__)

blhsing · May 14, 2024, 6:30am

Loïc Simon:

You could monkey-patch the __all__ attribute of the module you wish to import to get all its symbols imported:
import shutil
_all = shutil.__all__
shutil.__all__ = dir(shutil)
from shutil import *
shutil.__all__ = _all

This will overwrite all dunders of the current namespace such as __name__, which would become 'shutil' with your code when it should remain as '__main__'.

Instead, we can overwrite shutil’s namespace with the current namespace before updating the current namespace with the merged namespace:

import shutil
globals().update(vars(shutil) | globals()) # __name__ remains as '__main__'

Again this is good if you want to try out some quick customization of a library you don’t own. Definitely not for production.
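The difference between the two approaches can be checked without touching the real module globals. In this sketch both are applied to a scratch dictionary whose `__name__` is set to `'demo'`; the helper names are hypothetical, and `{**a, **b}` is used as an equivalent of `a | b` that also runs on Python versions before 3.9.

```python
import shutil


def star_import_with_patched_all():
    """Widen shutil.__all__ temporarily, then star-import into a scratch dict."""
    namespace = {"__name__": "demo"}
    saved = shutil.__all__
    shutil.__all__ = dir(shutil)
    try:
        exec("from shutil import *", namespace)
    finally:
        shutil.__all__ = saved  # always restore the module's real __all__
    return namespace


def merge_keeping_existing():
    """Merge shutil's names in, letting names already present win."""
    namespace = {"__name__": "demo"}
    namespace.update({**vars(shutil), **namespace})
    return namespace


patched = star_import_with_patched_all()
merged = merge_keeping_existing()

print(patched["__name__"], merged["__name__"])
print("os" in patched, "os" in merged)
```

Under these assumptions both namespaces end up with shutil's non-exported names such as `os`, but only the merged one should still report `demo` as its `__name__`.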

bobbyocean · May 14, 2024, 6:19pm

@Rosuav I am not following what you are saying. In a class, if I inherit and copy a method and change the code, I also have access to self which means I do NOT have to import all the attributes and other methods of the class and can access them. Likewise all the references in the original method’s code ALREADY have self.<stuff> coded; so one only needs to make the changes desired, not rewrite every reference within the method. But for a function that uses 100 references to variables and other functions, then editing that code requires one to specify all 100 references either as explicit imports or by rewriting all 100 references. Even references to importing other modules are blocked. For example,

from shutil import *
print(os)

raises an error, even though shutil imported os. Again, it would be much easier to just copy shutil.py into the project instead (which is not pythonic).

MegaIng · May 14, 2024, 6:31pm

Robert Bates:

One could painstakingly identify every method/function that is used and manually import each one, for example:
from shutil import _WINDOWS, _fastcopy_sendfile, copyfileobj #etc. 
But again, very non-pythonic

To make this very explicit: yes, this is very non-pythonic, but not for the reason you seem to think. The problem is that you are accessing _underscored variables of a module you don’t control. It doesn’t matter how you do this; it’s fundamentally unpythonic and should not be done except as a last resort. It definitely isn’t something that should be encouraged in any way by a language feature.

Your fundamental goal behind this feature request goes against core pythonic ideas and written down and well known conventions.

Not at all. I never had a need for that. If I needed to, I would create a wrapper or (if really necessary) a complete replacement with whatever custom code I needed. I definitely wouldn’t try to access the internals of shutil.

bobbyocean · May 14, 2024, 7:02pm

Thanks for the feedback.

@MegaIng
Your fundamental goal behind this feature request goes against core pythonic ideas and written down and well known conventions.

What core pythonic idea am I violating? I would agree this can be solved in a number of ways, including rewriting one’s own version of shutil (again, shutil is just an example). Creating a new updated copy of a well known module doesn’t seem like something projects should be doing. In particular, this is only required because of the __all__ and lack of class structure in the module.

There are many other examples where this isn’t an issue. Finding a bug on a closed (no internet) system in tensorflow; easy, import said class and override the method; fixed. Implementing custom certificates for use in requests; easy, import requests and override the Session object adding in the capability. Adding callbacks that spawn other timers during various machine learning calls; again fairly easy and straightforward to work with modules like numpy, scipy, sklearn, etc. Customizing the exact presentation used in argparse to have callbacks for complicated necessary pre-calculations.

But here, this is honestly the first time I came across a module that had both __all__ and no classes; and when I started researching, I found that python offers no straightforward way to work with this, like it does with classes. Something like adding a callback was not a straightforward task. While shutil is a very small module, it got me thinking that this could be a much more complicated problem.

The globals().update(vars(shutil)|globals()) works. That just doesn’t seem very pythonic.

MegaIng · May 14, 2024, 7:05pm

Did you read the rest of my message? I explained it there. But to repeat: Accessing internals of a different module you don’t control.

pf_moore · May 14, 2024, 7:12pm

There’s no set list, so no-one can quote something specific here. It’s a classic case of “we’ll know it when we see it” - and just like that idea, it’s somewhat subjective and basically more about broad community principles than precise rules.

You were the one who framed the discussion in terms of how to customise a stdlib function in a “pythonic” way. But the problem is that copying a stdlib implementation and modifying it isn’t “pythonic” in the first place. So the answer is that there’s no pythonic way of doing what you want other than to not do it. Use the stdlib function by all means, or write your own.

Or you can copy/paste the stdlib source and edit it (“we’re all consenting adults” is a community principle, after all!) but you have to find your own way of doing what you want, and not expect it to be easy - the “pythonic” part is that the source is available to you to learn from or use as you wish, but not that you should be encouraged to customise everything easily.

bobbyocean · May 14, 2024, 7:40pm

@MegaIng I think I misunderstood what you were saying and thought we were in agreement in your first paragraph. However, you appear to be suggesting that, in general, the importing and overriding of modules should be frowned upon. We just fundamentally disagree on that. There are countless examples of pypi and conda packages that do this. There are dozens of corporate projects I have worked on that do this (I gave several examples, did you read them?). There are modules specifically for doing this exact thing, like ctypes. Even browsing stackexchange, every other answer suggests importing, inheriting, and then updating the code (rather than rewriting your own module copy); look at anything that deals with multiprocessing, sockets, matplotlib, etc.

@pf_moore Thanks for your kind response. Yes, I approximately agree that customizing a stdlib function is not common, but I am unsure if I would go so far as to say it isn’t “pythonic”. I have done it so many times already in my career. Whether or not we agree on what is “pythonic”, it does seem there is an ability here that is lacking. Two people submit modules to a large project, one with classes and the other with only functions. Someone making a later update to add new functionality finds themselves either doing something hacky, like globals().update(vars(shutil)|globals()), or rewriting old working code, but only for the sub-module that didn’t use classes. No one is saying that overriding the __all__ should be common or encouraged. It is just like from <module> import *; it should be rarely if ever used. But sometimes there is a particularly good and time-saving reason to use it, so people do, and it is a part of the Python language as a result. In the same way (whether we identify the exact specific use case or not), it seems that when someone needs to import all the values in a module there should be a natural way to bypass the __all__ attribute, and there currently isn’t (which surprised me).

Rosuav · May 14, 2024, 7:43pm

Robert Bates:

For example,
from shutil import *
print(os)
raises an error, even though shutil imported os. Again, it would be much easier to just copy shutil.py into the project instead (which is not pythonic).

I think we’re using the word “pythonic” in very different ways here. But let me see if I can explain why, in my opinion, there’s no difference between copying and pasting one function from a module, and copying and pasting one method from a class (into a subclass). It’s about backward compatibility guarantees.

In general, you can trust that there is some measure of backward compatibility when:

- you use the properly-exported symbols from a module
- you use non-private methods/attributes of an object
- etc

You generally can NOT depend on backward compatibility when:

- you use something that starts with an underscore
- you subclass something and replace something that starts with an underscore
- etc

That’s not to say you should never do this - just that, when you do, ANY change might break your code, and you can’t moan to the author.

So! What do you do when you want to make a small tweak to a function? You copy and paste it, and edit. Sure. That’s fine. You want to then make use of some of the existing tools. That’s fine too - make those edits. But if that means you have to use an ugly “from shutil import x, y, z, a, b, c” line, then so be it. It’s a great indication that what you’re doing is weird and unusual and fragile. It’s a clear marker to anyone who reads the code that this module should be treated with care; minor edits could break things.

And if that’s not something that bothers you, then great! Go ahead, do it. Don’t bother asking what’s more “pythonic” though, because none of this is. It’s fine to do unpythonic things; you just won’t find a lot of advice about how to do it more tidily.

MegaIng · May 14, 2024, 7:47pm

No. Don’t put words in my mouth. I mean exactly what I am saying. Don’t rely on internal attributes of other modules. Everything else is fine. I still would find it weird if you consistently need to override stdlib modules or classes, and would question if there isn’t a larger structural issue there, but I wouldn’t call it “unpythonic at its core” (in the objective sense of “python and its style guides aren’t designed for this”), just “unpythonic” (in the subjective sense of “I don’t like it”).

pf_moore · May 14, 2024, 7:52pm

The correct way in both cases is to request that the underlying modules get updated to support the sort of customisation the user wants to do.

The key point that you seem to be missing here is that the author of the code being copied won’t support you if you copy then edit their code. I don’t have the experience you do of people copying and hacking code, but if I did encounter it, I’d argue very strongly against the practice, as there’s a big support liability being incurred and the developer doing the copying almost certainly doesn’t have the authority to expose us to that sort of risk. For example, if there was a CVE against shutil.copyfile and the Python core devs fixed it, what processes do you have in place to ensure that fix gets ported into your application copy? How would you report your exposure to that CVE to your customers? How would you write your SBOM?

So yes, ultimately you have the ability to do what you are doing. But don’t be surprised if no-one is interested in trying to make it any easier for you.
