Move `copy.deepcopy()` to the builtin namespace

umarbutler · December 18, 2024, 5:58am

I’d like to propose that copy.deepcopy() be moved to the builtin namespace such that it is no longer necessary to import.

My motivation is that I find copy.deepcopy() so useful that I regularly think to myself that it should be in the builtin namespace. Whether I’m writing a web scraper, training a model, cleaning data or creating an API server, I often find myself using copy.deepcopy() in some way or else relying on existing code that depends on copy.deepcopy().

My main use for copy.deepcopy() is to copy a mutable object while ensuring that the copy is a true copy that will not suffer from the side effects of mutability. I believe this is a pretty common use case that is not specific to me and affects most Python users at some point (whether they realise it or not).
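To make the "side effects of mutability" concrete, here is a minimal illustration (the dictionary is made up) of why a shallow copy.copy() is not enough when an object contains other mutable objects, whereas copy.deepcopy() produces an independent copy:

```python
import copy

# A mutable object that contains other mutable objects (made-up example data).
original = {"weights": [1.0, 2.0], "meta": {"name": "base"}}

shallow = copy.copy(original)   # new outer dict, but the inner list/dict are shared
deep = copy.deepcopy(original)  # new outer dict AND new inner objects

original["weights"].append(3.0)  # mutate the original in place

print(shallow["weights"])  # [1.0, 2.0, 3.0]  (sees the change via the shared list)
print(deep["weights"])     # [1.0, 2.0]       (unaffected)
print(shallow["weights"] is original["weights"])  # True
print(deep["weights"] is original["weights"])     # False
```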

Obviously, this is all anecdotal. Some harder evidence of copy.deepcopy()'s popularity: there are 440k Python files on GitHub that match from copy import deepcopy and 1M files that match copy.deepcopy (the count is the same for copy.deepcopy( with the opening parenthesis).

By comparison, there are slightly fewer mentions of bytes( (958k) than of copy.deepcopy(.

As for stealing an otherwise valid name from users (e.g., min() sadly takes away your chance to do min = 0, unless you’re happy to lose min(), though I’m just complaining), I can only find 8.1k Python files that contain def deepcopy( and 2.8k that contain deepcopy =. So it does not seem to be overridden that much. There are, in fact, 6.3k files that contain def bytes(, and bytes is already in the builtin namespace.

If there are other downsides to adding deepcopy() to the builtin namespace that anyone feels outweigh its benefits, I’d be happy to hear them.

Rosuav · December 18, 2024, 6:53am

Hmm, this raises the obvious question: Why is deep copying so important to your algorithms? It’s an inefficient way to handle things. What makes it such a common operation?

If this is the only reason for it, I would suggest that perhaps you’re seeing mutability as a flaw rather than a feature.

umarbutler · December 18, 2024, 7:17am

It’s not just me who uses copy.deepcopy(), though. As I mentioned, there are some 1M mentions of copy.deepcopy() in Python files on GitHub. That’s more than bytes(.

I don’t see mutability as a flaw or a feature. Well, maybe more a feature than a flaw.

But sometimes you need to copy an object without altering the original.

For example, my most recent use of copy.deepcopy() was to copy a model and merge the copy’s weights with another model. At the end, the ‘base’ model remains as it was, so I can reuse it to merge with a third model.

Now, I wouldn’t say I use it all the time. Most often I use it with models, though most of my work is with models anyway. But it’s very useful for copying weights so that subsequent operations are not performed in place and the original weights remain intact.
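A minimal sketch of the copy-then-merge pattern described above, assuming the weights are plain dicts of float lists standing in for a real model’s parameters; merge_into is a hypothetical helper that interpolates one set of weights into another in place:

```python
import copy

# Hypothetical stand-ins for model weights (a real model would be e.g. a PyTorch module).
base = {"layer1": [0.0, 1.0], "layer2": [2.0, 4.0]}
model_b = {"layer1": [1.0, 1.0], "layer2": [0.0, 0.0]}
model_c = {"layer1": [2.0, 3.0], "layer2": [4.0, 4.0]}


def merge_into(target, other, alpha=0.5):
    """Hypothetical helper: linearly interpolate `other` into `target` IN PLACE."""
    for name, values in target.items():
        for i, v in enumerate(values):
            values[i] = (1 - alpha) * v + alpha * other[name][i]
    return target


# Deep-copy first, so the in-place merge never touches `base`.
merged_b = merge_into(copy.deepcopy(base), model_b)
merged_c = merge_into(copy.deepcopy(base), model_c)

print(merged_b)  # {'layer1': [0.5, 1.0], 'layer2': [1.0, 2.0]}
print(merged_c)  # {'layer1': [1.0, 2.0], 'layer2': [3.0, 4.0]}
print(base)      # {'layer1': [0.0, 1.0], 'layer2': [2.0, 4.0]}  (unchanged)
```

Without the deepcopy calls, the first merge would overwrite base and the second merge would start from already-merged weights. With a real PyTorch model, the same pattern is typically written as merged = copy.deepcopy(base_model) before modifying the copy’s parameters.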

A quick check shows that I also have more mentions of deepcopy() in my personal private repositories than bytes(). Not the most important tool in the shed but definitely not a once-off tool either.

Nineteendo · December 18, 2024, 7:22am

Frequent use alone isn’t a reason to add something to builtins (nan[j] and inf[j] aren’t builtins either). And how much would this affect Python’s startup time?
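One way to get a rough feel for the startup-time question is CPython’s -X importtime option, which reports how long each import takes. A minimal sketch (the helper name import_time_us is made up) that measures import copy in a fresh interpreter; it only measures the cost of importing the module, not of any particular way of wiring it into builtins:

```python
import subprocess
import sys


def import_time_us(module):
    """Return the cumulative import time of `module` in microseconds, as reported
    by `python -X importtime` in a fresh interpreter, or None if the module does
    not appear in the report (e.g. because it was already imported at startup)."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes lines of the form
    # "import time: <self us> | <cumulative us> | <package>" to stderr.
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) == 3 and parts[2].strip() == module:
            return int(parts[1])
    return None


# The number is machine-dependent, so no expected value is given here.
print(import_time_us("copy"))
```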

sirosen · December 18, 2024, 7:22am

I strongly recommend looking into options for immutable data structures with evolvers (pyrsistent is a great example). It’s not guaranteed to be applicable depending on your problem space, but where you can use immutable structures I urge you to try adapting to this pattern. Your code will be faster and less error-prone, at the cost of an initial learning curve.

I don’t think I’ve ever come across a significant body of code built upon deepcopy which wasn’t either buggy or inefficient.
I only use it here and there, e.g. in test suites to patch and revert.
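A minimal sketch of the "patch and revert" use in a test, assuming a hypothetical module-level CONFIG dict that the code under test mutates. A deep copy matters here because the test mutates the nested list in place, which a shallow snapshot would not protect:

```python
import copy

# Hypothetical global configuration read (and mutated) by the code under test.
CONFIG = {"retries": 3, "endpoints": ["https://example.invalid/api"]}


def test_with_extra_endpoint():
    snapshot = copy.deepcopy(CONFIG)  # independent copy of the nested structure
    try:
        # "Patch": mutate the shared config, including the nested list.
        CONFIG["endpoints"].append("https://example.invalid/backup")
        CONFIG["retries"] = 0
        assert len(CONFIG["endpoints"]) == 2
    finally:
        # "Revert": restore everything from the snapshot, even if the test failed.
        CONFIG.clear()
        CONFIG.update(snapshot)


test_with_extra_endpoint()
print(CONFIG)  # {'retries': 3, 'endpoints': ['https://example.invalid/api']}
```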

Setting aside my opinions on its utility, making it a builtin is a breaking change. I don’t see this as important enough (even if I agreed with you about how useful it is) to be worth that breakage.

petercordia · December 18, 2024, 7:29am

I personally think that having copy.copy and copy.deepcopy side by side in the copy namespace is easier to remember than having deepcopy alone in the builtin namespace.

I’ve had no issue using from copy import deepcopy either, when that was warranted.

umarbutler · December 18, 2024, 7:33am

Looks interesting, but I’m not exactly sure how this would solve the model-merging example I gave earlier… The models are PyTorch objects, whereas the library you suggested seems to offer drop-in replacements for Python’s built-in structures?

Setting aside my opinions on its utility, making it a builtin is a breaking change. I don’t see this as important enough (even if I agreed with you about how useful it is) to be worth that breakage.

Why couldn’t we have deepcopy() and also retain copy.deepcopy()?

It would be ridiculous for me to suggest breaking all 1M files on GitHub that reference copy.deepcopy().

I’m suggesting deepcopy() also be available in addition to copy.deepcopy().

In terms of breakage in that case, only some 8k files, quite a small number, define functions called deepcopy(), and I’m not sure they would even break. You can define a function named min() and it will then shadow the builtin min().
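Python resolves a bare name by looking in the local scope, then enclosing scopes, then the module’s globals, and only then in builtins, so a module-level definition shadows a builtin rather than replacing it. A small demonstration using the existing builtin min() (the source string is purely illustrative):

```python
import builtins

# Module source that defines its own `min`, shadowing the builtin of the same name.
source = """
def min(*args):
    return "module-level min"

result = min(3, 1, 2)
"""

namespace = {}
exec(source, namespace)  # run the code in a fresh, isolated global namespace

print(namespace["result"])    # module-level min
print(builtins.min(3, 1, 2))  # 1  (the builtin itself is untouched)
```

By the same lookup rule, a file that already defines its own deepcopy() would keep calling its own version if a builtin of that name were added.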

sirosen · December 18, 2024, 7:33am

And there are 4 million instances of sys.argv. I don’t think that sys.argv should be a builtin.

Frequency of use isn’t a meaningless metric, but it’s not sufficient on its own.

Aside: I said it was a breaking change above, but that’s wrong. I was thinking of hard keywords, not builtins. Sorry for the mistake.

umarbutler · December 18, 2024, 7:35am

But even if deepcopy() were added to the builtin namespace, from copy import deepcopy could still be made to work… It doesn’t need to be either X or Y.

I would definitely not suggest removing copy.deepcopy(). That would cause major breakage to millions of files.

umarbutler · December 18, 2024, 7:42am

Not sure that’s a good comparison. argv is not a function; it is a variable. Its use is not as general-purpose as deepcopy() (which can be used on anything from PyTorch models to strings if you really want). And it has a very short and generic name that is far more likely to be overridden than deepcopy(). There are, in fact, 190k hits for argv =.

So I agree with you on that. sys.argv should not be introduced as a builtin variable.

deepcopy() has broad general uses and low conflict potential.

Sure, there may be other functions that have more of a right to be in the builtin namespace than deepcopy(), and they may eventually end up there. But this proposal should be considered on its own merits.

sirosen · December 18, 2024, 7:44am

Yeah, that’s what I meant by immutable structures not always being an easy fit. If you’re working with mutable types provided by a library, you can’t trivially make frozen versions of them.
If you control the problem space more fully, frozen dataclasses and library types like pyrsistent’s PMap are useful tools.
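A minimal sketch of the frozen-dataclass approach, using a hypothetical TrainingConfig type. Instead of copying and then mutating, dataclasses.replace() "evolves" a new instance with some fields changed, and because every field is immutable the old and new instances can safely share the unchanged values without any copying:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    # Hypothetical example type; tuples keep the nested data immutable too.
    learning_rate: float = 1e-3
    layers: tuple = (128, 64)
    tags: tuple = ()


base = TrainingConfig()

# Build a new instance with some fields changed; the rest are carried over as-is.
tuned = replace(base, learning_rate=5e-4, tags=("tuned",))

print(base)   # TrainingConfig(learning_rate=0.001, layers=(128, 64), tags=())
print(tuned)  # TrainingConfig(learning_rate=0.0005, layers=(128, 64), tags=('tuned',))
print(tuned.layers is base.layers)  # True: shared, no copy needed

try:
    base.learning_rate = 0.1  # frozen instances reject attribute assignment
except FrozenInstanceError:
    print("base is immutable")
```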

Let me ask what I think is an important question:
What benefit do users get from this change?
Is it only that their code is shorter?

I don’t think concision is bad, of course, but I also don’t rate it as very important at this level of granularity.

Rosuav · December 18, 2024, 7:44am

But it’s also something that every CLI tool needs, so it has a similarly strong call for its use.

Did you notice how many of those hits were argv = sys.argv or some variant thereof? Simple naive search counts don’t really say much; you need to analyze the search results and figure out what’s actually happening.
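As a rough illustration of analysing hits rather than just counting them, here is a sketch that classifies matched lines with a regular expression and scales the share of other bindings up to the 190k total quoted earlier. The sample lines are invented for illustration and are not real search results:

```python
import re
from collections import Counter

# Hypothetical sample of lines returned by a code search for "argv =".
hits = [
    "argv = sys.argv",
    "argv = sys.argv[1:]",
    "    argv = ['prog', '--help']",
    "argv = parse(sys.argv)",
]

# A plain alias of sys.argv, allowing surrounding whitespace.
ALIAS = re.compile(r"^\s*argv\s*=\s*sys\.argv\s*$")


def classify(line):
    """Return 'alias' for a plain `argv = sys.argv`, 'other' for any other binding."""
    return "alias" if ALIAS.match(line) else "other"


counts = Counter(classify(line) for line in hits)

# Extrapolate the sample's share of non-alias bindings to the total hit count.
estimate = round(counts["other"] / len(hits) * 190_000)

print(counts["alias"], counts["other"])  # 1 3
print(estimate)  # 142500
```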

umarbutler · December 18, 2024, 7:51am

CLI is popular. Not general purpose though.

The majority of results are not doing argv = sys.argv. There are 10 that match argv = sys.argv and 7 that match argv = sys.argv[. argv = sys.argv[:1] != argv = sys.argv for obvious reasons, so that would still break. If there are 20 results on the first page and only 3 are actually argv = sys.argv, then 17 of 20, or 85%, are something else, and 85% of 190k = 161.5k. Even if we said 50%, it’s still way more than deepcopy =.

So yes, I have analysed the search results.

sirosen · December 18, 2024, 8:07am

Why does that matter?

I do not understand your line of thinking here.
Like I said, I almost never use deepcopy in my work.
Here are some things I use all the time:

textwrap.dedent
time.sleep
sys.argv
sys.exit
json.dumps
functools.wraps
contextlib.contextmanager
datetime.datetime

Why shouldn’t any / all of these be builtins? I would find any one of them more useful than deepcopy.

I thought your argument was “look how many hits there are on GitHub” (an argument which I do not find convincing). But now that an example of something else with more hits on GitHub has come up, the argument is clearly something else. What is your reason for believing that deepcopy is more special than any of the symbols I listed above?

Rosuav · December 18, 2024, 8:08am

Nor is deepcopy. I’m just saying, both of these have their very important use-cases. There will be vast numbers of scripts that don’t use either argv or deepcopy, and that’s fine. There are plenty of scripts that never use min too.

Good, but next time, please report your analysis, not just the numbers. The numbers really don’t tell us much.

petercordia · December 18, 2024, 8:55am

You’re right, I didn’t express myself very well.

I (personally) think having deepcopy in the default namespace would be confusing/disturbing (to me).

I like my namespaces clean/sparse/empty. All the names in the name space add mental weight. I rather like that things like deepcopy aren’t in the namespace unless I choose to pull them in.

(The only place where I’m dissatisfied with the status quo is where I have to use from X import X-style imports, for example from dataclasses import dataclass, but that’s rather tangential.)

umarbutler · December 18, 2024, 8:58am

That’s a good point, though. min(), max(), round(): they all make sense to have in the namespace, yet there’s a lot you can do without min() (though definitely less without depending on libraries that use those functions). Likewise, a lot of developers can go quite far without ever needing deepcopy. But it still feels ‘core’ to the very essence of Python, so I personally feel it would be worth adding to builtins.

deepcopy is not a trivial function to implement. It operates recursively and works on (almost) everything. It has general application and utility. Sometimes, you really do need to duplicate a mutable object without modifying the original.
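Part of what makes copy.deepcopy() non-trivial is bookkeeping: it keeps a memo dictionary of objects already copied, so cycles and shared references are reproduced rather than recursed into forever or duplicated, and classes can customise or opt out of copying through __deepcopy__(self, memo). A small demonstration with made-up classes:

```python
import copy


class Node:
    """Hypothetical linked structure that can contain a cycle."""

    def __init__(self, value):
        self.value = value
        self.next = None


a = Node(1)
b = Node(2)
a.next = b
b.next = a  # cycle: a -> b -> a

shared = [0]
container = {"x": shared, "y": shared, "graph": a}  # same list referenced twice

clone = copy.deepcopy(container)

# Each original object is copied once, so the cycle is reproduced...
print(clone["graph"].next.next is clone["graph"])  # True
# ...shared references stay shared in the copy...
print(clone["x"] is clone["y"])  # True
# ...and nothing in the copy aliases the original.
print(clone["x"] is shared)      # False


class Handle:
    """Hypothetical object that should be shared, not duplicated (e.g. a connection)."""

    def __deepcopy__(self, memo):
        return self  # opt out: deepcopy returns the same object


h = Handle()
print(copy.deepcopy([h])[0] is h)  # True
```

Objects that cannot be copied this way, such as locks, are the usual reason for the "almost".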

umarbutler · December 18, 2024, 9:00am

With everything said, I think part of the reason I don’t like that deepcopy isn’t in the builtin namespace is that, in Jupyter notebooks in VS Code (where I do most of my work), auto-imports still aren’t added to the first cell; they’re added to whatever cell you imported them in.

If they were added to the first cell, I might not have made this suggestion at all, since in regular Python files importing involves much less friction.

In my current setup though, every import is a pain since I need to scroll up, shift focus, add it, and then isort it.

Rosuav · December 18, 2024, 9:13am

Nobody’s doubting that it should exist. It’s a function very worth having in the stdlib. The question is, should it be namespaced away or in the builtins?

It’s really not. It happens to be very common in what you’re doing, but it also happens to NOT be very common in what I do. Namespaces exist for very good reason, and one of those reasons is to allow things to be in the mental model as a single module rather than as a huge number of stand-alone names. Granted, the copy module doesn’t have very many names in it (unlike, say, the socket module - in addition to constructing sockets, it has a wide assortment of utility functions and a vast number of constants), but if deepcopy were being presented today as a brand new idea, it would be under discussion as to whether it ought to be slapped into some other module somewhere.

This is how Python works. Only the very most core functionality is in the builtins. There are the exception types, a handful of data types for which there are literals (str, bytes, float…), a small number of things that are mainly for interactive use and would be an utter pain to have to import (eg quit), and other than that, it’s all things that are of immensely broad and general value - functions that work on all kinds of objects, that can be used in all kinds of programs. The vast majority of Python’s standard library ISN’T in the builtins.

One thing you may want to consider, since it seems to matter a lot to your programs, would be injecting it into builtins via site.py.

If you create a sitecustomize module, you can put anything you like in there, and it’ll be run before your other scripts.
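A minimal sketch of what such a sitecustomize.py could contain. The site module imports sitecustomize automatically at interpreter startup if it can find it on sys.path (it is skipped when site is disabled with -S), and because the builtin name points at the same function object, from copy import deepcopy and copy.deepcopy(...) keep working unchanged:

```python
# sitecustomize.py (a sketch): imported automatically by the site module at
# interpreter startup when it is found on sys.path.
import builtins
import copy

# Expose copy.deepcopy under a builtin name; both spellings refer to the same function.
builtins.deepcopy = copy.deepcopy
```

Any script run by that interpreter afterwards can call deepcopy(...) without importing it. The trade-off is that such code silently depends on this local setup and will fail with a NameError on a machine without it.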

gkb · December 18, 2024, 9:49am

By the way, there exists a solution for Jupyter: You can add a file ~/.ipython/profile_default/startup/00-imports.py and add the import there:

from copy import deepcopy

This imports deepcopy whenever an IPython kernel using the default profile starts, i.e. whenever you open a notebook.