Turn shutil into a runnable module

(cross-posted as Issue #126562 · python/cpython)

Proposal

py -m shutil copy2 ~/my/src.txt .

I often need shutil functionality in scripts. It does the right thing efficiently, has more precise error handling than cmake -E, and, most importantly, is cross-platform.

Precedent

The zipfile module can already be run as py -m zipfile, and it has earned credit as a cross-platform ZIP64 decompressor.

Details

  • The subcommands should cover copyfile, copystat, copy, copy2, copytree, rmtree, move, chown, which, make_archive, and unpack_archive;
  • The keyword parameters that expect cross-platform arguments and are easy to represent in cmdline should be adapted into --kw arg cmdline options; flags can follow the style --follow_symlinks and --no-follow_symlinks (follow_symlinks doesn’t have a cross-platform behavior, yet, but the arguments are boolean).
  • It would be even nicer if combined with a progress bar (See also: Add a basic progressbar implementation to shutil).
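
A minimal sketch of how the subcommand and flag style described above could map onto argparse. The names build_parser and main are hypothetical, only copy2 and which are wired up, and argparse.BooleanOptionalAction (Python 3.9+) is assumed as the way to get the paired --follow_symlinks / --no-follow_symlinks flags; none of this is an existing shutil interface.

```python
import argparse
import shutil


def _run_copy2(ns):
    # Forward the parsed options to the real shutil.copy2.
    return shutil.copy2(ns.src, ns.dst, follow_symlinks=ns.follow_symlinks)


def _run_which(ns):
    return shutil.which(ns.cmd)


def build_parser():
    """Build a hypothetical parser for `python -m shutil`-style subcommands."""
    parser = argparse.ArgumentParser(prog="shutil")
    sub = parser.add_subparsers(dest="command", required=True)

    # copy2 SRC DST [--follow_symlinks | --no-follow_symlinks]
    copy2 = sub.add_parser("copy2", help="copy a file along with its metadata")
    copy2.add_argument("src")
    copy2.add_argument("dst")
    copy2.add_argument(
        "--follow_symlinks",
        action=argparse.BooleanOptionalAction,  # also creates --no-follow_symlinks
        default=True,
    )
    copy2.set_defaults(func=_run_copy2)

    # which CMD
    which = sub.add_parser("which", help="locate an executable on PATH")
    which.add_argument("cmd")
    which.set_defaults(func=_run_which)
    return parser


def main(argv=None):
    """Parse argv, run the selected function, and print any non-None result."""
    ns = build_parser().parse_args(argv)
    result = ns.func(ns)
    if result is not None:
        print(result)
    return result
```

With this sketch, main(["copy2", "src.txt", "dst.txt", "--no-follow_symlinks"]) would call shutil.copy2("src.txt", "dst.txt", follow_symlinks=False), and every path stays in its own argv entry.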

I mostly agree, but if the main motivation is cross-platform compatibility, it’s hard to argue for including chown and platform-specific flags such as --follow-symlinks. If they are included, you would need conditional statements in your script when issuing these platform-dependent commands, at which point you might as well use the commands provided by the platform directly.

Sure, I don’t care about user, group, mode, and chown, especially in CI.

--no-follow_symlinks, or --symlinks in copytree, might be valuable if it copied junctions on Windows… but it seems it does not.

Or we can make it ignore such flags when run on Windows.

Works for me.

It seems usable enough, though part of me thinks it’s better to advocate for a generalized ‘run function in module’ CLI module instead of continually adding more of these.

Thinking out loud a bit.

Maybe like:

python -m modrun <MODULE> <FUNCTION> .. <OPTIONAL ARGS>

Though that is more or less just:

python -c "import <MODULE>; <MODULE>.<FUNCTION>(ARGS)"

One complaint I’ve had about doing this via -c is that it looks verbose: unless you use from X import Y, you need to write MODULE.thing.

A kind of weird idea to make that better could be to have a -M to import * from the given module to place its stuff in the global scope for -c

So like:

python -M shutil -c "copy($src, $dst)" 

I just kind of wish if we were going to do these, we generalize it more to not need to keep doing them.

The runpy module may be a good candidate to be extended. But certain features might only need to be present in shutil’s run-as-module mode. For example, a progress bar by default (if that exists), logging, and error handling that doesn’t point errors back to the shutil.py source code.

A sub -c for running code snippets doesn’t address my need. Too many frameworks/systems poorly handle quotes and whitespace; things like "copy($src, $dst)" don’t survive. I need something that puts path arguments in individual command-line arguments.
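
To illustrate the quoting problem, here is a small sketch. The helper names build_snippet and snippet_is_valid are made up for this example; it only shows that naively interpolating a path into a -c style snippet stops being valid Python as soon as the path contains a quote, whereas a separate argv entry needs no escaping at all.

```python
def build_snippet(src, dst):
    """Naively interpolate two paths into a `python -c` style snippet."""
    return f"copy('{src}', '{dst}')"


def snippet_is_valid(snippet):
    """Return True if the snippet still parses as Python source."""
    try:
        compile(snippet, "<cmdline>", "exec")
    except SyntaxError:
        return False
    return True


# A plain file name survives interpolation.
plain_ok = snippet_is_valid(build_snippet("a.txt", "b.txt"))

# A name containing a single quote breaks the generated code.
quoted_ok = snippet_is_valid(build_snippet("O'Hara.txt", "b.txt"))

# Passed as individual arguments, the same name needs no escaping.
argv_style = ["copy2", "O'Hara.txt", "b.txt"]
```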

I’d be at least somewhat open to a runpy patch that defined a new runpy.run_resolved_name function that used pkgutil.resolve_name to resolve dotted.module:dotted.name references and run the referenced callable [1].

Given that to build on, python -m runpy could be enhanced to accept names containing “:” and call them using run_resolved_name, with all the remaining command line options passed in as positional arguments.

Since many shutil functions can do useful things with only positional strings as arguments, that would actually cover quite a bit of ground. For example:

py -m runpy shutil:copy2 ~/my/src.txt .

By default, the return value from the called function would be ignored, but --str, --repr, and --ascii options could be added to request printing the result to stdout instead.

However, you wouldn’t be able to pass anything other than strings (no booleans for example), and you wouldn’t be able to pass keyword arguments.
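
A rough sketch of what the strings-only helper might look like. run_resolved_name does not exist in runpy today, so the name and signature here are assumptions; the sketch relies on pkgutil.resolve_name (Python 3.9+), which is the standard-library function that accepts module:name references.

```python
import pkgutil


def run_resolved_name(name, args=(), kwds=None):
    """Hypothetical helper: resolve a 'module:attr' reference and call it.

    args and kwds are accepted as ordinary parameters instead of being
    forwarded via *args/**kwargs, so that further options can be added later.
    """
    target = pkgutil.resolve_name(name)
    return target(*args, **(kwds or {}))


# Command-line arguments are always strings, so this is all a
# strings-only command line could pass along.
joined = run_resolved_name("os.path:join", ("my", "src.txt"))
```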

The only generalised solution that has occurred to me for the “keywords and non-string types” part of the problem is to define a mechanism for accepting JSON inputs and emitting JSON outputs.

Something like:

py -m runpy --json shutil:copy2 \
    '["~/my/src.txt", "."]' \
    '{"follow_symlinks": true}'

In JSON mode, if the called function produces a non-None result, that would be dumped to stdout as JSON (with a --no-output option to suppress that behaviour, and --str and --repr also overriding it).
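
A sketch of how the JSON mode could behave, under the assumption that the first argument is a JSON list of positional arguments and the second a JSON object of keyword arguments. The function name run_json is hypothetical, and the --no-output, --str, and --repr options are left out.

```python
import json
import pkgutil


def run_json(name, args_json="[]", kwds_json="{}"):
    """Hypothetical JSON mode: decode the inputs, call the target, encode the result."""
    args = json.loads(args_json)
    kwds = json.loads(kwds_json)
    if not isinstance(args, list) or not isinstance(kwds, dict):
        raise TypeError("expected a JSON list followed by a JSON object")
    result = pkgutil.resolve_name(name)(*args, **kwds)
    # A None result produces no output; anything else is dumped as JSON.
    return None if result is None else json.dumps(result)


output = run_json("builtins:sorted", "[[3, 1, 2]]", '{"reverse": true}')
```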

The “somewhat open” hesitance comes from the fact that I’m not sure the simpler version is worth adding on its own (since “only strings, and no keyword arguments” is a pretty massive restriction in applicability), and the JSON-based enhancement feels like a significant enough UX addition to the standard library that it should probably be a PEP (I’d be willing to sponsor such a PEP, though).


  1. To allow additional keyword arguments that affect how the target callable is invoked, the module function would accept args and kwds as positional-or-keyword arguments, rather than accepting arbitrary args and keywords and forwarding them on. ↩︎

The problem with a generalized solution like your first example would be that it is difficult to tell if an argument such as 1 is supposed to be parsed as an int or a string, while the problem with your second example has been explained by the OP.

I do agree that a generalized solution would be the way to go in the long term. We just need to find a syntax that’s both reasonably clean and versatile.

While using JSON solves the typing issue, I can’t help but feel it makes the usage unnecessarily verbose and unergonomic, mostly because JSON is designed for structured data while most of the use cases of calling a utility function from the CLI do not involve nested data structures.

Perhaps we can make it more convenient for most common use cases by automatically deciding which type to convert an argument to based on the pattern it matches, while allowing a preceding switch to force its type:

py -m runpy builtins.type 1 # <class 'int'>
py -m runpy builtins.type foo # <class 'str'>
py -m runpy builtins.type True # <class 'bool'>
py -m runpy builtins.type -s True # <class 'str'>

And an --always-string switch to parse every argument as a string:

py -m runpy --always-string builtins.type True # <class 'str'>

And use --keyword=value to specify keyword arguments, and use --keyword and --no-keyword to specify keyword arguments with boolean values. Dashes in keywords can be automatically converted to underscores to better follow the common CLI convention:

py -m runpy builtins.print 1 2 --sep=, # 1,2
py -m runpy shutil.copy /foo /bar --no-follow-symlinks
# shutil.copy('/foo', '/bar', follow_symlinks=False)
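
A minimal sketch of the parsing rules proposed above. The function names convert and parse_cli are hypothetical, and limiting automatic conversion to bool, int, and float via ast.literal_eval is an assumption of this sketch; everything else stays a string.

```python
import ast


def convert(token, always_string=False):
    """Guess a type from the token's pattern; fall back to a plain string."""
    if always_string:
        return token
    try:
        value = ast.literal_eval(token)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError):
        return token
    # Only simple scalars are converted; other literals stay as typed.
    return value if isinstance(value, (bool, int, float)) else token


def parse_cli(tokens, always_string=False):
    """Split tokens into positional and keyword arguments."""
    args, kwds = [], {}
    it = iter(tokens)
    for token in it:
        if token == "-s":
            # -s forces the following argument to be a string.
            forced = next(it, None)
            if forced is None:
                raise ValueError("-s requires a following argument")
            args.append(forced)
        elif token.startswith("--no-") and "=" not in token:
            # --no-keyword means keyword=False; dashes become underscores.
            kwds[token[5:].replace("-", "_")] = False
        elif token.startswith("--"):
            key, sep, value = token[2:].partition("=")
            # --keyword=value, or bare --keyword meaning keyword=True.
            kwds[key.replace("-", "_")] = convert(value, always_string) if sep else True
        else:
            args.append(convert(token, always_string))
    return args, kwds
```

For example, parse_cli(["/foo", "/bar", "--no-follow-symlinks"]) yields (["/foo", "/bar"], {"follow_symlinks": False}), which could then be passed on as func(*args, **kwds).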

There already exists a 3rd-party package called python-fire, which does exactly that, e.g.:

python -m fire shutil copy2 a.txt b.txt

Yeah fire is great, but I wish it didn’t use quotes to force an argument to be a string because the argument then can’t contain unescaped quotes, leading to the escape hell that the OP mentions.

It eats quotes: see python-fire/docs/guide.md at master · google/python-fire

> touch 3
> py -m fire shutil copy2 3 3.txt
Traceback (most recent call last):
  File "[...]\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "[...]\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "[...]\site-packages\fire\__main__.py", line 126, in <module>
    main(sys.argv)
  File "[...]\site-packages\fire\__main__.py", line 122, in main
    fire.Fire(module, name=module_name, command=args[2:])
  File "[...]\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "[...]\site-packages\fire\core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "[...]\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "[...]\lib\shutil.py", line 444, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "[...]\lib\shutil.py", line 264, in copyfile
    with open(src, 'rb') as fsrc:
OSError: [WinError 6] The handle is invalid

I think we may not want a solution that is way too automatic. But if there is something that requires opt-in, then it becomes an implementation detail of shutil and will have to compete with Click.

The documentation says Bash, not fire, eats quotes. So if you need an argument to be always parsed as a string and quote the variable as documented, here’s what happens if you don’t escape a quote inside the string:

# export s="Hara"; python -mfire builtins print "'$s'"
Hara
# export s="O'Hara"; python -mfire builtins print "'$s'"
'O'Hara'
# export s="O\'Hara"; python -mfire builtins print "'$s'"
O'Hara

As you see, if you don’t escape the quote, the enclosing quotes become part of the string, which is not intended, and the solution is to escape the quote, resulting in the said escape hell.

If I remember correctly, you can’t use single quotes on Windows, only double quotes. So:

py -m runpy --json shutil:copy2 \
    "['~/my/src.txt', '.']" \
    "{'follow_symlinks': true}"

That is invalid JSON, so accepting it would require changes to the json module.

Non-strings could be handled by looking at type hints and doing conversion accordingly. For example, if a function has a parameter example: int, arguments from the command line could be converted to integers.

Keyword arguments could be recognized from the syntax example=42. This could either always be treated as a keyword argument, or the code could first check whether the function has a parameter named example.
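
A sketch of the type-hint idea, assuming only plain int, float, and bool hints are handled and that name=value is treated as a keyword argument only when the function actually has a parameter with that name. The names _convert, call_with_hints, and repeat are made up for illustration.

```python
import inspect
import typing


def _convert(value, hint):
    """Convert one command-line string according to a simple type hint."""
    if hint is bool:
        return value.lower() in {"1", "true", "yes"}
    if hint in (int, float):
        return hint(value)
    return value  # unknown or missing hint: leave it as a string


def call_with_hints(func, tokens):
    """Call func with string tokens converted according to its annotations."""
    sig = inspect.signature(func)
    hints = typing.get_type_hints(func)
    names = list(sig.parameters)
    args, kwds = [], {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key in sig.parameters:
            kwds[key] = _convert(value, hints.get(key))
        else:
            # Positional tokens are matched to parameters by position.
            position = len(args)
            hint = hints.get(names[position]) if position < len(names) else None
            args.append(_convert(token, hint))
    return func(*args, **kwds)


def repeat(text: str, times: int, upper: bool = False) -> str:
    """Hypothetical target function used only for this example."""
    out = text * times
    return out.upper() if upper else out


result = call_with_hints(repeat, ["ab", "3", "upper=true"])
```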

That would help only the simplest cases. The problems with using type hints include:

  1. Type hints are not mandatory.
  2. Type hints are not available for built-ins and C extensions unless typeshed stubs are provided.
  3. There are many overloaded functions with parameters of multiple types declared with Union or @overload.
  4. A type hint can be simply an ABC or protocol rather than a concrete type.
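
Two of these points can be checked quickly at runtime. The sketch below assumes a CPython standard library in which shutil.copy2 carries no inline annotations (its types live in typeshed); the helper unannotated_parameters and the function ambiguous are hypothetical names for this example.

```python
import inspect
import shutil
import typing


def unannotated_parameters(func):
    """Return the names of parameters that carry no annotation at runtime."""
    sig = inspect.signature(func)
    return [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
    ]


def ambiguous(value: typing.Union[int, str]) -> None:
    """Hypothetical function whose parameter accepts more than one type."""


# No runtime hints to convert from for shutil.copy2.
missing = unannotated_parameters(shutil.copy2)

# A Union hint leaves open whether "1" should become an int or stay a str.
candidates = typing.get_args(typing.get_type_hints(ambiguous)["value"])
```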

To expand on this, the cmd syntax would be:

"[""~/my/src.txt"", "".""]" "{""follow_symlinks"": true}"

where any mistake would lead to an unhelpful JSON parse error. This is not valid sh/bash syntax so if the goal of making a shutil CLI is to be cross platform then this already breaks that objective.

(There’s also the issue that the ~ is inside quotes and is therefore interpreted literally so this specific command is now broken for all platforms+shells.)
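
As a sketch of one way around the tilde issue, a JSON-based front end could expand ~ itself instead of relying on the shell. The function name load_path_args is hypothetical, and treating every decoded argument as a path is an assumption that would not hold for arbitrary functions.

```python
import json
import os


def load_path_args(args_json):
    """Decode a JSON list of paths and expand a leading ~ in each one."""
    return [os.path.expanduser(path) for path in json.loads(args_json)]


# The shell leaves ~ untouched inside quotes, so expansion happens here.
paths = load_path_args('["~/my/src.txt", "."]')
```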

It does something else on zsh, but backslashes do work:

$ echo "[""~/my/src.txt"", "".""]" "{""follow_symlinks"": true}"
[~/my/src.txt, .] {follow_symlinks: true}
$ echo "[\"~/my/src.txt\", \".\"]" "{\"follow_symlinks\": true}"
["~/my/src.txt", "."] {"follow_symlinks": true}

My thinking was quite the opposite. A lot of things that work well as functions in Python might not make for a good cross-platform experience as a CLI, and trying to automate this will only highlight such functions. It would be better for people to selectively show that the functions they want to expose are going to be an improvement.

I’m reasonably sure shutil as a runnable module is not an improvement (I can’t think of a reason to use shutil from a shell over native utils, and python scripts can just import shutil and call what they need already), so using it here to argue for generalizing this idea of runnable module functions being exposed seems to fail far before figuring out how.
