Standard idiom for reading from stdin, writing to stdout?

I have a bunch of small Python scripts meant to read from stdin and write to stdout if input and/or output files aren’t given, you know, the usual Unix pipeline idiom. Note that I’m not piping within the program, just reading from stdin, writing to stdout (so packages like the pipe package on PyPI aren’t useful here). Many years ago, I originally wrote them something like so:

    inf = open((args[0] if args else "/dev/stdin"), "r")
    outf = open((args[1] if len(args) > 1 else "/dev/stdout"), "w")
    ...

This works, but note that I wasn’t specifying encodings to the open calls, nor was I using the with statement. I may or may not have explicitly remembered to close the files. (Much of this code predates with by a long ways.)

I’ve been working to bring standard practice to these scripts, and now do something like this:

    with ((open(args[0], "r", encoding=options.encoding)
           if len(args) >= 1 else sys.stdin) as inf,
          (open(args[1], "w", encoding=options.encoding)
           if len(args) == 2 else sys.stdout) as outf):
        ...

This also works, but is a mouthful, and… pylint complains that I have a bunch of duplicate lines across these related scripts. (I should probably open /dev/stdin and /dev/stdout with the desired encoding, but that’s a minor thing.) If I was still not using the with statement, I could open both files in a function and just return the open file handles, something like:

    (inf, outf) = open_both(args[0] if len(args) >= 1 else "/dev/stdin",
                            args[1] if len(args) == 2 else "/dev/stdout",
                            options.encoding)
    ...

That’s still a mouthful, but I could probably whittle down the if/else expressions a bit.

My question: How do I push the file open logic into a function in the face of the desire to use with statements?

There isn’t a clean way to do it with open because it acquires the resource (the open file) when called rather than in the file object’s context manager __enter__ method. Without making your own wrapper context manager this means that you can’t return the file objects from a function and then use with after.

What you can do though is separate everything in the body of the with to a separate function:

def main(infile='/dev/stdin', outfile='/dev/stdout'):
    with open(infile, 'r') as fin:
        with open(outfile, 'w') as fout:
            return _main(fin, fout)

If you want to make a wrapper context manager you can use something like this:

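import sys
from contextlib import contextmanager

@contextmanager
def inout(infile='/dev/stdin', outfile='/dev/stdout'):
    with open(infile) as fin:
        with open(outfile, 'w') as fout:
            yield (fin, fout)

# Parenthesize the targets: "as fin, fout" would bind only fin
# and treat fout as a second context manager.
with inout(*sys.argv[1:]) as (fin, fout):
    ...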

You could consider using sys.stdin and sys.stdout in place of the file you would open based on parsing the command line.

That way your script would work on systems that do not have /dev/stdin etc.
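
A minimal sketch of that idea, assuming a hypothetical helper named `open_or_std` (the name and signature are made up for illustration). It opens the named file when a path is given and otherwise falls back to the standard stream, wrapped in `contextlib.nullcontext` so that leaving the `with` block does not close `sys.stdin` or `sys.stdout`:

```python
import sys
from contextlib import nullcontext


def open_or_std(path=None, mode="r", encoding=None):
    """Return a context manager for `path`, or for stdin/stdout if no path.

    Hypothetical helper for illustration; not an existing API.
    """
    if path is not None:
        return open(path, mode, encoding=encoding)
    # nullcontext hands back the stream unchanged and does not close it.
    return nullcontext(sys.stdout if "w" in mode else sys.stdin)


def copy(args, encoding="utf-8"):
    """Copy input to output; args holds zero, one, or two file names."""
    inpath = args[0] if len(args) >= 1 else None
    outpath = args[1] if len(args) >= 2 else None
    with open_or_std(inpath, "r", encoding) as inf, \
         open_or_std(outpath, "w", encoding) as outf:
        for line in inf:
            outf.write(line)
```

Note that the encoding only applies to files opened by name here; the standard streams keep whatever encoding the interpreter gave them.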

Making a context manager class “CliContext” or similar which defines __enter__ and __exit__ is probably the route I’d go. In __init__, take in the filenames to open; on enter, either open them or return sys.stdin/sys.stdout; on exit, close them when needed (if self.inf != sys.stdin, …). I like putting all my “turn command line into Python objects” logic in one place, and this class would be that place.
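
A rough sketch of what such a class could look like. The argument handling (zero, one, or two positional file names) and the `encoding` default are assumptions carried over from the original question, not a fixed design:

```python
import sys


class CliContext:
    """Open the files named on the command line, defaulting to stdin/stdout."""

    def __init__(self, args, encoding="utf-8"):
        # Only record what to open; nothing is acquired yet.
        self.inpath = args[0] if len(args) >= 1 else None
        self.outpath = args[1] if len(args) >= 2 else None
        self.encoding = encoding
        self.inf = self.outf = None

    def __enter__(self):
        self.inf = (open(self.inpath, "r", encoding=self.encoding)
                    if self.inpath else sys.stdin)
        try:
            self.outf = (open(self.outpath, "w", encoding=self.encoding)
                         if self.outpath else sys.stdout)
        except BaseException:
            # Opening the output failed: do not leak the input file.
            self._close_input()
            raise
        return self.inf, self.outf

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            if self.outf is not sys.stdout:
                self.outf.close()
        finally:
            self._close_input()
        return False  # never swallow exceptions

    def _close_input(self):
        if self.inf is not sys.stdin:
            self.inf.close()
```

Usage would then be `with CliContext(args, options.encoding) as (inf, outf): ...`, with the targets parenthesized so both names are bound.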

Another option would be using a contextlib.ExitStack, but I don’t see an easy way to pass inf/outf doing that.
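
For what it’s worth, one possible way to use ExitStack here, as a sketch only: enter just the files that are actually opened, so the standard streams are left alone. The `run` function and its argument handling are assumptions for illustration, and this keeps the opening logic inside the `with` block rather than moving it into a separate function.

```python
import sys
from contextlib import ExitStack


def run(args, encoding="utf-8"):
    """Upper-case input to output; files come from args, else stdin/stdout."""
    with ExitStack() as stack:
        # enter_context registers the file for closing when the block exits.
        inf = (stack.enter_context(open(args[0], "r", encoding=encoding))
               if len(args) >= 1 else sys.stdin)
        outf = (stack.enter_context(open(args[1], "w", encoding=encoding))
                if len(args) >= 2 else sys.stdout)
        for line in inf:
            outf.write(line.upper())
```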

The way with the least code is probably the contextmanager decorator that @oscarbenjamin suggested. If you need to do more in the “open or default” step (e.g. encoding or other prep work that you mentioned), turn that step into a separate, non-contextmanager function.

Thanks, yes, I used to use sys.{stdin,stdout}, but if I want to change the encoding, I’d need to use os.fdopen, right? That would make the file vs stdin/stdout cases different. I don’t do Windows, and /dev/{stdin,stdout} are available on my Mac & Linux boxes, so I’m fine with using them. Given that I will probably encapsulate the opening logic in some sort of function, the extra effort to use the std file objects might not add much extra code. I’ll give it some thought.

I really only wrote these scripts for my use, because I did a lot of CSV file processing BITD. If someone else really wanted to use them on Windows, they could do the work necessary. I’m sure someone has written something better by now anyway. I’m horsing around mostly just to keep my head in the Python game.

Thanks, the contextmanager looks to be the simplest solution. As I indicated in my reply to @barry-scott, it might also be easy to add handling of sys.{stdin,stdout} instead of my Unix-specific dev files.

open() lets you specify the encoding (it has a lot of knobs…), and in a normal interpreter sys.stdin is itself set up during startup by something equivalent to an open() call. What arguments it gets depends on how Python is run: sys.stdin, sys.stdout, and sys.stderr all have distinct options depending on the runtime environment. You shouldn’t need to go to the os module to get the behavior you want. open() can also be passed a file descriptor (and told via the closefd parameter that it doesn’t own that fd), so you can get a different configuration around the same stream. There’s also a push to make them always UTF-8 by default (PEP 686 – Make UTF-8 mode default).
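
As a small illustration of the fd route, here is a sketch with a hypothetical helper `wrap_fd` (the name is made up). It wraps an already-open descriptor in a new text stream with the encoding you choose, and `closefd=False` leaves the descriptor itself open when the wrapper is closed:

```python
def wrap_fd(fd, mode="r", encoding="utf-8"):
    """Wrap an existing file descriptor in a new text stream.

    closefd=False means closing the returned object leaves `fd` open.
    Hypothetical helper for illustration.
    """
    return open(fd, mode, encoding=encoding, closefd=False)


# For the standard streams this would look like (not executed here):
#     inf = wrap_fd(sys.stdin.fileno(), "r", "latin-1")
#     outf = wrap_fd(sys.stdout.fileno(), "w", "latin-1")
```

The new wrapper has its own buffer, so mixing it with reads or writes through the original sys.stdin/sys.stdout object on the same descriptor is best avoided.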

Depending on what you’re doing, opening stdin/stdout/stderr multiple times can cause races, because by default each Python file object does its own internal buffering, which isn’t shared between them. You can point sys.stdin at a differently opened fd, though, and there’s some library code to help with that (see the contextlib documentation, “Utilities for with-statement contexts”, Python 3.13.1). There are a lot of tradeoffs and design choices either way.

There’s also the reconfigure method that will let you change the encoding of sys.stdin and sys.stdout if the one it’s picked up isn’t the one you want, for some reason.
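
A short sketch of how that might be used, with a hypothetical wrapper `set_encoding` (the name is made up). `reconfigure()` is a method of `io.TextIOWrapper` (Python 3.7+), so the sketch skips streams of other types, such as an `io.StringIO` substituted during testing:

```python
import io


def set_encoding(stream, encoding):
    """Switch a text stream's encoding in place, if it supports reconfigure().

    Streams that are not io.TextIOWrapper instances are returned unchanged.
    Hypothetical helper for illustration.
    """
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(encoding=encoding)
    return stream


# Typical use near the top of a script, before any I/O happens:
#     set_encoding(sys.stdin, options.encoding)
#     set_encoding(sys.stdout, options.encoding)
```

For an input stream this has to happen before anything is read, since the encoding cannot be changed once data has been read from the stream.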

Could you elaborate a bit more why you can’t return the file objects from a function and then use with after? I don’t get it.

Suppose:

def files():
    f1 = open(...)
    1/0
    f2 = open(...)
    return f1, f2

f1, f2 = files()
with f1:
    with f2:
        ...

I’ve used 1/0 to raise an exception, but a more realistic example would be that opening f2 fails. In general, if you don’t use with immediately on the result of open, an exception can be raised before the context manager is set up, and then the with statement cannot guarantee that the file is closed. In CPython, file.__del__ would close the file in this example, but in PyPy, or more generally when the code is more complicated, you can’t depend on __del__, which is why we have with.

This problem arises because open acquires its resource before __enter__:

class CM:
    def __init__(self, *args):
        # open acquires resource here
        ...

    def __enter__(self):
        # it is better if context managers
        # acquire resources here.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # resource released here
        return False

The with statement pairs up the __enter__ and __exit__ methods to acquire and release the resource. However open acquires the resource in __init__ so you need to pair up the __init__ and __exit__ methods. The only way to guarantee that is by using with immediately. Even then there is a race condition:

>>> import dis
>>> def f():
...     with open('stuff') as fin:
...        pass
...
>>> dis.dis(f)
  2           0 LOAD_GLOBAL              0 (open)
              2 LOAD_CONST               1 ('stuff')
              4 CALL_FUNCTION            1
              6 SETUP_WITH              16 (to 24)
              8 STORE_FAST               0 (fin)

  3          10 POP_BLOCK
             12 LOAD_CONST               0 (None)
             14 DUP_TOP
             16 DUP_TOP
             18 CALL_FUNCTION            3
             20 POP_TOP
             22 JUMP_FORWARD            16 (to 40)
        >>   24 WITH_EXCEPT_START
             26 POP_JUMP_IF_TRUE        30
             28 RERAISE
        >>   30 POP_TOP
             32 POP_TOP
             34 POP_TOP
             36 POP_EXCEPT
             38 POP_TOP
        >>   40 LOAD_CONST               0 (None)
             42 RETURN_VALUE

I think a KeyboardInterrupt after CALL_FUNCTION but before SETUP_WITH would bypass the context manager.

You might not want to add a dependency, but Click makes this reasonably simple[1]. Adapted from their documentation:

#!/usr/bin/env python

import click

@click.command()
# 'in' is a reserved word, so the parameters are named input/output.
# click.File treats '-' as stdin/stdout, which makes both arguments optional.
@click.argument('input', type=click.File('rb'), default='-')
@click.argument('output', type=click.File('wb'), default='-')
def inout(input, output):
    """Copy contents of INPUT to OUTPUT."""
    while True:
        chunk = input.read(1024)
        if not chunk:
            break
        output.write(chunk)


if __name__ == "__main__":
    inout()

[1] And it is just nice to use for CLI tools, IMO.

Thanks, James. I don’t mind an extra dependency, and as it happens, I already have Click installed, probably as a side effect of using Flask. I might well mess around with it. (And it’s likely better than argparse, optparse and getopt.)

Warning: it may contain Opinions :sweat_smile:
