I’m writing an application where I would like to catch all errors and show the user a short summary of what went wrong, while also logging the full exception to a file. Is there a common, established pattern for this? I have not been able to find one.
What I would like to achieve:
Prevent scary-looking errors from reaching the user
Log those same errors to a file
Exit with 0 if there are no errors, 1 (or possibly a more specific error code) otherwise
This is what I’m currently doing:
import logging
import sys
from io import StringIO

LOGGER = logging.getLogger()


def main():
    debug_stream = setup_logging()
    try:
        step_1()
        step_2()
        return 0
    except Exception:
        LOGGER.debug("", exc_info=True)
        write_debug_log(debug_stream)
        return 1
    except KeyboardInterrupt:
        return 1


def setup_logging():
    # The root logger defaults to WARNING; lower it so DEBUG records reach the handlers.
    LOGGER.setLevel(logging.DEBUG)
    error_handler = logging.StreamHandler(sys.stderr)
    error_handler.setLevel(logging.ERROR)
    LOGGER.addHandler(error_handler)
    debug_stream = StringIO()
    debug_handler = logging.StreamHandler(debug_stream)
    debug_handler.setLevel(logging.DEBUG)
    LOGGER.addHandler(debug_handler)
    # INFO and WARNING handlers omitted for brevity.
    return debug_stream


def step_1():
    try:
        ...
    except Exception:
        LOGGER.error("Error: Step 1 failed")
        raise


def step_2():
    try:
        ...
    except Exception:
        LOGGER.error("Error: Step 2 failed")
        raise


def write_debug_log(debug_stream):
    try:
        with open("debug.log", "w") as fout:
            fout.write(debug_stream.getvalue())
    except Exception:
        LOGGER.error("Error: Failed to write debug log")
        LOGGER.error("Please re-run app with --debug and save the output manually")
    else:
        LOGGER.error("Traceback written to debug.log")
    finally:
        LOGGER.error("Please report this bug and include the debug log")


sys.exit(main())
In reality the LOGGER.error("Error: Step 1 failed") messages are more meaningful.
I’m looking for general feedback on this approach, as well as specific feedback on the following points:
Is there a better way to log the exception at DEBUG level than calling logging.debug with an empty message?
Is Exception the right thing to catch? Too broad, or too narrow?
Is catching KeyboardInterrupt silently good practice? I would prefer to avoid showing the user a Traceback (most recent call last): File "<stdin>", line 3, in <module> KeyboardInterrupt-type message.
If so, is 1 an appropriate exit code when catching KeyboardInterrupt?
Overall, I think this approach makes things much trickier to test. Using try / except Exception to catch everything is a classic way to create a debugging nightmare for yourself. Things can be challenging enough, even when only catching ImportErrors. EAFP is great - I just advise putting as little code in the try: as possible, and catching specific classes of Exception. So when something goes wrong, it’s obvious what it was.
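A minimal sketch of that narrower style, assuming a hypothetical `load_settings` helper and a JSON settings file (neither appears in the original post). Only the call that is expected to fail sits inside each `try`, and each `except` names a specific exception class:

```python
import json


def load_settings(path):
    """Hypothetical helper: return settings from a JSON file, or {} if it is absent."""
    try:
        # Only the operation expected to fail lives inside the try block.
        with open(path, encoding="utf-8") as fin:
            text = fin.read()
    except FileNotFoundError:
        # A missing file is an anticipated condition, so handle it explicitly.
        return {}

    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Re-raise with context instead of hiding what went wrong.
        raise ValueError(f"{path} is not valid JSON: {exc}") from exc
```

Any other exception (a permissions problem, say) still propagates with its original traceback, so it stays obvious which step failed.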
If an app suppresses errors (i.e. purposefully hides useful information from the user, that they could otherwise use to try to fix what they’ve done wrong themselves) then the onus is on the developer to think of everything possible that can go wrong in every possible situation, and provide some other sort of constructive feedback to the user for each. That’s not impossible, but it’s akin to assuming your code is bug free, is far more work, confuses Python users, and is un-Pythonic IMHO.
I’m aware of this design principle, and usually adhere to it. But in this case, every text on CLI design I’ve read agrees: Throwing tracebacks at the user when they haven’t explicitly asked for it is Bad Form. Thus, I can see no alternative to wrapping most of my business logic in one big try/except, even knowing the caveats of that approach.
This is my goal. I don’t think this is necessarily akin to thinking one’s code is bug free; rather, it recognizes that if an exception occurs at certain points in the program, that indicates the presence of a bug. I deal with this by asking the user to submit a bug report.
Could it be that pythonic design is at odds with CLI design?
If you have a log with full details then the user can share that log with the developer. I use this pattern myself in production systems and it’s great for maintaining them.
For a CLI you can print a message telling the user that something unexpected went wrong, and where to find the log file to include in a bug report.
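A rough sketch of that pattern, not a definitive implementation. The wrapper name `run_cli`, the logger name `"app"`, and the message wording are all assumptions made for illustration; `debug.log` is the file name used in the original question. The full traceback goes to the log file, and the user only sees a short message:

```python
import logging
import sys


def run_cli(main_func, log_path="debug.log"):
    """Hypothetical wrapper: run main_func, hide tracebacks, return an exit code."""
    logger = logging.getLogger("app")  # assumed logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the traceback out of any root handlers
    handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
    logger.addHandler(handler)
    try:
        main_func()
        return 0
    except Exception:
        # exc_info=True attaches the full traceback to the DEBUG record.
        logger.debug("Unhandled exception", exc_info=True)
        print(
            f"Something unexpected went wrong. Please attach {log_path} to a bug report.",
            file=sys.stderr,
        )
        return 1
    finally:
        # Detach and close the handler so the log file is complete on disk.
        logger.removeHandler(handler)
        handler.close()
```

Passing a descriptive message together with `exc_info=True` is one way to avoid logging an empty string at DEBUG level.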
Good point. Logging is seldom a bad idea - I was referring more to masking everything with try / except.
It’s possible to have an internal application to work on and debug, and just use this pattern to mask errors from users, to improve UX and reduce low-value bug reports.
That table lists the return codes presented in the response to a wait*()
call.
Regular programme success is 0.
Regular programme failure is nonzero, often 1. I use 2 for usage errors
i.e. bad CLI options etc. A few programmes have a variety of values for
specific failure situations.
128 upward encode programme termination due to a signal. 130 is signal
2 i.e. SIGINT, which is usually caused by someone typing Ctrl-C.
If they’ve caught it then 130 is wrong. (And also not doable.)
It looks like that second sentence is wrong. You can return a number >= 128 from a UNIX process. Possibly this postdates when I first dug into the UNIX wait() system call; I’m sure this 128+signum was wired directly into things at the OS level then, so you only got a 7 bit value from a process exit.
These days we get an 8 bit value from the exit status and test for a signal with the WIFSIGNALED(status) macro.
I remain of the opinion that 130 is not the typical chosen exit code for catching an interrupt and exiting. I still use 1 for that, absent some weird requirement.
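The exit-code convention discussed in this thread could be sketched roughly as follows. The constant names, the `--count` option, and the `work` parameter are hypothetical stand-ins and not from the original posts; the sketch relies on `argparse` exiting with status 2 on a usage error:

```python
import argparse
import sys

EXIT_OK = 0
EXIT_FAILURE = 1  # general failure; also used here for a caught Ctrl-C
EXIT_USAGE = 2    # bad command-line options


def main(argv=None, work=lambda count: None):
    """Hypothetical entry point; `work` stands in for the real business logic."""
    parser = argparse.ArgumentParser(prog="app")  # assumed program name
    parser.add_argument("--count", type=int, default=1)  # illustrative option
    try:
        args = parser.parse_args(argv)
    except SystemExit as exc:
        # argparse exits with status 2 on a usage error and 0 after --help.
        return EXIT_USAGE if exc.code else EXIT_OK
    try:
        work(args.count)
    except KeyboardInterrupt:
        # No traceback for the user; report an ordinary failure, not 130.
        print("Interrupted.", file=sys.stderr)
        return EXIT_FAILURE
    return EXIT_OK
```

Ending the script with `sys.exit(main())` hands the chosen code to the calling shell.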
Bah! Nay, not so. This 128 thing is a shell level thing. A gander at the V7 shell source shows it getting an 8 bit exit value from the wait() status number from the OS. The 128 stuff is some munging of that if there was a signal. I learnt this stuff on V7 UNIX, so this conflation of the process exit status with the shell exit code must have happened in my head then.
Still strongly against returning 130 directly though.
For those who care, the V7 shell goes:
INT rc=0, wx=0;
INT w;
then:
p=wait(&w);
to fetch the exit status (into an INT, a 16-bit word then). Then computes rc, the shell level return code, thus:
    w_hi = (w>>8)&LOBYTE;
    IF sig = w&0177
    THEN    IF sig == 0177 /* ptrace! return */
            THEN    prs("ptrace: ");
                    sig = w_hi;
            FI
            IF sysmsg[sig]
            THEN    IF i!=p ORF (flags&prompt)==0 THEN prp(); prn(p); blank() FI
                    prs(sysmsg[sig]);
                    IF w&0200 THEN prs(coredump) FI
            FI
            newline();
    FI
    IF rc==0
    THEN    rc = (sig ? sig|SIGFLG : w_hi);
    FI
    wx |= w;
OD
IF wx ANDF flags&errflg
THEN    exitsh(rc);
FI
exitval=rc; exitset();
As trivia, the above is originally from AT&T V6 UNIX, and it is C code.
There is a set of #define statements that make the IF THEN FI expand to valid C code.
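For readers who find the macro style hard to follow, the `rc` computation in the excerpt can be mirrored in Python. This is a sketch for a single wait status only (the shell loop keeps the first nonzero `rc` across several processes), and the values of `LOBYTE` and `SIGFLG` are assumptions, since their definitions are not shown in the excerpt:

```python
LOBYTE = 0o377  # assumed value of the shell's LOBYTE mask
SIGFLG = 0o200  # assumed value of the shell's SIGFLG (decimal 128)


def shell_return_code(w):
    """Mirror `rc = (sig ? sig|SIGFLG : w_hi)` for one 16-bit wait() status."""
    w_hi = (w >> 8) & LOBYTE  # high byte: the value the process passed to exit()
    sig = w & 0o177           # low 7 bits: terminating signal, 0 if none
    if sig == 0o177:          # ptrace case: the signal number is in the high byte
        sig = w_hi
    return (sig | SIGFLG) if sig else w_hi


print(shell_return_code(1 << 8))  # process called exit(1) → 1
print(shell_return_code(2))       # process killed by signal 2 (SIGINT) → 130
```

Under those assumptions, the 128 offset is added by the shell when it turns the wait status into its own return code; it is not part of the value the process itself passes to exit().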