This is either my first or second Python project, and I’m still
learning how to program. I’m finding it challenging to comprehend
classes and methods in terms of programming in general.
Well, you’re not using classes here, so we’re good!
Is it possible to make this code more concise or efficient?
Concise? Maybe, we’ll see. Efficient? Probably not much, by which I mean
that it is straight linear stuff, so efficiency gains will be low level
things like using a “faster” low level function rather than higher level
“big O order” things like changing from O(n^2) performance to O(n) or
something like that.
Can I change it to Rust without being concerned about low-level
programming?
Probably? I do not yet speak Rust, but you’re doing basic stuff here
with no very special Python specific things, so I don’t see why not.
That said, let’s look at the code.
"""Add the converted number to the current time. Plus, a basic calculator.
This is the result of two projects merging
The second project is a basic calculator that uses regex to filter input. It accepts numbers, parenthesis, and arithmetic operators (*/+-).
The first project is to convert numbers into time format. For example, 120 is equal to 2 hours.
"""
A docstring! This is very welcome. We use these like opening comments,
but it has the advantage that other things can pull it out of the code,
for example to make documentation automatically. You can see this with
the help()
builtin function, eg:
help(print)
which effectively prints the docstring supplpied with the print()
function.
import re
import subprocess
import sys
Imports right after the docstring, also good.
def _invalid():
print(
“Invalid. There should be only digits with operators (*/±) with or without spaces and parenthesis.”
)
sys.exit()
I presume you’ve named _input
as a “private” name (beginning with an
underscore) because it is used only here in this code? You code is not
structured for reuse, instead it is a standalone programme. So there’s
no need to do this.
This is also has some issues.
The complaint "Invalid. ....."
is hard wired to a specific error.
That’s very special purpose; normally you’d put such a message at the
point of the error itself, possibly passing it as a parameter to this
function.
The print()
writes to the standard output. Error message normally go
to the standard error output, thus:
print("Invalid. .....", file=sys.stderr)
For you, both of these are attached to your terminal and you may be
thinking, what difference does it make. But consider a pipeline of
commands:
cat filenames... | grep something | wc -l
intended to concatenate the contents of some files, search for lines
containing something and count the number of such lines.
here, the standard output of cat
and grep
are attached to pipes,
which present them as the standard input of the next command (grep
and
wc
respectively). The standard error is as before, attached to the
terminal.
Suppose one of the filenames was incorrect. The cat
command can report
the bad filename to the standard error output (a) you the user get to
see the error message because it still goes to the terminal and (b) the
following programme does not treat the error message as input data,
which it would if the message had gone to the standard output (a pipe
connected to the next command). Receiving error or debug messages as
input data is called a “Plug” in the Taxonomy Of Bugs, a piece of junk
coming down the pipeline to foul up what follows.
Send error messages to the standard error output.
Then you call sys.exit()
to exit the programme. This is unusual in
Python also, though not unknown in basic interactive things.
Normally we write functions, and do not explicitly set “policy” about
what the larger programme does inside the function. But calling
sys.exit()
you’re expressing a global policy, because that aborts the
whole programme. Making these things (print
and exit
) into a single
function makes the calling end concise, but I’d be a lot happier ifthe
function name included the word abort
or exit
somewhere to make it
more clear that it exits the programme; functions do not normally do
that.
A better, more Pythonic, approach is to raise an exception where you’re
doing the validation, instead of calling sys.exit()
. (sys.exit
itself raises an exception to do its work, but let’s ignore that for
now.)
Your validation code would look like this:
USER_INPUT = input("$ ")
if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
PASS = True
elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
raise ValueError("Invalid input, expected only number and operators.")
This:
- places the specific error message right here at the failed validation
where is can precisely (and obviously) correspond to the failed test
- raises an exception, which the caller (not that there is one here) can
catch and make their own policy decision about retrying or taking
other action
This thing about exceptions is that it separates the detection of a
problem from the policy of what to do about the problem. If the caller
doesn’t have an opinion about that the exception continues to bubble
outward, and exits the programme anyway if nothing else has an opinion.
We’d normally put the “input and validate” stp itself into a function,
eg:
def read_user_input(prompt="$ "):
USER_INPUT = input(prompt)
if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
pass
elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
raise ValueError("expected only number and operators")
return USER_INPUT
This has a few features:
- a default for the prompt - the caller might want to supply a different
prompt sometimes
- the return value from the function is the user input string itself, so
you can just assign it to a variable for use
- if the input is invalid it raises a
ValueError
, which the caller can
catch if they wish, or not catch and let something further out catch
You’d call it like this:
USER_INPUT = read_user_input()
If the input’s invalid the exception bubbles out and aborts the
programme if you don’t do anything about that. If you had a policy
opinion about the bad input, you could catch it:
try:
USER_INPUT = read_user_input()
except ValueErorr as e:
print(f"Bad input: {e}", file=sys.stderr)
or run a little loop on that basis:
while True:
try:
USER_INPUT = read_user_input()
except ValueErorr as e:
print(f"Bad input: {e}", file=sys.stderr)
else:
break
... continue and use USER_INPUT here ...
It looks more elaborate and wordy, but it is far more flexible and,
importantly, separates the validation (the regexp check and raising a
ValueError
) from the policy action which decides what to do about the
error, which can then go somewhere suitable according to the wider needs
of the programme: the input function does not know or care.
PATTERN = r"[*/+-]?\s*?\(.*?\)"
Thi could do with (a) a btter variable name than PATTERN
, eg
VALID_USER_INPUT_PATTERN
and (b) an explainatory comment because
regular expressions are both cryptic and error prone (in that it is easy
to get them incorrect) and so it is important to say what this regexp
should match. That way if it’s wrong we know how to correct it, and if
it’s right we know what it does.
It looks to me like it matches an optional operator, the shortest run of
whitespace it can get away with, then a pair of brakcets surrounding any
text. Is that correct?
EXECUTE = False
DONE = False
PASS = False
Usually we name Python variables in lower case, eg execute
. There’s a
(small) suite of conventions and adhering to them makes your code easier
for others to read. And less SHOUTY.
You’d have to pick another name than pass
because pass
is a Python
keyword for the pass
statement, which is a placeholder that does
nothing. But you’ll see that you don’t need PASS
below.
Let’s revisit your input loop:
USER_INPUT = input("$ ")
if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
PASS = True
elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
_invalid()
while True:
if PASS:
break
This stretch is the only place you’re using PASS
. I’d restructure
this, maybe like this:
while True:
try:
USER_INPUT = read_user_input()
except ValueErorr as e:
print(f"Bad input: {e}", file=sys.stderr)
continue
adapting the suggestion earlier. This:
- keeps your outer
while
, required for the main loop of your programme
- shoehorns the
try
/except
from earlier in to control the user input
This fetches the user input, and if it is invalid (which we detect as a
ValueError
exception) we complain (reciting the exception itself to
provide the specifics) and continue
, which returns control to the top
of the loop, ready for another input prompt.
Control only proceeds to the stuff below if there’s no ValueError
,
which means the input was valid (or “valid enough”) and you can use it.
MATCH = re.search(rf"{PATTERN}", USER_INPUT)
Using a format string to embed just a string is kind of round about. You
can just write:
MATCH = re.search(PATTERN, USER_INPUT)
The following logic is a little obtuse, and could do with several
explainatory comments. It is important that code is readable to other
people, which includes you when you come back to it in a month or so. It
also aids debugging, because you’ll be writing down what you want to
achieve, which makes it easier to check if the code actually does that.
And also makes it clearer to you as you write it whether what you’re
doing fits your larger purpose.
if MATCH:
EXECUTE = True
MATCH = USER_INPUT[MATCH.start():MATCH.end()]
I’d be calling this matched
, not MATCH
i.e. a different name from
the earlier variable, because it means a different thing. Also, regular
expression “match” objects have several useful methods.
See: (Match objects)[re — Regular expression operations — Python 3.12.1 documentation]
You can write:
MATCH = USER_INPUT[MATCH.start():MATCH.end()]
as:
MATCH = MATCH.group(0)
or even:
MATCH = MATCH.group()
which saves a lot of verbiage.
RE_OPERATORS = re.findall(r"[*/+-]", MATCH)
Ok, a list of “operator” characters (well, 1-character strings, as
Python does not have a distinct character type).
RE_OPERATORS_LIST = RE_OPERATORS
Do you need this additional variable?
if len(RE_OPERATORS_LIST) == 2:
FIRST_OPERATOR, SECOND_OPERATOR = RE_OPERATORS
else:
SECOND_OPERATOR = RE_OPERATORS
SECOND_OPERATOR = str(SECOND_OPERATOR)
SECOND_OPERATOR = re.sub(r"\['|'\]", "", SECOND_OPERATOR)
FIRST_OPERATOR = ""
This seems to accomodate 2 operators or 1 operators? What if there are 3
or more? Or none? Looking at the second part:
SECOND_OPERATOR = RE_OPERATORS
SECOND_OPERATOR = str(SECOND_OPERATOR)
SECOND_OPERATOR = re.sub(r"\['|'\]", "", SECOND_OPERATOR)
FIRST_OPERATOR = ""
This looks like you want the (only?) operator to come out as
SECOND_OPERATOR. Why not just grab it directly:
SECOND_OPERATOR = RE_OPERATORS[0]
The code you’ve got takes your list-of-operators and:
- converted it to a string, which means how Python would write a list of
strings, eg ['+']
for a list with a single '+'
string
- strip out the Python punctuation
This kind of approach is very hazard prone. What if your operator
syntax grew some [
or ]
operators? Etc etc. In general the “convert
to string then unpick the string contents” approach is always fragile
and very easy to get wrong.
A better approach would be to check the length of the list as you do,
but more so:
if len(RE_OPERATORS_LIST) == 1:
FIRST_OPERATOR = ""
SECOND_OPERATOR = RE_OPERATORS[0]
elif len(RE_OPERATORS_LIST) == 2:
FIRST_OPERATOR, SECOND_OPERATOR = RE_OPERATORS
else:
raise RuntimeError("I do not handle 0 or more than 2 operators.")
Alternatively, since the “unpacking assignment” checks that everything
gets correctly unpacked, you could go:
try:
FIRST_OPERATOR, SECOND_OPERATOR = RE_OPERATORS
except ValueError:
# not exactly 2 list elements
try:
FIRST_OPERATOR = ""
# note the commas below, which makes this an unpacking
# assignment
SECOND_OPERATOR, = RE_OPERATORS
except ValueError:
# not exactly 1 list element
raise RuntimeError("I do not handle 0 or more than 2 operators.")
which I guess is much uglier. It can be useful in some other contexts.
Basicly this approach goes try 2 operators, try 1 operator, fail.
FIRST_NUMBER, SECOND_NUMBER = re.findall(r"\d+\.\d+|\d+", MATCH)
FIRST_NUMBER = float(FIRST_NUMBER)
SECOND_NUMBER = float(SECOND_NUMBER)
This seems to expect just one operator? Unsure.
match SECOND_OPERATOR:
case "+":
RESULT = FIRST_NUMBER + SECOND_NUMBER
case "-":
RESULT = FIRST_NUMBER - SECOND_NUMBER
case "*":
RESULT = FIRST_NUMBER * SECOND_NUMBER
case "/":
RESULT = FIRST_NUMBER / SECOND_NUMBER
Ah, the shiny new match
statement. It is mostly aimed at structural
matching, but this is still a valid use. Traditionally this would be an
if
/elif
/… chain, eg:
if SECOND_OPERATOR == "+":
RESULT = FIRST_NUMBER + SECOND_NUMBER
elif SECOND_OPERATOR == "-":
RESULT = FIRST_NUMBER - SECOND_NUMBER
... etc ...
I would be ending the match
with a default
, or an if
/elif
with a
final else
to catch anything you fail to match, probably raising a
RuntimError
. Without that your programme will silently do nothing
with some operator you have not implemented.
if FIRST_OPERATOR == "":
USER_INPUT = USER_INPUT.replace(f"{MATCH}", f"{RESULT}")
Again, a format string to insert a string is pretty pointless, so you
can go:
USER_INPUT = USER_INPUT.replace(MATCH, RESULT)
and the same in this code:
else:
USER_INPUT = USER_INPUT.replace(
f"{MATCH}“,
f”{FIRST_OPERATOR} {RESULT}")
This code is… opaque. It really needs comments.
match PATTERN:
case r"[*/+-]?\s*?\(.*?\)":
Note that this is a string equality test. You’re testing whther
PATTERN
contains the string r"[*/+-]?\s*?\(.*?\)"
. To the naive eye
this might look like a regular expression match aainst PATTERN
, which
it isn’t.
This is also an instance of what we call “magic numbers”, more commonly
encountered with actual numbers: a special value wired straight into the
code, usualy without explaination.
The preferred approach it that somewhere near the top of your code is an
assignment like this:
OP_BRACKETS_PATTERN = r"[*/+-]?\s*?\(.*?\)"
and then down here you would write:
match PATTERN:
case OP_BRACKETS_PATTERN:
making it more clear what you’re matching in terms of its meaning.
Then this:
PATTERN = r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?"
is also a “magic” number, and should be dealt with the same way - a well
named variable at the top of the code.
case r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?":
PATTERN = r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?"
case r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?":
DONE = True
Then finally, what should happen if none of the case
s match?
if DONE:
if EXECUTE:
break
_invalid()
Does this say “if we’re done but we didn’t execute anything raise an
error, otherwise break
”?
I would not use _invalid()
here - it is so far from any tests that its
very-specific error message may well be inappropriate or vague or wrong,
depending. Tests and their complaints should try to be close together.
def _output(hour, minute, am_or_pm="", one_two_four="\n24"):
Do you really mean the default for one_two_four
to be a string
starting with a newline character?
if one_two_four == "12":
if (hour >= 12) and (hour != 24):
am_or_pm = "PM"
else:
am_or_pm = "AM"
if hour > 12:
hour = hour - 12
if hour < 10 and minute < 10:
print(
f"{one_two_four}-hour format:\ 0{hour}:0{minute} {am_or_pm}"
)
Format strings accept a leading 0 width specifier which would save you
this cumbersome test against 10:
print(f"one_two_four}-hour format: {hour:02d}:{minute:02d} {am_or_pm}")
which covers any value of hour
or minute
.
def _splitter(string):
string = string.replace('"', "")
string = int(string)
return string
Should this be called _splitter
- that does not seem to be what it
does. It looks like it removes all double quote characters then converts
what’s left to an int
. Maybe some name like _intof
? (int
itself is
already taken of course).
TIME = subprocess.run(
["/bin/date", '+"%H:%M"'], capture_output=True, text=True, check=True
).stdout
Did you really want double quotes embedded in your result? Because
that’s what the '+"%H:%M"'
will do: the `date command will get the
format string:
"%H:%M"
and fill it in. These spurious characters may be why your _splitter()
function needs to strip them out; it would be better to just not have
them in the first place.
Also, the datetime
module can format dates and times for you without
needing to reach for the external date
command. You would need to do
some extra fiddling (get the current time, convert to hours/minutes,
format a string) though.
Cheers,
Cameron Simpson cs@cskk.id.au