Requesting a code review

Denise_unique · October 19, 2022, 4:33am

This is either my first or second Python project, and I’m still learning how to program. I’m finding it challenging to comprehend classes and methods in terms of programming in general. Is it possible to make this code more concise or efficient? Can I change it to Rust without being concerned about low-level programming?

"""Add the converted number to the current time. Plus, a basic calculator.

This is the result of two projects merging

The second project is a basic calculator that uses regex to filter input. It accepts numbers, parenthesis, and arithmetic operators (*/+-).

The first project is to convert numbers into time format. For example, 120 is equal to 2 hours.
"""

import re
import subprocess
import sys


def _invalid():
    print(
        "Invalid. There should be only digits with operators (*/+-) with or without spaces and parenthesis."
    )
    sys.exit()


PATTERN = r"[*/+-]?\s*?\(.*?\)"
EXECUTE = False
DONE = False
PASS = False

USER_INPUT = input("$ ")
if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
    PASS = True
elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
    _invalid()


while True:
    if PASS:
        break
    MATCH = re.search(rf"{PATTERN}", USER_INPUT)
    if MATCH:
        EXECUTE = True
        MATCH = USER_INPUT[MATCH.start():MATCH.end()]
        RE_OPERATORS = re.findall(r"[*/+-]", MATCH)
        RE_OPERATORS_LIST = RE_OPERATORS
        if len(RE_OPERATORS_LIST) == 2:
            FIRST_OPERATOR, SECOND_OPERATOR = RE_OPERATORS
        else:
            SECOND_OPERATOR = RE_OPERATORS
            SECOND_OPERATOR = str(SECOND_OPERATOR)
            SECOND_OPERATOR = re.sub(r"\['|'\]", "", SECOND_OPERATOR)
            FIRST_OPERATOR = ""
        FIRST_NUMBER, SECOND_NUMBER = re.findall(r"\d+\.\d+|\d+", MATCH)
        FIRST_NUMBER = float(FIRST_NUMBER)
        SECOND_NUMBER = float(SECOND_NUMBER)
        match SECOND_OPERATOR:
            case "+":
                RESULT = FIRST_NUMBER + SECOND_NUMBER
            case "-":
                RESULT = FIRST_NUMBER - SECOND_NUMBER
            case "*":
                RESULT = FIRST_NUMBER * SECOND_NUMBER
            case "/":
                RESULT = FIRST_NUMBER / SECOND_NUMBER
        if FIRST_OPERATOR == "":
            USER_INPUT = USER_INPUT.replace(f"{MATCH}", f"{RESULT}")
        else:
            USER_INPUT = USER_INPUT.replace(
                f"{MATCH}",
                f"{FIRST_OPERATOR} {RESULT}")
    else:
        match PATTERN:
            case r"[*/+-]?\s*?\(.*?\)":
                PATTERN = r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?"
            case r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?":
                PATTERN = r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?"
            case r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?":
                DONE = True
        if DONE:
            if EXECUTE:
                break
            _invalid()


# ------------------------------------------------------------ #


def _output(hour, minute, am_or_pm="", one_two_four="\n24"):
    if one_two_four == "12":
        if (hour >= 12) and (hour != 24):
            am_or_pm = "PM"
        else:
            am_or_pm = "AM"
        if hour > 12:
            hour = hour - 12
    if hour < 10 and minute < 10:
        print(
            f"{one_two_four}-hour format:\
 0{hour}:0{minute} {am_or_pm}"
        )
    elif hour < 10:
        print(
            f"{one_two_four}-hour format:\
 0{hour}:{minute} {am_or_pm}"
        )
    elif minute < 10:
        print(
            f"{one_two_four}-hour fomrat:\
 {hour}:0{minute} {am_or_pm}"
        )
    else:
        print(
            f"{one_two_four}-hour format:\
 {hour}:{minute} {am_or_pm}"
        )


def _splitter(string):
    string = string.replace('"', "")
    string = int(string)
    return string


LINE = "# -------------------------------------------------------- #"

# ------------------------------------------------------------ #

print(LINE)
print("ADDED TIME:")
USER_INPUT = round(float(str(USER_INPUT)))
if USER_INPUT < 60:
    ADD_MINUTE = USER_INPUT

ADD_HOUR = round(USER_INPUT / 60)

if USER_INPUT < 0 or ADD_HOUR >= 24:
    print("Less than 0 and more than a day is not supported. Sorry.")
    sys.exit()

while USER_INPUT > 1:
    USER_INPUT = USER_INPUT - 60
    if not USER_INPUT < 0:
        ADD_MINUTE = USER_INPUT

_output(ADD_HOUR, ADD_MINUTE, one_two_four="24")
_output(ADD_HOUR, ADD_MINUTE, one_two_four="12")
print(LINE)

# ------------------------------------------------------------ #


print("CURRENT TIME:")

TIME = subprocess.run(
    ["/bin/date", '+"%H:%M"'], capture_output=True, text=True, check=True
).stdout
SPLIT = TIME.split(":")

CURRENT_HOUR = _splitter(SPLIT[0])
CURRENT_MINUTE = _splitter(SPLIT[1])

_output(CURRENT_HOUR, CURRENT_MINUTE, one_two_four="24")
_output(CURRENT_HOUR, CURRENT_MINUTE, one_two_four="12")
print(LINE)


# ------------------------------------------------------------ #


print("TOTAL TIME:")

TOTAL_HOUR = CURRENT_HOUR + ADD_HOUR
TOTAL_MINUTE = CURRENT_MINUTE + ADD_MINUTE

if TOTAL_MINUTE >= 60:
    TOTAL_HOUR = TOTAL_HOUR + 1
    TOTAL_MINUTE = TOTAL_MINUTE - 60

if TOTAL_HOUR > 24:
    print("It's more than a day, sorry.")
    sys.exit()

_output(TOTAL_HOUR, TOTAL_MINUTE, one_two_four="24")
_output(TOTAL_HOUR, TOTAL_MINUTE, one_two_four="12")
print(LINE)

ndc86430 · October 19, 2022, 6:04am

I haven’t read the code, but I’m curious about your mention of Rust. It’s an entirely different language, with different ideas and things you have to be concerned about. Why do you want to switch? If you’re new to programming, stick with one language until you’re confident with it. The time will come when you’re ready for new ideas and then that’s the right time to learn a new one.

Denise_unique · October 19, 2022, 8:22am

I don’t wish to switch, if my understanding of this is right. While Python is still my first language, I just want to make Rust my secondary language. I’d like to learn both by first writing a Python program and then attempting to convert it to Rust. I hope to gain knowledge in both areas. Without delving too deeply into low-level, I just want to learn how to use variables, conditionals, and functions in Rust. Rust appeals to me because of its safety features and because I think it will result in fewer problematic codes. I’ll learn more about programming as a result. But my enjoyment of rust ends there.

cameron · October 19, 2022, 11:20pm

This is either my first or second Python project, and I’m still
learning how to program. I’m finding it challenging to comprehend
classes and methods in terms of programming in general.

Well, you’re not using classes here, so we’re good!

Is it possible to make this code more concise or efficient?

Concise? Maybe, we’ll see. Efficient? Probably not much, by which I mean
that it is straight linear stuff, so efficiency gains will be low level
things like using a “faster” low level function rather than higher level
“big O order” things like changing from O(n^2) performance to O(n) or
something like that.

Can I change it to Rust without being concerned about low-level
programming?

Probably? I do not yet speak Rust, but you’re doing basic stuff here
with no very special Python specific things, so I don’t see why not.

That said, let’s look at the code.

"""Add the converted number to the current time. Plus, a basic calculator.

This is the result of two projects merging

The second project is a basic calculator that uses regex to filter input. It accepts numbers, parenthesis, and arithmetic operators (*/+-).

The first project is to convert numbers into time format. For example, 120 is equal to 2 hours.
"""

A docstring! This is very welcome. We use these like opening comments,
but it has the advantage that other things can pull it out of the code,
for example to make documentation automatically. You can see this with
the help() builtin function, eg:

 help(print)

which effectively prints the docstring supplpied with the print()
function.

import re
import subprocess
import sys

Imports right after the docstring, also good.

def _invalid():
print(
“Invalid. There should be only digits with operators (*/±) with or without spaces and parenthesis.”
)
sys.exit()

I presume you’ve named _input as a “private” name (beginning with an
underscore) because it is used only here in this code? You code is not
structured for reuse, instead it is a standalone programme. So there’s
no need to do this.

This is also has some issues.

The complaint "Invalid. ....." is hard wired to a specific error.
That’s very special purpose; normally you’d put such a message at the
point of the error itself, possibly passing it as a parameter to this
function.

The print() writes to the standard output. Error message normally go
to the standard error output, thus:

 print("Invalid. .....", file=sys.stderr)

For you, both of these are attached to your terminal and you may be
thinking, what difference does it make. But consider a pipeline of
commands:

 cat filenames... | grep something | wc -l

intended to concatenate the contents of some files, search for lines
containing something and count the number of such lines.

here, the standard output of cat and grep are attached to pipes,
which present them as the standard input of the next command (grep and
wc respectively). The standard error is as before, attached to the
terminal.

Suppose one of the filenames was incorrect. The cat command can report
the bad filename to the standard error output (a) you the user get to
see the error message because it still goes to the terminal and (b) the
following programme does not treat the error message as input data,
which it would if the message had gone to the standard output (a pipe
connected to the next command). Receiving error or debug messages as
input data is called a “Plug” in the Taxonomy Of Bugs, a piece of junk
coming down the pipeline to foul up what follows.

Send error messages to the standard error output.

Then you call sys.exit() to exit the programme. This is unusual in
Python also, though not unknown in basic interactive things.

Normally we write functions, and do not explicitly set “policy” about
what the larger programme does inside the function. But calling
sys.exit() you’re expressing a global policy, because that aborts the
whole programme. Making these things (print and exit) into a single
function makes the calling end concise, but I’d be a lot happier ifthe
function name included the word abort or exit somewhere to make it
more clear that it exits the programme; functions do not normally do
that.

A better, more Pythonic, approach is to raise an exception where you’re
doing the validation, instead of calling sys.exit(). (sys.exit
itself raises an exception to do its work, but let’s ignore that for
now.)

Your validation code would look like this:

 USER_INPUT = input("$ ")
 if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
     PASS = True
 elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
     raise ValueError("Invalid input, expected only number and operators.")

This:

places the specific error message right here at the failed validation
where is can precisely (and obviously) correspond to the failed test
raises an exception, which the caller (not that there is one here) can
catch and make their own policy decision about retrying or taking
other action

This thing about exceptions is that it separates the detection of a
problem from the policy of what to do about the problem. If the caller
doesn’t have an opinion about that the exception continues to bubble
outward, and exits the programme anyway if nothing else has an opinion.

We’d normally put the “input and validate” stp itself into a function,
eg:

 def read_user_input(prompt="$ "):
     USER_INPUT = input(prompt)
     if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
         pass
     elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
         raise ValueError("expected only number and operators")
     return USER_INPUT

This has a few features:

a default for the prompt - the caller might want to supply a different
prompt sometimes
the return value from the function is the user input string itself, so
you can just assign it to a variable for use
if the input is invalid it raises a ValueError, which the caller can
catch if they wish, or not catch and let something further out catch

You’d call it like this:

 USER_INPUT = read_user_input()

If the input’s invalid the exception bubbles out and aborts the
programme if you don’t do anything about that. If you had a policy
opinion about the bad input, you could catch it:

 try:
     USER_INPUT = read_user_input()
 except ValueErorr as e:
     print(f"Bad input: {e}", file=sys.stderr)

or run a little loop on that basis:

 while True:
     try:
         USER_INPUT = read_user_input()
     except ValueErorr as e:
         print(f"Bad input: {e}", file=sys.stderr)
     else:
         break
 ... continue and use USER_INPUT here ...

It looks more elaborate and wordy, but it is far more flexible and,
importantly, separates the validation (the regexp check and raising a
ValueError) from the policy action which decides what to do about the
error, which can then go somewhere suitable according to the wider needs
of the programme: the input function does not know or care.

 PATTERN = r"[*/+-]?\s*?\(.*?\)"

Thi could do with (a) a btter variable name than PATTERN, eg
VALID_USER_INPUT_PATTERN and (b) an explainatory comment because
regular expressions are both cryptic and error prone (in that it is easy
to get them incorrect) and so it is important to say what this regexp
should match. That way if it’s wrong we know how to correct it, and if
it’s right we know what it does.

It looks to me like it matches an optional operator, the shortest run of
whitespace it can get away with, then a pair of brakcets surrounding any
text. Is that correct?

 EXECUTE = False
 DONE = False
 PASS = False

Usually we name Python variables in lower case, eg execute. There’s a
(small) suite of conventions and adhering to them makes your code easier
for others to read. And less SHOUTY.

You’d have to pick another name than pass because pass is a Python
keyword for the pass statement, which is a placeholder that does
nothing. But you’ll see that you don’t need PASS below.

Let’s revisit your input loop:

 USER_INPUT = input("$ ")
 if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
     PASS = True
 elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
     _invalid()
 while True:
     if PASS:
         break

This stretch is the only place you’re using PASS. I’d restructure
this, maybe like this:

 while True:
     try:
         USER_INPUT = read_user_input()
     except ValueErorr as e:
         print(f"Bad input: {e}", file=sys.stderr)
         continue

adapting the suggestion earlier. This:

keeps your outer while, required for the main loop of your programme
shoehorns the try/except from earlier in to control the user input

This fetches the user input, and if it is invalid (which we detect as a
ValueError exception) we complain (reciting the exception itself to
provide the specifics) and continue, which returns control to the top
of the loop, ready for another input prompt.

Control only proceeds to the stuff below if there’s no ValueError,
which means the input was valid (or “valid enough”) and you can use it.

     MATCH = re.search(rf"{PATTERN}", USER_INPUT)

Using a format string to embed just a string is kind of round about. You
can just write:

     MATCH = re.search(PATTERN, USER_INPUT)

The following logic is a little obtuse, and could do with several
explainatory comments. It is important that code is readable to other
people, which includes you when you come back to it in a month or so. It
also aids debugging, because you’ll be writing down what you want to
achieve, which makes it easier to check if the code actually does that.
And also makes it clearer to you as you write it whether what you’re
doing fits your larger purpose.

     if MATCH:
         EXECUTE = True
         MATCH = USER_INPUT[MATCH.start():MATCH.end()]

I’d be calling this matched, not MATCH i.e. a different name from
the earlier variable, because it means a different thing. Also, regular
expression “match” objects have several useful methods.

See: (Match objects)[re — Regular expression operations — Python 3.12.1 documentation]

You can write:

 MATCH = USER_INPUT[MATCH.start():MATCH.end()]

as:

 MATCH = MATCH.group(0)

or even:

 MATCH = MATCH.group()

which saves a lot of verbiage.

         RE_OPERATORS = re.findall(r"[*/+-]", MATCH)

Ok, a list of “operator” characters (well, 1-character strings, as
Python does not have a distinct character type).

         RE_OPERATORS_LIST = RE_OPERATORS

Do you need this additional variable?

         if len(RE_OPERATORS_LIST) == 2:
             FIRST_OPERATOR, SECOND_OPERATOR = RE_OPERATORS
         else:
             SECOND_OPERATOR = RE_OPERATORS
             SECOND_OPERATOR = str(SECOND_OPERATOR)
             SECOND_OPERATOR = re.sub(r"\['|'\]", "", SECOND_OPERATOR)
             FIRST_OPERATOR = ""

This seems to accomodate 2 operators or 1 operators? What if there are 3
or more? Or none? Looking at the second part:

   SECOND_OPERATOR = RE_OPERATORS
   SECOND_OPERATOR = str(SECOND_OPERATOR)
   SECOND_OPERATOR = re.sub(r"\['|'\]", "", SECOND_OPERATOR)
   FIRST_OPERATOR = ""

This looks like you want the (only?) operator to come out as
SECOND_OPERATOR. Why not just grab it directly:

 SECOND_OPERATOR = RE_OPERATORS[0]

The code you’ve got takes your list-of-operators and:

converted it to a string, which means how Python would write a list of
strings, eg ['+'] for a list with a single '+' string
strip out the Python punctuation
This kind of approach is very hazard prone. What if your operator
syntax grew some [ or ] operators? Etc etc. In general the “convert
to string then unpick the string contents” approach is always fragile
and very easy to get wrong.

A better approach would be to check the length of the list as you do,
but more so:

 if len(RE_OPERATORS_LIST) == 1:
     FIRST_OPERATOR = ""
     SECOND_OPERATOR = RE_OPERATORS[0]
 elif len(RE_OPERATORS_LIST) == 2:
     FIRST_OPERATOR, SECOND_OPERATOR = RE_OPERATORS
 else:
     raise RuntimeError("I do not handle 0 or more than 2 operators.")

Alternatively, since the “unpacking assignment” checks that everything
gets correctly unpacked, you could go:

 try:
     FIRST_OPERATOR, SECOND_OPERATOR = RE_OPERATORS
 except ValueError:
     # not exactly 2 list elements
     try:
         FIRST_OPERATOR = ""
         # note the commas below, which makes this an unpacking 
         # assignment
         SECOND_OPERATOR, = RE_OPERATORS
     except ValueError:
         # not exactly 1 list element
         raise RuntimeError("I do not handle 0 or more than 2 operators.")

which I guess is much uglier. It can be useful in some other contexts.
Basicly this approach goes try 2 operators, try 1 operator, fail.

 FIRST_NUMBER, SECOND_NUMBER = re.findall(r"\d+\.\d+|\d+", MATCH)
 FIRST_NUMBER = float(FIRST_NUMBER)
 SECOND_NUMBER = float(SECOND_NUMBER)

This seems to expect just one operator? Unsure.

 match SECOND_OPERATOR:
     case "+":
         RESULT = FIRST_NUMBER + SECOND_NUMBER
     case "-":
         RESULT = FIRST_NUMBER - SECOND_NUMBER
     case "*":
         RESULT = FIRST_NUMBER * SECOND_NUMBER
     case "/":
         RESULT = FIRST_NUMBER / SECOND_NUMBER

Ah, the shiny new match statement. It is mostly aimed at structural
matching, but this is still a valid use. Traditionally this would be an
if/elif/… chain, eg:

 if SECOND_OPERATOR == "+":
     RESULT = FIRST_NUMBER + SECOND_NUMBER
 elif SECOND_OPERATOR == "-":
     RESULT = FIRST_NUMBER - SECOND_NUMBER
 ... etc ...

I would be ending the match with a default, or an if/elif with a
final else to catch anything you fail to match, probably raising a
RuntimError. Without that your programme will silently do nothing
with some operator you have not implemented.

         if FIRST_OPERATOR == "":
             USER_INPUT = USER_INPUT.replace(f"{MATCH}", f"{RESULT}")

Again, a format string to insert a string is pretty pointless, so you
can go:

 USER_INPUT = USER_INPUT.replace(MATCH, RESULT)

and the same in this code:
else:
USER_INPUT = USER_INPUT.replace(
f"{MATCH}“,
f”{FIRST_OPERATOR} {RESULT}")

This code is… opaque. It really needs comments.

 match PATTERN:
     case r"[*/+-]?\s*?\(.*?\)":

Note that this is a string equality test. You’re testing whther
PATTERN contains the string r"[*/+-]?\s*?$.*?$". To the naive eye
this might look like a regular expression match aainst PATTERN, which
it isn’t.

This is also an instance of what we call “magic numbers”, more commonly
encountered with actual numbers: a special value wired straight into the
code, usualy without explaination.

The preferred approach it that somewhere near the top of your code is an
assignment like this:

 OP_BRACKETS_PATTERN = r"[*/+-]?\s*?\(.*?\)"

and then down here you would write:

 match PATTERN:
     case OP_BRACKETS_PATTERN:

making it more clear what you’re matching in terms of its meaning.

Then this:

 PATTERN = r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?"

is also a “magic” number, and should be dealt with the same way - a well
named variable at the top of the code.

             case r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?":
                 PATTERN = r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?"
             case r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?":
                 DONE = True

Then finally, what should happen if none of the cases match?

         if DONE:
             if EXECUTE:
                 break
             _invalid()

Does this say “if we’re done but we didn’t execute anything raise an
error, otherwise break”?

I would not use _invalid() here - it is so far from any tests that its
very-specific error message may well be inappropriate or vague or wrong,
depending. Tests and their complaints should try to be close together.

 def _output(hour, minute, am_or_pm="", one_two_four="\n24"):

Do you really mean the default for one_two_four to be a string
starting with a newline character?

     if one_two_four == "12":
         if (hour >= 12) and (hour != 24):
             am_or_pm = "PM"
         else:
             am_or_pm = "AM"
         if hour > 12:
             hour = hour - 12
     if hour < 10 and minute < 10:
         print(
             f"{one_two_four}-hour format:\ 0{hour}:0{minute} {am_or_pm}"
         )

Format strings accept a leading 0 width specifier which would save you
this cumbersome test against 10:

 print(f"one_two_four}-hour format: {hour:02d}:{minute:02d} {am_or_pm}")

which covers any value of hour or minute.

 def _splitter(string):
     string = string.replace('"', "")
     string = int(string)
     return string

Should this be called _splitter - that does not seem to be what it
does. It looks like it removes all double quote characters then converts
what’s left to an int. Maybe some name like _intof? (int itself is
already taken of course).

 TIME = subprocess.run(
     ["/bin/date", '+"%H:%M"'], capture_output=True, text=True, check=True
 ).stdout

Did you really want double quotes embedded in your result? Because
that’s what the '+"%H:%M"' will do: the `date command will get the
format string:

 "%H:%M"

and fill it in. These spurious characters may be why your _splitter()
function needs to strip them out; it would be better to just not have
them in the first place.

Also, the datetime module can format dates and times for you without
needing to reach for the external date command. You would need to do
some extra fiddling (get the current time, convert to hours/minutes,
format a string) though.

Cheers,
Cameron Simpson cs@cskk.id.au

Denise_unique · October 20, 2022, 11:15am

@cameron
First and foremost, I’d like to express my gratitude for your time, especially since I didn’t create comments to be more explicit. Because I don’t speak English very well, there will be errors or things that are difficult to think of when naming variables. I will also make mistakes and take some time to communicate what I want to say, so please pardon me. However, I will try to explain more about your review of my script tomorrow. I just wanted to say how much I value your time and insight.

Denise_unique · October 20, 2022, 11:22am

testing how to reply back…

cameron · October 20, 2022, 8:17pm

Your English is far far better than my… any other language. There’s no
need to apologise for this. Though I can see that comments in a foreign
language won’t be your first instinct

Cheers,
Cameron Simpson cs@cskk.id.au

cameron · October 20, 2022, 8:26pm

The preferred approach it that somewhere near the top of your code is
an
assignment like this:
OP_BRACKETS_PATTERN = r"[*/+-]?\s*?$.*?$"
and then down here you would write:
match PATTERN:
    case OP_BRACKETS_PATTERN:
making it more clear what you’re matching in terms of its meaning.

Actually, this part is incorrect. match statements are new and are not
really like a series of if-statements.

The match statement is a structural pattern matching statement. The
case is a structural match to the expression in the match part.
Because it is legal to write:

 OP_BRACKETS_PATTERN = PATTERN

that is effectlively what would happen above - the case would assign
the value in PATTERN to the variable OP_BRACKETS_PATTERN, regardless
of what it was before. This is because match works against the “shape”
if the case.

A better fit might be your “1 or 2 operators” situation. The only
reason your existing match statement works was that all the cases were
literal values - the regexp text.

Here’s an example like your 1 or 2 operator situation:

 >>> L=[1,2,3]
 >>> match L:
 ...   case a,b,c,d: print("4:",a,b,c,d)
 ...   case x,y,z: print("3:",x,y,z)
 ...
 3: 1 2 3
 >>>

So to eliminate “magic numbers” from your match PATTERN: piece of code
you would need to switch to an if/elif`/… chain of if-statements.

Sorry about that,
Cameron Simpson cs@cskk.id.au

Denise_unique · October 25, 2022, 9:17am

@cameron

Well, kinda, but the main reason is that I’m just too lazy to think about what to write in a docstring because I’m using a lot of linters.

Cameron Simpson:

This is also has some issues.

The complaint "Invalid. ....." is hard wired to a specific error.
That’s very special purpose; normally you’d put such a message at the
point of the error itself, possibly passing it as a parameter to this
function.

The print() writes to the standard output. Error message normally go
to the standard error output, thus:
 print("Invalid. .....", file=sys.stderr)
For you, both of these are attached to your terminal and you may be
thinking, what difference does it make. But consider a pipeline of
commands:
 cat filenames... | grep something | wc -l
intended to concatenate the contents of some files, search for lines
containing something and count the number of such lines.

here, the standard output of cat and grep are attached to pipes,
which present them as the standard input of the next command (grep and
wc respectively). The standard error is as before, attached to the
terminal.

Suppose one of the filenames was incorrect. The cat command can report
the bad filename to the standard error output (a) you the user get to
see the error message because it still goes to the terminal and (b) the
following programme does not treat the error message as input data,
which it would if the message had gone to the standard output (a pipe
connected to the next command). Receiving error or debug messages as
input data is called a “Plug” in the Taxonomy Of Bugs, a piece of junk
coming down the pipeline to foul up what follows.

Send error messages to the standard error output.

Then you call sys.exit() to exit the programme. This is unusual in
Python also, though not unknown in basic interactive things.

Normally we write functions, and do not explicitly set “policy” about
what the larger programme does inside the function. But calling
sys.exit() you’re expressing a global policy, because that aborts the
whole programme. Making these things (print and exit) into a single
function makes the calling end concise, but I’d be a lot happier ifthe
function name included the word abort or exit somewhere to make it
more clear that it exits the programme; functions do not normally do
that.

A better, more Pythonic, approach is to raise an exception where you’re
doing the validation, instead of calling sys.exit(). (sys.exit
itself raises an exception to do its work, but let’s ignore that for
now.)

Your validation code would look like this:
 USER_INPUT = input("$ ")
 if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
     PASS = True
 elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
     raise ValueError("Invalid input, expected only number and operators.")
This:

places the specific error message right here at the failed validation
where is can precisely (and obviously) correspond to the failed test

raises an exception, which the caller (not that there is one here) can
catch and make their own policy decision about retrying or taking
other action

This thing about exceptions is that it separates the detection of a
problem from the policy of what to do about the problem. If the caller
doesn’t have an opinion about that the exception continues to bubble
outward, and exits the programme anyway if nothing else has an opinion.

We’d normally put the “input and validate” stp itself into a function,
eg:
 def read_user_input(prompt="$ "):
     USER_INPUT = input(prompt)
     if re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", USER_INPUT):
         pass
     elif re.search(r"[^\s\d*/+-.()]", USER_INPUT):
         raise ValueError("expected only number and operators")
     return USER_INPUT
This has a few features:

a default for the prompt - the caller might want to supply a different
prompt sometimes

the return value from the function is the user input string itself, so
you can just assign it to a variable for use

if the input is invalid it raises a ValueError, which the caller can
catch if they wish, or not catch and let something further out catch

You’d call it like this:
 USER_INPUT = read_user_input()
If the input’s invalid the exception bubbles out and aborts the
programme if you don’t do anything about that. If you had a policy
opinion about the bad input, you could catch it:
 try:
     USER_INPUT = read_user_input()
 except ValueErorr as e:
     print(f"Bad input: {e}", file=sys.stderr)
or run a little loop on that basis:
 while True:
     try:
         USER_INPUT = read_user_input()
     except ValueErorr as e:
         print(f"Bad input: {e}", file=sys.stderr)
     else:
         break
 ... continue and use USER_INPUT here ...
It looks more elaborate and wordy, but it is far more flexible and,
importantly, separates the validation (the regexp check and raising a
ValueError) from the policy action which decides what to do about the
error, which can then go somewhere suitable according to the wider needs
of the programme: the input function does not know or care.

Do you think this now better? if not pls tell me more! I want to learn!

def read_user_input(prompt="$ "):
    user_input = input(prompt)
    # This will check if there are only a group of numbers, operators, and parenthesis.
    if re.search(r"^\(?\d+(\.\d+)?(\s*?[*/+-]\s*?\(?\d+(\.\d+)?\)?)+$", user_input):
        # This one searches if there are two operators inside the parenthesis.
        error = re.search(r"\(\d+(\.\d+)?\s*?[*/+-]\s*?\d+(\.\d+)?\s*?[*/+-]", user_input)
        # if the parenthesis is closed properly.
        if len(re.findall(r"[()]", user_input)) % 2 != 0:
            raise Exception("The parenthesis are not even.")
        elif error:
            error = error.group()
            raise Exception(f"\"{error}\" in \"{user_input}\"\nhelp: only 1 operator is supported inside the parenthesis!")
        else:
            return user_input
    # This will check if there is only one group of numbers without an operator and if there is no non-numeric at the beginning of that group and end of it. So it can tell the calculator to not process this or stop the loop.
    elif re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", user_input):
        global skip_calculation
        skip_calculation = True
        return user_input
    else:
        raise Exception(f"\"{user_input}\"")

skip_calculation = False
get_input_success = False

while True:
    # It needs a switch or else it will keep asking for input loop after loop.
    if not get_input_success:
        try:
            user_input = read_user_input()
        except Exception as e:
            print(f"Bad input: {e}", file=sys.stderr)
            continue

    if skip_calculation:
        break

    get_input_success = True

Can I just use raise Exception("foo") if I don’t know what to name the error? If not, can you refer me to where I can find the names of all the exception errors?

cameron · October 25, 2022, 11:25pm

Cameron Simpson:

def _invalid():
print(
“Invalid. There should be only digits with operators (*/±) with or without spaces and parenthesis.”
)
sys.exit()

I presume you’ve named _input as a “private” name (beginning with an
underscore) because it is used only here in this code? You code is not
structured for reuse, instead it is a standalone programme. So there’s
no need to do this.

Well, kinda, but the main reason is that I’m just too lazy to think about what to write in a docstring because I’m using a lot of linters.

Ah, placating linters. Usually linters have settings to turn off
particular types of warnings. For example, my personal “lint” script has
these wired into it:

 pycodestyle_ignore=E111,E114,E124,E125,E126,E129,E201,E202,E127,E221,E226,E227,E265,E266,E301,E302,E305,E501,E731,W503,W504
 pylint_disable=bad-whitespace,bad-indentation,bad-continuation,invalid-name,useless-object-inheritance,consider-using-f-string
 shellcheck_exclusions=SC1007,SC2244,SC2250

So you can see I disable a suite of things for pycodestyle and a suite
of things for pylint. shellcheck is a linter for Bourne shell
scripts, not Python.

I presume one of your linters complains about public functions with no
docstring. Mine too! I do leave the check enabled, myself.

That said, a docstring does not need much detail.

 def _invalid():
     '''Issue a warning about invalid input and exit the programme.'''

Do you think this now better? if not pls tell me more! I want to learn!

It’s definitely a lot better.

Let’s get more picky…

Regular expressions are generally overused. That’s because (a) they’re
very convenient and (b) they’re cryptic and hard to get correct, and
therefore small errors can be hard to identify. To quote JWZ:

Some people, when confronted with a problem, think “I know, I’ll use
regular expressions. Now they have two problems.”

Jamie Zawinski, in alt.religion.emacs

That said, there’s a window of tasks for which they are a good match,
and simple user input checks are often in that window.

But notice how hard to use and fix they are!

 def read_user_input(prompt="$ "):
     user_input = input(prompt)
     # This will check if there are only a group of numbers, operators, and parenthesis.
     if re.search(r"^\(?\d+(\.\d+)?(\s*?[*/+-]\s*?\(?\d+(\.\d+)?\)?)+$", user_input):

This expression has a lot of “nongreedy” matches. A regular expression
is normally “greedy”, in that each component will try to match the most
it possibly can, while still matching the rest (which makes that much
less trouble that you might first imagine).

When you’re matching different classes of things eg digits or whitespace
or punctuation, none of which can be mistaken for each other, you don’t
need to mark matches as “nongreedy” - they’ve really only got one choice
about what to match in the text.

And generally, you rarely need to make things nongreedy; it is sometimes
needed, but uncommon.

So your expression:

 ^\(?\d+(\.\d+)?(\s*?[*/+-]\s*?\(?\d+(\.\d+)?\)?)+$

can be simplified, and a simpler regexp is a lot easier to think
about. Let’s look:

\(?

This is a nongreedy match for a single character. There’s no difference
between greedy and nongreedy for single characters - they’re always
exactly 1 character long! So this can be just:

\(

All your whitespace matches:

 \s*?

must match all the whitespace at that point (because the matches either
side are nonwhitespace). So there’s no difference between a greedy and
nongreedy match here, either. So they can all be written:

\s*

This lets you trim this down to:

 ^\(\d+(\.\d+)?(\s*[*/+-]\s*\(\d+(\.\d+)?\))+$

Now let’s look at its structure. We can break this up a bit:

 ^
 \(
 \d+(\.\d+)?
 (  \s*
    [*/+-]
    \s*
    \(\d+(\.\d+)?\)
 )+
 $

Being:

start of string
an opening bracket ( character
a number
a group of:
- whitespace
- an operator character */+-
- whitespace
- a number-in-brackets
  repeated one or more times (+)

Note: I think you’re missing a closing bracket somewhere, maybe after
the first number (based on what you match on the second number, which
has brackets on both sides of it). You’ll match (2.0+(3.0), which has
an odd number of brackets.

If that matches, you then do some further sanity checks:

 # This one searches if there are two operators inside the parenthesis.
 error = re.search(r"\(\d+(\.\d+)?\s*?[*/+-]\s*?\d+(\.\d+)?\s*?[*/+-]", user_input)

Now you’re checking for 2 operators (and implicitly, this will also
match 3 or more, since they also have 2 operators). But you’re doing it
by matching the whole expression. That is more complication than you
need. You’ve already matched the general piece of syntax earlier, so you
don’t need to precisely check it here - you know it is in the expected
form.

Since any operator character must be an operator, all you really need
to do is look for at least 2 operators:

 [*/+-].*?[*/+-]

i.e. an operator and then another operator somewhere later, which is
much shorter than the check-the-exact-expression regexp you’ve got.

But wait, you can do more. Your concern is effectively that there’s more
than one “operator-(number)” sequence in the expression. But your first
regexp explicitly matched “one or more operator-(number)” sequences with
the (group)+ part.

You could match exactly one such group just by leaving out the +:

 ^\(\d+(\.\d+)?(\s*[*/+-]\s*\(\d+(\.\d+)?\)$

and since you’re now only allowing one “operator-(number)”, not a
repeated group of them, you also do not really need the brackets which
define the group, so you can make it:

 ^\(\d+(\.\d+)?\s*[*/+-]\s*\(\d+(\.\d+)?\)$

Written like that it will one match somthing like this:

 (2.0 + (3.0)

and not, say:

 (2.0 + (3.0) + (4.0)

because it would only allow one “operator-(number)”, you then do not
even need the check for more than one operator, because that would not
have matched the first regexp.

 # if the parenthesis is closed properly.
 if len(re.findall(r"[()]", user_input)) % 2 != 0:
     raise Exception("The parenthesis are not even.")

This isn’t a correct check. What you’re doing is counting (len()) the
number of [()] characters in the input, and checking that that is
even. But this will produce an even count:

((

because there are 2 matching characters. A better check would be to
count the open bracket and close brackets separatly, and check that they
are the same:

 if len(re.findall(r"[(]", user_input)) != len(re.findall(r"[)]", user_input)):

You could also write [(] as $ and [)] as $. Whichever seems
more readable to you.

But counting brackets is not enough for correctness. For example, these:

 ())(
 ))((
 )3.0(

will all pass the “same number of each type of bracket” test. To be
really thorough you’re want to test that the nesting is correct, eg:

 (3.0)               correct
 )3.0(               incorrect

In fact, because regexps do not support recursive matching, this is not
possible in general with regexps.

Fortunately, your initial regexp enforces a quite rigid expression, and
if it matched there cannot be bad brackets anyway. So you don’t need
this check at all.

 elif error:
     error = error.group()

I would probably not overwrite error here - now it means a different
thing.

Instead I’d just put the expression into the format string, changing:

     raise Exception(f"\"{error}\" in \"{user_input}\"\nhelp: only 1 operator is supported inside the parenthesis!")

into:

     raise Exception(f"\"{error.group()}\" in \"{user_input}\"\nhelp: only 1 operator is supported inside the parenthesis!")

Note: if you raise an exception, just as if you return from a
function, control does not proceed further. So an if/elif` chain like:

 if badness1:
     raise Exception(.....)
 elif badness2:
     raise Exception(.....)
 elif badness3:
     raise Exception(.....)
 else:
     return good_result

can be written:

 if badness1:
     raise Exception(.....)
 if badness2:
     raise Exception(.....)
 if badness3:
     raise Exception(.....)
 return good_result

which may be more readable.

 # This will check if there is only one group of numbers without 
 # an operator and if there is no non-numeric at the beginning of 
 # that group and end of it. So it can tell the calculator to not 
 # process this or stop the loop.
 elif re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", user_input):

This is faily complex. Why not:

 ^\d+(\.\d+)?$

which required a bare number as the entire string? Because it is bound
to the start (^) and end ($) of the string, there cannot be
nonnumeric characters before or after.

 global skip_calculation
 skip_calculation = True
 return user_input

I’d avoid the global variable. Instead I’d return both the user_input
and the skip_calculation flag from the function:

 return user_input, True

To match, that would imply that the earlier return would be:

 return user_input, False

At the calling end you would want to expect both values:

 user_input, skip_calculation = read_user_input()

Now you don’t need a global variable, and you certainly don’t need to
know about some global variable inside the function. Simpler all
around.

Can I just use raise Exception("foo") if I don’t know what to name the error? If not, can you refer me to where I can find the names of all the exception errors?

You can just raise Exception(.....) but it is better to raise a more
specific exception. That way the calling coe can catch just that type of
exception, not “all exceptions”.

The basic rule of thumb with try/except is to catch an handle only
what you expect. That way the unexpected is uncaught, and bubbles out
and makes a stack trace, to help you debug it.

For this reason there are many exception types.

For functions with bad values, the ValueError exception is the common
choice:

 def square_root(x):
     if x < 0:
         raise ValueError(f"x must be nonnegative, received {x!r}")
     return math.sqrt(x)

I’d use ValueError in your input function as well, it is a good match.

This means that the called end can write:

 try:
     user_input, skip_calculation = read_user_input()
 except ValueError as e:
     print(f"bad input: {e}", file=sys.stderr)
     continue

and catch just the bad input (a ValueError exception) and not other
errors (eg end of file, or NameError from logic errors where you use a
variable because you set it, etc etc).

The built in exceptions are described here:

and enumerated here:

(the same web page).

The tutorial in the Python docs talks about exceptions:

You can also define your own exceptions, usually by subclassing an
existing exception class. That lets you make even more fine grained
try/except clauses if the need arises.

Cheers,
Cameron Simpson cs@cskk.id.au

Denise_unique · October 29, 2022, 9:12am

@cameron

Cameron Simpson:

Ah, placating linters. Usually linters have settings to turn off
particular types of warnings. For example, my personal “lint” script has
these wired into it:
 pycodestyle_ignore=E111,E114,E124,E125,E126,E129,E201,E202,E127,E221,E226,E227,E265,E266,E301,E302,E305,E501,E731,W503,W504
 pylint_disable=bad-whitespace,bad-indentation,bad-continuation,invalid-name,useless-object-inheritance,consider-using-f-string
 shellcheck_exclusions=SC1007,SC2244,SC2250
So you can see I disable a suite of things for pycodestyle and a suite
of things for pylint. shellcheck is a linter for Bourne shell
scripts, not Python.

I presume one of your linters complains about public functions with no
docstring. Mine too! I do leave the check enabled, myself.

That said, a docstring does not need much detail.
 def _invalid():
     '''Issue a warning about invalid input and exit the programme.'''

The reason I use UPPER_CASE when naming global variables is because of Pylint’s complaints about not conforming to the UPPER_CASE naming style (invalid-name). And I really don’t know which style I should follow. I just assume the most pythonic way to name global variables is the UPPER_CASE naming style.

Cameron Simpson:

All your whitespace matches:
 \s*?
must match all the whitespace at that point (because the matches either
side are nonwhitespace). So there’s no difference between a greedy and
nongreedy match here, either. So they can all be written:
 \s*

While I agree with this, because it will still capture 0 or more whitespaces.

Cameron Simpson:

 \(?
This is a nongreedy match for a single character. There’s no difference
between greedy and nongreedy for single characters - they’re always
exactly 1 character long! So this can be just:
 \(

My reason why I use ? is because I want to capture 0 or 1 parenthesis. And I designed the calculator to be input-flexible, so it doesn’t matter if there is or isn’t whitespace, parenthesis, or how many numbers to solve, for example, n+n*(n-n)/n, as opposed to basic input, which asks for the first number, then the second number.

Note: I didn’t think about how many numbers should be inside the parenthesis when I made the algorithm. (6*7+8) will create an exception error even if I filter it in regex because of the algorithm of the calculator I created.
So the valid inputs are:
n / n… = group_one
n+n… = group_two
(group_one)…
(group_two)…

Note: I’m putting a lot of spaces in the example but it keeps deleting them.
7 * 6+47- (50- 40)/ 2 is valid

The first number is dependent on the second number; that’s why it doesn’t have closing parentheses. While the second number has both open and close parenthesis, this is due to the fact that it can repeat itself. For example, 7*8*9, where 8 and 9 are both second numbers, simply repeats itself, which is why the second number is in a group and has the + symbol, which means it is 1 or more. The reason I didn’t include the first number in the group is because of the start of the string (^).

Cameron Simpson:

Now you’re checking for 2 operators (and implicitly, this will also
match 3 or more, since they also have 2 operators). But you’re doing it
by matching the whole expression. That is more complication than you
need. You’ve already matched the general piece of syntax earlier, so you
don’t need to precisely check it here - you know it is in the expected
form.

Since any operator character must be an operator, all you really need
to do is look for at least 2 operators:
 [*/+-].*?[*/+-]
i.e. an operator and then another operator somewhere later, which is
much shorter than the check-the-exact-expression regexp you’ve got.

It only checks for two operators inside the parenthesis, which is why I didn’t use the ? in the \(. But you are right! I should just look for 2 operators instead of including numbers. Thank you. I don’t know why I didn’t consider that.

But I think you miss that because there’s no close parenthesis in sanity check.
\(\d+(\.\d+)?\s*?[*/+-]\s*?\d+(\.\d+)?\s*?[*/+-] ←

Cameron Simpson:

elif error:
     error = error.group()
I would probably not overwrite error here - now it means a different
thing.

Instead I’d just put the expression into the format string, changing:
     raise Exception(f"\"{error}\" in \"{user_input}\"\nhelp: only 1 operator is supported inside the parenthesis!")
into:
     raise Exception(f"\"{error.group()}\" in \"{user_input}\"\nhelp: only 1 operator is supported inside the parenthesis!")

Is it valid, however, if I use the variable error multiple times?

Cameron Simpson:

Note: if you raise an exception, just as if you return from a
function, control does not proceed further. So an if/elif` chain like:
 if badness1:
     raise Exception(.....)
 elif badness2:
     raise Exception(.....)
 elif badness3:
     raise Exception(.....)
 else:
     return good_result
can be written:
 if badness1:
     raise Exception(.....)
 if badness2:
     raise Exception(.....)
 if badness3:
     raise Exception(.....)
 return good_result
which may be more readable.

I also sometimes see that in linters. Is the else statement considered obsolete?

Cameron Simpson:

This is faily complex. Why not:
 ^\d+(\.\d+)?$
which required a bare number as the entire string? Because it is bound
to the start (^) and end ($) of the string, there cannot be
nonnumeric characters before or after.

It’s funny because you just solved my problem in Rust, which was not accepting lookahead and lookbehind in the standard library.

Cameron Simpson:

I’d avoid the global variable. Instead I’d return both the user_input
and the skip_calculation flag from the function:
 return user_input, True
To match, that would imply that the earlier return would be:
 return user_input, False
At the calling end you would want to expect both values:
 user_input, skip_calculation = read_user_input()
Now you don’t need a global variable, and you certainly don’t need to
know about some global variable inside the function. Simpler all
around.

I forgot you could return more than 1.

This is so useful, especially the tutorial, even though I’m not great at reading documents. Maybe I will get used to it someday.

Once again, thanks for your wisdom. I’m learning things I didn’t know and it means a lot to me. I’m sorry I’m not replying more quickly.

cameron · October 29, 2022, 10:58am

Cameron Simpson:
Ah, placating linters. Usually linters have settings to turn off
particular types of warnings. For example, my personal “lint” script has
these wired into it:
pycodestyle_ignore=E111,E114,E124,E125,E126,E129,E201,E202,E127,E221,E226,E227,E265,E266,E301,E302,E305,E501,E731,W503,W504
pylint_disable=bad-whitespace,bad-indentation,bad-continuation,invalid-name,useless-object-inheritance,consider-using-f-string
shellcheck_exclusions=SC1007,SC2244,SC2250
So you can see I disable a suite of things for pycodestyle and a suite
of things for pylint. shellcheck is a linter for Bourne shell
scripts, not Python.
The reason I use UPPER_CASE when naming global variables is because of Pylint’s complaints about not conforming to the UPPER_CASE naming style (invalid-name).

You can see that in my list of disabled rules above But largely for
some local variables I tend to use.

And I really don’t know which style I should follow. I just assume the
most pythonic way to name global variables is the UPPER_CASE naming
style.

Ah.

Ideally you don’t use global variables (or not much). The trouble with
globals is that they affect everything (which uses them) and there’s
only one copy of them.

If you put your main programme in a function, all its variables will be
locals and you can give them traditional lower_case names without the
linter complaining.

So, what’s with the UPPERCASE for globals rule? Once your codes encased
in a function, the only real global variables are what would in other
languages be constants, values to tune the behaviour of sizes of things.
Examples:

  # a default allocation size
  BUFFER_SIZE = 4096

  # a default port number for communications
  ACCESS_PORT = 1234

These would typically be “constants” in languages which have those
(Python doesn’t) and in such languages constants are often written in
UPPERCASE to make it clear that they’re fixed tuning values. So we use
the same rule in Python.

Linters expect you to have almost no globals and therefore assume that
any globals you do have are “constants”, and thus recommend UPPERCASE
for them.

Cameron Simpson:
All your whitespace matches:
\s*?
must match all the whitespace at that point (because the matches either
side are nonwhitespace). So there’s no difference between a greedy and
nongreedy match here, either. So they can all be written:
\s*
While I agree with this, because it will still capture 0 or more whitespaces.

Aye. So drop the ? from these, because it is more verbiage which makes
things harder to read.

Cameron Simpson:
\(?
This is a nongreedy match for a single character. There’s no difference
between greedy and nongreedy for single characters - they’re always
exactly 1 character long! So this can be just:
\(
My reason why I use ? is because I want to capture 0 or 1 parenthesis.

Ah yes of course. Sorry, I was misreading this - you’re completely
correct.

Got it confused with the something*? syntax which does nongreedy
matching. Sorry.

And I designed the calculator to be input-flexible, so it doesn’t
matter if there is or isn’t whitespace, parenthesis, or how many
numbers to solve, for example, n+n*(n-n)/n, as opposed to basic input,
which asks for the first number, then the second number.

Ok.

One way to simplfy things with the whitespace is to get rid of it
entirely before trying to parse the rest. Example:

 >>> expr_s = '1+2 + 3*4'
 >>> re.sub(r'\s+', '', expr_s)
 '1+2+3*4'

so you could do that to your expression string first before matching
things. With that modification to the string you can remove all the
\s* instances from your regexps, simplifying things again.

Note: I didn’t think about how many numbers should be inside the
parenthesis when I made the algorithm. (6*7+8) will create an exception
error even if I filter it in regex because of the algorithm of the
calculator I created.
So the valid inputs are:
n / n… = group_one
n+n… = group_two
(group_one)…
(group_two)…

7 * 6+47- (50- 40)/ 2 is valid

So you can have many operators?

Cameron Simpson:

Note: I think you’re missing a closing bracket somewhere, maybe after
the first number (based on what you match on the second number, which
has brackets on both sides of it). You’ll match (2.0+(3.0), which has
an odd number of brackets.

The first number is dependent on the second number; that’s why it doesn’t have closing parentheses. While the second number has both open and close parenthesis, this is due to the fact that it can repeat itself. For example, 789, where 8 and 9 are both second numbers, simply repeats itself, which is why the second number is in a group and has the + symbol, which means it is 1 or more. The reason I didn’t include the first number in the group is because of the start of the string (^).

I was talking about the literal $ at the start - it doesn’t seem to
have a literal $ to match it later in the regexp.

Cameron Simpson:
Since any operator character must be an operator, all you really need
to do is look for at least 2 operators:
[*/+-].*?[*/+-]
i.e. an operator and then another operator somewhere later, which is
much shorter than the check-the-exact-expression regexp you’ve got.
It only checks for two operators inside the parenthesis, which is why I didn’t use the ? in the \(. But you are right! I should just look for 2 operators instead of including numbers. Thank you. I don’t know why I didn’t consider that.

Practice.

But I think you miss that because there’s no close parenthesis in
sanity check.
\(\d+(\.\d+)?\s*?[*/+-]\s*?\d+(\.\d+)?\s*?[*/+-] ←
Cameron Simpson:
elif error:
    error = error.group()
I would probably not overwrite error here - now it means a different
thing.

Instead I’d just put the expression into the format string, changing:
    raise Exception(f"\"{error}\" in \"{user_input}\"\nhelp: only 1 operator is supported inside the parenthesis!")
into:
    raise Exception(f"\"{error.group()}\" in \"{user_input}\"\nhelp: only 1 operator is supported inside the parenthesis!")
Is it valid, however, if I use the variable error multiple times?

It’s just a variable. You can use it as often as you like. All you have
to ensure is that it means what you intend when you use it. What I was
getting at is that this statement:

 error = error.group()

changes the kind of things error refers to. Legal, but potentially
confusing. Before that assignment error is a regexp Match object
containg the result of your regexp match. After the assignment it is the
text which was matched, a string.

We tend to like a particular variable name to refer to the same kind of
thing mostly to reduce confusion.

Cameron Simpson:
Note: if you raise an exception, just as if you return from a
function, control does not proceed further. So an if/elif` chain like:
if badness1:
    raise Exception(.....)
elif badness2:
    raise Exception(.....)
elif badness3:
    raise Exception(.....)
else:
    return good_result
can be written:
if badness1:
    raise Exception(.....)
if badness2:
    raise Exception(.....)
if badness3:
    raise Exception(.....)
return good_result
which may be more readable.
I also sometimes see that in linters. Is the else statement considered obsolete?

Not at all. The linter’s complaining that if there’s a raise or
return in the “true” branch, there’s no way that branch con flow on to
the code below. You’ll find it only complains with a return or a
raise in the branch.

Consider this:

 if x == 3:
     y = 4
 else:
     y = 5

Here the else is necessary. The alternative without the else goes:

 if x == 3:
     y = 4
 y = 5

here y would always end up as 5 regardless of the value of x,
because the y = 5 assignment will always execute. The else separates
the two alternatives so that only one runs.

But code like this:

 if x == 3:
     raise ValueError("the value 3 is forbidden!")
 else:
     y = 5

doesn’t need the else because the “true” branch leaves the function
courtesy of the raise. So this:

 if x == 3:
     raise ValueError("the value 3 is forbidden!")
 y = 5

is valid, and shorter.

elif is short for else if and the same logic applies.

Cameron Simpson:
This is faily complex. Why not:
^\d+(\.\d+)?$
which required a bare number as the entire string? Because it is bound
to the start (^) and end ($) of the string, there cannot be
nonnumeric characters before or after.
It’s funny because you just solved my problem in Rust, which was not accepting lookahead and lookbehind in the standard library.

Good. Regexps do variable a bit from language to language, or from
regexp library to regexp library. They all share a common basis, but
some features are not present everywhere. Python’s regexp library (the
re module) implements “Perl Compatible Regular Expressions”. The
specifics are in the module documentation.

Cameron Simpson:
I’d avoid the global variable. Instead I’d return both the user_input
and the skip_calculation flag from the function:
return user_input, True
To match, that would imply that the earlier return would be:
return user_input, False
At the calling end you would want to expect both values:
user_input, skip_calculation = read_user_input()
Now you don’t need a global variable, and you certainly don’t need to
know about some global variable inside the function. Simpler all
around.
I forgot you could return more than 1.

There are languages where you can’t do that. Even in Python
(technically) you’re only returning one value, it just happens to be a
2-tuple. At the other end you’re doing what we call an “unpacking
assignment”, where there are multiple variable names on the left and an
object containing multiple values on the right (the 2-tuple we got
from the function).

Cameron Simpson:

The built in exceptions are described here:
Built-in Exceptions — Python 3.12.1 documentation
and enumerated here:
Built-in Exceptions — Python 3.12.1 documentation
(the same web page).

The tutorial in the Python docs talks about exceptions:
8. Errors and Exceptions — Python 3.12.1 documentation

This is so useful, especially the tutorial, even though I’m not great at reading documents. Maybe I will get used to it someday.

My approach is to read the documentation (until my brain is full). At
that point I’ve learnt some of it, and know the details are there. I can
then return to the documentation to look up the specifics when I need
them. The details I use a lot I’ll learn as a result.

Once again, thanks for your wisdom. I’m learning things I didn’t know
and it means a lot to me. I’m sorry I’m not replying more quickly.

The beauty of email is that there’s no hurry. This thread will pop to
the top of my inbox when a reply happens. All good!

Cheers,
Cameron Simpson cs@cskk.id.au

Denise_unique · October 30, 2022, 8:38am

@cameron

Cameron Simpson:

Ok.

One way to simplfy things with the whitespace is to get rid of it
entirely before trying to parse the rest. Example:
 >>> expr_s = '1+2 + 3*4'
 >>> re.sub(r'\s+', '', expr_s)
 '1+2+3*4'
so you could do that to your expression string first before matching
things. With that modification to the string you can remove all the
\s* instances from your regexps, simplifying things again.

Done! I will try to be more simple, but only for regex, right? because it’s cryptic?

Yup, so 1+1*7-9/2+7... is valid
       (1+1*7-9/2+7...) is not valid

Cameron Simpson:

cameron:

Note: I think you’re missing a closing bracket somewhere, maybe after
the first number (based on what you match on the second number, which
has brackets on both sides of it). You’ll match (2.0+(3.0), which has
an odd number of brackets.

The first number is dependent on the second number; that’s why it doesn’t have closing parentheses. While the second number has both open and close parenthesis, this is due to the fact that it can repeat itself. For example, 789, where 8 and 9 are both second numbers, simply repeats itself, which is why the second number is in a group and has the + symbol, which means it is 1 or more. The reason I didn’t include the first number in the group is because of the start of the string (^).

I was talking about the literal $ at the start - it doesn’t seem to
have a literal $ to match it later in the regexp.

This could be broken in your email. Sorry about that.

^
 \( <-- So this is what we are talking about, right?
 \d+(\.\d+)? <-- The first number should have this
 \) <-------------------------------------------| But it doesn't because of this
 (  <--------------------------------------------------------------------*****|
    \s* ****************************************************************|*****|
    [*/+-] *************************************************************|*****|
    \s* ****************************************************************|*****|
    \( <-- The reason why the second number has that is that it's repeatable. |
    \d+(\.\d+)?  <------------------|***********************************|*****|
    \)  <---------------------------------------------------------------|------
 )+  <-------------------------------------------------------------------
 $

I will try to explain more tomorrow.

cameron · October 30, 2022, 9:01am

Cameron Simpson:
One way to simplfy things with the whitespace is to get rid of it
entirely before trying to parse the rest. Example:
>>> expr_s = '1+2 + 3*4'
>>> re.sub(r'\s+', '', expr_s)
'1+2+3*4'
so you could do that to your expression string first before matching
things. With that modification to the string you can remove all the
\s* instances from your regexps, simplifying things again.
Done! I will try to be more simple, but only for regex, right? because it’s cryptic?

Well, simple is always good, provide it’s correct. But it’s particularly
useful for regexps because they’re cryptic.

This could be broken in your email. Sorry about that.

Hah! I read email in a terminal, so I’ve got a fixed width font. Your
diagram is just fine.

^
\( <-- So this is what we are talking about, right?
\d+(\.\d+)? <-- The first number should have this
\) <-------------------------------------------| But it doesn't because of this

Right, I must have missed that. Yes, you’re correct.

(  <--------------------------------------------------------------------*****|
   \s* ****************************************************************|*****|
   [*/+-] *************************************************************|*****|
   \s* ****************************************************************|*****|
   \( <-- The reason why the second number has that is that it's repeatable. |
   \d+(\.\d+)?  <------------------|***********************************|*****|
   \)  <---------------------------------------------------------------|------
)+  <-------------------------------------------------------------------
$

So you’re requiring literal open and close bracket characters around
your numbers? A plain ( and ) use grouping in the regexp. An escaped
$ and $ are literal bracket characters.

I will try to explain more tomorrow.

Great.

Cheers,
Cameron Simpson cs@cskk.id.au

Denise_unique · October 30, 2022, 10:05am

@cameron

That’s cool! I’m using vim in the terminal.

It should have a question mark \(?. I’m sorry I forgot to include it. I just copied and pasted it from your post, which you said I should remove the question mark, which I think you misunderstood because I didn’t explain more and, as you said, regexps is cryptic. But no, it isn’t required to have a parenthesis. That’s why there’s a sanity check if the parenthesis are closed properly. I included parenthesis because, just like in MDAS, the difference is that it will solve the numbers inside the parenthesis first, just like in a calculator.

vbrozik · October 30, 2022, 12:20pm

Note that you can write valid regexes containing comments and whitespaces. They are really useful:

regex = re.compile(
        r"""\d +  # the integral part
            \.    # the decimal point
            \d *  # some fractional digits""", re.X)

# or everything in the regex string:
regex = re.compile(
        r"""(?x)
            \d +  # the integral part
            \.    # the decimal point
            \d *  # some fractional digits""")

See re — Regular expression operations — Python 3.11.0 documentation
It is not necessary to compile them in advance like in the example.

Denise_unique · October 30, 2022, 12:35pm

Václav Brožík:

Note that you can write valid regexes containing comments and whitespaces. They are really useful:

regex = re.compile(
        r"""\d +  # the integral part
            \.    # the decimal point
            \d *  # some fractional digits""", re.X)

# or everything in the regex string:
regex = re.compile(
        r"""(?x)
            \d +  # the integral part
            \.    # the decimal point
            \d *  # some fractional digits""")

That is so useful. It will help me to be more explicit and simplify things. I will surely try to remember that! Thank you @vbrozik

cameron · October 30, 2022, 9:21pm

Maybe we should:

restate the problem clearly (I’m losing track)
write out the regexps in VERBOSE form per Václav’s suggestion

AIUI, you want:

to accept any expression like 2 + 3.0 * 7.2 / 5 + ( 4 + 7 ) of any length
use brackets for grouping, for example the ( 4 + 7 ) above
to compute the bracketed groups before the other parts

Unclear:

do you allow brackets around numbers, eg (3.0) ?
can brackets nest eg (3.0 * (4 + 7))
are you otherwise computing left to right, ignoring precedence?

WRT the last, I’m asking if 3+4*5 equals 35 or 23.

I’m thinking of suggesting a change in algorithm, partly because regexps
do not recurse, and partly to accomodate the “compute bracketed groups
first”.

Cheers,
Cameron Simpson cs@cskk.id.au

Denise_unique · October 31, 2022, 2:47am

@cameron

def read_user_input(prompt="$ "):
    user_input = input(prompt)
    # Get rid of whitespace to simplfy things.
    user_input = ic(re.sub(r'\s+', '', user_input))

    # This will check if there are only a group of numbers, operators, and parenthesis.
    if ic(re.search(
          r"""^           #1 start of string should be numbers required parenthesis optional
              \(?         #2 literal parenthesis 0 or 1 not required
              \d+(\.\d+)? #3 first number required decimal is optional
              (           #4 start of the group            
              [*/+-]      #5 operators required
              \(?         #6 literal parenthesis 0 or 1 not required
              \d+(\.\d+)? #7 second number required decimal is optional
              \)?         #8 literal parenthesis 0 or 1 not required
              )+          #9 end of the group
              $           #0 end of string should be numbers required parenthesis optional""", user_input)):
        # This will group open-closed parenthesis and inside of it
        inside_parenthesis = re.findall(r"\(.*?\)", user_input)
        # if the parenthesis is closed properly.
        if len(re.findall(r"[()]", user_input)) % 2 != 0:
            raise ValueError("The parenthesis are not even.")
        # This one searches if there are two operators inside the parenthesis
        elif inside_parenthesis:
            for group_parenthesis in inside_parenthesis:
                if len(re.findall(r"[*/+-]", group_parenthesis)) > 1:
                    raise ValueError(f"help: only 1 operator is supported inside the parenthesis!")
                return user_input, False
        return user_input, False
    # This will check if there is only one group of numbers without an operator and if there is no non-numeric at the beginning of that group and end of it. So it can tell the calculator to not process this or stop the loop.
   
    #11
    elif ic(re.search(r"^\(?\d+(\.\d+)?\)?$", user_input)):
        if len(re.findall(r"[()]", user_input)) % 2 != 0:
            raise ValueError("The parenthesis are not even.")
        elif re.findall(r"[()]", user_input):
            return re.search(r"\d+(\.\d+)?", user_input).group(), True
        return user_input, True
    raise ValueError(f"\"{user_input}\"")

I use #1 and #0 To exclude nonnumeric characters before and after

So #2 is the original problem you said I should add literal close parenthesis \)? in first number #3 but I didnt because of literal closed parenthesis in second number #8 because this sanity check required two numbers ex. 156+78 is valid and not 156 which is why I made #11 for alone numbers. 156+78*9 is also valid its just optional see #9 + means 1 or more

So the second number #7 is in this group #4-#9 thats why it has literal open parenthesis #6 because its repeatable unlike first number #3 that doesnt have literal close parenthesis #8 also I use + in end of group #9 for 1 or more means first number #3 is still dependent on second number #7

Yes you understand it all correctly

No, before…

elif re.search(r"^(?<!\D)\d+(\.\d+)?(?!\D)$", user_input):
    global skip_calculation
    skip_calculation = True
    return user_input

But now I add #2 and #3 as optional, and then #5 to check if they are open or closed properly.

elif ic(re.search(
        r"""^
            \(?          #2
            \d+(\.\d+)?
            \)?          #3
            $""", user_input)):
    #5
    if len(re.findall(r"[()]", user_input)) % 2 != 0:
        raise ValueError("The parenthesis are not even."
    return re.search(r"\d+(\.\d+)?", user_input).group(), True
raise ValueError(f"\"{user_input}\"")

No, I didn’t even think about that when I made the algorithm. (3.0*4+7) is not valid. number-operator-number or (3.0*4) is the only valid inside the parenthesis.

Yup, because it will not be correct math-wise, but parenthesis is the first priority. Because of mdas, I had to change my algorithm previously. I can’t even remember how many times I changed the algorithm before. Good times

It’s equal to 23 because of mdas.

I don’t think so, because I used re.search instead of re.findall, so I can only get one match instead of two or more, and luckily it starts searching on the left.

So the forever loop will search for parenthesis first, then multiplication or division, then addition or subtraction, and then it will stop solving if there is only one number remaining, i.e., 35.

And that’s why the variable pattern is changing.

else:
    match PATTERN:
        # In literal parenthesis, there is no question mark. It's the default, which is why it's the first priority to solve. Don't mind the operators and spaces for now.
        case r"[*/+-]?\s*?\(.*?\)":
            # It changes the pattern to look for multiplication or division [*/] I'm sure I could simplify regexps here. Hehe..
            PATTERN = r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?"
        case r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?":
            # Then it changed to addition or subtraction.
            PATTERN = r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?"
        case r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?":
            DONE = True
    if DONE:
        if EXECUTE:
            break
        _invalid()

If you’re having a hard time understanding the words I’m using, if so, I will use a paraphraser to find more meaningful words. It’s just going to take more time.

cameron · October 31, 2022, 4:50am

Cameron Simpson:

Maybe we should:

restate the problem clearly (I’m losing track)

write out the regexps in VERBOSE form per Václav’s suggestion
def read_user_input(prompt="$ "):

Etc. This restates the code, not the problem. But we need the code
anyway.

   user_input = input(prompt)
   # Get rid of whitespace to simplfy things.
   user_input = ic(re.sub(r'\s+', '', user_input))

   # This will check if there are only a group of numbers, operators, and parenthesis.
   if ic(re.search(

What’s ic?

     r"""^           #1 start of string should be numbers required parenthesis optional
         \(?         #2 literal parenthesis 0 or 1 not required
         \d+(\.\d+)? #3 first number required decimal is optional
         (           #4 start of the group
         [*/+-]      #5 operators required
         \(?         #6 literal parenthesis 0 or 1 not required
         \d+(\.\d+)? #7 second number required decimal is optional
         \)?         #8 literal parenthesis 0 or 1 not required
         )+          #9 end of the group
         $           #0 end of string should be numbers required parenthesis optional""", user_input)):
   # This will group open-closed parenthesis and inside of it
   inside_parenthesis = re.findall(r"\(.*?\)", user_input)
   # if the parenthesis is closed properly.
   if len(re.findall(r"[()]", user_input)) % 2 != 0:
       raise ValueError("The parenthesis are not even.")
   # This one searches if there are two operators inside the parenthesis
   elif inside_parenthesis:
       for group_parenthesis in inside_parenthesis:
           if len(re.findall(r"[*/+-]", group_parenthesis)) > 1:
               raise ValueError(f"help: only 1 operator is supported inside the parenthesis!")
           return user_input, False
   return user_input, False

This will check if there is only one group of numbers without an operator and if there is no non-numeric at the beginning of that group and end of it. So it can tell the calculator to not process this or stop the loop.

#11
elif ic(re.search(r"^(?\d+(.\d+)?)?$“, user_input)):
if len(re.findall(r”[()]“, user_input)) % 2 != 0:
raise ValueError(“The parenthesis are not even.”)
elif re.findall(r”[()]“, user_input):
return re.search(r”\d+(.\d+)?“, user_input).group(), True
return user_input, True
raise ValueError(f”"{user_input}"")


I use #1 and #0 To exclude nonnumeric characters before and after

Right.

So #2 is the original problem you said I should add literal close parenthesis \)? in first number #3 but I didnt because of literal closed parenthesis in second number #8 because this sanity check required two numbers ex. 156+78 is valid and not 156 which is why I made #11 for alone numbers. 156+78*9 is also valid its just optional see #9 + means 1 or more

The parts #6,#7,#8 read like a number which may (optionally) have
brackets around it, hence my question about (3.0) below.

So the second number #7 is in this group #4-#9 thats why it has literal open parenthesis #6 because its repeatable unlike first number #3 that doesnt have literal close parenthesis #8 also I use + in end of group #9 for 1 or more means first number #3 is still dependent on second number #7

Right.

Cameron Simpson:

AIUI, you want:

to accept any expression like 2 + 3.0 * 7.2 / 5 + ( 4 + 7 ) of any length

use brackets for grouping, for example the ( 4 + 7 ) above

to compute the bracketed groups before the other parts

Yes you understand it all correctly

Ok.

Cameron Simpson:

Unclear:

do you allow brackets around numbers, eg (3.0) ?

No, […]

Ok.

[…]

Cameron Simpson:

Unclear:

can brackets nest eg (3.0 * (4 + 7))

No, I didn’t even think about that when I made the algorithm. (3.0*4+7) is not valid. number-operator-number or (3.0*4) is the only valid inside the parenthesis.

Ok.

Cameron Simpson:

Unclear:

are you otherwise computing left to right, ignoring precedence?

Yup, because it will not be correct math-wise, but parenthesis is the first priority. Because of mdas, I had to change my algorithm previously. I can’t even remember how many times I changed the algorithm before. Good times

I’m going to suggest another change. The good times are back again.

Cameron Simpson:

WRT the last, I’m asking if 3+4*5 equals 35 or 23.

It’s equal to 23 because of mdas.

I’m taking MDAS to mean Multiplication, Division, Addition, Subtraction?
If so, that’s what I meant about “precedence” before - * and /
having higher precedence is what makes 3+4*5==23. In order for your
code to produce correct results you need to match * and / “ahead” of
matching + and - in some fashion.

Cameron Simpson:

I’m thinking of suggesting a change in algorithm, partly because regexps
do not recurse, and partly to accomodate the “compute bracketed groups
first”.

I don’t think so, because I used re.search instead of re.findall, so I can only get one match instead of two or more, and luckily it starts searching on the left.

Well, starting at the left is part of the specification for regexps:
they match the earliest legal match i.e. leftmost. This produces
predictable behaviour.

So the forever loop will search for parenthesis first, then multiplication or division, then addition or subtraction, and then it will stop solving if there is only one number remaining, i.e., 35.

Is (3+4)*5 valid? What about 5*(3+4)? How does your code find the
various pieces in the right order of evaluation if we swap things
around?

And that’s why the variable pattern is changing.

else:
   match PATTERN:
       # In literal parenthesis, there is no question mark. It's the default, which is why it's the first priority to solve. Don't mind the operators and spaces for now.
       case r"[*/+-]?\s*?\(.*?\)":
           # It changes the pattern to look for multiplication or division [*/] I'm sure I could simplify regexps here. Hehe..
           PATTERN = r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?"
       case r"\d+(\.\d+)?\s*?[*/]\s*?\d+(\.\d+)?":
           # Then it changed to addition or subtraction.
           PATTERN = r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?"
       case r"\d+(\.\d+)?\s*?[+-]\s*?\d+(\.\d+)?":
           DONE = True
   if DONE:
       if EXECUTE:
           break
       _invalid()

If you’re having a hard time understanding the words I’m using, if so, I will use a paraphraser to find more meaningful words. It’s just going to take more time.

The change I’m thinking of suggesting is to progressively reduce the
expression until there is a single number left, or a syntax error (no
matches). If you do that you don’t need to count brackets or anything -
just consume them in their innermost pairs until you can’t.

So your order of evaluation precedence is something like:

( number op number )
number * number
number / number
number + number
number - number

So you could make something which went along the lines of:

 expression = input(.......)
 expression = re.sub(r'\s+', '', expression)
 while True:
     if look for ( number op number ):
         replace with evaluation of number op number
     elif look for number * number:
         replace with evaluation of number * number
     ... etc etc
     else:
         # nothing we know how to evaluate, exit loop
         break
 if expression matches ^\d(\.\d+)$
     result = float(expression)
 else:
     raise ValueError("Syntax Error in expression!")

So each loop iteration replaces the higest precedence simple thing it
can see, once. Repeat until there’s nothing else to replace.

Thoughts?

Cheers,
Cameron Simpson cs@cskk.id.au