How do I remove __XXX__ from dir printing

vbrozik · August 11, 2022, 9:22pm

If you have a list of strings, or text consisting of lines that you need to process individually, one by one, then use a regex in a loop. In general, loops are used to process multiple items / events / attempts…

No, definitely not. Regular expressions were invented in the 1950s and became very popular with Unix (around 1970). Though they have multiple inconveniences, I have never seen anything capable of replacing them.

A few years ago I started learning Python using an interactive course on Sololearn:
https://www.sololearn.com/learning/1157

I really liked it. It is free. The course has multiple exercises, there is a community, and there are also more advanced Python courses.

Of course in addition to Sololearn there are many other interactive online courses.

woodturner550 · August 11, 2022, 9:25pm

What I have so far… with problem.

import math
MODULE_NAME = math

for i in range(len(str(dir(MODULE_NAME)))):
    print(dir(MODULE_NAME)[i])  # PROBLEM list out of range

How do I handle that? I know I don’t want to iterate more than the number of items in the list, or I’m out of range. So x = len(MODULE_NAME) will give me the number of items in the list. One way may be a second loop.

vbrozik · August 11, 2022, 9:43pm

for i in range(len(str(dir(MODULE_NAME)))):
#                  ^^^--- Here you converted the list of strings to a single long string.

Because of the conversion, len(str(dir(MODULE_NAME))) gives the number of characters in the whole list converted to a single long string, not the number of items. Simply omitting the str() conversion fixes this.
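To see the difference concretely, here is a minimal sketch. The variable names names and as_text are only for illustration and do not appear in the original script.

```python
import math

names = dir(math)       # a list of strings, one per name
as_text = str(names)    # one long string, brackets and quotes included

print(len(names))       # number of names in the list
print(len(as_text))     # number of characters in the string form (much larger)

# Using len(names) keeps every index inside the list.
for i in range(len(names)):
    print(names[i])
```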

…but in Python we do not need the numeric index. We iterate lists (and other containers) directly:

for item in dir(MODULE_NAME):
    # Try to print the individual items here in the loop's body.
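One possible way to fill in that loop body, as a small sketch using the math module from the question (the enumerate() part is an optional extra, not something the thread asked for):

```python
import math

names = dir(math)

# Iterate over the list directly; no numeric index is needed.
for item in names:
    print(item)

# If the position is wanted as well, enumerate() supplies it.
for index, item in enumerate(names):
    print(index, item)
```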

woodturner550 · August 11, 2022, 11:30pm

Great Course! Thanks, this will cover a lot of missing things!

woodturner550 · August 14, 2022, 12:49am

I’m going to school for Python, so most of my time is spent studying, but I’ve come this far.

import math

MODULE_NAME = math

for item in dir(MODULE_NAME):  # Try to print the individual items here in the loop's body.
    output_of_characters = [i for i in item]
    print('string_of_characters', output_of_characters)
    
    for x in item:
        print('long list of characters', x)

I believe we will need the characters to check against each item in the pattern.

steven.daprano · August 14, 2022, 7:26am

You don’t have to create a new variable to look at its dir(). And the easiest way to split a string into a list of characters is with list().

import math
for name in dir(math):
    characters = list(name)
    print(characters)

woodturner550 · August 19, 2022, 8:53pm

I’m almost happy with this script.

import math

MODULE_NAME = math
import re

pattern_non_dunder = re.compile(r"(\b[a-z][^_]+\b[^_])")
commands_list = re.findall(pattern_non_dunder, str(dir(MODULE_NAME),))

for item in commands_list:
    print(item)

acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc'

Process finished with exit code 0

One problem is the lack of one ’ at the beginning of the list. I don’t understand this. The first as well as all commands should be in single quotes.
Any idea how to correct this? Thanks. I like finishing something I started.

cameron · August 19, 2022, 10:15pm

By Leonard Dye via Discussions on Python.org at 19Aug2022 21:03:

I’m almost happy with this script.

I still find this approach rather strange. You’re getting a textual
printout of the names in a module, and scanning that single text string
for your target names.

If your objective is to use regexps to scan text, rather than purely
to classify the names, this may be sensible. But if you’re just trying
to identify your nondunder names, I think that converting dir() to a
single string and scanning it is a complex and error-prone way to do
this.

All that said, let’s look at your code for the concern you’ve expressed:

One problem is the lack of one ’ at the beginning of the list. I don’t
understand this. The first as well as all commands should be in
single quotes.

The result of dir(math) is a list of str instances:

>>> import math
>>> dir(math)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'comb', 'copysign', 'cos', 'cosh', 'degrees', 'dist', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'isqrt', 'lcm', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'nextafter', 'perm', 'pi', 'pow', 'prod', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc', 'ulp']

and str(dir(math)), which is what you are scanning with a regexp, is
this:

>>> str(dir(math))
"['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'comb', 'copysign', 'cos', 'cosh', 'degrees', 'dist', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'isqrt', 'lcm', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'nextafter', 'perm', 'pi', 'pow', 'prod', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc', 'ulp']"

Here’s your regular expression:

(\b[a-z][^_]+\b[^_])

We can ignore the surrounding (), as you are not currently using the
subgroups, so:

\b[a-z][^_]+\b[^_]

being:

a word boundary
an alphabetic character
1 or more non-underscore characters
a word boundary
a non-underscore

This has some problems, exhibited in your output, but let’s look at your
“missing leading quote mark” issue first.

The first character matched in your expression is an alphabetic
character. So you’re not matching a quote mark, and it will not be
included in your match.

You do match a non-underscore at the end of the expression, and as it
happens that non-underscore is a quote mark in the text, so you get a
quote mark at the end of the match.

All this is because re.findall, which is the correct thing to do for
the approach you are taking, does not have to match at the start of the
text. You could include the character preceding the word boundary in
your match, like this (using your “non-underscore” criterion):

[^_]\b[a-z][^_]+\b[^_]

which would pick up the leading quote mark in the matched text.

A larger concern is that you have an overly generous regular expression.
Your intent, I gather, is to obtain a list of the nondunder names from
dir(math). You print these names out in a loop at the end with
distinct print() calls. That should print one name per line:

for item in commands_list:
    print(item)

However, look at the output:

acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc'

That is one line of text, indicating that there is just one very long
string in commands_list, not many short strings. Try printing
len(commands_list) to check this.
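As a quick check of that claim, here is a sketch that runs the same pattern over a short, made-up list of names (sample_names is an assumption for illustration; the real dir(math) output is longer but behaves the same way):

```python
import re

# A shortened stand-in for dir(math), chosen only for illustration.
sample_names = ['__doc__', '__name__', 'acos', 'acosh', 'pi']
text = str(sample_names)

pattern_non_dunder = re.compile(r"(\b[a-z][^_]+\b[^_])")
matches = re.findall(pattern_non_dunder, text)

print(len(matches))   # 1
print(matches[0])     # acos', 'acosh', 'pi'
```

The greedy [^_]+ runs from the first nondunder name to the end of the text, so everything arrives as a single match with no leading quote.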

Why is this so? Let’s review your regular expression:

\b[a-z][^_]+\b[^_]

being:

a word boundary
an alphabetic character
1 or more non-underscore characters
a word boundary
a non-underscore

The core concern here is that a “non-underscore” (from [^_]) matches
anything that is not an underscore, including punctuation. So it
happily consumes these characters after the names:

', '

The \b word boundary is just that: a boundary marker. It occurs
between a “word” and “nonword” character in either order. It has no
other direct effect. Specifically, it does not force matched stuff between
the marks to “be word characters”.

Your regular expression happily matches the entire string from the
first word to the last, as a single match.

What you primarily need to do is to ensure that the middle of the word is
only what you want. I would imagine that to be, maybe, just letters. So
instead of matching non-underscores with [^_] you should perhaps match
letters with [a-z], resulting in this:

\b[a-z][a-z]+\b[^_]

The expression would match a letters-only word followed by a
non-underscore. Because there is a word boundary before that
non-underscore, it cannot match eg abcde_ since that is considered a
“word” for purposes of \b and thus there would not be a “word
boundary” between the e and the _. So your final “non-underscore”
can effectively only match punctuation. Which is what you wanted.
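A small sketch of the letters-only expression in action, again on a made-up list (sample_names is hypothetical). Note that under the "just letters" assumption, names containing digits (atan2) and single-letter names (e) are not matched, so the character class may need widening for real use.

```python
import re

# Hypothetical sample; includes a name with a digit and a one-letter name.
sample_names = ['__doc__', 'acos', 'atan2', 'e', 'pi']
text = str(sample_names)

letters_only = re.compile(r"\b[a-z][a-z]+\b[^_]")
matches = letters_only.findall(text)
print(matches)        # ["acos'", "pi'"]

# Each match still carries the trailing quote consumed by the final [^_].
names = [m[:-1] for m in matches]
print(names)          # ['acos', 'pi']
```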

But this kind of complication is why regular expressions are considered
overused. They are hard to get correct, particularly for people new to
them.

My personal approach would not be to convert dir(math) into a single
string to scan for nondunder names. I would keep it as is (a list) and
scan that list for nondunder names. You could classify the names in the
list using a regular expression somewhat as you are now, or use a
non-regular expression based classification with the string startswith
and endswith methods.

Untested sketch:

commands_list = []
for name in dir(math):
    if ... test that name is a nondunder name ...:
        commands_list.append(name)

inserting your preferred test expression.
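One way the sketch above might be filled in, using startswith and endswith. The helper name is_dunder is an assumption made for this example, not something from the thread.

```python
import math


def is_dunder(name):
    """Return True for names of the form __xxx__ (hypothetical helper)."""
    return len(name) > 4 and name.startswith('__') and name.endswith('__')


commands_list = []
for name in dir(math):
    if not is_dunder(name):
        commands_list.append(name)

# One name per line, because the list holds separate strings.
for item in commands_list:
    print(item)
```

Keeping dir(math) as a list means there are no quote marks or commas to strip afterwards.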

Cheers,
Cameron Simpson cs@cskk.id.au

woodturner550 · August 19, 2022, 11:17pm

I will be looking at all three: regex, startswith and endswith. I went back and worked through why the str part was the problem, yet the way I did it required it to be a str rather than a list. With the code 'for name in dir(math)' it is happy as a list.

I must express how much I am learning, and the forum’s help is invaluable.

One of the things I think would be helpful is if I would write out the steps to get to the output desired in written form, remove those steps that are not needed. Like a road map of the problem. Rather than just trying to do it off the top of my head. Something like; get name out of dir, test if nondunder, add names to list of commands, print list. It writes out the lines of code almost. Failure is a good teacher if you know how and why you failed.

I have set a goal of one year to get a good handle on learning Python; it’s been twenty days. I don’t expect to be an expert or get a job as a programmer, this is for the joy of programming, a ghost from before the stroke.

cameron · August 19, 2022, 11:51pm

By Leonard Dye via Discussions on Python.org at 19Aug2022 23:27:

I will be looking at all three: regex, startswith and endswith. I went
back and worked through why the str part was the problem, yet the way I
did it required it to be a str rather than a list. With the code 'for
name in dir(math)' it is happy as a list.

Aye. But you took a list (dir(math)), then turned it into one big string,
and that then forced you down the path of re.findall to process that
string.

I must express how much I am learning, and the forum’s help is invaluable.

That’s what they’re for!

One of the things I think would be helpful is if I would write out the steps to get to the output desired in written form, remove those steps that are not needed. Like a road map of the problem.

Yep. It is a good suggestion, particularly for a problem you’re still
working a solution for. How would one intuitively approach this as a
human, write that out (informally), clean it up (your “remove those
steps that are not needed”). Then formalise what’s left into code.

Rather than just trying to do it off the top of my head. Something
like; get name out of dir, test if nondunder, add names to list of commands, print list. It writes out the lines of code almost. Failure is a good teacher if you know how and why you failed.

Indeed.

I have set a goal of one year to get a good handle on learning Python;
it’s been twenty days. I don’t expect to be an expert or get a job as a
programmer, this is for the joy of programming, a ghost from before the
stroke.

Aye. I cannot imagine how hard that must be, and how galling to have to
regain a skill you were once adept with.

Your questions are all welcome here.

Cheers,
Cameron Simpson cs@cskk.id.au

woodturner550 · August 20, 2022, 4:24am

It works for math module only. Just a start headed the right way I think.

import fnmatch
import math
MODULE_NAME = math

commands_list = []
for name in dir(MODULE_NAME):
    if fnmatch.fnmatch(name, '__*__'):
        print('')
    else:
        commands_list.append(name)
print('Commands', commands_list)

Now I have to figure what I need to do to remove all but the common commands, not just dunders.
May I put multiple if statements in a row, "if fnmatch.fnmatch(name, ‘*’), if fnmatch…(name, ‘upper case’);

Thanks,

cameron · August 20, 2022, 9:52am

By Leonard Dye via Discussions on Python.org at 20Aug2022 04:34:

It works for math module only. Just a start headed the right way I
think.
import fnmatch
import math
MODULE_NAME = math

commands_list = []
for name in dir(MODULE_NAME):
   if fnmatch.fnmatch(name, '__*__'):
       print('')
   else:
       commands_list.append(name)
print('Commands', commands_list)
Now I have to figure what I need to do to remove all but the common commands, not just dunders.
May I put multiple if statements in a row, "if fnmatch.fnmatch(name,
‘*’), if fnmatch…(name, ‘upper case’);

Well, you can put any number of if-statements inside the loop:

for name in dir(MODULE_NAME):
    if fnmatch.fnmatch(name, '__*__'):
        print('')
    if fnmatch.fnmatch(name, '[a-z]*'):
        print('alphabetic')

Of course, both might match depending on the conditions. Exclusive
choices like your if-else can be extended:

for name in dir(MODULE_NAME):
    if fnmatch.fnmatch(name, '__*__'):
        print('')
    elif fnmatch.fnmatch(name, '[a-z]*'):
        print('alphabetic')
    else:
        commands_list.append(name)

elif means “else if”, but doesn’t need additional indentation.

And of course the condition after the if can have many parts.

fnmatch does filename globs like *.txt, but is not expressive enough
for “all the letters are upper case”. As we know that names are
identifiers, you could infer uppercaseness by eliminating the other
allowed characters:

# fnmatchcase compares case-sensitively on every platform;
# plain fnmatch.fnmatch may fold case on some operating systems.
if not fnmatch.fnmatchcase(name, '*[a-z0-9_]*'):
    # no underscores, lowercase or digits
    print("must be upper case then")
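A runnable sketch of that elimination test, applied to a few names taken from the re printout in this thread. The helper name looks_upper is an assumption for illustration.

```python
import fnmatch


def looks_upper(name):
    """True when the name has no lowercase letters, digits or underscores."""
    return not fnmatch.fnmatchcase(name, '*[a-z0-9_]*')


sample_names = ['IGNORECASE', 'Match', 'compile', '_subx', 'A']
upper_names = [name for name in sample_names if looks_upper(name)]
print(upper_names)    # ['IGNORECASE', 'A']
```

The string method name.isupper() is a simpler alternative, though it also accepts digits and underscores alongside the uppercase letters.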

Cheers,
Cameron Simpson cs@cskk.id.au

woodturner550 · August 20, 2022, 8:51pm

Hope I’m understanding what I’m seeing. This is a printout of module re. Are the all capitalized letters constants within the module? And are ‘Match’, ‘Pattern’ and ‘Scanner’ classes? Are the classes (if classes) used in normal programming? Should they be included or not? I don’t know because I’m too new to Python. I think we need them because we can use the classes as well as commands.
Print out:

if matched _pickle
if matched _special_chars_map
if matched _subx
Wanted commands and classes:  ['A', 'ASCII', 'DEBUG', 'DOTALL', 'I', 'IGNORECASE', 'L', 'LOCALE', 'M', 'MULTILINE', 'Match', 'Pattern', 'RegexFlag', 'S', 'Scanner', 'T', 'TEMPLATE', 'U', 'UNICODE', 'VERBOSE', 'X', 'compile', 'copyreg', 'enum', 'error', 'escape', 'findall', 'finditer', 'fullmatch', 'functools', 'match', 'purge', 'search', 'split', 'sre_compile', 'sre_parse', 'sub', 'subn', 'template']

This is the script so far.

import fnmatch
import copy
import math
import array
import abc
import pprint
import re

MODULE_NAME = re

commands_list = []
for name in dir(MODULE_NAME):
    if name.startswith('_'):
        print('if matched', name)
    else:
        commands_list.append(name)
print('Wanted commands and classes: ', commands_list)

Works well, don’t know about the above mentioned items, they may need to be handled also.

cameron · August 20, 2022, 10:10pm

By Leonard Dye via Discussions on Python.org at 20Aug2022 21:01:

Hope I’m understanding what I’m seeing. This is a printout of module
re.

Are the all capitalized letters constants within the module?

That is the naming convention most commonly used in Python. So you can
assume “yes”, particularly for code like yours which has no special
knowledge about the module.

(Aside: Python doesn’t have “constants” as such, but defaults and other
tuning values which would be constants in other languages and are used
that way in Python are named and used like constants here too.)

And are ‘Match’, ‘Pattern’ and ‘Scanner’ classes?

Again, using the common conventions: yes.

Note: there’s an inspect module which lets you further classify things
by directly inspecting them, example:

>>> class C:
...   pass
...
>>> from inspect import isclass
>>> isclass(C)
True
>>> obj = C()
>>> isclass(obj)
False
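Building on that, here is a rough sketch of how the public names of a module could be sorted into groups with inspect. The function name classify and the group labels are assumptions made for this example.

```python
import inspect
import re


def classify(module):
    """Group a module's public names by the kind of object they refer to."""
    groups = {'classes': [], 'functions': [], 'modules': [], 'other': []}
    for name in dir(module):
        if name.startswith('_'):
            continue                      # skip private and dunder names
        obj = getattr(module, name)
        if inspect.isclass(obj):
            groups['classes'].append(name)
        elif inspect.isroutine(obj):      # functions and builtins
            groups['functions'].append(name)
        elif inspect.ismodule(obj):
            groups['modules'].append(name)
        else:
            groups['other'].append(name)  # flags and other values
    return groups


groups = classify(re)
for kind, names in groups.items():
    print(kind, names)
```

This inspects the objects themselves instead of relying on how their names are spelled, so it still works for modules that do not follow the usual naming conventions.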

The code inside the Python implementation (including the stdlib, so
the “re” module et al) generally follows PEP 8, which defines the style
to be used in the Python implementation itself.

Most other Python code also follows this style or something quite
similar to it.

These naming conventions are particularly common:

UPPERCASE for “constants”
CamelCase for class names
snake_case for variables, functions, methods
_leading_underscore for “private names” - names which are not part of
the “public” api for the code, and represent internal details which
might be changed at any time
dunder names for class attributes and methods tied to operations,
such as __add__ to implement what the + addition operator does,
etc

Are the classes (if classes) used in normal programming?

For the re module, not by name so much. Mostly people use re.compile
to make a regular expression object and work from that. (In fact, the
re module autocaches regexp strings, so many people do not even bother
with re.compile and instead reach directly for top level functions
like re.findall as you did.)

The names are just there:

>>> import re
>>> regexp = re.compile(r'foo')
>>> regexp
re.compile('foo')
>>> type(regexp)
<class 're.Pattern'>

You don’t normally see code which uses the Pattern class itself to
create a regular expression object. The docs do not even mention it.

As I recall, these names used to be private in earlier Python releases.

However for most modules, names like these would be public documented
things, and people would expect to use them.

Should they be included or not.

Yes, they should. The re module is a bit unusual.

I don’t know because I’m too new to Python. I think we need them
because we can use the classes as well as commands.

Generally, that is true.

Cheers,
Cameron Simpson cs@cskk.id.au
