How do I remove __XXX__ from dir printing

I would like to remove the file names (XXX) from the printout. Yes, I’m new to Python 3. Thanks.

import math
content = dir(math)
print (content)

output:
['__doc__', '__file__', '__name__', 'acos', 'asin', 'atan',
'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp',
'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log',
'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh',
'sqrt', 'tan', 'tanh']

By Leonard Dye via Discussions on Python.org at 30Jul2022 23:56:

I would like to remove the file names( XXX) in the print out. Yea
I’m new to Python 3. Thanks.

This sounds like an exercise. I would suggest you use a list
comprehension:
https://docs.python.org/3/glossary.html#term-list-comprehension

So dir(math) returns a list of names from the math module. You want
that list without the __*__ names. So write a list comprehension which
only results in names which do not start with '__' and end with
'__'. You might test that with some string methods:
https://docs.python.org/3/library/stdtypes.html#str.startswith

Cheers,
Cameron Simpson cs@cskk.id.au

So I need to make a list from dir(math), then remove the offending items (‘XXX’) and print the results.

Thank you for your help!

woodturner550

Hi Leonard,

For future reference, when posting code you can wrap code blocks in triple backticks “```”

def foo():
    ...

Or you can place code inline with a single backtick “`”

Code inline: inline_variable_name.

This is particularly useful because, as your post shows, the double underscores you typed without backticks are being interpreted as Markdown:

(‘XXX’) - no backticks
('__XXX__') - with single backtick surround

This is as far as I can go till I get a book on regex codes.

‘’’
import re

pattern = '.*.' # needed a wild card
text = 'Does xxalkjsjlkjxx fhjgfjf in this text match the pattern?'

match = re.search(pattern, text)

s = match.start()
e = match.end()

print('Found "{}"\nin "{}"\nfrom {} to {} ("{}")'.format(
    match.re.pattern, match.string, s, e, text[s:e]))
‘’’
It is missing ‘.*.

By Leonard Dye via Discussions on Python.org at 01Aug2022 01:06:

This is as far as I can go till I get a book on regex codes.

You may notice that I did not suggest regular expressions. They are, as
is often the case, massive overkill for the problem you are trying to
solve.

They have their uses and place, but I would not address your requirement
with regular expressions.

Cheers,
Cameron Simpson cs@cskk.id.au

You don’t need the nuclear-powered bulldozer of regexes to crack this peanut.

def is_dunder(s):
    """Return True if string s is a dunder, otherwise False."""
    return len(s) > 2 and s.startswith('__') and s.endswith('__')

shorter_dir = [s for s in dir() if not is_dunder(s)]

The reason for the check that len(s) > 2 is so that the double underscore string '__' does not register as a dunder. If you think it should, you can remove that check.

Here is an alternative to the startswith and endswith methods:

def is_dunder(s):
    """Return True if string s is a dunder, otherwise False."""
    return len(s) > 2 and s[:2] == s[-2:] == '__'
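As a rough sketch of how either helper could be applied to the original question, the filter can be pointed at dir(math) instead of a bare dir(), which lists the names in the current scope rather than those in the module. The name math_names is only illustrative.

```python
import math


def is_dunder(s):
    """Return True if string s is a dunder, otherwise False."""
    return len(s) > 2 and s[:2] == s[-2:] == '__'


# dir(math) lists the names defined in the math module;
# a bare dir() would list the names in the current scope instead.
math_names = [s for s in dir(math) if not is_dunder(s)]
print(math_names)
```

The printed list should contain names such as 'sqrt' and 'pi' but none of the dunder entries.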

I hear you loud and clear. It’s not that it won’t work that way, but it is the hard way, more complicated.

Thanks

woodturner550

It would still recognize triple and quadruple underscore as a dunder.
The condition should rather be len(s) > 4.


I do not understand the relatively strong opposition to regexes. I think in this case the regex solution would be more readable and also probably faster than the three function/method calls.

Regexes are extremely useful for text processing, and they are also used by non-programmers. Here is the regex condition corresponding to the variant with len(s) > 4:

re.fullmatch(r'__.+__', s)

Note: A raw string is not required here, but it is a good habit to always use raw strings for regexes.
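For illustration, here is one way that condition might be dropped into a list comprehension over dir(math). The names DUNDER_RE and non_dunders are made up for this sketch.

```python
import math
import re

# Compile once; fullmatch() succeeds only if the whole string matches.
DUNDER_RE = re.compile(r'__.+__')

non_dunders = [s for s in dir(math) if not DUNDER_RE.fullmatch(s)]
print(non_dunders)
```

Because .+ needs at least one character between the underscores, a bare '____' is not treated as a dunder here.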


I think this site contains a very good introduction to regexes:

The Python documentation is very useful too:
https://docs.python.org/3/howto/regex.html
https://docs.python.org/3/library/re.html

The list comprehension solution, which Cameron first suggested, can be made a one-liner that still reads like English. I think it is hard to beat that in elegance and readability.

I don’t have anything against regular expressions, though I do find that I need to look up the exact syntax each time, because I mix up the variations used in different languages and Linux command line programs. If the problem were difficult to solve without a regex, there would be no reason to shy away from one.

Good point. It depends on how we define dunders.

If a dunder is any string with double underscores prepended and appended, then '__' + '' + '__' or quad underscores should be the shortest dunder string, and the minimum length should be 4.

If a dunder is any string that begins and ends with two underscores, not necessarily distinct, then '__' and '___' could be considered dunders, and the minimum length is 2.

If a dunder is any non-empty string with two leading and trailing underscores, then the minimum length should be 5.

Normally these cases don’t come up, because we’re talking about dunder methods or at least identifiers, and the identifier part is always a non-empty name like “add” or “init”. But the ipython environment defines special global variables __, ___ etc and it is not clear if they should be filtered out or not.
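To make the three definitions concrete, here is a small sketch with one helper per definition. The function names are invented for this example and only show how the minimum length changes the answer on edge cases.

```python
def dunder_min2(s):
    # Begins and ends with '__'; the two ends are allowed to overlap.
    return s.startswith('__') and s.endswith('__')


def dunder_min4(s):
    # '__' prepended and appended to a possibly empty name.
    return len(s) >= 4 and s.startswith('__') and s.endswith('__')


def dunder_min5(s):
    # '__' prepended and appended to a non-empty name.
    return len(s) >= 5 and s.startswith('__') and s.endswith('__')


for sample in ['__', '___', '____', '__x__', '__init__', '_x_']:
    print(repr(sample), dunder_min2(sample), dunder_min4(sample), dunder_min5(sample))
```

Only the first three samples are classified differently by the three helpers; ordinary names like '__init__' are dunders under every definition.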

Possibly some of that concern about regexes is that they are frequently over-used, especially in the Perl and JavaScript communities. Consequently, many people are very defensive (perhaps too defensive) when they see beginners reaching for regexes when there are faster, better, easier solutions.

Or in the words of Jamie Zawinski: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”

Even expert coders can easily mess up regexes, for instance when a rogue regex accidentally cut off access to vast areas of the web for all of Russia.

Some of the issues with regexes include:

  • extremely terse, cryptic syntax;
  • for anything but trivial problems, regexes are hard to write, hard to read, hard to debug, and hard to maintain;
  • consequently, except for the simplest and most trivial problems, regexes often contain bugs and misbehaving corner cases.

(There are others, such as the risk of exponentially poor backtracking performance, and bugs in the regex engine itself.)

No community is more pro-regexes than Perl, and this is what Perl’s creator, Larry Wall, had to say about regexes when they were evolving the language into Perl 6:

“… unfortunately, there’s a lot of regex culture that needs breaking.”

It is worth reading Larry Wall’s essay there; it makes very good points about the problems with regex syntax as it exists today.

Regexes are an independent programming language, (almost?) as powerful as Python. For example, here is a regular expression that will tell you whether a number n is a prime number or not.

not re.match(r'^1?$|^(11+?)\1+$', '1'*n)

If you are tempted to use it, don’t: it is very inefficient. Short and cryptic but inefficient.
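For the curious, here is the same expression wrapped in a function so it can be tried out. The function name is_prime_unary is made up for this sketch.

```python
import re


def is_prime_unary(n):
    # '1' * n writes n in unary. The first alternative matches 0 and 1;
    # the second matches when the ones split into two or more equal
    # groups of at least two ones each, i.e. when n is composite.
    return not re.match(r'^1?$|^(11+?)\1+$', '1' * n)


print([n for n in range(30) if is_prime_unary(n)])
# prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The backreference \1 is what forces the regex engine to try every possible group size, which is why this gets slow quickly as n grows.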

I think that Python regexes are not quite powerful enough to count as a full Turing-complete programming language, but other languages (especially Perl) have even more powerful regexes which are genuinely Turing-complete.

You may have seen the monster regex that validates that email addresses are fully compliant with the standard. No, it is not a joke.

You suggested this regex:

re.fullmatch(r'__.+__', s)

That looks about as short and simple as a regex can be. Let’s see how it performs on my computer. (Your computer may perform differently.)

Here is your regex solution:

[steve ~]$ python3.10 -m timeit -s "import re" "re.fullmatch(r'__.+__', '__abcde__')"
500000 loops, best of 5: 648 nsec per loop

[steve ~]$ python3.10 -m timeit -s "import re" "re.fullmatch(r'__.+__', '__abcde__')"
500000 loops, best of 5: 652 nsec per loop

Here is the solution using startswith and endswith, in a helper function:

[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return len(s) > 4 and s.startswith('__') and s.endswith('__')" "is_dunder('__abcde__')"
1000000 loops, best of 5: 272 nsec per loop

[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return len(s) > 4 and s.startswith('__') and s.endswith('__')" "is_dunder('__abcde__')"
1000000 loops, best of 5: 276 nsec per loop

The regex solution takes more than double the time.

Here is the solution using slicing, with no length check.

[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return s[:2] == s[-2:] == '__'" "is_dunder('__abcde__')"
1000000 loops, best of 5: 233 nsec per loop

[steve ~]$ python3.10 -m timeit -s "def is_dunder(s): return s[:2] == s[-2:] == '__'" "is_dunder('__abcde__')"
1000000 loops, best of 5: 232 nsec per loop
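If you would rather run this kind of comparison from a script than from the command line, the timeit module can also be called directly. This is only a sketch: the variable names are illustrative, the lambdas add a little call overhead to every measurement, and the absolute numbers will vary between machines and Python versions.

```python
import re
import timeit


def is_dunder_methods(s):
    return len(s) > 4 and s.startswith('__') and s.endswith('__')


def is_dunder_slices(s):
    return s[:2] == s[-2:] == '__'


sample = '__abcde__'
loops = 100_000

# Each call returns the total time in seconds for `loops` executions.
t_regex = timeit.timeit(lambda: re.fullmatch(r'__.+__', sample), number=loops)
t_methods = timeit.timeit(lambda: is_dunder_methods(sample), number=loops)
t_slices = timeit.timeit(lambda: is_dunder_slices(sample), number=loops)

print(f'regex:   {t_regex:.4f} s')
print(f'methods: {t_methods:.4f} s')
print(f'slices:  {t_slices:.4f} s')
```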

Thank you for the good points.

That is a good one :slight_smile: One of the few use cases for the unary numeral system.

The e-mail address monster regex is certainly an extreme example of what regexes should never be used for. It was generated by a program and it is certainly unreadable and unmanageable by a human.

I do not use complex regexes. When they would need more than one level of nested groups I either use a verbose regex with comments or I compose the regex from simpler ones using an fr string.
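As a small illustration of those two styles, here is the dunder pattern written once as a verbose regex and once composed from pieces with an fr string. The constant names are invented for this sketch, and the pattern is far simpler than the cases where these techniques really pay off.

```python
import re

# Verbose form: whitespace and comments inside the pattern are ignored.
DUNDER_VERBOSE = re.compile(r'''
    __      # two leading underscores
    .+      # at least one character of name
    __      # two trailing underscores
''', re.VERBOSE)

# Composed form: build the pattern from smaller named pieces.
AFFIX = r'__'
NAME = r'.+'
DUNDER_COMPOSED = re.compile(fr'{AFFIX}{NAME}{AFFIX}')

for s in ['__init__', '__', 'sqrt']:
    print(s, bool(DUNDER_VERBOSE.fullmatch(s)), bool(DUNDER_COMPOSED.fullmatch(s)))
```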

I must admit I expected the function call overhead to be a bigger disadvantage… To be fairer to regexes, we should account for the fact that the compiled regex is normally cached. I think this measurement is fairer:

$ python3.10 -m timeit -s "import re; regex = re.compile(r'__.+__')" "regex.fullmatch('__abcde__')"
1000000 loops, best of 5: 242 nsec per loop
$ python3.10 -m timeit -s "import re; regex = re.compile(r'__.+__')" "regex.fullmatch('__abcde__')"
1000000 loops, best of 5: 240 nsec per loop
$ python3.10 -m timeit -s "def is_dunder(s): return len(s) > 4 and s.startswith('__') and s.endswith('__')" "is_dunder('__abcde__')"
1000000 loops, best of 5: 230 nsec per loop
$ python3.10 -m timeit -s "def is_dunder(s): return len(s) > 4 and s.startswith('__') and s.endswith('__')" "is_dunder('__abcde__')"
1000000 loops, best of 5: 223 nsec per loop

For me in this case the regex solution is more robust and straightforward because the other solutions contain some information duplication (the prefix and postfix length). If we remove the duplication they would look like:

prefix = '__'
postfix = '__'

len(s) > len(prefix) + len(postfix) and s.startswith(prefix) and s.endswith(postfix)

A little history: I’m a 71-year-old stroke survivor and am just learning programming again. I lost it in the stroke.
A big THANKS for your time and help. I am learning.
This is not an exercise from a book; I just thought a list of the names directly usable in normal programming would be good. :slight_smile:

woodturner550


By Leonard Dye via Discussions on Python.org at 01Aug2022 16:40:

A little history: I’m a 71-year-old stroke survivor and am just
learning programming again. I lost it in the stroke.
A big THANKS for your time and help. I am learning.
This is not an exercise from a book; I just thought a list of the
names directly usable in normal programming would be good. :slight_smile:

For some added context, you could consider the default criteria for
import *: all public names i.e. all names not commencing with an
underscore. That excludes the dunder names, and also excludes “private”
names, which exist for use internally by classes etc.

Python itself does not care about the leading underscore, but it is a
near universal convention that “public” names, for use outside the
module/class commence with letters and “private” names commence with an
underscore. This provides a rule of thumb that indicates what you might
expect to be stable and what you should avoid.

It is also a much simpler test :slight_smile:
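A minimal sketch of that simpler test, using the math module from the original question (the name public_names is just for illustration):

```python
import math

# Keep only the "public" names: those that do not start with an underscore.
public_names = [name for name in dir(math) if not name.startswith('_')]
print(public_names)
```

This drops the dunder names and any single-underscore "private" names in one check.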

Cheers,
Cameron Simpson cs@cskk.id.au

Well, I’ve been working on the string way to get rid of the dunders in a list of commands in modules.

import math

result = dir(math) # Makes math dir in result
#print(result) # copy of list containing the math dir

def is_dunder(s):
    print(s)
    """Return True if string s is a dunder, otherwise False."""
    return len(s) > 2 and s.startswith('__') and s.endswith('__')
shorter_dir = [s for s in dir() if not is_dunder(s)]

print(shorter_dir) # final output

is_dunder(str(result)) # call function does not work as expected

Output is:

__annotations__
__builtins__
__cached__
__doc__
__file__
__loader__
__name__
__package__
__spec__
is_dunder
math
result

['is_dunder', 'math', 'result'] # this looks good but WRONG info

['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']

I’m close but not there yet :frowning: Any help is welcome; I don’t like giving up. This is how I have to learn: by doing. THANKS.
woodturner550

In your post please enclose the Python code and the output between triple backticks like this:

```
# Your Python code will be here.
```
```
Your output will be here.
```

You probably noticed that the text is shown formatted here (using Markdown). Unfortunately this mangles Python code into invalid code. Please check that we will be able to copy the code and run it the same way you run it.

Also it is not clear what output you expect. Please let us know what you want to get as the output.

I was just trying to get a list minus the dunders. (dunders and non dunder) in and (non dunders) out.
Thanks for the help. I will get this yet.
woodturner550

I hope this is right!

import math

result = dir(math) # Makes math dir in result
#print(result) # copy of list containing the math dir

def is_dunder(s):
print(s)
“”“Return True if string s is a dunder, otherwise False.”“”
return len(s) > 2 and s.startswith(‘‘) and s.endswith(’’)
shorter_dir = [s for s in dir() if not is_dunder(s)]

print(shorter_dir) # final output

is_dunder(str(result)) # call function does not work as expected


Output is:

annotations

builtins

cached

doc

file

loader

name

package

spec

is_dunder

math

result

[‘is_dunder’, ‘math’, ‘result’]

[‘doc’, ‘loader’, ‘name’, ‘package’, ‘spec’, ‘acos’, ‘acosh’, ‘asin’, ‘asinh’, ‘atan’, ‘atan2’, ‘atanh’, ‘ceil’, ‘copysign’, ‘cos’, ‘cosh’, ‘degrees’, ‘e’, ‘erf’, ‘erfc’, ‘exp’, ‘expm1’, ‘fabs’, ‘factorial’, ‘floor’, ‘fmod’, ‘frexp’, ‘fsum’, ‘gamma’, ‘gcd’, ‘hypot’, ‘inf’, ‘isclose’, ‘isfinite’, ‘isinf’, ‘isnan’, ‘ldexp’, ‘lgamma’, ‘log’, ‘log10’, ‘log1p’, ‘log2’, ‘modf’, ‘nan’, ‘pi’, ‘pow’, ‘radians’, ‘remainder’, ‘sin’, ‘sinh’, ‘sqrt’, ‘tan’, ‘tanh’, ‘tau’, ‘trunc’]

By Leonard Dye via Discussions on Python.org at 03Aug2022 23:16:

I hope this is right!

Something has eaten your indents anyway. Maybe something to do with
copy/paste?

The code also wants to be in backticks “`” (on my Mac, under the tilde
“~” at top left; note: not open-single-quote). You seem to have used
triple dots (“.”) instead of backticks. Example:

```
your code
goes here
```

I’ll reindent as I go below:

def is_dunder(s):
    print(s)
    """Return True if string s is a dunder, otherwise False."""
    return len(s) > 2 and s.startswith('__') and s.endswith('__')

You may want to comment out the print(s) above when happy.

Looks ok, though I would use len(s)>4 myself. On the premise that __
is not a dunder, but __x__ is. Adjust according to your own
intentions.

shorter_dir = [s for s in dir() if not is_dunder(s)]

This is also correct.

print(shorter_dir) # final output

The output looked good to me:

['is_dunder', 'math', 'result']

Then you have this:

is_dunder(str(result)) # call function does not work as expected

It isn’t clear what you expect. First, str(result) is a string
containing the textual representation of the entire list from result.
That commences with [ and ends with ], so is_dunder() will return
False.

Note that the interactive Python prompt >>> prints expression results
for you (if the expression result is not None). So interactively:

>>> is_dunder(str(result))
False

but in a programme it is just an expression. Which gets evaluated and
then discarded (because it isn’t assigned to anything).
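A short sketch of both points, reusing the names from the post (the variables text, flags and answer are added only for illustration):

```python
import math


def is_dunder(s):
    return len(s) > 4 and s.startswith('__') and s.endswith('__')


result = dir(math)

# str(result) is one long string that starts with '[' and ends with ']'.
text = str(result)
print(text[0], text[-1])
print(is_dunder(text))  # False

# Testing each name separately is what the list comprehension does.
flags = [is_dunder(name) for name in result]
print(any(flags))  # True

# In a script, capture or print the return value; otherwise it is discarded.
answer = is_dunder('__doc__')
print(answer)  # True
```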

Cheers,
Cameron Simpson cs@cskk.id.au

Many thanks for your help. This all started when I was working interactively with modules and their submodules. I got to wondering whether there was an easy way to remove the dunders from the lists. I still remember that in Perl we used to do things like that (just a shadow of what I knew before the stroke). That will be for another day, after I learn a LOT more. I have memory problems, so it takes me a while to grasp things.
woodturner550