Regex problem with "_"

I’m working on a personal assignment for my studies. Working with regex, I can handle everything except the entry “get_cache_token”. The problem is that the data I don’t want has _ at the beginning and the end. However, that example is one I also need to match, besides the names that are just lowercase. This is what I have so far.

(\b[a-z])([^_A-Z0-9]+)
That is: keep a lowercase letter at the beginning of a word, then one or more characters that are not _, A-Z or 0-9.
VALID:
loss
get_cache_token
INVALID:
_junk
__open
Patience
__main__

It’s all about the second and third _ in the VALID: data. Looked at groups but could not get it that way either.

Thanks.

I found out one way to do this. Not sure if it is the best.

(\b[a-z][^_A-Z0-9]+\B_[a-z]+\B_[a-z]+)

The \B = matches at a position that is NOT at the beginning or end of a word
The + = one or more times.
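
One way to check a pattern like this is to run it over the sample names one at a time. The sketch below assumes Python's re module and uses re.fullmatch so each name is tested on its own; PATTERN, SAMPLES and accepted are names made up for illustration. Tested this way, the pattern requires two underscores, so a plain lowercase name such as "loss" is not accepted.

```python
import re

# The pattern from the post above, unchanged.
PATTERN = re.compile(r"(\b[a-z][^_A-Z0-9]+\B_[a-z]+\B_[a-z]+)")

# Sample names from the VALID and INVALID lists (illustrative variable name).
SAMPLES = ["loss", "get_cache_token", "_junk", "__open", "Patience", "__main__"]

# fullmatch tests each name in isolation, from its first to its last character.
accepted = [name for name in SAMPLES if PATTERN.fullmatch(name)]
print(accepted)  # ['get_cache_token']
```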

By Leonard Dye via Discussions on Python.org at 22Aug2022 22:20:

> I found out one way to do this. Not sure if it is the best.
>
> (\b[a-z][^_A-Z0-9]+\B_[a-z]+\B_[a-z]+)
>
> The \B = matches at a position that is NOT at the beginning or end of a word
> The + = one or more times.

This is overly complex. Do I understand that you want names not
commencing with an underscore and not ending with an underscore? How
about:

\b[a-z][_a-z0-9]*[a-z0-9]\b

which requires a leading lowercase letter, as many intervening
underscores or lowercase letters or digits as needed, and a trailing
lowercase letter or digit.
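
As a rough check, the pattern above can be run over the sample names from the original post. This is a minimal sketch assuming Python's re module; NAME, TEXT and found are illustrative names, and the names are joined into one block of text the way they would appear in a regex tester.

```python
import re

# The suggested pattern: leading lowercase letter, a permissive middle,
# and a trailing lowercase letter or digit, all between word boundaries.
NAME = re.compile(r"\b[a-z][_a-z0-9]*[a-z0-9]\b")

# VALID and INVALID samples from the original post, one per line.
TEXT = "\n".join(["loss", "get_cache_token", "_junk", "__open", "Patience", "__main__"])

# The pattern has no groups, so findall returns the full text of each match.
found = NAME.findall(TEXT)
print(found)  # ['loss', 'get_cache_token']
```

Note that the pattern needs at least two characters, so a single-letter name would not match.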

The thing to remember with regexps is that components are (a) greedy by
default but (b) only match as much as possible while still allowing the
following components to also match.

So a word like “get_cache_token” would match:

[a-z]       g
[_a-z0-9]*  et_cache_toke
[a-z0-9]    n

The middle bit will try to match “et_cache_token”, but the matcher will
then discover that the last [a-z0-9] cannot match, so it will
backtrack the middle to match “et_cache_toke” and then the “n” can
match.
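
The split described above can be made visible by wrapping each component in a capturing group. This is only an illustrative sketch assuming Python's re module; the groups and the names PARTS, first, middle and last are additions for the demonstration and are not part of the suggested pattern.

```python
import re

# Same pattern, with each of the three components wrapped in a group.
PARTS = re.compile(r"\b([a-z])([_a-z0-9]*)([a-z0-9])\b")

m = PARTS.fullmatch("get_cache_token")
first, middle, last = m.groups()

# The greedy middle gives back one character so the last component can match.
print(first)   # g
print(middle)  # et_cache_toke
print(last)    # n
```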

So by constraining the start and end of the word to your required
characters, and constraining the middle to the “middle” allowed
characters, you get what you want.

Cheers,
Cameron Simpson cs@cskk.id.au


In your case those in blue are not a match, as you will see in the lower left-hand side. In my case all the VALID test items are yellow and none of the INVALID test items have color (you can’t see that in this shot). Notice the difference in the lower left panels.

Thanks again.