List of words starting with a given letter from a sentence (string)

Hello,

I am trying to create a list of all the words starting with the letter "m" in my string, but I can't get the result as a single list (I only get an undefined type or multiple lists). I want to use len() on the result, but I can't with my current results.

wikipedia_text = "The Liplje Monastery is a Serbian Orthodox monastery dedicated to the Annunciation and located in the Municipality of Teslić in northern Republika Srpska. It stands at the widest part of a narrow gorge through which a little river named Bistrica flows. The earliest mention of the monastery is found in a chronicle dated to the second half of the 15th century. The monks of Liplje were active in transcribing religious books during the 17th century."

1st one =

Input =
def word_letter(x):
    list = []
    for word in x.split(' '):
        if word.startswith("m") or word.startswith("M"):
            list.append(word)
            print(list)

word_m = word_letter(wikipedia_text)
word_m

Output :

['Monastery']
['Monastery', 'monastery']
['Monastery', 'monastery', 'Municipality']
['Monastery', 'monastery', 'Municipality', 'mention']
['Monastery', 'monastery', 'Municipality', 'mention', 'monastery']
['Monastery', 'monastery', 'Municipality', 'mention', 'monastery', 'monks']

2nd one :

Input =

def word_letter(x):
    for word in x.split(' '):
        if word.startswith("m") or word.startswith("M"):
            print(word)

word_m = word_letter(wikipedia_text)
word_m

Output =

Monastery
monastery
Municipality
mention
monastery
monks

Could any of you help me with that?
Thank you very much!

Maybe return the list from the function instead of printing it:

def word_letter(x):
  lst = []
  for word in x.split(' ') :
    if word.startswith('m') or word.startswith('M'):
      lst.append(word)
  return lst
word_m = word_letter(wikipedia_text)
word_m
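To spell out why the original attempts did not give a usable list: a function that only prints has no return value, so word_m ends up as None and len(word_m) fails. Here is a minimal sketch of the difference, using a short made-up sentence instead of the full wikipedia_text (sample, print_only and return_list are names invented for this illustration):

```python
# Short made-up sentence standing in for wikipedia_text (an assumption).
sample = "The monks of Liplje Monastery made many books"

def print_only(text):
    # Prints each match but has no return statement, so it returns None.
    for word in text.split(' '):
        if word.startswith('m') or word.startswith('M'):
            print(word)

def return_list(text):
    # Collects the matches and hands the list back to the caller.
    matches = []
    for word in text.split(' '):
        if word.startswith('m') or word.startswith('M'):
            matches.append(word)
    return matches

printed = print_only(sample)    # prints four words, one per line
returned = return_list(sample)

print(printed)        # None
print(returned)       # ['monks', 'Monastery', 'made', 'many']
print(len(returned))  # 4
```

Only the second version gives something that len() can work with.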

You could use any() to avoid repeating word.startswith:

def word_letter(x):
  lst = []
  for word in x.split(' ') :
    if any(word.startswith(j) for j in ('m', 'M')):
      lst.append(word)
  return lst
word_m = word_letter(wikipedia_text)
word_m

we could also use a list comprehension here,

def word_letter(x):
  return [i for i in x.split(' ') if any(i.startswith(j) for j in ('m', 'M'))]
word_m = word_letter(wikipedia_text)
word_m

If you do not need a named function, a lambda can be called directly:

(lambda x: [i for i in x.split(' ') if any(i.startswith(j) for j in ('m', 'M'))])(wikipedia_text)

The variables could be given more descriptive names:

(lambda paragraph: [word for word in paragraph.split(' ') if any(word.startswith(character) for character in ('m', 'M'))])(wikipedia_text)

there might be a way to do this without using for loops, with the help of slicing.

found one way using numpy, but it is a bit repetitive

import numpy as np
x = np.array(wikipedia_text.split(' '))
list(x[np.char.startswith(x, ('M'))]) + list(x[np.char.startswith(x, ('m'))])

We could use map and reduce to cut the repetition, although the code ends up longer (x and np come from the previous snippet):

from functools import reduce
import operator as op
list(reduce(op.add, map(lambda y: list(x[np.char.startswith(x, y)]), ('m', 'M'))))

one variation is,

op.add(*map(lambda y: list(x[np.char.startswith(x, y)]), ('m', 'M')))
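One thing to be aware of with the NumPy versions above: they concatenate the 'M' matches and the 'm' matches, so the words no longer come out in sentence order. A possible variation, sketched here on a short made-up sentence rather than wikipedia_text, lowercases the array first and uses a single boolean mask, which keeps the original order:

```python
import numpy as np

# Short made-up sentence standing in for wikipedia_text (an assumption).
sample = "The monks of Liplje Monastery made many books"
words = np.array(sample.split())

# Lowercase every element first, then build one boolean mask.
mask = np.char.startswith(np.char.lower(words), 'm')

# Boolean-mask indexing keeps the words in their original order.
print(words[mask].tolist())  # ['monks', 'Monastery', 'made', 'many']
```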

.startswith can accept a tuple, so you can say word.startswith(('m', 'M')).
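For illustration, here is a small sketch of the tuple form inside a function. The sentence is a made-up stand-in for wikipedia_text, and words_starting_with and its prefixes parameter are hypothetical names:

```python
# Short made-up sentence standing in for wikipedia_text (an assumption).
sample = "The monks of Liplje Monastery made many books"

def words_starting_with(text, prefixes=('m', 'M')):
    # str.startswith returns True if the word starts with any prefix in the tuple.
    return [word for word in text.split() if word.startswith(prefixes)]

word_m = words_starting_with(sample)
print(word_m)       # ['monks', 'Monastery', 'made', 'many']
print(len(word_m))  # 4
```

Note that the argument has to be a tuple (or a single string); passing a list such as ['m', 'M'] raises a TypeError.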


Please wrap any code in triple backticks to retain its formatting:

```
for i in range(3):
   print('Hello')
```

Another option could be using lower (or upper):

word.lower().startswith('m')

Which in a list comprehension could be reasonably readable:

[word for word in wikipedia_text.split() if word.lower().startswith('m')]
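A side note on split() versus split(' '), since both appear in this thread: with no argument, split() treats any run of whitespace (including newlines) as one separator, while split(' ') splits on every single space and can produce empty strings. A small sketch with a deliberately messy made-up string:

```python
# Made-up string with a double space, a newline and a triple space.
messy = "monks  of\nLiplje   monastery"

print(messy.split(' '))
# ['monks', '', 'of\nLiplje', '', '', 'monastery']

print(messy.split())
# ['monks', 'of', 'Liplje', 'monastery']

# The lower() approach from above, applied to the cleanly split words.
word_m = [word for word in messy.split() if word.lower().startswith('m')]
print(word_m)  # ['monks', 'monastery']
```

For the single-line wikipedia_text in the question both give the same words, so this only matters for less tidy input.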

we could use compress from itertools

from itertools import compress
list(compress((paragraph := wikipedia_text.split()), 
              [i.startswith(('m', 'M')) for i in paragraph]))
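In the same spirit as compress, the built-in filter does the selection without building a separate list of booleans. This is only a sketch on a short made-up sentence, and starts_with_m is a helper name invented here:

```python
# Short made-up sentence standing in for wikipedia_text (an assumption).
sample = "The monks of Liplje Monastery made many books"

def starts_with_m(word):
    # True for words beginning with 'm' or 'M'.
    return word.startswith(('m', 'M'))

# filter yields only the words for which the predicate is True.
word_m = list(filter(starts_with_m, sample.split()))
print(word_m)  # ['monks', 'Monastery', 'made', 'many']
```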

One issue is that lists do not support indexing with a boolean mask (x > 1 raises a TypeError for a list); otherwise we could do,

x = [1, 2, 3]
x[x > 1]

[2, 3]

maybe this could also work, if boolean indexing is added to the language in the future

x = wikipedia_text.split()
x[x.startswith(('m', 'M'))]

['Monastery', 'monastery', 'Municipality', 'mention', 'monastery', 'monks']

directly on a python list

What a weird idea… Why would you want to overload comparison operators in such an ambiguous way? If x > 1 should produce something meaningful, what should x == 1 do? Would you like to break the current functionality?

This way is much more clear:

>>> x = [1, 2, 3]
>>> [item for item in x if item > 1]
[2, 3]

…and boolean indexing works fine :)

>>> x[False]
1
>>> x[True]
2

False works as 0, True as 1.
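The reason is that bool is a subclass of int. That same fact gives a quick way to count matches, which may be what the original len() question was after. A short sketch on a made-up sentence:

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
print(isinstance(True, int))  # True
print(True + True)            # 2

# Short made-up sentence standing in for wikipedia_text (an assumption).
sample = "The monks of Liplje Monastery made many books"

# Summing the booleans counts how many words start with 'm' or 'M'.
count = sum(word.startswith(('m', 'M')) for word in sample.split())
print(count)  # 4
```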

I mean something like this,

class AlterGTList(list):
    def __gt__(self, a):
        match a:
            case int():
                return [i > a for i in self]
            case _:
                return super().__gt__(a)

class AlterSliceList(list):
    def __getitem__(self, x):
        match x:
            case list():
                return [key for key, value in dict(zip(self, x)).items() if value]
            case _:
                return super().__getitem__(x)
x = AlterGTList([1, 2, 3])
y = AlterSliceList([3, 4, 5])
y[x > 2]

[5]

One problem is that I would like the case to match only a list of bool, but I am not sure how to do that:

case list[bool]:

ListBool = list[bool]
case ListBool():

Neither of these works.

Ok, I understand that you want comparison of a list with a value to result in a list of bools. My point was: What would x == 2 (in your example) produce? It would either be inconsistent with x > 2 or it will break the current functionality.


You are creating a redundant dict in your AlterSliceList.__getitem__() and using very confusing variable naming. The straightforward list() branch would be:

return [value for value, include in zip(self, x) if include]
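To show why the dict matters: dict keys are unique, so repeated values in the list collapse into one entry and the last flag wins. A small sketch with made-up data comparing the two approaches:

```python
values = [2, 2, 5]
mask = [True, False, True]

# dict keys are unique, so the second 2 overwrites the first one's flag.
via_dict = [key for key, value in dict(zip(values, mask)).items() if value]

# Pairing values and flags directly keeps every element separate.
via_zip = [value for value, include in zip(values, mask) if include]

print(via_dict)  # [5]
print(via_zip)   # [2, 5]
```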

By "break the current functionality", do you mean that something like,

[2, 2, 2] == 2

would give True if we have a similar __eq__ method, and in the current case it gives False?

The current behaviour:

>>> [1, 2, 3] == 2
False

For your proposal to be consistent and make sense it must change to this:

>>> [1, 2, 3] == 2
[False, True, False]

…and what about this? Should we get the result we all are used to?

>>> [1, 2, 3] == [2, 2, 3]
False

Or should we get this one?

>>> [1, 2, 3] == [2, 2, 3]
[False, True, True]
>>> # To get the original behaviour you would need to write:
>>> all([1, 2, 3] == [2, 2, 3])
False

…and more problems will probably show up.
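For comparison, here is a sketch of how both meanings can already be written with plain lists, keeping == as a single True/False answer and spelling out the element-wise version explicitly (the variable names are made up for the example):

```python
left = [1, 2, 3]
right = [2, 2, 3]

# Current behaviour: == compares the lists as a whole.
print(left == right)  # False

# Element-wise comparison, written out explicitly.
elementwise = [a == b for a, b in zip(left, right)]
print(elementwise)       # [False, True, True]
print(all(elementwise))  # False

# Comparing every element with a single value.
print([item == 2 for item in left])  # [False, True, False]
```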