Natural Language Processing

I don’t understand how to write the code for 2 steps of my task. I have been thinking about it for 2 weeks, but I have run out of time. Can you help me?

3.1. Get a ‘window’ from the list of stems for computing LR. The ‘window’ is simply a subset of your list of stems which starts at index i (inclusive) and ends with index i + n (exclusive).
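Step 3.1 is plain list slicing: `stems[i:i + n]` includes index i and stops just before index i + n. Here is a minimal sketch. The helper name `get_window` is hypothetical and is not part of the assignment, and the short list is made up only to show the slice boundaries:

```python
def get_window(stems, i, n):
    """Return stems[i], ..., stems[i + n - 1] (start inclusive, end exclusive)."""
    return stems[i:i + n]


# Made-up stand-in list, only to show where the window starts and stops.
stems = ["a", "b", "c", "d", "e", "f"]
print(get_window(stems, 1, 3))  # ['b', 'c', 'd']
```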

3.2. Compute LR by dividing the number of unique stems in the window by the window size n and add it to your variable initialized in Step 2.
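For Step 3.2, a `set` drops repeated stems, so `len(set(window))` is the number of unique stems. Here is a minimal sketch with a made-up three-stem window. The name `lr_sum` for the Step 2 variable is an assumption:

```python
n = 3          # window size (the assignment uses 100; 3 keeps the demo small)
lr_sum = 0.0   # the running total initialized in Step 2

window = ["cat", "dog", "cat"]   # made-up window of n stems
lr = len(set(window)) / n        # 2 unique stems / 3 = 0.666...
lr_sum += lr                     # add this window's LR to the total
print(round(lr_sum, 3))  # 0.667
```

In Python 3, `/` always returns a float, so no cast is needed before dividing.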

Here are all the steps, but I only need help with 3.1 and 3.2:

  1. Define the window width n = 100.

  2. Initialize a variable for summing LR values from all iterations of the loop (described in Step 3.2).

  3. Use a for loop to iterate over word indices i from 0 to len(stems) - n.
    (Note that we use stems instead of words so that the LR value is less influenced by the inflectional morphology of the language, which would make the lexicon seem richer than it actually is.)
    In each iteration of the loop, do Steps 3.1 and 3.2:
    3.1. Get a ‘window’ from the list of stems for computing LR. The ‘window’ is simply a subset of your list of stems which starts at index i (inclusive) and ends with index i + n (exclusive).
    3.2. Compute LR by dividing the number of unique stems in the window by the window size n and add it to your variable initialized in Step 2.

  4. After the loop, compute the mean of the LR values (their sum divided by the number of loop iterations) to get the Mean Lexical Richness (MLR), and print it.
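Putting Steps 1 to 4 together, a minimal sketch could look like the following. The function name `mean_lexical_richness` is hypothetical. It also assumes that "from 0 to len(stems) - n" is meant inclusively, so the last full window (starting at len(stems) - n) is counted; under the exclusive reading, use `range(len(stems) - n)` and divide by that count instead.

```python
def mean_lexical_richness(stems, n=100):
    """Average share of unique stems over all windows of width n (Step 1: n = 100)."""
    if len(stems) < n:
        raise ValueError("need at least n stems to form one window")

    lr_sum = 0.0                        # Step 2: running total of LR values
    # Assumption: the upper bound len(stems) - n is inclusive, hence the + 1.
    num_windows = len(stems) - n + 1
    for i in range(num_windows):        # Step 3: i = 0, 1, ..., len(stems) - n
        window = stems[i:i + n]         # Step 3.1: start i inclusive, end i + n exclusive
        lr_sum += len(set(window)) / n  # Step 3.2: unique stems / window size
    return lr_sum / num_windows         # Step 4: mean of the LR values


if __name__ == "__main__":
    # Made-up stems with a small window, just to exercise the loop.
    sample = ["go", "went", "go", "cat", "cat", "dog"]
    print(mean_lexical_richness(sample, n=3))
```

With a real list, `print(mean_lexical_richness(stems))` uses n = 100 from Step 1; the list then needs at least 100 stems, otherwise no window fits.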

Here are some stems (words?):

stems = ['cat', 'floor', 'verb', 'books', 'Inspector-General', 
         'caught', 'purple', 'hungry', 'table', 'artistic',
         'window', 'quickly', 'tiny', 'punch', 'swim', 'fox',
         'kettle', 'southerly', 'fill', 'wooden']
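When typing a list like this by hand, a missing comma between two string literals is easy to miss, because Python silently joins adjacent literals into one string. A quick length check catches it. This sketch uses the list above with a comma after 'books':

```python
stems = ['cat', 'floor', 'verb', 'books', 'Inspector-General',
         'caught', 'purple', 'hungry', 'table', 'artistic',
         'window', 'quickly', 'tiny', 'punch', 'swim', 'fox',
         'kettle', 'southerly', 'fill', 'wooden']
print(len(stems))  # 20

# Without the comma, the two literals become a single list item.
joined = ['books' 'Inspector-General']
print(joined)  # ['booksInspector-General']
```

Note that 20 stems are fewer than n = 100, so trying the Step 3 loop on this list requires a smaller window width.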

Here is how you can get the window of words starting at index 5 (the sixth word) and
continuing through index 9:

window = stems[5:10]
# gives ['caught', 'purple', 'hungry', 'table', 'artistic']

Remember:

  • the first word is at position 0;
  • the first index (5 in this example) is included;
  • the second index (10 in this example) is excluded;
  • the width of the window is 10-5 = 5.
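The same slicing rule, with the start moved forward by one each time, produces every window the loop in Step 3 needs. This sketch uses a made-up 8-item list and assumes the last full window is included:

```python
# Made-up stand-in list; any list of stems works the same way.
stems = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"]
n = 3

# Each window starts at i (included) and stops before i + n (excluded).
windows = [stems[i:i + n] for i in range(len(stems) - n + 1)]
print(len(windows))  # 6 (start indices 0 to 5)
print(windows[0])    # ['s0', 's1', 's2']
print(windows[-1])   # ['s5', 's6', 's7']
```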

Here is how you can count the number of unique words in the window:

count = len(set(window))
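Dividing that count by the window width gives the LR of a single window, as in Step 3.2. For a full window, `len(window)` equals n. The second window below is made up to show how repeats lower the score:

```python
window = ['caught', 'purple', 'hungry', 'table', 'artistic']
count = len(set(window))   # 5: no stem repeats in this window
lr = count / len(window)   # 5 / 5 = 1.0
print(lr)  # 1.0

# Made-up window with repeats: 3 unique stems out of 5.
repeated = ['cat', 'cat', 'fox', 'fox', 'tiny']
print(len(set(repeated)) / len(repeated))  # 0.6
```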

Good luck!
