Add a module named "suggestion"

Python currently uses a module named _suggestion in traceback.py to offer suggestions for an exception, where possible, when an error occurs. I think this module could be made public: it is written in C, so it is faster than difflib, and programmers could use it to give suggestions to their users more quickly and more accurately.

The arguments I think are needed:

  • wrong_name: the name that is wrong. Must be a str.
  • possible_list: the candidate names. Must be a list[str].
  • max_list_length: the maximum length of possible_list. If possible_list is longer than max_list_length, the function returns None. Must be an int; defaults to 750.
  • max_string_length: the maximum length of wrong_name. If wrong_name is longer than max_string_length, the function returns None. Must be an int; defaults to 40.

If a close enough name is found in possible_list, the function returns that name. Otherwise it returns None.
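To make the proposed signature concrete, here is a minimal pure-Python sketch. The function name `suggest`, the plain Levenshtein distance, and the one-third cutoff are all assumptions made for illustration; this is not the algorithm CPython's C helper actually uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: each insertion, deletion or substitution costs 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(wrong_name, possible_list, max_list_length=750, max_string_length=40):
    """Return the closest name in possible_list, or None (hypothetical API)."""
    # Give up early on oversized inputs, as described in the argument list.
    if len(possible_list) > max_list_length:
        return None
    if len(wrong_name) > max_string_length:
        return None

    best_name = None
    best_distance = None
    for candidate in possible_list:
        distance = levenshtein(wrong_name, candidate)
        # Assumed cutoff: allow at most one edit per three characters.
        cutoff = max(len(wrong_name), len(candidate)) // 3
        if distance > cutoff:
            continue
        if best_distance is None or distance < best_distance:
            best_name, best_distance = candidate, distance
    return best_name


print(suggest("lenght", ["length", "width", "depth"]))  # length
print(suggest("abc", ["xyz"]))                          # None
```

On a tie, this sketch keeps the first candidate it saw, so the result depends on the order of possible_list.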

The suggestion module would be a Python script. It first tries to import the _suggestion module. If that fails, it falls back to a pure-Python implementation (just like traceback.py does).
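The import-with-fallback pattern could look roughly like this. The name `suggest` and its presence in `_suggestion` are hypothetical, and `difflib.get_close_matches` stands in for the pure-Python implementation only to keep the sketch short.

```python
import difflib

try:
    # Hypothetical: assumes the C accelerator exposes a function named suggest.
    from _suggestion import suggest
except ImportError:
    def suggest(wrong_name, possible_list):
        """Stand-in fallback used when the C accelerator is unavailable."""
        matches = difflib.get_close_matches(wrong_name, possible_list, n=1)
        return matches[0] if matches else None


print(suggest("prnt", ["print", "input"]))
```

Callers only ever import `suggest` from the public module, so they do not need to know which implementation they got.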

This is very application-specific. E.g. a candidate that is exactly the same as the wrong name is skipped:

_generate_suggestions(['aaa', 'aab'], 'aaa')
'aab'

Also, I think this is subject to improvement.
And improvement in this case means different output.
Thus, not having a contract with the user does seem like a good idea.


As for the “goodness” of the results for fuzzy suggestions:

_generate_suggestions(['aab', 'aacd'], 'aac')
'aab'

difflib.get_close_matches('aac', ["aab", "aacd"])[0]
'aacd'

I have known about this for a while now, and although it is very fast (for short strings) I have never used it - Levenshtein distance rarely gives me what I want in fuzzy applications.

I would recommend difflib if performance is not a big issue - its algorithm is much friendlier to human input.
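A small sketch of why the two approaches can disagree on the 'aac' example. It assumes a plain unit-cost Levenshtein distance, which may differ from the weighted distance the C helper uses; the `difflib.SequenceMatcher` calls are the real stdlib API.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (an assumption, not CPython's exact weighting)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


wrong = "aac"
for candidate in ["aab", "aacd"]:
    distance = levenshtein(wrong, candidate)
    ratio = difflib.SequenceMatcher(None, wrong, candidate).ratio()
    print(candidate, distance, round(ratio, 3))
# aab 1 0.667
# aacd 1 0.857
```

Under unit costs both candidates are one edit away, so an edit-distance approach has to break the tie some other way (for example by list order), while difflib's ratio clearly prefers 'aacd' because it contains all of 'aac'.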

@Locked-chess-official
Having said that, I appreciate that there might be some scope for levelling up the string alignment tools in the stdlib. This has also been briefly mentioned in `seqtools` (`alglib` / `seqlib`) as a desirable place for better / new algorithms.

What is your motivation for this?
Is it that you are not happy with difflib’s results?
Is performance a bottleneck for you?
What is your application?