`seqtools` (`alglib` / `seqlib`)

These things can all usefully exist in packages, and I don’t see any real need for them to be in the stdlib.

It’s actually nicer for them to exist separately, because there are big performance enhancements that require e.g. specific CPU instructions. The development timeline of CPython takes a while to support such things[1], whereas a 3rd-party library can take advantage quickly.


  1. if it’s even worth the effort

Much thanks.

Creating the module with a couple of generic sequence utilities, such as an equivalent of itertools.islice, would be the most sensible start.
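To make the idea concrete, here is a minimal sketch of what a sequence-aware counterpart of itertools.islice might look like. The name `seq_islice` and its signature are my own assumptions for illustration; nothing like it exists in the stdlib. Unlike itertools.islice, it indexes directly instead of consuming items, so negative indices and negative steps work.

```python
from collections.abc import Sequence


def seq_islice(seq, start=None, stop=None, step=1):
    """Hypothetical helper: lazily yield seq[start:stop:step] without copying."""
    if not isinstance(seq, Sequence):
        raise TypeError("seq_islice expects a Sequence")
    # slice.indices() clamps the bounds to len(seq) and resolves
    # negative indices, exactly as ordinary slicing does.
    for i in range(*slice(start, stop, step).indices(len(seq))):
        yield seq[i]


# Usage: works on any Sequence, including range and str.
evens = list(seq_islice(range(10), 2, 8, 2))
tail = list(seq_islice("abcdef", -3))
backwards = list(seq_islice([1, 2, 3, 4, 5], None, None, -1))
```

Whether such a helper should return an iterator or a lazy view is a design choice this sketch does not try to settle.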

The module should be the place for basic generic sequence methods (of complexity similar to the components of itertools). In other words, utilities for very common sequence operations that are useful in a large number of contexts.

Possible exceptions are algorithms that inevitably need to be in the stdlib and are worth making sequence-agnostic, or sequence algorithms that are needed by some important component - e.g. the graphlib case.


If it so happens that this module is created, you can try proposing various additions separately and see how it goes.

Maybe some flexible sequence alignment algorithm has a chance in the future - it has a fairly wide scope of applications and it could, for example, be useful for suggestive error messages in stdlib.


Completely forgot. Sequence alignment already exists in stdlib - difflib.


Thanks! This must be difflib.SequenceMatcher, which I didn’t know about (even though I have used difflib.unified_diff). It looks nice and useful.

Never looked at what it actually does, just used it on a few occasions.

It does not provide LCS; it is more of a human-friendly diff based on Gestalt pattern matching (see the Wikipedia article of that name).
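For reference, a small sketch of what difflib.SequenceMatcher exposes today. The input strings are made up for illustration; the calls are the documented public API.

```python
from difflib import SequenceMatcher

a = "abxcd"
b = "abcd"

# First argument is the optional "junk" predicate; None disables it.
sm = SequenceMatcher(None, a, b)

# Similarity in [0, 1]: 2 * matched_elements / (len(a) + len(b)).
ratio = sm.ratio()

# Contiguous matching blocks as (a_index, b_index, size) triples;
# the last one is always a zero-size sentinel.
blocks = [tuple(m) for m in sm.get_matching_blocks()]

# Longest contiguous match within the given index ranges.
longest = sm.find_longest_match(0, len(a), 0, len(b))
```

Note that the result is a set of contiguous matching blocks and a similarity ratio, not a longest common subsequence as such.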

Nevertheless, endeavours in this direction should take this into account.
In the long run, I think there is a possibility to upgrade it.

For example, incorporating a different method while keeping backwards compatibility. Maybe even coding it in C and migrating it to seqtools.

However, there would need to be a good reason for it, as this does not seem to be trivial work and it would add complexity that someone would need to maintain.

In short, far out of scope of this thread…

Nevertheless, thank you for mentioning this.

I agree, but at the same time, certain algorithms that satisfy certain conditions can be sensible to implement in the standard library.

If something is already implemented to a certain degree, then upgrading and/or exposing it to the user could be worthwhile.

An algorithm that has reached a near-final theoretical version could be a good candidate if it both satisfies a real need from users and can be exploited by the standard library / Python core.

Just came across the fact that a Levenshtein distance implementation already exists in the standard library, in the traceback module. A C implementation exists as well.
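For anyone unfamiliar with it, here is a minimal sketch of Levenshtein distance over arbitrary sequences. This is the textbook dynamic-programming version with unit costs, written for illustration; it is not the stdlib's internal implementation, and the function name is my own.

```python
def levenshtein(a, b):
    """Textbook edit distance between two sequences, with unit costs."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Usage: works on strings and on any other sequences of comparable items.
d_str = levenshtein("kitten", "sitting")
d_list = levenshtein([1, 2, 3], [1, 3])
```

It only relies on len(), iteration and equality, which is the kind of sequence-agnostic interface being discussed here.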

If a module like the one proposed is introduced, then I think localizing, upgrading and moving the string alignment code for two sequences into seqtools could be a reasonable project in the future, where the cost of the added complexity might be covered by the various benefits.

So although I don’t think it is happening any time soon, your suggestion is quite spot on as to what could potentially end up in such a module in the long run if it existed.


Thanks @dg-pb :slight_smile:
