Took a step back from implementing new Matcher with Automaton and did some work to improve the current one.
For the worst case with autojunk=True (which this improvement is targeted at):
| difflib’s current | new algorithm | |
|---|---|---|
| size 10K | 10.6 s | 0.2 s |
| size 15K | 24.1 s | 0.4 s |
| size 20K | 60.5 s | 0.8 s |
| size 50K | ~ 13 m | 6.6 s |
| size 100K | ~ 1.5 h | 31 s |
- Memory increase is reasonable.
- Additional complexity is average.
As the default Matcher isn’t changing, maybe the improvement is worth it?
I think it might pay off in the long run - it would solve 99% of related issues.