Include time and space complexity in documentation

Few things that might be useful to add:

  1. str.rfind doesn’t use two-way and is subject to O(n^2) worst case. This implements it, but uncertain when it is going to merged - not an issue in practice.
  2. difflib’s longest common substring implementation is worst case O(n^2). I think this is the only algorithm in stdlib from the family of sequence alignment (apart from private _suggestions Levenstein DP score). Thought of replacing it with O(n) so to have a proper LCSUB that users can rely on (this has been brought up several times), but decided to wait until there is a dedicated place for sequence algorithms. Current implementation is good enough for the difflib’s approach.
  3. list.sort is adaptive - O(n) best case. A lot of cases run faster than O(nlogn).
2 Likes

Thank you.

Regarding the word comprehensive. Perhaps a language barrier, but I understand that to mean that it covers a large portion of of the language and stdlib. Certainly more than anything else.

Correctness is a separate issue addressed in the Why Trust section. I wanted to be transparent in how it was done. If you don’t trust coding agents then you can calibrate your trust based on that. Over time, if people point out false information, accuracy will get better. And over time the agents and models will also get better, and I plan to do verification runs when new generations of those show up. Personally I would bet that it is at least 80% accurate, and maybe even up to 95%. But time will tell; I am prepared to change my mind.

I realized that when I asked the agents to review, I did not explicitly point all the agents to the corresponding ship branch. I remember some reading the main branch. But I don’t remember anything looking at pull requests. Definitely something to lock down and keep an eye on.

I have had to implement Damerau-Levenshtein distance 2-3 times in the last 10 years, so it would be nice to get a good implementation in the stdlib.

I’ve struggled a bit with wording here and there. It should be concise, but not misleading one direction or another.

I am currently planning to work on this again on Thursday (a bit busy with other stuff at the moment). I’ll try to address all the feedback items then.

18 posts were split to a new topic: Algorithmic complexity of difflib

Should the difflib discussion be moved to a new thread?

5 Likes

Done to Algorithmic complexity of difflib .

4 Likes