This is a topic that never ends as people keep coming up with more.
I think it better if I let them and get out of the way.
@barry-scott presents yet another way to do what was requested, and his code reminds me of how I might have done this ages ago in C with pointers, or perhaps in PASCAL. It works and uses mainly low-level, inexpensive constructs. But, of course, he could also hand us a .pyc file of intermediate code, or write it in C.
I tend to agree with @brass75 that humans are often best off writing code at higher levels of abstraction rather than re-inventing the wheel in detail every time. If run-time is critical, you can then see which few parts might best be tightened up or replaced, rather than deciding to do it in assembler in the first place. Using regular expressions can be quite expensive, as I saw when I dug up the code yesterday, but a regex is a bit like a Swiss Army knife: a flexible tool that can do many things, not just one simple thing. I use such functionality when I can.
As for @Paddy3118, we do not disagree. I said that if you assume a fixed format, you can write a more focused and hopefully faster algorithm. I was also not sure you could depend on that, as in the PDF example: retrieving these file names may pick up others, or a human can mess things up afterwards.
I will point out a detail. The issue is about whether some operation is done a little or a lot. I see two potential steps here.
- After getting a list of what to work with, filter through it and select the ones to keep. This operation takes $O(N)$ (if the equation is not rendering, that says order of N) steps, which grows linearly as N rises.
- Now sort the remaining items using the sort key chosen. Which sorting algorithm you use matters here. Some may be $O(N^2)$ (order of N squared), which potentially means the key or comparison function is called many times. Other sort algorithms do better, such as $O(N \log N)$ (N times the logarithm of N), and so on. A small sketch of both steps follows the list.
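Roughly, in Python, the two steps might look like this. The names and the pattern below are just placeholders for whatever the real file-name format is, not anything from the earlier posts:

```python
import re

# Placeholder data and pattern; substitute the real format here.
names = ["report_10.pdf", "notes.txt", "report_2.pdf", "report_1.pdf"]
pattern = re.compile(r"report_(\d+)\.pdf")

# Step 1: filter -- one pass over the list, O(N).
kept = [(name, m) for name in names
        if (m := pattern.fullmatch(name)) is not None]

# Step 2: sort by the chosen key -- here the integer value of the
# captured digits, so "report_2" sorts before "report_10".
kept.sort(key=lambda pair: int(pair[1].group(1)))

print([name for name, _ in kept])
# ['report_1.pdf', 'report_2.pdf', 'report_10.pdf']
```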
The result is that a key function is in some sense more expensive than just checking each name to see whether it fits the pattern. Memoization techniques might help, as they would eventually store N results and serve them as needed in constant time from a dictionary.
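A minimal sketch of that memoization idea, using `functools.lru_cache` over a hypothetical key function (the pattern is again a placeholder). One caveat: Python's built-in sort already calls the key exactly once per item, so caching mostly pays off when the same names get keyed repeatedly, for example across several sorts:

```python
import re
from functools import lru_cache

# Placeholder pattern; the real one depends on the file-name format.
pattern = re.compile(r"report_(\d+)\.pdf")

@lru_cache(maxsize=None)
def sort_key(name):
    """Extract the numeric part of a name; cached results come back in O(1)."""
    m = pattern.fullmatch(name)
    return int(m.group(1)) if m else float("inf")

names = ["report_10.pdf", "report_2.pdf", "report_1.pdf"]
print(sorted(names, key=sort_key))
# ['report_1.pdf', 'report_2.pdf', 'report_10.pdf']
```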
I await someone writing a version that implements an entire sort algorithm with simple commands and builds the keys in-line as a parallel data structure …
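In case it helps, the "parallel data structure" half of that is the old decorate-sort-undecorate pattern: build the keys once into a list of (key, name) pairs and sort that. This sketch still leans on the built-in sort rather than a hand-rolled one, and the data and pattern are placeholders as before:

```python
import re

# Placeholder data and pattern, as in the earlier sketches.
names = ["report_10.pdf", "report_2.pdf", "report_1.pdf"]
pattern = re.compile(r"report_(\d+)\.pdf")

# Decorate: build a parallel structure of (key, name) pairs up front.
decorated = [(int(pattern.fullmatch(n).group(1)), n) for n in names]

# Sort: tuples compare on the key first, so no key function is needed here.
decorated.sort()

# Undecorate: strip the keys back off.
print([n for _, n in decorated])
# ['report_1.pdf', 'report_2.pdf', 'report_10.pdf']
```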
A key question in these situations is: better for *what*?