[shutil] Sorting multi-line text files

I think that sorting text files would be just as useful as the copying and archiving functions that already exist in shutil. Moreover, this functionality could be extended with external sorting, which is necessary for files too large to fit in memory.
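
For readers unfamiliar with the term, here is a minimal sketch of what external sorting could look like: split the input into sorted runs on disk, then merge them. The name `external_sort` and the `chunk_lines` parameter are hypothetical and not part of the proposal. The sketch assumes line-oriented UTF-8 text and is not the prototype from the issue.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src, dst, key=None, reverse=False, chunk_lines=100_000):
    """Sort the lines of src into dst with bounded memory (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        run_paths = []

        # Pass 1: read at most chunk_lines lines at a time, sort each chunk
        # in memory, and write it to its own temporary "run" file.
        with open(src, encoding="utf-8") as infile:
            while True:
                chunk = list(islice(infile, chunk_lines))
                if not chunk:
                    break
                # Make sure the last line ends with a newline so that
                # lines from different runs do not get glued together.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                chunk.sort(key=key, reverse=reverse)
                path = os.path.join(tmpdir, f"run{len(run_paths)}.txt")
                with open(path, "w", encoding="utf-8") as run:
                    run.writelines(chunk)
                run_paths.append(path)

        # Pass 2: k-way merge of the sorted runs. heapq.merge is lazy,
        # so only one line per run is held in memory at a time.
        runs = [open(path, encoding="utf-8") for path in run_paths]
        try:
            with open(dst, "w", encoding="utf-8") as outfile:
                outfile.writelines(heapq.merge(*runs, key=key, reverse=reverse))
        finally:
            for run in runs:
                run.close()
```

A real implementation would also have to cap the number of simultaneously open run files and decide how to treat header lines and delimited fields, both of which this sketch ignores.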

shutil.sort(src, dst, headers=0, delimiter=',', key=None, reverse=False, allow_disk_use=False)

I took all the arguments except headers from different Python modules and functions.

  • src, dst: shutil
  • delimiter: csv
  • key, reverse: list.sort and sorted
  • allow_disk_use: pymongo (it’s spelled there as allowDiskUse)
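
To make the arguments listed above concrete, here is a rough in-memory sketch of how they might behave. It is an illustration under assumptions, not the prototype from the issue: the function is named `sort_file` to make clear it is hypothetical, `key` is assumed to receive each row as a list of fields, `headers` is assumed to be the number of leading rows left in place, and `allow_disk_use` is left out entirely.

```python
import csv


def sort_file(src, dst, headers=0, delimiter=",", key=None, reverse=False):
    """Sort the rows of a delimited text file in memory (hypothetical sketch)."""
    # newline="" lets the csv module handle line endings itself, so a quoted
    # field that contains newlines stays together as a single row.
    with open(src, newline="", encoding="utf-8") as infile:
        rows = list(csv.reader(infile, delimiter=delimiter))

    # Assumption: the first `headers` rows are copied unchanged and only
    # the remaining rows are sorted.
    head, body = rows[:headers], rows[headers:]
    body.sort(key=key, reverse=reverse)

    with open(dst, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile, delimiter=delimiter)
        writer.writerows(head)
        writer.writerows(body)
```

For example, `sort_file("people.csv", "by_age.csv", headers=1, key=lambda row: int(row[1]))` would keep the header row and order the remaining rows by the numeric value of the second column (the file names here are made up).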

A perfectionist's remark: it would be ideal to support archives as both input and output.

This subject was recently discussed on the bug tracker, where an early prototype took shape. The issue was closed because of the negative opinion of two developers. I would like to start a conversation that involves more participants, in order to arrive at a more carefully considered solution.

I agree with the comments made in the issue. There’s no immediate evidence that people would use this functionality. As something like this could be developed as (all or part of) a package on PyPI, doing so would be a good way to collect evidence that people need something like this - and then the implementation could be migrated to the stdlib.

Personally, I’ve never needed to sort sets of data that won’t fit in memory - and if I ever did, I’d probably be doing something specialised enough that I’d be looking at external libraries anyway. So the built-in list.sort() is fine for my needs.

Indirect evidence of this idea's popularity is the existence of GNU Coreutils sort.

That’s an application, not a library, and it’s not at all clear how often it’s used for tasks where an in-memory sort wouldn’t be sufficient.
