Correct way to remove all file extensions from a path with pathlib?

If my file has multiple extensions, such as library.tar.gz, then .stem only removes the last one. How do I remove all of them? Is this how I’m expected to do it:

from pathlib import Path

filename = Path('file.tar.gz')

while filename.suffix:
    filename = filename.with_suffix('')

It just seems a bit verbose.

Hi aoeu,

You have two lines of code. If you think that’s “verbose”, you should
see the actual implementation of pathlib (1500+ lines) ;-)

If you are worried about having to repeat those two lines in multiple
places, you can write a helper function.
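
A minimal sketch of such a helper, assuming the loop from the question is the behaviour you want. The name strip_all_suffixes is made up for illustration and is not part of pathlib:

```python
from pathlib import Path

def strip_all_suffixes(path: Path) -> Path:
    """Repeatedly drop the last suffix until none remain."""
    while path.suffix:
        path = path.with_suffix('')
    return path

print(strip_all_suffixes(Path('library.tar.gz')))   # library
print(strip_all_suffixes(Path('dir/file.tar.gz')))  # dir/file (dir\file on Windows)
```

Only the final path component is touched, so dots in parent directory names are left alone.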

By the way, if you are working with arbitrary files of unknown file
types, you should be aware that many files use dots in the actual
filename part, often but not always in place of spaces. It might be
worth checking the suffix against a set of the suffixes you are
expecting:

while filename.suffix in {'.tar', '.gz', '.zip'}:
    filename = filename.with_suffix('')
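
Wrapped up as a function, that check might look like the sketch below. The names KNOWN_SUFFIXES and strip_known_suffixes are hypothetical, and the set of suffixes is only an example:

```python
from pathlib import Path

# Hypothetical allow-list; adjust to the file types you actually expect.
KNOWN_SUFFIXES = {'.tar', '.gz', '.zip'}

def strip_known_suffixes(path: Path, known=KNOWN_SUFFIXES) -> Path:
    """Drop trailing suffixes only while they are in the allow-list."""
    while path.suffix in known:
        path = path.with_suffix('')
    return path

print(strip_known_suffixes(Path('report.v2.final.tar.gz')))  # report.v2.final
print(strip_known_suffixes(Path('notes.txt')))               # notes.txt
```

The comparison is case-sensitive as written, so '.GZ' would not match unless you lower-case the suffix first.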

If you’re not restricted to removing only certain file extensions, as in Steven’s code above, here’s a one-liner that removes all extensions, at the extra cost of a string conversion:

justname = str(filename).rstrip(''.join(filename.suffixes))

This of course returns a string; if you still need a Path, wrap it in Path().

I’m actually asking if pathlib could get this helper method so I don’t have to define it myself.

You meant to do

justname = str(filename).removesuffix(''.join(filename.suffixes))

strip() and rstrip() treat their argument as a set of characters to remove, not as a substring. 'aaaaaaaaaa'.rstrip('a') returns the empty string '', not the string with one fewer 'a'.
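
To make the difference concrete with a file name, here is a small sketch (the file name is just an example, and removesuffix() needs Python 3.9 or later):

```python
name = 'data.tar.gz'
suffix = '.tar.gz'

# rstrip() removes any trailing characters found in the set {., t, a, r, g, z},
# so it keeps eating into the base name until it hits the 'd'.
print(name.rstrip(suffix))        # d

# removesuffix() removes the exact substring once, if the string ends with it.
print(name.removesuffix(suffix))  # data
```

With 'file.tar.gz' the rstrip() version happens to give the right answer, because 'e' is not among the stripped characters, which is why the bug is easy to miss.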

Yes, removesuffix() would be perfect. Do bear in mind though removesuffix() is only supported on Python 3.9 and above. If you’re distributing the module, you may need to write an additional function resembling removesuffix() for maintaining backward compatibility.

It slipped my mind that rstrip() doesn’t behave like removesuffix(). Here’s something you can use for backwards compatibility with Python < 3.9:

import sys

if sys.version_info < (3, 9):
    justname = str(filename)[:str(filename).rfind(''.join(filename.suffixes))]
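
Another option is a small stand-in for removesuffix() itself. This is only a sketch: the function name remove_suffix is made up, and the body follows the behaviour described for str.removesuffix() in PEP 616:

```python
from pathlib import Path

def remove_suffix(text: str, suffix: str) -> str:
    """Return text without suffix if it ends with it, else text unchanged."""
    # The "suffix and" guard matters: text[:-0] would be the empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

filename = Path('file.tar.gz')
justname = remove_suffix(str(filename), ''.join(filename.suffixes))
print(justname)  # file
```

Unlike the version check, this runs the same way on every Python 3 release.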

Why not something like:

filename.name[: -len(''.join(filename.suffixes))]

This just strips the exact number of characters from the right as desired, no rstrip or removesuffix needed.
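
One edge case to watch with the slice: if the file has no suffix at all, the length is 0 and name[:-0] is the empty string. A sketch with a guard, using a hypothetical helper name:

```python
from pathlib import Path

def name_without_suffixes(path: Path) -> str:
    """Slice the combined suffix length off the end of the final component."""
    extra = len(''.join(path.suffixes))
    # -0 is the same as 0, so an unguarded slice would return ''.
    return path.name[:-extra] if extra else path.name

print(name_without_suffixes(Path('file.tar.gz')))  # file
print(name_without_suffixes(Path('README')))       # README
```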

If filename.suffixes has multiple extensions, e.g. ('.gz', '.foo'), it
will strip 7 characters from the end of the filename, which is the wrong
behaviour regardless of the actual extension.

Cheers,
Cameron Simpson cs@cskk.id.au

I must have misunderstood the question, then. The original poster wanted to remove all extensions from a file name, not just the last one, i.e. to go from file.tar.gz to file, whereas .stem only removes .gz in this case. So would you not want to strip the last 7 characters, so that file.tar.gz turns into file?

This is something Path doesn’t provide, so you will need to do string manipulation on the name. The str.partition() method works well for this case:

filename = Path('file.tar.gz')

base, first_dot, rest = filename.name.partition('.')
filename = filename.with_name(base)

But double-check that this actually is what you want to do (see Steven’s comment).
AFAIK, the reason pathlib doesn’t include this operation is that people often think they need it, but it turns out it’s actually not what they need.
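
To illustrate why it is worth double-checking: cutting at the first dot also discards dotted parts of the real name, and for a dotfile such as .bashrc the base is empty, so with_name() raises ValueError. A rough sketch that at least keeps leading dots (the helper name strip_from_first_dot is hypothetical):

```python
from pathlib import Path

def strip_from_first_dot(path: Path) -> Path:
    """Cut the final component at its first dot, ignoring leading dots."""
    name = path.name
    lead = len(name) - len(name.lstrip('.'))   # number of leading dots
    base, _, _ = name[lead:].partition('.')
    return path.with_name(name[:lead] + base)

print(strip_from_first_dot(Path('file.tar.gz')))        # file
print(strip_from_first_dot(Path('.bashrc')))            # .bashrc
print(strip_from_first_dot(Path('notes.2024.01.txt')))  # notes
```

The last line shows the catch: the date in the name is treated as a run of extensions and thrown away.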