Path.suffix is to Path.stem as Path.suffixes is to …?
>>> from pathlib import Path
>>> p = Path("CurveFile.vs2022.vcxproj")
>>> p.suffix
'.vcxproj'
>>> p.stem
'CurveFile.vs2022'
>>> p.suffixes
['.vs2022', '.vcxproj']
While you can programmatically get each and every suffix, there is currently no way to extract the first part ('CurveFile') without doing string manipulation.
Suggestion: Path.stems would return a list of the stems, to which the remaining suffixes could be appended to reconstruct the original Path.name component, e.g.:
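For reference, the string manipulation in question might look like the sketch below. The helper name `first_part` is hypothetical and not part of pathlib; it only relies on the existing `name` and `suffixes` properties.

```python
from pathlib import Path

def first_part(path: Path) -> str:
    """Return the file name with every suffix removed (hypothetical helper)."""
    all_suffixes = "".join(path.suffixes)
    # Slice from the end so only the trailing run of suffixes is dropped.
    return path.name[: len(path.name) - len(all_suffixes)]

p = Path("CurveFile.vs2022.vcxproj")
print(first_part(p))  # CurveFile
```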
>>> p.stems
['CurveFile', 'CurveFile.vs2022']
>>> for idx, stem in enumerate(p.stems):
...     print(stem + "".join(p.suffixes[idx:]))
...
CurveFile.vs2022.vcxproj
CurveFile.vs2022.vcxproj
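A minimal sketch of how the proposed property could behave, written as a free function since `Path.stems` does not exist. The function name `stems` and its ordering (shortest stem first) are assumptions taken from the example above.

```python
from pathlib import Path

def stems(path: Path) -> list[str]:
    """Hypothetical Path.stems: shortest stem first, Path.stem last."""
    result = []
    current = path.name
    # Peel one suffix at a time off the right-hand end of the name.
    for suffix in reversed(path.suffixes):
        current = current[: len(current) - len(suffix)]
        result.append(current)
    return result[::-1]

p = Path("CurveFile.vs2022.vcxproj")
print(stems(p))  # ['CurveFile', 'CurveFile.vs2022']
```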
Alternately, just a Path.initial_stem which returns the 'CurveFile'?
Alternately, a Path.name_parts which returns ['CurveFile', '.vs2022', '.vcxproj']; Path.suffixes could then be re-implemented as just return self.name_parts[1:].
Constructing a list of stems which is redundant with the suffixes (requiring careful matching to not drop or duplicate one) sounds like a bad idea to me. IOW
[p.name == st + "".join(p.suffixes[i:]) for i, st in enumerate(p.stems)]
would yield a list full of True; in other words, every additional way of reconstructing p is redundant, IMO. And woe betide anyone who gets the indexing wrong between stems and suffixes: you end up with 'CurveFile.vs2022.vs2022.vcxproj', or 'CurveFile.vcxproj', etc.
To me, Path.stem in that example should just be 'CurveFile', though I understand we probably can’t (easily) change that behaviour. Still, that way things would be unambiguous, minimal & complete for the purposes of being able to take paths apart & put them back together again, which sounds more desirable IMO than the other alternatives (which would however still be better than p.suffixes, IMO).
File extensions are difficult to get right, because filenames like pip-8.1.1.tar.gz are commonplace, and humans rely on context and experience to figure out where the stem ends and the extension begins. I think this is why os.path provides splitext(), but not a function that repeatedly applies splitext() to get “all the extensions”.
Similarly pathlib has path.stem and path.suffix, which split on the rightmost period. This produces reasonable results 99% of the time. There’s path.suffixes too, but it should be treated with care: items earlier in the returned list are less likely to be file extensions:
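For instance, with the pip file name mentioned above (the same result is quoted further down the thread), only the last two entries are what a human would call extensions:

```python
from pathlib import Path

p = Path("pip-8.1.1.tar.gz")
print(p.suffix)    # .gz
print(p.stem)      # pip-8.1.1.tar
print(p.suffixes)  # ['.1', '.1', '.tar', '.gz']
```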
Personally I don’t much like path.suffixes - it’s too easy to get misleading results like the above. A path.stems property would be affected by the same problem, I think.
Well, clearly, that file should have been pip-8.1.1.tgz then.
In all seriousness, the result ['.1', '.1', '.tar', '.gz'] highlights the incompleteness of .suffixes. Path.parts yields all of the parts of the given path, and they can be joined again to get back the original Path.
I think we need that for the Path.name as well. Path.suffixes is close, but omits the “prefix”.
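A short illustration of that asymmetry, using only existing pathlib properties. The example path is made up for the demonstration, and `PurePosixPath` is used so the output is the same on every platform.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr/lib/python3/site.py")
print(p.parts)                        # ('/', 'usr', 'lib', 'python3', 'site.py')
print(PurePosixPath(*p.parts) == p)   # True: parts round-trips

# The suffixes alone do not add up to the name, because the prefix is missing.
print("".join(p.suffixes) == p.name)  # False
```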
If Path.name_parts returned ['pip-8', '.1', '.1', '.tar', '.gz'], then the user could use "".join(p.name_parts[:-2]) to recover the path’s name without the final two extensions, be that pip-8.1.1 or CurveFile, but that is falling back on string concatenation.
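As a sketch of what that could look like, assuming a free function `name_parts` standing in for the proposed (non-existent) property:

```python
from pathlib import Path

def name_parts(path: Path) -> list[str]:
    """Hypothetical Path.name_parts: the prefix followed by every suffix."""
    all_suffixes = "".join(path.suffixes)
    prefix = path.name[: len(path.name) - len(all_suffixes)]
    return [prefix, *path.suffixes]

p = Path("pip-8.1.1.tar.gz")
parts = name_parts(p)
print(parts)                     # ['pip-8', '.1', '.1', '.tar', '.gz']
print("".join(parts[:-2]))       # pip-8.1.1
print("".join(parts) == p.name)  # True: the list is complete
```

The same slice applied to CurveFile.vs2022.vcxproj gives 'CurveFile'.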
Using path.pop_suffix() would be misleading, in that we aren’t popping anything. The path object is immutable. It would effectively just be an alternative spelling for:
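One existing call that already behaves this way is `with_suffix("")`; treating it as the spelling meant here is an assumption. It returns a new path with the last suffix dropped and leaves the original object untouched:

```python
from pathlib import Path

p = Path("pip-8.1.1.tar.gz")
q = p.with_suffix("")  # new object, last suffix removed
print(q)               # pip-8.1.1.tar
print(p)               # pip-8.1.1.tar.gz (unchanged)
```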
I’m not seeing much support for .without_suffixes(2) or .without_suffix(".tar.gz"). How about a .splits attribute, which returned all prefix/suffix pairs as a list of (named) tuples? If you want just the prefix before the first period, that is just the prefix of the first element. If you want to split off exactly 2 suffixes, that is the second-last element.
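A rough sketch of the idea, written as a free function because no such attribute exists in pathlib. The names `Split`, `prefix`, `suffix`, and `splits` are all hypothetical.

```python
from pathlib import Path
from typing import NamedTuple

class Split(NamedTuple):
    prefix: str
    suffix: str

def splits(path: Path) -> list[Split]:
    """Hypothetical Path.splits: every way to cut the name at a period."""
    name = path.name
    result = []
    tail = ""
    # Grow the suffix one extension at a time, starting from the right.
    for suffix in reversed(path.suffixes):
        tail = suffix + tail
        result.append(Split(name[: len(name) - len(tail)], tail))
    return result[::-1]

s = splits(Path("pip-8.1.1.tar.gz"))
print(s[0].prefix)  # pip-8 (everything before the first period)
print(s[-2])        # Split(prefix='pip-8.1.1', suffix='.tar.gz')
```

Every pair concatenates back to the full name, so nothing has to be matched up by index.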
p.stems returning ['CurveFile', '.vs2022'] isn’t quite right. That would actually be p.stem_parts. I don’t think we can actually call '.vs2022' a “stem”.