Support method chaining in pathlib

Hi! I propose a small, backwards compatible improvement to pathlib, to allow method chaining and therefore cleaner user code. For example, to write a new file in a new directory, I currently use:

from pathlib import Path
d = Path().joinpath('new_directory')
d.mkdir()
d.joinpath('new_file').write_text('text')

This awkward, multi-statement form is required because Path.mkdir() returns nothing. If we changed Path.mkdir() to instead return self, then we could do some cleaner method chaining:

from pathlib import Path
Path().joinpath('new_directory').mkdir().joinpath('new_file').write_text('text')

I may be missing something here, but here are some of the things I thought of:

  1. Backwards compatible. Users shouldn’t currently be using the return value of mkdir(), since currently it is always None
  2. Allows simpler user code.
  3. Semantically OK: As an analogy, list.reverse() returns nothing, because it modifies the input in-place. This is a good API, because it prevents users from doing the mistake of new_list = old_list.reverse(). mkdir() also modifies the underlying state (of the filesystem), but unlike reverse(), mkdir() is very obvious that it does that, and it just returns the original path, instead of a modified instance. Is there some semantic problem with returning self from mkdir() that I don’t see?
  4. Not pre-emptive. As soon as we add a return value, then it’s locked in and we can’t change it, else we break users. However, this choice of return value is the only sane choice I can see. I can’t see how in the future we will regret this choice and wish we had returned something else. We already can tell success vs failure based on exceptions. Is there a different candidate return value that I’ve overlooked?

I looked through the rest of pathlib, and here are some other candidates for this same tweak:

  1. Path.touch(): Similar to mkdir(). I would vote for changing this as well to return self.
  2. Path.symlink_to() and Path.link_to(): A bit weird, should this return src or dst? Probably leave this as is.
  3. Path.rmdir(): Could return the parent directory? Probably should leave this as is.
  4. Path.chmod(): This is a bit unrelated, but this could return the old mode and permissions? Probably should leave this as is.
  5. Something else?

I can submit a PR for this if reception here is positive. Thanks for your thoughts!

PS: Background, further reading: Issue 31163: Return destination path in Path.rename and Path.replace - Python tracker

Traditionally Python does not have mutating calls return themselves to prevent accidental misunderstanding that something changed underneath you and thus you aren’t getting a new copy of something. The canonical example of this is list.sort() not returning self.

So all of these methods are following that practice of not returning self when the call has side-effects.

P.S. I noticed your use of Path.joinpath(); do you happen to know about the use of / in pathlib? E.g. d / 'new_file'?

See my point 3 above. My thought was that this case was different from list.sort(), all users are aware that mkdir() has side effects. Let me explain that point a bit more:

  1. I think that every user is aware that mkdir() is operating on the the singleton of the filesystem. Since there only is one filesystem, it couldn’t make sense for mkdir() to return a modified copy of something. I think every user of mkdir() would confidently say “it modifies the underlying filesystem”.

  2. mkdir() returns the original path unmodified. Since it’s unmodified, users are more likely to assume that some state is being mutated, otherwise why even bother calling the identity function something so complicated as mkdir() :slight_smile:

  3. sort() is prone to tricking people because they are likely to reach for it blindly when they are looking for something (“a sorted version of this list”), and then it shooting them in the foot. The return types of list.sort() vs sorted() help these blind reachers end up with the correct tool for the job. On the other hand, I don’t think mkdir() is as prone to this “blind reaching”. There is only one single tool for the job, mkdir(). They don’t find mkdir() when looking for something else.

I’m trying to come up with other examples where a mutating method also returns self. Other languages and non stdlib frameworks do that all the time, but I’m having trouble finding other examples in the standard library. I suppose to be consistent, if this changes, then os.mkdir() should also change, even though os.mkdir() doesn’t receive as big of a benefit since it’s structure isn’t as conducive for chaining (os.mkdir(previous().chain()).following().chain() instead of previous().chain().mkdir().following().chain()).

What do you think of those points? I think maybe the crux lies between whether python wants the mantra “mutating calls shouldn’t return self” as a rule in and of itself, vs. as a guideline for the larger goal of preventing foot-shooting such as list.sort().

re: joinpath(), it was merely that I thought that looked cleaner than needing to wrap things in a paren with (d / 'new_file').write_text(), but thanks :slight_smile:

I think that’s the main thing: the standard-library has not really adopted method-chaining, meaning users will typically perform one action per line.

Looking at your initial example:

You import, instantiate, make a directory, then write to file. Each one of the actions I just stated was one line of code, helping with readability. Even in JavaScript, it’s common to see:

const result = new MyClass(config)
    .addData(data)
    .transform(newConfig)
    .getProperty(prop)

But this is enforced by Python’s standard-library.

From my experience, method-chaining does not generally reduce line counts, since it is still clearer to put each chained method in a line when you have more than a couple of calls. Aside from breaking long lines, keeping each call on their own line also prevents misunderstandings when the code is read (you can mistake the call target if you miss one method call in a chain).

The most significant advantage to method-chaining is to reduce how many times you need to type local variables when you call multiple methods on a variable, and to avoid even naming things in many situations. Neither of them is needed to pathlib specifically though, since there are not too many methods you can call on the same path (unlike e.g. jQuery-UI where you can move and rotate and move and move one thing in quick succession). Naming is also not a problem for path objects (which tend to have obvious names), and even less problematic for Python in general—it’s usually more popular in languages where variable declaration requires more ceremony, and JavaScript where traditionally local variables are just avoided if possible. So all in all the benefit of this particular addition seems minimal to me.

Correct, method chaining is not a “thing” in the stdlib (probably because method chaining seems to come out of Java which is newer than the stdlib :grin: ).

And I agree. Method chaining is really useful in e.g. Java because historically the cost of defining a new variable was painful. Python doesn’t have that cost and we prefer people be explicit with what they are doing. And so having to make method calls on their own to call out you are doing something that has side-effects can be viewed as a benefit, not a hindrance.

Well, Smalltalk has as a default returning self if not specifying something different, so actually it is a lot older. In Smalltalk you will find a lot of chaining in the “standard library”.

Smalltalk is very much like Python in that respect, so I think it is more or less a “philosophical choice” what you do. Personally, I don’t mind either :slight_smile:

Thanks all, there are some interesting points in here I hadn’t thought of. I still think that for simple cases (eg my example or simpler) method chaining can improve things, but for anything more complicated I agree method chaining is worse. I respect losing a little in order to see the rest of the benefits that y’all state. Seeing everyone’s support for the current no-chaining pattern convinces me, I was admittedly skeptical that I had thought of something that the original authors hadn’t :slight_smile:

In Python it would be even longer:

result = (
    MyClass(config)
    .addData(data)
    .transform(newConfig)
    .getProperty(prop)
)

Additional 2 lines for parentheses as formatted by Black.