Uniform Function Call Syntax (UFCS)

E.g. DuckDB has tables (relations, to be precise) with methods that return a relation, to which one can apply another method, which returns another relation, and so on.

# Total salary of IT in Aug (I think, on mobile, no laptop)
(duckdb.table('employees')
       .filter("department = 'IT'")
       .join(duckdb.table('emp_salaries')
             .filter("year_month = 202408"), 
             'emp_id')
       .aggregate('sum(salary)')
       .show()
)

With extra brackets this works (because they are methods).

Now suppose I have a function, e.g. natural_join(), that avoids having to specify the join column:

def natural_join(t1, t2):
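    # join on every column the two relations share (emulating SQL's NATURAL JOIN)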
    common = ', '.join(set(t1.columns) & set(t2.columns))
    return t1.join(t2, common)

Then I cannot use the same style, but instead need to:

(natural_join(duckdb.table('employees')
                    .filter("department = 'IT'"),
              duckdb.table('emp_salaries')
                    .filter("year_month = 202408"))
 .aggregate('sum(salary)')
 .show()
)

My understanding of the proposed syntax is that it allows something closer to the original:

(duckdb.table('employees')
       .filter("department = 'IT'")
       .natural_join(duckdb.table('emp_salaries')
                           .filter("year_month = 202408"))
       .aggregate('sum(salary)')
       .show()
)

While that’s attractive for this use case, object().print() is not, so I’d like to know how to prevent such use.

2 Likes

Subclassing would help, but if the DuckDB API doesn’t allow subclassing, that’s not Python’s problem. :slightly_smiling_face:

That’s exactly what I would avoid when writing code. It may look cool and easy as a one-liner, but the next morning, you won’t understand what you wrote.

You either need to add necessary comments or, preferably, use descriptive variables for each function call.

I don’t think I can subclass, because (other than via class methods) I cannot create a relation instance, so there’s no __new__() to delegate to via super(). I’ll try. Anyhow, subclassing seems a heavy tool for such utility additions (potentially resulting in multiple inheritance). So the problem is mine, then :sweat_smile:

And yes, object().print() should definitely not be possible. SQL is often written in a piped style (see e.g. PRQL); imperative code shouldn’t be.

I don’t know whether I share your preference for descriptive variables for each (emphasis mine) function call. Linux wouldn’t be the same if every pipe had to be spelled out with > and < temp_files.

No, he was referring to a different one, namely, dict(**A, **B). This one raises an error on duplicate keys.

{**A, **B} can overwrite, but it does not preserve the mapping’s type and converts the result to a plain dict.
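
A quick illustration of the difference (plain dicts, just to show the behaviour):

>>> A = {'x': 1, 'y': 2}
>>> B = {'x': 10}
>>> dict(**A, **B)            # duplicate key 'x' is rejected
TypeError: dict() got multiple values for keyword argument 'x'
>>> {**A, **B}                # B silently wins; result is a plain dict
{'x': 10, 'y': 2}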

Why is it not clear enough?

Intermediate variables cost more mentally. It’s clearer to run through the whole process in one go, like a recipe.

2 Likes

I didn’t say it wasn’t “clear enough”. I said:

For me, I think this proposal would be a lot stronger with real life examples showing how this pattern has significant benefits over idiomatic Python. I’m not convinced that “functional programming is better than imperative programming”. As in life, everyone has different experience, and if you want to be convincing, it helps to relate your experience rather than your conclusions.


Also, I disagree that intermediate variables are always worse. Their names function as comments, and they facilitate debugging. Maybe there are cases where they’re superfluous. However, any new functional syntax is also noisy when you’re used to reading imperative code.

3 Likes

E.g. compare the earlier duckdb fragment:

(duckdb.table('employees')
       .filter("department = 'IT'")
       .natural_join(duckdb.table('emp_salaries')
                           .filter("year_month = 202408"))
       .aggregate('sum(salary)')
       .show()
)

with the ‘idiomatic’ form using temporary variables:

employees = duckdb.table('employees')
it_employees = employees.filter("department = 'IT'")
emp_salaries = duckdb.table('emp_salaries')
aug_salaries = emp_salaries.filter("year_month = 202408")
aug_salaries_of_it_employees = natural_join(it_employees, aug_salaries)
total_aug_salaries_of_it_employees = aug_salaries_of_it_employees.aggregate('sum(salary)')
total_aug_salaries_of_it_employees.show()

2 Likes

Good example. I guess it’s a matter of taste, but with the variable names, I actually find it easier to understand what’s going on in the imperative case.

3 Likes

Also, the imperative form is not written particularly clearly. Better would be

emp_data = duckdb.table('employees').filter("department = 'IT'")
sal_data = duckdb.table('emp_salaries').filter("year_month = 202408")
agg_salaries = natural_join(emp_data, sal_data).aggregate('sum(salary)')
agg_salaries.show()

That reads very clearly to me. It’s also easier to debug, as you can inspect the two parts of the natural join without having to rewrite the code. I also used shorter variable names - that’s personal preference, but IMO the extremely long names @Dutcho used obscure the meaning rather than clarifying it.

It’s all very much a matter of taste, but the advantage of the imperative form is that it doesn’t need a change to the Python language :slightly_smiling_face:

6 Likes

It’s definitely a matter of taste. My personal taste is that the variable names don’t help, but that that’s not the fault of the variable names… the problem is that this is SQL masquerading as Python, and it would be FAR clearer written like this:

duckdb.query("""select sum(salary)
    from employees natural join emp_salaries
    where department = 'IT'
    and year_month = 202408
""")

Especially since (as evidenced by the filter clauses) you still need to be aware of the foibles of SQL, you gain very little from doing it as a bunch of methods.

I’ve seen ORMs done well, but this isn’t a good example, and I don’t think it’s a good showcase in the debate on pipelining vs imperative. The intermediate variables are unhelpful since only the resultant query is really meaningful here; but on the other hand, the .show() at the end is pure noise that exists for the sake of permitting incomplete queries to be unevaluated objects. (Unless that’s actually a “display to console” method, in which case it’s not PURE noise, but still, you could replace query(...) with show(...) and it’d be just as easy.)

7 Likes

@bwoodsend :

  1. I think I mentioned in the OP that I agree, and that’s the reason I was thinking about what alternative symbol would be good. I still don’t have a good answer, and I was using ▼ thus far. I’ll use |> for this post to see how it feels.
  2. Fair point. These are solvable problems, but it means implementing pseudo-UFCS would require yet more work.
  3. :thinking: . I hate being surprised by side-effects too. I hadn’t thought about the fact that this option would encourage people to hide their side effects.

@BrenBarn :
I would propose that |> acts as a very strongly binding partial, so that func = obj|>print results in func == partial(print, obj).
I think this avoids the errors. Then func == partial(print, obj); func() is fine.
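
In today’s Python that desugaring would look roughly like this (functools.partial standing in for the hypothetical |> operator):

from functools import partial

obj = "Hello World"

# Hypothetical: `func = obj |> print` would mean func = partial(print, obj);
# nothing runs until func() is called, so there is no surprise side effect.
func = partial(print, obj)
func()   # prints: Hello World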

Use cases:

At the moment the main situation where I’d like to have pseudo-UFCS is in situations where I write

arg = func(obj.method(*args))

which feels ugly to me.

I’d prefer to be able to write

arg = obj.method(*args)|>func()

Recent examples are situations where I want to convert a Path.glob() iterator into a list, or to know its length, or to know whether it has any elements, or to get its first element. I get to the end of my line of code, then suddenly I have to stop, go back to the start, type the function, and add brackets around the whole thing. It’s disruptive. The flow of programming suddenly goes backwards.
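
Concretely, each of those means going back and wrapping the front of an expression I had already written (hypothetical paths, just to show the shape):

from pathlib import Path

as_list = list(Path('src').glob('**/*.py'))                # convert to a list
count   = len(list(Path('src').glob('**/*.py')))           # know its length
has_any = any(True for _ in Path('src').glob('**/*.py'))   # whether it has any elements
first   = next(Path('src').glob('**/*.py'), None)          # first element, or None
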
And I know readability is more important than ease of writing, but personally I find

file = Path(SOURCE_ROOT_PATH).glob("*/proprietory/pattern/probably/maybe*").__next__()

easier to read than

file = next(Path(SOURCE_ROOT_PATH).glob("*/proprietory/pattern/probably/maybe*"))

Then again, I also found a lot of examples in my code where writing obj.method(*args)|>func() would make the code worse. Most of the examples my regex grep turned up are best left as-is to be honest.

The other use-case is when you’re working with a library that has good support for method chaining. Working with dataframes for example. It’s annoying when custom-written functions disrupt the esthetic of the library. I mean, I like the way this looks:

result = (
    df[df["A"] > 2]
    .groupby("B")
    .agg({"C": "mean"})
    .rename(columns={"C": "Mean"})
    .sort_values(by="Mean", ascending=False)
    .reset_index()
)

Every line does exactly one thing, I don’t need much working memory to comprehend it, and I don’t need to check that the names match, etc.
And if I want to debug something like this, it would help me to be able to write

result = (
    df[df["A"] > 2]
    .groupby("B")
    .agg({"C": "mean"})
    |>my_log()
    .rename(columns={"C": "Mean"})
    .sort_values(by="Mean", ascending=False)
    .reset_index()
)

for example, instead of my current technique:

result[0] = (
    df[df["A"] > 2]
    .groupby("B")
    .agg({"C": "mean"})
)
result[1] = (
    result[0]
    .rename(columns={"C": "Mean"})
    .sort_values(by="Mean", ascending=False)
    .reset_index()
)

I think it would be easier if you just broke up your lines:

source_root_path = Path(SOURCE_ROOT_PATH)
files = source_root_path.glob("*/proprietory/pattern/probably/maybe*")
file = next(files)

If you really want to, you could work around this by deriving from the dataframe class:

from typing import Self
from pandas import DataFrame

class LoggableDataFrame(DataFrame):
  def log(self) -> Self: ...

df[LoggableDataFrame(df["A"]) > 2].groupby...........log()........

That should work as long as pandas hasn’t done anything too strange.
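
For what it’s worth, pandas documents a hook for keeping a subclass alive through chained operations: override the _constructor property. A minimal sketch (log is just a hypothetical example method):

import pandas as pd

class LoggableDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls this whenever a method builds a new frame,
        # so chained results stay LoggableDataFrame instead of plain DataFrame
        return LoggableDataFrame

    def log(self):
        print(self.shape)
        return self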

pandas provides a method for this btw
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pipe.html
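
With that, the debugging example above can stay as a single chain; a sketch (my_log being the same kind of hypothetical logging helper, and the toy df only there to make it runnable):

import pandas as pd

df = pd.DataFrame({"A": [1, 3, 4], "B": ["x", "y", "y"], "C": [1.0, 2.0, 3.0]})  # toy data

def my_log(df):
    # hypothetical helper: report something about the intermediate result, then pass it on
    print(df.shape)
    return df

result = (
    df[df["A"] > 2]
    .groupby("B")
    .agg({"C": "mean"})
    .pipe(my_log)          # DataFrame.pipe(func, *args) returns func(df, *args)
    .rename(columns={"C": "Mean"})
    .sort_values(by="Mean", ascending=False)
    .reset_index()
)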

Overall I like the idea of general pipes; they are very nice to use in the functional programming languages that have them. But I am not sure if I like the proposed syntaxes here.

You can use pipes already with third-party libraries that do operator overloading for | and some extra magic, like sspipe.
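
The general trick such libraries use is overloading the reflected | on a small wrapper object; a minimal sketch of the idea (not sspipe’s actual API):

class pipe:
    # Wrap a callable so that `value | pipe(func, *args)` evaluates func(value, *args).
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __ror__(self, value):
        # the left operand doesn't know how to `|` with a pipe, so Python falls back to __ror__
        return self.func(value, *self.args, **self.kwargs)

range(5) | pipe(sum) | pipe(print)   # prints 10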

1 Like

@NeilGirdhar :
Yes, I probably should use source_root_path = Path(SOURCE_ROOT_PATH).
However, files = ...glob(...); file = next(files) is absolutely awful for me. For the rest of the time that I’d be reading that function, I’d have the information in my head that "files is the result of the glob pattern XXX, from which I consumed the first item". That’s nasty, complex information that takes a chunk out of my working memory, completely uselessly.

As for subclassing from DataFrame, your example actually serves to illustrate (one of the reasons) why I despise subclassing. I don’t know whether there is a word for this. It’s not a footgun. But it’s a construction that’ll lie in wait for you and introduce nasty hard-to-trace bugs. Maybe one of the DataFrame methods constructs an actual DataFrame in its implementation where you expected it to modify the input, so suddenly, unknowingly, you have converted your LoggableDataFrame into a DataFrame, and an entirely different place in your program raises an AttributeError. The mistake and the resulting error could be in entirely separate files that don’t even talk to each other, but are both imported into a common function.

@saaketp :
Thanks for pointing out pandas…pipe. That’ll be useful when I get back to working with DataFrames.

I don’t see myself working with sspipe, because the syntax doesn’t look comfortable enough.

And I agree the proposed syntax isn’t nice. |> is just 2 of the common pipe operators stuck together. It doesn’t read well. That’s part of why this is an ‘idea’ and not a ‘proposal’ actually. I couldn’t come up with a good symbol, and I wanted to open discussion to see whether other people could come up with something nice.
Syntax comfort/esthetics is important enough for me to reject sspipe, and it’s probably important enough to reject |>.

Thinking of it as a ‘pipe method’ would nudge me in the direction of .|
What would you think of "Hello World".|print() ? :wink:

It’s not really an issue with subclassing, though. The problem is that the pattern (x.f().g().h()) is not common in Python, and so there isn’t a good way to extend the methods you can call on it. Ideally, the library should ensure that any dataframes returned by a method in the chain have type Self. If it can’t promise that, then you can’t use inheritance to inject methods.

Honestly, though, I think that the problem illustrated in this idea might need to be explored. It does happen fairly often that you have a sequence of methods, each of which returns a modified copy of some input object. Consider:

def f(state):
  y, state = g(state, x)
  state = h(state)
  i(state)  # Forgot to accept modified state and overwrite state!
  statex = j(state) # Accidentally wrote to statex!  (Linter may catch unused variable.)
  return state

This mirrors the example in this post, but illustrates two actual potential bugs that some clever syntax might be able to detect and prevent.

One idea would be to add a syntax like:

def f(some_state):
  inject some_state as state:
    y = g(x)  # state is removed from the returned tuple assuming that the tuple is a namedtuple
    h()
    i()
    j()
  return state

This probably still needs some way to mark which lines need this injection of state. Still, I can’t see how this would be worth it, but I think it addresses the original problem, maintains Python’s imperative style, and doesn’t introduce any weird punctuation.

It’s quite funny to me that some folks think |> doesn’t look nice as a pipe operator - that’s the actual forward “pipe” operator (reverse application) in the OCaml family of languages :laughing:

(OCaml library : Stdlib)

Just FYI, R added a native pipe and also chose |> as the symbol. People had been using pipes for years via add-on packages, and %>% was not necessarily as easy on the eyes, not to mention some rarely used variants like %<>%, %$%, %!>%, or %T>%.

Combinations of existing ASCII symbols that do not cause conflicts are not trivial to find. And, unlike R, Python does not easily support arbitrary user-created operators like the percent-sign-bracketed ones R allows, or the backtick-quoted names that allow quite a bit more.

Of course, borrowing from another language implementation is not uncommon.

Actually, that was the part I don’t have a problem with, since I have used the same syntax in F# pipelines.
What I find weird specifically in your example is the stuff with parentheses, where one argument comes from the pipeline operator and the rest are provided inside ().