Apply operator ( f @ x means f(x) )

eyalk11 · September 3, 2023, 12:53am

I don’t encourage applying a lot of functions in one line, and yet sometimes it is useful, and having too many brackets can be confusing.
For such cases, it is good to have an apply operator.
For example: I think that list @ map @ (lambda x:x*2 , lst) is more readable than
list(map(lambda x:x*2 , lst)) .

The idea was taken from Wolfram Mathematica btw.

thejcannon · September 3, 2023, 2:07am

Python already supports an @ operator for “matrix multiply”.

gwerbin · September 3, 2023, 5:40am

As a big NumPy user, I was always uncomfortable with the addition of @ to the language. I’d have felt better about it if it had been adopted as a general function composition operator.

You could have functions implement __matmul__ so that f @ g = lambda *args, **kwargs: f(g(*args, **kwargs)). Matrix multiplication is just composition of linear transformations anyway, so it kind of works out as a retroactive justification for the name and its usage in NumPy.
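As a rough illustration of that idea, composition through __matmul__ can be tried today with an opt-in wrapper. The class name Composable and the sample functions are assumptions made up for this sketch; plain Python functions do not define __matmul__.

```python
class Composable:
    """Wrap a callable so that `f @ g` builds the composition f(g(...))."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        # f @ g  ->  lambda *args, **kwargs: f(g(*args, **kwargs))
        return Composable(lambda *args, **kwargs: self.func(other(*args, **kwargs)))

    def __rmatmul__(self, other):
        # Lets an unwrapped callable sit on the left: abs @ Composable(...)
        return Composable(lambda *args, **kwargs: other(self.func(*args, **kwargs)))


double = Composable(lambda x: x * 2)
increment = Composable(lambda x: x + 1)

# Runs increment first, then double.
double_after_increment = double @ increment
# A plain builtin on the left falls back to __rmatmul__.
abs_of_double = abs @ double
```

Only one operand needs to be wrapped, because Python falls back to the right operand's __rmatmul__ when the left operand does not implement @.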

That said, I don’t think you can have this act as both function application and function composition. What happens if you have a callable object? Maybe you could special-case it for a tuple, so f @ x is application and not composition only when x is a tuple.
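A rough sketch of that tuple special case, assuming a hypothetical wrapper class named Op: a tuple on the right means "apply", any other callable means "compose". It also shows the cost, which is that a callable can only be passed as an argument by wrapping it in a tuple.

```python
class Op:
    """Hypothetical wrapper: `@` applies to tuples and composes with callables."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __matmul__(self, other):
        if isinstance(other, tuple):
            # Application: f @ (a, b) means f(a, b)
            return self.func(*other)
        if callable(other):
            # Composition: f @ g means "call g, then f"
            return Op(lambda *args, **kwargs: self.func(other(*args, **kwargs)))
        return NotImplemented


largest = Op(max) @ (3, 7)             # application
negated_abs = Op(lambda x: -x) @ abs   # composition, returns a new Op
is_callable = Op(callable) @ (len,)    # a callable argument needs the tuple
```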

storchaka · September 3, 2023, 6:55am

Just yesterday I implemented a function that performs generic function composition. The last time I did this was a few years ago. So while this looks like an attractive idea, the need for this does not arise so often.

ajoino · September 3, 2023, 10:20am

I would like to have both a function composition operator and currying in python, but I think that ship sailed long ago.

vovavili · September 3, 2023, 11:24am

Usually, this kind of operation is performed using a pipe operator in other languages (R, Elixir, Unix shell, etc.). As a language polyglot, this is the one killer feature absent from Python that I like the most; it makes the code look so much cleaner. Compare:

result <- filter(data, Age >= 25)
result <- select(result, Name, Score)
result <- arrange(result, desc(Score))

to

result <- data |>
  filter(Age >= 25) |>
  select(Name, Score) |>
  arrange(desc(Score))

In Python, this can be achieved to some extent through a fluent interface, though you’re not always working within the limits of one object. I am not sure whether pipe operator syntax would fit Python well. I would, however, appreciate it if functools had a function composition helper. That would make this kind of code much more readable without much sacrifice in terms of operator inflation.

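from functools import compose, partial  # compose is hypothetical; functools has no such function today

# map takes the function positionally, so bind it with partial
pipeline = compose(partial(map, lambda x: x * 2), list)
pipeline(lst)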

An inspiration can be taken from source code of toolz/cytoolz.
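Since functools has no compose function, a minimal stand-in is easy to write by hand. The sketch below assumes left-to-right application, as in the examples in this post, and the name compose_left is a local helper chosen for illustration.

```python
from functools import partial, reduce


def compose_left(*funcs):
    """Return a function that feeds its argument through funcs, left to right."""

    def pipeline(value):
        return reduce(lambda acc, func: func(acc), funcs, value)

    return pipeline


# map needs its function bound up front, hence partial.
double_all = compose_left(partial(map, lambda x: x * 2), list)
doubled = double_all([1, 2, 3])
```

With no functions at all, the returned pipeline is simply the identity, which falls out of reduce using the input as its initial value.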

vovavili · September 3, 2023, 11:49am

One practical use case where this often arises for me is the functional (TaskFlow) API in Apache Airflow 2.0. Consider this example from their tutorial:

from airflow.decorators import dag, task
@dag(...)
def tutorial_taskflow_api():
    @task()
    def extract():
        ...
    @task(multiple_outputs=True)
    def transform(order_data_dict: dict):
        ...
    @task()
    def load(total_order_value: float):
        ...
    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])
tutorial_taskflow_api()

If Python had function composition, that last part could use fewer assignments, though you do have to use a lambda at some point:

from functools import compose

pipeline = compose(extract, transform, lambda x: load(x["total_order_value"]))
pipeline()

For a three-step DAG it doesn’t matter much, but for a really long DAG it would feel really nice not to have to use intermediate variables.

Another one is a long chain of NumPy calculations. Unlike pandas, NumPy doesn’t support a fluent interface, which makes a long chain of function calls kind of inconvenient.

Compare this:

import numpy as np

arr = np.array(...)

arr = np.mean(arr)
arr = np.sqrt(arr)
arr = np.log(arr)

to this:

from functools import compose
import numpy as np

arr = np.array(...)
pipeline = compose(np.mean, np.sqrt, np.log)
pipeline(arr)

pf_moore · September 3, 2023, 12:47pm

Vladimir:

As a language polyglot, this is the one killer feature that I like the most that’s absent in Python, it makes the code look so much more clean. Compare:
result <- filter(data, Age >= 25)
result <- select(result, Name, Score)
result <- arrange(result, desc(Score))
to
result <- data |>
  filter(Age >= 25) |>
  select(Name, Score) |>
  arrange(desc(Score))

Personally, I prefer the first form. Which I assume is not the point you were trying to make.

This is very much a matter of opinion, in my view. Sometimes, fluent interfaces, or pipeline mechanics, are more readable. Other times not so much. And as a result, some languages offer multiple ways of doing the same thing, allowing users to choose.

Python, however, has always had a principle of “there should be only one obvious way”. This isn’t a hard and fast rule, or even a guideline as such, but it is a statement of the design philosophy. And it’s saying that we value consistency over variety, and we tend to pick a style and prefer[1] it.

In this case, a series of assignment statements, each containing a single function call which modifies the input data one step at a time, is the preferred approach. Single-expression approaches like pipelines and fluent APIs are possible, but not generally encouraged, and are unlikely to get special language support.

So basically what I’m saying is that arguing for this sort of functional language style of expression on a purely “readability” basis is unlikely to get anywhere. What would be needed is an extremely compelling example of real-world code, that was so clearly better using the proposed feature that even people who normally prefer the statement-based approach would concede that in at least the given case, the proposed alternative was better.

[1] not mandate!

Melendowski · September 3, 2023, 1:07pm

I love the chained operation syntax that pandas provides and would like it for functions, but to your point, I think the biggest reason against is that it’s not a huge improvement and actually a detriment in certain cases.

Debugging, for example, is an area where chained operations like this are bad. It’s all one line to the debugger, and to step through it you have to copy and paste the code into the REPL.

I love writing a long chained operation with pandas then letting black reformat it into multiple lines. Writes easy, reads easy, then I have to debug it and I hate it.

vovavili · September 3, 2023, 1:33pm

I think pandas + black is one of the most compelling cases for usefulness of method chaining I can think of. An example from this article:

(
    wine.pipe(csnap)
    .rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .pipe(csnap)
    .query("alcohol > 14")
    .pipe(csnap)
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
    .pipe(csnap, lambda x: x.sample(5))
)

Would have to be written like this, which is much more verbose:

wine = csnap(wine)
wine = wine.rename(columns={"color_intensity": "ci"})
wine["color_filter"] = np.where((wine["hue"] > 1) & (wine["ci"] > 7), 1, 0)
wine = csnap(wine)
wine = wine.query("alcohol > 14")
wine = csnap(wine)
wine = wine.sort_values("alcohol", ascending=False)
wine = wine.reset_index(drop=True)
wine = wine.loc[:, ["alcohol", "ci", "hue"]]
csnap(wine, lambda x: x.sample(5))

And that’s if your dataframe name is just wine. Some production dataframes I’ve seen had names like df_final_filtered_exclude_last_col, and not using method chaining with these gets ugly fast. It’s also tempting to rename the variables you’re dealing with in the second approach, which increases the cognitive burden on the programmer, since coming up with good variable names is genuinely hard. Some codebases I’ve seen change the pandas variable name at each method call.

One upside of the second approach is that it’s easier to debug with breakpoint(), but with modern IDEs like PyCharm or VSCode it doesn’t matter at all.

pf_moore · September 3, 2023, 1:45pm

But that can be done right now in Python. So what’s the issue? I wasn’t saying it should never be done, just that arguing for a new Python feature just on this basis isn’t likely to work.

Melendowski · September 3, 2023, 1:48pm

I think if we want to argue in favor of this, we’d have to see some real world examples where having first class support in the language would make the implementation easier.

I don’t fully grasp these implementations, but my understanding is that they use objects with overloaded operators to build up a delayed evaluation that only runs when finally called. Polars in particular does this because it constructs DAGs in the background and then optimizes them (in its lazy API at least).
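To make the delayed-evaluation idea concrete, here is a toy sketch. It is not how construct or Polars are implemented; the names Expr, col and const are reused purely for illustration. Overloaded operators build a description of the computation, and nothing is evaluated until evaluate() is called with actual data.

```python
import operator


class Expr:
    """A deferred computation: operators build a tree, evaluate() runs it."""

    def __init__(self, evaluate, text):
        self.evaluate = evaluate  # callable taking a dict of column values
        self.text = text          # human-readable form of the expression

    def _binary(self, other, op, symbol):
        other = other if isinstance(other, Expr) else const(other)
        return Expr(
            lambda env: op(self.evaluate(env), other.evaluate(env)),
            f"({self.text} {symbol} {other.text})",
        )

    def __add__(self, other):
        return self._binary(other, operator.add, "+")

    def __mul__(self, other):
        return self._binary(other, operator.mul, "*")

    def __gt__(self, other):
        return self._binary(other, operator.gt, ">")


def col(name):
    """Expression that looks a name up at evaluation time."""
    return Expr(lambda env: env[name], name)


def const(value):
    """Expression wrapping a literal value."""
    return Expr(lambda env: value, repr(value))


# Nothing is computed here; this only records what to compute.
expr = (col("price") * col("qty") + 5) > 100
```

Because the whole expression exists as an object before it runs, a library is free to inspect or rewrite it first, which is the hook an optimizer would need.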

Here is a project I found once whose code I found fascinating; its focus is to provide a declarative style for reading in and writing out data:

https://github.com/construct/construct/blob/master/construct/expr.py

import operator
if not hasattr(operator, "div"):
    operator.div = operator.truediv


opnames = {
    operator.add : "+",
    operator.sub : "-",
    operator.mul : "*",
    operator.div : "/",
    operator.floordiv : "//",
    operator.mod : "%",
    operator.pow : "**",
    operator.xor : "^",
    operator.lshift : "<<",
    operator.rshift : ">>",
    operator.and_ : "and",
    operator.or_ : "or",
    operator.not_ : "not",
    operator.neg : "-",

(This file has been truncated.)

Polars API examples and the code

https://pola-rs.github.io/polars-book/user-guide/expressions/operators/

https://github.com/pola-rs/polars/blob/main/py-polars/polars/functions/lazy.py#L44


      
    from polars import DataFrame, Expr, LazyFrame, Series
    from polars.type_aliases import (
        CorrelationMethod,
        EpochTimeUnit,
        IntoExpr,
        PolarsDataType,
        RollingInterpolationMethod,
    )


def col(
    name: str | PolarsDataType | Iterable[str] | Iterable[PolarsDataType],
    *more_names: str | PolarsDataType,
) -> Expr:
    """
    Return an expression representing column(s) in a dataframe.

    Parameters
    ----------
    name
        The name or datatype of the column(s) to represent. Accepts regular expression

https://github.com/pola-rs/polars/blob/c67d056c499f4f9c61ee9523aa07b9a377302a0c/py-polars/polars/expr/expr.py#L100


      
    else:
        from typing_extensions import Concatenate, ParamSpec, Self

    T = TypeVar("T")
    P = ParamSpec("P")

elif os.getenv("BUILDING_SPHINX_DOCS"):
    property = sphinx_accessor


class Expr:
    """Expressions that can be used in various contexts."""

    _pyexpr: PyExpr = None
    _accessors: ClassVar[set[str]] = {
        "arr",
        "cat",
        "dt",
        "list",
        "meta",
        "str",

Melendowski · September 3, 2023, 1:51pm

One upside of the second approach is that it’s easier to debug with breakpoint() , but with modern IDEs like PyCharm or VSCode it doesn’t matter at all.

Even though remote developers with restricted software tooling are a small minority, they should still be supported. I’m one of them: all my development is in vanilla Vim, and debugging is purely in pdb with a .pdbrc.

I’d say a requirement would be improvements to pdb.

vovavili · September 3, 2023, 1:56pm

Well, for example, in NumPy or Airflow 2.0 there’s no support for a fluent interface at all. So either I as a library user would have to ask every single library developer to support a fluent interface, or I can circumvent this shortcoming with a pipe operator or function composition.

vovavili · September 3, 2023, 9:36pm

One way to do this could be to turn breakpoint() into a kind of identity function (lambda var=None: var), where a variable passed to it is bound to an optional parameter var and then, once debugging has started, returned without alteration. This way, debugging composed functions could be just as easy as with the re-assigning approach.
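Something close to this can be approximated today with a small helper. The sketch below assumes a hypothetical function named checkpoint with a debug flag; it is not a change to the built-in breakpoint(), only an identity function that can optionally stop in the debugger on the way through.

```python
import math


def checkpoint(value, *, debug=False):
    """Return value unchanged; optionally pause in the debugger first."""
    if debug:
        breakpoint()  # inspect `value` here, then continue
    return value


# The helper can be dropped between any two steps of a chain
# without changing the result.
result = math.sqrt(checkpoint(sum([9, 16])))
```

Passing debug=True at the step of interest stops there with the intermediate value in scope as value, which works the same in plain pdb as in an IDE.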

gwerbin · September 5, 2023, 1:34am

I wouldn’t mind a different infix operator for partial function application / currying. Maybe & to suggest attaching something new to the function?

Here’s a slightly contrived but hopefully illustrative example of the kind of use that one might expect to see in boring, conventional Python code:

import sqlite3
from collections.abc import Generator
from contextlib import closing, contextmanager
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

    def __format__(self, spec: str) -> str:
        if spec == "full":
            return f"User(id={self.id}, name={self.name})"
        elif spec == "short":
            return "User"
        else:
            raise ValueError(f"Unsupported format spec: {spec!r}.")

@contextmanager
def db_connect(src: str) -> Generator[sqlite3.Connection, None, None]:
    with closing(sqlite3.connect(src)) as db:
        db.row_factory = sqlite3.Row
        yield db

def load_user(db: sqlite3.Connection, user_id: int) -> User:
    q = "SELECT id, name FROM users WHERE user_id = ?"
    p = (user_id,)
    with closing(db.execute(q, p)) as curs:
        result = curs.fetchone()
    # sqlite3.Row supports lookup by column name or index, not attribute access
    return User(id=result["id"], name=result["name"])

if __name__ == "__main__":
    user_ids = [1, 2, 3]
    with db_connect("./mydata.db") as db:
        _load_user = load_user & db
        _print_user = print @ (lambda u: format(u, "full"))
        users = [_load_user(i) for i in user_ids]
        for user in users:
            _print_user(user)

Of course, callable classes remain the big problem here. Should defining __call__ also automatically define __matmul__ and __and__? Then what happens if you want to also define conventional __matmul__ and/or __and__ methods (as you might with an array data type) on a callable class? In that case, users can’t use their notation for function composition and partial application on just that one class. A conscientious library author could maybe handle the former case with some cleverness in argument type checking, but I don’t know about the latter.
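One way to sidestep the callable-class question, at the cost of some noise, is to make the notation opt-in through a wrapper rather than attaching it to every callable. The sketch below assumes a hypothetical wrapper named Fn and a dictionary standing in for the database; classes that define their own __matmul__ or __and__ are left untouched.

```python
from functools import partial


class Fn:
    """Opt-in wrapper: `&` binds the next positional argument, `@` composes."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __and__(self, arg):
        # f & x  ->  functools.partial(f, x)
        return Fn(partial(self.func, arg))

    def __matmul__(self, other):
        # f @ g  ->  call g first, then f
        return Fn(lambda *args, **kwargs: self.func(other(*args, **kwargs)))


def load_user(db, user_id):
    return db[user_id]


db = {1: "ada", 2: "grace"}  # stand-in for a real connection

load_from_db = Fn(load_user) & db        # partial application
shout = Fn(str.upper) @ load_from_db     # composition
```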

eyalk11 · September 10, 2023, 5:20am

Another real-world example:

collections.OrderedDict(map(lambda x: (x[0],x), self.get_buy_operations_with_adjusted(sorted(items))))

vs

buyoperations = collections.OrderedDict @ map @  (lambda x: (x[0],x) , self.get_buy_operations_with_adjusted @ sorted @ items )

dmoisset · September 10, 2023, 10:23am

This example is proposing something which isn’t what the original poster suggested. If the semantics of “f @ x” is the same as “f(x)”, then this code is running _print_user = print(lambda ...), which will print the lambda and assign None to the variable. In this case @ seems to be composition rather than application.

dmoisset · September 10, 2023, 10:24am

For these examples to work, the @ operator would need to be right-associative. However, in Python it’s left-associative, and that cannot be changed in a backwards-compatible way.
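The grouping can be checked directly from the parser. This small sketch uses the standard ast module (ast.unparse needs Python 3.9 or later) to print how a chain of operators is nested; the helper name grouping is made up for illustration.

```python
import ast

SYMBOLS = {ast.MatMult: "@", ast.Pow: "**"}


def grouping(source):
    """Render an expression with explicit parentheses around each binary op."""

    def render(node):
        if isinstance(node, ast.BinOp):
            symbol = SYMBOLS[type(node.op)]
            return f"({render(node.left)} {symbol} {render(node.right)})"
        return ast.unparse(node)

    return render(ast.parse(source, mode="eval").body)


matmul_chain = grouping("f @ g @ x")    # groups from the left
power_chain = grouping("a ** b ** c")   # ** groups from the right
```

Under "f @ x means f(x)", the left grouping means list @ map @ (func, lst) would first evaluate list @ map, that is list(map), instead of calling map on the tuple.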

eyalk11 · September 11, 2023, 7:25pm

buyoperations = collections.OrderedDict @ map @  (lambda x: (x[0],x) , self.get_buy_operations_with_adjusted @ sorted @ items )

(From Wikipedia, “Function composition”)

The composition of functions is always associative, a property inherited from the composition of relations. That is, if f, g, and h are composable, then f ∘ (g ∘ h) = (f ∘ g) ∘ h.

Maybe I am missing something, but I just don’t see how it is ambiguous.