On the first-classness of functions

JohnHind · January 9, 2022, 11:38am

I am a relative newbie, but Python has finally stuck. Several previous attempts over the decades had me recoiling in horror at significant whitespace, but this time I found it to be tolerable since most text editors now work around its problems in a fairly intuitive and transparent way!

But, coming mainly from Lua, the claim of first class functions in Python still seems a stretch. If I am treading a well worn path here, please go easy on me and point me specifically to where the discussion is at!

1. Why the ‘lambda’ keyword?

myfunc = lambda x: x**2
myfunc = def(x): return x**2

The hypothetical second version deprecates a keyword and makes the definition of a lambda more consistent with the definition of an ordinary function.

2. Remove the single-line restriction on lambdas.

myfunc = def(x):
    return x**2

This syntax easily extends to multi-line function definitions allowing removal of the single-line restriction on lambdas.

3. Make this the ordinary way of defining functions.

def myfunc(x):
    return x**2

The traditional form of function definition becomes syntax sugar for the proposed new form in 2 (as in Lua).

4. Now functions can be truly first-class!

myfunclist = [
    def(x): return x**2,
    def(x, y):
        return x * y
]

We can define single-line or multi-line functions anywhere a reference to a function object would be valid without having to separately define them and invent a redundant dummy name. This also makes the syntax consistent with class definitions.

This proposal seems sufficiently obvious that it must have been thought of before. I am really interested to learn by understanding why it has been rejected (or at least not implemented), so if there is already extensive discussion, please point me at it rather than repeating it here.

steven.daprano · January 9, 2022, 4:31pm

When people describe entities in a programming language as “first
class”, they typically mean a first class value. While the concepts of
“first and second class” values are somewhat subjective, we can
typically say that first-class values:

can be stored in variables, lists, arrays, dicts or other data structures;
can be passed to functions as arguments;
and can be returned from functions.

If your language supports these features, then also:

can be inspected at runtime;
can be created on the fly, at runtime.

Python functions support every one of those requirements. There is
absolutely no doubt that Python functions are first class values.
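
A minimal sketch of those properties in action. The names `square`, `apply_twice` and `make_adder` are made up for illustration and do not come from any library:

```python
import inspect

def square(x):
    return x ** 2

def apply_twice(func, value):
    # A function passed in as an argument.
    return func(func(value))

def make_adder(n):
    # A function created on the fly and returned from another function.
    def adder(x):
        return x + n
    return adder

# Stored in a data structure.
table = {"square": square, "add3": make_adder(3)}
results = [f(4) for f in table.values()]

# Inspected at runtime.
params = list(inspect.signature(square).parameters)
```

Here `results` holds the value of each stored function applied to 4, and `params` lists the parameter names of `square`.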

An example of a second-class value would be subroutines (functions and
procedures) in standard Pascal. You can pass a subroutine into a
function as an argument, but you can’t return one out.

Another example would be functions in C. While pointers to a function
are first-class values, the function itself cannot be passed as an
argument or returned.

Being “first-class” does not say anything about how you create the
value, only what you can do with the value once it is created. So it
doesn’t matter how you can create Python functions, only that they can
be created.

There are three ways to create Python functions in Python:

1. Use the types.FunctionType constructor. Nobody does that: the details
   are poorly documented and hard to do correctly. But the interpreter
   uses it, behind the scenes.
2. Use def statements. def, being a statement, introduces a block,
   which can contain one or more statements.
3. Use lambda expressions. Being an expression itself, lambda is
   limited to a single expression, with no statements.

It does not matter whether you use FunctionType(), def or lambda,
you still get a function object.
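
As a rough illustration, all three routes yield the same type. To stay short, this sketch assumes it is acceptable to borrow the code object of an existing function rather than build one from scratch; the function names are hypothetical:

```python
import types

def square_def(x):
    return x ** 2

square_lambda = lambda x: x ** 2

# Reuse an existing code object instead of constructing a CodeType by hand.
square_built = types.FunctionType(square_def.__code__, globals(), "square_built")

# Collect the distinct types of the three objects.
kinds = {type(square_def), type(square_lambda), type(square_built)}
```

The set `kinds` ends up with a single member, `types.FunctionType`, whichever way the function was made.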

steven.daprano · January 9, 2022, 5:01pm

To answer your questions:

Why the lambda keyword?

“lambda” is the Greek letter L, and there is a branch of mathematics
closely related to theoretical computer science that uses lambda λ as
the symbol for function abstractions.

Lambda calculus had a big influence on functional-programming languages,
like Lisp and Scheme (and later, Haskell).

So this is pretty abstract stuff. How did it get into Python? Guido
(Python’s creator) can explain in his own words how it happened:

“2. Remove the single-line restriction on lambdas.”

There is no single line restriction on lambda. The restriction is a
single expression, which can extend over multiple lines.
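
For instance, a lambda can span several physical lines as long as it stays one expression. A small sketch with made-up sample data:

```python
words = ["pear", "Fig", "apple", "Kiwi"]

# One lambda, one expression, spread over several lines inside parentheses.
ordered = sorted(
    words,
    key=lambda word: (
        len(word),       # shortest words first
        word.lower(),    # then case-insensitive alphabetical order
    ),
)
```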

Python has no syntax for multi-statement expression blocks, and is
unlikely to ever get one. For many years people have made suggestions
for this, but every single one ran into difficulty with parsing or
ambiguity, and ultimately Guido simply decided that he likes the
restriction the way it is:

https://www.artima.com/weblogs/viewpost.jsp?thread=147358

If a function is big enough to need multiple statements, then you can
give it a name and use def.

Your items #3 and #4 won’t fly. The distinction between statements and
expressions is not just deeply built into the language; we like it
that way. We (as a community – there are plenty of individuals who
disagree) feel that some operations are better treated as statements,
not as expressions. And that includes named functions.

By the way, your example of myfunclist:

myfunclist = [
    def(x): return x**2,
    def(x, y):
        return x * y
    ]

is a poor argument for your suggestion. We can do that right now:

myfunclist = [
    lambda x: x**2, 
    lambda x, y: x*y
    ]

If you want to make a suggestion for an improvement to the language, you
need to show how it can let us do things we can’t currently do. Not just
a change of spelling.

JohnHind · January 9, 2022, 7:03pm

Thanks for the references. I’ll use Guido’s own language: “I had made functions first-class objects”. I would say “close, but no cigar”. If functions are objects, then function definitions should be object constructors. Currently, function definitions are not composable in the same way the definitions of other objects are. You cannot define a function in a list constructor (unless it is a single expression lambda) in the way you can a string or an integer (or another list). Despite your inserting a crowbar into the tiny chink I left for the sake of terseness in my example, I think you can see that it removes that limitation. Does what I suggested “run into difficulty with parsing, or ambiguity”? At the moment, I cannot see why.

CAM-Gerlach · January 10, 2022, 12:18am

To note, many of the examples given assign the result of the lambda expression to a name, which is not an intended use of a lambda and highly discouraged/unidiomatic to begin with (as opposed to using def, as intended). See this recent discussion for more on why, as well as how lambdas and their syntax fit into the language.

As @steven.daprano points out, function objects are constructed via an object constructor, types.FunctionType(); it’s just rather tedious and inelegant to do so. At a low level, def and lambda can roughly be thought of as syntactic sugar for types.FunctionType() that makes this far less painful, clearer and more elegant. Your proposed syntax is not an object constructor either, and is fundamentally at odds with the basic grammar of Python with regard to statements, callables and blocks.

While a minimal reproducible example is useful for isolating a bug, I think what we’re asking for here are useful real-world (not just theoretical) examples where the limitation has serious negative repercussions, and the proposed new syntax would have substantial tangible benefits for a significant fraction of users sufficient to justify such a relatively large change.

As both @steven.daprano and I have mentioned, there are fundamental reasons (linked above and briefly discussed here) why the syntax is incompatible with the basic grammar of Python. At the minimum, there need to be compelling real-world use cases that would justify the change, and a syntax and parsing grammar that would overcome the issues raised in past attempts. At this point, it seems rather unlikely to gain widespread support, but you are welcome to propose such.

steven.daprano · January 10, 2022, 1:24am

John Hind wrote:

“If functions are objects, then function definitions should be object
constructors.”

That is an idiosyncratic definition of “object” that, as far as I can
tell, is not shared with anyone else.

Anyway, functions clearly are objects. This is not an opinion or open to
debate, it is a fundamental part of Python’s design:

def func():
    pass

isinstance(func, object)

returns True. You can swap out the reference to func for a lambda
expression, or a builtin like len, and it will still return True.

So functions are objects. They inherit from object, they have a type
and they have attributes and methods that you can inspect using dot
notation. Their type has a constructor, which the interpreter uses but
the rest of us never bother with, because it is easier and better to use
the def or lambda syntax.
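
A short sketch of those checks, covering a def function, a lambda and a builtin. The attribute name `note` is arbitrary:

```python
def func():
    pass

# def functions, lambdas and builtins are all instances of object.
checks = [isinstance(f, object) for f in (func, lambda: None, len)]

# Functions have a type and attributes reachable with dot notation.
type_name = type(func).__name__
func.note = "attributes can be attached too"
```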

The signature for FunctionType is:

FunctionType(code, globals, name, argdefs, closure)

and the first argument is required to be a code object, which has
signature:

CodeType(argcount, posonlyargcount, kwonlyargcount, nlocals, 
         stacksize, flags, codestring, constants, names, varnames, 
         filename, name, firstlineno, linetable, freevars, cellvars)

most of which is undocumented, and if you think that it is easier to use
the CodeType and FunctionType constructors to generate a function
instead of def, then go right ahead.

John said:

“Currently, function definitions are not composable in
the same way the definitions of other objects are. You cannot define a
function in a list constructor (unless it is a single expression
lambda)”

That’s a restriction on the def syntax, not a restriction on the
FunctionType class.

FunctionType is just a class. It’s a class with a complex signature,
that requires an instance of another class (CodeType) with an even more
complex signature, but I’m sure that there are folks out there who have
reverse-engineered what the CodeType constructor needs. People do all
sorts of things.
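
A hedged sketch of that point: a multi-statement function built inside a list display by calling FunctionType directly. It gets its code object from `compile()` rather than the raw CodeType constructor, and relies on a CPython implementation detail noted in the comments; the name `double_next` is hypothetical:

```python
import types

SOURCE = "def double_next(x):\n    y = x + 1\n    return y * 2\n"

# Compiling a module-level def leaves the function's code object among the
# module code object's constants (a CPython implementation detail).
module_code = compile(SOURCE, "<sketch>", "exec")
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

funcs = [
    types.FunctionType(func_code, globals()),  # constructed inside the list
    len,
]
```

This works, but it is plainly more awkward than writing a def first and putting its name in the list.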

steven.daprano · January 10, 2022, 1:49am

John Hind asked:

“Does what I suggested “run into difficulty with parsing, or ambiguity”?
At the moment, I cannot see why.”

Yes. You had:

myfunclist = [
    def(x): return x**2,
    def(x, y):
        return x * y
    ]

and although I glossed over it in my earlier post, that’s ambiguous,
since it could mean either:

(1) A list of two anonymous function objects; or

(2) A list of one anonymous function object, which returns the tuple
(x**2, anonymous function returning x*y).

Of course there are ways to resolve that using parentheses, or
precedence, but whatever we do is likely to confuse somebody, or lead to
more problems in the grammar.
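
For comparison, today’s lambda settles the same question by precedence: a comma ends the lambda body unless the tuple is parenthesized. A small sketch of the existing behaviour:

```python
# The comma ends the first lambda, so this is a list of two functions.
two_funcs = [lambda x: x**2, lambda x, y: x * y]

# Parentheses are needed for a single lambda that returns a tuple.
one_func = [lambda x: (x**2, x + 1)]

# Outside brackets the same rule applies: this is a tuple (function, 10).
pair = lambda x: x + 1, 10
```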

Here’s another example:

map(def(x):
        y=x+1
        return y
        , [1,2,3])

Is that…?

(1) A call to map() with two arguments, a function and a list; or

(2) A call to map() with one argument, a function that returns
a tuple y, [1,2,3].

These are just the simple, obvious problems. What about the subtle,
non-obvious problems? You should read this:

Quotes:

“At the time I didn’t understand the beautiful set of compromises Guido
van Rossum managed to make when designing the Python grammar. I became
curious as to why my language had multi-line lambdas and Python did not.
The answer to that question would lay in a problem with my grammar I
would soon discover and could not come up with a good solution to.”

And later:

“I also started to notice all sorts of strange syntactic oddities which
emerge in a grammar which embeds indentation blocks in statements. This
is exactly the kind of ugliness I believe Guido sought to avoid by
disallowing indent blocks within expressions.”

The author of Reia eventually decided to abandon Python-like indentation
sensitivity in favour of Ruby-like braces, which is a popular choice.

It comes down to this: you can pick two of the following:

a simple, consistent grammar;
indentation-based blocks;
multi-statement expressions;

but not all three.

JohnHind · January 10, 2022, 3:41pm

OK, I thought the use-cases for tables (or tuples, dictionaries etc.) of functions were fairly obvious. One would be a calculator app for assigning functions to buttons. One I faced in my first serious Python app is finite state machines: a tuple in which the index is the state and the function is the evaluation function for that state. The former case would usually (but not always) be possible with lambdas, but the latter would almost always require more complex functions.

In both/all cases there are other ways of doing it, but IMHO a language should be as regular as possible so if you can do something with one object type you can do it with all object types for which it makes sense (if ‘first class’ means anything practical, it ought to be this!)

If there is an argument for having lambdas at all, it is obtuse to justify their being limited. Functions may be first-class objects, but lambdas are certainly not first-class functions! Functionality is a continuum - there will always be some operations that are comfortably expressed as a single expression, some that cannot be so expressed and a dangerous twilight zone where there is some highly obtuse, un-readable and tricky way of hammering some complex operation into a single expression! Guido seems quite ambivalent about lambdas. OK they cannot be removed, but he could declare them ‘un-Pythonic’ and set the inquisition on anyone with the temerity to actually use them!

As for the tuple ambiguity, it does not seem unreasonable to require brackets for disambiguation in this case as the ambiguity is new - we are not breaking any pre-existing code.

CAM-Gerlach · January 10, 2022, 10:39pm

I actually have used collections (mostly dicts) of lambdas quite a bit, but in that use case, it has been rare that they couldn’t fit in a single logical (if not physical) line. And if the functions you need are that much more complex, then it is generally cleaner, more explicit and easier and less ambiguous to read and follow to just define them normally with def and include just the function name in the collection itself.
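
A minimal sketch of that approach for the state-machine case, assuming a hypothetical two-state machine where the tuple index is the state number. All names and events here are invented for illustration:

```python
# Hypothetical two-state machine: state 0 = idle, state 1 = running.
def on_idle(event):
    # Multi-statement handler, defined normally with def.
    if event == "start":
        return 1
    return 0

def on_running(event):
    if event == "stop":
        return 0
    return 1

# The tuple index is the state; each entry is that state's handler.
HANDLERS = (on_idle, on_running)

def run(events, state=0):
    # Feed each event to the handler for the current state.
    for event in events:
        state = HANDLERS[state](event)
    return state
```

The handlers need names, but the collection itself stays compact and each handler can be as long as it needs to be.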

As both of us have explained to you, it is possible to instantiate a function with a constructor like any other object (which def and lambda are in fact arguably just syntactic sugar for), it just is much more tedious, and as I mentioned before, your proposed syntax still is not that of a regular constructor (unlike the actual existing constructor), nor does it follow the rules for regular Python. Likewise, strings, ints, floats and collections (among others) are objects, as are instances of complex classes that typically use different ways of constructing an instance object aside from calling a constructor for various valid reasons (enums, custom metaclasses, factory functions, etc), and likewise can otherwise be used like other objects. Are these not “first class objects” either, simply because the constructor isn’t the normal method of creating them?

See the detailed discussion in the issue I linked. Lambda creation is purposefully limited, since it is the view of the Python language designers and community at large that it is clearer and more idiomatic to use lambdas for short, throwaway, anonymous functions, and use def for anything more complex. But once a lambda is created, the resulting function object is essentially identical to one with the same body content created via def, and you can do all the things with one you could with another (again, see the other issue, where this was discussed at length).