Syntax for Generator iterables using angle brackets

Not a regular occurrence at all, quite the contrary. I was just wondering whether it would be a good idea with a shorthand syntax. Also, I was trying to see if itertools.chain could be used, but did not know about map(operator.call, [expensive_func]) - smart!

I think that a LazyConcatonatedQuery type could be a good idea if it becomes a regular occurrence.

Martin,

Now that perhaps we understand what you wanted, there is a way to avoid evaluation you might consider in select cases.

Consider the or operator in Python, which stops evaluating the operands to its right as soon as an operand on the left is truthy.

>>> True or print("hello")
True
>>> False or print("hello")
hello
>>> False or 0 or print("hello")
hello
>>> False or 0 or "" or print("hello")
hello
>>> False or 0 or "not empty" or print("hello")
'not empty'

This may not be applicable in your case but if the interpreter truly does not evaluate any part of your complex or expensive function call including evaluating arguments until needed, then it could be used for your specific kind of case.

I did an experiment with seeing if the walrus operator would generate a side effect and in my experiment, it does not:

>>> sidevar = ""
>>> True or print(sidevar := "Should not be run")
True
>>> sidevar
''

>>> False or print(sidevar := "Should be run")
Should be run
>>> sidevar
'Should be run'

As you can see, sidevar does not change unless the logic does reach what you wanted.

Your code was:

queries: Iterator[str] = (lambda: (
    (yield f"{ticker} site:yahoo.com"),
    (yield f"{n} site:yahoo.com" if (n := fetch_name(ticker)) else None),
    (yield f"{symbol} {country_name} site:yahoo.com"),
))()

You controlled which one was used by calling this until one was OK, and then stopped calling it. But would code like the following work without all that effort, as long as each step returns either a truthy or a falsy value? Assume you had a function f that took each argument and either got a valid result or not. I will call your variable parts something like A, B, C:

f(A) or f(B) or f(C)

Your second condition is a bit different and this may not meet your needs even if you rewrite it. But it does show a place the interpreter ignores things unless needed and when needed does evaluate.
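
As a rough sketch of the f(A) or f(B) or f(C) pattern, assume a hypothetical f that records each call and returns None when it finds nothing. The names calls and f, and the rule that only "B" succeeds, are made up for illustration:

```python
calls = []  # records which candidates were actually tried


def f(candidate):
    """Hypothetical lookup: returns a truthy result or None."""
    calls.append(candidate)
    # Assumption for illustration: only "B" produces a usable result.
    return f"result for {candidate}" if candidate == "B" else None


result = f("A") or f("B") or f("C")
print(result)  # result for B
print(calls)   # ['A', 'B']
```

f("C") is never called, because the chain stops at the first truthy value.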

Of course, the same applies to the and keyword, except that there a falsy left operand stops evaluation.

I repeat. The above is part of a discussion and not really addressing what you think you wanted, but a possible method to code in a way that might get you what you want and skip running expensive functions unless you have to.

Fair enough?

Avi, thanks for all of your input!

Something like this would be interesting, but probably also quite annoying from a UX perspective, having to input those symbols.

I agree that my original example should have been clearer (I unfortunately cannot find a way to edit the post).

I’m a big fan of the or operator, but I didn’t actually know that it deferred evaluation - thanks for bringing it to my attention!

I see, so having a shorthand syntax for generators would clash with the way other iterables are instantiated and used?

How about something like this instead:

queries = generator("abc", expensive_func(arg1, arg2), "something else")

or allowing yield statements in iterables:

queries = (
    "abc",
    yield expensive_func(arg1, arg2),
    "something else",
)

but maybe that’s a completely different can of worms.
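
For what it’s worth, a plain call like generator(...) would evaluate expensive_func(arg1, arg2) before generator ever runs, since Python evaluates call arguments eagerly. A rough sketch of something that works today is to wrap each item in a zero-argument lambda. Here lazy_values is a made-up helper name and expensive_func is a stand-in that logs its calls:

```python
from typing import Any, Callable, Iterator


def lazy_values(*thunks: Callable[[], Any]) -> Iterator[Any]:
    """Yield the result of each zero-argument callable, calling it only when reached."""
    for thunk in thunks:
        yield thunk()


evaluated = []  # records calls to the stand-in expensive function


def expensive_func(arg1, arg2):
    """Stand-in for an expensive call (hypothetical)."""
    evaluated.append((arg1, arg2))
    return f"{arg1}-{arg2}"


queries = lazy_values(
    lambda: "abc",
    lambda: expensive_func(1, 2),
    lambda: "something else",
)

first = next(queries)
print(first, evaluated)   # abc []
second = next(queries)
print(second, evaluated)  # 1-2 [(1, 2)]
```

The lambdas are exactly the noise a shorthand syntax would remove.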

My apology. To clarify, my comment is in support of your proposal

But we do already have generator expression, which covers one of the most common use cases of a generator, and does return an instance of the generator type as an iterator.

Yes, it returns a generator iterator as would a(). My point is that the other iterable types get a special constructor syntax that doesn’t return an iterator, but generators must be declared in a function body and their type is necessarily confounded with generator iterators.

I’m not sure about the syntax here or what a better alternative would be but I have wanted a feature like this before.

The situation where this comes up for me is in SymPy’s core assumptions system. There we have to write code that uses three-valued logic where boolean values are True, False or None with None meaning unknown e.g.:

In [1]: s = sqrt(2)

In [2]: x = Symbol('x')

In [3]: s
Out[3]: √2

In [4]: x
Out[4]: x

In [5]: s.is_positive
Out[5]: True

In [6]: s.is_negative
Out[6]: False

In [7]: print(x.is_positive)  # unknown
None

In general the symbolic expressions can be more complicated than these two examples and so evaluating a condition like s.is_positive can be an expensive operation. Typically the code that uses this has no way to judge how expensive it might be though so in general you just want to avoid evaluating any condition if possible.

Working with such three-valued bools is awkward because the normal logical operators like and, or and not as well as all and any do not always handle the None case correctly e.g.:

In [8]: not None  # should be None
Out[8]: True

Instead we have special functions for this:

In [12]: print(fuzzy_not(None))
None

In [13]: print(fuzzy_and([True, None]))
None

In [15]: print(fuzzy_and([False, True, None]))
False

Functions like fuzzy_and and fuzzy_or are like all and any. They take iterables and can shortcut iteration which is nice if you have a lazy generator expression:

# Can short-cut if any False is seen
if fuzzy_and(arg.is_positive for arg in args):

It is awkward though to handle the case of having, say two or three discrete conditions if you want to shortcut e.g. if you have this two-valued logic:

return x.is_positive or y.is_negative

In three-valued logic using or like this does not work correctly:

In [17]: None or False # should be None
Out[17]: False

You can rewrite it for three-valued logic as

return fuzzy_or([x.is_positive, y.is_negative])

Now the problem is that the condition y.is_negative will be evaluated even if x.is_positive is True: we lost the shortcut behaviour of or.

I have contemplated rewriting this sort of thing as

def x_pos_y_neg():
    yield x.is_positive
    yield y.is_negative

return fuzzy_or(x_pos_y_neg())

That is sort of cryptic though, adds boilerplate and makes the code hard to read. It all works nicely if you already have a lazy iterable but wrapping discrete cases into a generator function like this makes the code convoluted and obscures the logical expressions. Code using three-valued logic is already hard to understand and hard to write correctly so we don’t want to make it any harder than it needs to be.
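
To make the shortcut behaviour concrete, here is a minimal self-contained sketch. fuzzy_or_sketch is a simplified stand-in written for this post, not SymPy’s actual implementation, and condition is a hypothetical helper that logs which checks were evaluated:

```python
def fuzzy_or_sketch(values):
    """Three-valued 'or': True if any value is True, else None if any is None, else False.

    Illustrative only; not SymPy's actual implementation.
    """
    result = False
    for value in values:
        if value is True:
            return True    # shortcut: remaining items are never requested
        if value is None:
            result = None  # remember that something was unknown
    return result


checked = []  # records which conditions were actually evaluated


def condition(name, value):
    """Stand-in for a potentially expensive property such as x.is_positive."""
    checked.append(name)
    return value


def x_pos_y_neg():
    yield condition("x.is_positive", True)
    yield condition("y.is_negative", None)


print(fuzzy_or_sketch(x_pos_y_neg()))  # True
print(checked)                         # ['x.is_positive']
print(fuzzy_or_sketch([None, False]))  # None
```

The second condition is never evaluated because the generator is not advanced past the first True.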

The proposal here would mean that you could do:

return fuzzy_or(<x.is_positive, y.is_negative>)

That would be the best option if it were possible so that it still looks close to normal code with two-valued logic but can handle the three-valued logic correctly and can also shortcut when possible to avoid evaluating the y.is_negative condition.

is a bit confusing because the word “generator” can
mean either a generator-function or generator-iterator depending on the
context. If it said instead, would that help?

Or are you suggesting that generators should somehow be given another
method of construction that returns a non-iterator iterable? Not sure
how that would work or what use it would have.

Ah, OK. So in a sense, this is “simply” a restatement of the request for lazy evaluation that’s been proposed (many times!) before? That makes more sense to me. I’m generally in favour of having some form of lazy evaluation - your example is one good case. The problem is, as with many things, finding a good design that captures the functionality, fits well with the rest of the language, and is sufficiently easy to understand.

Framing the feature as a generator expression is an interesting idea. I’d define the construct as follows:

The expression <expr1, expr2, ...> is equivalent to defining a generator function

def _gen():
    yield expr1
    yield expr2
    ...

and then replacing the <...> expression with the call _gen() (the _gen name is private to the implementation and cannot be accessed by the user).

I’m not sold on the <...> syntax, and I’d like to see some more examples of how this would be used and taught, because it feels like there’s a risk that people would find the evaluation order confusing, and it may be a little too limited (you have to consume the values in order, so you can’t define, for example, a shortcutting if function with this). But it has potential.

Obviously, to go anywhere, this would need a PEP. And anyone writing such a PEP would have to look at all of the previous discussions around lazy evaluation, and discuss how this fits alongside those proposals. But if someone has the energy to do this, I think it would be a worthwhile proposal. I’ve no idea if it would succeed, but that’s a separate question.

I think the nuance can be used to support the idea.

I understand that the term “generator” can mean multiple things, perhaps reasonably so. In the language it’s defined as a function that returns a generator iterator. The interpreter (at least for 3.9 and 3.12) doesn’t type functions that return a generator iterator as such. This makes the multiple-meaning aspect confusing, with a generator iterator being a generator.

The conclusion I’ve reached is that, in spite of its definition, a generator is a generator iterator. In this conclusion I find appeal in the argument “wait, a generator is a generator, and should be entitled to similar conveniences as the other popular iterables”.

Indeed the proposal as it is is too limited to be useful for me even though @oscarbenjamin showed a possible use case as deferred evaluations.

I think the proposal can be made more useful if the following two features are added:

Conditional yield:
An if clause that skips yielding an expression entirely:

<"abc", f"{expensive()} xyz" if cheap_check(), "something else">

equivalent to calling _gen defined as:

def _gen():
    yield "abc"
    if cheap_check():
        yield f"{expensive()} xyz"
    yield "something else"

Shorthand for yield from:
Using the star operator not to unpack, but to perform yield from, within angle brackets:

<1, *range(3, 5), 6>

equivalent to calling _gen defined as:

def _gen():
    yield 1
    yield from range(3, 5)
    yield 6

And a combination of the two features:

<1, *<3, 4> if condition, 6>

which yields 1, 3, 4, 6 if condition is true or 1, 6 if false.

Martin,

This is becoming an ever-moving topic. Without quoting you directly, I will just comment on your comments.

There are languages that expand the range of printable characters used to include some subset of UNICODE. An example would be how SCALA allows other characters to be used if they are in the UNICODE categories “mathematical symbol” or “other symbol” so you can use

And SCALA allows unused combinations like &+ or fairly arbitrary UNICODE if regularly enclosed in backticks as in:

`r→f`

But Python is designed very differently when you look closer as SCALA was designed to be scalable as in some parts of it are really very sparse and you can pretty much add things to make your own languages. Many reserved words are not actually reserved and the language constructs are often rearranged into a series of method calls.

My point is that the ASCII limitation may relax, especially in newer or different languages. It can be done on a limited basis or, as in some languages, the editing environment or the language itself can allow you to type in something like [[:epsilon:]] and convert it for you. And note some really old languages like APL supported combinations of symbols overlaid on top of each other as variable names.

If you look at human languages, clearly some are a challenge to input using a keyboard of a standard size. When I have to write in the ones I know, using only my usual keyboard or perhaps a pop-up variable keyboard on my phone, it can be a slower process to get say the letter “o” with an acute accent or grave accent or an umlaut or other constructions as used in Hungarian or Esperanto or slashes embedded and so on. Or consider languages like Japanese which have several related alphabets plus lots of borrowed symbols from languages like Chinese. There cannot really be a keyboard that allows me to type everything including Cyrillic or Hebrew or Arabic.

This is not limited to programming and I suspect we will have reasonable solutions that expect languages to support some more symbols, at least in some parts. After a while, inputting some of these symbols may become a standard part of any advanced education, while at the same time, few will learn how to write in script!

:musical_score:

I just added an emoji in a language I am not very familiar with, musical notation. The actual code inserted was:

:musical_score:

This works because this system processes things using, among other things, markdown and supports the ability in some places to add HTML or maybe TeX to fine tune things. Clearly, programming languages and environments could be set up so users could use alternate forms like the above during editing stages or view it in a raw format when needed.

Martin,

This is, as stated, wandering away from the topic but short-circuit evaluation is really just syntactic sugar on the built-in functionality such as using the if statement. In the use being discussed, you could make a nested series of if/elif/else statements that guide your code to only evaluate what is absolutely needed. The in-line versions of if may operate similarly.

So using or/and may just be a more compact way and, as I show, can avoid evaluation.

But in some sense the interpreter does have to at least read everything if only to be able to recognize where a region ends.

I have been thinking about that in the context of what can be done within the language with minimal impact to get what is effectively deferred evaluation.

Speaking in generalities, Python allows you to create objects with dunder methods that can be looked for, and if found, called. So what would happen if we created an object called a wrapper that held contents that would include an unevaluated function call as well as other fields such as a lifetime count where zero means it is ready to be evaluated and 9 means it has nine more lives before being evaluated. It would also contain some dunder method that when evaluated simply decrements the lives counter and if nonzero, returns itself while if zero, it returns what is to be evaluated instead.

This would perhaps also allow wrapped objects that when invoked, look around and decide whether winter would soon be over in Punxsutawney or simply return itself to be tried again later.

But the question then would be how exactly we could instantiate such a payload without invoking an immediate evaluation. Using some text version of the code has some dangers as using an eval can trigger effects. Using some kind of symbolic notation could perhaps be done, albeit it has drawbacks.

But many languages do use deferred abilities. Consider any web page and many other GUIs, where callbacks to functions are arranged so that, when a mouse passes over a region, a section of text suddenly shifts to red and bold and reverts as soon as the mouse has moved on or a button is clicked.

But this seems to rely on encapsulating what you want to do in a function and then only supplying a pointer to the function. Perhaps using an anonymous function in-line may supply that for some cases.
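
A very rough sketch of the wrapper idea, using a plain callable object. The class name Deferred and the choice of __call__ as the triggering dunder method are my own assumptions, not an existing API:

```python
class Deferred:
    """Hypothetical wrapper holding an unevaluated call plus a countdown of 'lives'."""

    def __init__(self, func, *args, lives=0, **kwargs):
        self._func = func
        self._args = args
        self._kwargs = kwargs
        self.lives = lives  # 0 means ready to be evaluated

    def __call__(self):
        # With lives remaining, use one up and hand back the wrapper itself.
        if self.lives > 0:
            self.lives -= 1
            return self
        # No lives left: perform the deferred call now.
        return self._func(*self._args, **self._kwargs)


d = Deferred(pow, 2, 10, lives=2)
print(d() is d)  # True (one life used)
print(d() is d)  # True (second life used)
print(d())       # 1024
```

Nothing is computed when the wrapper is built, because the function and its arguments are stored separately rather than written as a call.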

Gotta go.

This looks pretty cool. I honestly didn’t even know that yield from was a thing, but it would make perfect sense with the star operator in this case.

Part of what struck me when transitioning from the C family of languages to Python several years ago was the focus on readability by avoiding clever use of symbols (e.g. and, or, and not instead of &&, || and !).

One of the nice things about generators is they use a word, yield, that can be googled and examples sought out, etc.

Let’s not make a symbolic mess of less-thans / HTML tags that’s hard to read and difficult to google. (ChatGPT will be no help due to training set delays.)

How about:

queries = yield from "abc", expensive_func(), "something else"

The reader coming upon this will realize something new is up and it probably has something to do with generators.

The yield from expression has been part of Python since 3.3. Also, the verb yield is already widely understood to imply making the current function a generator, so even if yield from weren’t already part of the existing grammar, using it in a function this way would be confusing:

def f():
    # the following makes f look like a generator
    queries = yield from "abc", expensive_func(), "something else"
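
A quick check of that point in today’s Python, with the tuple parenthesized so that it is a single expression and with expensive_func as a stand-in:

```python
import inspect


def expensive_func():
    """Stand-in for an expensive call (hypothetical)."""
    return "expensive"


def f():
    # The yield from turns f into a generator function that delegates to the tuple.
    queries = yield from ("abc", expensive_func(), "something else")
    return queries


print(inspect.isgeneratorfunction(f))  # True
print(list(f()))  # ['abc', 'expensive', 'something else']
```

Note that the tuple itself is built eagerly once the generator starts running, so this spelling would not defer expensive_func() either.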

We may use some other keywords such as a for expression enclosed in parentheses:

for i in (for 1, *(for 3, 4) if condition, 6, *(for 8, 9), 11):
    ...

It’s similar to a generator expression that already uses the for keyword so it’s easy to remember, and is disambiguated from a generator expression by having no expression before the for keyword.

The same code with angle brackets for comparison:

for i in <1, *<3, 4> if condition, 6, *<8, 9>, 11>:
    ...

I personally don’t find angle brackets too cryptic since < and > together still look like a container, and once made official people stumbling upon the new code should easily be able to search for “python angle brackets” to learn about the new syntax, which should then easily stick once learned because of the simplicity.

I’d be fine with either syntax though.

New keywords can be introduced less painfully now, though. If the need for such iterator expressions was strong enough, something like

queries = iterate over "abc", expensive_func(), "something else"

could be considered.

Let’s do it sooner than later, while there are still COBOL enthusiasts around to bask in the increase of such English-like expressions. :smile:
