This is my first time trying to write out a proposal, so bear with me. I think generators are an amazing tool for producing sequences that would typically end up needing a heck of a lot of for loops and other such code, while offering reusability and lazy evaluation.
But in my experience I find there are two cases where they fall short.
It’s easy to end up with a rat’s nest, e.g. tqdm(enumerate(windowed(brackets, 2)))
Or you can un-nest it, but I’ve had many bugs caused by the order of lines getting mixed up, or by people forgetting they need to convert to a list because the thing they are using is a generator stored in a variable under another name.
I would love to keep the compactness and co-location of option 1 with the readability of option 2.
Therefore my proposal is to introduce a funnel operator. It would accept an iterable on the left-hand side and a generator factory on the right, and return an iterator of its own. Each invocation would be evaluated from left to right.
The list would be passed to the filter generator factory, which returns a generator that is then passed to the map generator factories. Generator factories are just functions which accept an iterable and return a generator that consumes it.
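As a rough sketch of what I mean (a hand-rolled windowed purely for illustration, not the real more_itertools one), a call like windowed(2) would build a generator factory:

def windowed(n):
    def factory(iterable):
        # naive sliding window of size n; step/fillvalue omitted for brevity
        items = list(iterable)
        for i in range(len(items) - n + 1):
            yield tuple(items[i:i + n])
    return factory

print(list(windowed(2)([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]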
So we end up with this kind of nice, succinct code for my example from earlier:
for index, bracket in brackets |> windowed(2) |> enumerate() |> tqdm():
    # Code here
    pass
So why generator factories at all? At first I thought we could use some kind of system of currying the iterable to the existing generators, but unfortunately the existing ones in itertools and Python often have inconsistent interfaces, where the iterable sometimes needs to be the first, second or Nth argument.
Therefore we need a factory function which handles the details of how to construct these generators. This may also allow for backwards compatibility, for example if map were to become a factory, since the function could perhaps handle both cases: being called in the classic way that returns an iterator, and being called as a factory.
But yeah, I’m not really sure how you would pull this off regarding backwards compatibility.
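Purely as speculation, one way a map-like callable could support both conventions (map_ here is a made-up stand-in, not a proposal for the builtin):

def map_(func, iterable=None):
    if iterable is None:
        # factory style: return a function that consumes an iterable later
        return lambda it: map(func, it)
    # classic style: behave like the builtin map
    return map(func, iterable)

assert list(map_(str, [1, 2])) == ["1", "2"]
assert list(map_(str)([1, 2])) == ["1", "2"]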
I like pipelines like this in general in most languages that have them. I’m not sure we should restrict this to generators, though. It would be great if this just forwarded values and had behavior defined in a way that works for anything being forwarded on.
It could probably be done by providing a namespaced set of factory functions, e.g. itertools.pipeable (any other name or location possible), where the public functions within that namespace are functions that return functions taking only an iterable.
I can’t think of a better way that has semantics that are nice to work with. It might be possible with a special dunder, but this would require detecting function calls that are part of a pipeline and handling them differently, which I’m not enthusiastic about.
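For illustration only, such factory wrappers might look like this (all names made up; whether they would live under itertools.pipeable or elsewhere is open):

import functools
import itertools


def pairwise():
    return itertools.pairwise            # already takes just an iterable


def take(n):
    return lambda iterable: itertools.islice(iterable, n)


def keep(predicate):
    return functools.partial(filter, predicate)


# Without a dedicated operator, composing them is plain nested calls:
result = take(2)(keep(bool)(pairwise()([0, 1, 2, 3])))
print(list(result))  # [(0, 1), (1, 2)]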
itertools.pairwise does the job of functools.partial(windowed,n=2) if step and fillvalue are not required.
To the point, as Michael says: why limit the possible applications of the necessary work to generators? Why exclude other callables? I thought about homebrewing a currying syntax too, using __rrshift__ and >>, accepting and returning kwargs. It’s a common pattern to manually create a compose function and use that with functools.reduce.
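For reference, that compose-plus-reduce pattern is roughly:

from functools import reduce


def compose(*funcs):
    # apply each function, left to right, to the running value
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)


pipeline = compose(sorted, reversed, list)
print(pipeline([3, 1, 2]))  # [3, 2, 1]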
At first I thought we could use some kind of system of currying the iterable to the existing generators, but unfortunately the existing ones in itertools and Python often have inconsistent interfaces, where the iterable sometimes needs to be the first, second or Nth argument.
This is fixable with partial calls and introspection. Maybe explicit is better than implicit, however.
If you’re going to run with the original idea regardless, I’d prefer (brackets |> pairwise |> enumerate |> tqdm) to brackets |> windowed(2) |> enumerate() |> tqdm(), i.e. let |> make the calls.
I would personally be happy to have this in the standard library, but I think this could first be done as a separate package where you create a wrapper object that implements its own __or__ method. This kind of feature will probably require a PEP because it’s non-trivial and affects the parser and its performance, though it would definitely help readability (you wouldn’t have to introduce a bunch of temporary variables). The pipeline operator is somewhat exposed in JavaScript via RxJS but is native in Elixir, Erlang, Haskell and probably others, so we have at least some precedent. However, those are functional languages in essence (or at least that’s how I would categorize them), which Python is not.
Note that brackets |> pairwise |> enumerate |> tqdm would be preferable, because otherwise enumerate() would need to be detected as being part of the pipeline rather than as an ordinary function call, which would probably mean more work on Python’s side (I assume the use of enumerate() with parentheses is because you’re taking the Elixir/Erlang syntax, where you refer to the functions in the pipeline like this).
That’s clearly a for loop, so in this case, it cannot return a value:
Mmh. Yes, but that’s not really an issue, I think. We could have a |> /dev/null call at the end to suppress whatever is being returned (i.e., instead of a map(), consider it to be a forEach()). Or change the syntax, e.g. |@ print, which would just call print on each result without returning anything (and the entire pipe would just return a single None).
Yeah, something like that which wraps it in an object would be useful, especially as it makes it clear how the __or__ operator should work.
There are a few points from this thread I would like to clarify. I’m not married to any particular syntax, but I think I was ultimately after the currying syntax; I just wasn’t sure how we could make it backwards compatible with all the generators from itertools.
Ideally
[1, 2] |> filter(deleted)
is the same as
filter(deleted, [1, 2]) or partial(filter, deleted)([1, 2])
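A quick sanity check of that equivalence, with a stand-in definition for deleted:

from functools import partial

def deleted(x):          # stand-in predicate, just for the demonstration
    return x > 1

assert list(filter(deleted, [1, 2])) == list(partial(filter, deleted)([1, 2])) == [2]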
Treating it as a curry operator keeps it simple, since it’s just another way of passing arguments in a pipeline-like manner without needing nesting or temp variables.
For this reason, in my example the generators we don’t configure still have empty parentheses, because otherwise we would need to figure out how to handle both
list |> generator
and
list |> generator(args)
Keeping the () in both cases allows Python to treat each stage as a simple function call, just with one extra argument applied at the end: the value produced by the left-hand side of the operator.
This also answers questions like what happens if they use print. Something like this:
bar = list |> filter(deleted) |> print
would just print the generator object and return None, the same as it does with bar = print(filter(deleted, list)).
Instead they would need to wrap it in an each, like so:
list |> filter(deleted) |> each(delete)
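A minimal sketch of what a hypothetical each helper could look like (it just consumes the iterable for side effects, like a forEach rather than a map):

def each(func, iterable):
    # consume the iterable, applying func for its side effects only
    for item in iterable:
        func(item)

# Under the "append the piped value as the last argument" reading,
#     records |> filter(is_deleted) |> each(delete)
# would desugar to roughly each(delete, filter(is_deleted, records)).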
This would also unlock a nice way of saying you want it to be a list at the end:
|> flatten |> list
which would make sure we get a list rather than a generator, and I could see it maybe allowing for patterns such as trailing closure blocks one day.
The problem is that oftentimes the iterable is not the only argument to a generator, windowed and tqdm being examples, so the proposed syntax needs to accommodate additional arguments unless we are to clutter up the code with partial.
Making the piped iterable an implicit first argument to a call to a generator may be a necessary compromise to strike a balance between cleanliness and usefulness.
EDIT: One possible solution is to make the parentheses optional such that a call to the right operand is made with the piped iterable as the only argument if the right operand is callable:
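Roughly, both of these forms would then work; the hypothetical |> spellings are shown in the comments next to what they would desugar to today:

from itertools import chain, pairwise

data = [[1, 2], [3], [4, 5]]

# data |> chain.from_iterable   ->  right operand is callable, so call it with the piped value
flattened = chain.from_iterable(data)

# flattened |> pairwise         ->  same, no parentheses needed
pairs = pairwise(flattened)

print(list(pairs))  # [(1, 2), (2, 3), (3, 4), (4, 5)]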
One big issue is that many existing iterable helper functions take an iterable not as the first argument, but as the second. Examples include filter, map, reduce, starmap, takewhile, etc. I don’t see a good way to allow specifying the position of the iterable while keeping the syntax clean, although one can always create a wrapper function that moves the iterable argument to the first position, as sketched below.
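For example, such a wrapper (names made up) might look like:

def iterable_first(func):
    # wrap func so the iterable can be passed first instead of second
    def wrapper(iterable, *args, **kwargs):
        return func(*args, iterable, **kwargs)
    return wrapper


filter_ = iterable_first(filter)
print(list(filter_([0, 1, 2, 3], bool)))  # [1, 2, 3]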
You can use >> for shifting the positions. The syntax would be ugly but it could help: (pipeline('abcde') >> 1)(second, 1) with def second(number, letter): .... On the other hand, you could say that [1] acts as the shift, so that you don’t need the extra () required by >>, e.g. pipeline('abcdef')[1](second, 1) (still ugly).
You could also add a .shift() method on pipeline objects, or a shift function, e.g. pipeline('abcdef')(shift, 1)(second, 1).
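A rough sketch of such a pipeline object, entirely hypothetical, just to show how >> and the calls could fit together:

class pipeline:
    def __init__(self, value, pos=0):
        self.value = value
        self._pos = pos

    def __rshift__(self, n):
        # shift where the piped value will be inserted on the next call
        return pipeline(self.value, self._pos + n)

    def __call__(self, func, *args, **kwargs):
        args = list(args)
        args.insert(self._pos, self.value)
        # the insertion position resets to 0 after each step
        return pipeline(func(*args, **kwargs))


def second(number, letter):
    return f"{number}:{letter}"


print((pipeline('abcde') >> 1)(second, 1).value)  # 1:abcde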
Interesting suggestions, but all of them look still too verbose and clunky to me.
Since there is almost no iterable helper function that takes an iterable as the third argument (the multiple-iterable form of map notwithstanding), a possibly more eye-pleasing syntax may be to use > to denote piping the iterable as the first argument and >> to denote piping as the second:
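If I’m reading that right, the two spellings would correspond to something like this (plain calls shown, since the syntax itself doesn’t exist yet):

from itertools import pairwise

data = [3, 0, 4, 1]

piped_first = pairwise(data)       # what `data >  pairwise` would mean
piped_second = filter(bool, data)  # what `data >> filter(bool)` would mean

print(list(piped_first))   # [(3, 0), (0, 4), (4, 1)]
print(list(piped_second))  # [3, 4, 1]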
If you gave yourself a name for the pipeline output, it might work better. One of the fun things about inventing completely new syntax is that you can create all sorts of interesting things like special names.
e.g.
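(purely hypothetical syntax, where it is just an example name for the previous step’s output)
data |> filter(bool, it) |> map(str, it) |> sorted(it)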
I wrote another library, called pipeline_func, that might be of interest to those in this discussion. Here’s what it looks like in action:
>>> from pipeline_func import f, X
>>> a = [3, 4, 2, 1, 0]
>>> b = [2, 1, 4, 0, 3]
>>> a | f(zip, b) | f(map, min, X) | f(filter, bool, X) | f(sorted)
[1, 2, 2]
f is a class that wraps any arbitrary callable and implements the pipe syntax. X is an object that stands in for the output from the previous step, thereby accommodating functions like filter and map where the iterable is the second argument. By default, though, the first argument to the current step is the output from the previous step.
The X abstraction isn’t perfect; it doesn’t work if X is contained in another object (e.g. a list or a dict). To my knowledge, there’s no way to make an object that replaces itself with some other object the first time it’s accessed, which I think is what’s needed here. Maybe that would be a useful feature request for python itself. (It would be possible to replace X in nested data structures by pickling/unpickling each argument, but this would be way too much overhead.)
I’m probably biased, but I think this is a pretty nice syntax already. Here’s an example of what it looks like in a real project, which I think is a little easier to understand than the contrived example above.
I agree with the desire of the original post, though for general objects and not just generators.
Having opening and closing brackets far apart is ugly, hurts readability, and is annoying to write. And being forced to use intermediate variables leads to mistakes.
As an additional advantage, having such notation available would reduce the need for subclassing (and wrappers).
Though I would note that
for index, bracket in brackets |> windowed(2) |> enumerate() |> tqdm():
    ...
looked confusing to me at first.
I think it’s preferable to demand the brackets, so that f |> g can be interpreted as partial(g, f). The code you end up with that way looks more similar to other python code, specifically it will look very similar to method chaining, which python users are probably familiar with.
Then again, it isn’t completely clear to me whether f |> g |> h should mean partial(h, partial(g, f)) or partial(partial(h, g), f) within that paradigm, or whether that would even matter.
This reminds me of something someone else wrote, which was an alternative way to define lambda functions. I can’t remember exactly which symbol they proposed, but essentially
filter(lambda x: x % 2 == 0, __) === lambda PIPE: filter(lambda x: x % 2 == 0, PIPE).
Would there be a good reason to use a context-specific keyword, rather than introducing a more general tool that can be used to quickly define partial-ish lambda functions anywhere?
I don’t know whether your example or mine would work. It seems to me there is potential for it not to be clear where the function/expression that depends on PIPE ends. But if that problem can be fixed for one, it can presumably be fixed for the other.