List Comprehension as Compulsory Functional Style

mlgtechuser · July 2, 2022, 11:43am

There’s little doubt that the remarks below these list comprehensions were made in good faith, but I can’t find any corroboration in docs.python.org for the assertion that list comprehensions are only meant to be used in a functional programming paradigm.

[a_list.pop(len(a_list) - 1 - i) for i, ss in
    enumerate(a_list[::-1]) if ss[-1].startswith('X1:')]

[a_list.remove(ss) for ss in a_list[::-1] if ss[-1].startswith('X1:')]

Comprehension expressions are functional style.
Do not misuse comprehensions as another way to write a for loop.

The case seems to be: IF you’re using a functional programming model, then list comprehensions should only produce outputs from the inputs and not modify the inputs.

Does anyone know of a Pythonic Style or best practices guide that says “List comprehensions should only be used for Functional Programming” ? The lines of code above are terse and somewhat obscure, but terse tends to be esoteric and is sometimes an acceptable compromise for compactness. (The lines came from a use case of parsing an esoteric and oddly “structured” dataset, so it’s not surprising that some esoteric code came out of it.)

One of Python’s main features–and strengths–is its willingness to work with statement structures that the programmer comes up with. Many Pythonistas are familiar with the power of prototyping in Python and then porting the algorithm to a faster language. In fact, you can optimize and perfect that prototype so quickly and easily in Python that you might achieve the performance you need fully in Python and not even bother to port it over. Using list comprehension to mutate the state of an object in situ appears to be simply a case of Python’s flexibility and ease of use.

List comprehensions are reportedly adopted from Haskell [docs.python.org] (although mathematical set theory is their original source; Haskell’s comprehension uses mathematical set theory notation and syntax). Perhaps the ‘Functional Programming only’ model of list comprehension is simply carrying over from Haskell. Haskell’s slogan, right under the logo on its home page is “An advanced, purely functional programming language”.

[EDIT: ] P.S. Here’s an essay by GvR on Origin of Python’s “Function” Features with an informative quote:

I have never considered Python to be heavily influenced by functional languages, no matter what people say or think.

(P.P.S. This Quora post has a well-written comparison on the difference between Haskell and Python list comprehension.)

The relevant docs I checked:

Functional Programming HOWTO (a singleton search result. See this Google search.)
Tutorial Library – Real Python
Many references on Functional Programming that don’t inherently apply to the context of List Comprehension.

vbrozik · July 2, 2022, 10:29pm

OK, it was me who recently pointed out that comprehensions should not be misused as an alternative way to write a for loop. I will try to present my main arguments:

Readability

I would format the comprehension this way for readability:

[
    a_list.pop(len(a_list) - 1 - i)
    for i, ss in enumerate(a_list[::-1])
    if ss[-1].startswith('X1:')]

It is still a very complex piece of code: it iterates in reverse, uses the iteration index, modifies the length of the original list, computes with this changing length, and creates a new list… Originally I thought that the code splits the list into two according to the condition, but I realized that it fails to do so: the enumerate index i is based on the original length, while the pop index combines it with the current (different) length of the list. The index can easily go out of range. [1]
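To make the out-of-range failure concrete, here is a minimal sketch. The sample data is hypothetical (the thread does not show the real dataset); it only assumes that each item is a list whose last element is a string.

```python
# Hypothetical sample data: every item ends in a string starting with 'X1:'.
a_list = [["a", "X1: one"], ["b", "X1: two"], ["c", "X1: three"]]

error = None
try:
    [
        a_list.pop(len(a_list) - 1 - i)
        for i, ss in enumerate(a_list[::-1])
        if ss[-1].startswith("X1:")
    ]
except IndexError as exc:
    # i counts positions in the reversed copy, but len(a_list) shrinks
    # with every pop, so the computed index drifts and finally overflows.
    error = exc

print(type(error).__name__)  # IndexError
print(a_list)                # [['b', 'X1: two']]
```

With this input the second pop already removes the wrong item (index 0 instead of index 1), and the third pop raises, leaving one matching item behind.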

Here is a much more readable implementation that works:

a_list_x1, a_list_nox1 = [], []
for item in a_list:
    if item[-1].startswith('X1:'):
        a_list_x1.append(item)
    else:
        a_list_nox1.append(item)

…and the same for lovers of dense code while retaining some readability:

a_list_x1, a_list_nox1 = [], []
for item in a_list:
    (a_list_nox1, a_list_x1)[item[-1].startswith('X1:')].append(item)

Conclusion for this use of comprehensions: readability suffers, similar dense code is a hotbed of mistakes.

Purpose

I think the philosophy of Python aims to minimize the number of different ways in which the same thing can be accomplished. From the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

By the way, it is the opposite of the Perl motto:

There’s more than one way to do it

Comprehensions and generator expressions were designed to be used for creating new containers (lists, sets, dictionaries) and iterators. See the documentation, PEP 202, PEP 274.

If these constructs had been presented as another way of writing a for loop, I am sure they would never have been accepted into Python.

The result of the expression is always a list of some number of None values, e.g. [None, None, None]. What is this for? It looks like the only result wanted from the expression is its side effect. I think this is an obvious misuse of list comprehensions. They were never intended to be used like that.

Note that the way the code works is a little bit complicated again. It iterates a_list from the end, and for every match list.remove() searches the list from the beginning (!) and removes the item according to its value. So it can remove a different item than the item that matched!
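A small sketch of that behaviour, using hypothetical data with two equal but distinct items. It writes out only the first step of the comprehension so the effect can be inspected.

```python
# Hypothetical data: two equal (but distinct) matching items around a non-match.
first = ["dup", "X1: same"]
keep = ["dup", "other"]
last = ["dup", "X1: same"]
a_list = [first, keep, last]

# One step of the comprehension, written out: the reversed copy yields
# `last` first, but remove() compares by equality starting from the front.
matched = a_list[::-1][0]
a_list.remove(matched)

print(matched is last)                        # True: the last item matched
print(a_list[-1] is last)                     # True: yet it is still in the list
print(any(item is first for item in a_list))  # False: `first` was removed instead
```

Because equal items satisfy the same condition, the final contents usually still compare equal once the whole loop has run; the difference shows up in object identity and in the repeated front-to-back scans that remove() performs.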

A more direct implementation using a for loop. Here we directly remove the item that matched:

a_last_index = len(a_list) - 1
for reversed_index, item in enumerate(a_list[::-1]):
    if item[-1].startswith('X1:'):
        del a_list[a_last_index - reversed_index]
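As a quick check, the loop above can be wrapped in a function and run on sample data. The function name remove_x1_in_place and the data are hypothetical, chosen only for illustration.

```python
def remove_x1_in_place(a_list):
    """Delete items whose last element starts with 'X1:' (mutates a_list)."""
    a_last_index = len(a_list) - 1  # fixed once, before any deletion
    for reversed_index, item in enumerate(a_list[::-1]):
        if item[-1].startswith("X1:"):
            # Deleting from the end never shifts the items still to be visited.
            del a_list[a_last_index - reversed_index]


data = [["a", "X1: one"], ["b", "keep"], ["c", "X1: three"], ["d", "keep"]]
remove_x1_in_place(data)
print(data)  # [['b', 'keep'], ['d', 'keep']]
```

The key difference from the pop() comprehension is that the last index is computed once from the original length instead of being recomputed from the shrinking list.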

As we could see in both examples, modifying lists in place (in the sense of adding and removing items) is pretty complicated. In Python we usually create an iterator:

(item for item in a_list if not item[-1].startswith('X1:'))

or a new list, e.g. for small lists or when we need to iterate over it multiple times:

[item for item in a_list if not item[-1].startswith('X1:')]
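A short sketch of both forms on hypothetical sample data. The final slice assignment is not from the thread; it is a common idiom, added here as an assumption about what one might do when the original list object really must be updated.

```python
a_list = [["a", "X1: one"], ["b", "keep"], ["c", "X1: three"]]

lazy = (item for item in a_list if not item[-1].startswith("X1:"))
eager = [item for item in a_list if not item[-1].startswith("X1:")]

print(eager)        # [['b', 'keep']]
print(list(lazy))   # [['b', 'keep']]
print(len(a_list))  # 3: the input is untouched by either form

# If the same list object must end up filtered, assign the result back
# into it in one step instead of removing items during iteration.
a_list[:] = eager
print(a_list)       # [['b', 'keep']]
```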

Interesting references

https://mail.python.org/pipermail/python-list/2008-May/632671.html

[1] I hope I will soon publish my Jupyter notebooks with the tests.

cameron · July 2, 2022, 11:42pm

By Leland Parker via Discussions on Python.org at 02Jul2022 11:53:

There’s little doubt that the remarks below these list comprehensions
were made in good faith, but I can’t find any corroboration in
docs.python.org for the assertion that list comprehensions are only
meant to be used in a functional programming paradigm.
[a_list.pop(len(a_list) - 1 - i) for i, ss in
   enumerate(a_list[::-1]) if ss[-1].startswith('X1:')]

[a_list.remove(ss) for ss in a_list[::-1] if ss[-1].startswith('X1:')]
Comprehension expressions are functional style.
Do not misuse comprehensions as another way to write a for loop.

The case seems to be: IF you’re using a functional programming model, then list comprehensions should only produce outputs from the inputs and not modify the inputs.
[…]

I think you’re reading a recommendation as a prescription. It’s more
about what’s appropriate.

To my mind there are 2 outstanding reasons to use a for-loop a lot of
the time:

often it is a more clear expression of what is being done (though
definitely not always)
a list comprehension precomputes the entire result; in a complex
nested comprehension you may be precomputing the entire intermediate
result; a for-loop provides progressive results which do not of
themselves need storing as a whole

The former point is basically readability.

The latter is resource consumption, both in time and space.

One of the pleasures of a functional language is that it is possible to
write code about unbounded sequences eg “all the primes”. While that’s
an extreme case for a list comprehension in that it actually wouldn’t
complete, there are plenty of “large” comprehensions.

In a functional language, these expressions are actually almost
equivalent to modern Python “generator expressions”: values computed as
required.
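A minimal sketch of "values computed as required", using only the standard library. The names squares and first_five are illustrative.

```python
from itertools import count, islice

# An unbounded source; nothing is computed until a value is requested.
squares = (n * n for n in count())

first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# The same expression with square brackets would try to build the whole
# (infinite) list first and never finish:
# [n * n for n in count()]
```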

The functional style writes code as static expressions such as list
comprehensions or generator expressions (or generators in general)
instead of overtly interactive procedural things like for-loops and
if-statements.

For a lot of things the functional style makes reasoning about the code
more clear. Example:

x for x in primes() if x in cubes_plus_one()

That’s clearly a set intersection between the unbounded set of primes
and the unbounded set of cubes-plus-one (to pick something which might
plausibly contain some primes, since “cubes” obviously wouldn’t,
ignoring 0 and 1).

So the outcome of the above expression has a clear conceptual
definition.

It might be nice to write it as:

primes() & cubes_plus_one()

if the language supported that.

Unfortunately it won’t run. You can test membership in a generator
result with “in”:

>>> def cubes_plus_one():
...     for i in range(1024):
...         yield i*i*i + 1
...
>>> 9 in cubes_plus_one()
True
>>> 10 in cubes_plus_one()
False

but you will notice that it is not unbounded as written. We could write
an unbounded version (set i=0, then just count up indefinitely) but
while “in” would succeed when the target value occurs in the result, it
would never complete for values which do not occur.

You can write a generator to yield the target values progressively if
you have implementations of primes() and cubes_plus_one() which
yield results in numeric order (both easy) and where cubes_plus_one()
accepts a bound (like the hardwired 1024 in the example above), so as
to allow a deterministic True or False by only running
cubes_plus_one() far enough to be sure.

That would come out as a procedural generator function in Python, using
yield to yield values as encountered.
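One possible sketch of such a procedural generator follows. It is a variant of what is described above: instead of giving cubes_plus_one() a bound, it walks both ascending streams in step, which also yields results progressively. The implementations of primes(), the unbounded cubes_plus_one(), and the helper ascending_intersection() are assumptions written for this illustration.

```python
from itertools import count, islice


def primes():
    """Yield primes in ascending order (simple trial division, unbounded)."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


def cubes_plus_one():
    """Yield i**3 + 1 for i = 0, 1, 2, ... in ascending order (unbounded)."""
    for i in count():
        yield i ** 3 + 1


def ascending_intersection(left, right):
    """Yield values present in both ascending, unbounded iterables."""
    left, right = iter(left), iter(right)
    a, b = next(left), next(right)
    while True:
        if a == b:
            yield a
            a, b = next(left), next(right)
        elif a < b:
            a = next(left)   # left is behind: advance it
        else:
            b = next(right)  # right is behind: advance it


common = ascending_intersection(primes(), cubes_plus_one())
print(list(islice(common, 1)))  # [2]
```

Asking for a second value would never return: n³ + 1 factors as (n + 1)(n² − n + 1), so 2 is the only prime of that form. The generator still delivers the first result immediately, which a list comprehension over the same sources could not do.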

Written as list comprehensions it would (a) never complete and (b) try
to precompute the result in its entirety. Even for things which do
complete, that can easily consume a lot of memory and time and often
your larger task may not want all the results, making your
implementation needlessly slow and expensive.

So I’d take this as:

a recommendation to choose your implementation for readability and
resource frugality
a hint that misuse of list comprehensions can be excessively expensive

Python isn’t a functional language, but you can go a long way there with
generators. That doesn’t mean they’re always the better course of
action.

“The code is more what you’d call ‘guidelines’ than actual rules.”
-Hector Barbossa

Cheers,
Cameron Simpson cs@cskk.id.au

vbrozik · July 3, 2022, 10:49pm

Finally I put my Jupyter notebooks to GitHub.

Here I examine the two list comprehensions:
python-ntb/problems_from_forums/2022-07-02_comprehensions.ipynb

Let me know if I should add an explanation.

mlgtechuser · July 4, 2022, 2:23am

I do view it as a recommendation, but it was phrased as a prescription (e.g. there are no qualifiers such as “usually”) and then the post was referenced later as a negative example by link in a subsequent topic. (Some context: The two examples above were an exercise to see how many ways the problem could be solved. These are the last two of the five posted, starting with the more ideal versions of the iteration.)

This is an important consideration, absolutely. It would seem that there’s room for license where the lists are short.

“Sometimes it’s okay to be a ‘Pyrate’.”
-Leland Parker

mlgtechuser · July 4, 2022, 3:24am

This is well stated. As mentioned in my reply to Cameron, the context for those list comprehensions was of a “let’s see what’s possible” type rather than an exhibition of best practices. Working with a copy is definitely a better practice than mutating your data in place, and favoring readability is, too.

There should be one-- and preferably only one --obvious way to do it.

I’ve not seen anything about minimizing ways to accomplish something, but rather there seems to be a bounty of different ways, with the goal of doing it the “most Pythonic” way. This is a fertile point of discussion, as references to The Zen of Python tend to be. Since these guidelines are largely theoretical and philosophical, they manifest in various ways in practice. Note that the above reference is highly qualified (worded so as to be limited in its absoluteness).

should…
preferably…

A more absolute phrasing that could be used, but isn’t:

“There is one–and only one–obvious way to do it.”

A more complete phrasing might be:

“There should be one-- and preferably only one --obvious way to do it in a given context.”

A given context has many facets (speed, resource usage, size of the code file, etc.). General principles help to move the actual outcome toward an ideal outcome, but sometimes compromise is not only okay in a given context, but necessary.

Trying to write a guideline that handles every case is not only an impossible task, but is a recipe for frustration for all involved. Guidelines, of necessity, should be broad rather than narrow and must have some flexibility. The genius is in striking a workable balance between broad and specific --and without including the policy maker’s individual opinion (preferably none, though sometimes individual judgment does come into play).

And for some balance against dogmatic tendencies, we have…

From PEP 8:
A Foolish Consistency is the Hobgoblin of Little Minds.

mlgtechuser · July 4, 2022, 3:26am

Thank you for the references, Václav.

The code lines were only included in the OP to illustrate the context of a mutating list comprehension.

Conclusion for this use of comprehensions: readability suffers, similar dense code is a hotbed of mistakes.

The terseness and opaque readability is acknowledged in the OP. I also converted to more grammatical variable names and there may be typos or other conversion errors. These are separate from the topic of using list comprehension for mutating.

I’ve been meaning to point out that the forum format strongly influences code formatting practices. A line of code that runs off the side of the code block degrades the reading experience, so the line might be cut here at Discuss whereas it would be kept intact in a codebase. Similarly, omitting blank lines provides a more concise post and might produce a more readable post than if the number of lines produces a scrolling code block. I prefer comments on a separate line but will tend to use appended comments here. Anyone studying the code in depth can copy/paste into an editor or use the wide-screen link in the upper right corner of the code block.

So some consideration of the forum context/format is appropriate, is it not?

From Python Tutorial 5.1.3 - List Comprehensions:

List comprehensions provide a concise way to create lists. Common applications are to make new lists

True, but it doesn’t say that they “aren’t a concise way to iterate over a list and shouldn’t be used for this purpose”.

From PEP 202 - List Comprehensions:

List comprehensions provide a more concise way to create lists in situations where map() and filter() and/or nested loops would currently be used.

This one contains an interesting reference to conciseness and, as above, doesn’t prohibit list comprehension from being used as a concise for: loop.

From PEP 274: Dict Comprehensions

comprehensions can provide a more syntactically succinct idiom than the traditional for loop.
[sic: original says “idiom that the”]

This reference explicitly endorses use of list comprehension for conciseness.

vbrozik · July 4, 2022, 8:58am

I will try to not make my reply too long.

I have to admit that my reaction which started this, was probably too strict:

It is my point of view on what readable code should look like. I still strongly stand by this statement: Please do not misuse comprehensions as another way to write a for loop.

I see the principles of readability and intended use as the “Pythonic” way. I do not think that the intended use must always be followed, but in both examples the input data should not be modified inside the comprehension.

Yes, but when you start with these facets too early, it is called premature optimization. In Python I prefer readability. It is especially important when the code should be well understood by someone else. (This is certainly the case for code in this forum.)

When the optimization causes the code to be less readable, it should be well explained in comments.

I still think readability will suffer if you move away from the original intention of how comprehensions are to be used. You do not need to be prohibited from doing something to feel that it is not a good way.

Yes, but I certainly understand the statement in the context that it is for the cases when the shorter code improves readability. I am convinced that this only happens when the comprehensions are not too complex.

In general I think that the urge to squeeze a lot of code into a small number of lines is a very bad practice. It is good for puzzles like code-golfing though. How much time do you need to analyze the following code?

from functools import reduce
print(*filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
map(lambda x,y=y:y%x,range(2,int(pow(y,.5)+1))),1),range(2,1000))))

Solution and source: Programming FAQ — Python 3.12.1 documentation

As they said:

Don’t try this at home, kids!

My conclusion: in Python, readability comes first.

vbrozik · July 4, 2022, 9:50am

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

Brian W. Kernighan

mlgtechuser · July 4, 2022, 10:39am

I did think so and had let that pass but was then surprised when I soon found a link to it as an example of bad form. This made me want to get to the bottom of the strict assertion.

That was the exact case, actually. Perhaps the lesson in this is that sometimes posts contain code golfing purely for fun, so we should beware of taking that too seriously.

vbrozik · July 4, 2022, 11:07am

OK, I did not understand it this way. Now I see that the series with @cheesebird transformed from helping a beginner into code-golfing play.

I do not take it too seriously, but I think this attitude can harm the learning process of beginners or their attitude to Python. Maybe this is a new topic to open about discuss.python.org?

My opinion:

By default the code in answers here should be written very clearly, especially when it is intended for beginners.
The Python constructs should preferably be used for their designed purpose.
If the code is part of playful puzzles it should be obvious that this is not the way Python should be used in normal projects.

steven.daprano · July 4, 2022, 12:08pm

I’m going to be blunt here: list comprehensions are a concise expression for creating lists, not for running a for-loop for its side-effects.

Using a list comprehension for a procedural loop (a loop that runs procedures for the sake of their side-effects) is an abuse of the syntax, and less efficient too. For tiny for-loops it may only be a small amount less efficient, but for large for-loops it may be extremely inefficient.

A list comprehension is more or less syntactic sugar for an accumulator in a loop:

# [expression for item in iterable]
# is equivalent to
accumulator = []
for item in iterable:
    accumulator.append(expression)
# the value of the comprehension is the finished accumulator
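The equivalence can be checked with a small runnable sketch. Here item * 2 stands in for the generic expression, and the two function names are hypothetical.

```python
def as_comprehension(iterable):
    return [item * 2 for item in iterable]


def as_accumulator(iterable):
    accumulator = []
    for item in iterable:
        accumulator.append(item * 2)
    return accumulator


values = [1, 2, 3]
print(as_comprehension(values))                            # [2, 4, 6]
print(as_comprehension(values) == as_accumulator(values))  # True
```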

Yes, there is a for-loop buried there, but that’s not the purpose of the comprehension. The purpose is to collect the values from evaluating the expression and return them in a list. If you use a list comprehension for the side-effects only, and then discard the list, your code is wasteful and inefficient and will surprise your readers:

# this is poor quality code, don't do this
# equivalent to side-effect only
# [print(expression) for item in iterable]
accumulator = []
for item in iterable:
    accumulator.append(print(expression))
del accumulator

You wouldn’t write that as an explicit for-loop, and you shouldn’t do it as a list comp either.

That is why the documentation talks about list comprehensions as a concise syntax for building a list, not as a concise way of executing a for-loop.

The docs don’t have to explicitly say what list comps are not for:

list comprehensions are not for doing simple arithmetic: [1 + x for x in (x,)][0]

just as the instructions for your iPhone or Android probably don’t say that the phone is not for hammering nails, slicing bread or applying paint.

If you want a one-liner for-loop with no result, you can do this:

for item in iterable: print(expression)

But of course it is your code, and you can write anything you like, just as it is your iPhone, and if you want to use it as a very small frying pan, who are we to say you can’t?

If you are writing deliberately obfuscated one-liners, then sure, why not use a comprehension as a for-loop? If you are doing it for fun, that’s fine, just don’t expect me to respect your code if you are doing it in all seriousness.

How about using the map() function instead? An important difference is that map() returns an iterator, so you can evaluate the results at full-speed without accumulating them in a list using the recipe in the itertools documentation:

import collections

collections.deque(map(func, values), maxlen=0)

# Could also use a generator comprehension. Note the round brackets (parentheses).
collections.deque((func(value) for value in values), maxlen=0)

which costs only the creation and destruction of a zero-length deque. Last time I tried this, it was actually a little faster than a for-loop, so if you need those side-effects to be executed as quickly as possible, and are willing to sacrifice some readability to save an extra few microseconds, then this is one possible hack.
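A runnable sketch of the same recipe, with seen.append standing in for func and range(5) for values (both chosen only for illustration). It makes no claim about speed; it only shows that the side effects run and nothing is accumulated.

```python
import collections

seen = []  # records the side effects so they can be inspected

# map() is lazy; the zero-length deque drives it to exhaustion
# while storing none of the (None) results.
leftover = collections.deque(map(seen.append, range(5)), maxlen=0)

print(seen)           # [0, 1, 2, 3, 4]
print(len(leftover))  # 0
```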

There are times that we need to sacrifice readability for speed, but we should not make a habit of it.

mlgtechuser · July 4, 2022, 3:43pm

I appreciate everyone’s interest and consideration in addressing this question. I’m just trying to understand the philosophical foundation of using list comprehension only for functional style procedures. (Side note: I recently watched the 2018 Lex Fridman interview in which GvR said that he was challenged extensively by folks who wanted to assert that his view of PEP 572 and what aligns with Python’s philosophy was incorrect, so moving the discussion to a philosophical level is no magic wand.)

The damage is obvious in these examples, as is the unreadability of gruesome constructions. [1]
These cases are self-evident, though. There’s something more subtle in the side effects of list comprehensions because they appear to be useful shorthand in certain cases, such as mutating short lists in some trivial side process. To rephrase the question within the context of abuse: what is the damage in using list comprehension for its side effects? We can assert that something is bad/ugly/abuse or only intended for X, but what’s the philosophical situation? What’s the core-level case against this beyond style, opinion, or purist dogma?

@cameron hit on the philosophical question with the point that list comprehension creates a list in memory and therefore wastes resources. This is an adverse and objective (not based on opinion) outcome. But what happens to this potential consumption of memory if the comprehension isn’t assigned to anything? I’m genuinely interested in knowing so I can understand Python better.

This is along the same lines. What are the inefficiencies?

Readability doesn’t always prevent this shorthand for: because a simple list comprehension with side effects can be very transparent, just as a simple single-line if: can be more readable than when broken into two lines:
(else: break). (Off-topic example simply for illustration. No need to respond, especially since this is already the subject of a recent topic.)

I find this perfectly readable:

ints = [1,2,3,4,5,6,7,8,9]
[ints.remove(x) for x in ints[::-1] if x %2 == 1]  #removes the odd numbers

(Yes, it iterates backwards to ensure that the tail of the shrinking list doesn’t slip under the iteration. Yes, it’s probably “too clever” and yes, that was a fun puzzle to work out.)

As it turns out, the list methods might be the only mutations that can operate in a list comprehension, so there’s some built-in self-policing. Pylance in VS Code complains about the following assignment and can’t resolve the scope for the first instance of 'x':

[x+=1 for x in ints if x %2 == 1]
#would bump odd numbers in the list to even--if it would run

(This is also quite readable if you know the grammar of conditional list comprehensions and the += shorthand. That’s a moot point, though, since it doesn’t run.)

I do agree that such a construct as [ints.remove(x) for x in...] can be abused. So can many legitimate functions. Maybe I just stumbled upon an unintended artifact while noodling around with list comprehensions. Is that worth publicly bonking someone with a hammer or other blunt instrument about…or with your iPhone?

[1] @Vbrozik, did you make this up or did you pull it out of some Hall of Shame?
print(*filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,map(lambda x,y=y:y%x,range(2,int(pow(y,.5)+1))),1),range(2,1000))))
The stripped out spaces that make it less readable are cheating, of course, but then it would be even LONGER!

vbrozik · July 4, 2022, 5:31pm

It is from the official Python documentation. There is a link below the code. Note that this is similar to some of your comprehensions, it is just a little bit more extreme. …but as far as I can see, it is purely functional code (no dirty side effects).

I am afraid we are already going in circles. Steven already said something like this: [1]

In your fun code do whatever you want.
In code to be used by others, do not misuse the language constructs for something different.

So just some short reactions:

I think the answer was already said. So just some details to add:

Waste of computer resources (mainly memory). Having a lot of free memory does not make this harmless, because you are filling your memory cache with garbage data, making it less effective.
Unexpected behaviour of the code, which makes it harder to understand and easier to get wrong.

About wasting your memory cache:

When the comprehension is finished (and the result is not bound to any variable), there is no longer any reference to the created list and it gets garbage-collected (deallocated):
https://devguide.python.org/garbage_collector/

This creates a list of None items. The side effect is unexpected from a comprehension. Earlier I also pointed out that this kind of code does not work the way you probably imagine:

It finds a value x, going from the end of the list, satisfying the condition.
Then ints.remove(x) searches the list again, from the beginning, and removes the first value equal to x that it finds.

So, the code can remove a different item than the item which matched the condition and the code is not very efficient. For very long lists it would perform poorly.

This code is invalid. x+=1 is not an expression.
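A small sketch that confirms the parser rejects the construct and shows one form that does run. The replacement comprehension is my reading of the stated intent (bump odd numbers to even while keeping the rest), not something given in the thread.

```python
ints = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# An augmented assignment is a statement, so the parser rejects it
# in the expression position of a comprehension.
try:
    compile("[x += 1 for x in ints if x % 2 == 1]", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True

# Building a new list with a conditional expression does run.
bumped = [x + 1 if x % 2 == 1 else x for x in ints]
print(bumped)  # [2, 2, 4, 4, 6, 6, 8, 8, 10]
```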

[1] And I think all the people who replied here would agree with that.

mlgtechuser · July 4, 2022, 6:41pm

Agreed that the execution is problematic and the quirks of execution can fool later readers and maintainers. This was an example of readability for the use case given (non-repeating list values). The indexed ~.pop() example doesn’t have that execution issue but is definitely unfriendly and tortured as an expression, so doesn’t illustrate any readability–just the opposite, of course.

As I said in the closing of my previous post, the use of list comprehension as a concise for: doesn’t appear to be broadly applicable enough to have much value anyway. The fact that it can be used in only a limited number of narrow cases is evidence that such use is artificial (an artifact) and not a designed behavior. This seems to be the philosophical answer I was looking for.

Yes, I did an assignment to capture the list contents before my previous post. It was not surprising to find the None objects since print() and several others return None to prevent unintended assignment behaviors. Nor was it surprising that the ~.pop() comprehension creates a list of the popped values.

This part about the list creation when there’s no assignment seems potentially informative and not circular. Does the list of None actually get created somewhere as a list? My thought is that the returned values would go into a null object or some other bit bucket like that.

Quercus · July 4, 2022, 8:12pm

In accordance with much of the response above, the general mindset seems to tilt against invoking list comprehensions to accomplish side effects. In the upgrade from Python 2 to Python 3, some functions, such as map and zip, transitioned from producing a list to producing an iterator, and range now produces a lazy range object instead of a list. This may stem from a view that reducing unnecessary use of resources, such as memory, is a worthy cause. Accordingly, since what is below would be better handled with a for loop, most would likely frown upon it, though I wasn’t actually frowning when composing it. In any case, the output indicates that a list of None was created.

>>> import string
>>> [print(f"{ord(ch)} {ch}") for ch in string.punctuation]

Output:

33 !
34 "
35 #
36 $
37 %
38 &
39 '
40 (
41 )
42 *
43 +
44 ,
45 -
46 .
47 /
58 :
59 ;
60 <
61 =
62 >
63 ?
64 @
91 [
92 \
93 ]
94 ^
95 _
96 `
123 {
124 |
125 }
126 ~
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

cameron · July 4, 2022, 11:27pm

By Leland Parker via Discussions on Python.org at 04Jul2022 15:53:

I appreciate everyone’s interest and consideration in addressing this
question. I’m just trying to understand the philosophical
foundation of using list comprehension only for functional style
procedures. […]

@cameron hit on the philosophical question with the point that list
comprehension creates a list in memory and therefore wastes resources.

For the record, I do not consider that a philosophical position but a
pragmatic position. I suppose that since programming is in the domain of
“getting things done” it ventures into the philosophy of programming,
but really I was speaking of pragmatic effects here.

Philosophically, to me a list comprehension (or its progressive variant
the generator expression) can be a concise way of expressing “here’s a
bunch of values, all of which are derived from this expression here (the
leading expression in the comprehension)”. Thus making it clear that all
these things are instances of some generic situation or case.

[ fn(x)
  for x in source_of_data_here()
  if condition
]

Says to me:

all the values were computed the same way
they came from this source domain
intersected with the source domain of values satisfying condition

As such, the result has a clear semantic meaning.

Of course I also use list comprehensions for mundane practical reasons
too, like “I need a list” or “make a copy of these items” (often spelled
list(the_items) though).

As such, the list comprehension is a nice functional expression: that it
is implemented procedurally internally is not pertinent.

And that brings us to the issue of list comprehensions with side effects:
firstly they can be hard to read, requiring careful thought about the
order in which things happen and secondly the comprehension is such an
apt “functional” expression of some things that using it for operations
with side effects may be actively misleading.

Prior to the walrus operator (:= inline assignment) a comprehension
with side effects wouldn’t even have any overt assignments in it to give
the game away.

These are all reasons to my mind to pretty well never use a list
comprehension to modify data. I almost always read them as static
expressions producing functional results, and expect others would
usually do so as well.

At the very least such a thing requires a LOUD obvious leading comment.

This is an adverse and objective (not based on opinion) outcome. But
what happens to this potential consumption of memory if the
comprehension isn’t assigned to anything? I’m genuinely interested in
knowing so I can understand Python better.

The list gets constructed, consuming memory. Then its reference count
goes to zero and the list and memory are released. The heat death of the
universe advances further.

Steven D'Aprano:

Using a list comprehension for a procedural loop [is] less efficient too. For tiny for-loops it may only be a small amount less efficient, but for large for-loops it may be extremely inefficient.

This is along the same lines. What are the inefficiencies?

Because building a list, particularly incrementally, has costs. Whenever
the list gets bigger, more storage is required. Usually the internals of
such things allocate storage in bursts i.e. over-allocate memory for the
list to grow into. But that just mitigates things. When the buffer for
the list references fills, it becomes necessary to allocate a new chunk
of memory and copy the references into it.

[ Aside: there’s a length hint available for objects
(object.__length_hint__; see 3. Data model — Python 3.12.1 documentation)
which the internals can use to size an initial preallocation
for a list being built from an iterable.
Still just mitigation.
]

If you’re just iterating, none of that overhead needs to occur.
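A rough way to watch the over-allocation and the length hint at work is sketched below. It assumes CPython (sys.getsizeof is implementation-specific and the exact byte counts vary by version), and the names items, sizes and hint are only for illustration.

```python
import sys
from operator import length_hint

# Record the list's size in bytes after every append.
# The actual numbers are CPython details and differ between versions.
items = []
sizes = []
for i in range(64):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# Far fewer distinct sizes than appends: storage is reserved in bursts,
# and each jump is a reallocation of the reference buffer.
distinct_sizes = sorted(set(sizes))
print(len(sizes), "appends,", len(distinct_sizes), "distinct sizes")

# A length hint lets a consumer preallocate before iterating.
hint = length_hint(iter(range(1000)))
print(hint)  # 1000
```

A plain for loop that only iterates never builds this buffer in the first place.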

Readability doesn’t always prevent this shorthand for: because a simple list comprehension with side effects can be very transparent, just as a simple single-line if: can be more readable than when broken into two lines:
(else: break). (Off-topic example simply for illustration. No need to respond, especially since this is already the subject of a recent topic.)

I find this perfectly readable:
ints = [1,2,3,4,5,6,7,8,9]
[ints.remove(x) for x in ints[::-1] if x %2 == 1]  #removes the odd numbers
(Yes, it iterates backwards to ensure that the tail of the shrinking list doesn’t slip under the iteration. Yes, it’s probably “too clever” and yes, that was a fun puzzle to work out.)

I was going to say exactly this re the backwards iteration.

The reader has to look at the ints[::-1], a well defined idiom which
is still rarely seen, and think why did the author choose this weird
form of the source values?

Versus a functional form:

ints = [1,2,3,4,5,6,7,8,9]
ints2 = [ x for x in ints if x % 2 == 0 ]  # keep the even values

which is far easier to read and would work with the source values
(ints) in any order. Because there are no side effects.

This is why functional forms are so nice: you don’t have to think hard
about order of operations and side effects (the ints.remove(x) mangling
the iterator driving the comprehension).

I actually find your example an argument against comprehensions with
side effects.
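To make the order dependence concrete, here is a small sketch of what happens when the removal runs forwards instead of over a reversed copy. The data is made up so that odd values sit next to each other.

```python
# Hypothetical data: adjacent odd values, chosen to trigger the problem.
ints = [1, 3, 5, 2, 4]

# Removing while iterating forwards: each removal shifts the tail left,
# so the value that slides into the current slot is never visited.
for x in ints:
    if x % 2 == 1:
        ints.remove(x)
print(ints)  # [3, 2, 4] - the 3 was skipped

# The side-effect-free form gives the same answer for any ordering.
source = [1, 3, 5, 2, 4]
evens = [x for x in source if x % 2 == 0]
print(evens)  # [2, 4]
```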

Cheers,
Cameron Simpson cs@cskk.id.au

steven.daprano · July 5, 2022, 2:03am

I think all this talk about philosophy is over-thinking it. We dislike using comprehensions for their side-effects not because of some theoretical or academic preference for pure functional programming, but because of the concrete practical and pragmatic reason that they are wasteful of machine resources, encourage poor coding techniques, and are less obvious to comprehend (pun intended) when reading.

If Python had a “procedure comprehension” syntax that mimics a for-loop in a single line:

❬ procedure(a) for a in iterable if condition(a) ❭

say, without generating a potentially huge list of values which are ignored and then have to be thrown away, then I would probably use it.

But we don’t really need such a thing, because all it ultimately saves us is one or two lines, which is cheap.
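For comparison, the ordinary spelling looks something like the sketch below. procedure, condition and iterable are the placeholder names from the hypothetical syntax above, filled in with stand-in definitions so the snippet runs.

```python
# Stand-in definitions, purely for illustration.
seen = []

def procedure(a):
    """Hypothetical side-effecting operation: record a scaled value."""
    seen.append(a * 10)

def condition(a):
    """Hypothetical filter: keep even values only."""
    return a % 2 == 0

iterable = range(6)

# The plain for-loop: two extra lines, and no throwaway list is built.
for a in iterable:
    if condition(a):
        procedure(a)

print(seen)  # [0, 20, 40]
```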

Fundamentally, programming code is language, and it is fair to say that we write code as much for the human reader as for the machine to execute. If all we cared about was the machine, we would all be programming in assembly language. But we care so much about the human reader that we have invented hundreds, maybe thousands, of programming languages, trying to make code “more readable” and more comprehensible in some sense or another.

So when we write code, it is good to stick to using common idioms unless there is a good reason not to. If the reader has to think too hard to understand your code, that’s a bad thing.

Of course we should not discount all language innovations. But we should weigh up the potential benefit of the new idiom against the cost of your readers asking “what the hell is this code doing?”

BTW, “my readers” can include me in six weeks’ time, when I look back at my own code and say “What sort of insane maniac wrote this???”

steven.daprano · July 5, 2022, 2:30am

Looking at this code:


ints = [1, 2, 3, 4, 5, 6, 7, 8, 9]
[ints.remove(x) for x in ints[::-1] if x % 2 == 1]  # removes the odd numbers

I had to stop and think about whether it always works, or only works because the numbers are in the correct order. What if there are duplicates? On thinking about it, and running some tests, I’m now sure that the code is correct.

But it is inefficient.

It makes a copy of ints, in reverse order. If there were a billion items in the list, it has to copy those billion ints into a new list before it can even start processing them!
You iterate over the copied list, which is an O(N) operation so it takes time proportional to the number of items in the list. But then the remove method is also O(N), so it too takes time proportional to the length of the list. So altogether, the time is proportional to the length of the list squared, which is bad.
You build up a new list consisting of nothing but the value None, which has to be created and then deleted and garbage collected. That isn’t free: it takes time, which can be significant when working with large lists.
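A quick check of two of these points is sketched here: the throwaway list really is full of None, and the reversed-copy trick still gives the right answer with duplicates. The names discarded and dupes are only for illustration.

```python
ints = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Capture the list the comprehension builds, just to look at it.
discarded = [ints.remove(x) for x in ints[::-1] if x % 2 == 1]
print(ints)       # [2, 4, 6, 8]
print(discarded)  # [None, None, None, None, None]

# Duplicates: remove() deletes the first equal item, which is fine
# here because every odd value is going to be removed anyway.
dupes = [1, 1, 2, 3, 3, 3, 4]
[dupes.remove(x) for x in dupes[::-1] if x % 2 == 1]
print(dupes)  # [2, 4]
```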

So what’s the best way to solve this problem? The best way is not to use list.remove at all! The fastest and most idiomatic way is to use a list comp and slice assignment:


# Modify the original list ints in-place.
ints[:] = [x for x in ints if x % 2 != 1]

# Alternative:
ints[:] = filter(lambda n: n % 2 != 1, ints)

I think the list comp version will be a little faster, due to the overhead of calling a function in Python. But if both the list comp and filter versions need to call a function, the speed difference is negligible.

If you don’t care about modifying the list in place, you can drop the slice assignment ints[:] = ... and just use ints = .... That will probably be a little faster too.

Another advantage is that if you have threads running in your code, there is no point that another thread could see your list ints in an inconsistent state. At every moment, it is either the original ints, or it is the replacement with all the odd numbers stripped out.
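The difference between slice assignment and plain rebinding shows up as soon as something else holds a reference to the list. A minimal sketch, with alias and other as made-up names for those second references:

```python
# Slice assignment: the existing list object is modified in place,
# so every other reference to it sees the change.
ints = [1, 2, 3, 4, 5, 6, 7, 8, 9]
alias = ints
ints[:] = [x for x in ints if x % 2 != 1]
print(alias is ints, alias)  # True [2, 4, 6, 8]

# Rebinding: the name now points at a brand-new list,
# and the old list is untouched for anyone still holding it.
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]
other = nums
nums = [x for x in nums if x % 2 != 1]
print(other is nums)  # False
print(other)          # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(nums)           # [2, 4, 6, 8]
```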

steven.daprano · July 5, 2022, 2:42am

range() does not return an iterator!

Range objects are specialised lazy sequences that compute their contents on demand, so that they don’t have to produce a vast list of values up front. But they are sequences, not iterators, and support the full sequence API:

from collections.abc import Sequence
isinstance(range(1000), Sequence)
# returns True
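A few of those sequence operations, as a short sketch (the particular range is arbitrary):

```python
r = range(0, 20, 2)  # 0, 2, 4, ..., 18

print(len(r))        # 10
print(r[3])          # 6
print(r[-1])         # 18
print(r[2:5])        # range(4, 10, 2) - slicing gives another range
print(r.index(8))    # 4
print(r.count(8))    # 1
print(7 in r)        # False
print(list(reversed(r))[:3])  # [18, 16, 14]
```

None of these would work on an iterator, which only supports next().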

The definitive test for an iterator is to check whether iter(obj) returns obj itself. If it does, it is an iterator. If it does not, then it is some other kind of iterable object, like a string, list, sequence, set, or custom-made iterable object.

r = range(1000)
iter(r) is r
# returns False
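The same test can be wrapped in a small helper and tried on a few objects. is_iterator is a made-up name for this sketch, not something from the standard library, and it assumes its argument is at least iterable (iter() raises TypeError otherwise).

```python
def is_iterator(obj):
    """Return True if obj is its own iterator (hypothetical helper)."""
    return iter(obj) is obj

# Iterables that are not iterators: iter() hands back a new object.
print(is_iterator("abc"))      # False
print(is_iterator([1, 2, 3]))  # False
print(is_iterator({1, 2, 3}))  # False
print(is_iterator(range(3)))   # False

# Iterators: iter() returns the object itself.
print(is_iterator(iter([1, 2, 3])))       # True
print(is_iterator(x for x in range(3)))   # True
print(is_iterator(map(str, [1, 2, 3])))   # True
```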

Unlike iterators, iterating over a range object does not consume its values. Compare the difference between these:

r = range(100)
50 in r  # returns True
10 in r  # returns True

it = iter(r)
50 in it  # returns True
10 in it  # returns False because the 10 has been consumed.