Surprising behaviour from Python lexical analysis

FelixBlix · December 18, 2019, 3:34pm

Today at work I had my mind completely blown by something that turned out to be perfectly normal behavior of the Python interpreter’s lexical analysis. I’d like to share it and hear whether this is really common knowledge.
I was reviewing code for one of my colleagues and came across something to the effect of:

    if self. some_member:
        do_something()

When I noticed the space after ‘self.’ I thought he had made a typo and could not have tested his code, since I expected it to fail. He maintained, however, that the code was tested and working, so I had to think again.
After looking into it, mainly through the Python documentation on lexical analysis, I had to admit that his code, however unpleasing, worked like a charm.
I find it quite surprising that the dot is treated as a separate token, just like a parenthesis, so a space after it is allowed; a line like the one above looks positively wrong. Even more surprising, PyCharm triggered no inspection either and seemed to consider the code completely by the book.
My guess is that this keeps the lexical analyser simple, and that there is no inspection for this case because people never write code like this.
Please share any insight into the how and why of it, along with any similar surprises you have had.
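
A quick way to see what the lexer makes of the reviewed line is the standard library's tokenize module. The sketch below (the helper name significant_tokens is just for illustration) drops layout-only tokens such as NEWLINE and INDENT and compares the spaced and unspaced versions of the line; since the dot is an independent token, both should give the same token sequence.

```python
import io
import tokenize

# Tokens that only describe layout; dropping them leaves what the parser cares about.
LAYOUT = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT,
          tokenize.COMMENT, tokenize.ENDMARKER}


def significant_tokens(source):
    """Return (token type name, token text) pairs for the non-layout tokens."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in LAYOUT]


spaced = significant_tokens("if self. some_member:\n    do_something()\n")
compact = significant_tokens("if self.some_member:\n    do_something()\n")

print(spaced[:4])         # [('NAME', 'if'), ('NAME', 'self'), ('OP', '.'), ('NAME', 'some_member')]
print(spaced == compact)  # True
```

The space simply disappears between the '.' token and the name token, so the parser never sees it.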

brettcannon · December 18, 2019, 6:39pm

It may have to do with what it takes to call methods on numeric literals, e.g.:

    >>> 4 .__str__()
    '4'

Note the space after the 4: without it, the tokenizer reads ‘4.’ as a float literal, and 4.__str__() is rejected with a SyntaxError.
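
The spellings can be compared directly. This is a small sketch that runs eval on literal strings; the helper name try_eval is made up for the example. A float literal such as 4. also accepts a following attribute dot, which is why 4..__str__() works as well (and gives '4.0').

```python
def try_eval(expr):
    """Evaluate a tiny expression, reporting a SyntaxError instead of raising it."""
    try:
        return eval(expr)
    except SyntaxError:
        return "SyntaxError"


EXPRESSIONS = [
    "4 .__str__()",   # space ends the int literal, then '.' is attribute access -> '4'
    "(4).__str__()",  # parentheses end the literal just as well -> '4'
    "4..__str__()",   # '4.' is a float literal, the second dot is attribute access -> '4.0'
    "4.__str__()",    # '4.' is a float literal followed directly by a name -> SyntaxError
]

for expr in EXPRESSIONS:
    print(expr, "->", repr(try_eval(expr)))
```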

storchaka · December 18, 2019, 7:23pm

It is possible (and not extremely difficult) to make the parser accept code like 4.__str__(). Should we?
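
One detail any such change would have to preserve: "4." followed by certain letters is already a complete numeric literal, so "4.<name>" could not simply mean attribute access in every case. A minimal check of the current behaviour with eval:

```python
# "4." followed by an exponent is a float literal, not an attribute lookup.
exponent_float = eval("4.e5")
# "4." followed by j is an imaginary literal.
imaginary = eval("4.j")
# With a space, the int literal ends and ".real" is ordinary attribute access.
attribute = eval("4 .real")

print(exponent_float)  # 400000.0
print(imaginary)       # 4j
print(attribute)       # 4
```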

thomas · December 18, 2019, 7:32pm

I think it’s more likely an artifact of ‘.’ being a separate token. Apart from indentation, the tokenizer only cares about whitespace where it is needed to separate two tokens (ab is one token, a b is two). It’s just like spaces being optional around ‘+’ and ‘=’ and ‘(’ and such.
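
The "only where it separates tokens" part can be seen with the tokenize module: moving the space in the numeric example changes where the number token ends, while spaces around '+' change nothing. A minimal sketch; the helper tokens_of is illustrative only.

```python
import io
import tokenize

WANTED = {tokenize.NAME, tokenize.NUMBER, tokenize.OP}


def tokens_of(source):
    """Return the text of the NAME, NUMBER and OP tokens in a one-line source."""
    readline = io.StringIO(source + "\n").readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in WANTED]


print(tokens_of("4 .real"))                      # ['4', '.', 'real']
print(tokens_of("4. real"))                      # ['4.', 'real']
print(tokens_of("a+b") == tokens_of("a  +  b"))  # True
```

The second token list is fine for the tokenizer, but the parser then rejects a number followed directly by a name.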

And it’s useful for more than just 1 .__str__(); formatting longer lines like this isn’t really that uncommon:

    result = (SomeLongSpamName
              .someLongMethodName()
              .SomeOtherMethodName())
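
A runnable version of that layout, using built-in str methods in place of the placeholder names: the outer parentheses let the expression continue across lines, and each continuation line then starts with an ordinary '.' token.

```python
title = "  Surprising Behaviour From Python Lexical Analysis  "

# One method per line; the parentheses provide implicit line joining.
slug = (title
        .strip()
        .lower()
        .replace(" ", "-"))

# The same calls written on a single line give the same result.
assert slug == title.strip().lower().replace(" ", "-")
print(slug)  # surprising-behaviour-from-python-lexical-analysis
```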

brettcannon · December 19, 2019, 5:31pm

I don’t think it’s worth changing. It will inevitably break code and it doesn’t seem to cause people problems considering it has been like this for 30 years.

vstinner · January 7, 2020, 9:38pm

If someone considers this an issue, I suggest implementing a check in a linter rather than changing Python.
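
As a rough idea of what such a check could look like, here is a sketch built on the standard tokenize module. The function name find_spaced_dots and its rules are assumptions for illustration, not an existing pyflakes, pylint or PyCharm check. It reports a '.' with a space on the same line between it and a neighbouring name or closing bracket, and deliberately skips dots that start or end a line (split call chains), dots after a number literal (4 .__str__()) and dots next to keywords (from . import x).

```python
import io
import keyword
import token
import tokenize

# Layout-only tokens are skipped when looking at a dot's neighbours.
LAYOUT = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT,
          tokenize.COMMENT, tokenize.ENDMARKER}


def is_plain_name(tok):
    """True for identifiers, but not for hard keywords such as 'from' or 'import'."""
    return tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)


def find_spaced_dots(source):
    """Return (line, column) of each attribute dot with spaces beside it on the same line."""
    readline = io.StringIO(source).readline
    tokens = [tok for tok in tokenize.generate_tokens(readline)
              if tok.type not in LAYOUT]
    problems = []
    for prev, dot, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if dot.exact_type != token.DOT:
            continue
        # Only look at "name.name" or "closing bracket.name" attribute access.
        if not (is_plain_name(prev) or prev.string in (")", "]", "}")) or not is_plain_name(nxt):
            continue
        # A gap only counts when both tokens sit on the same physical line.
        space_before = prev.end[0] == dot.start[0] and prev.end[1] < dot.start[1]
        space_after = nxt.start[0] == dot.end[0] and nxt.start[1] > dot.end[1]
        if space_before or space_after:
            problems.append(dot.start)
    return problems


print(find_spaced_dots("if self. some_member:\n    do_something()\n"))  # [(1, 7)]
print(find_spaced_dots("x = (a\n     .b()\n     .c())\n"))             # []
```

Usage: pass the source text of a module and get back a list of (line, column) positions; a real linter plugin would wrap this in its own reporting interface.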

mjpieters · January 8, 2020, 3:12pm

Take into account that you are reading this code with a background of reading a lot of well-formatted, probably PEP-8 compliant code. Don’t confuse good style hygiene with language requirements!

Whitespace is allowed because it is helpful when breaking up longer call chains when coding against a fluent interface API such as SQLAlchemy:

    summed_actual = db.func.sum(FinancialData.actual).label("costs")
    base = (
        db.session.query(FinancialData.name.label("label"), summed_actual)
        .join(Project)
        .filter(Project.project_type == ProjectTypes.foobar)
        .filter(db.func.lower(FinancialData.name) != "project total")
        .group_by(FinancialData.name)
        .order_by(summed_actual.desc())
    )

It helps to see the . as the attribute reference operator: <left_operand> . <attribute_name> evaluates left_operand and looks up attribute_name on the result. Seen that way, the operator is only unusual in that nobody puts spaces around it, while people do around = and + and /, etc.
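
The standard ast module shows both halves of this: the parser produces the same Attribute node with or without the space, and the right-hand side is stored as a plain string (the attr field) rather than as a sub-expression, which is why something like self.(some_member) is rejected. A short sketch:

```python
import ast

spaced = ast.parse("self. some_member", mode="eval").body
compact = ast.parse("self.some_member", mode="eval").body

print(type(spaced).__name__)                  # Attribute
print(repr(spaced.attr))                      # 'some_member' (a str, not an expression node)
print(ast.dump(spaced) == ast.dump(compact))  # True: source positions are not part of the dump

try:
    ast.parse("self.(some_member)", mode="eval")
except SyntaxError:
    print("the right operand of '.' must be a name")
```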

If you and your colleagues start using black, these kinds of hiccups simply melt away in its auto-formatted output. I’d also recommend a good linter, but neither pyflakes nor pylint flags whitespace around ., at least not without custom extensions.