PEP 561 Clarification Regarding `\n`

ahendry · September 3, 2023, 3:52pm

In the Partial Stub Packages section of PEP 561, I’m unsure how the \n should be interpreted:

Literally? (i.e. the file must contain exactly partial\n)
Newline or Escape Sequence? (i.e. this is saying partial must exist on its own line and the file must end with a new line, either with a literal escape sequence or a new line)
Optional? (i.e. as long as partial is in the py.typed file, a new line is optional)

Windows uses \r\n for new lines, so I’m wondering how strictly/literally this is to be interpreted. Also (not that there are currently any other items to add to a py.typed file), are other items allowed in the file? ATM, the only thing I could think of would be a comment. Can other items exist in this file and the package still be interpreted as partially typed, so long as partial\n exists in the file?

git config core.autocrlf can be configured to store files with Linux line endings, but when the repo is checked out on Windows these are converted back to \r\n. I wonder whether type checkers will honor that, or whether they only honor \n…?

guido · September 3, 2023, 6:32pm

The intent is clearly that the word partial occurs followed by a line separator. This is either \n (the ASCII LF character) or \r\n (ASCII CR LF characters) depending on local convention. The PEP was probably written by a UNIX user.

I don’t know whether the PEP should be read as implicitly allowing additional lines containing other text as well. Maybe you can experiment with mypy and report how it handles various cases, and we can consider that “best practice”.

Also, why do you want to know? Are you considering writing a tool that cares about the contents of py.typed?

adeak · September 3, 2023, 7:10pm

I disagree about “clearly”

There are two possible interpretations of the exact wording in the PEP, which is:

If a stub package distribution is partial it MUST include partial\n in a py.typed file.

(and a similar note at the end of the section).

1. The intent is that the file must contain “partial” followed by a linefeed character. But then what about Windows (as pointed out earlier)? Universal newlines have been a thing in Python for the last 20 years or so. And why does a high-level tool care about linefeeds at line endings in the first place? I’m aware that POSIX text files mandate having a linefeed character at the end of each line. But I don’t think any Python tool should have such a requirement.
2. The intent is that the file must contain “partial\n”, where the last two characters are a literal backslash and a literal n. The only issue with this interpretation is that it’s confusing why anyone would mandate this literal r'\n' to be included in the file.

So considering the above I agree that the intent was probably point 1, as you also confirmed. But

I would find it a lot clearer if this were phrased as “MUST include partial in a py.typed file, followed by a line separator.” (Then we can assume/hope that the exact kind of line separator doesn’t matter.)
Typing tools should handle that line separator universally, as expected in Python.
Typing tools should really not care about the line separator at all, just check that there are no other lines in the file (or be more forgiving if they want to). Then the PEP’s wording could be simplified, making it unambiguous.

I suspect typing tools are already forgiving when it comes to py.typed, but I never tested any of the possibilities.

ahendry · September 3, 2023, 8:38pm

@guido Thank you!

maybe you can experiment with mypy

Yes, agreed, this should’ve been my first approach.

Also, why do you want to know?

Great question. I recently got agreement on a published package to implement partial typing, since we wanted to add typing but could not commit to completely typing everything right now.

I took the statement in the PEP literally to mean that the file must contain partial\n, but a reviewer showed several other packages did not include the \n.

My “anal-retentive mind” won out since I could not reconcile in my mind if there was a deliberate need for the \n. The only thing I could think of was Windows versus Linux/MacOS line ending differences. I openly admit I feel myself splitting hairs here.

That said, is a line-feed actually required at all? As the reviewer of my PR showed, there are several packages where the py.typed simply has partial in it with no extra line feed.

ahendry · September 3, 2023, 8:41pm

@adeak Yea, also agreed. The MUST verbiage plays with my mind a bit. I’m only one person, but I interpreted it literally.

adeak · September 3, 2023, 8:53pm

That said, is a line-feed actually required at all? As the reviewer of my PR showed, there are several packages where the py.typed simply has partial in it with no extra line feed.

To be clear, the question is not whether the line-feed is required. When we put the equivalent of r'partial\n' in a text file we don’t have a line-feed, we have a literal backslash with a literal n. This is opposed to having the text “partial” followed by a single '\n' character. I’m probably pointing out the obvious here, but it’s better to make sure we’re all on the same page.
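In Python terms, the two readings can be written out as string literals. This is only a minimal sketch to make the distinction concrete; the variable names are made up for illustration.

```python
# The two readings of "partial\n", expressed as Python string literals.
escaped = "partial\n"    # "partial" followed by one line-feed (LF) character
literal = r"partial\n"   # "partial" followed by a backslash and the letter n

print(len(escaped))        # 7 letters plus one LF character
print(len(literal))        # 7 letters plus a backslash plus "n"
print(escaped == literal)  # the two strings are not the same
```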

To be even clearer, this is the review comment that this started from. The commenter points out two examples: mypy and another one in poetry (that I can’t link because new users here in Discuss can only have at most 2 links in a post, apparently…). Both examples only contain “partial” in the file (so no r'\n'), but it’s not obvious in GitHub’s UI whether a proper line-feed is there (I know missing line-feeds are indicated on PRs, but I’m unsure about direct file links). We could clone the respective repos and investigate manually to make sure.

In either case the point of the reviewer was that the “literal backslash-literal n” combo is unnecessary (potentially harmful). This doesn’t tell us whether an actual line-feed character is necessary.
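The manual check mentioned above could be sketched roughly like this. It assumes the file contents have already been read as raw bytes; `describe_ending` is a hypothetical helper name, not part of any tool.

```python
def describe_ending(data: bytes) -> str:
    """Classify how the raw bytes of a py.typed file end."""
    if data.endswith(b"\r\n"):
        return "CRLF"
    if data.endswith(b"\n"):
        return "LF"
    if data.endswith(b"\\n"):
        # A backslash byte followed by the letter n, not a line-feed.
        return "literal backslash-n"
    return "no line ending"


for sample in (b"partial\n", b"partial\r\n", b"partial\\n", b"partial"):
    print(sample, "->", describe_ending(sample))
```

For a real checkout, the bytes could come from `pathlib.Path(...).read_bytes()`. Note that the working copy may differ from what is stored in the repository if git is configured to convert line endings.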

adeak · September 3, 2023, 8:57pm

In the mypy example’s corresponding PR it’s clear that a trailing line-feed is there. (This should’ve been an edit, but this would’ve been again a third link in my post…)

ahendry · September 3, 2023, 9:05pm

To be clear, the question is not whether the line-feed is required

Yes, I’m asking this in addition to the main question. I’m trying to understand what we gain from adding a line feed (either “explicitly” in the form of an escape sequence character or “implicitly” by pressing Enter/Return on the keyboard).

ahendry · September 4, 2023, 2:32pm

Would it truly be harmful? Is there a security vulnerability by including the newline escape sequence?

I think the only problem that could occur would be mypy wouldn’t honor the package as partially typed, but perhaps I’m wrong?

This is either \n (the ASCII LF character) or \r\n (ASCII CR LF characters) depending on local convention. The PEP was probably written by a UNIX user.

@adeak I’m satisfied with Guido’s answer;
I needed an appeal to authority since I didn’t know the answer.

I’ll continue with the PR (perhaps we can add a formatting/linting rule for the file, if needed, should we have any formatters/linters that insist on removing the final new line from the file? We can discuss that there).

I would like to keep this post/question open, though, to continue to discuss whether a newline is needed in the file, and in what manner, and whether it would be worthwhile to add some clarification to the wording in the PEP. It’s a longer discussion, but shouldn’t prevent us from successfully getting the package partially typed (we can experiment with type checkers if necessary)

adeak · September 4, 2023, 2:52pm

Yes, the feature we’re trying to use not actually doing the one thing that it’s meant to do is harmful in my book. And “potentially” harmful because it’s up to the discretion of the type checker whether it checks line.startswith('partial') or something more like line.strip() == 'partial'. r'partial\n' would pass the former but not the latter.
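A quick sketch of how those two hypothetical checks would treat different file contents. The function names are made up for illustration, and neither is claimed to be what any particular type checker does.

```python
def prefix_check(text: str) -> bool:
    # Accepts anything that begins with the word "partial".
    return text.startswith("partial")


def strip_check(text: str) -> bool:
    # Accepts only the word "partial" with optional surrounding whitespace.
    return text.strip() == "partial"


candidates = {
    "LF-terminated": "partial\n",
    "CRLF-terminated": "partial\r\n",
    "no line ending": "partial",
    "literal backslash-n": r"partial\n",
}

for label, text in candidates.items():
    print(f"{label:20} prefix={prefix_check(text)} strip={strip_check(text)}")
```

Only the last candidate is treated differently by the two checks, which is why the literal backslash-n spelling is the risky one.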

My experience is that there are no tools that remove line-feeds at line endings. It’s the other way around: some (typically Windows) tools don’t go out of their way to include these line-feeds, producing files that are not text files according to POSIX. Some (typically *nix) tools may refuse to work as expected on not-POSIX-text-files.

So it’s not that some tools remove these line-feeds. It’s that some editors might not add them in the first place. Once a file is properly POSIX-text I don’t expect any tool to change that, unless the whole file is rewritten from scratch for some reason. So, as I said earlier, part of the question is whether any Python tools (in particular, typing tools) should care about POSIX text fileness (which is the source of the confusion surrounding the current phrasing of the PEP). And I really think they shouldn’t.

ahendry · September 4, 2023, 4:32pm

@adeak Let’s move this discussion back to our repo

hauntsaninja · September 5, 2023, 12:17am

For whatever it’s worth: https://github.com/python/mypy/blob/c7d2fa1525c9cbf0ab8859fd9ded526658677c28/mypy/modulefinder.py#L439

ahendry · September 5, 2023, 12:45pm

@hauntsaninja Thank you!

This confirms 2 things for me for mypy:

The file py.typed can only contain the word partial (e.g. no comments)
The file can contain partial by itself (neither a \n escape sequence nor an entered new line are required). In fact, all whitespace is ignored.
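To make those two observations concrete, here is a minimal sketch of a check that behaves the way I describe above: decode the file, strip surrounding whitespace, and compare against the word partial. It is an assumption-based illustration, not a copy of mypy’s code, and `is_partial_marker` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def is_partial_marker(path: Path) -> bool:
    """True if the file holds only the word 'partial', ignoring surrounding whitespace."""
    return path.read_bytes().decode("utf-8").strip() == "partial"


samples = {
    "bare word": b"partial",
    "LF": b"partial\n",
    "CRLF plus padding": b"  partial \r\n",
    "with comment": b"# marker file\npartial\n",
    "empty": b"",
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "py.typed"
    for label, content in samples.items():
        marker.write_bytes(content)
        results[label] = is_partial_marker(marker)

for label, accepted in results.items():
    print(f"{label}: {accepted}")
```

Under this kind of check a comment line makes the file stop counting as partial, while any amount of leading or trailing whitespace is harmless.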

@guido This leads me to believe what was meant in the PEP, but not precisely written, was something like

py.typed MUST contain nothing but the word partial (note: surrounding whitespace is ignored)

Would it be possible to amend the PEP to clarify this?

guido · September 5, 2023, 2:40pm

Remember “be strict in what you write, lenient in what you accept”.

I think the PEP should continue to state that the file should only contain “partial” followed by a linefeed character, modulo universal newlines.

jack1142 · September 5, 2023, 3:38pm

Putting comments in py.typed works fine when not using “partial\n” (I never tried it with “partial\n”, but you said it doesn’t work, so I assume that’s the case). There are some projects that do it, so restricting it in the PEP years after it became final probably isn’t the best course of action.

ahendry · September 5, 2023, 7:02pm

Remember “be strict in what you write, lenient in what you accept”.

I hadn’t heard that one, so that’s good to know, thank you. However, and forgive me, I’m still confused on the strictness in this instance.

Doesn’t it seem like the \n in the PEP was a minor typo? I say this only because I don’t understand what benefit is to be had by having the newline in the first place. Also, the MUST feels like RFC 2119 verbiage.

I truly don’t mean to split hairs, but I honestly don’t understand the purpose of the \n.

ahendry · September 5, 2023, 7:04pm

I would not want to prevent existing behavior either. I am not proposing to change this. I just wanted to know what mypy specifically accepted (other type checkers might not, just FYI)

effigies · September 5, 2023, 7:12pm

POSIX defines a line to end with a newline (see the Stack Overflow question “Why should text files end with a newline?” for more discussion).

guido · September 5, 2023, 7:23pm

The intention of the PEP author was clearly (in my mind) that the contents of the file should be equal to the Python string literal "partial\n" (this implies that \n is meant to refer to the ASCII LF control character, not to the sequence of two characters “backslash” followed by “lowercase letter n”). So that is what everyone should put in the file.

Presumably the PEP author was not thinking about the Windows convention. Possibly because they were assuming that the tool would be written in Python and open the file in text mode, which translates ASCII CR LF into just ASCII LF. (In newer Pythons, at least, it does that even on UNIX.) So I recommend that tools be lenient and allow CR LF as well as LF, on all platforms. But to be strictly conforming with the PEP you should just have LF.
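A small sketch of both halves of that recommendation, using only the standard library: write the file strictly with a single LF, and read it leniently in text mode so that CR LF is also accepted. The temporary directory and variable names are just for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "py.typed")

    # Strict writing: newline="\n" keeps the LF as-is, even on Windows,
    # where text mode would otherwise write CR LF.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("partial\n")
    with open(path, "rb") as f:
        strict_bytes = f.read()

    # Simulate a file that ended up with Windows line endings.
    with open(path, "wb") as f:
        f.write(b"partial\r\n")

    # Lenient reading: the default text mode (newline=None) uses universal
    # newlines, so CR LF is translated to LF on every platform.
    with open(path, encoding="utf-8") as f:
        lenient_text = f.read()

print(strict_bytes)
print(lenient_text == "partial\n")
```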

It doesn’t matter what the purpose of the \n is, it’s just what the PEP prescribes. (My guess about its purpose is that if the file is created using a simple UNIX text editor it’s hard to avoid having a \n at the end.)

I never thought this would be deserving of so much discussion, and I’m confused why that is.

fungi · September 5, 2023, 7:57pm

Remember “be strict in what you write, lenient in what you accept”.

I hadn’t heard that one, so that’s good to know, thank you.

It’s commonly referred to as Postel’s Law and is a cornerstone of early Internet protocol design to maximize interoperability (sadly forgotten by or unknown to recent protocol designers and software developers): Robustness principle - Wikipedia