Why does Discourse sometimes add extra blank lines to my posts?

steven.daprano · February 21, 2023, 1:57pm

I receive many emails from Windows users, and they don’t appear this way. I’m not an expert on email message formats, but it is my understanding that all email bodies must be formatted with DOS style CRLF line endings, and must not include isolated CR or LF. See RFC 5322:

"""
The body of a message is simply lines of US-ASCII characters. The only two limitations on the body are as follows:

o CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body.

o Lines of characters in the body MUST be limited to 998 characters, and SHOULD be limited to 78 characters, excluding the CRLF.
"""
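The first of those two rules is easy to check mechanically. Here is a minimal sketch in Python; `find_bare_line_endings` is a hypothetical helper name and the sample bytes are made up for illustration, not taken from a real message:

```python
import re

# Matches a CR that is not followed by LF, or an LF that is not preceded by CR.
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def find_bare_line_endings(body: bytes) -> list[int]:
    """Return the byte offset of every isolated CR or LF in body."""
    return [m.start() for m in BARE_CR_OR_LF.finditer(body)]


print(find_bare_line_endings(b"one\r\ntwo\r\n"))       # []
print(find_bare_line_endings(b"one\ntwo\rthree\r\n"))  # [3, 7]
```

An empty list means every CR and LF in the body occurs as part of a CRLF pair.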

I daresay there are other complications to do with various MIME types etc., but I’m pretty sure it is not a simple “it’s a Windows text file” issue.

Looking more closely at one of the offending emails, I see that the email (generated by Discourse) is using quoted-printable:

Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: quoted-printable

and the body of the email seems to have escaped the carriage returns but not newlines. E.g. the first few lines of this post look like this:

=0D
=0D
Make `=CE=BB` a keyword with identical meaning to `lambda`. =0D
Keyword `=CE=BB` would be more readable because it is shorter than `lambd=
a`, besides `lambda` basically means `=CE=BB`. =0D
=0D
Now that almost every PC can type non ASCII characters, we have snippets =
in every serious editor & we have formatters like Autopep & Black, I don'=
t see any reason to restrict code to English alphabet.=0D
=0D
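To see what a decoder makes of these lines, here is a small check using Python's standard `quopri` module. The sample is two lines copied from the excerpt, written with bare LF line endings; that is an assumption about how the message is stored locally, since on the wire each line would end in CRLF.

```python
import quopri

# Two physical lines from the excerpt. The "=" at the end of the first
# line is a soft line break, so they decode to a single logical line.
raw = (
    b"Keyword `=CE=BB` would be more readable because it is shorter than `lambd=\n"
    b"a`, besides `lambda` basically means `=CE=BB`. =0D\n"
)

decoded = quopri.decodestring(raw)

# repr() makes the trailing control characters visible.
print(repr(decoded.decode("utf-8")))
```

The decoded text ends in `\r\n`: the escaped `=0D` comes back as a literal CR inside the content, and the LF after it is the line terminator from the stored file.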

Looking at the behaviour of the quopri module in Python, carriage returns need not (should not?) be encoded:

>>> import quopri
>>> text = "Make `λ` a keyword with identical meaning to `lambda`.\r\n"
>>> quopri.encodestring(text.encode('UTF-8'))
b'Make `=CE=BB` a keyword with identical meaning to `lambda`.\r\n'

I assume that Python’s implementation is correct.

So I have a hypothesis:

1. When Discourse emails a post containing non-ASCII characters, by default it uses quoted-printable.
2. The Discourse implementation of quoted-printable wrongly (?) encodes the carriage returns to =0D.
3. mutt, following Postel’s Principle, accepts the bare LF as a line ending as if it were the mandated CRLF pair,
4. and then decodes the =0D, which it displays as if it were an extra carriage return.
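The chain of events in this hypothesis can be modelled in a few lines of Python. This is only a sketch of the guess, not a description of what Discourse or mutt actually do: `encode_cr_as_data` and `lenient_decode` are hypothetical stand-ins, and the sample post is made up.

```python
import binascii


def encode_cr_as_data(text: str) -> bytes:
    """Hypothetical encoder: split on LF only, so each CR is escaped as =0D."""
    lines = text.encode("utf-8").split(b"\n")
    # istext=False tells b2a_qp to escape CR and LF instead of treating
    # them as line endings.
    return b"\n".join(binascii.b2a_qp(line, istext=False) for line in lines)


def lenient_decode(body: bytes) -> bytes:
    """Hypothetical reader: treat each bare LF as CRLF, then decode the escapes."""
    normalized = body.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return binascii.a2b_qp(normalized)


post = "first line\r\nsecond line\r\n"
encoded = encode_cr_as_data(post)
decoded = lenient_decode(encoded)

print(encoded)               # b'first line=0D\nsecond line=0D\n'
print(decoded)               # b'first line\r\r\nsecond line\r\r\n'
print(decoded.splitlines())  # [b'first line', b'', b'second line', b'']
```

In this model each original line break turns into `\r\r\n`, which a display that treats a lone CR as a line break would show as an extra blank line. Whether the real software behaves this way is exactly what the hypothesis leaves open.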

If my hypothesis is correct, then what I am seeing is the collision between a bug(?) in Discourse’s quoted-printable implementation, and mutt trying to be helpful by accepting bare LFs as line delimiters.

It may be that encoding the CR as =0D is allowed, in which case mutt is definitely to blame here. It would be nice if other mutt users could chime in with their experiences. @cameron you use mutt, don’t you?
