Handling of trailing linesep by set_content()

eyzmeng · June 18, 2024, 11:42am

Hello Python Discourse!

I have a question about using the content manager interface to create text email messages that don't end with a line terminator (linesep). I hope this is the right place to ask, since I'm not sure whether this is a feature or an issue…

As an example, say I want to encode this string:

$ python3.12
Python 3.12.3 (main, Apr  9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> test_body = 'hello\n\nNo EOL here'

With the legacy interface, I can do this:

>>> import email.message, email.charset
>>> UTF8_QP = email.charset.Charset('utf-8')
>>> UTF8_QP.body_encoding = email.charset.QP
>>> msg = email.message.EmailMessage()
>>> msg.set_payload(test_body, UTF8_QP)
>>> str(msg)
'MIME-Version: 1.0\nContent-Type: text/plain; charset="utf-8"\nContent-Transfer-Encoding: quoted-printable\n\nhello\n\nNo EOL here'
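For comparison, the legacy path can be wrapped in a small helper and checked by decoding the payload again. This is a minimal sketch built only from the calls in the session above; the helper name `legacy_text_message` is made up for illustration and is not part of the email package.

```python
import email.charset
import email.message

# Charset that body-encodes utf-8 with quoted-printable instead of the
# default base64, as in the session above.
UTF8_QP = email.charset.Charset('utf-8')
UTF8_QP.body_encoding = email.charset.QP


def legacy_text_message(body: str) -> email.message.EmailMessage:
    """Hypothetical helper: build a text message via set_payload()."""
    msg = email.message.EmailMessage()
    msg.set_payload(body, UTF8_QP)
    return msg


test_body = 'hello\n\nNo EOL here'
msg = legacy_text_message(test_body)

# Decoding the quoted-printable payload gives back the original bytes;
# no line terminator has been added at the end.
decoded = msg.get_payload(decode=True)
print(repr(decoded))
```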

With the content manager, however, a linesep is always appended to the end:

>>> msg = email.message.EmailMessage()
>>> msg.set_content(test_body, charset='utf-8', cte='quoted-printable')
>>> str(msg)
'Content-Type: text/plain; charset="utf-8"\nContent-Transfer-Encoding: quoted-printable\nMIME-Version: 1.0\n\nhello\n\nNo EOL here\n'

The same happens with every other transfer encoding too, and I think I know what's going on…
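A quick way to check this for all transfer encodings at once is to round-trip `test_body` through `set_content()` with each of the four CTEs the default text setter accepts (7bit, 8bit, quoted-printable, base64) and decode it with `get_content()`. This is a sketch assuming the default policy of `EmailMessage`; the helper name `content_roundtrip` is hypothetical.

```python
import email.message

test_body = 'hello\n\nNo EOL here'


def content_roundtrip(body: str, cte: str) -> str:
    """Encode `body` with set_content() using `cte`, then decode it again."""
    msg = email.message.EmailMessage()
    msg.set_content(body, charset='utf-8', cte=cte)
    return msg.get_content()


# test_body is pure ASCII, so every one of these CTEs can encode it.
for cte in ('7bit', '8bit', 'quoted-printable', 'base64'):
    decoded = content_roundtrip(test_body, cte)
    print(f'{cte:>16}: {decoded!r}')
```

With the behavior described in this post, every line should show the body with an extra `'\n'` at the end.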

Both set_payload() and set_content() in the examples above ultimately call email.quoprimime.body_encode(), which explicitly checks whether its input ends with a line break and only emits a final EOL if it does.
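Calling that function directly shows the difference. Note that `email.quoprimime` is, as far as I know, an internal helper module rather than a documented public API, so this is only meant to illustrate what both code paths rely on.

```python
import email.quoprimime

# Same text, once without and once with a trailing line break.
without_eol = email.quoprimime.body_encode('hello\n\nNo EOL here')
with_eol = email.quoprimime.body_encode('hello\n\nNo EOL here\n')

# body_encode() re-joins the encoded lines and adds a final EOL only
# when the input ended with '\r' or '\n'.
print(repr(without_eol))
print(repr(with_eol))
```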

The difference is that the default text content setter of email.contentmanager.raw_data_manager doesn't pass the original string along; it splits it into lines and passes a reassembled string instead:

def _encode_text(string, charset, cte, policy):
    lines = string.encode(charset).splitlines()
    linesep = policy.linesep.encode('ascii')
    # Both helpers unconditionally append a line terminator:
    def embedded_body(lines):
        return linesep.join(lines) + linesep
    #                                ^^^^^^^
    def normal_body(lines):
        return b'\n'.join(lines) + b'\n'
    #                              ^^^^^
    # ... (rest of the function, which picks and applies the CTE, omitted)
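The root of it is that `splitlines()` throws away whether the input ended with a line break, so by the time `normal_body()` runs, that information is already gone. The sketch below demonstrates this; `normal_body_preserving` is a hypothetical variant (not in CPython, and not a proposed patch) that only shows what keeping the trailing state would require.

```python
# splitlines() drops the final line terminator, so both inputs give
# exactly the same list of lines.
with_eol = b'hello\n\nNo EOL here\n'
without_eol = b'hello\n\nNo EOL here'
assert with_eol.splitlines() == without_eol.splitlines()


def normal_body(lines):
    # Same expression as in _encode_text(): join, then always append b'\n'.
    return b'\n'.join(lines) + b'\n'


def normal_body_preserving(original, lines):
    # Hypothetical variant: append b'\n' only if the original bytes ended
    # with a line break (bytes.splitlines() only splits on \r and \n).
    body = b'\n'.join(lines)
    if original.endswith((b'\r', b'\n')):
        body += b'\n'
    return body


for original in (with_eol, without_eol):
    lines = original.splitlines()
    print(repr(normal_body(lines)), repr(normal_body_preserving(original, lines)))
```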

Is this the intended behavior? It's a tiny edge case, but seeing how carefully quoprimime handles a trailing linesep, I wanted to confirm.

As far as I know, this behavior isn't documented anywhere (although many tests rely on it), but I could be missing something… I'd like to know what people think about this! ^^
