Here is a quick analysis of how splitlines is used inside the stdlib. It is
not always clear what is intended (or, in some cases, even whether the data is
str or bytes). So take my analysis with a grain of salt, as I did it quickly.
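For context on why the str/bytes distinction matters here, a small sketch (the sample strings are made up for illustration): str.splitlines() breaks on a much larger set of characters than bytes.splitlines(), which only knows \n, \r and \r\n.

```python
# Made-up sample data: similar content as str and as bytes.
text = "a\nb\rc\r\nd\x0be\x0cf\x1cg\x85h\u2028i"
data = b"a\nb\rc\r\nd\x0be\x0cf"

# str: also splits on \v, \f, \x1c, \x85, \u2028 (and a few others).
str_lines = text.splitlines()

# bytes: splits on \n, \r and \r\n only; \v and \f stay inside the line.
bytes_lines = data.splitlines()

print(str_lines)
print(bytes_lines)
```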
codecs.py
509: lines = newchars.splitlines(keepends=True)
551: line = line.splitlines(keepends=False)[0]
568: lines = line.splitlines(keepends=True)
584: line = line.splitlines(keepends=False)[0]
587: line0withoutend = lines[0].splitlines(keepends=False)[0]
600: line = line.splitlines(keepends=False)[0]
619: return data.splitlines(keepends)
822: return data.splitlines(keepends=True)
Marc-Andre suggests that codecs should split lines according to the Unicode
ISNEWLINE rules and should not match file.readlines() for text files. I think
otherwise, but let’s leave this case aside. At minimum, I think getreader()
should gain a newline option to make it match what open() does.
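To illustrate the mismatch, a rough sketch comparing a codecs reader with the io text layer that open() uses. The input bytes are invented for the example, and a form feed stands in for any of the extra separators.

```python
import codecs
import io

# Invented sample: a form feed (\x0c) sits in the middle of the first line.
raw = "first\x0csecond\nthird\n".encode("utf-8")

# codecs reader: readlines() goes through str.splitlines(), so \x0c splits.
codecs_lines = codecs.getreader("utf-8")(io.BytesIO(raw)).readlines()

# io text layer (same machinery as open()): only \n, \r, \r\n end a line.
io_lines = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").readlines()

print(codecs_lines)
print(io_lines)
```

The two readers disagree on the number of lines for the same input, which is the gap a newline option on getreader() would be meant to close.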
nntplib.py
896: f = f.splitlines()
In this case f is bytes, so this is maybe okay. I am not sure splitting on \r
is okay, though.
doctest.py
1413: return example.source.splitlines(keepends=True)
1676: want_lines = want.splitlines(keepends=True)
1677: got_lines = got.splitlines(keepends=True)
Split on ‘\n’ only.
argparse.py
655: return ''.join(indent + line for line in text.splitlines(keepends=True))
666: return text.splitlines()
2047: for arg_line in args_file.read().splitlines():
Split on ‘\n’ only.
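A small sketch of what the args-file pattern at line 2047 does today versus splitting on ‘\n’ only. The file content is hypothetical; the point is that an argument containing a form feed is silently cut in two.

```python
# Hypothetical args-file content: the second argument contains a form feed.
content = "--title\nPart one\x0cPart two\n"

# Current pattern: str.splitlines() also splits at \x0c.
current = content.splitlines()

# Splitting on "\n" only keeps the argument intact. The trailing newline is
# stripped first so that no empty final entry is produced.
newline_only = content.rstrip("\n").split("\n")

print(current)
print(newline_only)
```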
linecache.py
113: [line+'\n' for line in data.splitlines()], fullname
Split on ‘\n’ only.
pprint.py
250: lines = object.splitlines(True)
Split on ‘\n’ only.
difflib.py
785: ... '''.splitlines(keepends=True)
794: ... '''.splitlines(keepends=True)
880: >>> print(''.join(Differ().compare('one\ntwo\nthree\n'.splitlines(True),
881: ... 'ore\ntree\nemu\n'.splitlines(True))),
1247: >>> print(''.join(context_diff('one\ntwo\nthree\nfour\n'.splitlines(True),
1248: ... 'zero\none\ntree\nfour\n'.splitlines(True), 'Original', 'Current')),
1366: >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
1367: ... 'ore\ntree\nemu\n'.splitlines(keepends=True))
2070: >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True),
2071: ... 'ore\ntree\nemu\n'.splitlines(keepends=True))
Split on ‘\n’ or ‘\r’?
pydoc.py
288: result = module.__doc__.splitlines()[0] if module.__doc__ else None
2155: desc = module.__doc__.splitlines()[0] if module.__doc__ else ''
Split on ‘\n’ only.
smtpd.py
709: lines = data.splitlines()
Split on ‘\r’ or ‘\n’?
site.py
180: for line in record.splitlines():
Split on ‘\n’ only.
textwrap.py
478: for line in text.splitlines(True):
Split according to Unicode ISNEWLINE? I think splitting on ‘\r’ or ‘\n’ would
be fine too.
email/contentmanager.py
143: lines = string.encode(charset).splitlines()
Okay, this is bytes.
email/header.py
86: for line in header.splitlines():
369: lines = string.splitlines()
Split on ‘\n’ only?
email/quoprimime.py
186: for line in body.splitlines():
243: for line in encoded.splitlines():
Split on ‘\n’ only?
email/policy.py
142: if isinstance(value, str) and len(value.splitlines())>1:
207: lines = value.splitlines()
Split on ‘\n’ only?
email/message.py
286: value, defects = decode_b(b''.join(bpayload.splitlines()))
Split on ‘\n’ only?
distutils/util.py
511: for line in template.splitlines():
Split on ‘\n’ only?
distutils/_msvccompiler.py
149: (line.partition('=') for line in out.splitlines())
Split on ‘\n’ only?
lib2to3/main.py
19: a = a.splitlines()
20: b = b.splitlines()
Split on ‘\n’ only.
lib2to3/refactor.py
550: for line in input.splitlines(keepends=True):
594: new = str(tree).splitlines(keepends=True)
Split on ‘\n’ only.
urllib/robotparser.py
70: self.parse(raw.decode("utf-8").splitlines())
Split on ‘\n’ or ‘\r’?
unittest/case.py
1017: difflib.ndiff(pprint.pformat(seq1).splitlines(),
1018: pprint.pformat(seq2).splitlines()))
1130: pprint.pformat(d1).splitlines(),
1131: pprint.pformat(d2).splitlines())))
1207: firstlines = first.splitlines(keepends=True)
1208: secondlines = second.splitlines(keepends=True)
Split on ‘\n’ or ‘\r’? Like difflib.
test/bisect.py
62: tests = proc.stdout.splitlines()
Split on ‘\n’.
lib2to3/pgen2/grammar.py
185: for line in opmap_raw.splitlines():
Split on ‘\n’.
lib2to3/pgen2/tokenize.py
328: readline = iter(newcode.splitlines(1)).next
Split on ‘\n’.
I don’t see how the current behavior is a good default. I can understand if we can’t change it due to backwards compatibility. I might argue that we would be doing more good than harm by changing it.
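For concreteness, here is a minimal sketch of the kind of splitting most of the call sites above seem to want: \n, \r and \r\n only. The name split_universal is hypothetical and not an existing stdlib function; it only illustrates the behavior, under the assumption that trailing-newline handling should stay the same as str.splitlines().

```python
import re

# Only the three "universal newline" terminators; \r\n must come first so a
# CRLF pair is matched as one terminator.
_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_universal(text, keepends=False):
    """Hypothetical helper: like str.splitlines(), but only for \\n, \\r, \\r\\n."""
    lines = []
    start = 0
    for match in _NEWLINE.finditer(text):
        end = match.end() if keepends else match.start()
        lines.append(text[start:end])
        start = match.end()
    # Keep a final line that has no terminator; like str.splitlines(), a
    # trailing terminator does not produce an extra empty string.
    if start < len(text):
        lines.append(text[start:])
    return lines


sample = "a\r\nb\x0cc\nd"
print(sample.splitlines())      # also splits at \x0c
print(split_universal(sample))  # leaves \x0c inside the line
```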