Semantic line breaks

encukou · February 22, 2022, 10:06am

Should we allow/encourage semantic line breaks for newly written prose (docs, PEPs, etc.)?

Semantic line breaks are an alternative to
word-wrapping paragraphs at (say) the 79th column
in formats like ReST, Markdown or HTML, which allow arbitrary line wrapping.
Instead, lines are wrapped after periods
(or commas, semicolons, semantic pauses).
There's still a *maximum* line length, but lines are broken before that.
The advantage is that editing a few words
doesn't cause the whole paragraph to reflow.

[edit: this post originally started with a quote about one sentence per line, which led some people in the wrong direction. I added the summary above; click for the original text.]

@CAM-Gerlach recently wrote in a PEP editing issue:

One Sentence Per Line (aka semantic line breaks) is an increasingly popular convention for prose like this (it’s been used for years in most of the reST/MyST, website and docs-related repos I’m involved in, as well as on most others for Readmes, Contributing Guides, etc.). It’s the standard for the AsciiDoc docs format and some others.

[end edit]

If you’re not familiar with semantic line breaks, check out Brandon Rhodes’ short post (which includes an even shorter quote from 1974!). There’s also a sembr “standard” (which I find a bit too nitpicky TBH).

Personally, I find this style of line wrapping much more natural. When I re-wrap text in Python docs and PEPs, I’m sad about making the diff unreadable.
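To make the diff argument concrete, here is a minimal Python sketch that counts the changed lines in a unified diff after the same small wording edit, once for a hard-wrapped paragraph and once for a semantically broken one. The sample sentence, the `changed_lines` helper, and the narrow `WIDTH` of 40 are illustrative assumptions, not part of any existing tool.

```python
import difflib
import textwrap

# Sample paragraph, already broken at semantic boundaries (illustrative text).
SEMANTIC_OLD = [
    "The beauteous scheme is that now,",
    "if you change your mind",
    "about what a paragraph should look like,",
    "you can change the formatted output",
    "merely by changing",
    "the definition of .PP",
    "and re-running the formatter.",
]
# The edit only touches the first line.
SEMANTIC_NEW = [SEMANTIC_OLD[0].replace("The beauteous scheme",
                                        "The beauty of this scheme")]
SEMANTIC_NEW += SEMANTIC_OLD[1:]

# The same text as one string, for fixed-width wrapping.
OLD = " ".join(SEMANTIC_OLD)
NEW = " ".join(SEMANTIC_NEW)
WIDTH = 40  # assumed narrow width so the short sample spans several lines


def changed_lines(before, after):
    """Count added and removed lines in a unified diff of two line lists."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


hard_wrapped = changed_lines(textwrap.wrap(OLD, WIDTH), textwrap.wrap(NEW, WIDTH))
semantic = changed_lines(SEMANTIC_OLD, SEMANTIC_NEW)
print("hard-wrapped:", hard_wrapped, "changed lines")
print("semantic:", semantic, "changed lines")
```

With semantic breaks the diff contains one removed and one added line. With fixed-width wrapping, the edit pushes words across line boundaries, so later lines change too.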

hukkinj1 · February 22, 2022, 2:00pm

A +1 for semantic line breaks from me! It makes editing effortless and makes for smaller diffs.

If SemBr is not allowed, I think we could use a tool that wraps reST automatically. Manually wrapping at a fixed length is a pain. I’ve written mdformat which can do this for Markdown/MyST but I’m not aware of a tool that can wrap reST.

hugovk · February 22, 2022, 3:06pm

Semantic line breaks/linefeeds sounds good, breaking on ideas (and punctuation and RST/MD syntax), like in Brandon’s example:

 ...
 the definition in place of it.

-The beauteous scheme is that now,
+The beauty of this scheme is that now,
 if you change your mind
 about what a paragraph should look like,
 you can change the formatted output
 merely by changing
 the definition of ‘‘.PP’’
 and re-running the formatter.

 As a rule of thumb, for all but the most
 ...

“One Sentence Per Line” as a name brings something else to mind, so I hope not this:

-The beauteous scheme is that now, if you change your mind  about what a paragraph should look like, you can change the formatted output merely by changing the definition of ‘‘.PP’’ and re-running the formatter.
+The beauty of this scheme is that now, if you change your mind  about what a paragraph should look like, you can change the formatted output merely by changing the definition of ‘‘.PP’’ and re-running the formatter.

Which isn’t great for editors with word wrap off:

And is very difficult to edit in the GitHub UI on mobile:

AA-Turner · February 22, 2022, 3:19pm

‘Start Each Sentence On A New Line’ (title case abound!) may be a reasonable compromise?

I tend to review quite a lot of things (especially PEP text) on mobile, and really appreciate line wrapping. I’d suggest that the only constraint we keep is the 70-79 character limit, but there’s no reason authors couldn’t use the techniques here to write their texts.

A

AA-Turner · February 22, 2022, 3:21pm

I can’t see any reason we wouldn’t allow it, as reST re-flows everything regardless.

A

erlendaasland · February 22, 2022, 7:02pm

One Sentence Per Line sounds like a misinterpretation of both SemBr and Kernighan’s thoughts from the early 70s. Quoting the latter:

Make lines short, and break lines at natural places, such as after commas and semicolons, rather than randomly.

I hope we end up somewhere closer to SemBr and Kernighan than OSPL.
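As a rough illustration of Kernighan's rule, the following Python sketch breaks text after sentence and clause punctuation and falls back to plain word wrapping when a single piece is still too long. The function name `semantic_wrap` and the default width of 79 are assumptions made for this example. The regular expression is deliberately naive: it would also break after abbreviations such as "e.g.", and it breaks at every comma, which a human writer would not always do.

```python
import re
import textwrap

MAX_WIDTH = 79  # assumed limit for this sketch


def semantic_wrap(text, width=MAX_WIDTH):
    """Break after . ! ? , ; : and word-wrap any piece longer than `width`."""
    # Collapse existing line breaks so the input wrapping does not matter.
    flat = " ".join(text.split())
    # Split on whitespace that directly follows clause-ending punctuation.
    pieces = re.split(r"(?<=[.!?,;:])\s+", flat)
    lines = []
    for piece in pieces:
        if not piece:
            continue
        if len(piece) <= width:
            lines.append(piece)
        else:
            # The maximum line length wins over the semantic break.
            lines.extend(textwrap.wrap(piece, width=width))
    return lines


quote = ("Make lines short, and break lines at natural places, "
         "such as after commas and semicolons, rather than randomly.")
for line in semantic_wrap(quote):
    print(line)
# Make lines short,
# and break lines at natural places,
# such as after commas and semicolons,
# rather than randomly.
```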

stoneleaf · February 22, 2022, 7:23pm

While I appreciate shorter, easier-to-read diffs, the below are not multiple sentences on multiple lines, but one sentence with multiple clauses spread across several lines:

While the next example (the one @hugovk does not want) is, in fact, one sentence on one line:

I think it is important to remember that documents need to still be readable when in text format,

and when
the sentences are spread
across several lines
it looks like poetry
where there are pauses
at the end of lines
and it completely destroys
the flow of the prose
making it harder
to understand
the intent.

hukkinj1 · February 22, 2022, 8:08pm

I think that in the rare case where a “semantic piece of text” happens to be, say, 81 chars wide, there’s a conflict. Which one takes priority: the 79-char maximum line length or semantic line breaks?

AA-Turner · February 22, 2022, 8:31pm

I’m not sure how common this would be (it depends somewhat on where one delineates one ‘idea’, clause, etc. from the next).

Under the current system, 79 would still be the law, but it would be a very pedantic editor who requested the line to be broken because of a 2-character overage. Indeed, we are currently quite lenient on URLs, whereas on a strict reading we could require these to be broken as well (reST copes with URLs broken over multiple lines).

I think this is a letter of the law versus application of the law matter, although I see why you make the argument.

A

pf_moore · February 22, 2022, 9:18pm

I think that the idea of breaking lines to match the sentence structure is reasonable on the face of it, and probably does result in better diffs. But only if done with the above principle firmly in mind, and with the strong caveat that readability wins in all cases.

Unfortunately, the tendency with any style guideline is to attract people who like to apply that rule as if it were set in stone, and what was originally a useful and sensible principle becomes a burden. Hopefully this is less likely with text than with code (because it’s harder to write linters for text that blindly enforce a rule like this) but the risk is definitely there.

So I guess I’m in favour of the idea as a principle, but would prefer not to have it written down in a style guide.

zware · February 22, 2022, 9:44pm

If we need to specify something in this regard, we could specify a maximum line length (which I think we already have?) and explicitly state that there is no minimum line length, and that it is not necessary to reflow an entire paragraph if its first line(s) become shorter.
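A rule of this shape (a maximum but no minimum) is easy to check mechanically. The sketch below is a hypothetical helper, not an existing linter: it reports only lines that exceed the limit and never complains about short ones. The name `overlong_lines` and the default of 79 are assumptions, and it makes no exception for long URLs.

```python
MAX_LINE_LENGTH = 79  # assumed limit; pass a different value if the project uses 80


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


sample = "A short line.\n" + "x" * 80 + "\nAnother short line."
print(overlong_lines(sample))  # [(2, 80)]
```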

erlendaasland · February 22, 2022, 9:49pm

IIRC, the maximum line length for docs is 80 chars.

ferdnyc · February 22, 2022, 11:17pm

In fairness, it’s a standard — that’s its job. Even the law recognizes that vague standards are worse than no standards at all.

encukou · February 23, 2022, 10:25am

Starting the post with one sentence per line was somewhat misleading. I edited the first post to make my interpretation of the concept clearer:

Semantic line breaks are an alternative to
word-wrapping paragraphs at (say) the 79th column
in formats like ReST, Markdown or HTML, which allow arbitrary line wrapping.
Instead, lines are wrapped after periods
(or commas, semicolons, semantic pauses).
There's still a *maximum* line length, but lines are broken before that.
The advantage is that editing a few words
doesn't cause the whole paragraph to reflow.

mwichmann · February 23, 2022, 3:35pm

I’ve worked on many doc efforts over the years, and I think semantic line breaks when you’re using markup/markdown are the way to go if you’re not faced with too many users using editors (i.e. wysiwyg) that make it hard. As long as it’s not too pedantic. If it’s a guideline couched as “if you’re looking for a place to break a line, prefer something that is a logical break over just hitting Enter any old place”. Breaking at every bit of punctuation Just Because is not great.

I guess the question is what are the goals? Markdown-type systems (md, reST, adoc, etc.) generally exist on the principle that the document should be fully readable when viewing the “source” rather than a rendering. Thus, editors might feel free to reflow things on changes. On the other hand, when changes are done via PR, reviewers need to have their lives made at least not completely miserable. [aside: I get periodically yelled at by a maintainer on a project I’m working on when I submit attempted surgery on doc parts and “make his eyes bleed” because it’s so hard to pick out the changes in GitHub diff presentation. That one is DocBook XML, which is fiddlier than the markdown class. I do try, but some of the existing mess means it can’t always be avoided]. If making reviewing reasonable is a larger goal then semantic line breaks - definitely.

fungi · February 23, 2022, 4:00pm

I’ve also found Git’s --word-diff (and its several operational
modes) helps a lot for reviewing documentation commits. I know some
code review platforms like Gerrit do a fairly good job of
highlighting wording changes independently of where lines have been
re-wrapped, but as I don’t really go near others like GitHub or
Gitlab all that often, I have no idea whether they do the same.
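The idea behind a word-level diff can be sketched in a few lines of Python with the standard `difflib` module. This is only an illustration of the concept, not how Git or Gerrit implement it, and the `word_changes` helper is a name made up for this example. It compares the two texts as word sequences, so re-wrapped lines produce no changes on their own.

```python
import difflib


def word_changes(old_text, new_text):
    """List word-level edits, ignoring where the lines are broken."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


old = "The beauteous scheme is\nthat now, if you change"
new = "The beauty of this scheme is that\nnow, if you change"
print(word_changes(old, new))
# [('replace', 'beauteous', 'beauty of this')]
```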

stoneleaf · February 23, 2022, 8:21pm

I still find that more difficult to read than, say:

As far as diffs go, I would suggest using two:

make the changes without reflow (so the changes are easier to see)
do the reflow, with the only changes being the location of line breaks
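For the second of those two diffs, a reviewer could confirm mechanically that only line breaks moved. The following Python sketch assumes that blank lines separate paragraphs and that whitespace inside a paragraph is not significant. It therefore does not handle code blocks or other places where newlines matter. `is_reflow_only` is a hypothetical helper name.

```python
import re


def is_reflow_only(before, after):
    """True if the two texts differ only in line breaks inside paragraphs."""

    def paragraphs(text):
        # Blank lines separate paragraphs; whitespace inside one is collapsed.
        blocks = re.split(r"\n\s*\n", text.strip())
        return [" ".join(block.split()) for block in blocks]

    return paragraphs(before) == paragraphs(after)


original = "one two\nthree four\n\nfive six"
reflowed = "one\ntwo three four\n\nfive\nsix"
print(is_reflow_only(original, reflowed))  # True
print(is_reflow_only(original, reflowed.replace("six", "seven")))  # False
```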

gpshead · February 23, 2022, 10:31pm

While I might use semantic-ish breaking myself when writing new text, I expect everything to be autoformatted upon save in many editor configs. No reasonable autoformatter is going to get semantics as “right” as a human (I’m sure some ML model could, but so what…). So I’m fine allowing it… but not in order to exceed the line limit… and absolutely no requirement for it to be preserved and maintained by anyone during future edits.

In the end we’d be best off always auto-formatting.

Diff readability problems in code reviews are a problem with the diffing tool, not with the editing. It is entirely reasonable for code review diff tools to not be line based but instead understand reflowing and highlight only the meaningfully changed bits.

encukou · February 24, 2022, 9:58am

Sounds reasonable.

How would you avoid editors with slightly different settings reflowing whole files in slightly different ways on each save?

That essentially means the diff tool must know the markup language and handle places where newlines are significant (code blocks, blank lines for paragraphs…). I don’t think it’s a realistic default.
AFAIK, GitHub (the default tool for CPython reviews) doesn’t do this.

CAM-Gerlach · February 24, 2022, 8:01pm

Since this topic is pretty important to me, due to the huge amount of time, stress, cognitive load and sub-optimal content-relevant choices it’s saved me over the years in docs/website/etc repos that switched to OSPL, and the amount of the same it costs me every day as a PEP writer and editor with those that don’t, I’d been intending to cogently outline the detailed case for it (at least in the context of the PEPs repo) once I’d established more credibility as a PEP editor and in the community. Unfortunately, though I should have known better and refrained from mentioning it until ready to do so, it seems my OT aside let the cat out of the bag, and on a day when we were dealing with a severe weather situation too.

In any case, I’ve created a related thread with a detailed proposal for OSPL, at least nominally scoped to the PEPs repo initially, which also contains a section that addresses some of the merits and practical difficulties with what appears to be the nominal proposal here, to use SemBr instead. I welcome your feedback over there, and can address points relevant to SemBr over here as well. Thanks!

EDIT: Just to clarify, my position on SemBr is that it is generally an improvement over “dumb” hard wrapping, especially for reST where it is a semi- or completely manual process anyway, and it could make more sense for projects like the CPython documentation, in terms of being practically easier to adopt incrementally and non-strictly-enforced on existing, gradually-updated content that currently uses hard breaks, despite the practical downsides I highlight versus OSPL.

However, the case for OSPL is much stronger particularly for repos like the PEPs, where:

The benefits of OSPL are more acute (given the high, concentrated amount of rewriting, editing and review during the PEPs’ pre-draft and draft stage)
The difficulties in understanding, teaching and consistently enforcing SemBr come to the fore (since many authors are first-timers or don’t write PEPs regularly, versus a contributor base experienced in technical English writing)
The adoption issues are mostly moot (existing non-Draft PEPs are rarely edited and stay as they are, new PEPs can adopt it and Draft PEPs can if their authors choose to).

Also, this opens up the opportunity to trial it there, and then expand to others if successful and incorporating any lessons learned.