Add underscore as a thousandths separator for string formatting

I can think of two different ways to do this:

  1. Special case the _ grouping_option to add an underscore in the thousandths places
  2. Add a new format specifier (like float_grouping):
format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision[float_grouping]][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
float_grouping  ::=  "_"
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

Let me know if this has already been discussed; I haven’t looked.

I don’t understand what you are proposing.

>>> i=10**10
>>> f"{i:_d}"
'10_000_000_000'

Why do we need another way?

Current: f"{4321.123456:_.6f}" → 4_321.123456

Proposed:

f"{4321.123456:_.6f}" → 4_321.123_456

Or

f"{4321.123456:_.6f}" → 4321.123_456
f"{4321.123456:_._6f}" → 4_321.123_456

Edit: above two should be:

f"{4321.123456:_.6f}" → 4_321.123456
f"{4321.123456:._6f}" → 4321.123_456

/edit

This is a different spec than the original post:

format_spec ::= [[fill]align][sign][#][0][width][grouping_option][.[float_grouping][precision]][type]

Ah, funny misread of the title :smiley:

“Thousandths”, not “thousands”

I think f"{4321.123456:_._6f}" is better than f"{4321.123456:_.6f}" as it preserves backwards compatibility and makes it obvious that the separators come both before and after the decimal point.

You could then also do

f"{4321.123456:._6f}" → 4321.123_456

I’d be curious to understand why this doesn’t already use _ in the fractional part. An oversight? Intentional? Left for future generations to solve?

I think we were focused on ints and only later applied it to floats, and the idea of also formatting the fractional part just never came up.

AFAIK, typographical conventions in most languages do not apply the thousands separator to the fractional part.

It is true that most human languages do not apply a thousands separator
to the fractional part, but they do often apply a narrow space instead.
I can’t show narrow spaces in this email, so I’ll use a regular space.
Pretend it is half the width of a digit, or even less:

12,345,678.000 123 456 789

Many European countries swap the decimal point and the comma; India, I
believe, groups the digits in twos after the first three.

In any case, practicality beats purity: as far as I know, there is no
typographical convention to use underscores in any human language, but
it is nevertheless a useful thing to do when programming. We support
underscores in numeric literals:

>>> 123_456.123_456
123456.123456

and we can output underscores in the integer part of the number. We
should support them in the fractional part as well.

It sounds like the backwards-compatible version would be preferable:

f"{4321.123456:_.6f}" → 4_321.123456
f"{4321.123456:._6f}" → 4321.123_456
f"{4321.123456:_._6f}" → 4_321.123_456
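
For anyone who wants to try the three forms today, here is a rough sketch that emulates them on top of the existing format(). The helper name format_proposed is made up for illustration and is not part of CPython. It only understands the small subset [grouping_option].[float_grouping]precision followed by f, and it ignores fill, align, sign and width entirely.

```python
import re

# Hypothetical helper, not part of CPython: it accepts only the subset
# [grouping_option] "." [float_grouping] precision "f" of the proposed spec.
_SPEC = re.compile(r"(?P<group>[_,]?)\.(?P<frac>_?)(?P<precision>\d+)f")


def format_proposed(value, spec):
    m = _SPEC.fullmatch(spec)
    if m is None:
        # Anything outside the subset is left to the built-in format().
        return format(value, spec)
    # Format with today's spec, i.e. without the proposed "_" after the dot.
    text = format(value, f"{m['group']}.{m['precision']}f")
    if not m["frac"]:
        return text
    whole, dot, digits = text.partition(".")
    # Group the fractional digits in threes, counting from the decimal point.
    chunks = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return whole + dot + "_".join(chunks)


print(format_proposed(4321.123456, "_.6f"))   # 4_321.123456
print(format_proposed(4321.123456, "._6f"))   # 4321.123_456
print(format_proposed(4321.123456, "_._6f"))  # 4_321.123_456
```

Specs without the extra underscore go through the same code path as before, which is the backwards-compatibility point made above.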

Alternative names for “float_grouping”:

  • fractional_grouping
  • fractional_grouping_option
  • ??

format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.[float_grouping][precision]][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
float_grouping  ::=  "_"
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

Is there a way to say that the dot must be followed by one or both of float_grouping and precision?
I believe this grammar specification would allow f"{123:0.f}", which is invalid and meaningless.
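
One possible way to express that constraint is to spell out the alternatives after the dot: either float_grouping with an optional precision, or precision alone. Below is a rough sketch of that idea as a regular expression. The names PROPOSED_SPEC and is_valid_proposed_spec are invented for illustration, and the pattern only mirrors the grammar posted in this thread, not CPython's actual parser.

```python
import re

# Sketch of the proposed grammar from this thread. The part after the dot is
# written as (float_grouping [precision] | precision), so a bare "." is rejected.
PROPOSED_SPEC = re.compile(
    r"""
    (?: (?P<fill>.)? (?P<align>[<>=^]) )?
    (?P<sign>[-+ ])?
    (?P<alt>\#)?
    (?P<zero>0)?
    (?P<width>\d+)?
    (?P<grouping_option>[_,])?
    (?: \. (?: _\d* | \d+ ) )?
    (?P<type>[bcdeEfFgGnosxX%])?
    """,
    re.VERBOSE,
)


def is_valid_proposed_spec(spec):
    """Return True if spec matches the sketched grammar in full."""
    return PROPOSED_SPEC.fullmatch(spec) is not None


print(is_valid_proposed_spec("_._6f"))  # True
print(is_valid_proposed_spec("0.f"))    # False
```

In grammar terms this would read something like `["." (float_grouping [precision] | precision)]` in place of `[.[float_grouping][precision]]`.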

Also, since we can format numbers in a non-round-trippable way, why not add an option to group the fractional part with a special space (something like float_grouping ::= "_" | "s"):

print(".000\N{NO-BREAK SPACE}345")           
print(".000\N{MEDIUM MATHEMATICAL SPACE}345")
print(".000\N{THIN SPACE}345")               
print(".000\N{HAIR SPACE}345")               

Here’s an image, since the spaces are replaced by Discourse:
[image]
Edit: you can see the difference better using ones:
[image]
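
To experiment with the different spaces on real numbers, something like the following could be used. It is only a sketch that post-processes an already formatted string. The function name group_fraction and its sep parameter are made up here and are not part of the proposal.

```python
import re


def group_fraction(text, sep="\N{THIN SPACE}"):
    """Insert sep after every third digit of the fractional part of text."""
    whole, dot, frac = text.partition(".")
    # A group of three digits gets a separator only if more digits follow it.
    grouped = re.sub(r"(\d{3})(?=\d)", lambda m: m.group(1) + sep, frac)
    return whole + dot + grouped


for name in ("NO-BREAK SPACE", "MEDIUM MATHEMATICAL SPACE", "THIN SPACE", "HAIR SPACE"):
    # unicodedata is not needed: the loop only labels the output lines.
    print(name, group_fraction(format(0.000111111, ".9f"), sep=" "))
```

Passing one of the space characters listed above as sep shows how each renders in a given terminal or font.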

Since all of this is language-specific (what are the fraction and group separators? what is the group size?), shouldn’t it be in the locale module instead of a general str method or builtin?

That boat sailed over the horizon a long time ago: we’ve accepted
underscores in numeric literals, and as output in the whole number part,
for a long time now.

>>> format(12_34_56.12_34_56, '_f')
'123_456.123456'

works back to at least 3.7 and maybe older.

We’re not suggesting a fully configurable formatting language capable of
specifying the group size and separators, just allowing output of
underscores in the fraction part.

>>> format(12_34_56.12_34_56, '_._f')  # Proposal.
'123_456.123_456'

The group size will remain fixed at three; the grouping separator will
be fixed as either underscore or comma.

Anything more complicated than that probably belongs in locale, or maybe
the string module as a float formatter class. But that’s not this
proposal.

Any chance of this making it into 3.10? The last alpha is a month away, and the first beta is a month after that.

I’ve never contributed to CPython, and I have minimal experience with C. Could this be a reasonable first contribution?

bpo issue here:
https://bugs.python.org/issue43624