I want to list all the features that I would find useful for scientific formatting. Afterwards I’ll give proposals for how these features/options could be integrated with existing float formatting.
Some definitions
- “precision” is defined to be an integer that represents the number of digits after the decimal point, so that `1.2` has a precision of 1 while `1.2345` has a precision of 4. A precision of zero indicates no digits after the decimal place.
- “sig figs” is defined to be an integer that represents the total number of digits displayed excluding leading zeros, so `1.2` has 2 sig figs, `1.2345` has 5 sig figs, and `001.00` has 3 sig figs. sig figs must be greater than or equal to 1.
- All non-zero real numbers can be expressed as `r = m x b^e` where `m` is a real number mantissa, `b` is a non-zero natural number base, and `e` is an integer exponent.
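To make the last definition concrete, here is a rough sketch of one way to split a float into a mantissa and exponent for a given base. The name `decompose` is made up for illustration, and the normalization `1 <= |m| < b` is just the scientific/binary convention used in the proposals below, not part of the definition itself.

```python
import math

def decompose(r: float, base: int = 10) -> tuple[float, int]:
    """Return (m, e) with r approximately m * base**e and 1 <= |m| < base."""
    if r == 0 or not math.isfinite(r):
        raise ValueError("only non-zero finite numbers can be decomposed")
    if base < 2:
        raise ValueError("base must be at least 2")
    # First guess at the exponent; the logarithm can be slightly off
    # for values that sit exactly on a power of the base.
    e = math.floor(math.log(abs(r), base))
    m = r / base**e
    # Correct the guess if rounding in the logarithm put m out of range.
    if abs(m) >= base:
        e += 1
        m = r / base**e
    elif abs(m) < 1:
        e -= 1
        m = r / base**e
    return m, e

print(decompose(1000.0))    # (1.0, 3)
print(decompose(15000, 2))  # (1.8310546875, 13)
```

The mantissa returned here is an unrounded float; rounding to a precision or number of sig figs is a separate step.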
Proposed features:
1. This new formatting specification should be accessible on the built-in float objects so that users can access it with standard f-string formatting for built-in floats. (Without this requirement users can use a PyPI function or class to achieve the rest of the requirements.)
2. `nan`, `inf`, and `-inf` get formatted to `'nan'`, `'inf'`, and `'-inf'` respectively.
3. If the precision of a printed mantissa is 0 then, by default, no decimal point is shown.
- 3a. I have no use for floats to be formatted like `123.` instead of `123`. If there is demand for this then I wouldn’t be opposed to a flag that can be used to include a decimal point on all mantissas with 0 precision.
4. Specification of the precision of the mantissa for the formatted float. Here the user provides an integer indicating the precision with which the number is displayed. The float is first rounded to the specified precision, and any digits beyond that precision are then dropped.
- 4a. I think negative precisions should be supported. In this case the float will be rounded to the corresponding precision and then displayed with precision down to (and including) the ones place. e.g. `123.45` with precision = `-1` would be displayed as `120`.
5. Specification of the sig figs of the mantissa for the formatted float. Here the user provides an integer indicating the number of sig figs with which to display the formatted float. e.g. `123` would be displayed as `120` with 2 sig figs and `123.00` with 5 sig figs.
6. If neither precision nor sig figs are specified then the number will be displayed with the minimum precision such that the resulting string round trips back to the same float. I believe this is the current behavior for float formatting.
7. Standard formatting (default format type) in which the float is simply displayed as a number with digits before and maybe after a possible decimal point.
8. Scientific formatting in which the float is displayed in base 10 with an exponent chosen so that the mantissa satisfies `1 <= m < 10`. However, if the float is 0 then the mantissa will also be equal to `0`.
9. Engineering notation in which the float is displayed in base 10 with an exponent chosen so that it is an integer multiple of 3 and the mantissa satisfies `1 <= m < 1000`. However, if the float is 0 then the mantissa will also be equal to `0`.
10. Shifted* engineering notation in which the float is displayed in base 10 with an exponent chosen so that it is an integer multiple of 3 and the mantissa satisfies `0.1 <= m < 100`. However, if the float is 0 then the mantissa will also be equal to `0`.
11. Scientific, Engineering, and Shifted engineering notation all display floats in the following way. The mantissa is displayed according to the rules above, immediately followed by either the `e` or `E` character (configurable by user option), followed immediately by the exponent displayed with a minimum of 2 digits. e.g. `123.45` formatted with 3 sig figs would be `1.23e+02`.
- 11a. I don’t think the minimum number of digits displayed in the exponents should be user configurable.
12. Binary formatting in which the float is displayed in base 2 with an exponent chosen so that the mantissa satisfies `1 <= m < 2`. However, if the float is 0 then the mantissa will also be equal to `0`.
13. IEC Binary formatting in which the float is displayed in base 2 with an exponent chosen so that it is an integer multiple of 10 and so that the mantissa satisfies `1 <= m < 1024 = 2^10`. However, if the float is 0 then the mantissa will also be equal to `0`.
14. Binary and IEC Binary notation display floats in the following way. The mantissa is displayed according to the rules above, immediately followed by either the `b` or `B` character (configurable by user option), followed immediately by the exponent displayed with a minimum of 2 digits. e.g. `15000` formatted with a precision of 2 would be `1.83b+13` in Binary or `14.65b+10` in Binary IEC.
- 14a. I don’t think the minimum number of digits displayed in the exponents should be user configurable.
15. Three available sign modes: `-`, `+`, and `' '`. In `-` mode (default) a sign character is shown for negative floats but not for positive floats. In `+` mode a sign character is shown for all finite floats. In `' '` mode a negative sign is shown for negative floats and an extra space character is prepended for positive floats. (This feature can help ensure positive and negative floats have the same string width.)
16. There shall be no appending of trailing zeros. The number of trailing zeros shall only be configured by the precision/sig fig specification.
17. Leading zeros may be prepended by specifying the minimum digit slot to which zeros must be padded. In other words, `-10.45` padded to the thousands (`10^3`) spot would be `-0010.45`. Leading zeros always appear after the sign character. No extra padding by default.
- 17a. I recommend that prepending leading zeros be specified in terms of the digit slot to which zeros should be prepended, as opposed to controlling the width of the overall string. See 17b. below.
- 17b. Current Python float formatting supports left, right and center justification of a float within a field of an arbitrary fill character. I do not think this feature should be supported for strict float formatting. I consider left and right padding by arbitrary characters until the string is a certain size to be a string formatting feature, not a float formatting feature. If a user needs to do this, they can first format the float and then re-format that string so that it has the width they desire. However, this strategy doesn’t work to insert leading zeros between the sign and the top digit, so that specific feature makes sense for float formatting.
18. The ability to use commas as delimiters between groupings of 3 digits for decimal places above the decimal point. Also the ability to use spaces or underscores as delimiters between groupings of 3 digits above AND below the decimal place. This will allow formatting like `8.854 187 812 8`. Default is no grouping characters. (Sorry for sneaking this feature request in here, but it came up when I was reminding myself about details of the current formatting mini language. This feature is technically orthogonal to the feature requests about engineering/binary notation etc.)
- 18a. I could imagine someone wanting to mix grouping symbols and have commas for digits above the decimals and spaces or underscores below. Not sure if that should be supported…
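As a rough illustration of proposals 2, 5, and 8 through 11, here is a minimal sketch of how the three decimal exponent styles could pick their exponents. The function name `format_exponent_style` and the mode strings are invented for this example. It leans on the existing `e` format type to do the sig fig rounding and then shifts the decimal point with `decimal.Decimal`, which is just one possible approach.

```python
import math
from decimal import Decimal

def format_exponent_style(x: float, sig_figs: int, mode: str = "sci",
                          exp_char: str = "e") -> str:
    """Format x with sig_figs digits in 'sci', 'eng', or 'shifted_eng' mode."""
    if sig_figs < 1:
        raise ValueError("sig figs must be greater than or equal to 1")
    if not math.isfinite(x):
        return str(x)  # 'nan', 'inf', or '-inf' (proposal 2)
    # Let the existing 'e' type round to the requested number of sig figs.
    mantissa_str, exp_str = format(x, f".{sig_figs - 1}e").split("e")
    exp10 = int(exp_str)
    if mode == "sci":            # 1 <= m < 10
        target = exp10
    elif mode == "eng":          # 1 <= m < 1000, exponent multiple of 3
        target = 3 * (exp10 // 3)
    elif mode == "shifted_eng":  # 0.1 <= m < 100, exponent multiple of 3
        target = 3 * ((exp10 + 1) // 3)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Move the decimal point without touching the digits themselves.
    mantissa = Decimal(mantissa_str).scaleb(exp10 - target)
    # '+03d' gives a sign plus at least two exponent digits (proposal 11).
    return f"{mantissa:f}{exp_char}{target:+03d}"

print(format_exponent_style(123.45, 3))                 # 1.23e+02
print(format_exponent_style(123.45, 3, "eng"))          # 123e+00
print(format_exponent_style(123.45, 3, "shifted_eng"))  # 0.123e+03
```

Because the rounding happens before the exponent is chosen, a value like 999.9 with 3 sig figs rolls over cleanly to `1.00e+03` in all three modes.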
I would like to see all of the above features go through. For completeness I’ll include requirements for the prefix proposals:
19. Option to convert exponents into SI and IEC prefixes. For Scientific, Engineering, and Shifted engineering formats, if the exponent is a multiple of 3 then it can be replaced with the corresponding SI prefix. e.g. `3.1415e+06` is converted to `3.1415 M`. Likewise, for Binary and Binary IEC formats, if the exponent is a multiple of 10 then it can be replaced with the corresponding IEC prefix. e.g. `123b+20` can be replaced by `123 Mi`. This option is off by default.
- 19a. I do not think there should be an option for no space between the mantissa and the prefix. For the SI notation that would violate NIST standards for formatting numbers with units. I’m not sure about the IEC standards.
- 19b. I do not like the idea of supporting `123 Mi = 123 M`. If that is supported then it’s ambiguous whether `123 M = 123e+06` or `123b+20`. Users who want to do that will have to strip off the `i` themselves. See proposal 20 below.
- 19c. Usage of prefix mode coerces one of Engineering, Shifted Engineering, or Binary IEC modes. Mixing requests for prefix mode with standard or binary modes should result in an invalid format error.
- 19d. If the exponent falls outside the range for which prefixes are defined then no prefix conversion occurs.
- 19e. Some way (how?) to optionally convert some possibly user-selected subset of `e-02, e-01, e+01, e+02` to `c, d, da, h` respectively.
20. Strings expressing floats using prefix notation should be convertible back to floats. That is, `float('12.3 Ki')` should yield the float `12.3b+10`.
*This is my invented term, not an officially recognized term. See section 7.9 here on “Choosing SI Prefixes”: NIST Guide to the SI, Chapter 7: Rules and Style Conventions for Expressing Values of Quantities. This is the most official guidance I’ve seen on the use of engineering notation.
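Here is a small sketch of what the prefix proposals (19, 19d, and 20) might look like as plain string helpers. The function names `apply_prefix` and `parse_prefixed` are hypothetical, the helpers assume their input already uses the `e±NN` / `b±NN` exponent form described above, and 19e (`c`, `d`, `da`, `h`) is deliberately left out.

```python
import re

# SI prefixes keyed by power of 10, IEC prefixes keyed by power of 2.
SI_PREFIXES = {
    -30: "q", -27: "r", -24: "y", -21: "z", -18: "a", -15: "f", -12: "p",
    -9: "n", -6: "µ", -3: "m", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P",
    18: "E", 21: "Z", 24: "Y", 27: "R", 30: "Q",
}
IEC_PREFIXES = {10: "Ki", 20: "Mi", 30: "Gi", 40: "Ti",
                50: "Pi", 60: "Ei", 70: "Zi", 80: "Yi"}

_EXP_RE = re.compile(r"^(?P<mantissa>.+?)(?P<base>[eEbB])(?P<exp>[-+]\d+)$")

def apply_prefix(formatted: str) -> str:
    """Swap a trailing e±NN or b±NN exponent for a prefix when one exists."""
    match = _EXP_RE.match(formatted)
    if match is None:
        return formatted  # no exponent found, e.g. 'nan' or 'inf'
    exponent = int(match["exp"])
    if exponent == 0:
        return match["mantissa"]  # nothing to scale, so no prefix
    table = SI_PREFIXES if match["base"] in "eE" else IEC_PREFIXES
    prefix = table.get(exponent)
    if prefix is None:
        return formatted  # 19d: no prefix defined, keep the exponent
    return f"{match['mantissa']} {prefix}"  # 19a: always a space

def parse_prefixed(text: str) -> float:
    """Convert a string such as '12.3 Ki' back to a float (proposal 20)."""
    mantissa, _, prefix = text.partition(" ")
    if not prefix:
        return float(mantissa)
    for table, base in ((SI_PREFIXES, 10.0), (IEC_PREFIXES, 2.0)):
        for exponent, symbol in table.items():
            if symbol == prefix:
                return float(mantissa) * base**exponent
    raise ValueError(f"unknown prefix: {prefix!r}")

print(apply_prefix("3.1415e+06"))  # 3.1415 M
print(apply_prefix("123b+20"))     # 123 Mi
print(parse_prefixed("12.3 Ki") == 12.3 * 2**10)  # True
```

Keeping the `i` in the IEC symbols is what lets `parse_prefixed` tell `M` and `Mi` apart, which is the ambiguity raised in 19b.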
Implementation discussion:
It might be possible to integrate this with existing float formatting. In my opinion, the largest challenge here would be handling the padding behavior (proposals 16 and 17). The existing fill, align, and width behavior aren’t really compatible with those proposals.
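To show why the existing width field doesn’t map onto proposal 17, here is a quick comparison. The helper `pad_to_digit` is a made-up name, and it only handles non-negative precisions and finite inputs; it is a sketch of the digit-slot idea rather than a full implementation.

```python
def pad_to_digit(x: float, top_digit: int, precision: int) -> str:
    """Zero-pad x so its leading digit slot is 10**top_digit."""
    sign = "-" if x < 0 else ""
    body = format(abs(x), f".{precision}f")
    int_part, _, frac_part = body.partition(".")
    # Pad only the integer digits; the sign stays in front of the zeros.
    int_part = int_part.zfill(top_digit + 1)
    return sign + int_part + ("." + frac_part if frac_part else "")

# Existing behavior: the width counts the sign and the decimal point,
# so positive and negative values end up with different digit counts.
print(format(-10.45, "08.2f"))  # -0010.45
print(format(10.45, "08.2f"))   # 00010.45

# Digit-slot padding: both values are padded to the thousands place.
print(pad_to_digit(-10.45, 3, 2))  # -0010.45
print(pad_to_digit(10.45, 3, 2))   # 0010.45
```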
Here’s an example regex that captures almost all of the features above. I think it captures everything except 18a and 19e.
```python
import re
pattern = re.compile(r'''
^
(?P<sign_mode>[-+ ])?
(?P<trailing_decimal>\#)?
(?P<top_pad_digit>\d+)?
(?P<grouping_option>[,_v])?
(?:(?P<prec_type>\.|\.\.)(?P<prec>-?\d+))?
(?P<format_type>[neEhHkK]|sh|SH)?
(?P<prefix_mode>p)?
$
''', re.VERBOSE)
```
I’ve used `v` to denote space-character grouping symbols. I’ve also used the `..` separator to flag sig fig rather than precision mode. Inspired by the `prefixed` package I’ve used `h` and `H` for engineering notation and `k` and `K` for binary and binary IEC notation. I’ve included a `p` flag at the end to indicate exponents should be replaced with prefix characters.
edit: edited the regex to support negative precisions. Note I haven’t fully tested this regex. Just posting it here for now as a concept.
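For anyone who wants to poke at the regex, here is a small sketch that runs a few specs through it and shows which named groups come back. The pattern is copied unchanged from above; the wrapper `parse_spec` is a hypothetical helper added only for this demonstration.

```python
import re

# Same pattern as in the post, repeated so this snippet runs on its own.
SPEC_PATTERN = re.compile(r'''
^
(?P<sign_mode>[-+ ])?
(?P<trailing_decimal>\#)?
(?P<top_pad_digit>\d+)?
(?P<grouping_option>[,_v])?
(?:(?P<prec_type>\.|\.\.)(?P<prec>-?\d+))?
(?P<format_type>[neEhHkK]|sh|SH)?
(?P<prefix_mode>p)?
$
''', re.VERBOSE)

def parse_spec(spec: str) -> dict[str, str]:
    """Return only the fields that were actually present in the spec."""
    match = SPEC_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"invalid format spec: {spec!r}")
    return {name: value for name, value in match.groupdict().items()
            if value is not None}

for spec in ("+.3e", "..2shp", "3,.-1n"):
    print(repr(spec), parse_spec(spec))
```

Specs that use existing type characters, such as `.3f`, don’t match this pattern, so a real implementation would need to detect that case and hand the spec to the current parser.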
I think this regex is pretty much compatible with the existing float formatting specification. The existing fill and align fields would be ignored. The “z” field could be included in the regex I’ve shown here. The 0 field would be ignored. I’d recommend the width field be re-interpreted as the top digit place the user would like to fill with zeros (rather than the desired final width of the string). The grouping option is the same with the addition of the whitespace grouping character. The precision is now specified with either `.` for precision or `..` for sig figs and the type includes `nehk` or `EHK`, where the capitalization here will set the capitalization of the exponent symbol. I’ve also co-opted the `n` symbol for “standard” format.
I would say that existing formatting types such as `f` and `g` should be parsed exactly as they always have been.
One idea is that rather than forcing this new formatting specification into the existing language, we could open up a new namespace, perhaps prepending with a colon (again inspired by `prefixed`) or something to indicate we want “scientific” float formatting. Then we would be free to design the format specification however we wish, unconstrained by the existing format. But given the regex above, I don’t actually think such a measure is necessary.
I don’t know if this would all need to be written in CPython itself or if it could be written in Python. I’ve tried to write a formatter satisfying these requirements in Python before. It’s not too bad. The hardest part is making it compatible with the existing formatting mini language (i.e. supporting `f` and `g`) if you want the formatter to be an extension of, rather than a replacement for, the existing string formatting.
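To give a feel for the “extension rather than replacement” approach in pure Python, here is a minimal sketch of a `float` subclass whose `__format__` handles one new spec and passes everything else to the existing mini language. The class name `SciFloat` is hypothetical, and it only understands the `..N` sig fig form in standard notation (with an optional `n` type).

```python
import math
import re
from decimal import Decimal

# Only the '..N' sig fig spec, optionally followed by the 'n' type.
_SIG_FIG_SPEC = re.compile(r"^\.\.(?P<sig_figs>[1-9]\d*)n?$")

class SciFloat(float):
    """Hypothetical float subclass that extends the format mini language."""

    def __format__(self, spec: str) -> str:
        match = _SIG_FIG_SPEC.match(spec)
        if match is None:
            # Not one of the new specs: existing types such as 'f' and 'g'
            # are parsed exactly as they always have been.
            return super().__format__(spec)
        if not math.isfinite(self):
            return super().__format__("")  # 'nan', 'inf', or '-inf'
        sig_figs = int(match["sig_figs"])
        # Round via the existing 'e' type, then print without an exponent.
        rounded = Decimal(super().__format__(f".{sig_figs - 1}e"))
        return format(rounded, "f")

x = SciFloat(123)
print(f"{x:..2n}")  # 120
print(f"{x:..5n}")  # 123.00
print(f"{x:.1f}")   # 123.0, handled by the existing float formatting
```

The dispatch is the easy part. Covering the full overlap between the two grammars (width, grouping, the `0` flag and so on) is where most of the work would be.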
Next steps
What are next steps here? It seems there is decent appetite for “engineering formatted” floats. Is there enough appetite to actually try further steps beyond forum posts for this? I’ve never written a PEP before but I think some of the material in this post and throughout the thread could be a start for the skeleton of a PEP.