New format specifiers for string formatting of floats with SI and IEC prefixes

ajoino · May 20, 2023, 9:13pm

A couple of things:

I was pointing out that your statement that centi- and deci- are not part of the SI standard is false. I had never heard of “engineering notation” before (and I have a master’s in engineering physics), and at best it seems to be a soft standard.
Neither your OP nor the package description mentions “engineering notation”, which makes it seem like you are moving the goalposts of what should be accepted into the float type.
I think your sudden claim that only “engineering notation” should be accepted neatly illustrates the point that people don’t agree on what kind of formatting should be supported.
From personal experience I don’t see the need for automatically generated SI prefixes. In my experience, SI prefixes are typically used in normal text or tables, which I tended to typeset in LaTeX. In figures generated from MATLAB/Python I used scientific notation.
Much like others have said, I think the kind of formatting one would generally want to achieve is complex and requires more information than simple format specifiers allow. Off the top of my head, I think the API of such a formatting function needs four inputs: the number (any non-complex type), the number of significant digits, the SI/IEC prefix you want to use, and lastly whether to write the whole number out or use scientific notation.
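A minimal sketch of that four-input function, assuming the caller picks the prefix explicitly. The name `format_with_prefix` and the prefix table are illustrative only, not an existing API; rounding reuses the built-in `e` type so the exponent reflects the already-rounded value.

```python
import math

# Illustrative subset of SI prefixes; the caller chooses one explicitly.
SI_FACTORS = {"G": 1e9, "M": 1e6, "k": 1e3, "": 1.0, "m": 1e-3, "µ": 1e-6, "n": 1e-9}


def format_with_prefix(value: float, sig_figs: int, prefix: str, scientific: bool = False) -> str:
    """Scale `value` by the chosen prefix and show `sig_figs` significant digits."""
    scaled = value / SI_FACTORS[prefix]
    if not math.isfinite(scaled):
        return f"{scaled} {prefix}".rstrip()
    # 'e' rounds to sig_figs significant digits and always uses exponent form.
    rounded = f"{scaled:.{sig_figs - 1}e}"
    if scientific:
        body = rounded
    else:
        # Read the exponent *after* rounding (999.6 -> '1.00e+03') to pick the decimals.
        exponent = int(rounded.split("e")[1])
        decimals = max(sig_figs - 1 - exponent, 0)
        body = f"{float(rounded):.{decimals}f}"
    return f"{body} {prefix}".rstrip()
```

For example, `format_with_prefix(123456, 2, "k")` gives `'120 k'` and `format_with_prefix(123456, 2, "k", scientific=True)` gives `'1.2e+02 k'`.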

avylove · May 21, 2023, 2:30pm

@ajoino To be honest, I didn’t know what I implemented was called engineering notation (or maybe I forgot) until I was talking with my friend who used to work at NIST about this thread, and he said, “Oh, you mean engineering notation?” But it’s been the same everywhere I’ve been since I first took technical drafting in high school almost 30 years ago.

I think you’re right that we can’t know everything someone wants to use, but your mistake is focusing on the prefix. If you know what prefix you want to use, then you do the math and use that prefix. The purpose of engineering notation is to express magnitude. Whether you use the exponential form or the SI-prefix form, that is the intent. It is not to address specific use cases of specific prefixes.

avylove · May 21, 2023, 2:31pm

Justin Gerber:

My opinion is that it would be nice if the following were built in format specifications for float:

A formatting mode that ALWAYS displays a specified number of sig figs (not precision) along with additional options for forcing the display to be in

Standard notation, i.e. no scientific notation/exponent: 123.456 → 120 for 2 sig figs

Scientific notation: The non-zero number is set such that its mantissa satisfies 1 <= m < 10 so that 123.456 -> 1.2e2 for 2 sig figs

Standard engineering notation: The number is shown in scientific notation but the mantissa is chosen such that 1 <= m < 1000 and the exponent is forced to be a multiple of 3 so 123.456 -> 120e0.

“Shifted” engineering notation: the exponent is again a multiple of 3 but the mantissa ranges between 0.1 <= m < 100 so that 123.456 -> 0.12e3.

The existing #.2g format specification realizes the goal of always displaying the correct number of sig figs BUT, it has an automated routine to determine whether it uses standard or scientific notation. The user can’t control this automated selection with options in the format specification. Furthermore, there is no functionality for engineering notation.

I have long thought the treatment of precision as number of decimal places or as significant digits should be controlled through a flag. The only reason I implemented it as additional presentation types was to try to align with 'g'. But perhaps that was the wrong approach. I would very much be in support of introducing a new flag to indicate precision should be interpreted as significant digits.

I’d also be in support of adding a presentation type for engineering notation in exponential form. There seems to be a lot of support here for that. However, if we do this, then it would also make sense to include binary engineering notation (ex: 3.5b20 or 3.5B20 for 3.5 x 2²⁰ or 3.5 Mi).
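A rough sketch of what binary engineering notation could compute, using the `3.5b20` style from the paragraph above. `binary_eng` and its parameters are hypothetical; it relies only on `math.frexp`/`math.ldexp`, and `precision` here means digits after the point of the mantissa.

```python
import math


def binary_eng(value: float, precision: int = 2, step: int = 10, marker: str = "b") -> str:
    """Show `value` as mantissa x 2**exponent, with the exponent a multiple of `step`.

    step=10 gives IEC-style exponents (1 <= |mantissa| < 1024, i.e. Ki, Mi, Gi, ...);
    step=1 gives plain binary notation (1 <= |mantissa| < 2).
    """
    if value == 0 or not math.isfinite(value):
        return f"{value:.{precision}f}"
    # frexp gives value == m * 2**e with 0.5 <= |m| < 1,
    # so the "1 <= |mantissa| < 2" exponent is e - 1.
    _, e = math.frexp(value)
    exponent = ((e - 1) // step) * step
    mantissa = math.ldexp(value, -exponent)  # exact for normal floats: only the exponent changes
    # Note: rounding is not re-checked, so 1023.999 * 2**10 prints as 1024.00b10.
    return f"{mantissa:.{precision}f}{marker}{exponent}"
```

With these definitions `binary_eng(3.5 * 2**20, precision=1)` returns `'3.5b20'`, and `binary_eng(15000)` returns `'14.65b10'`.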

The prefixes are not a NIST standard; they’re an International Bureau of Weights and Measures (BIPM) standard. BIPM is supervised by the International Committee for Weights and Measures (CIPM), the same organization that sets the international standard for things like the length of a meter, the mass of a kilogram, the duration of a second, etc. I think a lot of the pushback comes from a misunderstanding of what this form of engineering notation is. It’s the same concept as the exponential form: 1.25 x 10^-9 is 1.25E-9 in exponential form and 1.25 n in prefix form. The n tells you the magnitude and can only be expressed as a base-1000 prefix, which is set by an international standard. If you only want to use a specific prefix, you’re not using engineering notation.
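To make the “same concept as the exponential form” point concrete, here is a small sketch that rounds with the built-in `e` type, moves the exponent down to a multiple of 3, and swaps it for the matching SI prefix. `eng_prefix` and the partial prefix table are assumptions for illustration; exponents outside the table fall back to exponent form.

```python
# Base-1000 SI prefixes keyed by exponent (a subset; the full SI list runs from quecto to quetta).
SI_PREFIXES = {-12: "p", -9: "n", -6: "µ", -3: "m", 0: "", 3: "k", 6: "M", 9: "G", 12: "T"}


def eng_prefix(value: float, sig_figs: int = 3) -> str:
    """Engineering notation with the power of 1000 written as an SI prefix (finite values)."""
    # 'e' rounds to sig_figs significant digits and reports the decimal exponent.
    mantissa_str, exp_str = f"{value:.{sig_figs - 1}e}".split("e")
    exp = int(exp_str)
    eng_exp = 3 * (exp // 3)          # largest multiple of 3 that is <= exp
    shift = exp - eng_exp             # move the decimal point 0, 1 or 2 places right
    mantissa = float(mantissa_str) * 10**shift
    decimals = max(sig_figs - 1 - shift, 0)
    prefix = SI_PREFIXES.get(eng_exp)
    if prefix is None:                # outside this table: keep the exponent form
        return f"{mantissa:.{decimals}f}e{eng_exp:+03d}"
    return f"{mantissa:.{decimals}f} {prefix}".rstrip()
```

So `eng_prefix(1.25e-9)` gives `'1.25 n'` and `eng_prefix(123456)` gives `'123 k'`.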

Yes, it would make it easier, but it would be better if there was a mechanism to register additional presentation types at runtime so you don’t have to subclass types.

Elegance, yes, but also usability. If you have to call a function, you can do that with f-string formatting, but a common use case for me is specifying the format of progress bars using str.format() syntax with named fields. This requires a subclass in order to provide the end user with specifiers they can use to control the format. Ex: {count:!.2m}{unit} / {total:!.2m}{unit} You can look at the Enlighten progress bar package for more details.

I think what we’ve touched on here is that the current specification is lacking and there is some appetite for change, though not exactly in the way I proposed. Perhaps the discussion should be about what ways we can change it rather than about my initial proposal.

One thing no one seems to have pushed back on is the use of IEC prefixes (Ki, Mi, Gi, …). I’m not sure if that is due to lack of objection or if it’s just been overshadowed.

I’d like to evolve the initial proposal to more general things:

Add a flag to denote precision should be treated as significant digits
Add a presentation type for engineering notation in base-10 exponential form
Add a presentation type for engineering notation in base-2 exponential form
Add a presentation type for engineering notation in base-10 prefix form (base 1000 prefixes)
Add a presentation type for engineering notation in base-2 prefix form (base 1024 prefixes)
Investigate if there is an elegant way to allow users to register additional presentation types and map them to functions at runtime to reduce the need to subclass built-in types. Ex: register_presentation_type(float, 'c', convert_to_cm)
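Nothing like `register_presentation_type` exists today, so the closest approximation is the subclass approach mentioned above: a `float` subclass whose `__format__` looks up a registry before falling back to the built-in behaviour. A minimal sketch, assuming a hypothetical `register_presentation_type(type_char, func)` that takes no type argument (a subclass can only change its own formatting) and a hypothetical `convert_to_cm` that treats the value as metres:

```python
from typing import Callable

# Hypothetical registry: presentation-type character -> func(value, spec_without_type) -> str
_PRESENTATION_TYPES: dict[str, Callable[[float, str], str]] = {}


def register_presentation_type(type_char: str, func: Callable[[float, str], str]) -> None:
    """Register `func` to handle format specs ending in `type_char` (sketch only)."""
    _PRESENTATION_TYPES[type_char] = func


class FFloat(float):
    """float subclass whose __format__ consults the registry before falling back."""

    def __format__(self, spec: str) -> str:
        if spec and spec[-1] in _PRESENTATION_TYPES:
            return _PRESENTATION_TYPES[spec[-1]](float(self), spec[:-1])
        return super().__format__(spec)


def convert_to_cm(value: float, spec: str) -> str:
    # Treat the value as metres and show it in centimetres with the remaining spec.
    return f"{value * 100:{spec}f} cm"


register_presentation_type("c", convert_to_cm)
print("{length:.1c}".format(length=FFloat(1.5)))  # 150.0 cm
print(f"{FFloat(1.5):.2f}")                        # 1.50 (built-in behaviour)
```

One limitation of this design: arithmetic on a `float` subclass returns a plain `float` (`FFloat(1.5) + 1` is a `float`), so values have to be re-wrapped before formatting, which is part of the friction described in this thread.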

pf_moore · May 21, 2023, 2:49pm

Personally, it’s because I’d ignore them in favour of just using the more “natural” K, M and G. But that’s because my use case is “human readable” output, not “engineering notation”, so you’d probably consider my use case out of scope.

jagerber · May 21, 2023, 5:34pm

Ok, I see you want letters like k and n to stand as aliases for e+03 and e-09. I guess I’m not as strongly opposed to that as I was before, but I’m still not strongly in support. Usually if I find myself wanting to use kilo or nano suffixes it would be better to include the unit as well, e.g. kg, nm. In this case I would be comfortable passing the float and unit information into a function that can parse it, come up with the right order of magnitude, and print it along with the unit.

But I can imagine use cases for converting 12e-09 into 12 n. I’ve seen something like this in a plotting package where tick labels used this sort of notation. It was a nicer way to show order of magnitude than x10e-09 somewhere near the plot axis, and it was unit-independent. But again, a plotting module already has a LOT of formatting helper machinery; I don’t think it’s too burdensome for such a package to do the conversion from 12e-09 to 12 n.

I like your new breakdown of the proposal

On the significant-digits flag: Yes, I would love this.

On engineering notation in base-10 exponential form: Yes, I would love this. Though a warning: I would really prefer to also have a “shifted” engineering notation. So I’d like an ADDITIONAL presentation type or “alternate” flag I could use to access shifted engineering notation. This requires another feature and a more complicated format specification.

On engineering notation in base-2 exponential form: I would not use this, but I can imagine there are a lot of users who would be interested in something like this. I can definitely see the benefit. But at this point I do really start to worry the formatting notation is getting very jam-packed. It’s already pretty jam-packed with lots of confusing options. In your post you write 3.5 x 2^20 as 3.5B20. Is such notation standard or are you making that up?

On the base-10 prefix form: Like I said above, I’m a soft no on this. I can see use cases, but feel this level of convenience is probably going a little too far… Especially re: the difficulty wrt c, d, da, h prefixes. See the last point below. I think there’s also difficulty for numbers close to one. We have 1e+06 -> 1 M, 1e+03 -> 1 k, 1e-03 -> 1 m and 1e-06 -> 1u but 1 -> 1 with no alphabetical prefix. A minor inconsistency I guess.

On the base-2 prefix form: Same arguments as the point above. I’d have this feature come or go with the one above.

On registering presentation types at runtime: Yeah, this is messy and seems to be a radically different feature for float string formatting than what exists now, i.e. the ability for users to modify at runtime how built-in formatting is going to work. What if I want 1e-02 to format as 1 c in one part of a session or module but I want it to register as 10 m in another?

MRAB · May 21, 2023, 7:03pm

Shouldn’t 1e-06 -> 1u be 1e-06 -> 1µ? Note that that’s 'µ' ('\N{MICRO SIGN}'), not 'μ' ('\N{GREEK SMALL LETTER MU}').

If it was possible to register additional formatters, perhaps they should be called in an explicit extension to the current format:

>>> register_format('si', format_si)
>>> value = 123456
>>> f'{value:.2:si}'
123.46k
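Something close to this already works for `str.format`-style templates (the progress-bar use case above) through `string.Formatter`, because the field's format spec is everything after the first colon, so `'.2:si'` reaches `format_field` intact. It would not work in f-strings, where the whole spec goes straight to the built-in formatting, which rejects `.2:si`. A sketch, where `register_format`, `format_si` and `ExtFormatter` are hypothetical names and `format_si` only handles values of 1 and above:

```python
import string

# Hypothetical registry of named formatters, selected by a second ':name' suffix.
_FORMATTERS = {}


def register_format(name, func):
    _FORMATTERS[name] = func


def format_si(value, spec):
    """Divide by 1000 until |value| < 1000, then append the SI prefix (large values only)."""
    prefixes = ["", "k", "M", "G", "T"]
    index = 0
    while abs(value) >= 1000 and index < len(prefixes) - 1:
        value /= 1000
        index += 1
    return format(value, spec + "f") + prefixes[index]


class ExtFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        # '.2:si' -> standard spec '.2' plus extension name 'si'
        base, sep, name = format_spec.partition(":")
        if sep and name in _FORMATTERS:
            return _FORMATTERS[name](value, base)
        return super().format_field(value, format_spec)


register_format("si", format_si)
print(ExtFormatter().format("{value:.2:si}", value=123456))  # 123.46k
```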

ajoino · May 21, 2023, 8:07pm

I tend to get confused by the formatter: does .2 mean two decimals or two significant digits in f"{123.456:.2f}"? In any case, for your example I think an SI formatter should print 120k, since you typically work with significant digits if you use SI. Though I guess that shines a light on why you wouldn’t want to use prefixes: 120k is ambiguously both 2 and 3 significant digits.

MRAB · May 21, 2023, 8:19pm

The . represents the decimal point and the 2 represents the number of digits after the decimal point.
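That reading holds for the `f` type; with `g` the same `.2` counts significant digits instead, which is the source of the confusion. A quick comparison using only the built-in format mini-language:

```python
x = 123.456

print(f"{x:.2f}")   # 123.46   'f': 2 digits after the decimal point
print(f"{x:.2g}")   # 1.2e+02  'g': 2 significant digits (switches to exponent form here)
print(f"{x:.5g}")   # 123.46   'g': 5 significant digits
print(f"{x:#.3g}")  # 123.     '#': keep the trailing decimal point and zeros
```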

jagerber · May 21, 2023, 11:12pm

Many points here:

In this example it’s clear that .2 means a “precision” of 2 which means two digits past the decimal place.

This sig digits behavior would be desirable to me as a default, but I think the best course of action is to provide options for formatting either based on precision (digits past the decimal point) or based on sig figs (digits counted from the leading nonzero digit). (1) I think there are legitimate use cases for specifying digits past the decimal place (currency perhaps?) and (2) the .2 notation is so ubiquitous to indicate 2 digits past the decimal place that it would be challenging to drop that style of formatting entirely.

But there are a few things mixed up in this sentence. (1) is the use of SI prefixes (2) is the use of engineering notation (i.e. expressing a number in base/exponent format with the exponent forced to be a multiple of 3) (3) formatting a number with a specified number of sig figs and (4) using sig figs to indicate uncertainty on a quantity. All 4 of these things are orthogonal. 1-3 are all about formatting of floats into strings and can all be selected independently (e.g. you can use engineering notation formatted based on either precision or sig figs with or without SI prefixes*). 4 is about the science of uncertainty and is totally unrelated to formatting numbers. I personally think sig figs are a terrible way to express uncertainty, in part because of the ambiguity you point out. Nonetheless, whether you use sig figs to indicate uncertainty or not, scientists ALWAYS have reasons to display numbers to a specified level of uncertainty. But I emphasize, the use of sig figs to indicate uncertainty should NOT be conflated with formatting/rounding floats based on sig figs.

*I guess SI prefixes don’t make sense unless you’re also using engineering notation.

ajoino · May 22, 2023, 5:45am

Fair enough, though I disagree that using significant digits is terrible just because of one particular way of notating things. Had it said 1.2e5 there would be no ambiguity, but that is already supported, so there’s nothing to do there.

jagerber · May 22, 2023, 6:41am

I want to list all the features that I would find useful for scientific formatting. Afterwards I’ll give proposals for how these features/options could be integrated with existing float formatting.

Some definitions

“precision” is defined to be an integer that represents the number of digits after the decimal point, so that 1.2 has a precision of 1 while 1.2345 has a precision of 4. A precision of zero indicates no digits after the decimal place.
“sig figs” is defined to be an integer that represents the total number of digits displayed excluding leading zeros so 1.2 has 2 sig figs, 1.2345 has 5 sig figs, 001.00 has 3 sig figs. sig figs must be greater than or equal to 1.
All non-zero real numbers can be expressed as r = m x b^e where m is a real number mantissa, b is a natural number base greater than 1, and e is an integer exponent.

Proposed features:

1. This new formatting specification should be accessible on the built-in float objects so that users can access it with standard f-string formatting for built-in floats. (Without this requirement, users can use a PyPI function or class to achieve the rest of the requirements.)
2. nan, inf, and -inf get formatted to 'nan', 'inf' and '-inf' respectively.
3. If the precision of a printed mantissa is 0 then, by default, no decimal point is shown.
- 3a. I have no use for floats formatted like '123.' instead of '123'. If there is demand for this then I wouldn’t be opposed to a flag that can be used to include a decimal point on all mantissas with 0 precision.
4. Specification of the precision of the mantissa for the formatted float. Here the user provides an integer indicating the precision with which the number is displayed. The float will first be rounded to the corresponding precision and then all digits with smaller precision than specified will be truncated.
- 4a. I think negative precisions should be supported. In this case the float will be rounded to the corresponding precision and then displayed with precision down to (and including) the ones place, e.g. 123.45 with precision = -1 would be displayed as 120.
5. Specification of the sig figs of the mantissa for the formatted float. Here the user provides an integer indicating the number of sig figs with which to display the formatted float, e.g. 123 would be displayed as 120 with 2 sig figs and 123.00 with 5 sig figs.
6. If neither precision nor sig figs are specified, then the number will be displayed with the minimum precision such that the resulting string round-trips back to the same float. I believe this is the current behavior for float formatting.
7. Standard formatting (default format type), in which the float is simply displayed as a number with digits before and maybe after a possible decimal point.
8. Scientific formatting, in which the float is displayed in base 10 with an exponent chosen so that the mantissa satisfies 1 <= m < 10. However, if the float is 0 then the mantissa will also be equal to 0.
9. Engineering notation, in which the float is displayed in base 10 with an exponent chosen so that it is an integer multiple of 3 and the mantissa satisfies 1 <= m < 1000. However, if the float is 0 then the mantissa will also be equal to 0.
10. Shifted* engineering notation, in which the float is displayed in base 10 with an exponent chosen so that it is an integer multiple of 3 and the mantissa satisfies 0.1 <= m < 100. However, if the float is 0 then the mantissa will also be equal to 0.
11. Scientific, Engineering, and Shifted engineering notation all display floats in the following way. The mantissa is displayed according to the rules above, immediately followed by either the e or E character (configurable by user option), followed immediately by the signed exponent displayed with a minimum of 2 digits, e.g. 123.45 formatted with 3 sig figs would be 1.23e+02.
- 11a. I don’t think the minimum number of digits displayed in the exponents should be user configurable.
12. Binary formatting, in which the float is displayed in base 2 with an exponent chosen so that the mantissa satisfies 1 <= m < 2. However, if the float is 0 then the mantissa will also be equal to 0.
13. IEC Binary formatting, in which the float is displayed in base 2 with an exponent chosen so that it is an integer multiple of 10 and so that the mantissa satisfies 1 <= m < 1024 = 2^10. However, if the float is 0 then the mantissa will also be equal to 0.
14. Binary and IEC Binary notation display floats in the following way. The mantissa is displayed according to the rules above, immediately followed by either the b or B character (configurable by user option), followed immediately by the signed exponent displayed with a minimum of 2 digits, e.g. 15000 formatted with a precision of 2 would be 1.83b+13 in Binary or 14.65b+10 in Binary IEC.
- 14a. I don’t think the minimum number of digits displayed in the exponents should be user configurable.
15. Three available sign modes: -, +, and ' '. In - mode (default) a sign character is shown for negative floats but not for positive floats. In + mode a sign character is shown for all finite floats. In ' ' mode a negative sign is shown for negative floats and an extra space character is prepended for positive floats (this feature can help ensure positive and negative floats have the same string width).
16. There shall be no appending of trailing zeros. The number of trailing zeros shall only be configured by the precision/sig fig specification.
17. Leading zeros may be prepended by specifying the minimum digit slot to which zeros must be padded. In other words, -10.45 padded to the thousands (10^3) spot would be -0010.45. Leading zeros always appear after the sign character. No extra padding by default.
- 17a. I recommend that prepending leading zeros be specified in terms of the digit slot to which zeros should be prepended, as opposed to controlling the width of the overall string. See 17b below.
- 17b. Current Python float formatting supports left, right and center justification of a float within a field of an arbitrary fill character. I do not think this feature should be supported for strict float formatting. I consider left and right padding by arbitrary characters until the string is a certain size to be a string formatting feature, not a float formatting feature. If users need to do this, they can first format the float and then re-format that string so that it has the width they desire. However, this strategy doesn’t work to insert leading zeros between the sign and the top digit, so that specific feature makes sense for float formatting.
18. The ability to use commas as delimiters between groupings of 3 digits above the decimal point. Also the ability to use spaces or underscores as delimiters between groupings of 3 digits above AND below the decimal point. This will allow formatting like 8.854 187 812 8. Default is no grouping characters. (Sorry for sneaking this feature request in here, but it came up when I was reminding myself about details of the current formatting mini-language. This feature is technically orthogonal to the feature requests about engineering/binary notation etc.)
- 18a. I could imagine someone wanting to mix grouping symbols and have commas for digits above the decimal point and spaces or underscores below. Not sure if that should be supported…
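A minimal sketch of items 2, 5 and 8 to 11 (sig-fig rounding with scientific, engineering and shifted engineering exponents), built on the existing `e` type for the rounding step. `exp_format`, `sci_parts` and the `mode` names are illustrative assumptions; precision mode, padding, grouping and the binary formats are left out.

```python
import math


def sci_parts(x: float, sig_figs: int) -> tuple[str, int]:
    """Round |x| to sig_figs significant digits via the existing 'e' type.

    Returns the rounded digits (no decimal point) and the decimal exponent of
    the first digit, e.g. (123.456, 2) -> ('12', 2).
    """
    mantissa, exp = f"{abs(x):.{sig_figs - 1}e}".split("e")
    return mantissa.replace(".", ""), int(exp)


def exp_format(x: float, sig_figs: int, mode: str = "eng", e_char: str = "e") -> str:
    """mode: 'sci' (1 <= m < 10), 'eng' (1 <= m < 1000) or 'shifted' (0.1 <= m < 100)."""
    if not math.isfinite(x):
        return str(x)                         # 'nan', 'inf', '-inf'
    digits, exp = sci_parts(x, sig_figs)
    if mode == "sci":
        e = exp
    elif mode == "eng":
        e = 3 * (exp // 3)                    # multiple of 3, at most exp
    elif mode == "shifted":
        e = 3 * ((exp + 1) // 3)              # the multiple of 3 among exp - 1, exp, exp + 1
    else:
        raise ValueError(f"unknown mode {mode!r}")
    int_len = exp - e + 1                     # digits before the decimal point
    if int_len <= 0:
        mantissa = "0." + "0" * (-int_len) + digits
    elif int_len >= len(digits):
        mantissa = digits + "0" * (int_len - len(digits))  # no bare trailing '.'
    else:
        mantissa = digits[:int_len] + "." + digits[int_len:]
    sign = "-" if x < 0 else ""
    return f"{sign}{mantissa}{e_char}{e:+03d}"
```

With 2 sig figs, 123.456 comes out as `'1.2e+02'`, `'120e+00'` and `'0.12e+03'` in the three modes.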

I would like to see all of the above features go through. For completeness I’ll include requirements for the prefix proposals:

19. Option to convert exponents into SI and IEC prefixes. For Scientific, Engineering, and Shifted engineering formats, if the exponent is a multiple of 3 then it can be replaced with the corresponding SI prefix, e.g. 3.1415e+06 is converted to 3.1415 M. Likewise, for Binary and Binary IEC formats, if the exponent is a multiple of 10 then it can be replaced with the corresponding IEC prefix, e.g. 123b+20 can be replaced by 123 Mi. This option is off by default.
- 19a. I do not think there should be an option for no space between the mantissa and the prefix. For the SI notation that would violate NIST standards for formatting numbers with units. I’m not sure about the IEC standards.
- 19b. I do not like the idea of supporting 123 Mi = 123 M. If that is supported then it’s ambiguous whether 123 M = 123e+06 or 123b+20. Users who want to do that will have to strip off the i themselves. See proposal 20 below.
- 19c. Using prefix mode coerces the format to one of Engineering, Shifted Engineering, or Binary IEC modes. Mixing requests for prefix mode with standard or binary modes should result in an invalid format error.
- 19d. If the exponent falls outside the range for which prefixes are defined then no prefix conversion occurs.
- 19e. Some way (how?) to optionally convert some possibly user-selected subset of e-02, e-01, e+01, e+02 to c, d, da, h respectively.
20. Strings expressing floats using prefix notation should be convertible back to floats. That is, float('12.3 Ki') should yield the float 12.3b+10.

*This is my invented term, not an officially recognized term. See section 7.9 here on “Choosing SI Prefixes”: NIST Guide to the SI, Chapter 7: Rules and Style Conventions for Expressing Values of Quantities. This is the most official guidance I’ve seen on the use of engineering notation.
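Proposal 20 asks for this inside `float()` itself; as a stopgap, a standalone parser is short. The sketch below (`parse_prefixed` and the partial prefix tables are illustrative assumptions) rebuilds a decimal literal for SI prefixes so the result is correctly rounded, and uses exact power-of-two scaling for IEC prefixes, so '12.3 Ki' means 12.3 x 2^10.

```python
import math
import re

# Illustrative prefix tables (subsets): SI prefixes are powers of ten,
# IEC prefixes are powers of two.
SI_EXPONENTS = {"p": -12, "n": -9, "µ": -6, "m": -3, "k": 3, "M": 6, "G": 9, "T": 12}
IEC_EXPONENTS = {"Ki": 10, "Mi": 20, "Gi": 30, "Ti": 40}

# A plain decimal number (no exponent), optional whitespace, then a prefix.
_PREFIXED = re.compile(r"([-+]?(?:\d+(?:\.\d*)?|\.\d+))\s*(\S+)")


def parse_prefixed(text: str) -> float:
    try:
        return float(text)                    # '1.5', '1e-9', 'nan', 'inf', ...
    except ValueError:
        pass
    match = _PREFIXED.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number, prefix = match.groups()
    if prefix in SI_EXPONENTS:
        # Rebuild a decimal literal so the result is correctly rounded: '1.25n' -> 1.25e-9.
        return float(f"{number}e{SI_EXPONENTS[prefix]}")
    if prefix in IEC_EXPONENTS:
        # Scaling by a power of two is exact: '12.3 Ki' -> 12.3 * 2**10.
        return math.ldexp(float(number), IEC_EXPONENTS[prefix])
    raise ValueError(f"unknown prefix {prefix!r} in {text!r}")
```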

Implementation discussion:

It might be possible to integrate this with existing float formatting. In my opinion, the largest challenge here would be handling the padding behavior (proposals 16 and 17). The existing fill, align, and width behavior aren’t really compatible with those proposals.

Here’s an example regex that captures almost all of the features above. I think it captures everything except 18a and 19e.

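import re

pattern = re.compile(r'''
                         ^
                         (?P<sign_mode>[-+ ])?                        # sign mode: '-', '+' or ' ' (15)
                         (?P<trailing_decimal>\#)?                    # keep a trailing decimal point (3a)
                         (?P<top_pad_digit>\d+)?                      # zero-pad up to this digit place (17)
                         (?P<grouping_option>[,_v])?                  # grouping: ',', '_' or v for spaces (18)
                         (?:(?P<prec_type>\.\.|\.)(?P<prec>-?\d+))?   # '..' sig figs or '.' precision (4, 5)
                         (?P<format_type>[neEhHkK]|sh|SH)?            # presentation type
                         (?P<prefix_mode>p)?                          # replace exponent with a prefix (19)
                         $
                      ''', re.VERBOSE)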

I’ve used v to denote space-character grouping symbols. I’ve also used the .. separator to flag sig fig rather than precision mode. Inspired by the prefixed package, I’ve used h and H for engineering notation and k and K for binary and binary IEC notation. I’ve included a p flag at the end to indicate exponents should be replaced with prefix characters.

edit: edited the regex to support negative precisions. Note I haven’t fully tested this regex. Just posting it here for now as a concept.

I think this regex is pretty much compatible with the existing float formatting specification. The existing fill and align fields would be ignored. The “z” field could be included in the regex I’ve shown here. The 0 field would be ignored. I’d recommend the width field be re-interpreted as the top digit place the user would like to fill with zeros (rather than the desired final width of the string). The grouping option is the same with the addition of the whitespace grouping character. The precision is now specified with either . for precision or .. for sig figs and the type includes nehk or EHK, where the capitalization here will set the capitalization of the exponent symbol. I’ve also co-opted the n symbol for “standard” format.
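To make the difference between the two padding models concrete: the existing 0 flag pads to a total string width, so the number of leading zeros depends on the sign and on how many decimals are shown, while padding to a digit place does not. `zero_pad_to_place` is a hypothetical post-processing helper for fixed-point strings only, not a proposed API.

```python
def zero_pad_to_place(formatted: str, place: int) -> str:
    """Zero-pad the integer part of a fixed-point string up to the 10**place digit.

    Works on already formatted strings such as '-10.45'; no exponents or grouping.
    """
    sign = formatted[0] if formatted[:1] in ("-", "+", " ") else ""
    int_part, dot, frac = formatted[len(sign):].partition(".")
    return f"{sign}{int_part.zfill(place + 1)}{dot}{frac}"


# Width-based padding (existing '0' flag): total width 8, so the zeros vary.
print(f"{-10.45:08.2f}")                        # -0010.45
print(f"{-10.4567:08.3f}")                      # -010.457
# Digit-place padding (proposal 17): always up to the thousands place.
print(zero_pad_to_place(f"{-10.45:.2f}", 3))    # -0010.45
print(zero_pad_to_place(f"{-10.4567:.3f}", 3))  # -0010.457
```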

I would say that existing formatting types such as f and g should be parsed exactly as they always have been.

One idea is that rather than forcing this new formatting specification into the existing language, we could open up a new namespace, perhaps prepending with a colon (again inspired by prefixed) or something to indicate we want “scientific” float formatting. Then we would be free to design the format specification however we wish, unconstrained by the existing format. But actually, given the regex above, I don’t think such a measure is necessary.

I don’t know if this would all need to be written in C inside CPython or if it could be written in Python. I’ve tried to write a formatter satisfying these requirements in Python before. It’s not too bad. The hardest part is making it compatible with the existing formatting mini-language (i.e. supporting f and g) if you want the formatter to be an extension, rather than a replacement, for the existing string formatting.

Next steps

What are the next steps here? It seems there is decent appetite for “engineering formatted” floats. Is there enough appetite to actually take further steps beyond forum posts for this? I’ve never written a PEP before, but I think some of the material in this post and throughout the thread could be the start of a skeleton for a PEP.

jagerber · May 22, 2023, 6:43am

It’s best to specify uncertainty explicitly like 12.34 +/- 0.33. Relying on sig figs has a lot of ambiguity. The formatting of val/uncertainties into strings is actually the use case that got me deeply interested in how python can convert floats to strings. The proposals here would really help make that use case easier I think.
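A minimal sketch of that value/uncertainty use case, assuming the common convention of rounding the uncertainty to two sig figs and the value to the same decimal place. `format_val_unc` is a hypothetical helper; the exponent comes from the built-in `e` type so that rounding up (0.0996 to 0.10) is accounted for.

```python
def format_val_unc(value: float, unc: float, unc_sig_figs: int = 2) -> str:
    """Round `unc` to unc_sig_figs significant digits and `value` to the same decimal place."""
    # 'e' rounds the uncertainty first, so 0.0996 -> '1.0e-01' gives the right exponent.
    unc_exp = int(f"{unc:.{unc_sig_figs - 1}e}".split("e")[1])
    ndigits = unc_sig_figs - 1 - unc_exp       # decimal places to keep
    if ndigits >= 0:
        return f"{value:.{ndigits}f} +/- {unc:.{ndigits}f}"
    # Large uncertainties: round both to tens, hundreds, ... instead.
    return f"{round(value, ndigits):.0f} +/- {round(unc, ndigits):.0f}"
```

For example, `format_val_unc(12.3412, 0.3345)` gives `'12.34 +/- 0.33'`.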

ajoino · May 22, 2023, 7:16am

I agree that if you have a model where you can calculate error bounds they should definitely be included. But then one might ask what the error bounds of the error bounds are. You say they are 0.33 precisely, but surely those have error bounds as well? At some point you’ve got to be practical and round to significant digits, IMO.

Rosuav · May 22, 2023, 7:59am

Error bounds don’t themselves have error, they are an outer limit to the potential truth. But Python float formatting isn’t trying to represent values with their error bars, it’s just trying to format numbers. As such, “significant digits” isn’t actually all that useful either - the most useful metric is “digits after the decimal place”, which allows you to line up tabular data conveniently. That’s also how other languages define precision, so Python does well to maintain that.

I’m not sure how useful it is to have both precision and sig figs, but I’m leaning towards that being unnecessary complexity.

ajoino · May 22, 2023, 8:38am

I know I’m going severely off-topic here, so this’ll be the last thing I say about this unless we split this error bound/significant digits discussion into its own topic.

Error bounds are, like all other parts of a statistical model, calculated given some data. As the data is only a sample of the true distribution, all parameters of that model, including typical error bounds like the standard deviation of a normal distribution, will not be precise and can (should, says the Bayesian in me) be modeled such that you get an error bound on them as well. But I know not all would agree, otherwise there wouldn’t be a split between Bayesian and frequentist statisticians :). I’d be happy to continue the discussion, but then I think we’d need to branch into a new topic.

Rosuav · May 22, 2023, 9:05am

Ah, you’re thinking in terms of stats. There are other situations where they have different meaning.

In any case, I think we can agree that a single float value simply doesn’t HAVE error attached to it - there’s no way for a Python float object to represent both a value and its accuracy - so for the purposes of this discussion, it’s more useful to look at “digits after the decimal” and other formatting-related concepts.

(A proper stats-oriented or measurement-oriented “value span” would be a completely different beast, and probably would need its own formatting codes.)

jagerber · May 22, 2023, 12:59pm

Yes, we round the error bars to one or two sig figs. But this is just a conventional truncation of the information. We don’t round to one or two sig figs because our uncertainty on the uncertainty is at the one or two sig fig level, it is because we don’t care about information below that level (since we’re usually more deeply interested in the value than the uncertainty). But, you are right that in the process of formatting values with uncertainties, rounding and presenting to a specified number of sig figs is always a required step. Hence a very important use case for the ability to present numbers based on sig figs, not digits past the decimal.

In my post above I emphatically tried to point out that formatting a number to a specified number of sig figs should NOT be conflated with implying the uncertainty on that number. It’s true that specifying the uncertainty on the number is ONE common use case for formatting a number based on sig figs but it’s not the only one. Maybe I have a table of data with 11.2874 and 8.56135 and I want them to appear in a table together where each row has 3 digits. I want to display 11.3 and 8.56. That is, I prefer this to 11.3 and 08.6 because I prefer the extra precision on the 8.56 reading to a meaningless padded zero.
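Both layouts can be produced with today's mini-language, which also shows where `g` stops being enough. The readings are the ones from the paragraph above; only built-in format types are used:

```python
readings = [11.2874, 8.56135]

# Three significant digits per value ('#' would keep trailing zeros, e.g. 9.50)
print([f"{r:#.3g}" for r in readings])   # ['11.3', '8.56']

# One digit after the decimal point, zero-padded to a common width of 4
print([f"{r:04.1f}" for r in readings])  # ['11.3', '08.6']

# Caveat: 'g' falls back to exponent form once the exponent reaches the precision
print(f"{1234.5:#.3g}")                  # 1.23e+03
```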

Yes, my understanding is that the current string formatting conventions are based on conventions from other languages. Here’s what I’ll say: python ALREADY has the ability to format data to a specified number of sig figs. This can be done using the #.{n}g formatting flags where n is the number of significant digits to be displayed. I think this speaks to the importance of being able to format numbers based on sig figs rather than precision.
The downside of this strategy is that you cannot explicitly specify whether you would like to use standard or scientific notation. In an overly opinionated (in my opinion) move, the algorithm automatically makes a decision about this based on the magnitude and precision of the number being formatted. And then, of course, there are no options for formatting to engineering notation.

And I guess I’ll just emphasize again: it’s extremely frustrating when the existing formatting methods fall short because it means I either have to cast all floats to a custom type before formatting or call a custom function to do formatting, in the first case incurring extra lines of code for formatting, or in the latter, losing pretty much all of the elegance of f-string formatting. So if I’m insistent on how I want something to be formatted, “close-but-not-quite” on the features offered by built-in formatting is pretty much the same as built-in formatting being useless.

It is certainly additional complexity, that can’t be denied. Most feature requests into the language are going to increase complexity. For scientific users I think it will bring a lot of value.

pf_moore · May 22, 2023, 2:20pm

It would be very unusual in the contexts I’ve encountered to do this. Far more normal is to align the decimal points, in which case “digits after the decimal point” is very definitely the best form, and is difficult to achieve unless it is available as a primitive operation.

steven.rumbalski · May 22, 2023, 2:21pm

This seems like a lot of complexity added to support a minority of users. These seem like good ideas for an external package on PyPI, but not for inclusion in Python. How do other languages support these use cases?

jagerber · May 22, 2023, 3:07pm

As I’ve said, the g formatting flag almost covers the sig fig use cases. The problem is the auto-selection it does for standard vs. scientific notation. Here is the documentation on that feature:
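The documented rule boils down to this: take the exponent `exp` the value would have in `e` format with precision `p - 1`; if `-4 <= exp < p` use fixed-point with `p - 1 - exp` decimals, otherwise use `e` with `p - 1`. The sketch below is a paraphrase of that rule for the `#` (alternate) form, which keeps trailing zeros, and checks itself against the built-in; `g_alternate` is a hypothetical name.

```python
def g_alternate(x: float, p: int) -> str:
    """Re-derive f"{x:#.{p}g}" for finite x from the documented 'g' rule."""
    p = max(p, 1)                                  # a precision of 0 is treated as 1
    # Exponent the value would have in 'e' format with precision p - 1 (after rounding).
    exp = int(f"{x:.{p - 1}e}".split("e")[1])
    if -4 <= exp < p:
        return f"{x:#.{p - 1 - exp}f}"             # fixed-point branch
    return f"{x:#.{p - 1}e}"                       # scientific branch


for value in (123.456, 0.000123456, 1234567.0, -0.5, 99999.5, 1e-5):
    for precision in (1, 2, 3, 6):
        assert g_alternate(value, precision) == f"{value:#.{precision}g}"
```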

In this GitHub issue @mdickinson points out that this and other potentially confusing behaviors of the existing formatting language (including the highly variable behavior of the g formatting flag) were inherited from C. I don’t know much about other languages, but a very quick Google search suggests this behavior was inherited by some other languages as well. Unfortunately (because this is the crux of my issue), I don’t know specifically if this behavior of auto-selecting standard vs scientific notation is also adopted in other languages.

Are scientific users really such a minority of python users that no complexity can be added for their benefit?

At minimum, what would be really helpful would be an additional flag which forces the g formatting mode to always use scientific notation. If that existed then users could pretty easily spin their own scientific → engineering notation codes without needing to worry about ALL the complexity of formatting floats AND without needing to introduce an additional dependency if they don’t want to do that for some reason.
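For the narrower request of always-scientific output with n significant digits, the existing `e` type may already cover part of it: precision `n - 1` after the point gives n significant digits in the mantissa and never switches to fixed-point. A quick comparison with `#.{n}g`:

```python
x = 123456.789
n = 3  # significant digits wanted

print(f"{x:#.{n}g}")        # 1.23e+05  'g' picks exponent form here because exp >= n
print(f"{x:.{n - 1}e}")     # 1.23e+05  'e' always uses exponent form
print(f"{12.5:#.{n}g}")     # 12.5      'g' switches to fixed-point for this value
print(f"{12.5:.{n - 1}e}")  # 1.25e+01  'e' still gives n significant digits
```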

Here’s a question about a hypothetical PyPI implementation. If someone were interested in making an authoritative scientific float formatting PyPI package, would it be preferable for the formatting language in that package to extend the Python formatting language, or replace it?

The pros I see for replacing the language are (1) the new language will be unrestricted by previous conventions; (2) many of the features in the Python float formatting language are actually more geared towards formatting strings than floats, so these are seen as cruft from the perspective of scientific float formatting; (3) using the language from above in this thread, the e, h, and k formatting flags combined with explicit specification of precision vs sig fig formatting mostly make the f and g flags obsolete, so they could be dropped.

An important point about this approach is that it relies on the user at some point looking up the new formatting language and noting that it differs from the python float formatting language. I think this is a somewhat reasonable expectation because, no matter what, the user is already importing something special and adding functions to convert floats to strings or casting floats to a special class. I think it can be expected that the user looks up the documentation for those functions/classes.

The pros I see for extending the language are (1) if users want access to the familiar Python flags and conventions on the scientific formatting float type then they are available; (2) in particular, the g flag does provide some auto-selection of standard vs scientific notation that some users (not me) might be interested in; (3) extending the Python float formatting language probably makes it more likely that the language will be adopted at some point in the future by Python.

What are others’ opinions about whether a scientific float formatting PyPI package should extend or replace the existing float formatting language?