New format specifiers for string formatting of floats with SI and IEC prefixes

avylove · May 22, 2023, 4:49pm

Negative precision is an interesting concept. I think this could be very useful!

I think this would be better done through a “margin” field like it’s done in Prefixed. Then it is more flexible and can be applied more generally. For example, moving the boundary from 1000 to 100 would be a margin of -90, since 100 is 90% below 1000. In Prefixed this looks like f'{Float(123.456):%-90h}'.
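To make the margin idea concrete, here is a rough sketch of how a percentage margin could shift the point at which the next prefix kicks in. This is not Prefixed's implementation; the function name `format_si_margin` and its parameters are made up for illustration, and it only handles the prefixes k through T.

```python
def format_si_margin(x: float, margin_pct: float = 0.0, precision: int = 2) -> str:
    """Hypothetical helper: pick an SI prefix, with the switch-over
    threshold shifted by margin_pct percent of 1000."""
    prefixes = ["", "k", "M", "G", "T"]
    # margin_pct = -90 moves the threshold from 1000 down to 100.
    threshold = 1000 * (100 + margin_pct) / 100
    value = float(x)
    index = 0
    while abs(value) >= threshold and index < len(prefixes) - 1:
        value /= 1000
        index += 1
    return f"{value:.{precision}f}{prefixes[index]}"


print(format_si_margin(960))               # 960.00
print(format_si_margin(960, -5))           # 0.96k
print(format_si_margin(123.456, -90, 3))   # 0.123k
```

With no margin, 960 stays unprefixed; with a margin of -5 the threshold drops to 950, so the same value is shown in kilo-units.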

I disagree that the user should have to reformat the string, but I do agree that fill on the number and fill on the string are different operations and the current spec is lacking in this area. It would be great if there was a precision-like field for the left of the decimal.

I’m not opposed to this. I almost always add the space for readability.

I’m not opposed to this either, but as @pf_moore indicated, many people use a single capital letter. I agree this is ambiguous and I never know if they mean base-10 or base-2. It would probably be good if the language enforced the standards.

Yes, I did leave this off the original post, but Prefixed does do this and I think it’s important.

I’m torn on this one. In one way it’s simple, but in another, it’s easy to accidentally include two periods when one was intended and not realize it.

Agreed. None of the existing behavior should break. Both for backwards compatibility and to ease people moving from other languages that also derive their format specification from C.

This is an interesting idea! I think it would only make sense if we were to come up with a very functional, yet incompatible specification. I’m not even sure what that would look like, but it would be interesting to think about. Perhaps a new spec could address things like customization.

I don’t think we’re talking about a lot of complexity or a minority of users. The number of code examples, questions, and existing PyPI packages alone indicates that the current spec is lacking. Even implementing a few changes would reduce downstream complexity and duplication of code.

I think there are still some things to work out. The question is whether it makes sense to do that in this thread or in a working group. One thing is we should probably break things down into problems and proposed solutions. For example, “The existing format specification does not provide sufficient control of significant digits.” Then would come the proposals, such as a new flag or a change to the precision field like supporting '..', with examples and pros and cons. We’ll need this for a PEP anyway, but I don’t think there is enough consensus yet for a PEP. However, I haven’t really participated in this process before, so I’m not really sure.

ChrisBarker-NOAA · May 23, 2023, 8:01pm

Absolutely

That COMPLETELY depends on the use case. Sometimes you know the order of magnitude of your numbers, and sometimes you don’t.

And for any values that are very far off from O(1), a fixed number of digits after the decimal makes no sense either. Though the e formatter works OK for that.

Anyway, on the OP’s idea – my thoughts as a scientist/engineer:

+1 on the IEC formats – I, as I’m sure many others, have written hacked together versions of this many, many times. (and put it in a utility function that I then forget, and rewrite…). This would be really nice to “just have”
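For reference, the kind of utility function that keeps getting rewritten might look roughly like this. It is only a sketch: the name `format_iec` and the choice of a "B" unit suffix are assumptions for illustration, not part of any proposal in this thread.

```python
def format_iec(n_bytes: float, precision: int = 1) -> str:
    """Hypothetical helper: format a byte count with IEC binary prefixes."""
    prefixes = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]
    value = float(n_bytes)
    index = 0
    # Step up one prefix for every factor of 1024.
    while abs(value) >= 1024 and index < len(prefixes) - 1:
        value /= 1024
        index += 1
    return f"{value:.{precision}f} {prefixes[index]}B"


print(format_iec(512))          # 512.0 B
print(format_iec(1536))         # 1.5 KiB
print(format_iec(1048576, 2))   # 1.00 MiB
```

Note that the prefix is chosen before rounding, so a value just under a boundary (for example 1023.99) can still display as 1024.0 with the smaller prefix. That is one of the edge cases a built-in format type would have to decide on.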

-1 on the SI prefixes – I am almost always using particular units for a reason, and want to stick with that. I can’t imagine when I’d want, e.g., grams to be converted to kg for me, and just for display. And if I did, then would I do:
f"{mass:.2h}g" ??

That’s just me, but I’d never use it.

+1 on engineering notation, e.g. like g but always a multiple of three exponent.

I’d use that.
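As a rough illustration of what engineering notation could produce, here is a sketch that reuses the built-in e format for rounding and then shifts the decimal point so the exponent lands on a multiple of three. The function name `format_eng` and its default of three significant figures are assumptions for this example only.

```python
import math


def format_eng(x: float, sig: int = 3) -> str:
    """Hypothetical helper: sig significant figures, exponent a multiple of 3."""
    if x == 0 or not math.isfinite(x):
        return f"{x:g}"
    # Let the built-in 'e' format do the rounding to sig significant figures.
    mantissa, exp_str = f"{x:.{sig - 1}e}".split("e")
    exp = int(exp_str)
    eng_exp = 3 * (exp // 3)   # floor the exponent to a multiple of three
    shift = exp - eng_exp      # 0, 1 or 2 extra digits left of the point
    sign = "-" if mantissa.startswith("-") else ""
    # Move the decimal point by string manipulation, padding with zeros.
    digits = mantissa.lstrip("-").replace(".", "").ljust(shift + 1, "0")
    int_part, frac_part = digits[: shift + 1], digits[shift + 1:]
    body = int_part + ("." + frac_part if frac_part else "")
    return f"{sign}{body}e{eng_exp:+03d}"


print(format_eng(123456))      # 123e+03
print(format_eng(1500.0, 2))   # 1.5e+03
print(format_eng(0.00012, 2))  # 120e-06
```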

jagerber · May 27, 2023, 12:11am

Upon more thought, I’ve identified exactly what sig fig feature is missing in the built-in format specification. Including the details here in case anyone now or in the future is interested.

The built in formatting can always display numbers to a specified number of sig figs just using exponential mode directly. E.g. f'{123.456:.3e}' gives 1.235e+02. '.pe' will format to p+1 sig figs.
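A small wrapper makes the off-by-one between precision and sig figs explicit. The name `sci_sig` is just for illustration.

```python
def sci_sig(x: float, sig: int) -> str:
    """Format x in exponential notation with sig significant figures."""
    # '.pe' shows one digit before the point plus p after it,
    # so sig significant figures needs p = sig - 1.
    return f"{x:.{sig - 1}e}"


print(sci_sig(123.456, 4))  # 1.235e+02
print(sci_sig(123.456, 2))  # 1.2e+02
```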

The built in formatting will also always display a number to a specified number of sig figs using the #.pg mode. The downsides are (1) the inclusion of the # means that mantissas with no fractional part will have hanging decimal points and (2) The user has no way to coerce fixed point mode.

More on point (2): specifically, according to the -4 <= exp < p rule (a) it is impossible to display floats less than 0.0001 in fixed point mode while specifying sig figs and (b) It is impossible to get fixed point mode while specifying fewer sig figs than there are digits to the left of the decimal point.
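Both downsides are easy to see with a few values. The helper name `g_sig` below is only for this illustration; it just wraps the built-in g format with or without the # flag.

```python
def g_sig(x: float, p: int, alt: bool = True) -> str:
    """Format x with the built-in 'g' type, optionally with the '#' flag."""
    flag = "#" if alt else ""
    return format(x, f"{flag}.{p}g")


# (1) '#' keeps trailing zeros, but also leaves a hanging decimal point.
print(g_sig(1.0, 3, alt=False))      # 1
print(g_sig(1.0, 3))                 # 1.00
print(g_sig(123.456, 3))             # 123.

# (2a) Exponent below -4: switches to scientific notation.
print(g_sig(0.0001234, 3))           # 0.000123
print(g_sig(0.00001234, 3))          # 1.23e-05

# (2b) Fewer sig figs than digits left of the point: scientific again.
print(g_sig(123456.0, 2))            # 1.2e+05
```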

Point number (b) says that it is impossible to pass any formatting string that will convert the float 123456 into the string '120000'. I imagine this was the intention behind requiring exp < p for fixed point mode. The original authors didn’t want the possibility that float format would look like it’s rounding (as opposed to just truncating) the floats passed in. Of course the float formatting DOES do rounding all the time, f'{1.77:.1f}' gives '1.8', but I can see the argument for not rounding “above the decimal point”.

So in other words, built in formatting is comfortable with sig figs as long as the least significant digit is at or below the decimal point.

Including sig fig formatting would certainly make it possible to pass a formatting specification that converts 123456 to '120000'.
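Outside the format spec this can be approximated today with round() and a negative ndigits. The sketch below uses a made-up name, `fixed_sig`, and does not handle every edge case: if rounding carries into a new leading digit (for example 99.96 to 3 sig figs) it shows one digit too many.

```python
import math


def fixed_sig(x: float, sig: int) -> str:
    """Hypothetical helper: sig significant figures, always fixed point."""
    if x == 0:
        return "0"
    exp = math.floor(math.log10(abs(x)))   # position of the leading digit
    decimals = sig - 1 - exp               # negative means round left of the point
    rounded = round(x, decimals)
    return f"{rounded:.{max(decimals, 0)}f}"


print(fixed_sig(123456, 2))      # 120000
print(fixed_sig(123.456, 4))     # 123.5
print(fixed_sig(0.00123456, 3))  # 0.00123
```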

jagerber · May 27, 2023, 10:32pm

Ok, I’ve been wanting scientific formatting for a long time and have been kicking around code to do what I want (and researching existing packages to see what is currently offered) for quite some time. I’ve finally taken some time to bundle the code a little bit more nicely and make it more flexible. Here’s the result:

This code was heavily influenced by the prefixed package we’ve been discussing, as well as the uncertainties package which has functionality to format pairs of value/uncertainty floats into value/uncertainty strings like 84.23 +/- 0.08. I’ll also highlight the sigfig package (which uses the round function for sig fig rounding as well as string formatting) which has a stated goal to be included in stdlib.

See the readme for details about what features are offered. The notable features are:

Explicit control over sig fig vs precision mode
Engineering and shifted engineering notation mode
Binary and Binary IEC modes
Explicit exponent selection
Prefix mode
Flexible thousands, thousandths, and decimal separator configuration
Global configuration overrides (including temporary override with a context manager)
Formatting for value + uncertainty pairs (forthcoming)

To bring it back to the discussion at hand:
There is a section near the top of the readme where I describe the shortcomings of the built-in float formatting for scientific float formatting, and a section near the bottom where I describe the incompatibilities between the design choices I made for sciform and the built-in formatting. The latter is basically a list of features that, while they may be useful for generic string and maybe numeric formatting, I don’t think are critical for scientific float formatting.

The minimal interesting changes to built in string formatting would be adding one or both of (1) floating point sig fig formatting and/or (2) engineering notation. I’m getting the sense from this thread that none of the features discussed in it will make it into stdlib/built-in at any point. Nonetheless, I think there’s still a next step to convince people which would be if someone put together minimal and backward compatible extensions to the built in formatting language which support one or both of the above features, especially handling/making decisions about some of the tricky edge cases. Perhaps with hard implementations in hand people could be convinced to add at least one of these features to built in.

@avylove I think what I described in the last paragraph would be pretty much exactly what prefixed did if prefixed continued to use numerical exponents (like e+02) and if prefixed didn’t handle binary scientific/IEC formatting. I wonder if it would be easy to fork prefixed and make something which is a more minimal extension of the built in formatting.

mdickinson · May 28, 2023, 8:44am

I think this is the crux of it. I think of formatting as the composition of two separate operations: a rounding operation (whose target is some kind of intermediate decimal type), followed by a presentation operation. Those two operations are largely independent, but can’t be controlled independently: f format says “round to p places after the point, then present in non-scientific notation”, while e format says “round to p+1 significant figures, then present in scientific notation”. There’s no built-in way to mix and match (“round to p significant figures, then present in non-scientific notation”), even though it’s perfectly meaningful to do so. (As already mentioned, g mode is more of a do-what-I-mean-without-me-having-to-think-too-much mode, and has hard-coded choices that make it not particularly useful in situations where you want fine control.)
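The two-step view can be sketched with the decimal module as the intermediate type. All four helper names here are invented for illustration; the rounding steps simply borrow the built-in f and e formats, and the presentation steps use Decimal's own formatting.

```python
from decimal import Decimal


def round_places(x: float, p: int) -> Decimal:
    """Rounding step: p places after the point (what 'f' does)."""
    return Decimal(f"{x:.{p}f}")


def round_sig(x: float, p: int) -> Decimal:
    """Rounding step: p significant figures (what 'e' does with p - 1)."""
    return Decimal(f"{x:.{p - 1}e}")


def present_fixed(d: Decimal) -> str:
    """Presentation step: non-scientific notation."""
    return f"{d:f}"


def present_sci(d: Decimal) -> str:
    """Presentation step: scientific notation (Decimal's exponent style)."""
    return f"{d:e}"


# The combination the built-in spec cannot express:
# round to significant figures, then present in fixed point.
print(present_fixed(round_sig(123456, 2)))        # 120000
print(present_fixed(round_sig(0.0000123456, 3)))  # 0.0000123

# Any rounding step can be paired with any presentation step.
print(present_fixed(round_places(123.456, 1)))    # 123.5
print(present_sci(round_places(123.456, 1)))
```

Because the rounded value is carried as a Decimal, the presentation step never re-rounds it; it only decides where the decimal point and exponent go.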

The more general problem here, which seems quite fundamental, is that (a) there are many many possible variations that people might want for formatting, but (b) there’s a limit to what can reasonably be encoded in the formatting mini-language without it becoming an unmaintainable and complex (not to mention unreadable) mess - IMO, the problem space is just too big to be able to be expressed clearly in the mini-language. The best that the mini-language can hope to do is to cover common use-cases.

Luckily, it’s easy to escape the restrictions of the mini-language by doing your own formatting directly, and I think f-strings make that option non-horrible to read:

from fancy_formatting import FancyFormatter

f5sig = FancyFormatter(
    significant_figures=5,
    presentation_mode="fixed",
    trim_trailing_zeros=False,
    sign_on_negative_zero=False,
    # ... more options here
)

fruitiness: float = do_calculations_here()
print(f"Optimal fruitiness is {f5sig(fruitiness)}")

avylove · June 5, 2023, 1:38pm

I think that’s what we’re trying to do here, expand the language slightly to cover common use cases. I agree, we will never satisfy every user need, but we can cover the most common and most obvious use cases without adding too much bloat.

There has been a lot of drift in this thread and some changes to the original proposal, but the crux of it is that the current implementation requires programmers to reimplement the same workarounds over and over again. Change is inevitable in a spec based on a ~50-year-old design. And I don’t think the proposed changes would hinder the mini-language. We’re mostly talking about one new flag and a few presentation types.