Options for a long term fix of the special case for float/int/complex

mikeshardmind · May 26, 2024, 2:57pm

The problem

The current special casing of numerics is error-prone and leaves valid use cases inexpressible.

The varying numeric types all have methods that are unique to them.

“Useless” methods have been added to help hide this: in 3.12, int gained an is_integer() method, which you’d never call if you statically knew you had an int.

There’s no way to express in the type system “This really only takes a float”. This has real performance consequences in some cases when arbitrary precision numerics are passed.
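To make the mismatch concrete, here is a minimal sketch; the function names are illustrative and not taken from any library. Type checkers typically accept the int call because of the special case, yet it fails at runtime because the two types do not share all their methods:

```python
def to_hex(x: float) -> str:
    # Type checkers typically accept to_hex(1): int is allowed wherever
    # float is annotated, even though int has no .hex() method.
    return x.hex()


def int_raises_attribute_error() -> bool:
    """Return True if passing an int fails at runtime despite type-checking."""
    try:
        to_hex(1)
    except AttributeError:
        return True
    return False


print(to_hex(1.5))
print(int_raises_attribute_error())
# Each type has methods the other lacks.
print(hasattr(float, "bit_length"), hasattr(int, "hex"))
```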

The places where people can run into this right now

The numeric types serve different purposes, and they interact with ffi very differently. This might seem innocuous to some, but tools that auto-generate bindings and signatures have no way to communicate that they don’t handle int as a type. In the other direction, some libraries that wrap native code instead branch on the ffi type to “help” users, but this creates worse performance for users who call with arbitrary precision integers in cases where a float would have been fine. A consumer of a library that intends to support both, but that itself only has a use case for floats, therefore has no way to communicate this intent in the type system to its downstream users.
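As a rough illustration of why the two types are not interchangeable at such a boundary, the sketch below uses plain float() as a stand-in for the int-to-C-double conversion. No real ffi call is made, and to_double is a hypothetical name:

```python
def to_double(value: int) -> float:
    """Stand-in for the int -> C double conversion at an ffi boundary."""
    return float(value)


def overflows(value: int) -> bool:
    """Return True if the value cannot be represented as a double at all."""
    try:
        to_double(value)
    except OverflowError:
        return True
    return False


exact = 2**53 + 1
# Above 2**53 not every integer is representable, so the value is rounded.
print(to_double(exact) == exact)
# Arbitrary precision ints can exceed the range of a double entirely.
print(overflows(10**400))
```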

Is this a bug and is it worth fixing?

This issue has been raised repeatedly, usually as a tangent, whenever people want to change or reword the special case while retaining it. The argument is always “but that’s what’s specified, so it’s not a bug”, and the effort goes into rewording what we currently have rather than fixing the root cause of the error.

So, it’s not a bug in any of the implementors. Sure. But does that mean it’s not a bug? The spec itself is in error: these aren’t valid substitutes for each other, and even if the interfaces between them were entirely shared with appropriate dummy methods, there would still be good cause to only accept certain numeric types due to the prevalence of ffi with numeric computation libraries.

Okay, what options do we have for fixing it?

Option 1: Just remove the special case

It would seem that the correct option should be the union of specific types you want to accept (eg. float | int) or an abstract type if you intend to support anything that quacks like a duck, not the type system guessing that most people can probably take both, and then leaving reasonable use cases with no way to express their intent.

This would undoubtedly create a large amount of noise for people currently relying on the special case. This would be my preferred option, as it’s a one-time breakage to fix this, and the existing special casing will certainly have an impact on other features people want (refinement types).

Option 2: A type checker directive

Adding a type checker directive to the spec that disables duck typing of numerics for things defined in that module would be the least disruptive for users, but it might be the most disruptive for static type checkers, and it unfortunately leaves out runtime type checkers.

This gives people an off-switch for only the affected cases that are in the specification and applied to their code, (not a type checker flag that could be off in user code). I don’t particularly like the idea of more type comments that type checkers would need to understand or the impact this would have on runtime consumers.

This option would mean type checkers need to do extra work and maintain separate behavior for this case, and to track the actual type of things. If some variable x is annotated as a float and later passed as an argument to code which disables this behavior, then x must also have that behavior. This adds a flow analysis requirement to type checkers.

Option 3: A type qualifier

eg. Exactly[float], where the type checker may not allow subtypes. This comes with many negative consequences: type checkers would need to track whether they even know something is Exactly[float] to begin with, and either this would need to be incompatible with float as a result, or the presence of a single use of Exactly would need to enable flow analysis that then treats all interacting uses of float as Exactly[float].

This also comes with the implication that Exactly might be valid on other types, even those without type checker special behavior. I don’t think the blanket disallowing of subtypes is a direction truly worth exploring, but I’m including this here anyway.

Option 4: a special typing type for “just this number type”

If there were something like typing.FloatNoDuckTyping (and corresponding types for the other numerics), this would have the same consequences as options 2 and 3: it would need flow analysis and would bleed in based on use, or else it comes with a situation where a float isn’t compatible with it.

Right now, I would say the only viable option is the first one here, removing the special case.

There are many other reasons why type checkers might have a use or need for flow analysis, but options 2-4 either introduce a hard requirement of it or break users as much as just removing the special case would, by requiring users to handle that propagation.

This hard requirement also presents new challenges for runtime type checkers, as presumably if they are only checking a specific annotation, they now need to predict future use to handle this.

Daverball · May 26, 2024, 3:24pm

I think the ship for getting rid of the special case has sadly kind of sailed already a little bit, especially considering how linters like flake8-pyi will complain if you write float | int instead of float within a stub.

So the amount of additional work you’re creating by requiring every existing float annotation to be re-audited, just to get rid of the false negatives for the small number of them that actually don’t work with int, seems like a difficult trade-off to make at this point, since there’s likely a much, much higher portion of float that’s meant to be float | int than pure float out there.

While you could just replace every occurrence of float with float | int in a first step in order to avoid false positives, that’s still putting a lot of responsibility on end-users for something that arguably the average end-user will not perceive as a net-win for them.

So I’d prefer any of the other options, even if having fewer special cases would make the type system easier to understand and reason about.

Nineteendo · May 26, 2024, 3:25pm

Hmm, can we special case builtins.float & builtins.complex? Doesn’t require backporting a new type:

import builtins

def foo(value: builtins.float) -> None:
    print(value)

foo(1)  # NOK

ntessore · May 26, 2024, 3:28pm

Apropos the other thread, perhaps it could suffice to change the typing spec language such that type checkers can accept ints for float, but leave the door open for a “strict” mode where float means float.

Personally, doing almost exclusively scientific computing, finding an integer where a float should be is usually a red flag. As such, I don’t think I have ever wanted to accept ints as floats, but I have often wanted to not accept ints as floats.

Liz · May 26, 2024, 3:44pm

A bit of a nit:

Type checkers already have to do flow analysis for typing.assert_never. It’s a weaker requirement than you’d be suggesting with options 2, 3 or 4, but it exists.

I think option 1 is the only option for fixing it, but there is still a need to overcome the status quo.

I think this is an exceptionally strong case, but even if ffi wasn’t a concern, and even if they implemented the same methods, this is also the only place where the type checker converts your type to something like a protocol for you rather than requiring you to do so yourself. We don’t special case list-invariance to assume people meant Sequence; we catch when list is wrong.

A discord server where this topic has already had some discussion has someone with a rough draft of a code-mod that would rewrite existing uses of float to float | int and complex to complex | float | int. I don’t think it is going to be hard to help users transition to correct annotations here, and this could definitely also be detected by tools like ruff or pylance, which could suggest a change to the annotations in IDEs if they see you were relying on the old special case.
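The draft code-mod mentioned above is not shown in the thread, so the following is only a rough sketch of the idea using the standard library ast module. The class and function names are hypothetical; it handles only bare float and complex names in parameter, return, and variable annotations, and ast.unparse discards comments and formatting, so a real tool would more likely work on a syntax tree that preserves the source layout:

```python
import ast


class WidenNumericAnnotations(ast.NodeTransformer):
    """Rewrite bare `float`/`complex` annotations to explicit unions."""

    REPLACEMENTS = {"float": "float | int", "complex": "complex | float | int"}

    def _widen(self, annotation: ast.expr) -> ast.expr:
        # Only bare names are rewritten; nested uses such as list[float]
        # or existing unions are left untouched in this sketch.
        if isinstance(annotation, ast.Name) and annotation.id in self.REPLACEMENTS:
            return ast.parse(self.REPLACEMENTS[annotation.id], mode="eval").body
        return annotation

    def visit_arg(self, node: ast.arg) -> ast.arg:
        if node.annotation is not None:
            node.annotation = self._widen(node.annotation)
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # visits parameters and nested definitions
        if node.returns is not None:
            node.returns = self._widen(node.returns)
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_AnnAssign(self, node: ast.AnnAssign) -> ast.AnnAssign:
        self.generic_visit(node)
        node.annotation = self._widen(node.annotation)
        return node


def widen_source(source: str) -> str:
    """Return the source with numeric annotations widened."""
    tree = WidenNumericAnnotations().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(widen_source("def f(x: float, y: int) -> complex:\n    return x\n"))
```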

jamestwebber · May 26, 2024, 3:57pm

I’ve definitely been bitten by this on occasion with numeric code. But it’s also pretty convenient when writing exploratory code that e.g. scipy.stats distributions will take ints for parameters.

There’s another option here that doesn’t require changing the special case: functions that want exactly a float should convert integers eagerly rather than trying to branch on the type or hope that someone downstream will do the conversion for them.

Maybe some convenience methods could make this easier and avoid boilerplate. It doesn’t require changing the special case because the type signature remains accurate: yes, you can pass an int, and it will be converted into a float.
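One possible shape for such a convenience helper is a decorator that converts int arguments eagerly. This is a sketch only: coerce_floats is a made-up name, not an existing API, and it assumes annotations are real objects at runtime rather than strings (so no `from __future__ import annotations` in the decorated module):

```python
import functools
import inspect


def coerce_floats(func):
    """Convert int arguments to float for parameters annotated as `float`."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        # Iterate over a copy so the mapping can be updated safely.
        for name, value in list(bound.arguments.items()):
            if sig.parameters[name].annotation is float and isinstance(value, int):
                bound.arguments[name] = float(value)  # may raise OverflowError
        return func(*bound.args, **bound.kwargs)

    return wrapper


@coerce_floats
def identity(x: float) -> float:
    return x


print(type(identity(3)), type(identity(3.0)))
```

The signature stays accurate for callers, but very large ints now fail with OverflowError at the boundary instead of flowing through unchanged.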

jamestwebber · May 26, 2024, 4:06pm

I’ll just add that numba is already doing this: if you annotate a function as taking a double and give it an int, it converts the value to the expected type. So some code generators are able to handle this case.

mikeshardmind · May 26, 2024, 4:13pm

This doesn’t scale.

If something has a return type of float, does it return a float? Every caller then has to either know the inner behavior of the function or wrap every single interaction with things that say they return floats with a… conversion to float.

This really defeats the purpose of static analysis in the first place.
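A small sketch of the propagation problem, with illustrative names: a function that type-checks as returning float can hand back an int, so a caller that truly needs a float has to convert defensively.

```python
def double(x: float) -> float:
    # Accepted by type checkers, yet an int argument produces an int result.
    return x * 2


def double_defensively(x: float) -> float:
    # What every caller must do if it really needs a float back.
    return float(double(x))


print(type(double(3)), type(double(3.0)), type(double_defensively(3)))
```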

Nineteendo · May 26, 2024, 4:13pm

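github.com/nineteendo/pyvz2/blob/7f4b90ea4637591cf00c7689e480555cd6948956/pyvz2/clinteract/utils.py#L79-L85

from sys import float_info  # not part of the linked excerpt; added so the snippet runs on its own

def real_to_float(real: float) -> float:
    """Convert real to float."""
    if isinstance(real, float):
        return real

    # Clamp int to the finite float range so float() cannot overflow
    return float(max(-float_info.max, min(real, float_info.max)))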

The performance impact should be minimal. Yes, it currently looks a bit weird with the annotations.

Yes, my type checker doesn’t suggest to simplify it.

alicederyn · May 26, 2024, 4:27pm

I think there’s another option: wait for type differences and use float - int?

JamesParrott · May 26, 2024, 4:30pm

Excluding ints from floats is an excellent optional feature for type checkers for users who want even stricter type checking (an implementation detail).

I think Option 1 is overkill and will cause as many problems as it solves. But would Algebraic Data Types help with this, e.g. if it could be typed:

x: float - int

or

x: float & ^int

So how about

Option 5: Wait and see.

jamestwebber · May 26, 2024, 4:38pm

As I said, there could be ways to remove the boilerplate here. There should be negligible cost if you get a float as input, and you remain compatible with the broader Python ecosystem who will definitely try to pass ints as input (and they will raise issues on the project if you don’t handle it correctly).

As others have said, I don’t think breaking the special case is terribly likely; the backwards-compatibility implications sound like a nightmare. So I’m trying to think of alternatives.

I can understand why the situation with numeric types is conceptually unsatisfying, but it makes a ton of sense for the user experience, which is one of Python’s biggest strengths. I don’t think compromising that experience is worth the gains here, especially when there are acceptable alternatives (in my opinion).

mikeshardmind · May 26, 2024, 4:41pm

Neither Intersections nor Differences can be added to the type system in a way which would be consistent without rules for consistent subtyping. This special case violates all concepts of consistent subtyping itself. While there is an ongoing effort there, this continues to be a pain point in the type system that’s been brought up over the years, and every time people seem to ignore the obvious answer: Let people express what they intend. It’s not hard to write float | int; it works today and isn’t waiting on some feature that has a long road ahead of it on specification.

This has been raised multiple times over the years, and keeps getting kicked down the road. The further it gets kicked, the more people say “well, it’s the way it’s always been” and ignore that there are users negatively impacted by this. I’d rather see this fixed than continue to be ignored.

The existing special cases that break the general rules in the type system have all caused pain points for more advanced features.

Nineteendo · May 26, 2024, 4:45pm

How about this?

mikeshardmind · May 26, 2024, 4:46pm

For those not aware, this has received multiple bandaid fixes rather than a full fix dating back to at least 2017, there’s a lot of history here, and we’re currently at a point in time where the specification for typing is being cleaned up, clarified, and fixed.

If it’s ever going to be fixed, the best time was when it was originally raised, the second best time is now.

alicederyn · May 26, 2024, 4:49pm

No, I take that back, you haven’t ignored it

mikeshardmind · May 26, 2024, 4:52pm

Yeah, but I’ll repeat it.

I am actively arguing for an option that might require some users to replace existing annotations that say float with float | int and complex with complex | float | int. This is likely an automatable process for users who want their existing semantics.

I understand this will have an impact. I also think fixing it is better for the long-term health of the type system, and for use cases that have been deemed unimportant by comparison over the years.

alicederyn · May 26, 2024, 4:56pm

Not “some users”, every library that uses the annotation. And it can’t be fixed retroactively; that’s a lot of old library versions that are suddenly going to fail type-checking.

You haven’t ignored the problem, but I do think you’re minimising it.

mikeshardmind · May 26, 2024, 5:01pm

Only cases where people were relying on the special case, rather than cases where their intent was “this takes this type” and the type system allowed more.

It can’t be fixed retroactively, but it could be a coordinated change announced in advance to take place on a specific date. This would allow current code to write float | int now, and work now and after the date.

I don’t think I am. By contrast, multiple people here are saying that “it’s not a big deal to do an extra conversion to float in numeric code”, and have missed that this conversion would have to be applied pervasively, because a function that says it returns a float can’t actually be trusted to.

alicederyn · May 26, 2024, 5:04pm

I can’t imagine how you’d actually communicate this to users though. You can’t even do a deprecation notice, because then you’d be preventing people from using float to mean “without int” (without having to suppress that deprecation warning everywhere). What’s the actual, step-by-step plan for getting the whole community to do this migration safely?