What should be the default value for `int.to_bytes(..., byteorder=?, ...)`

barry · September 13, 2021, 4:52pm

Over in bpo-45155 and this PR there is a discussion about adding default arguments to int.to_bytes() and int.from_bytes() in order to make the former easier to use to create single byte bytes objects:

>>> (65).to_bytes()
b'A'

The bikeshedding centers on what the default value should be for the byteorder argument. Any choice is arbitrary. Do you follow the example of the struct module and choose “native” byte order (i.e. sys.byteorder)? Do you choose “network” (i.e. big-endian) by default? Or do you choose little-endian because it’s an itsy bit faster?

The choice doesn’t matter for the use case that originated this topic; IOW byte order doesn’t matter when the byte length is 1.
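
A quick check of that claim, using only the existing form of int.to_bytes() with both arguments given explicitly (the variable names are just for illustration):

```python
# With length 1 there is only a single byte, so both orders agree.
value = 65
big = value.to_bytes(1, "big")
little = value.to_bytes(1, "little")
assert big == little == b"A"

# With length 2 the two orders produce different bytes objects.
assert value.to_bytes(2, "big") == b"\x00A"
assert value.to_bytes(2, "little") == b"A\x00"
```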

Here’s your chance to weigh in.

What should be the int.to_bytes() default value for byteorder?

sys.byteorder
big-endian
little-endian

storchaka · September 13, 2021, 4:56pm

Option “no default”.

barry · September 13, 2021, 5:00pm

That’s deliberately not a choice. Allowing to_bytes() to be called without arguments is a useful use case.

encukou · September 13, 2021, 5:10pm

A default for length=1 (where it doesn’t matter which one)
No default otherwise.

pxeger · September 13, 2021, 5:13pm

If there has to be a default, I vote big, because new users are likely to expect it to follow the most “logical” order for writing numbers in binary by hand (even if little-endian is typical on modern systems and has good reasons behind it, a new user who’s just learnt what binary is won’t care).

steve.dower · September 13, 2021, 5:15pm

sys.byteorder because if you know you want little or big, it’s easy to specify it explicitly (just like it’s easy to use PosixPath or WindowsPath when you know which you want, but Path is native by default).

You shouldn’t have to look up the “normal” value just to explicitly specify it.

mdickinson · September 13, 2021, 5:37pm

Anything but sys.byteorder, because the only thing worse than an unnecessary environmental dependence in what would otherwise be a simple pure function is a hidden unnecessary environmental dependence.

I also suspect that the sys.byteorder default, in addition to opening up opportunities for bugs (tested, reviewed code that it turns out only worked in testing because the machine it was tested on happened to be little-endian, and then fails when it first meets a big-endian machine) would almost always be useless in practice. If I’ve received 3 bytes from somewhere and want to decode them using int.from_bytes, the byte order of the machine I happen to be running on right now is likely completely irrelevant when it comes to determining what order was used to encode those bytes.
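
To make the decoding point concrete, here is a small sketch with three made-up bytes. The same input decodes to two different integers depending on the order passed, and nothing about the local machine tells you which one the sender meant:

```python
# Three example bytes "received from somewhere" (made-up data).
data = b"\x01\x02\x03"

as_big = int.from_bytes(data, "big")        # most significant byte first
as_little = int.from_bytes(data, "little")  # least significant byte first

assert as_big == 0x010203 == 66051
assert as_little == 0x030201 == 197121
```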

malemburg · September 13, 2021, 5:57pm

If the purpose is to work on the bytes with the struct module, it’s best to use the same default. Also, if the data is not meant to leave the current machine or to interface to other code running on the same machine, sys.byteorder makes most sense.

If you intend to work with data from other machines or send data elsewhere, it’s always best to be explicit, but I guess you can’t have both.
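
For reference, a rough sketch of how the struct module's native order lines up with an explicit sys.byteorder in int.to_bytes(). It uses the "=" prefix (native byte order, standard sizes) so the result is always two bytes; struct's default "@" prefix is also native order but additionally uses native sizes and alignment:

```python
import struct
import sys

value = 655

# "=H": unsigned 16-bit integer in the machine's native byte order.
native_struct = struct.pack("=H", value)

# The same bytes, spelled with int.to_bytes() and an explicit native order.
native_int = value.to_bytes(2, sys.byteorder)

assert native_struct == native_int
```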

tiran · September 13, 2021, 6:04pm

I’m with Mark here. If you want to introduce a default endianness, then make it a fixed default. Do not make it depend on the current platform endianness. The majority of people write and test their code on little-endian platforms. A platform-dependent default will be a nightmare for people like me who support big-endian platforms like s390x and big-endian PPC.

If you want to default to an endian, then network endian (aka big endian) makes more sense than little endian. Binary network protocols typically use network byte order.
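
As an illustration of the network byte order point, the struct module's "!" prefix means network (big-endian) order, and it produces the same bytes as an explicit "big" in int.to_bytes(). The port number is just an example value:

```python
import struct

port = 8080  # example value, 0x1F90

# "!H": unsigned 16-bit integer in network (big-endian) byte order.
network_bytes = struct.pack("!H", port)

assert network_bytes == port.to_bytes(2, "big") == b"\x1f\x90"
```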

steve.dower · September 13, 2021, 6:17pm

If the majority are on little-endian machines, and the majority are going to need big-endian, then the default of sys.byteorder will be noticed very quickly.

On the other hand, if the majority are on little-endian machines and the default is big-endian, the “obvious” fix is to specify "little" explicitly.

I would vote for no default (or “raise an error for unspecified and more than 1 byte”) over a platform-independent default.

tim.one · September 13, 2021, 6:53pm

Whatever is picked, int.from_bytes() should get the same default for byte ordering. I personally find it a minor PITA to need to keep typing "big" every time I invoke one of those. (Why “big”? Because it’s less typing than “little”.)
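
A short sketch of why the two methods need matching orders, using today's explicit arguments. A round trip only recovers the original value when both calls agree:

```python
value = 655  # 0x028F

# Matching orders round-trip cleanly, whichever order is used.
for order in ("big", "little"):
    assert int.from_bytes(value.to_bytes(2, order), order) == value

# Mismatched orders silently give a different number.
mismatched = int.from_bytes(value.to_bytes(2, "big"), "little")
assert mismatched == 0x8F02 == 36610
```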

barry · September 13, 2021, 7:03pm

So, would you throw an exception for eg (65).to_bytes(2)?

steve.dower · September 13, 2021, 7:57pm

I would. “In the face of ambiguity, refuse the temptation to guess.”

(Unless we define that there is no ambiguity by saying “matches the platform” or “matches some arbitrary value”. My preference for the platform default isn’t based on this line of the zen - only my preference for raising the exception is.)

Jelle · September 13, 2021, 10:22pm

I would suggest avoiding the question by allowing the argument to be omitted only if the int fits in one byte (and therefore, byte order doesn’t matter). So we’d have:

(65).to_bytes() → works
(655).to_bytes(2, "little") → works
(655).to_bytes() → OverflowError
(655).to_bytes(2) → TypeError
(65).to_bytes(2) → TypeError
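
The behaviour in the list above could be emulated in pure Python roughly like this. It is only a sketch: to_bytes_proposed is a hypothetical helper name, not anything in CPython, and the internal "big" is an arbitrary pick that cannot affect a one-byte result:

```python
def to_bytes_proposed(value, length=1, byteorder=None):
    """Hypothetical wrapper: byteorder may be omitted only when length == 1."""
    if byteorder is None:
        if length != 1:
            raise TypeError("byteorder is required when length != 1")
        byteorder = "big"  # arbitrary; irrelevant for a single byte
    # OverflowError comes from int.to_bytes() itself if the value does not fit.
    return value.to_bytes(length, byteorder)


assert to_bytes_proposed(65) == b"A"
assert to_bytes_proposed(655, 2, "little") == b"\x8f\x02"

for args, expected in [((655,), OverflowError),
                       ((655, 2), TypeError),
                       ((65, 2), TypeError)]:
    try:
        to_bytes_proposed(*args)
    except expected:
        pass
    else:
        raise AssertionError(f"expected {expected.__name__} for {args}")
```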

cameron · September 13, 2021, 11:14pm

sys.byteorder for me, like struct (IIRC?). When I want control I will exert control, but the default should match the machine, surely? That way one can more easily write “native” stuff (which I don’t do very much).

Cheers,
Cameron Simpson cs@cskk.id.au

barry · September 14, 2021, 4:01am

This suggestion makes sense to me. I suspect the function will get slower with the extra checks, but semantically I’m not against this proposal.

barry · September 14, 2021, 4:03am

I personally think “matches the platform” is a fine default, and mirrors the struct module. Others disagree. I’m not against raising a ValueError when length > 1 and byteorder isn’t given.

barry · September 14, 2021, 4:20am

Make that a ValueError rather than a TypeError. It actually might not be too bad on performance because I think you only need to check for length != 1 when byteorder == NULL. Then the question is what to do about int.from_bytes(). If the semantics of to_bytes() is that byteorder must be given when length != 1, then there is no sensible default for byteorder in from_bytes() as @tim.one suggests. I’m probably okay with that.

storchaka · September 14, 2021, 5:33am

Some statistics:

native: 1 in the stdlib, 2 in tests
little endian: 16 in the stdlib, 22 in tests
big endian: 22 in the stdlib, 33 in tests

My preferences:

No default (status quo).
Only default for length=1. It complicates the mental model and implementation without great benefit. It is not the clearest way of converting an integer in range 0-255 to a bytes object.
“big”.
“little”. Since it matches the native endianness of common platforms, it is a bug magnet.
Native. The same, but in addition it is the least useful default.

AndersMunch · September 14, 2021, 9:34am

Another vote for no default byteorder.

In the face of ambiguity…