What should be the default value for `int.to_bytes(..., byteorder=?, ...)`

Over in bpo-45155 and this PR there is a discussion about adding default arguments to int.to_bytes() and int.from_bytes() in order to make the former easier to use to create single-byte bytes objects:

>>> (65).to_bytes()
b'A'

The bikeshedding centers around what the default value should be for the byteorder argument. Any choice is arbitrary. Do you follow the example of the struct module and choose “native” byte order (i.e. sys.byteorder)? Do you choose “network” (i.e. big-endian) by default? Or do you choose little-endian because it’s an itsy bit faster?

The choice doesn’t matter for the use case that originated this topic; IOW byte order doesn’t matter when the byte length is 1.
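
A quick check of that claim, as a minimal sketch. It passes both arguments explicitly, since the no-argument form shown earlier is the proposal under discussion:

```python
value = 65

# With length 1 there is only one byte, so both orders give the same result.
big1 = value.to_bytes(1, "big")
little1 = value.to_bytes(1, "little")
print(big1, little1)

# With length 2 the byte order becomes visible.
big2 = value.to_bytes(2, "big")
little2 = value.to_bytes(2, "little")
print(big2, little2)
```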

Here’s your chance to weigh in.

What should be the int.to_bytes() default value for byteorder?
  • sys.byteorder
  • big-endian
  • little-endian


Option “no default”.

6 Likes

That’s deliberately not a choice. Allowing to_bytes() to be called without arguments is a useful use case.

  • A default for length=1 (where it doesn’t matter which one)
  • No default otherwise.
7 Likes

If there has to be a default, I vote big, because new users are likely to expect it to follow the most “logical” order for writing numbers in binary by hand (even if little-endian is typical on modern systems and has good reasons behind it, a new user who’s just learnt what binary is won’t care).

1 Like

sys.byteorder because if you know you want little or big, it’s easy to specify it explicitly (just like it’s easy to use PosixPath or WindowsPath when you know which you want, but Path is native by default).

You shouldn’t have to look up the “normal” value just to explicitly specify it.

3 Likes

Anything but sys.byteorder, because the only thing worse than an unnecessary environmental dependence in what would otherwise be a simple pure function is a hidden unnecessary environmental dependence.

I also suspect that the sys.byteorder default, in addition to opening up opportunities for bugs (tested, reviewed code that it turns out only worked in testing because the machine it was tested on happened to be little-endian, and then fails when it first meets a big-endian machine) would almost always be useless in practice. If I’ve received 3 bytes from somewhere and want to decode them using int.from_bytes, the byte order of the machine I happen to be running on right now is likely completely irrelevant when it comes to determining what order was used to encode those bytes.
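
To make the decoding case concrete, here is a small sketch with three made-up bytes. The result of a native-order decode depends on the machine running the code, not on whoever produced the bytes:

```python
import sys

# Three illustrative bytes received "from somewhere".
data = bytes([0x01, 0x02, 0x03])

as_big = int.from_bytes(data, "big")        # 0x010203
as_little = int.from_bytes(data, "little")  # 0x030201

# What a sys.byteorder default would silently pick on this machine.
as_native = int.from_bytes(data, sys.byteorder)

print(as_big, as_little, as_native)
```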

6 Likes

If the purpose is to work on the bytes with the struct module, it’s best to use the same default. Also, if the data is not meant to leave the current machine or to interface to other code running on the same machine, sys.byteorder makes most sense.
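
For comparison, a sketch of how struct's native byte order lines up with sys.byteorder. The "=" prefix is used here to get native byte order with standard sizes; a format with no prefix also uses native byte order, but with native sizes and alignment as well:

```python
import struct
import sys

# "=H": unsigned 16-bit integer, native byte order, standard size.
packed = struct.pack("=H", 655)

# The same bytes via int.to_bytes() with the platform's byte order.
converted = (655).to_bytes(2, sys.byteorder)

print(packed == converted)
```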

If you intend to work with data from other machines or send data elsewhere, it’s always best to be explicit, but I guess you can’t have both :slight_smile:

2 Likes

I’m with Mark here. If you want to introduce a default endianness, then make it a fixed default. Do not make it depend on the current platform’s endianness. The majority of people write and test their code on little-endian platforms. A platform-dependent default will be a nightmare for people like me who support big-endian platforms like s390x and big-endian PPC.

If you want to default to an endian, then network endian (aka big endian) makes more sense than little endian. Binary network protocols typically use network byte order.
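
As an illustration, struct's "!" (network) prefix produces the same bytes as an explicit "big" in int.to_bytes(). The port number is just an example value:

```python
import struct

port = 8080  # example value, 0x1F90

# "!H": unsigned 16-bit integer in network (big-endian) byte order.
network = struct.pack("!H", port)
big = port.to_bytes(2, "big")

print(network, big)
```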

2 Likes

If the majority are on little-endian machines, and the majority are going to need big-endian, then the default of sys.byteorder will be noticed very quickly.

On the other hand, if the majority are on little-endian machines and the default is big-endian, the “obvious” fix is to specify "little" explicitly.

I would vote for no default (or “raise an error for unspecified and more than 1 byte”) over a platform-independent default.

3 Likes

Whatever is picked, int.from_bytes() should get the same default for byte ordering. I personally find it a minor PITA to need to keep typing "big" every time I invoke one of those :wink:. (Why “big”? Because it’s less typing than “little”.)
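
A short sketch of why the two need to agree: a round trip only gives back the original value when both calls use the same byte order.

```python
n = 655  # 0x028F

encoded = n.to_bytes(2, "big")

# Same order on both sides: the value survives the round trip.
same_order = int.from_bytes(encoded, "big")

# Mismatched order: the bytes are read back as 0x8F02.
mixed_order = int.from_bytes(encoded, "little")

print(same_order, mixed_order)
```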

2 Likes

So, would you throw an exception for e.g. (65).to_bytes(2)?

1 Like

I would. “In the face of ambiguity, refuse the temptation to guess.”

(Unless we define that there is no ambiguity by saying “matches the platform” or “matches some arbitrary value”. My preference for the platform default isn’t based on this line of the zen - only my preference for raising the exception is.)

2 Likes

I would suggest avoiding the question by allowing the argument to be omitted only if the int fits in one byte (and therefore, byte order doesn’t matter). So we’d have:

(65).to_bytes() → works
(655).to_bytes(2, "little") → works
(655).to_bytes() → OverflowError
(655).to_bytes(2) → TypeError
(65).to_bytes(2) → TypeError
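
The rules above can be sketched as a plain function. to_bytes_proposed and _MISSING are hypothetical names for illustration only, and treating the default length as 1 is an assumption. This is not how int.to_bytes() behaves today:

```python
_MISSING = object()  # hypothetical sentinel for "byteorder not given"


def to_bytes_proposed(value, length=1, byteorder=_MISSING):
    """Hypothetical stand-in for the proposed int.to_bytes() semantics."""
    if byteorder is _MISSING:
        if length != 1:
            # Refuse to guess an order once it would affect the result.
            raise TypeError("byteorder is required when length != 1")
        byteorder = "big"  # irrelevant for a single byte
    # Raises OverflowError if value does not fit in `length` bytes.
    return value.to_bytes(length, byteorder)


def outcome(*args):
    """Return the result, or the name of the exception that was raised."""
    try:
        return to_bytes_proposed(*args)
    except (OverflowError, TypeError) as exc:
        return type(exc).__name__


for args in [(65,), (655, 2, "little"), (655,), (655, 2), (65, 2)]:
    print(args, outcome(*args))
```

Only the missing-byteorder check is new; everything else is delegated to the existing method.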

4 Likes

sys.byteorder for me, like struct (IIRC?). When I want control I will
exert control, but the default should match the machine, surely? That
way one can more easily write “native” stuff (which I don’t do very
much).

Cheers,
Cameron Simpson cs@cskk.id.au

1 Like

This suggestion makes sense to me. I suspect the function will get slower with the extra checks, but semantically, I’m not against this proposal.

I personally think “matches the platform” is a fine default, and mirrors the struct module. Others disagree. I’m not against raising a ValueError when length > 1 and byteorder isn’t given.

Make that a ValueError rather than a TypeError. It actually might not be too bad on performance because I think you only need to check for length != 1 when byteorder == NULL. Then the question is what to do about int.from_bytes(). If the semantics of to_bytes() is that byteorder must be given when length != 1, then there is no sensible default for byteorder in from_bytes() as @tim.one suggests. I’m probably okay with that.

Some statistics:

  • native: 1 in the stdlib, 2 in tests
  • little endian: 16 in the stdlib, 22 in tests
  • big endian: 22 in the stdlib, 33 in tests

My preferences:

  1. No default (status quo).
  2. Only default for length=1. It complicates the mental model and implementation without great benefit. It is not the clearest way of converting an integer in range 0-255 to a bytes object.
  3. “big”.
  4. “little”. Since it matches the native endianness of common platforms it is a bug magnet.
  5. Native. The same, but in addition it is the least useful default.

Another vote for no default byteorder.

In the face of ambiguity…