What should default machine parameters be in docs?

smontanaro · November 4, 2022, 8:17pm

Maybe this belongs in the documentation forum. I don’t know. Kick it over there if that’s the case.

Proofreading the struct doc, I see this:

Note: All examples assume a native byte order, size, and alignment with a big-endian machine.

I try out the first example on my shiny new MacBook Pro:

Python 3.12.0a1+ (heads/main:f09da28768, Nov  4 2022, 15:02:48) [Clang 14.0.0 (clang-1400.0.29.102)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
b'\x01\x00\x02\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
>>> 
>>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: unpack requires a buffer of 16 bytes
>>> 
>>> calcsize('hhl')
16
>>> import sys
>>> sys.byteorder
'little'

Hmmm, okay. The M1 must be different than my old Dell. Go over there and try. Same thing.

Now, this has me scratching my head. Where it matters, shouldn’t the default byte order in doc examples mimic the architecture with the largest installed base? Git credits the above note to Mark Dickinson in 2010. I wonder what he was using to generate the examples…

pitrou · November 4, 2022, 8:26pm

cc @mdickinson

gpshead · November 4, 2022, 9:01pm

The docs should always use an explicit byte order and size specifier in struct examples so that they work regardless of platform (other than the bits demonstrating the ability to go wild and use platform-specific values).

eryksun · November 4, 2022, 9:12pm

On Windows, regardless of 32-bit or 64-bit, Intel or ARM architecture, for this “hhl” example, the size is 8 bytes, and the byte order is little-endian. On Linux they can vary. As Gregory suggested, the examples should use standard sizes and explicit byte order, rather than native.
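A minimal sketch of what that looks like for the “hhl” example. With an explicit < or > prefix, struct uses standard sizes (2 bytes for h, 4 bytes for l) and no padding, so the results below should be the same on any platform:

```python
import struct

# With an explicit byte-order prefix, struct uses standard sizes
# (h = 2 bytes, l = 4 bytes) and inserts no padding.
assert struct.calcsize('<hhl') == 8
assert struct.calcsize('>hhl') == 8

big = struct.pack('>hhl', 1, 2, 3)
little = struct.pack('<hhl', 1, 2, 3)
print(big)     # b'\x00\x01\x00\x02\x00\x00\x00\x03'
print(little)  # b'\x01\x00\x02\x00\x03\x00\x00\x00'

# The unpack call from the docs works once the byte order is explicit.
values = struct.unpack('>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
print(values)  # (1, 2, 3)
```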

smontanaro · November 4, 2022, 9:23pm

Works for me. Whatever the ultimate decision is, I think the simple case of copying the example out of the docs (which Sphinx makes drop dead easy) should have the highest probability of “just working.”

cameron · November 4, 2022, 9:52pm

Maybe this belongs in the documentation forum. I don’t know. Kick it
over there if that’s the case.

Proofreading the struct doc, I see this:

Note: All examples assume a native byte order, size, and alignment with a big-endian machine.

Right. So your machine may well be different. Notice the word
“alignment” above.

I try out the first example on my shiny new MacBook Pro:

Python 3.12.0a1+ (heads/main:f09da28768, Nov  4 2022, 15:02:48) [Clang 14.0.0 (clang-1400.0.29.102)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
b'\x01\x00\x02\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'

Looks little endian. And… padded in a native form. I would guess
you’re seeing a C struct with implied internal padding (alignment):

little endian 2 byte short
little endian 2 byte short
4 pad bytes to an 8 byte boundary
little endian 4 or 8 byte value (can’t tell, will need a bigger value)

The docs do not say that “native” means pack things in without padding.
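One way to check the guess about padding without reading the bytes by eye is to compare the native size of the whole format with the sum of the native sizes of its fields. This is only a sketch, and the numbers it prints depend on the platform:

```python
import struct

fmt = 'hhl'  # no prefix means '@': native byte order, size, and alignment

native_size = struct.calcsize('@' + fmt)
# Native size of each field taken on its own, where no padding is possible.
field_total = sum(struct.calcsize('@' + code) for code in fmt)
padding = native_size - field_total

print('native size:', native_size)
print('sum of field sizes:', field_total)
print('pad bytes inserted:', padding)

# '=' keeps native byte order but switches to standard sizes, no alignment.
print('standard size:', struct.calcsize('=' + fmt))  # 8
```

On a platform where the native size is 16 and l is 8 bytes, this reports 4 pad bytes. Where l is 4 bytes and the native size is 8, it reports none.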

Personally, I had not considered this scenario.

How does it behave with a < or > prefix?

Given the above, this:

 >>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 struct.error: unpack requires a buffer of 16 bytes

behaves reasonably.

Hmmm, okay. The M1 must be different than my old Dell. Go over there
and try. Same thing.

Here’s my Intel mac:

 Python 3.9.13 (main, Aug 11 2022, 14:01:42)
 [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from struct import *
 >>> pack('hhl', 1, 2, 3)
 b'\x01\x00\x02\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
 >>> pack('<hhl', 1, 2, 3)
 b'\x01\x00\x02\x00\x03\x00\x00\x00'
 >>> pack('>hhl', 1, 2, 3)
 b'\x00\x01\x00\x02\x00\x00\x00\x03'

and a Linux on some kind of Celeron:

 Python 3.9.5 (default, Nov 18 2021, 16:00:48)
 [GCC 10.3.0] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from struct import *
 >>> pack('hhl', 1, 2, 3)
 b'\x01\x00\x02\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
 >>> pack('<hhl', 1, 2, 3)
 b'\x01\x00\x02\x00\x03\x00\x00\x00'
 >>> pack('>hhl', 1, 2, 3)
 b'\x00\x01\x00\x02\x00\x00\x00\x03'

I would say this means that the “native” layout is whatever the C compiler
would produce for a C struct, padding included, while a big or little
endian prefix produces an unpadded layout.
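The comparison with a C struct can be tried from Python using ctypes. The HHL class below is a hypothetical stand-in for `struct { short a; short b; long c; }`. It is a sketch only, and the sizes and offset it prints are platform dependent:

```python
import ctypes
import struct

class HHL(ctypes.Structure):
    # Hypothetical C struct mirroring the 'hhl' format:
    # struct { short a; short b; long c; };
    _fields_ = [
        ('a', ctypes.c_short),
        ('b', ctypes.c_short),
        ('c', ctypes.c_long),
    ]

# Both sizes come from the platform's C layout rules.
print(ctypes.sizeof(HHL), struct.calcsize('@hhl'))
print(HHL.c.offset)  # offset of the long, i.e. where any padding ends

# Bytes packed in native mode can be read back through the C layout.
packed = struct.pack('@hhl', 1, 2, 3)
s = HHL.from_buffer_copy(packed)
print(s.a, s.b, s.c)
```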

Now, this has me scratching my head. Where it matters, shouldn’t the
default byte order in doc examples mimic the architecture with the
largest installed base? Git credits the above note to Mark Dickinson in
2010. I wonder what he was using to generate the examples…

A less aggressive compiler, or a machine architecture where the memory
bus wasn’t 64 bits wide? Padding exists in structs essentially to
maximise memory I/O efficiency, and if the compiler’s aware of the
details it may well do this to allow high performance “get from memory”
instructions.

Summary:

I agree that this is unclear in the docs. It would be hard to write
something concise and not misleading about whatever “native” padding is
involved, but it would be Very Good to make clear that the “native” form
is not necessarily (or, these days, even probably) the same as one of
the < or > prefixed forms.

Now I need to go and revisit this function:

https://github.com/cameron-simpson/css/blob/707bfbe2ad22c802c8754bb7c2248b22175d73ce/lib/python/cs/timeseries.py#L224

      def default_fill(self):
        ''' The default fill for the type code.
        '''
        if self == 'd':
          return nan
        if self == 'q':
          return 0
        raise RuntimeError('no default fill value for %r' % (self,))

    @typechecked
    def deduce_type_bigendianness(typecode: str) -> bool:
      ''' Deduce the native endianness for `typecode`,
          an array/struct typecode character.
      '''
      test_value = TypeCode(typecode).type(1)
      bs_a = array(typecode, (test_value,)).tobytes()
      bs_s_be = pack('>' + typecode, test_value)
      bs_s_le = pack('<' + typecode, test_value)
      if bs_a == bs_s_be:
        return True
      if bs_a == bs_s_le:

to see if it is correct. (I think it is, because it deals in single
value pack formats.)
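For comparison, here is a sketch of the same idea using only struct. The helper name native_is_big_endian is hypothetical and is not the function from the repository above. It uses the = prefix (native byte order, standard size, no alignment) so that the native result always has the same length as the < and > forms, and it only makes sense for multi-byte typecodes such as q or d:

```python
import struct
import sys

def native_is_big_endian(typecode: str) -> bool:
    """Hypothetical helper: report whether native byte order is big-endian
    for a single multi-byte struct typecode such as 'q' or 'd'."""
    value = 1.0 if typecode in 'efd' else 1
    # '=' means native byte order but standard size and no alignment, so
    # the result has the same length as the '>' and '<' forms.
    native = struct.pack('=' + typecode, value)
    if native == struct.pack('>' + typecode, value):
        return True
    if native == struct.pack('<' + typecode, value):
        return False
    raise ValueError(f'cannot deduce endianness for {typecode!r}')

print(native_is_big_endian('q'), sys.byteorder)
```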

Cheers,
Cameron Simpson cs@cskk.id.au

smontanaro · November 4, 2022, 10:04pm

@cameron I don’t dispute anything you wrote (it’s all correct, I’m sure), however… I don’t think most people naively trying the first example should get errors on the platform (Intel x86) which (like it or not) is what most desktop users use.

The struct module probably isn’t for people who just finished the Python tutorial. Still, the very first example in the module docs really ought to just work. People shouldn’t wonder what they might have done wrong, or dive into debug mode because they got output quite different than the example showed.

cameron · November 4, 2022, 10:26pm

Replying to @smontanaro and @eryksun …

@cameron I don’t dispute anything you wrote (it’s all correct, I’m
sure), however… I don’t think most people naively trying the first
example should get errors on the platform (Intel x86) which (like it or
not) is what most desktop users use.

The struct module probably isn’t for people who just finished the
Python tutorial. Still, the very first example in the module docs
really ought to just work. People shouldn’t wonder what they might have
done wrong, or dive into debug mode because they got output quite
different than the example showed.

Maybe the first example should just work.

However, I disagree with you and Eryk in this regard: an example with
native byte order (no < or >) cannot work “as is” on all
platforms. And further, having it “just work” on the commonest
platform is actively misleading. I am AGAINST that.

I think the “just works” examples should all use < or >.

I think there needs to be at least one “native” example, and it should
be prefaced clearly that this may well not work identically on a
user’s machine because it is machine type (and compiler type) dependent.

And then it should be presented, with commentary.

I’d even advocate presenting the existing hhl example, with
contradicting example outputs from different platforms. So:

keep the existing output, and explain the source platform and its unpadded behaviour
add a current example (yours or any of mine) and explain its padding behaviour

Cheers,
Cameron Simpson cs@cskk.id.au

smontanaro · November 5, 2022, 7:35pm

I don’t think I implied or said that the first example as written should just work. I think the first example (however it’s written) should just work. I should have stated explicitly that it would need modification to work on all platforms.

smontanaro · November 5, 2022, 8:06pm

To that end:

cameron · November 5, 2022, 8:48pm

Point taken.

I do think the first “native” example should be unlikely to work
identically on a modern machine. I’d really like it overtly in people’s
face that “native” isn’t just big or little endianness, but also includes
alignment/padding.

Cheers,
Cameron Simpson cs@cskk.id.au

smontanaro · November 5, 2022, 9:31pm

Maybe all that needs doing is to explicitly warn the reader that examples without tight control on the structure layout aren’t going to work everywhere.

smontanaro · November 5, 2022, 9:38pm

GH chided me for opening a doc PR without an issue, so I opened one.

Does discussion typically proceed on the issue or the PR? (Or doesn’t it matter?)

guido · November 5, 2022, 10:25pm

On the issue.

storchaka · November 6, 2022, 6:05am

I concur with Cameron. It is a common error to use the “native” format with little-endian data. Everything works on the platform used by the author of the code, and then fails on less common platforms with different endianness or padding. So it is better if the examples use an unusual platform, or perhaps explicitly show that the result can differ between platforms.
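A small sketch of that failure mode, assuming eight bytes of little-endian data (two 16-bit integers and one 32-bit integer). The explicit format gives the same answer everywhere, while the native call may succeed, raise, or return different values depending on the platform:

```python
import struct

# Eight bytes of little-endian data, e.g. read from a file or socket:
# two 16-bit integers followed by one 32-bit integer.
data = b'\x01\x00\x02\x00\x03\x00\x00\x00'

# Explicit little-endian format: same result on every platform.
print(struct.unpack('<hhl', data))  # (1, 2, 3)

# Native format: depends on the platform's byte order, sizes, and padding.
try:
    print(struct.unpack('hhl', data))
except struct.error as exc:
    print('native format failed:', exc)
```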

mdickinson · November 6, 2022, 9:10am

It was an iBook G4, IIRC. Not that exotic a machine at the time. But the choice to use big-endian here predates me: cpython/Doc/lib/libstruct.tex at 5fdeeeae2a12b9956cc84d62eae82f72cabc8664 · python/cpython · GitHub
