In my current project I need to parse binary data from different types of machines. Some of them are very specific and use a non-standard endianness such as middle-endian.
If we have bytes A B C D, then we have the following orderings:
- Big endian: ABCD
- Little endian: DCBA
- Mid Big endian: BADC
- Mid Little endian: CDAB
Suppose we have a uint_32 encoded as the bytes 19 AA 41 AF (0x19AA41AF).
- In big-endian it would be interpreted as 430588335. For this we convert 19 AA 41 AF as a uint_32 and that’s it.
- In the little-endian case we first need to reverse the bytes, getting AF 41 AA 19, and then interpret them as a uint_32. The result is 2940316185.
In the mid-endian case we split the bytes into two halves, 19AA and 41AF, rearrange them, and only then convert the result as a uint_32:
- For mid-big-endian we swap the bytes within each half. We get AA19AF41, which converts to 2853809985.
- For mid-little-endian we swap the halves themselves. We get 41AF19AA, which converts to 1101994410.
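The four interpretations above can be checked with a few lines of plain Python. This is only a sketch: the `ORDERINGS` table and the `decode_u32` helper are names made up for this example, and each tuple lists the positions from which the stored bytes are picked so that the result can be read as an ordinary big-endian integer.

```python
data = bytes.fromhex("19AA41AF")

# For each ordering: which stored byte goes to which position before
# the result is read as a plain big-endian uint32 (hypothetical table).
ORDERINGS = {
    "big": (0, 1, 2, 3),         # ABCD
    "little": (3, 2, 1, 0),      # DCBA
    "mid-big": (1, 0, 3, 2),     # BADC
    "mid-little": (2, 3, 0, 1),  # CDAB
}


def decode_u32(raw, ordering):
    """Rearrange 4 bytes according to `ordering`, then read them big-endian."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    reordered = bytes(raw[i] for i in ORDERINGS[ordering])
    return int.from_bytes(reordered, "big")


for name in ORDERINGS:
    print(f"{name:>10}: {decode_u32(data, name)}")
```

This prints 430588335, 2940316185, 2853809985 and 1101994410 for big, little, mid-big and mid-little respectively, matching the values worked out by hand.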
To add “native” support for such formats to the struct module, two new format specifiers could be added, for example ^ and $ for the mid-big and mid-little formats.
I’m starting to work on a C extension for my project and could implement this feature in struct.
I appreciate that you need this, but it seems too obscure to add support to the struct module. And the proposed characters seem very non-mnemonic.
I think you’re better off just taking the 16-bit halves and combining them yourself in a small wrapper.
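A wrapper along those lines could look roughly like the sketch below. The function name `unpack_mid_u32` and its parameters are invented for illustration; only `struct.unpack` and the standard `<` / `>` / `H` format characters come from the struct module itself.

```python
import struct


def unpack_mid_u32(raw, byte_order, high_word_first):
    """Decode one 32-bit value stored as two 16-bit halves.

    byte_order      -- '>' or '<': byte order inside each 16-bit half
    high_word_first -- True if the first half is the most significant one
    (Both parameters are hypothetical names for this sketch.)
    """
    first, second = struct.unpack(byte_order + "HH", raw)
    if high_word_first:
        return (first << 16) | second
    return (second << 16) | first


raw = bytes.fromhex("19AA41AF")
mid_big = unpack_mid_u32(raw, "<", True)      # BADC layout
mid_little = unpack_mid_u32(raw, ">", False)  # CDAB layout
print(mid_big, mid_little)
```

With the example bytes this gives 2853809985 for mid-big and 1101994410 for mid-little. The same helper also covers the two ordinary cases: `('>', True)` is plain big-endian and `('<', False)` is plain little-endian.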
I’m curious what computers have implemented mid-endian?
Would they be very old 16-bit CPUs by any chance? Or is this custom silicon?
Bytes in a word in one order and the words in memory in the other?
The only place where I’ve seen this is the modbus transport protocol. IIRC the “native” modbus register size is 16 bits wide, and multiple consecutive registers are aggregated to create wider values. However, there is no protocol or standard implementation for this, and funky things are found in the wild, probably as a result of some implementations not paying too much attention to endianness (when you code the server and the client at the same time, you can get away with getting endianness wrong, as long as you are consistent with your mistakes).
I agree with the implicit suggestions of Guido and Barry that struct should handle 16-bit words and higher-level software should handle word pairs. According to multiple sites, modbus is widely used to communicate between electronic devices. A PyPI search for modbus returns 193 projects, including one intended for generic use on MicroPython. I suggest looking at how any of these projects have dealt with the ‘middle-endian’ issue.
I’m facing the same problem regarding the module structure, but hopefully it will be resolved now.
Thank you all for your suggestions. For now we have decided to stick with flipping the necessary bytes and then parsing the result with the struct module.
As Terry suggested, I’ve looked at some libraries for parsing modbus. It seems that the ones I’ve seen don’t implement this feature. I had hopes for the Construct project, but didn’t find any mention of middle-endian support there either. I’m planning to offer them my help with this, as it is a suitable place for it.
I can’t really tell you much because of an NDA. It is some old industrial hardware. And, as others guessed, we are dealing with some modbus data.
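For reference, the flip-then-parse approach can be sketched as follows. It relies on one observation: swapping the two bytes of every 16-bit word turns mid-big data into ordinary big-endian data and mid-little data into ordinary little-endian data. The helper names are made up, and the sketch assumes every field in the buffer is a 32-bit mid-endian value (a 16-bit field would have its byte order flipped as well).

```python
import struct


def swap_bytes_in_words(raw):
    """Swap the two bytes of every 16-bit word: AB CD -> BA DC."""
    if len(raw) % 2:
        raise ValueError("buffer length must be even")
    out = bytearray(raw)
    # The right-hand side is evaluated (and copied) before assignment.
    out[0::2], out[1::2] = out[1::2], out[0::2]
    return bytes(out)


def unpack_mid(fmt, raw):
    """Hypothetical helper: use '>' formats for mid-big, '<' for mid-little."""
    return struct.unpack(fmt, swap_bytes_in_words(raw))


raw = bytes.fromhex("19AA41AF")
print(unpack_mid(">I", raw))        # mid-big
print(unpack_mid("<I", raw))        # mid-little
print(unpack_mid(">II", raw * 2))   # two consecutive mid-big values
```

The first two calls return `(2853809985,)` and `(1101994410,)`, in line with the values computed at the top of the thread.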
PyModbus and MinimalModbus are among the most used modbus libraries, and both have support for funny endianness. See, for example, the “pymodbus package” page of the PyModbus 3.1.0 documentation and the “API for MinimalModbus” page of the MinimalModbus 2.0.1 documentation.