In my current project I need to parse binary data from different types of machines. Some of them are quite unusual and use non-standard endianness, such as middle-endian.
Short example
If we have bytes A B C D, then we have the following orderings:
Big endian: ABCD
Little endian: DCBA
Mid Big endian: BADC
Mid Little endian: CDAB
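To make the four orderings concrete, here is a minimal sketch that expresses each one as an index permutation over a 4-byte value. The names `ORDERINGS` and `reorder` are made up for this illustration and are not part of any library.

```python
# Hypothetical helper: each ordering is an index permutation.
# Position i of the result takes byte ORDERINGS[name][i] of the input.
ORDERINGS = {
    "big": (0, 1, 2, 3),         # ABCD
    "little": (3, 2, 1, 0),      # DCBA
    "mid-big": (1, 0, 3, 2),     # BADC: swap bytes inside each 16-bit half
    "mid-little": (2, 3, 0, 1),  # CDAB: swap the two 16-bit halves
}


def reorder(data: bytes, ordering: str) -> bytes:
    """Return the 4 input bytes rearranged according to the named ordering."""
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    return bytes(data[i] for i in ORDERINGS[ordering])


for name in ORDERINGS:
    print(f"{name}: {reorder(b'ABCD', name).decode()}")
```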
Large example
Suppose a uint_32 is encoded as the bytes 19 AA 41 AF (0x19AA41AF).
In big-endian it is interpreted as 430588335: we read 19 AA 41 AF as a uint_32 and that's it.
For little-endian we first reverse the bytes to get AF 41 AA 19 and then interpret them as a uint_32. The result is 2940316185.
For the mid-endian cases we split the bytes into two halves, 19AA and 41AF, rearrange them, and only then read the result as in the big-endian case.
For mid-big-endian we swap the bytes within each half, get AA19AF41, and then convert.
For mid-little-endian we swap the two halves and get 41AF19AA.
The results are 2853809985 and 1101994410 respectively.
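The arithmetic above can be checked with the existing struct module by rearranging the bytes by hand first. This is only a sketch of the example as described; the helper names `swap_bytes_in_halves` and `swap_halves` are invented here for illustration.

```python
import struct

raw = bytes.fromhex("19AA41AF")


def swap_bytes_in_halves(b: bytes) -> bytes:
    """19 AA 41 AF -> AA 19 AF 41 (hypothetical helper)."""
    return bytes([b[1], b[0], b[3], b[2]])


def swap_halves(b: bytes) -> bytes:
    """19 AA 41 AF -> 41 AF 19 AA (hypothetical helper)."""
    return b[2:] + b[:2]


# ">I" and "<I" are the standard big- and little-endian unsigned 32-bit formats.
big = struct.unpack(">I", raw)[0]
little = struct.unpack("<I", raw)[0]
# The mid-endian cases: rearrange first, then read as big-endian.
mid_big = struct.unpack(">I", swap_bytes_in_halves(raw))[0]
mid_little = struct.unpack(">I", swap_halves(raw))[0]

print(big, little, mid_big, mid_little)
# 430588335 2940316185 2853809985 1101994410
```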
Proposal
To add "native" support for such formats to the struct module, two new format specifiers could be added, for example ^ and $ for the mid-big and mid-little formats.
I'm starting to work on a C extension for my project and could implement this feature in struct.
I'm curious: what computers have implemented mid-endian?
Would they be very old 16-bit CPUs by any chance? Or is this custom silicon?
Bytes in a word in one order and the words in memory in the other?
The only place where I've seen this is the Modbus transport protocol. IIRC the "native" Modbus register size is 16 bits wide, and multiple consecutive registers are aggregated to create wider values. However, there is no protocol or standard implementation for this, and funky things are found in the wild, probably as a result of some implementations not paying too much attention to endianness (when you code the server and the client at the same time, you can get away with getting endianness wrong, as long as you are consistent with your mistakes).
I agree with the implicit suggestions of Guido and Barry that struct should handle 16-bit words and higher-level software should handle word pairs. Modbus is, multiple sites say, widely used to communicate between electronic devices. A PyPI modbus search returns 193 projects, including one intended for generic use on MicroPython. I suggest looking at how any of these projects have dealt with the "middle-endian" issue.
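As a rough illustration of that split, struct can unpack the two 16-bit words and a thin layer on top can decide which word is the high one. The function name `unpack_u32_words` and its parameters are hypothetical, chosen only for this sketch.

```python
import struct


def unpack_u32_words(data: bytes, byte_order: str = ">", low_word_first: bool = False) -> int:
    """Combine two 16-bit words into one unsigned 32-bit value.

    byte_order is the struct byte-order character used inside each word
    (">" or "<"); low_word_first says whether the first word in the data
    is the low half of the result.
    """
    first, second = struct.unpack(f"{byte_order}2H", data)
    high, low = (second, first) if low_word_first else (first, second)
    return (high << 16) | low


raw = bytes.fromhex("19AA41AF")
print(unpack_u32_words(raw, ">", False))  # big-endian: 430588335
print(unpack_u32_words(raw, "<", True))   # little-endian: 2940316185
print(unpack_u32_words(raw, "<", False))  # mid-big from the first post: 2853809985
print(unpack_u32_words(raw, ">", True))   # mid-little from the first post: 1101994410
```

With this layout the four orderings fall out of two independent choices (byte order within a word, word order within the pair), so no new struct format characters are needed.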
Thank you all for your suggestions. For now we have decided to stick to flipping the necessary bytes and then parsing them with the struct module.
As Terry suggested, I've looked at some libraries for parsing Modbus. The ones I've seen don't implement this feature. I had hopes for the Construct project, but didn't find any mention of middle-endian support there either. I'm planning to offer them my help with this, as it seems a suitable place for it.