The code below works for finding the header locations in a .bin file. But
I don’t want to read the whole file at once.
Indeed not. That would use a heap of memory and I/O.
This is because the header fields have different byte formats. If I read
the whole file as uint16, I will not be able to get those bytes (some are
uint8 and uint32). How can I read the file to find the header location
and then assign the right datatype to each header field? Basically,
discard all data before the header and read the header
(2 bytes - 2 bytes [little endian] - 2 bytes [big endian] - 1 byte -
1 byte - …). Thanks in advance!!!
The stdlib “struct” module provides methods for parsing basic binary
types like uint16 little endian and so forth from binary data. You will
still need to read the file (in pieces, not all at once!).
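A minimal sketch of that approach using only the standard library. The marker value `MARKER` and the names `find_marker` and `parse_header` are assumptions for illustration; the question only gives the field widths and byte orders, so substitute the real signature and field meanings. Note that a single struct format string has one byte order, so the little endian and big endian fields are unpacked with separate calls.

```python
import struct

MARKER = b"\xaa\x55"   # ASSUMED 2-byte header signature; use the real one
HEADER_SIZE = 8        # 2 + 2 + 2 + 1 + 1 bytes, per the layout in the question

def find_marker(f, marker=MARKER, chunk_size=65536):
    """Return the file offset of the first marker, or -1, reading in chunks."""
    overlap = len(marker) - 1
    tail = b""
    offset = f.tell()               # file offset of the start of the next chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        pos = window.find(marker)
        if pos >= 0:
            return offset - len(tail) + pos
        # Keep the last few bytes so a marker split across two chunks is found.
        tail = window[-overlap:] if overlap else b""
        offset += len(chunk)

def parse_header(raw):
    """Unpack the 8 header bytes; each byte order needs its own format."""
    sig = raw[0:2]
    (le_val,) = struct.unpack_from("<H", raw, 2)   # 2 bytes, little endian
    (be_val,) = struct.unpack_from(">H", raw, 4)   # 2 bytes, big endian
    b1, b2 = struct.unpack_from("BB", raw, 6)      # two single bytes
    return sig, le_val, be_val, b1, b2

def read_header(path):
    """Find the first header in the file and parse it, or return None."""
    with open(path, "rb") as f:
        pos = find_marker(f)
        if pos < 0:
            return None
        f.seek(pos)
        raw = f.read(HEADER_SIZE)
        if len(raw) < HEADER_SIZE:
            return None             # truncated header at end of file
        return parse_header(raw)
```

Only one chunk (plus a byte or so of overlap) is held in memory at a time, and everything before the header is simply skipped.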
In terms of avoiding I/O, you can probably use the mmap module to map
the file into memory, and use struct on those data. That avoids reading
the whole file with f.read(). The OS will read pages from the file as
needed when you access the memory.
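A rough sketch of the mmap variant, again with an assumed marker value and a hypothetical function name `header_from_mmap`. It relies on `mmap.find` to locate the signature and on `struct.unpack_from` accepting the mapped file directly as a buffer.

```python
import mmap
import struct

MARKER = b"\xaa\x55"   # ASSUMED 2-byte header signature; use the real one
HEADER_SIZE = 8        # 2 + 2 + 2 + 1 + 1 bytes, per the layout in the question

def header_from_mmap(path, marker=MARKER):
    """Return (offset, le_val, be_val, b1, b2) for the first header, or None."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded by the OS on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(marker)
            if pos < 0 or pos + HEADER_SIZE > len(mm):
                return None
            (le_val,) = struct.unpack_from("<H", mm, pos + 2)  # little endian
            (be_val,) = struct.unpack_from(">H", mm, pos + 4)  # big endian
            b1, b2 = struct.unpack_from("BB", mm, pos + 6)     # single bytes
            return pos, le_val, be_val, b1, b2
```

One caveat: mapping an empty file raises an error on some platforms, so check the file size first if that can happen.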
Stepping beyond the standard library for a bit:
I’ve been reading a lot of binary data recently, and tend to want to
read it as a stream, like you would read lines from a file.
For this purpose I’ve got 2 modules on PyPI:
cs.buffer, which will take any iterable of bytes-like things and present
you pieces in the sizes you want (eg 2 bytes for a 16 bit value). It has
factories for making buffers from files (like your
open(filename, mode='rb')), mmapped files, lists or bytes, etc etc. That
way you can use it on all sorts of things depending where your data are
coming from.
cs.binary, a suite of binary data structure parsing classes which are
crafted to operate on a buffer from cs.buffer.
This includes a bunch of classes like UInt16LE to read a 2-byte little
endian value from a buffer; these actually use the struct module
internally and for structures with multiple fields they return a
namedtuple instead of a plain tuple like struct does.
But you can easily make your own classes for whatever structure you need
to parse. There’s a BinaryMultiStruct factory which takes a struct
format string and list of matching field names, and returns you a class
for parsing that struct, which hands you namedtuples.
They all have common methods like parse (gives you a class instance
from a buffer), parse_value (for single value things like a 16-bit
number, gives you the value), and scan, which yields instances as an
iterator, eg:
    for obj in MyStruct.scan(bfr):
        ... do stuff with obj, which has been parsed from the buffer ...
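For comparison, the same "iterate over parsed records" idea can be approximated with just the standard library. This is only an analogue, not how cs.binary implements scan; the record layout `<HHB` and the names `Record`, `RECORD` and `scan_records` are made up for the example.

```python
import struct
from collections import namedtuple

Record = namedtuple("Record", "a b flag")   # hypothetical field names
RECORD = struct.Struct("<HHB")              # hypothetical layout: 2 + 2 + 1 bytes

def scan_records(f):
    """Yield Record tuples from a binary file object until the data run out."""
    while True:
        raw = f.read(RECORD.size)
        if len(raw) < RECORD.size:
            return      # end of file, or a truncated trailing record
        yield Record._make(RECORD.unpack(raw))
```

Used as `for rec in scan_records(f): ...` on a file opened in 'rb' mode, it reads one record at a time rather than the whole file.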
Your use case would look something like this:
    from cs.buffer import CornuCopyBuffer
    from cs.binary import UInt16LE
    with open(filename, mode='rb') as f:
        bfr = CornuCopyBuffer.from_file(f)
        v16 = UInt16LE.parse_value(bfr)
        ... do whatever you need to parse the .bin file ...
Cheers,
Cameron Simpson cs@cskk.id.au