Python supports integers of unlimited range (memory permitting), while C has several integer types with limited ranges. There are several ways to convert a Python integer to a C integer and back:
- Dedicated C API functions like `PyLong_AsLong()` and `PyLong_FromLong()`.
- `PyArg_Parse()` with a corresponding format unit like `'l'`. `Py_BuildValue()` with a similar format unit.
- `PyMemberDef` with a corresponding type like `Py_T_LONG`.

These sets are not equivalent, especially for unsigned integers.

- Most of the C API functions, except `PyNumber_AsSsize_t()`, have the `PyLong_` prefix. There are usually three variants for conversion to a C integer type: `PyLong_AsLong()` converts integers in the range `LONG_MIN` to `LONG_MAX` to signed `long`. `PyLong_AsUnsignedLong()` converts integers in the range 0 to `ULONG_MAX` to `unsigned long`. `PyLong_AsUnsignedLongMask()` accepts arbitrary integers and converts them to `unsigned long` modulo `ULONG_MAX+1`.
- `PyArg_Parse()` has variants of format units for signed and unsigned types. For example, `'l'` works like `PyLong_AsLong()` and `'k'` works like `PyLong_AsUnsignedLongMask()`. There is no variant for `PyLong_AsUnsignedLong()`; the only way to convert to `unsigned long` with a range check is to use a custom converter.
- The `PyMemberDef` API also has variants for signed and unsigned types. `Py_T_LONG` is equivalent to `PyLong_AsLong()`, but `Py_T_ULONG`, which converts to `unsigned long`, is trickier. It accepts Python integers in the range `LONG_MIN` to `ULONG_MAX`. This is larger than the range of `unsigned long`, so it converts negative integers in the range `LONG_MIN` to -1 modulo `ULONG_MAX+1`.

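The difference between the three `PyLong_As*` variants can be checked from Python itself. Below is a minimal sketch that calls them through `ctypes.pythonapi`. It assumes CPython, and the helper names `bind` and `overflows` are made up for this illustration.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running interpreter and
# re-raises any Python exception set by the called function.
api = ctypes.pythonapi


def bind(name, restype):
    """Declare a PyLong_As* function that takes one PyObject* argument."""
    func = getattr(api, name)
    func.argtypes = [ctypes.py_object]
    func.restype = restype
    return func


as_long = bind("PyLong_AsLong", ctypes.c_long)
as_ulong = bind("PyLong_AsUnsignedLong", ctypes.c_ulong)
as_ulong_mask = bind("PyLong_AsUnsignedLongMask", ctypes.c_ulong)

# Platform limits derived from the size of C long.
BITS = 8 * ctypes.sizeof(ctypes.c_long)
LONG_MIN, LONG_MAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1
ULONG_MAX = (1 << BITS) - 1


def overflows(func, value):
    """Return True if func rejects value with OverflowError."""
    try:
        func(value)
    except OverflowError:
        return True
    return False


print(overflows(as_long, LONG_MAX + 1))     # True: outside the long range
print(overflows(as_ulong, -1))              # True: negative values rejected
print(as_ulong_mask(-1) == ULONG_MAX)       # True: reduced modulo ULONG_MAX+1
print(as_ulong_mask(ULONG_MAX + 1))         # 0: silently wraps around
```

The last line is the integer overflow problem in a nutshell: the masking variant never fails, so an out-of-range value is accepted without any error.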
Why is the API for unsigned types so strange? I think there are several reasons:
- It is not clear whether some types like `uid_t` or `dev_t` are implemented as signed or unsigned types (it varies between OSes).
- Even if some types are unsigned and support values larger than the maximum of the corresponding signed type (like `uid_t` or `dev_t` on some OSes), some negative values can still be used as special markers for an unknown or unavailable value, so you can see `(uid_t)-1` or `(size_t)-1` in C code. It is better to accept the Python integer -1 as a special value than to require 4294967295 or 18446744073709551615.

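The two large numbers are simply -1 reduced modulo 2**32 and 2**64, which is what a C cast such as `(uid_t)-1` produces for a 32-bit or 64-bit unsigned type. A small sketch, where `sentinel` is a hypothetical helper name used only for this illustration:

```python
import ctypes


def sentinel(bits):
    """Return the value of -1 cast to an unsigned type of the given width."""
    return (-1) % (1 << bits)


print(sentinel(32))  # 4294967295
print(sentinel(64))  # 18446744073709551615

# ctypes unsigned types perform the same reduction without a range check.
print(ctypes.c_uint32(-1).value == sentinel(32))  # True
```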
There are also differences in support for int-like objects with an `__index__()` method, but that is a different painful issue.
Due to the differences between these three sets, it is difficult to write code that supports the same range for a function argument as for a value in an attribute setter. It is difficult to change code from using `PyArg_Parse()` to manual parsing with the C API and vice versa. How can we unify these APIs? An API like `PyLong_AsUnsignedLongMask()` is the most lenient, but it allows integer overflow errors. Should we limit its range as in `Py_T_ULONG`? Or maybe limit it even more, allowing only -1 as a negative value? There is a specialized private C API like `_Py_Uid_Converter()` which only accepts -1 as a negative value. In some cases any negative value is invalid (when we specify a length, etc.) and all positive values that fit the target type are valid, so there is value in the stricter `PyLong_AsUnsignedLong()`. Should we add corresponding strict codes to `PyArg_Parse()` and `PyMemberDef`?
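To compare the options side by side, here is a pure Python model of the four candidate policies for converting to `unsigned long`. It is only a sketch of the ranges described above, not the CPython implementation: the function `convert_unsigned`, the policy names, and the assumed 64-bit width are all made up for this illustration.

```python
ULONG_BITS = 64  # assumption: 64-bit unsigned long (it is 32 on some platforms)
ULONG_MAX = (1 << ULONG_BITS) - 1
LONG_MIN = -(1 << (ULONG_BITS - 1))


def convert_unsigned(value, policy):
    """Model converting a Python int to unsigned long under a policy.

    Policies (hypothetical names):
      "strict"       - only 0..ULONG_MAX, like PyLong_AsUnsignedLong()
      "minus_one"    - additionally accept -1, like _Py_Uid_Converter()
      "signed_range" - additionally accept LONG_MIN..-1, like Py_T_ULONG
      "mask"         - accept anything, like PyLong_AsUnsignedLongMask()
    """
    if 0 <= value <= ULONG_MAX:
        return value
    if policy == "mask":
        return value & ULONG_MAX
    if policy == "signed_range" and LONG_MIN <= value < 0:
        return value & ULONG_MAX
    if policy == "minus_one" and value == -1:
        return ULONG_MAX
    raise OverflowError(f"{value} is out of range for policy {policy!r}")


for policy in ("strict", "minus_one", "signed_range", "mask"):
    for value in (-1, -2, ULONG_MAX + 1):
        try:
            result = convert_unsigned(value, policy)
        except OverflowError:
            result = "OverflowError"
        print(f"{policy:>12} {value:>21} -> {result}")
```

Only `"mask"` accepts `ULONG_MAX + 1`, and only `"strict"` rejects -1, so the open question is which of the two middle policies, if any, should become the common behavior.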
I am going to add wrappers for some C structs and need support for types like `uint32_t` and `off_t` for this, so I need to resolve these questions for the older types before adding support for new types.