`calloc` and `malloc` in `array` module

page200 · September 2, 2022, 8:02pm

The array module claims to be efficient. So it should offer ~~wrappers for~~ something similar to calloc and malloc. Otherwise initializing an array is not optimally efficient.

EpicWink · September 2, 2022, 10:35pm

I see arrays as typed lists, not statically-sized. If you want programmer-controlled memory management, use a ctypes array (ctypes.c_uint * 20000), or a NumPy ndarray.
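A minimal sketch of the ctypes route mentioned above. The variable names are illustrative only; the size 20000 is the one from the post.

```python
import ctypes

# Multiplying a ctypes type by an integer yields an array *type*;
# calling that type creates an instance of fixed length.
UIntArray = ctypes.c_uint * 20000
buf = UIntArray()  # ctypes zero-initializes a new instance

buf[0] = 42        # items are read and written by index
print(len(buf), buf[0], buf[1])
print(ctypes.sizeof(buf), "bytes")
```

The length is fixed once the instance exists, which is the trade-off against `array.array`.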

da-woods · September 3, 2022, 10:53am

I don’t think it makes sense to directly add calloc and malloc - they return raw memory and Python wouldn’t know what to do with it.

I do think the array module is missing a way to construct an array that isn’t initialized from a list. Essentially equivalent to np.zeros or np.empty. That’s seemed like a genuinely useful missing feature in the past.

storchaka · September 3, 2022, 1:33pm

array('I', [0]) * 10000
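Spelled out as a runnable snippet (a sketch; only the import and the inspection lines are added to the expression above):

```python
from array import array

# Repeat a one-element array to get 10000 zero-valued unsigned ints.
zeros = array('I', [0]) * 10000

print(len(zeros))                   # number of items
print(zeros.itemsize * len(zeros))  # size of the item buffer in bytes
```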

da-woods · September 3, 2022, 2:05pm

That looks like what I want. I wouldn’t have found that by myself though.

page200 · September 3, 2022, 4:03pm

I don’t mean statically sized. I mean that initialization of arrays currently is bad.

Exactly.

For typecodes larger than 1 byte, such as the 'I' you use here, this reads that initial value 0 from memory 10000 times in order to copy it 10000 times. That’s unnecessary overhead, especially if the desired array size is much larger than the 10000 in your example.

barry-scott · September 3, 2022, 5:44pm

Are you commenting on the C implementation of array from code inspection or from benchmarking?

Do you have a PoC that you can benchmark to show the problem?
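One possible shape for such a benchmark, as a sketch only. The size `N`, the function names and the choice of three construction strategies are assumptions made for illustration, and no timings are claimed here.

```python
import timeit
from array import array

N = 1_000_000                  # assumed array length, adjust as needed
ITEMSIZE = array('Q').itemsize  # bytes per 'Q' item on this platform

def by_repeat():
    # Repeat a one-element array.
    return array('Q', [0]) * N

def by_bytes():
    # Build from a zero-filled bytes object of the right length.
    return array('Q', bytes(ITEMSIZE * N))

def by_list():
    # Build from a Python list of N ints.
    return array('Q', [0] * N)

for fn in (by_repeat, by_bytes, by_list):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds / 5:.6f} s per call")
```

All three produce equal arrays, so any difference in the printed numbers is down to how the array is constructed.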

page200 · September 3, 2022, 6:41pm

Code inspection. Here memset is used only for very small data types, whereas data types like int16 invoke memcpy:

https://github.com/python/cpython/blob/v3.10.6/Modules/arraymodule.c#L928

    size = Py_SIZE(a) * n;
    np = (arrayobject *) newarrayobject(state->ArrayType, size, a->ob_descr);
    if (np == NULL)
        return NULL;
    if (size == 0)
        return (PyObject *)np;
    oldbytes = Py_SIZE(a) * a->ob_descr->itemsize;
    newbytes = oldbytes * n;
    /* this follows the code in unicode_repeat */
    if (oldbytes == 1) {
        memset(np->ob_item, a->ob_item[0], newbytes);
    } else {
        Py_ssize_t done = oldbytes;
        memcpy(np->ob_item, a->ob_item, oldbytes);
        while (done < newbytes) {
            Py_ssize_t ncopy = (done <= newbytes-done) ? done : newbytes-done;
            memcpy(np->ob_item+done, np->ob_item, ncopy);
            done += ncopy;
        }
    }
    return (PyObject *)np;

In newer versions of CPython, this is refactored, but not changed.
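To make the `else` branch of that excerpt easier to follow, here is a rough Python model of it operating on a `bytearray`. The function name `repeat_bytes` and the copy counter are hypothetical additions for illustration; the one-byte `memset` special case is left out.

```python
def repeat_bytes(pattern, n):
    """Model of the memcpy loop: returns (buffer, number of block copies)."""
    oldbytes = len(pattern)
    newbytes = oldbytes * n
    out = bytearray(newbytes)
    if newbytes == 0:
        return out, 0
    out[:oldbytes] = pattern   # first memcpy: copy the source once
    copies = 1
    done = oldbytes
    while done < newbytes:
        # Copy the already-filled prefix onto the next free region,
        # capped by the number of bytes still missing.
        ncopy = min(done, newbytes - done)
        out[done:done + ncopy] = out[:ncopy]
        done += ncopy
        copies += 1
    return out, copies

buf, copies = repeat_bytes(bytes(4), 10000)
print(len(buf), copies)
```

Because the filled prefix doubles on every pass except possibly the last, the number of block copies in this model grows roughly with log2(n) rather than with n.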

I guess that something like array('Q', [0]) * N with a sufficiently large N is slower than calloc. Moreover, something like malloc, i.e. just initializing a large array with unknown values (so that filling it iteratively later has no overhead, as opposed to growing the array), is not available at all.

barry-scott · September 3, 2022, 9:30pm

Calloc returns a block of memory that has been zeroed out.
Now if you want 0 in it that’s great.

But if you want int(4173) in a 32-bit int, then it’s a waste of time having calloc zero the memory only for the array code to overwrite it.

I am not seeing how calloc helps.

page200 · September 4, 2022, 10:10am

This is for cases where the later code iteratively fills the array (which is very common). If the array is large at initialization (as I propose, consisting of zeros or of whatever used to be in that part of memory), writing values into the array is fast.

On the other hand, if the array isn’t large at initialization, then the array needs to be grown iteratively as the later code writes values into it iteratively. Dynamically increasing the size of the array can have considerable overhead.
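The two filling patterns contrasted above could look like this with today's `array` module. This is a sketch under assumptions: `N` and the values written are arbitrary, and it makes no claim about which variant is faster.

```python
from array import array

N = 100_000  # assumed number of items

# Pattern A: start empty and let the array grow on every append().
grown = array('Q')
for i in range(N):
    grown.append(i * i)

# Pattern B: create all N slots up front, then assign by index.
filled = array('Q', [0]) * N
for i in range(N):
    filled[i] = i * i

print(grown == filled)
```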

barry-scott · September 4, 2022, 1:51pm

You said "Moreover, something like malloc, i.e. just initializing a large array with unknown values".

This is not what malloc does. It does not write any values into the memory it allocates at all.
Given that malloc can return memory that was previously used and freed, it is not safe to assume what is in the memory.

Using calloc will write zeros into the memory that is returned for the cases where that is important.

But this code does not need that zeroing as it makes sure that each byte of the returned memory is initialised.

Calloc will slow down that code as it doubles the number of writes to memory by using it.
1 write of 0 and 1 write of init data.

Are you asking for a special case for init data that is all 0 for types wider than a byte?
Would it check that np->ob_item is all 0 and then use a memset?

page200 · September 5, 2022, 10:42am

I know. We mean the same thing. By “initializing a large array with unknown values”, I mean that a large array variable is created, and the values are not written into it, they are whatever there was in memory. For speed. I mean “initializing” as in __init__ (create an array variable), not necessarily writing stuff into it.

I am not assuming what is in the memory. Anything can be there.

Yes, if you call calloc a “special case” of creating an array. It’s one of the most normal ways to create an array in many other libraries.

No. Instead, write a calloc-like method (for example called array.zeros, similarly to numpy.zeros) and a malloc-like method (for example called array.malloc) for the Python array module.
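For illustration, a calloc-like helper can be approximated in pure Python today. The function `zeros` below is hypothetical and not part of the `array` module; it relies on the fact that a `bytes` initializer is passed to `frombytes()`. A malloc-like counterpart that leaves memory unwritten has no pure-Python equivalent, so none is sketched.

```python
from array import array

def zeros(typecode, n):
    """Hypothetical helper: an array of n zero-valued items of the given type."""
    itemsize = array(typecode).itemsize
    # bytes(k) is k zero bytes; all-zero bytes decode to 0 (or 0.0).
    return array(typecode, bytes(n * itemsize))

a = zeros('Q', 5)
print(a)
print(zeros('d', 3))
```

Whether this is faster than `array('Q', [0]) * N` would have to be measured; the sketch only shows the interface being asked for.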

pf_moore · September 5, 2022, 12:01pm

-1 on having a method that exposes uninitialised memory. I’ve no particular opinion on the calloc method, other than to note that if you’re sufficiently concerned about performance that array('Q', [0]) * N is too slow, you probably want something better than the array module, such as numpy, in any case.

storchaka · September 5, 2022, 2:20pm

If array('Q', [0]) * N is much slower than it could be, and it is a bottleneck in your program, create a PR that optimizes this case. I do not promise that it will be accepted, it depends on the benefit/complexity ratio, but it is worth a try. If it does not help, try to use NumPy. If that does not help, then perhaps Python is the wrong tool for solving your problems.