Thanks for the insights, @mjpieters! I have been implementing various custom multi-dimensional slicing methods for some time now, though I had never thought to modify the object type's own __getitem__
attribute through a class, so thank you for this. I'm pretty new to programming, and your example really cemented some concepts for me. I built the code below to play around with:
# needed imports
from collections import deque
from itertools import islice
# imports to illustrate my point
from random import random
from timeit import timeit
class CustomDeque(deque):
    def __init__(self, *args, **kwargs):
        """
        My deque class initializes from the standard class to get all
        the sweet optimization and attributes of a standard deque
        """
        # pass everything through so keyword arguments like maxlen survive
        super(CustomDeque, self).__init__(*args, **kwargs)

    def __getitem__(self, idx):
        """
        Define my own __getitem__ attribute to accept slices:
        if the slice is really just an index, default to the standard __getitem__;
        if the slice is "single-dimensional", build a returnable deque
        by passing the slicing parameters into islice;
        if the slice is "multi-dimensional", then "for" through the
        sliced deque to keep only the column of interest and return it
        """
        if isinstance(idx, int):      # only need a single index, revert to stdlib
            return super(CustomDeque, self).__getitem__(idx)
        elif isinstance(idx, slice):  # use islice and return as a deque
            return deque(islice(self, idx.start, idx.stop, idx.step))
        elif isinstance(idx, tuple):  # there's more than just a slice
            return deque([row[idx[1]]
                          for row in islice(self,
                                            idx[0].start,
                                            idx[0].stop,
                                            idx[0].step)])
custom_stack = CustomDeque([], maxlen=20)  # create a custom deque
standard_stack = deque([], maxlen=20)      # create a standard deque
for i in range(100):
    # comb over random data and FILO identical data to each deque
    row = [i, random(), random(), random()]
    custom_stack.append(row)
    standard_stack.append(row)
test_cnt = 1000000
# print average execution times for various slicing operations
print('\n')
print('standard index:',
      timeit("standard_stack[6]",
             globals=globals(),
             number=test_cnt) / test_cnt)
print('custom index:',
      timeit("custom_stack[6]",
             globals=globals(),
             number=test_cnt) / test_cnt)
print('\n')
print('standard slice:',
      timeit("deque(islice(standard_stack, 3, 8))",
             globals=globals(),
             number=test_cnt) / test_cnt)
print('custom slice:',
      timeit("custom_stack[3:8]",
             globals=globals(),
             number=test_cnt) / test_cnt)
print('\n')
print('standard 2D slice:',
      timeit("deque([row[2] for row in islice(standard_stack, 3, 8)])",
             globals=globals(),
             number=test_cnt) / test_cnt)
print('custom 2D slice:',
      timeit("custom_stack[3:8, 2]",
             globals=globals(),
             number=test_cnt) / test_cnt)
This outputs the following timing results:
standard index: 3.208229999290779e-08
custom index: 3.5772309999447314e-07
standard slice: 2.502960000419989e-07
custom slice: 6.012820000178181e-07
standard 2D slice: 5.054112999932841e-07
custom 2D slice: 1.1654833999928087e-06
My custom deque implementation is an order of magnitude slower than if I pepper my code with a bunch of deque([row[idx] for row in islice(my_deque, start, stop)])
statements, which would quickly degrade readability and maintainability. This brings me back to my concern: I have always had a nagging suspicion that I am deploying a sub-optimal implementation. The benefit of putting something in the standard library is consensus and significant peer review. I could then implement my slicing and rest assured that it is probably the best possible way of doing it in the present language, all while producing readable and maintainable code.
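A middle ground between subclassing and scattering comprehensions everywhere might be a small module-level helper that names the pattern once. This is only a sketch of that idea; slice2d is a hypothetical name, not anything from the standard library:

```python
from collections import deque
from itertools import islice

def slice2d(dq, start, stop, col=None, step=None):
    """Return dq[start:stop:step] as a new deque; if col is given,
    keep only that column of each row. (Hypothetical helper.)"""
    rows = islice(dq, start, stop, step)
    if col is None:
        return deque(rows)
    return deque(row[col] for row in rows)

# toy window: six rows of [i, i * 10]
window = deque(([i, i * 10] for i in range(6)), maxlen=6)
full_rows = slice2d(window, 1, 4)          # rows 1..3 as a deque
one_column = slice2d(window, 1, 4, col=1)  # only column 1 of rows 1..3
```

Because the helper is a plain function call rather than an overridden __getitem__, it avoids the isinstance dispatch on every access while keeping call sites readable.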
For context, I am integrating Python with data acquisition, and then making automated decisions based on data analysis (e.g. data smoothing and discrete calculus to find data peaks, etc.). These automated decisions culminate in actuation of devices in the physical world (e.g. relays, solenoids, etc.). My deques are FILO'd with the data window under analysis at any given time. The "two dimensions" come from multiple data acquisition channels. I'd like for all this to happen as close to real time as reasonably achievable. I am also fairly new to programming, which is why I would like to leave the optimization to all the smart and talented folks in core development (I aspire to be one of these folks one day).
I think I am caught in a trade-off between lists and deques. If I use lists, I get more efficient random access, but then I would need to define my own pushing and popping routines. If I use deques, I get less efficient random access, but the queue functionality is already optimized for me.
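To make the list side of that trade-off concrete, here is a minimal sketch of a fixed-size window backed by a plain list; ListWindow is a hypothetical name I made up for illustration, and the pop(0) shift is exactly the O(n) cost deques were designed to avoid:

```python
class ListWindow:
    """A fixed-size data window backed by a list (illustrative sketch)."""

    def __init__(self, maxlen):
        self._buf = []
        self._maxlen = maxlen

    def append(self, row):
        # emulate deque(maxlen=...): drop the oldest row when full
        self._buf.append(row)
        if len(self._buf) > self._maxlen:
            self._buf.pop(0)  # O(n) shift -- the price lists pay on append

    def __getitem__(self, idx):
        # lists give O(1) random access and native slice support,
        # so 2D access needs only the column comprehension
        if isinstance(idx, tuple):
            rows, col = idx
            return [row[col] for row in self._buf[rows]]
        return self._buf[idx]

w = ListWindow(3)
for i in range(5):
    w.append([i, i * i])
oldest = w[0]          # oldest retained row
column = w[0:2, 1]     # column 1 of the first two retained rows
```

Which side wins likely depends on the ratio of appends to random reads in the acquisition loop; with small windows like maxlen=20 the difference may not matter much either way.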
An obvious question would be: why don't I use a "faster" language? The answer is that, for various reasons, my human-machine interface (HMI) is a common web browser. I serve the machine controls and data insights to a browser via JavaScript, HTML5, and Python Flask. I have yet to find a language better suited for melding web development and back-end data science under one roof (though again, I am relatively inexperienced; if you know of one, let me know).