Add startswith and friends as methods on memoryview

tapple · February 12, 2024, 12:18am

I was working on an ASN.1 scanner for an SNMP app, and I’m using memoryview for speed. I needed startswith to test one OID against another (OIDs are keys in an SNMP database). While doing an SNMP walk, you need to check whether the data the server sent is within the subtree you are looking for, and this is a prefix match. It can be done on the binary representation of the OID (hence startswith) and does not require parsing the OID. (OIDs are a sequence of base-128 encoded bigints, with the high bit of each byte used to mark where a number ends.)
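To make the prefix argument concrete, here is a minimal sketch of the encoding and the check. The helper names (encode_subid, encode_oid_body, oid_in_subtree) are made up for illustration and are not from any SNMP library. It assumes the usual BER rules: each number is split into 7-bit groups, every byte except the last has its high bit set, and the first two arcs share one number (40 * first + second). The ASN.1 tag and length bytes are left out.

```python
def encode_subid(n: int) -> bytes:
    """Encode one non-negative subidentifier as 7-bit groups, high bit = 'more follows'."""
    if n < 0:
        raise ValueError("subidentifiers must be non-negative")
    groups = [n & 0x7F]                    # last byte: high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)   # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(groups))


def encode_oid_body(arcs) -> bytes:
    """Encode the content bytes of an OID (hypothetical helper, no tag/length)."""
    first, second, *rest = arcs
    body = encode_subid(40 * first + second)
    for arc in rest:
        body += encode_subid(arc)
    return body


def oid_in_subtree(oid: memoryview, root: bytes) -> bool:
    """Prefix test on the raw bytes; the slice is a view, not a copy."""
    return oid[:len(root)] == root


root = encode_oid_body((1, 3, 6, 1, 2, 1))
inside = memoryview(encode_oid_body((1, 3, 6, 1, 2, 1, 1, 5, 0)))
outside = memoryview(encode_oid_body((1, 3, 6, 1, 4, 1, 9)))
print(oid_in_subtree(inside, root), oid_in_subtree(outside, root))
```

Because a complete encoded OID always ends on a byte with the high bit clear, a byte-level match of the whole root cannot stop in the middle of a number.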

I really expected memoryview objects to have a startswith method, but they don’t. I think adding it would be a good idea.

While I was thinking about it, I checked all the other bytes/bytearray methods to see which ones might also be useful to add to memoryview. I came up with the following rankings:

1. boolean-valued slicing shortcuts

I rank these as my #1 pick, because I already demonstrated a use for them, and they are pretty simple to implement.

I also really wish these methods existed on collections.abc.Sequence. They are suitable as mixin methods.

startswith
endswith
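Until something like this exists, the two methods can be approximated with slice comparisons. This is only a sketch under assumptions: mv_startswith and mv_endswith are hypothetical names, and the code assumes a one-dimensional view of bytes compared against a bytes prefix or suffix.

```python
def mv_startswith(view: memoryview, prefix: bytes) -> bool:
    """True if `view` begins with `prefix`; compares a zero-copy slice."""
    # A view shorter than the prefix yields a shorter slice, which compares unequal.
    return view[:len(prefix)] == prefix


def mv_endswith(view: memoryview, suffix: bytes) -> bool:
    """True if `view` ends with `suffix`; compares a zero-copy slice."""
    n = len(suffix)
    # Use an explicit start index: view[-0:] would be the whole view.
    return n <= len(view) and view[len(view) - n:] == suffix


data = memoryview(b"\x2b\x06\x01\x02\x01")
print(mv_startswith(data, b"\x2b\x06"), mv_endswith(data, b"\x02\x01"))
```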

2. read-only searching from collections.abc.Sequence.

Add these, and memoryview could be typed as collections.abc.Sequence. I think that would be a good thing. They can be mixed in, and that may be fine, but folks would probably prefer an optimized C implementation (which can likely be copied from bytes).

count
index
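As a rough illustration of the "mixed in" option, the pure-Python mixin methods on collections.abc.Sequence only need indexing, len and iteration, so they can be called directly with a memoryview as the first argument. The wrapper names mv_index and mv_count are hypothetical. Note that these are element-wise (one integer at a time for a byte view), unlike bytes.index, which also searches for subsequences.

```python
from collections.abc import Sequence


def mv_index(view: memoryview, value, start=0, stop=None) -> int:
    """Position of the first element equal to `value`; raises ValueError if absent."""
    return Sequence.index(view, value, start, stop)


def mv_count(view: memoryview, value) -> int:
    """Number of elements equal to `value`."""
    return Sequence.count(view, value)


view = memoryview(b"banana")
print(mv_index(view, ord("n")), mv_count(view, ord("a")))
```

This loops in Python, so it is expected to be much slower than a C implementation on large buffers.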

3. memoryview-valued slicing shortcuts

I think these would be a really good match for memoryview, but I don’t have a use case. I can imagine them being useful in some binary data handling scenarios, like zero-delimited data:

removeprefix
removesuffix
partition
rpartition
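For the zero-delimited case, a sketch of what view-valued partition and removeprefix might look like. The names mv_partition and mv_removeprefix are hypothetical, the code assumes a one-dimensional byte view, and it leans on the fact that the re module accepts bytes-like objects to locate the separator without copying the buffer.

```python
import re


def mv_partition(view: memoryview, sep: bytes):
    """Split at the first `sep`; all three results are views on the same buffer."""
    if not sep:
        raise ValueError("empty separator")
    match = re.search(re.escape(sep), view)
    if match is None:
        # Mirror bytes.partition: the whole input, then two empty parts.
        return view, view[:0], view[:0]
    return view[:match.start()], view[match.start():match.end()], view[match.end():]


def mv_removeprefix(view: memoryview, prefix: bytes) -> memoryview:
    """Return a view past `prefix` if present, else the original view."""
    if view[:len(prefix)] == prefix:
        return view[len(prefix):]
    return view


record = memoryview(b"key\x00value\x00rest")
head, sep, tail = mv_partition(record, b"\x00")
print(bytes(head), bytes(tail))
```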

4. remaining searching ops from bytes

They enhance index. If index is included, I can’t see a good argument against adding these

find
rfind
rindex

5. slicing shortcuts that are only really useful on text

They would work on a memoryview of any type, but they would really only be used if its datatype is bytes representing ASCII text. I don’t recommend adding them

rsplit
split
strip
lstrip
rstrip

6. slicing shortcuts that require the datatype to be bytes

This makes zero sense on non-bytes memoryviews. Do not recommend

splitlines

7. character testing.

These make zero sense on non-byte memoryviews. Do not recommend

isalnum
isalpha
isascii
isdigit
islower
isspace
istitle
isupper

8. operations that copy

If you need to copy, convert to bytes. Copying defeats the whole point of memoryview. Do not recommend

join
replace
zfill
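A small sketch of the "convert to bytes" route, assuming a byte view: slice first so only the part you need is copied, then use the ordinary bytes method on the copy.

```python
view = memoryview(b"a,b,c")

field = view[2:5]                       # zero-copy slice of the buffer
copied = bytes(field)                   # the one explicit copy
replaced = copied.replace(b",", b";")   # bytes.replace returns a new object

print(replaced)
```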

9. operations that copy, and only make sense on text

If you need to copy, convert to bytes. Copying defeats the whole point of memoryview. Do not recommend

decode
maketrans
translate
center
ljust
rjust
capitalize
expandtabs
lower
swapcase
title
upper

Daverball · February 12, 2024, 6:38am

You may also be interested in this recent discussion, which came up in the context of PEP 467 and proposed more types of view objects and/or better interpreter-level support for zero-copy slice operations:

alexprengere · February 12, 2024, 9:20am

I had a similar need very recently.
I maintain software that reads big messages coming from sockets (a few dozen MB each). I wanted to switch from bytes to memoryviews to avoid copying when slicing. And it worked perfectly, with a nice performance improvement.
But! This is not a drop-in replacement, so I had to make some changes during that transition.
I mostly got away with it using regexes; it turns out they work well on memoryviews. For example, I used re.finditer(b"'", mview) to replace some s.find(b"'") calls.
YMMV of course, as you may introduce slowdowns by using more regex.
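A minimal sketch of that regex workaround, with a hypothetical helper name (mv_find) and an assumed one-dimensional byte view. It mimics the basic bytes.find contract: the index of the first match at or after start, or -1 if there is none.

```python
import re


def mv_find(view: memoryview, needle: bytes, start: int = 0) -> int:
    """Index of the first occurrence of `needle` at or after `start`, or -1."""
    # re.escape keeps regex metacharacters in the needle literal.
    match = re.compile(re.escape(needle)).search(view, start)
    return match.start() if match else -1


mview = memoryview(b"a'b'c")
first = mv_find(mview, b"'")
second = mv_find(mview, b"'", first + 1)
every = [m.start() for m in re.finditer(b"'", mview)]
print(first, second, every)
```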
