Add startswith and friends as methods on memoryview

I was working on an ASN.1 scanner for an SNMP app, and I’m using memoryview for speed. I needed startswith to test one OID against another (OIDs are the keys in an SNMP database). During an SNMP walk, you need to check whether the data the server sent is still within the subtree you are looking for, and this is a prefix match. It can be done on the binary representation of the OID (hence startswith) and does not require parsing the OID. (OIDs are a sequence of base-128 encoded bigints, with the high bit of each byte used to mark where a number ends.)

I really expected memoryview objects to have a startswith method, but they don’t. I think adding it would be a good idea.
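For illustration, here is a minimal sketch of the workaround I mean: slice the view and compare. The helper names (`encode_subid`, `encode_oid_body`, `view_startswith`) are hypothetical, and the encoder only produces the OID content octets (no tag or length), assuming the usual BER rule that the first two arcs are combined as 40*x + y.

```python
def encode_subid(n: int) -> bytes:
    """Encode one non-negative OID sub-identifier as base-128 octets."""
    if n < 0:
        raise ValueError("sub-identifiers must be non-negative")
    out = [n & 0x7F]                   # last octet: high bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # earlier octets: high bit set
        n >>= 7
    return bytes(reversed(out))


def encode_oid_body(arcs) -> bytes:
    """Encode OID content octets; the first two arcs share one sub-identifier."""
    first, second, *rest = arcs
    return b"".join(encode_subid(v) for v in (40 * first + second, *rest))


def view_startswith(view: memoryview, prefix) -> bool:
    """Zero-copy prefix test: compare a slice of the view with the prefix."""
    n = len(prefix)
    return len(view) >= n and view[:n] == prefix


subtree = encode_oid_body((1, 3, 6, 1, 2, 1))
received = memoryview(encode_oid_body((1, 3, 6, 1, 2, 1, 1, 1, 0)))
print(view_startswith(received, subtree))
```

Because every sub-identifier is self-delimiting, a byte-level prefix match on the encoded form lines up with a prefix match on the arcs.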

While I was thinking about it, I checked all the other bytes/bytearray methods to see which ones might also be useful to add to memoryview. I came up with the following rankings:

1. boolean-valued slicing shortcuts

I rank these as my #1 pick, because I already demonstrated a use for them, and they are pretty simple to implement.

I also really wish these methods existed on collections.abc.Sequence. They are suitable as mixin methods.

  • startswith
  • endswith
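To show what I mean by mixin methods, here is a rough sketch that only relies on `__len__` and `__getitem__`. `AffixMixin` and `Path` are made-up names for the example, and unlike bytes.startswith this does not accept a tuple of prefixes or start/end arguments.

```python
from collections.abc import Sequence


class AffixMixin:
    """Hypothetical mixin: needs only __len__ and __getitem__ from the host class."""

    def startswith(self, prefix) -> bool:
        n = len(prefix)
        return len(self) >= n and all(self[i] == prefix[i] for i in range(n))

    def endswith(self, suffix) -> bool:
        n = len(suffix)
        offset = len(self) - n
        return offset >= 0 and all(self[offset + i] == suffix[i] for i in range(n))


class Path(AffixMixin, Sequence):
    """Tiny demo sequence wrapping a tuple."""

    def __init__(self, *parts):
        self._parts = parts

    def __len__(self):
        return len(self._parts)

    def __getitem__(self, i):
        return self._parts[i]


p = Path(1, 3, 6, 1)
print(p.startswith((1, 3)), p.endswith((6, 1)))
```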

2. read-only searching from collections.abc.Sequence.

Add these, and memoryview could be typed as collections.abc.Sequence. I think that would be a good thing. They can be mixed in, and that may be fine, but folks would probably prefer an optimized C implementation (which can likely be copied from bytes).

  • count
  • index
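As a sketch of the "mixed in" option, the pure-Python mixin methods on collections.abc.Sequence can be called directly with a memoryview as `self`, since they only use indexing and iteration. The wrapper names `view_index` and `view_count` are hypothetical. Note that these search for a single element, not a sub-sequence as bytes.index does.

```python
from collections.abc import Sequence


def view_index(view, value, start=0, stop=None):
    """Borrow Sequence.index; raises ValueError if the element is absent."""
    return Sequence.index(view, value, start, stop)


def view_count(view, value):
    """Borrow Sequence.count; counts elements equal to value."""
    return Sequence.count(view, value)


mv = memoryview(b"abcabc")
print(view_index(mv, ord("b")), view_count(mv, ord("c")))
```

This walks the view one element at a time in Python, which is the slowness an optimized C implementation would avoid.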

3. memoryview-valued slicing shortcuts

I think these would be a really good match for memoryview, but I don’t have a use case. I can imagine them being useful in some binary data handling scenarios, like zero-delimited data:

  • removeprefix
  • removesuffix
  • partition
  • rpartition
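A rough sketch of how two of these could behave, returning slices of the original view instead of copies. The function names are hypothetical, and the search in `view_partition` leans on the re module (which accepts bytes-like objects) only because memoryview has no find of its own.

```python
import re


def view_removeprefix(view: memoryview, prefix: bytes) -> memoryview:
    """Return view without the leading prefix, or view unchanged if absent."""
    n = len(prefix)
    if len(view) >= n and view[:n] == prefix:
        return view[n:]
    return view


def view_partition(view: memoryview, sep: bytes):
    """Split at the first sep; all three parts are views, nothing is copied."""
    m = re.search(re.escape(sep), view)
    if m is None:
        return view, view[:0], view[:0]
    return view[:m.start()], view[m.start():m.end()], view[m.end():]


data = memoryview(b"key\x00value\x00rest")
head, sep, tail = view_partition(data, b"\x00")
print(bytes(head), bytes(tail))
```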

4. remaining searching ops from bytes

They enhance index. If index is included, I can’t see a good argument against adding these.

  • find
  • rfind
  • rindex

5. slicing shortcuts that are only really useful on text

They would work on a memoryview of any type, but I can’t think of a reason anyone would use them unless its datatype is bytes representing ASCII text. I don’t recommend adding them.

  • rsplit
  • split
  • strip
  • lstrip
  • rstrip

6. slicing shortcuts that require the datatype to be bytes

This makes zero sense on non-bytes memoryviews. Do not recommend.

  • splitlines

7. character testing.

These make zero sense on non-bytes memoryviews. Do not recommend.

  • isalnum
  • isalpha
  • isascii
  • isdigit
  • islower
  • isspace
  • istitle
  • isupper

8. operations that copy

If you need to copy, convert to bytes. Copying defeats the whole point of memoryview. Do not recommend.

  • join
  • replace
  • zfill

9. operations that copy, and only make sense on text

If you need to copy, convert to bytes. Copying defeats the whole point of memoryview. Do not recommend.

  • decode
  • maketrans
  • translate
  • center
  • ljust
  • rjust
  • capitalize
  • expandtabs
  • lower
  • swapcase
  • title
  • upper

You may also be interested in this recent discussion that came up in the context of PEP 467 which proposed more types of view objects and/or better interpreter level support for zero-copy slice operations:

I had a similar need very recently.
I maintain software that reads big messages coming from sockets (a few dozen MB). I wanted to switch from bytes to memoryviews to avoid copying when slicing. And it worked perfectly, with a nice performance improvement :wink:
But! This is not a drop-in replacement, so I had to make some changes during that transition.
I mostly got away with it using regex, which turns out to work well on memoryviews. For example, I used re.finditer(b"'", mview) to replace some s.find(b"'") calls.
YMMV of course, as you may introduce slowdowns by using more regex.
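A small sketch of that kind of replacement, in case it helps. `view_find` is a hypothetical helper that mimics bytes.find on a memoryview by way of a compiled pattern; it is not what my code looks like verbatim.

```python
import re


def view_find(view: memoryview, needle: bytes, start: int = 0) -> int:
    """Return the index of the first needle at or after start, or -1 if absent."""
    m = re.compile(re.escape(needle)).search(view, start)
    return m.start() if m else -1


mview = memoryview(b"a'b'c")
print(view_find(mview, b"'"))
print([m.start() for m in re.finditer(b"'", mview)])
```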
