More kinds of view objects in the standard library to enable zero copy slicing

Big +1 for this, especially if we can also have a writeable strview backed by e.g. an io.StringIO object. It has always kind of irked me how wasteful str is for simple slicing operations (or concatenation in the writeable case; StringIO is a decent solution there, but a strview API could be a bit more ergonomic than StringIO). You'd likely want strview to be fully compatible with str, i.e. everywhere that accepts a str can also take a strview without any surprises, so things like __iadd__ could modify the contents of a writeable strview in place rather than return a new str.

That one would have to be backed by a bytearray, and so isn’t going to benefit from any internals. You could do it as a 3rd party module today and propose it for inclusion. (And there’s no way it’s going to be compatible with str apart from implementing __str__ to realize an actual string.)

Zero-copy slicing of str and bytes would be done inside the runtime. Even if it were implemented separately first, it would get a complete rewrite when being merged in. That’s the only way to make it compatible with the actual types, unfortunately.

I wouldn’t say there’s no way, but it would certainly be a much more invasive change to the interpreter’s internals to make it work. I’m not sure how much a runtime library could really demonstrate the benefits of something like a strview, exactly because of the lack of support in the interpreter. A lot of str methods boil down to slicing in the end, such as (r)split, (r|l)strip, (r)partition, and remove(pre|suf)fix, and those could benefit from returning views as well. But if you need to create a new str instance to call str methods on those views, as in " 123 ".strip().isnumeric(), then you actually end up using more memory instead of less, with the same number of copies: you go from str to strview only to immediately convert the view back to a str so the isnumeric call works.
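To make that concrete, here's a minimal sketch of a purely hypothetical read-only StrView (not a proposed API) that shows exactly where the materialization cost comes back:

```python
class StrView:
    """Hypothetical minimal read-only view over a str (illustration only).

    Stores a reference to the base string plus offsets instead of copying.
    """
    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __str__(self):
        # Materializing copies the characters: this is the step that erases
        # the savings whenever a real str is needed right away.
        return self._base[self._start:self._stop]

    def strip(self):
        # Zero-copy strip: just move the offsets inward, no allocation.
        start, stop = self._start, self._stop
        while start < stop and self._base[start].isspace():
            start += 1
        while stop > start and self._base[stop - 1].isspace():
            stop -= 1
        return StrView(self._base, start, stop)


v = StrView(" 123 ").strip()       # no copy yet
# isnumeric() lives on str, so we must materialize a copy anyway:
print(str(v).isnumeric())          # True, but a new str was allocated
```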

So you would only really see benefits in consecutive slicing operations, such as e.g. {k: v for p in "1:2 2:3 4:5".split() for k, v in [p.split(':', 1)]}, which are a little less common by comparison.
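Written out runnably (using dict() over the split pairs, an equivalent spelling of the same idea):

```python
# Consecutive slicing: split on whitespace, then split each pair on ':'.
# With a hypothetical strview, every intermediate piece could share the
# original buffer instead of being copied.
pairs = "1:2 2:3 4:5"
mapping = dict(p.split(":", 1) for p in pairs.split())
print(mapping)  # {'1': '2', '2': '3', '4': '5'}
```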

I think to fully reap the benefits of a strview there would have to be pretty broad interpreter support for it (i.e. internals generally expect to operate on a strview rather than a str, with str being a valid strview as well from the interpreter’s perspective[1]).

I might investigate the possibility of a C extension as a proof of concept if I find myself with ample time, although I’m not very hopeful it can fully take advantage of its conceptually copy-free nature there either. But this is getting severely off-topic, so I’ll stop there.


  1. and once you have to change the internals that much, you might as well allow it to be writeable, backed by an arbitrary buffer ↩︎

Indeed, I view these as internal implementation details and would not want an explicit strview or bytesview exposed to users as a Type or API. More basic builtin types sounds complicated and confusing. It could be transparent and automatic within str and bytes implementations themselves.

The need for that has not been demonstrated to be high, as far as I’m aware, even though it obviously tickles many of our purist spidey-senses about theoretically “unnecessary” memory copies. Just doing a copy is a lot simpler and often faster than pointer chasing, reference tracking & releasing, and the related conditional branching. The thresholds for this should vary a lot by specific microarchitecture. For existing larger data I suspect people are already manually using memoryview or doing equivalents within third-party library types (numpy? pandas?).
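On the existing-tools point: memoryview already gives zero-copy, even writeable, slices over bytes-like buffers today, which is roughly the behaviour being discussed (for bytes rather than str):

```python
# memoryview slices share the underlying buffer instead of copying it.
data = bytearray(b"the quick brown fox")
view = memoryview(data)

word = view[4:9]           # no copy: this slice aliases data[4:9]
print(bytes(word))         # b'quick'

data[4:9] = b"QUICK"       # mutating the buffer is visible through the view
print(bytes(word))         # b'QUICK'

word.release()             # drop the export so the bytearray can resize again
view.release()
```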

Regardless, discussion of how this could be done and what it’d look like is better left for a separate memory performance focused thread if anyone wants to explore that. – I do believe it could be implemented but I wouldn’t be convinced that doing it is worthwhile without an implemented demo and practical Python application benchmarks running on top of that showing the automatic resource savings.


(Aside:

FWIW, there’s a decent amount of chatter out there about .NET’s Span type for doing zero-copy slices (example; and there are massive compile-time improvements since they made the compiler use it too, but I didn’t find the post about that so easily). The threshold where it starts helping is definitely going to vary a lot though, so it’s not a slam dunk. Plus I would much prefer it to be transparent, since we can do that.

It’ll be interesting to see how our memory allocators handle synchronisation changes for free-threaded builds. I think there’s a chance we’ll start seeing real-world contention there, and anything to reduce allocation count/frequency (as opposed to size) might start to matter.
)

Hmm, can the discussion about zero-copy slices be moved to a separate discussion somewhere?


Sure, if people want. Just lmk what specific posts should be moved, to make sure I don’t mess up :slight_smile:

Probably these ones:

https://discuss.python.org/t/pep-467-minor-api-improvements-for-binary-sequences/42001/25
https://discuss.python.org/t/pep-467-minor-api-improvements-for-binary-sequences/42001/26
https://discuss.python.org/t/pep-467-minor-api-improvements-for-binary-sequences/42001/27
https://discuss.python.org/t/pep-467-minor-api-improvements-for-binary-sequences/42001/28
https://discuss.python.org/t/pep-467-minor-api-improvements-for-binary-sequences/42001/32


Done, thanks; @Daverball please feel free to rename to something more specific/descriptive :slight_smile: