Hello, I just found "gh-139871: Add `bytearray.take_bytes([n])` to efficiently extract `by… · python/cpython@732224e · GitHub" and saw that so so so many of the changes I had been making on my own were already being realized! This work is so so exciting!!!
I am not sure of the best venue to mention this, but I spent much of last year working on path strings and VFS (filesystem) operations in `std::{fs,path}` in the rust stdlib. In particular, the `getdents()` libc call provides much greater atomicity guarantees and has been standardized by POSIX as of 2024 (as `posix_getdents()`); see "Add musl and glibc bindings for getdents{,64} by cosmicexplorer · Pull Request #4522 · rust-lang/libc · GitHub". It was immediately supported in musl libc.
This thread investigates the goal of zero-copy i/o, which I’m just delighted to see. There are a few avenues I’ve been investigating along these lines:
- First of all, the `getdents()` libc call introduces a very peculiar set of alignment and lifetime constraints: https://github.com/rust-lang/rust/issues/43467#issuecomment-3741642799
  - I tried to describe how the buffer provided to `getdents()` has very specific alignment requirements, but carries no knowledge of the size or field layout of each entry in the buffer. I also noted that the OS writing to your memory through the syscall introduces a type of ABI.
  - cpython will have a somewhat easier time with this, using ref counting to explicitly mark those lifetimes.
  - My current attempt at this in the rust stdlib is here: Comparing rust-lang:main...cosmicexplorer:getdents-fs-read_dir · rust-lang/rust · GitHub
  - This does not quite work yet, but it demonstrates one way to handle these complex lifetimes.
- On the subject of buffers: `readlink{,at}()` is sometimes done with an allocating loop, as in the rust stdlib.
  - If you examine the spec, you can do it without any new allocations: Muti- lated in any way.
  - This logic is plugged into an allocating loop later in the file with `ops::ControlFlow`.
- SIMD operation checks in the configure script for byte scanning: Comparing python:main...cosmicexplorer:byte-set-splitting · python/cpython · GitHub
  - This was motivated particularly by optimizing pip until it became unrecognizable ("Remove the experimental fast-deps feature by pradyunsg · Pull Request #11478 · pypa/pip · GitHub" has some context), but as the workstream in this thread has found (and fixed), there are many, many operations on bytes that are highly pessimal.
  - For pip, I was able to introduce some very wicked caching into url quoting that improved over cpython, but found that the C-level string searching in cpython was really at fault here: Comparing python:main...cosmicexplorer:byte-set-splitting · python/cpython · GitHub
  - While we should probably look to hyperscan for the really complex work, we can handle the simpler cases with just a few instructions: https://0x80.pl/notesen/2018-10-18-simd-byte-lookup.html
  - Also note my incredibly in-depth documentation for my re2 and hyperscan rust wrapper crates:
  - See other notes I provided in this prototype branch: Comparing python:main...cosmicexplorer:byte-set-splitting · python/cpython · GitHub
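On the `readlink{,at}()` item above, here is a rough sketch of the non-allocating core, written in Python via `ctypes` and not taken from the rust branch. It assumes a POSIX libc reachable through `ctypes.CDLL(None)`; `readlink_into` is a hypothetical name. The only thing it relies on from the spec is that `readlink()` returns the number of bytes placed in the buffer and does not NUL-terminate, so a return value equal to the buffer size means the result may have been truncated.

```python
import ctypes
import os
from typing import Optional

# Assumption: the process's libc is reachable this way (true on typical
# Linux and macOS builds of CPython).
_libc = ctypes.CDLL(None, use_errno=True)
_libc.readlink.argtypes = [
    ctypes.c_char_p,                 # path
    ctypes.POINTER(ctypes.c_char),   # caller-owned output buffer
    ctypes.c_size_t,                 # buffer size
]
_libc.readlink.restype = ctypes.c_ssize_t


def readlink_into(path: bytes, buf) -> Optional[int]:
    """Write the link target into `buf` (a ctypes char array).

    Returns the number of bytes written, or None if the buffer was filled
    completely and the target may therefore have been truncated. No buffer
    is allocated here; the caller decides whether to retry with a bigger one.
    """
    n = _libc.readlink(path, buf, len(buf))
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), os.fsdecode(path))
    if n == len(buf):
        return None  # possibly truncated: readlink() does not report this itself
    return n
```

A caller can reuse one buffer (for example `ctypes.create_string_buffer(4096)`) across many calls and only fall back to a growing loop when `None` comes back.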
I am absolutely not yet an expert on bits and bytes and SIMD, but I know I can make finding SIMD instructions in our configure script extremely robust. I am actually looking to do a PhD thesis on parsing and text search some day and would love to help contribute to this kind of work in any way I can.
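To make the byte-set lookup idea a bit more concrete, here is a scalar Python model of a nibble-indexed bitmap, which as I understand it is the kind of table a shuffle-based SIMD lookup can evaluate for many bytes at once. It is an illustration only and not the code in the prototype branch; `build_nibble_table` and `find_first_in_set` are hypothetical names.

```python
def build_nibble_table(byte_set):
    """Build a 16-row table: row `lo` is a 16-bit mask whose bit `hi` is set
    iff the byte (hi << 4) | lo is a member of `byte_set`."""
    table = [0] * 16
    for b in byte_set:
        table[b & 0x0F] |= 1 << (b >> 4)
    return table


def find_first_in_set(data, table):
    """Return the index of the first byte of `data` that is in the set,
    or -1 if there is none.

    A SIMD version would fetch the rows for a whole vector of low nibbles
    with one shuffle and test the high-nibble bits in parallel; this loop
    does the same two-step lookup one byte at a time.
    """
    for i, b in enumerate(data):
        if (table[b & 0x0F] >> (b >> 4)) & 1:
            return i
    return -1
```

For url splitting, the set would be something like `b"/?#"`, and the table is built once and reused for every scan.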
Once again: I was overjoyed to see someone else working on this and identifying how generally useful it can be for perf. I have two specific cases (url quoting and reading directory entries) where I think we can improve perf a huge amount. I also think URL parsing can be improved in this and other ways.
Please let me know if any of that would be useful to investigate further! I have *not* yet prototyped using `getdents()` over `readdir()` in cpython (but `getdents()` is just perfect for python coroutines!!!).
Thanks!!