Simulating on-disk files

Stefan2 · October 14, 2024, 3:23pm

What times did you get?

elis.byberi · October 14, 2024, 3:36pm

It depends on the computer, but here are the times on a single old machine for different file sizes:

1.6 MB:

time_no_buffer  : 0.4188990592956543
time_with_buffer: 0.1870136260986328

160 MB:

time_no_buffer  : 6.558740615844727
time_with_buffer: 3.561988353729248

Stefan2 · October 14, 2024, 3:43pm

I tried it myself just now and consistently get times like this, with the “no buffer” version being roughly twice as fast:

time_no_buffer  : 0.02334427833557129
time_with_buffer: 0.0419316291809082

elis.byberi · October 14, 2024, 4:04pm

Kaggle Notebooks:

time_no_buffer  : 0.014356851577758789
time_with_buffer: 0.010879755020141602

tunedal · October 14, 2024, 4:30pm

Named pipes are available on Windows as well, although the API is different. You can call CreateNamedPipeW to create the server end of the pipe and ConnectNamedPipe to accept a connection from a client. The filename will be something like \\.\pipe\foobar.

There are wrappers for these functions hidden away in Python’s private _winapi module (maybe not recommended?) as well as various third-party wrappers, such as the win32pipe module which is part of pywin32. Or you can call the win32 API directly with ctypes.
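For what it's worth, here is a minimal sketch of one more route that stays within the public standard library: `multiprocessing.connection` accepts the `'AF_PIPE'` family on Windows, which is backed by named pipes. This is a swapped-in alternative, not the `CreateNamedPipeW` / `ConnectNamedPipe` calls themselves, and the connection objects exchange whole messages rather than behaving like a plain file. The pipe name `foobar` and the helper `pipe_round_trip` are made up for illustration, and the non-Windows branch falls back to a Unix socket only so that the snippet runs elsewhere.

```python
import sys
import threading
from multiprocessing.connection import Client, Listener


def pipe_round_trip(payload: bytes) -> bytes:
    """Send payload from a client to a server end and return what arrived.

    Hypothetical helper, for illustration only.
    """
    if sys.platform == "win32":
        # Named pipe; the name "foobar" is an arbitrary example.
        listener = Listener(r"\\.\pipe\foobar", family="AF_PIPE")
    else:
        # Fallback so the sketch also runs on POSIX systems.
        listener = Listener(family="AF_UNIX")

    received = []

    def serve():
        # Accept one client and read a single message from it.
        with listener.accept() as conn:
            received.append(conn.recv_bytes())

    server = threading.Thread(target=serve)
    server.start()

    # The client finds the server through the listener's address.
    with Client(listener.address) as client:
        client.send_bytes(payload)

    server.join()
    listener.close()
    return received[0]


if __name__ == "__main__":
    print(pipe_round_trip(b"hello through the pipe"))
```

If another program has to open the pipe by its filename and read raw bytes, the lower-level Win32 calls mentioned above are probably still the better fit.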

fribl · October 14, 2024, 4:41pm

Interesting. Thanks for the information!

fribl · October 14, 2024, 4:56pm

Trying this a few times on my Windows laptop, I get a ratio (buffer/no-buffer) of about 0.5 to 0.55.

mwr · October 17, 2024, 1:10pm

Your question is interesting because it can be taken even further:
a complete emulation of the file system, with file sandboxes created right at the interpreter level.
Python implements event auditing, and of course many file-related calls are monitored there. It seems to me that one could go much further and create a full-fledged hook system that can replace the results of system calls with its own.
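To illustrate what the auditing mechanism offers today, here is a small sketch that records every path passed to the built-in `open()`. It uses `sys.addaudithook` and the documented `open` audit event; the names `opened_paths` and `record_open` are just chosen for this example.

```python
import os
import sys
import tempfile

opened_paths = []


def record_open(event, args):
    # The "open" event is raised with the arguments (path, mode, flags).
    if event == "open":
        opened_paths.append(args[0])


# Note: once added, an audit hook stays installed for the rest of the process.
sys.addaudithook(record_open)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.txt")
    with open(target, "w") as f:
        f.write("hello")

print(target in opened_paths)
```

As far as I can tell, a hook like this can observe a call, or abort it by raising an exception, but it cannot substitute its own return value. That is the part a full hook system would have to add.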
And as for your specific question, using temporary in-memory buffers through io.BytesIO (or io.StringIO for text) doesn’t seem to be a problem, regardless of the operating system.
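A minimal sketch of that idea, assuming the code under test only needs a file-like object and not a real path. The function `count_lines` is a hypothetical stand-in for whatever would normally read from disk.

```python
import io


def count_lines(stream):
    """Hypothetical consumer: works on any file-like object opened for reading."""
    return sum(1 for _ in stream)


# In-memory stand-ins for a binary file and a text file.
binary_file = io.BytesIO(b"first\nsecond\nthird\n")
text_file = io.StringIO("alpha\nbeta\n")

binary_lines = count_lines(binary_file)
text_lines = count_lines(text_file)

# Like a real file, the buffer keeps a position; rewind before reading again.
binary_file.seek(0)
first_line = binary_file.readline()

print(binary_lines, text_lines, first_line)
```

These objects have no name on disk and no real file descriptor (`fileno()` raises `io.UnsupportedOperation`), so this only helps when the consumer accepts a file object rather than a filename.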