Concurrent file opening on Windows (multiple threads or multiple processes)

Hello!

I’m a developer on dd-trace-py and I recently encountered some issues when trying to concurrently open files on Windows. I would appreciate any feedback on this, whether I did something wrong or if perhaps CPython should improve its public API for opening files on Windows (I’d be happy to help).

Problem and Context
I needed to use a simple low-level queue to exchange data between processes. multiprocessing was not an option for various reasons, so I decided to use a simple shared file. On Unix (Linux, macOS), there was no problem.
To open the file I used open, and to lock it so that concurrent accesses would not cause problems, I used fcntl.lockf. Everything worked great.
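For reference, the Unix side of this approach can be sketched as a tiny append-only queue file guarded by fcntl.lockf. This is a minimal sketch, not dd-trace-py’s code: the record format (a 4-byte length prefix) and the names append_record and drain_records are hypothetical. Note that fcntl.lockf locks belong to the process, so threads inside one process do not block each other and still need their own threading.Lock.

```python
import fcntl
import os
import struct

# Hypothetical record format: 4-byte big-endian length prefix, then the payload.
_HEADER = struct.Struct(">I")


def append_record(path, payload):
    """Append one length-prefixed record while holding an exclusive lock."""
    with open(path, "ab") as f:
        # start=0, len=0, whence=0 (the defaults) lock the whole file.
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            f.write(_HEADER.pack(len(payload)) + payload)
            f.flush()  # push the bytes to the OS before releasing the lock
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def drain_records(path):
    """Read and remove all queued records under the same exclusive lock."""
    with open(path, "a+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            data = f.read()
            f.truncate(0)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    records, pos = [], 0
    while pos + _HEADER.size <= len(data):
        (size,) = _HEADER.unpack_from(data, pos)
        pos += _HEADER.size
        records.append(data[pos:pos + size])
        pos += size
    return records
```

Writers open the file in append mode, so a writer that was waiting while a reader truncated the file still appends at the new end of the file.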

However, I also needed to support Windows for our product, so I started using msvcrt.locking to lock files on Windows. During tests on Windows, nothing worked; I only got PermissionError. This was related to the fact that, by default, files opened on Windows are not shared, meaning that any other thread or process trying to open the same file gets a permission error. After reading more documentation, Stack Exchange, and the CPython source code, I finally used _winapi to open the file with the right share-mode flags, so that several threads or processes can open the same file. However, _winapi is a private CPython API.

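import msvcrt
import os
import _winapi  # private CPython module; Windows only

# Win32 values written out by hand (not all of them are exposed by _winapi)
FILE_SHARE_READ_WRITE_DELETE = 0x7  # FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE
OPEN_ALWAYS = 4                     # open the file, creating it if needed
FILE_FLAG_RANDOM_ACCESS = 0x10000000

def open_file(path):
    access = _winapi.GENERIC_READ | _winapi.GENERIC_WRITE
    handle = _winapi.CreateFile(path, access, FILE_SHARE_READ_WRITE_DELETE, 0,
                                OPEN_ALWAYS, FILE_FLAG_RANDOM_ACCESS, 0)
    # Wrap the Win32 handle in a C runtime file descriptor, then in a file object
    fd_flag = os.O_RDWR | os.O_CREAT | os.O_BINARY | os.O_RANDOM | os.O_NOINHERIT
    fd = msvcrt.open_osfhandle(handle, fd_flag)
    return open(fd, "r+b")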

Questions/Proposals

  • Why does CPython’s public API provide a way to lock files on Windows (msvcrt.locking) when it is effectively useless, because the default open can’t open files with the appropriate share mode? I really feel like I missed something here, but I found no clue about it in the current CPython documentation.
  • If it’s really missing, shouldn’t we add the missing pieces to allow Windows users to open files with share mode enabled?
  • Also, the documentation of msvcrt.locking is missing an important piece of information: for the unlock to succeed, the file position must be the same as when you locked the region. Otherwise, the unlock may fail (again with a permission error).
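The position rule in the last point can be wrapped in a small context manager. This is a minimal sketch, assuming you work on the raw file descriptor (or an unbuffered file): msvcrt.locking starts the region at the descriptor’s current position, and a buffered file object’s seek() may be served from its buffer without moving that position, so the helper uses os.lseek directly. locked_region and its locking parameter are hypothetical names; the parameter exists only so the position handling can be exercised off Windows.

```python
import contextlib
import os

try:
    import msvcrt
    LK_LOCK, LK_UNLCK = msvcrt.LK_LOCK, msvcrt.LK_UNLCK
except ImportError:
    # Not on Windows: reuse the MSVC CRT values (_LK_UNLCK=0, _LK_LOCK=1)
    # so the helper can still be exercised with a stand-in locking function.
    msvcrt = None
    LK_UNLCK, LK_LOCK = 0, 1


@contextlib.contextmanager
def locked_region(fd, offset, nbytes, locking=None):
    """Lock nbytes of fd starting at offset, then unlock exactly that region.

    msvcrt.locking() uses the descriptor's current position as the start of
    the region, so we seek to offset before locking and seek back to offset
    before unlocking, whatever the body did to the position in between.
    """
    if locking is None:
        locking = msvcrt.locking  # Windows only
    os.lseek(fd, offset, os.SEEK_SET)
    locking(fd, LK_LOCK, nbytes)
    try:
        yield
    finally:
        os.lseek(fd, offset, os.SEEK_SET)
        locking(fd, LK_UNLCK, nbytes)
```

On Windows, a call such as with locked_region(fd, 0, 4096): os.write(fd, data) locks the first 4 KiB while writing. Since the Microsoft documentation requires each locked region to be unlocked on its own, one with block per region fits that rule.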

Thank you for reading to the end!

Christophe

Because it’s Windows-specific and not portable, I’d assume.

On macOS, Linux, and FreeBSD there is nothing like these Windows sharing/locking APIs at open time.

I think you are doing the right thing by using msvcrt and _winapi.
I also use ctypes to access win32 API functions for special actions.
There is also pywin32 (GitHub: mhammond/pywin32, Python for Windows Extensions), which I use for some things.
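For comparison, here is a sketch of the same CreateFile call as the _winapi code in the first post, made through ctypes, which is public API. It mirrors that code’s flags (read/write access, share mode 0x7, OPEN_ALWAYS, random access); the Win32 constant values are written out by hand from the Windows SDK headers, and open_shared is a hypothetical name. It only works on Windows and raises OSError elsewhere.

```python
import ctypes
import os
import sys

# Win32 constants (values from the Windows SDK headers)
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
FILE_SHARE_READ = 0x1
FILE_SHARE_WRITE = 0x2
FILE_SHARE_DELETE = 0x4
OPEN_ALWAYS = 4
FILE_FLAG_RANDOM_ACCESS = 0x10000000
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


def open_shared(path):
    """Open path for reading and writing while other handles may share it."""
    if sys.platform != "win32":
        raise OSError("open_shared() is Windows-only")
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    create_file = kernel32.CreateFileW
    create_file.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                            ctypes.c_void_p, wintypes.DWORD, wintypes.DWORD,
                            wintypes.HANDLE]
    create_file.restype = wintypes.HANDLE

    handle = create_file(os.fsdecode(path), GENERIC_READ | GENERIC_WRITE,
                         FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                         None, OPEN_ALWAYS, FILE_FLAG_RANDOM_ACCESS, None)
    if handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    # Wrap the Win32 handle in a C runtime fd, then in a buffered file object
    fd = msvcrt.open_osfhandle(handle, os.O_RDWR | os.O_BINARY | os.O_NOINHERIT)
    return open(fd, "r+b")
```

The trade-off against _winapi is mostly declaring argtypes and restype by hand; in exchange, nothing depends on a private module.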

So why not add it to the os module?

There are various third-party file locking libraries on PyPI, such as filelock.

I’d recommend just using one of those universally to minimize os specific code.

Is this claim based on MS docs as well as personal experience? Either way, you can open a doc issue separate from your code feature proposal.

I’ll do that. To be honest, the MS docs are not very explicit either, but they warn against unlocking several regions at once, and if you think about it, the only way the API lets you specify the start of the region is by moving the file position to it.
The region to unlock must correspond exactly to an existing locked region. Two adjacent regions of a file cannot be locked separately and then unlocked using a single region that spans both locked regions.

I still think it’s strange to have functions in the public API that are useless unless you also use a private API or a third-party library, even if they are for one specific OS only.

At the very least, the documentation should state that files need to be opened differently to be able to use that API.

But _winapi is named with a leading underscore, indicating that it’s a private module. And indeed, it’s not documented in the library reference guide. So it’s not in the public API.

And indeed, _winapi only contains whatever Windows API functions have been needed over time to support other stdlib APIs - it’s not intended to be a complete or even consistent set of Windows API functions.

I was talking about msvcrt.locking, which is useless with the current default open on Windows.

Have you tried using os.open to get your file descriptor? That’s the usual way to set OS-specific flags on things. You don’t mention it, so I do wonder if it was even tried.

I did not find any way to specify the share mode using flags with open on Windows. It seems that the share mode is hardcoded to 0 in CPython’s open.
I may have missed something, of course.

I wouldn’t know, I don’t have Windows to test on. All I know is, that’s a common method, so it’s at least worth exploring.

The os module documentation does not mention any Windows-specific flag for the share mode of opened files.
I also read parts of the CPython source code to find where that share mode could be specified, and I did not find a way to do it with open.

Apologies, I misread your comments.

Basically, from what I can determine from reading the MS documentation, locking needs an fd opened using sopen, which, as you note, isn’t exposed by the stdlib (the MS open function doesn’t allow you to set the share mode). Adding sopen to the msvcrt module wouldn’t be unreasonable, but given how long it’s taken for anyone to bring this up, I guess most people are simply using _winapi or ctypes to work around this, or a 3rd-party module like pywin32, which exposes far more of the Windows API.
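To make the sopen idea concrete, here is a hedged sketch of what it could look like today without _winapi: an opener for the built-in open() (its opener parameter is public API) that calls the C runtime’s _wsopen_s through ctypes. It assumes a release build of CPython linked against the Universal CRT (ucrtbase.dll), so the descriptor lands in the same fd table that os and msvcrt use; _SH_DENYNO (0x40) and the permission bits are copied from the MSVC CRT headers, and shared_opener is a hypothetical name. On other platforms it falls back to os.open, since there is no share mode to set.

```python
import ctypes
import os
import sys

_SH_DENYNO = 0x40                     # <share.h>: deny neither reading nor writing
_S_IREAD, _S_IWRITE = 0x0100, 0x0080  # <sys/stat.h>: pmode used when creating


def shared_opener(path, flags):
    """opener= callable for open() that opens with an explicit share mode."""
    if sys.platform != "win32":
        # POSIX has no share modes; behave like open()'s default opener.
        return os.open(path, flags, 0o666)
    ucrt = ctypes.CDLL("ucrtbase")
    wsopen_s = ucrt._wsopen_s
    wsopen_s.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_wchar_p,
                         ctypes.c_int, ctypes.c_int, ctypes.c_int]
    wsopen_s.restype = ctypes.c_int  # errno_t: 0 on success
    fd = ctypes.c_int(-1)
    err = wsopen_s(ctypes.byref(fd), os.fsdecode(path), flags,
                   _SH_DENYNO, _S_IREAD | _S_IWRITE)
    if err:
        raise OSError(err, os.strerror(err), path)
    return fd.value
```

Usage would be open(path, "r+b", opener=shared_opener), after which msvcrt.locking(f.fileno(), ...) works on a descriptor whose share mode was chosen explicitly. The share flag is the one extra knob an exposed sopen would add; the other _SH_* values from <share.h> could be passed the same way.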

I doubt there’s justification for anything more than exposing sopen TBH. And even that would need someone to submit and champion a PR.

OK, I’ll probably have some time to open a PR in a couple of weeks; I’d be happy if you could review, champion, or help with it.

Sorry, no. I don’t personally have enough interest in or need for this to be willing to help with it. I’m much more inclined to think that the existing approaches (_winapi, ctypes or pywin32) are perfectly fine.