I’m a developer on dd-trace-py, and I recently ran into some issues when trying to open files concurrently on Windows. I would appreciate any feedback on this, whether I did something wrong or CPython should improve its public API for opening files on Windows (I’d be happy to help).
Problem and Context
I needed a simple, low-level queue to exchange data between processes. multiprocessing was not an option for various reasons, so I decided to use a shared file. On Unix (Linux, macOS), this worked without any problem.
To open the file I used open, and to lock it so that concurrent accesses would not cause problems, I used fcntl.lockf. Everything worked great.
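The Unix side of this approach can be sketched as a minimal line-based queue over a shared file, assuming each message is a single line of text. The helper names append_message and drain_messages are hypothetical and not taken from dd-trace-py. Note that fcntl record locks are owned by the process, so this serializes access between processes but not between threads of the same process.

```python
import fcntl
import os
import tempfile


def append_message(path: str, message: str) -> None:
    """Append one line to the shared file while holding an exclusive lock."""
    with open(path, "a", encoding="utf-8") as f:
        # Blocks until no other process holds a lock on the whole file.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(message + "\n")
            f.flush()  # hand the data to the OS before releasing the lock
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def drain_messages(path: str) -> list[str]:
    """Read every queued line and empty the file, under an exclusive lock."""
    with open(path, "a+", encoding="utf-8") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.seek(0)  # "a+" starts at the end; rewind to read everything
            lines = f.read().splitlines()
            f.truncate(0)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return lines


path = os.path.join(tempfile.mkdtemp(), "queue.txt")
append_message(path, "hello")
append_message(path, "world")
messages = drain_messages(path)
print(messages)  # ['hello', 'world']
```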
However, I also needed to support Windows for our product, so I started using msvcrt.locking to lock files on Windows. When testing on Windows, nothing worked; I only got a PermissionError. This was because, by default, files opened on Windows are not shared, meaning that any other thread or process trying to open the same file gets a permission error. After reading more documentation, Stack Exchange, and the CPython source code, I finally used _winapi to open the file with the right share mode flags, so that several threads or processes can open the same file. But _winapi is a private CPython API.
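A minimal sketch of the kind of _winapi workaround described above, assuming CPython’s private _winapi.CreateFile keeps its current seven-argument positional signature (it is undocumented and may change between releases). The function name open_shared is hypothetical, and the Win32 constants are defined by value rather than assuming _winapi exports all of them. The raw handle is turned into a C runtime descriptor with msvcrt.open_osfhandle so that msvcrt.locking can be used on it.

```python
import sys

# Standard Win32 constant values (from the Windows SDK headers).
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
OPEN_ALWAYS = 4  # open the file if it exists, otherwise create it
FILE_ATTRIBUTE_NORMAL = 0x80


def open_shared(path: str):
    """Open path read/write while letting other threads/processes open it too.

    Hypothetical helper; Windows only, relies on the private _winapi module.
    """
    if sys.platform != "win32":
        raise OSError("open_shared is Windows-only")
    import _winapi  # private CPython module: undocumented, may change
    import msvcrt

    handle = _winapi.CreateFile(
        path,
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,  # the share mode open() does not expose
        0,  # NULL: default security attributes
        OPEN_ALWAYS,
        FILE_ATTRIBUTE_NORMAL,
        0,  # NULL: no template file
    )
    try:
        # 0: no _O_RDONLY/_O_APPEND/_O_TEXT, i.e. read/write binary descriptor.
        fd = msvcrt.open_osfhandle(handle, 0)
    except OSError:
        _winapi.CloseHandle(handle)
        raise
    # The file object now owns fd; closing it also closes the Win32 handle.
    return open(fd, "r+b")
```

With this, two processes calling open_shared on the same path should both be able to hold the file open and coordinate through msvcrt.locking.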
Why does CPython’s public API provide a way to lock files on Windows when that is effectively useless, since the built-in open can’t open files with the appropriate share mode? I really feel like I missed something here, but I did not find any clues about it in the current CPython documentation.
If it’s really missing, shouldn’t we add the missing pieces to allow Windows users to open files with share mode enabled?
Also, the documentation of msvcrt.locking is missing an important piece of information: for the unlock to succeed, the file position must be the same as when you locked the region. Otherwise, the unlock can fail (with a permission error again).
I’ll do that. To be honest, the Microsoft documentation isn’t very explicit either, but it does warn against unlocking several ranges at once, and if you think about it, the only way the API lets you specify the start of the region is to move the file position to it. As the documentation puts it: “The region to unlock must correspond exactly to an existing locked region. Two adjacent regions of a file cannot be locked separately and then unlocked using a single region that spans both locked regions.”
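The rule above can be captured in a small cross-platform lock helper. This is a sketch under stated assumptions: the name locked_region is hypothetical, the file must be opened in binary read/write mode, and on Windows it must already have been opened with a suitable share mode. Because msvcrt.locking always starts at the current file position, the Windows branch seeks to the region’s start before locking and seeks back to the same start before unlocking, so the unlocked region matches the locked one exactly. The Unix branch passes the offset to fcntl.lockf explicitly.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def locked_region(f, nbytes: int, offset: int = 0):
    """Hold an exclusive lock on nbytes starting at offset (hypothetical helper)."""
    if sys.platform == "win32":
        import msvcrt

        f.seek(offset)  # msvcrt.locking locks from the current position
        # LK_LOCK retries for a limited time, then raises OSError.
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, nbytes)
        try:
            yield f
        finally:
            # The body may have moved the position: go back to the exact
            # start of the locked region before unlocking it.
            f.seek(offset)
            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, nbytes)
    else:
        import fcntl

        # fcntl.lockf takes the region explicitly: (len, start, whence).
        fcntl.lockf(f, fcntl.LOCK_EX, nbytes, offset, os.SEEK_SET)
        try:
            yield f
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN, nbytes, offset, os.SEEK_SET)


with tempfile.TemporaryFile() as tmp:  # binary read/write ("w+b")
    with locked_region(tmp, nbytes=16) as f:
        f.write(b"data")
    tmp.seek(0)
    content = tmp.read()
print(content)  # b'data'
```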
I still think it’s strange to have functions in the public API that are useless unless you also use a private API or a third-party library, even if they are specific to one OS.
At the very least, the documentation should state that files must be opened differently to be able to use that API.
But _winapi is named with a leading underscore, indicating that it’s a private module. And indeed, it’s not documented in the library reference guide. So it’s not in the public API.
And indeed, _winapi only contains whatever Windows API functions have been needed over time to support other stdlib APIs - it’s not intended to be a complete or even consistent set of Windows API functions.
Have you tried using os.open to get your file descriptor? That’s the usual way to set OS-specific flags on things. You don’t mention it, so I do wonder if it was even tried.
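For reference, the os.open route looks like this: flags are OR-ed into an integer, and the resulting descriptor is wrapped in a file object with open. The helper open_with_flags is hypothetical. It adds os.O_BINARY where that flag exists (Windows only); none of the Windows-specific flags documented for the os module, such as os.O_BINARY or os.O_NOINHERIT, controls the share mode.

```python
import os
import tempfile


def open_with_flags(path: str):
    """Open path read/write via os.open, then wrap the descriptor (hypothetical helper)."""
    flags = os.O_RDWR | os.O_CREAT
    # os.O_BINARY only exists on Windows, hence getattr with a default of 0.
    flags |= getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o666)
    return open(fd, "r+b")  # the file object now owns fd and will close it


path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open_with_flags(path) as f:
    f.write(b"abc")
    f.seek(0)
    data = f.read()
print(data)  # b'abc'
```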
I did not find any way to specify the share mode using flags with open on Windows. It seems that CPython hardcodes the share mode to 0 for open.
Of course, I may have missed something.
The os module documentation does not mention any Windows-specific flag for the share mode of opened files.
I also read parts of the CPython source to find where that share mode could be specified, and I did not find a way to do it with open.
Basically, from what I can determine from the MS documentation, locking needs an fd opened using sopen, which, as you note, isn’t exposed by the stdlib (the MS open function doesn’t allow you to set the share mode). Adding sopen to the msvcrt module wouldn’t be unreasonable, but given how long it’s taken for anyone to bring this up, I guess most people simply use _winapi or ctypes to work around this, or a third-party module like pywin32, which exposes far more of the Windows API.
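As an alternative to the private _winapi module, the ctypes workaround mentioned here can be sketched by calling CreateFileW from kernel32 directly and converting the handle into a C runtime descriptor with msvcrt.open_osfhandle. The function name open_shared_ctypes is hypothetical, the constants are the standard Win32 values, and this is only a sketch, not a vetted implementation.

```python
import ctypes
import sys

# Standard Win32 constant values (from the Windows SDK headers).
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
OPEN_ALWAYS = 4
FILE_ATTRIBUTE_NORMAL = 0x80


def open_shared_ctypes(path: str):
    """Open path read/write with FILE_SHARE_READ | FILE_SHARE_WRITE (hypothetical helper)."""
    if sys.platform != "win32":
        raise OSError("open_shared_ctypes is Windows-only")
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.argtypes = [
        wintypes.LPCWSTR,  # lpFileName
        wintypes.DWORD,    # dwDesiredAccess
        wintypes.DWORD,    # dwShareMode
        wintypes.LPVOID,   # lpSecurityAttributes
        wintypes.DWORD,    # dwCreationDisposition
        wintypes.DWORD,    # dwFlagsAndAttributes
        wintypes.HANDLE,   # hTemplateFile
    ]
    kernel32.CreateFileW.restype = wintypes.HANDLE
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    handle = kernel32.CreateFileW(
        path,
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        None,  # default security attributes
        OPEN_ALWAYS,
        FILE_ATTRIBUTE_NORMAL,
        None,  # no template file
    )
    invalid_handle_value = ctypes.c_void_p(-1).value  # INVALID_HANDLE_VALUE
    if handle is None or handle == invalid_handle_value:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        fd = msvcrt.open_osfhandle(handle, 0)  # read/write, binary
    except OSError:
        kernel32.CloseHandle(handle)
        raise
    return open(fd, "r+b")  # closing the file closes fd and the handle
```

The trade-off is that ctypes relies only on public modules, at the cost of declaring the Win32 signatures by hand.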
I doubt there’s justification for anything more than exposing sopen TBH. And even that would need someone to submit and champion a PR.
Sorry, no. I don’t personally have enough interest in or need for this to be willing to help with it. I’m much more inclined to think that the existing approaches (_winapi, ctypes or pywin32) are perfectly fine.