```python
import aiofile as a

await a.print(f"example")
async with a.open("myfile") as f:
    await f.readline()
    ...
```
aiofiles has an API for asynchronous IO to files, but the implementation runs synchronous IO in a thread pool and doesn't really take advantage of asyncio.
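That thread-pool emulation can be sketched in a few lines. Assuming a hypothetical `async_read` helper (not part of any real library), this is roughly the pattern such libraries follow:

```python
import asyncio

async def async_read(path):
    # Roughly what a thread-pool-based library does: hand the blocking
    # read off to the default executor (a thread pool) and await the result.
    loop = asyncio.get_running_loop()

    def blocking_read():
        with open(path, 'rb') as f:
            return f.read()

    return await loop.run_in_executor(None, blocking_read)
```

The event loop isn't blocked while the worker thread reads, but every call still pays thread hand-off overhead, which is why this is emulation rather than true asynchronous IO.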
Well, Win32 has had asynchronous IO (overlapped IO) since about 1992, so that might be the place to start, using the aiofiles technique for operating systems lacking asynchronous IO.
It really does seem a shame not to leverage asynchronous io on operating systems that have it.
It's best if programming languages remain OS agnostic and most functionality is abstracted, except for OS-specific functionality. Take the dbm module as an example.
Users of Unix-based systems might expect and know of the Unix dbm functionality, and it is known that there isn't a Windows alternative. I haven't seen the dbm module used much in the wild or in modern projects, but it is there for backwards compatibility and for Unix systems.
Async IO for files is, however, functionality that users of any OS will expect, for obvious reasons. So if we implemented it only for Windows, we would still need something for when the code runs on Unix. What do you do? Ignore the expectation of async and just provide synchronous behaviour, thus not behaving as expected, or provide an alternative to the OS implementation?
For this reason, I think it would be safer to have an additional module, possibly third-party, for Windows async file operations, until the problem is solved on Linux. Then it could safely be implemented in the core Python code.
It is evolving rapidly, which makes implementing it in the Python stdlib challenging. This rapid evolution also increases the number of 0-day security vulnerabilities.
The consensus is to implement it as a third-party library on PyPI.
It would be easy enough on Linux to offer an asynchronous API on top of synchronous IO (like aiofiles). At least the interface would be ready, we could code away, and if a native Linux asynchronous API became usable, the implementation could be made more efficient.
Add to IOBase these coroutine methods: async_readable, async_readline, async_readlines, async_tell, async_writable, and async_writelines.
Add to RawIOBase: async_readinto, async_write.
Add to BufferedIOBase: async_read, async_read1, async_write.
Add to TextIOBase: async_read, async_readline, async_write.
In the base classes, the stub for the blocking methods would be to await the coroutine method. All the concrete subclasses in the stdlib presumably replace the stub methods anyway.
Any future file like objects that are asynchronous would be able to be used with code expecting synchronous file like objects, provided there is an event loop.
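A minimal sketch of that idea, using a hypothetical `SketchTextIO` class (not the real io API): the coroutine method does the work, and the blocking stub just drives it to completion on an event loop:

```python
import asyncio
import io

class SketchTextIO(io.TextIOBase):
    # Hypothetical sketch: the coroutine method is primary and the
    # blocking read() stub runs it to completion.
    def __init__(self, data):
        self._buffer = io.StringIO(data)

    async def async_read(self, size=-1):
        # A real implementation would await OS-level asynchronous IO
        # here; this stand-in reads from an in-memory buffer.
        return self._buffer.read(size)

    def read(self, size=-1):
        # Blocking stub: drive the coroutine on a fresh event loop
        # (assumes the caller is not already inside a running loop).
        return asyncio.run(self.async_read(size))
```

Synchronous callers see an ordinary `read()`, while async-aware callers can await `async_read()` directly.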
We would then have a fairly standard API for developing asynchronous file-like objects. And, as suggested, asynchronous file IO itself can be implemented outside of the standard library.
A little bit of refactoring in io would allow us to reuse the existing code for buffering etc. to develop asynchronous file IO using features available in the operating system, such as built-in asynchronous IO, or just another thread/thread pool for blocking IO (like aiofiles).
None of them. Let’s just get some standard interfaces in io and some reusable code, so it becomes dead easy to implement asynchronous versions of the various implementations of io, and so we can pass them around as file like objects.
These coroutines can be implemented within the asyncio module, where both high-level and low-level API designs are already defined.
However, these coroutines should yield better performance numbers than the io module. This will justify their presence in the asyncio package. In other words, these should not be more blocking than the blocking approach.
For example, aiofiles is consistently slower than io on my computer with an SSD. See the benchmark:
```python
import asyncio
import time
import timeit

import aiofiles


async def aio():
    t = timeit.default_timer()
    async with aiofiles.open('aio.test', mode='wb') as f:
        await f.write(b'0' * 1024 * 1024 * 1024)
    t = timeit.default_timer() - t
    print('aio: ', t)

    t = timeit.default_timer()
    async with aiofiles.open('aio.test', mode='wb') as f:
        for i in range(1024):
            await f.write(b'0' * 1024 * 1024)
    t = timeit.default_timer() - t
    print('aio small chunks: ', t)


def io():
    t = timeit.default_timer()
    with open('io.test', mode='wb') as f:
        f.write(b'0' * 1024 * 1024 * 1024)
    t = timeit.default_timer() - t
    print('io: ', t)

    t = timeit.default_timer()
    with open('io.test', mode='wb') as f:
        for i in range(1024):
            f.write(b'0' * 1024 * 1024)
    t = timeit.default_timer() - t
    print('io small chunks: ', t)


async def run():
    io()
    time.sleep(1.0)
    await aio()


asyncio.run(run())
```
Given the complexity of these libraries, their maintenance burden is too high. Furthermore, the performance gain from using the operating system's async IO facilities might not be noticeable in Python, as CPython could become the bottleneck in this scenario.
aiofiles is a meaningless performance comparison, as it adds a pile of overhead emulating asynchronous IO with workers in thread pools.
What I am calling for is an asynchronous interface on every file-like object which, if not overridden, simply calls the synchronous one. I can write my code as if it were asynchronous, and if that code receives a file object that actually does async IO, then the performance will be better.
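A minimal sketch of that fallback, using a hypothetical `AsyncFallbackMixin` (not an existing API): the `async_*` coroutines default to calling the synchronous method, so async-style code works unchanged with a plain file object:

```python
import asyncio
import io

class AsyncFallbackMixin:
    # Hypothetical mixin: async_* coroutines that, unless overridden,
    # simply call the synchronous counterpart.
    async def async_read(self, size=-1):
        return self.read(size)

    async def async_write(self, data):
        return self.write(data)

class PlainStringIO(AsyncFallbackMixin, io.StringIO):
    pass

async def shout(f):
    # Code written against the async interface; it would work unchanged
    # with a truly asynchronous file object that overrides async_read.
    text = await f.async_read()
    return text.upper()

print(asyncio.run(shout(PlainStringIO("hello"))))  # HELLO
```

Here the fallback still blocks the event loop, but an implementation that overrides `async_read` with genuine asynchronous IO would be a drop-in replacement for callers like `shout`.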