Asynchronous DBAPI

methane · September 3, 2019, 11:54am

Please don’t mix about connection API and connection pool API.
Enable autocommit by default at DB-API layer would not make it useless for production application, because application will use Pool API, instead of DB-API. It’s very difficult to manage raw DB-API connection from application without the Pool.

Connection pool can provide autocommit APIs and transactional APIs for applications. It may look like this.

# autocommit APIs
result = await pool.query("SELECT * FROM tbl")

# transactional APIs
async with pool.begin() as trx:
    result = await trx.query("SELECT * FROM tbl")
    await trx.execute("UPDATE tbl2 SET v=? WHERE id=?", (val, id))

On the other hand, the pool will use async DB-API. This is very rough scketch of pool implementation. This is out of scope of DB-API.

class Pool:
    ...
    async def query(self, query, args=None) -> Result:
        raw_conn = await self._get_conn()  # autocommit is enabled by default.
        try:
             cursor = raw_conn.cursor()
             await cursor.query(query, args)
             result = await self._fetch_result(cursor)
        except:
            self._close_conn(raw_conn)
            raise
        else:
             self._put_conn(raw_conn)
             return result
    ....
    async def begin(self) -> Transaction:
        return Transaction(self, await self._get_conn())
    ....

class Transaction:
    def __init__(self, pool, conn):
        self._pool = pool
        self._conn = conn

    async def query(self, query, args) -> Result: ...
    async def execute(self, query, args): ...

    async def __aenter__(self):
        await conn.begin()  # transaction is start here
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if exc:
            await self.rollback()
            self._pool._close_conn(self._conn)
        else:
            await self.commit()
            self._pool._put_conn(self._conn)
        self._conn = None

    async def commit(self): ...
    async def rollback(self): ...

Additionally, note that I don’t propose enable autocommit by default. I’m proposing adding a standard way to enable autocommit by default. For example, mysqlclient has autocommit=False option. Connection pool can use autocommit=True option if this option is standardized.

I don’t have a strong opinion about which should be the default value. In async application, using DB-API directly is too difficult to human being. There should be a Pool. Async DB-API is for Pool, not for human. So the default value of this option is not important for usability.

methane · September 4, 2019, 7:04am

Pool can support it. Users don’t need to check another connection out and call new_con.kill(another_con.thread_id) manually. Pool may do it automatically instead. So there are no boilerplate code in applications.

When another connection is not required to cancel a running query, DB-API driver can support it directly.
When another connection is necessary to cancel a running query, DB-API driver may not support automatic canceling. Canceling a running query is implemented in Pool instead.

If we need to standardize it in async DB-API, these APIs should be added:

Query that the driver supports automatic canceling a query. It may be similar to the threadsafe attribute. (Options are: No, Yes, Yes (but uses another connection))
Option to enable/disable automatic cancellation. (autocancel=False)
API to cancel the query executed from another connection. (e.g. Connection.kill(token))
API to get some “token” (query token, or connection token) to cancel the query from another connection. (e.g. cursor.kill_token())

But I’m not so interested in autocancelling. We have some production applications in Go and Python and live without autocancelling.

agronholm · September 5, 2019, 8:38pm

Forgive me for being dumb, but wouldn’t simply closing the connection do the trick?

What would you have the driver do if a task that is running a query gets cancelled?

methane · September 6, 2019, 1:48am

In case of MySQL, just closing connection may not cancel the query.
Closing connection cancels the “SLEEP” query or queries waiting for locks. But it doesn’t cancel slow SELECT queries.

We don’t execute any query which can take a long time from the production API / Web server. So just closing a connection is enough. Even if we have autocancel feature, the query will be finished before sending KILL query from another connection.

Long queries are run only from batch or manual operation. We use KILL query from MySQL CLI when we need, but it is a very rare case.

agronholm · September 6, 2019, 4:32am

So what I am gathering from this is that for MySQL at least, the driver should close the connection if the enclosing task is cancelled. Got it. I take it that MySQL will not receive commands over the connection if a query is already running? If another connection wants to cancel the query via a KILL command, is the thread ID available from the object carrying said query?

smurfix · September 6, 2019, 5:32am

Forgive me for being dumb, but wouldn’t simply closing the connection do the trick?

Ordinarily yes. If you also want to abort the transaction. You might not.

Also, a connection close might not be effective until the SELECT actually starts to send some data. If you abort for other reasons than terminal buffer overrun (client connection closed, operator mistake, …) you still need a way to tell the server.

agronholm · September 6, 2019, 5:35am

Might not what? A connection being closed would abort the transaction on any backend, would it not?

methane · September 6, 2019, 5:36am

Yes, legacy MySQL protocol is simple “request & response” protocol. At protocol level, we can send next request before sending previous response (pipelining). But it is dangerous and libmysqlclient (official C client library) doesn’t support it.

Yes, but it is “thread” id, not “query” id.

If server is load balanced, it will kill an unrelated query on other server.
If the query we want to cancel is already finished and the next query is sent in the same connection, the KILL query will cancel the next query.

MariaDB extends this area but it is not portable.

agronholm · September 6, 2019, 5:43am

What I wanted to figure out here is what the driver is expected to do when receiving a cancellation exception. My current answer to that would be: cancel the running query, if any, in a shielded block if the backend supports cancelling running queries within the scope of the same connection. If not, simply close the connection. The remaining question that I have is whether the connection should be closed even if the query was successfully cancelled.

smurfix · September 6, 2019, 6:11am

You might not want to abort the transaction. Thus, you need a way to kill a query without closing the connection (which will do that implicitly, of course).

smurfix · September 6, 2019, 6:15am

Load balancing needs a way to direct additional connections to the same server anyway. Otherwise you get interesting inconsistencies when the sync lags, as it invariably will.

agronholm · September 6, 2019, 6:17am

None of this really answers the question: What should the driver do if it’s running a query and the task is cancelled? PostgreSQL might be able to cancel the query and even keep the transaction going but not all backends can do the same, and we want consistent behavior from compliant drivers.

methane · September 6, 2019, 7:01am

We have one master server and some read replicas. We send queries which are relatively slow and lag/inconsistency is acceptable to replicas. And these replicas are load-balanced via Virtual IP and there is no way to connect to dedicated server from application. (DB OPs can use real IP, but application can not use it.)

But it seems you are going to off topic. What is your point? Do you want to say everyone can use KILL query so MySQL drivers can implement safe and reliable canceling?

My point is async DB-API shouldn’t assume every driver can implement efficient and reliable automatic cancellation. I don’t against supporting automatic cancellation if the driver can, but it should not a requirement. See this comment.

malemburg · September 6, 2019, 7:56am

I must be missing something in this discussion: Why do so many of
you think that canceling a query is important ?

If you correctly structure your application, you should never have
a need for this. Cancellation should definitely not be part of
an architecture, only an escape path in rare situations to get a
system responsive again - but that can much better be done directly
in the database and outside the application using system administration
tools.

agronholm · September 6, 2019, 8:28am

So what would you have the driver do when the enclosing task is cancelled? Or do you feel that the cancellation of tasks is pointless?

aeros · September 9, 2019, 3:57am

I agree that cancelling a query within the database itself would be vastly preferable to cancelling it from a DBAPI.

I could see it being potentially useful though for running low priority queries which utilize a large amount of system resources and cancelling them if the resources are needed for something that has a high priority. Allowing for cancellation within the DBAPI would allow for this to be done more programmatically. This may not be at all optimal or particularly advisable, but it was the first somewhat useful scenario that I could think of.

However, I’m not convinced that it’s worth blocking the entire implementation of an asynchronous DBAPI standard. The other features seem like they would be far more useful and widely applicable. I think this discussion might be fixating a bit too much on cancellation.

I don’t think it’s entirely pointless, but it doesn’t seem to be as worthwhile to implement as some of the other features.

njs · September 9, 2019, 5:13am

I think the issue is that any async code can be cancelled, because that’s how async libraries are written. So an Async DBAPI needs to pick some kind of strategy for addressing that, whether it’s by declaring that DB operations should refuse to be cancelled (which can create complications with control-C handling), declaring that DB operations should support cancellation, making it configurable, making it implementation dependent, … you have to pick something.

agronholm · September 9, 2019, 5:49am

Yes, this is what I was trying to say all along. If the task is cancelled, the driver just cannot keep going with the query, regardless of whether it will still keep going on the server, so something must be done. A brief shielded block that issues a cancel command to the server would be acceptable, and closing the connection would be too.

malemburg · September 9, 2019, 10:08am

The standard approach to cancel a transaction in an unexpected
situation is to close the connection. The server will detect this
and then abort the transaction, since transactions are tied to
connections.

agronholm · September 9, 2019, 10:13am

The server may not notice this until it tries to send the query results to the client. Thus it would be prudent to cancel the running query earlier if that is technically possible. Agreed?