Sqlite3 executemany with RETURNING clauses

erlendaasland · April 28, 2023, 8:43am

TL;DR

~~Should it be possible to return resulting rows from~~ Should sqlite3.Cursor.executemany() not discard resulting rows?

Background

In 2021, SQLite 3.35.0 added support for the RETURNING clause in DML^[1] statements. Traditionally, an SQL DML statement would not return any values. With the RETURNING clause, resulting rows are now a thing for DML statements. Example:

CREATE TABLE t(t);
INSERT INTO t VALUES(1) RETURNING t;

The DB API 2.0 (PEP-249) defines the executemany cursor method pretty vague; there are no restrictions on which types of statements it can execute; the return value is not defined. When Gerhard Häring implemented the sqlite3 extension module almost 20 years ago, executemany() was deliberately implemented to only execute DML statements, and to discard any resulting rows.

The Problem

The new RETURNING clause seems to be a very popular addition to SQLite, and for many users it is confusing that executemany("INSERT INTO t VALUES(?) RETURNING t", data) does not return the expected resulting rows.

Possible Solutions

For now, I’ve updated the docs (see gh-103939) to make it clearer that executemany discards any resulting rows, including DML statements with RETURNING clauses.

There is also the possibility of changing the semantics of executemany() to not discard resulting rows, however changing semantics includes the risk of breaking existing code; the more subtle the semantic change, the more difficult it is to discover the bug. For now, I land on the conservative side; I prefer to keep the existing behaviour.

Thoughts?

INSERT, UPDATE, and DELETE ↩︎

pf_moore · April 28, 2023, 9:04am

Currently, the connection-level executemany is documented as returning the new cursor. So changing the return value is probably unacceptable (even if only because it creates a significant difference between the connection-level and cursor-level forms). But maybe we could add a method/attribute on the cursor object that returned the results from the last executemany? It could even be an iterable, if sqlite makes iterating over the results more efficient than collecting them all into a list.

erlendaasland · April 28, 2023, 9:07am

Correct, my explanation is faulty:

All the execute*() methods return a new cursor. The problem still applies: executemany discards all resulting rows, so it is more accurate to say that a consecutive fetch*() does not return any resulting rows. If we are to change anything, the change must be in executemany so any following fetch*() will return resulting rows.

pf_moore · April 28, 2023, 9:20am

Ah, that makes more sense. Sorry I didn’t review the existing API better. I think adding the ability to fetch might be nice. But it would be hard to get the semantics right. Consider

sql = "update t set n=n+1 where name=? returning id"
c = executemany(sql, name_list)
for result in c:
    ...

the loop will give id values for every updated row, but there’s no obvious way to see which name in the name_list a given result is associated with. Of course the user can work around this (for example, in this case adding name to the returned expression) but it could still be clumsy.

I’m in favour of returning the data, but it’ll take care to get the details right.

erlendaasland · April 28, 2023, 9:24am

Definitely, it will be hard to get this right. Especially since I already messed up the semantics a few years ago^[1] by making fetch*() lazy (earlier the first resulting row was prefetched). IIRC, that was in 3.10. If we are to try and implement this, we’d need to undo that change first. It is definitely not a straight-forward solution.

yep, I’m more conservative now; I would’ve rejected my own change today ↩︎

malemburg · April 28, 2023, 9:25am

It is well possible for .executemany() to create multiple result sets. This can happen not only with e.g. INSERTs using RETURNING, but also when calling stored procedures or when running SELECTs against a set of parameters.

The standard way to handle this is by having .fetchXXX() methods return the first result set and then using .nextset() to skip to the next until you have processed all sets.

erlendaasland · April 28, 2023, 9:27am

This is a related issue; sqlite3’s executemany() only allows DML statements. If we opened up for returning resulting rows from RETURNING clauses, we would open up the door for removing the DML requirement also. IMO, that’s a less risky change.

malemburg · April 28, 2023, 9:28am

BTW: The DB-API does not define return values for the .executeXXX() methods. Implementations are free to define their own, though. Connection level .executeXXX() and .fetchXXX() methods are not defined in DB-API 2 - they were removed as part of the migration from DB-API 1 to 2. Still, implementations are free to support those, as long as they also provide the cursor based methods (if they want to be DB-API 2 compatible).

erlendaasland · April 28, 2023, 9:32am

Yep. I find them convenient though:

Python 3.12.0a7 (v3.12.0a7:b861ba4a82, Apr  4 2023, 14:31:00) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> cx = sqlite3.connect(":memory:")
>>> for row in cx.execute("select 12"): print(row)
... 
(12,)
>>> ^D

However, we’re drifting slightly off topic now

Rosuav · April 28, 2023, 9:40am

Prior art: PostgreSQL has had RETURNING clauses for some time, and psycopg2’s handling of executemany doesn’t try to return any results at all.

https://www.psycopg.org/docs/cursor.html#cursor.executemany

So I would be fine with the docs change to make it clearer that sqlite3 does the same thing.

pf_moore · April 28, 2023, 9:45am

For what it’s worth, I looked at what apsw does:

>>> c = conn.executemany("insert into t2 values('a', ?) returning n", [(i,) for i in range(5)])
>>> c = conn.executemany("insert into t2 values('b', ?) returning n", [(i,) for i in range(5)])
>>> c = conn.executemany("insert into t2 values('c', ?) returning n", [(i,) for i in range(5)])
>>> c = conn.executemany("update t2 set n=n+1 where name=? returning n", [('a',),('b',),('c')])
>>> list(c)
[(1,), (1,), (1,)]
>>> list(conn.execute("select * from t2"))
[('a', 1), ('b', 1), ('c', 1)]
>>> c = conn.executemany("insert into t2 values('d', ?) returning n", [(i,) for i in range(5)])
>>> list(c)
[(0,), (1,), (2,), (3,), (4,)]
>>> c = conn.executemany("insert into t2 values('e', ?) returning n", [(i,) for i in range(5)])
>>> list(conn.execute("select * from t2"))
[('a', 1), ('b', 1), ('c', 1), ('d', 0), ('d', 1), ('d', 2), ('d', 3), ('d', 4), ('e', 0)]
>>>

Note that the set of values for names a,b,c and e are wrong. It seems that executemany fails loop through all of the parameter sets unless you fetch from the returned cursor. However, if you do, it returns all of the results in a single flattened result set:

>>> c = conn.executemany("update t2 set n=n+1 where name=? returning n", [('a',),('b',),('c'),('d',),('e',)])
>>> list(c)
[(2,), (2,), (2,), (1,), (2,), (3,), (4,), (5,), (1,)]

erlendaasland · April 28, 2023, 10:21am

True, but that’s not the case for psycopg3 (which is what you get if you pip install psycopg):

https://www.psycopg.org/psycopg3/docs/api/cursors.html#psycopg.Cursor.executemany