Unexpected behavior in sqlite3 module with query parameterization

ahferroin7 · April 19, 2023, 12:50pm

I’ve recently been working on some Python code that uses the sqlite3 module from the standard library (the exact details are not particularly relevant here), and noticed that strangely, any parameterized query that looks like SELECT * FROM table WHERE ? = foo; simply fails to find anything even if running the same query without parameterization works correctly. This is somewhat unexpected from my perspective given that SELECT * FROM table WHERE ? IS NOT NULL; appears to work correctly.

Minimal reproducer:

import sqlite3

db = sqlite3.connect(':memory:')

db.execute('CREATE TABLE test(id INTEGER, data INTEGER);')
db.execute('INSERT INTO test VALUES (1, 1);')

assert db.execute('SELECT * FROM test WHERE data = 1;').fetchall()
assert db.execute('SELECT * FROM test WHERE ? = 1;', ['data']).fetchall()

Both queries should in theory end up being identical, and both assertions should thus pass, but the second one fails with an assertion error on at least CPython 3.10.11 and CPython 3.11.3.

A look through both the module documentation and PEP 249 gives no indication that things should behave this way, and the SQLite C API documentation is not particularly helpful either (I understand that under the hood the parameterization functionality is using the various sqlite3_bind_* functions, but the documentation there also doesn’t make any indication that things should behave this way), so I’m at a bit of a loss here. I’m hoping someone else may understand what’s going on here better than I do so that I can determine whether I should be opening a bug or a documentation issue about this.

erlendaasland · April 19, 2023, 4:05pm

I suggest you try out these things in the SQLite CLI first. The string literal 'data' is not equal to the numeric literal 1. Note that SQLite use single quotes to denote string literals.

SQLite version 3.39.5 2022-10-14 20:58:05
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE TABLE test(id INTEGER, data INTEGER);
sqlite> INSERT INTO test VALUES (1, 1);
sqlite> SELECT * FROM test WHERE data = 1;
1|1
sqlite> SELECT * FROM test WHERE 'data' = 1;
sqlite> 
sqlite> ^D

MRAB · April 19, 2023, 4:27pm

Parameterisation applies only to values, not to field names or table names.

erlendaasland · April 19, 2023, 5:02pm

Correct, that is also true.

ahferroin7 · April 19, 2023, 5:11pm

Ah, that’s exactly what I was missing then.

This doesn’t seem to be explicitly documented anywhere (at least, not in the Python documentation), and seems to be at least a little counterintuitive to me given my understanding that parameterization is supposed to be part of sanitizing data used in the queries (and substituting in a column name gives just as much opportunity for SQL injection attacks as substituting in a value).

Of course, part of the issue with my understanding may also just be my use case. I entirely accept that needing to choose a column name for a query at runtime based on user input when the possible column names are also computed at runtime is a bit atypical (but then, I’ll readily take this limitation over the performance implications of trying to use a list of dicts for the same purpose, already benchmarked that and SQLite is so much faster).

Rosuav · April 19, 2023, 5:23pm

That’s exactly the problem though. You’re sanitizing DATA, not column names. If the parameter is the string "data", that has to be treated as a string in SQL, not a column name, because otherwise you achieve nothing that can’t be done with string interpolation.

Do you ever need to be able to substitute in either a column name or a string literal? In my experience, any given location only ever gets one or the other. Use parameterization when you need to supply custom data, and use string interpolation when you need to dynamically choose the column name.

It’s like how, in Python, you can’t use a variable in an attribute lookup:

m = "append"
stuff = []
stuff.m(42) # doesn't work

but instead have to use a dedicated feature that makes it obvious that you’re doing dynamic programming:

getattr(stuff, m)(42)

Interpolation of dynamic column names is very rare, at least compared to insertion of (often untrusted) data.

Topic		Replies	Views
Sqlite3: consider deprecating combining named placeholders with sequence of params Core Development stdlib	8	1590	February 9, 2023
Coding a dynamic SQLite update query where a criterion or assignment may be None? Python Help	0	361	December 17, 2023
Sqlite3: Could `None` be an alias for `":memory:"` database? Ideas	1	456	January 15, 2021
Clean up some sqlite3 APIs Core Development stdlib , sqlite3	1	601	August 28, 2023
Should sqlite3.Row be more dict-like? Core Development stdlib , sqlite3	4	629	August 29, 2023

Unexpected behavior in sqlite3 module with query parameterization

Related Topics