Deprecate PEP 249 (DB-API 2.0)

We were told to move this here from python-ideas. So:

We would like to propose the following improvements to DB-API 2.0 that would require bumping it up to DB-API 3.0:

  • Get rid of SQL strings
  • Get rid of SQL strings
  • Use package resources to store what would otherwise be SQL strings

While we cannot prevent someone from going out of their way to define package resources at runtime just so they can implement SQL injection, ultimately the goal is to provide a small speed bump so they don’t feel so inclined to jump straight into SQL injection before trying to do easier, more secure things.

For example, if your code currently looks like this:

cursor.execute("INSERT INTO ...", values)

This change would have it look something like this:

cursor.execute("mypackage.mymodule.insertvalues", values)

If your code instead looks like this:

cursor.execute(f"INSERT INTO ...")

This change would have it look like this:

N/A. No really, you just can’t do this with the proposed API.

This proposal is vaguely inspired by the White House’s threat of memory safety. We believe Python could benefit from language/library-level SQL safety.

I have plenty of code that looks like that and handles user provided values without being susceptible to sqli. The correct way of handling this is well known, documented, and not unique to python, python just implements what is necessary to have parameterized queries

If you’re concerned about people doing the wrong thing here, there are plenty of code-scanning tools that detect misuses of this

6 Likes

If you remove the ability to use raw SQL from DB-API, then you need to provide a new interface that allows mypackage.mymodule.insertvalues to do so.

I can see the appeal of using a library that doesn’t expose the ability to use raw SQL, but encouraging that doesn’t mean altering the API such a library itself would use.

1 Like

Thankfully this very proposal suggests a mechanism by which an ORM can produce the raw SQL it needs.

I don’t see it in this thread. Is there more that wasn’t migrated from the mailing-list?

Sorry, I interpreted mypackage.mymodule.insertvalues as a reference to a function, not an actual string.

Would there be something to prevent locally defined strings?

foo = f'INSERT {bad_sql} TO ...'
cursor.execute("foo", ...)

Rather than just stringified qualified names, I would think you’d want to use pkgutil.get_data instead, which gets clumsy fast. I still think this is something to be solved by a properly abstract library, rather than the API used by libraries.

1 Like

It is very common to dynamically construct a query, e.g. to customise a WHERE clause based on a request structure, while still using parameter binding to provide specific values. This proposal appears to categorically rule out this wide class of legal, safe code.

3 Likes

Please explain how an API will distinguish between this and a non-parameterized query. In both cases, the execute method will receive a single string.

cursor.execute("insert into tablename values (1)")
n = 1; cursor.execute(f"insert into tablename values ({n})")

These are utterly indistinguishable from the API’s perspective. It’s just a string.

Your proposal is, quite frankly, a pointless attempt to capitalize on something that is currently in the news, with no material benefit. It has nothing whatsoever to do with improving security or preventing SQL injection.

Good luck finding a core dev to sponsor your proposal.

2 Likes

I believe the idea is that there’s no way to provide the name of a global string literal that contains the result of the f-string. (Which is obviously untrue.) Or perhaps it loads a text file with importlib? The OP wasn’t entirely clear.

This proposal intends to build on importlib resources, available since Python 3.7.

The string passed to execute is a resource path, composed of a module path and an sql resource name. the module mypackage.mymodule would need to carry the insertvalues.sql resource.

If you truly need to execute raw SQL, inject a custom package loader and implement the resource loader API.

So, at best, what you’re doing is forcing people to put their queries into a completely separate file, and adding huge amoutns of overhead whenever you need to make changes. If you think people would do that rather than use the existing API, you are sorely mistaken.

You might as well try to solve road accidents by requiring that people drive their cars from inside the boot (trunk) using cameras. It’s no safer and adds way too much hassle, so you’re not going to solve anything.

This thread is probably better in the Python Help category, since it’s not a viable idea. You would have a much more productive thread by discussing what CAN be done to improve the situation, instead of saying “Why don’t we just” with a terrible idea.

1 Like

This is a reasonably well solved problem with static analysis. There are lints for it. You can set up types (such as with LiteralStr) to ensure non-arbitrary construction while allowing dynamic query building to get immediate feedback in IDEs for functions which need to be careful about how they take in inputs.

Creating APIs that don’t do what people need to avoid a problem isn’t helpful or useful, it just causes people not to use the new API and resent the efforts of those who don’t take the time to understand what is needed.

That’s before getting into the fact that the added complexity and indirection of hiding this behind a custom package loader makes this harder to audit properly.

6 Likes

Ironically I think you can’t even do this as written, since importlib intentionally no longer takes single paths – I believe due to issues with zip installations or something complex

The problem with lints is that they don’t stop you.

They are opt-in.

Opt-in safety is no safety at all.

Rust makes safety opt-out for a reason.

Again, you seem to be advocating for people to use libraries like SQLAlchemy instead of using DB-API directly. That doesn’t require DB-API itself to change.

1 Like

But using rust is still opt in. Do you want to deprecate the entire python language and force everyone to migrate to rust?

3 Likes

Nothing about what you have proposed here makes things safer. It doesn’t stop things you are considering (This distinction matters) to be unsafe .

All user input needs to be handled with care, and only as appropriate. That’s a fact of design that no language except one designed so obtusely and uselessly as to never handle user input will avoid having developers need to handle.

And bringing up rust is interesting here because rust as a language doesn’t prevent misuse of sqlite either.

Rust, unlike Python, doesn’t come bundled with sqlite.

Okay, let me put this another way for you:

The functionality of interacting with a database incorrectly is not the job of a language to address. It is the job of libraries to provide an interface which is documented and has a safe use. (python’s sqlite3 does this) It is the job of developers to properly handle user input. Rust doesn’t do anything to prevent you from having user input or passing it to things that it would be unsafe to pass user input to, it’s not the job of the language to do that.

The language can only check that you gave it the correct type of data for the underling C library. There’s nothing else to do here beyond tell people to understand the perils of user input and not trust it. Feel free to read more on this here: SQL Injection | OWASP Foundation but python is already providing the correct interface based on the recommendations here (read about parameterization)

1 Like

But your proposal doesn’t stop anything either, because as you said in your initial post, it’s easy to work around with some hack to wrap whatever strings you want in a fake “resource”. Someone will just write a library that has a function make_resource so instead of doing execute("INSERT INTO...") people will call execute(make_resource("INSERT INTO...")) and then everyone will use that library because it will be much less painful than not doing so.

we will believe it when we see it.