Fixing sqlite TIMESTAMP converter to handle UTC offsets

iafisher · October 6, 2021, 12:33am

The converter for TIMESTAMP columns that the sqlite3 module automatically registers ignores UTC offsets and always returns a naive datetime object.

I filed bpo-45335, and erlendaasland suggested deprecating the current behavior and switching to a timezone-aware converter in Python 3.13. sqlite3’s custom datetime parser can be replaced with datetime.datetime.fromisoformat, which was added in Python 3.7 and handles UTC offsets properly.

I am starting a discussion here to solicit feedback on this proposal. I think it is the best solution, but it should be kept in mind that changing the TIMESTAMP converter is backwards-incompatible and could result in some very confusing errors (like this), so we should make sure to give users plenty of advance warning if this is the route we choose to take.

malemburg · October 6, 2021, 8:46am

I don’t think this is a good way forward.

SQLite does not have a date/time storage type. Instead you can use a text field
(ISO, with/without UTC offset), a real (JDN) or an integer (Unix ticks):

https://sqlite.org/datatype3.html#date_and_time_datatype

Only the text field format can have a timezone associated with the value, which
is then taken into account by the SQLite date/time functions, but in most cases,
the calculations are then done using UTC. SQLite assumes that date/time string
values without offset are UTC times and returns UTC for its date/time functions:

https://sqlite.org/lang_datefunc.html

but it doesn’t even use a UTC offset in cases where it should (e.g. when using
datetime(‘now’, ‘localtime’)).

The problem with all this is that data read into a database will often not carry
timezone information. It can be local time, UTC or some other time zone, which
can only be deducted from related context (e.g. origin of data, source
conventions, separate database fields, etc.).

The naive datetime type is meant for exactly those cases.

Now, you could argue that having an offset is an indicator that a specific
timezone is given, which is correct, but what do you then do with time values
which do not carry this extra information ? The only correct way to go about it
is to return a naive datetime instance in those cases, creating issues when the
data has mixes of values stored with and without time offset.

IMO, the decision should be left to the application. The current default is the
most correct in terms of not creating problems downstream. Had there been major
problems with the default, this would have been raised many years ago.

People who do need the UTC offset interpretation can register a different
converter and apply extra logic based on the application context to fill in the
gaps for values which do not have a UTC offset in the database.

PS: Best practice is to always store datetime values as UTC in a database and
then use a separate field to hold the origin timezone string / location (where
needed). This causes the least surprises and errors. It can also be faster,
depending on how the database stores these values.

erlendaasland · October 6, 2021, 9:46am

Good points, @malemburg. I withdraw my suggestion.

iafisher · October 6, 2021, 10:37am

I understand your point, but it seems like a basic invariant is being violated here, which is that writing back the same data that I read from the database should not result in data loss.

It seems to me that if application programmers want to ignore UTC offsets which are in the database, then that should be the opt-in choice, rather than the other way around.

MrFuppes · October 6, 2021, 11:18am

I 100% agree with that, however the problem I see with Python’s naive datetime is that it’s not assuming UTC but local time. That’s why I originally though that parsing the UTC offset correctly would be a good idea.

malemburg · October 6, 2021, 5:23pm

Something else, I forgot to mention: sqlite3 will only use those converters if you pass in detect_types=sqlite3.PARSE_DECLTYPES to the .connect() method. Converters are not used per default. If you do want to use converters, it’s typically best to provide your own implementations, as the default ones may or may not be exactly what you want. E.g. the timestamp converter doesn’t even work with time values which have ISO time offset deltas.

Without detect_types, sqlite3 returns strings for e.g. timestamp columns.

erlendaasland · October 7, 2021, 7:16am

See also:

bpo-19065: sqlite3 timestamp adapter chokes on timezones
bpo-26651: Deprecate register_adapter() and register_converter() in sqlite3

cc. @berkerpeksag

iafisher · October 7, 2021, 11:10am

In the meantime, can I add a line to the sqlite3 docs noting that the TIMESTAMP converter ignores UTC offsets?

erlendaasland · October 7, 2021, 11:13am

IMO, yes that’s an improvement. See the devguide on how to add notes and warnings to the docs:

https://devguide.python.org/documenting/#paragraph-level-markup

Strongbeard · November 5, 2021, 8:41pm

Could you change your documentation update to a warning instead of a note and specifically call out that the converter crashes when attempting to interpret a UTC offset aware datetime? I arrived at this thread after misinterpreting your note and assuming that the converter would strip the UTC offset that was stored in the DB.

I’d suggest altering it to:
Warning The default “timestamp” converter is unable to interpret datetimes with UTC offsets from the database. If using both the default adapter and converter always use naive datetime.datetime objects (aka tzinfo=None). To preserve UTC offsets in timestamps, either leave converters disabled, or register an offset-aware converter with register_converter() .

erlendaasland · November 5, 2021, 8:44pm

When you say crash, do you mean a segfault, or a raised exception?

Strongbeard · November 5, 2021, 8:46pm

A raised ValueError.

iafisher · November 5, 2021, 8:54pm

Could you share your exact code and Python version? When I run https://bugs.python.org/file50324/timestamp.py with Python 3.8 I get a naive datetime object, not a ValueError.

Strongbeard · November 5, 2021, 9:22pm

I am using python 3.6.8. I was able to replicate what I am seeing with the following:

#!/usr/bin/python3

import sqlite3
from datetime import datetime, timezone

with sqlite3.connect(
    ":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES
) as db:
    cursor = db.cursor()
    cursor.execute("CREATE TABLE data(timestamp TIMESTAMP NOT NULL)")
    timestamp = datetime(1970,1,1,0,0,0,0,tzinfo=timezone.utc)
    cursor.execute("INSERT INTO data(timestamp) VALUES (?)", [timestamp])
    cursor.execute("SELECT * FROM data")

I get the following traceback:

Traceback (most recent call last):
  File "/home/kmahon/git/test/test.py", line 17, in <module>
    cursor.execute("SELECT * FROM data")
  File "/usr/lib64/python3.6/sqlite3/dbapi2.py", line 70, in convert_timestamp
    hours, minutes, seconds = map(int, timepart_full[0].split(b":"))
ValueError: invalid literal for int() with base 10: b'00+00'

iafisher · November 5, 2021, 9:41pm

Ah, okay. I reproduced your failure with Python 3.8. It seems that sqlite3 crashes when the timestamp includes a UTC offset but does not have a decimal point in the seconds field. Here’s the source code: cpython/dbapi2.py at 3.8 · python/cpython · GitHub

The error is on line 70: timepart_full is equal to the remainder of the string, including the UTC offset, so when it splits on ":" and tries to parse each component as an int, it chokes on the offset.

Incidentally, it will also fail if there are fewer than six digits after the decimal point because of line 72.

iafisher · November 10, 2021, 1:14pm

@erlendaasland Do you think this is a good reason to deprecate the default converters entirely?

erlendaasland · November 10, 2021, 1:38pm

IMO, yes. I won’t have time to create an issue or propose a PR until earliest next week. Feel free to work on it, if you want I would close bpo-26651 as won't-fix and open a new issue that reference bpo-26651 and this topic.

iafisher · November 21, 2021, 4:46pm

I filed Issue 45858: Deprecate default converters in sqlite3 - Python tracker