In the data world, many processing jobs rely on date-time partitioned data and/or processes.
This may be the case for tools such as Apache Airflow (which triggers jobs every N minutes or hours) or Apache Spark, which will probably partition your data according to the usual year=YYYY/month=MM/day=DD pattern.
This requires that you build a “boundary-aligned” datetime object from an arbitrary one.
A common way to do it is as follows:
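A minimal sketch of that kind of pattern, assuming the epoch-based timestamp arithmetic described later in the thread. The function name `align_to_boundary` is hypothetical, chosen only for illustration:

```python
from datetime import datetime, timedelta, timezone


def align_to_boundary(src: datetime, delta: timedelta) -> datetime:
    """Round src down to a multiple of delta, counted from the Unix epoch."""
    step = delta.total_seconds()
    # First use of delta: count the whole steps elapsed since the epoch.
    n_steps = int(src.timestamp() // step)
    # Second use of delta: scale the step count back into seconds.
    return datetime.fromtimestamp(n_steps * step, tz=src.tzinfo)


src_datetime = datetime(2024, 3, 15, 10, 37, 12, tzinfo=timezone.utc)
aligned = align_to_boundary(src_datetime, timedelta(minutes=15))
print(aligned)  # 2024-03-15 10:30:00+00:00
```

Note that the epoch is an implicit zero point here, and that a naive `src` would be interpreted in local time by `timestamp()`.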
However, this requires two uses of the delta object and can be quite error-prone (forgetting to truncate to int, or to multiply by the same value, etc.).
Personally, I find it more semantic to apply a “truncating” method. Currently, most of the operator functions are not used by datetime objects (they only support addition and subtraction of timedeltas).
I propose to implement the above code as the floordiv (//) operator, allowing such syntax: aligned = src_datetime // delta.
Does this make sense to other people? Are there any arguments against such an idea?
I think the problem with division or modulus would be the need to specify a zero point to fully determine the result.
dt = zero_time + q * delta + r
By selecting a different zero time, I can produce different quotient q or remainder r, pushing events above or below a boundary. Since you can’t provide three arguments to a binary operator, I think you can’t use an operator here.
If I understand you right, you need something like the following:
def truncate(dt, delta, zero):
    # Number of whole deltas between the zero point and dt
    n = (dt.timestamp() - zero.timestamp()) // delta.total_seconds()
    return zero + n * delta
The zero point is necessary because a datetime is a point, not a sum of intervals. The problem is simply under-determined unless we convert the datetime to a timedelta, which requires a reference time.
For the proposed operation, we could pick various reference times that are more or less reasonable for different cases. Generally, we’ll probably want ones that conveniently overlap with midnight on some day, but which day? We could use the epoch, or the current day, or the first of the current year. Whatever we pick, there will be some edge case where it turns out suboptimal.
Anyway, we have these two separate classes specifically to avoid these conflations and make programmers think carefully about whether they want a point or an interval and how they want to combine them. The fact that this operation is ill-defined without an extra parameter is a feature, not a bug.
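To make the dependence on the zero point concrete, here is a small sketch. The dates and the two zero points are made up for illustration, and `truncate` is a variant of the helper above that uses exact timedelta floor division:

```python
from datetime import datetime, timedelta


def truncate(dt, delta, zero):
    # timedelta // timedelta yields an int: whole deltas between zero and dt.
    n = (dt - zero) // delta
    return zero + n * delta


one_day = timedelta(days=1)
midnight_zero = datetime(2024, 1, 1, 0, 0)  # hypothetical zero point
noon_zero = datetime(2024, 1, 1, 12, 0)     # hypothetical zero point

morning = datetime(2024, 3, 15, 9, 0)
evening = datetime(2024, 3, 15, 18, 0)

# Zero at midnight: both events land in the same bucket.
print(truncate(morning, one_day, midnight_zero))  # 2024-03-15 00:00:00
print(truncate(evening, one_day, midnight_zero))  # 2024-03-15 00:00:00

# Zero at noon: the same two events are split across two buckets.
print(truncate(morning, one_day, noon_zero))  # 2024-03-14 12:00:00
print(truncate(evening, one_day, noon_zero))  # 2024-03-15 12:00:00
```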
I get a different result before noon or afternoon. Does it matter? Hard to say without knowing the application. Maybe I care about trading days on the NYSE, and want to treat actions that happen before 9am in New York as belonging to the previous day, in which case midnight is a bad zero point.
It would be interesting to see what other datetime libraries do. I skimmed pendulum and arrow, could not find anything for the former, and the latter has floor and ceiling methods for predefined timeframes such as ‘hour’ or ‘day’.
Ah, so basically the same issue as with using months – the shape, size, and boundaries of the individual buckets. I.e. simple truncating works fine for seconds, minutes, and hours (and possibly days), but fails with anything more complex.
Thanks for all the feedback.
Yes, I implicitly use a zero point, which is the epoch (as I use the timestamp method, which returns the number of seconds since the epoch).
I also understand that using the floordiv operator can be confusing because it behaves very differently from the numeric case. What I actually did was floor-divide the timestamp and then multiply it back. In this regard, a dedicated method may be better.
@psarka There is a dedicated library for that: datetime-truncate on PyPI. But, as with most known implementations (Apache common datetime, Postgres date_trunc), it only allows “simple” truncation to well-defined units (seconds, weeks, etc.), not to arbitrary values (such as 1.5h, 15 minutes, etc.).
So maybe a dedicated method that accepts a custom zero point (which could default to the epoch or datetime.datetime.min, and should suit most needs) would be a viable way, wouldn't it?
I thought a little bit about @effigies' concern: there are two different classes (datetime and timedelta) for two different use cases.
In my example, I try to convert a datetime object to a new one, but what I actually do is compute a multiple of the given timedelta and add it to a zero value (which is the epoch in my case).
This was not obvious to me at the beginning, so now his first answer makes more sense.
Any datetime object can be expressed as: dt = zero_time + q * delta + r. We are used to expressing them as: dt = epoch + timestamp_seconds * "PT1S" + micros. But you’re free to express them as dt = my_birthday + q * "P1D" + r.
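As a rough illustration of that decomposition, the standard library's `divmod` on two timedeltas already yields `q` and `r` once a zero point has been chosen. The helper name `decompose` and the dates below are made up for the example:

```python
from datetime import datetime, timedelta


def decompose(dt, zero, delta):
    # divmod on timedeltas returns (int quotient, timedelta remainder).
    q, r = divmod(dt - zero, delta)
    return q, r


dt = datetime(2024, 3, 15, 10, 37, 12)

# Usual view: whole seconds since the epoch plus a sub-second remainder.
epoch = datetime(1970, 1, 1)
q_sec, r_sec = decompose(dt, epoch, timedelta(seconds=1))

# Alternative view: whole days since an arbitrary (hypothetical) birthday.
my_birthday = datetime(1990, 6, 1)
q_day, r_day = decompose(dt, my_birthday, timedelta(days=1))

# Either way, the original datetime can be rebuilt from its parts.
assert epoch + q_sec * timedelta(seconds=1) + r_sec == dt
assert my_birthday + q_day * timedelta(days=1) + r_day == dt
```

Truncation is then simply `zero + q * delta`, that is, the same expression with `r` dropped.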
As raised by @stoneleaf, this works well with simple truncation, but can be weird with ill-defined deltas such as a month or a year. I think this is a non-problem, as the timedelta class does not support this kind of value, so any method truncating to a given datetime.timedelta would not have to support such cases. Maybe another method could be proposed for “month-truncation” or similar, but I feel that dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0) is enough.
I propose a new method in the datetime.datetime class to truncate to a given timedelta with an arbitrary zero point:
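A minimal sketch of what such a method could look like, written here as a free function because `datetime.datetime` cannot be patched from user code. The name `truncate_datetime`, its signature, and the choice of `datetime.min` as the default zero point are all assumptions for illustration, not an existing API:

```python
from datetime import datetime, timedelta
from typing import Optional


def truncate_datetime(
    dt: datetime, delta: timedelta, zero: Optional[datetime] = None
) -> datetime:
    """Round dt down to the nearest zero + n * delta (hypothetical helper)."""
    if delta <= timedelta(0):
        raise ValueError("delta must be a positive timedelta")
    if zero is None:
        # Assumed default: datetime.min, carrying dt's tzinfo so that naive
        # and aware datetimes are never mixed in the subtraction below.
        zero = datetime.min.replace(tzinfo=dt.tzinfo)
    # Exact integer arithmetic: no float timestamps involved.
    n = (dt - zero) // delta
    return zero + n * delta


dt = datetime(2024, 3, 15, 10, 37, 12)
print(truncate_datetime(dt, timedelta(minutes=15)))         # 2024-03-15 10:30:00
print(truncate_datetime(dt, timedelta(hours=1, minutes=30)))  # 2024-03-15 10:30:00
print(truncate_datetime(dt, timedelta(minutes=15),
                        zero=datetime(2024, 3, 15, 0, 10)))  # 2024-03-15 10:25:00
```

Using `(dt - zero) // delta` rather than float timestamps keeps the result exact down to the microsecond and uses the delta in only one place besides the final multiplication.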
I’ve had a need for floor, ceil, and/or round multiple times. It’s not quite what the OP is asking for, but I think it would be a good addition to datetime: it’s not that hard to write, but it is a bit fiddly and requires more than a cursory understanding of how datetimes work (it took me maybe half an hour to write what I needed recently, and I’m not a total newbie). And if you look at SO, you don’t get one simple answer.
The reason this is useful is that datetime has microsecond precision, and many applications can only really usefully deal with seconds, or minutes, or hours …
My thought would be to only be able to use the standard units: seconds, minutes, hours, days.
Maybe months or years, but that’s tricky with round() as they aren’t consistently defined. Or maybe don’t even do round() – floor and ceil may be enough.
Maybe the answer is “use arrow”, but I hate to add a dependency for only one or two basic functions.
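For reference, floor and ceil over the standard units can be sketched with the standard library alone. The function names `floor_dt` and `ceil_dt` and the unit strings are hypothetical, and the sketch works on wall-clock values, so it ignores DST transitions for aware datetimes:

```python
from datetime import datetime, timedelta

# Only fixed-size units; months and years are deliberately left out.
_UNITS = {
    "second": timedelta(seconds=1),
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def floor_dt(dt: datetime, unit: str) -> datetime:
    step = _UNITS[unit]
    # Measure the offset from midnight of the same calendar day.
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((dt - midnight) // step) * step


def ceil_dt(dt: datetime, unit: str) -> datetime:
    floored = floor_dt(dt, unit)
    # A value already on a boundary is its own ceiling.
    return floored if floored == dt else floored + _UNITS[unit]


dt = datetime(2024, 3, 15, 10, 37, 12, 500000)
print(floor_dt(dt, "minute"))  # 2024-03-15 10:37:00
print(ceil_dt(dt, "hour"))     # 2024-03-15 11:00:00
```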