Simplify datetime calculation

I’m surprised to see constructs such as return _DAYS_BEFORE_MONTH[month] + (month > 2 and _is_leap(year)) in Python’s datetime routines.
But converting year-month-day → days past a certain reference date can be done with plain integer arithmetic, without any “if” or other conditional in the calculation.
The only trick is to shift the start of the year to March 1st:

                       |   |   |   |   |
                       v   v   v   v   v
Mar,Apr,May,Jun,Jul:  31, 30, 31, 30, 31   ∑: 153 days
Aug,Sep,Oct,Nov,Dec:  31, 30, 31, 30, 31   ∑: 153 days
Jan,Feb:              31, rest             leap year day just added at end

So you see the logic behind the Gregorian calendar.
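To make the pattern above concrete, here is a small check that the 153/61/31 block sizes reproduce the real month offsets counted from March 1st. It is only a sketch: the names `LENGTHS_FROM_MARCH` and `offset_from_march` are made up for this illustration and do not come from the linked repository.

```python
from itertools import accumulate

# Month lengths starting at March. February (the last month of the shifted
# year) is left out because no month follows it within that year.
LENGTHS_FROM_MARCH = [31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 31]


def offset_from_march(m3: int) -> int:
    """Days from March 1st to the 1st of month m3 (0 = Mar, ..., 11 = Feb)."""
    d153, m5 = divmod(m3, 5)   # whole 5-month blocks of 153 days
    d61, d31 = divmod(m5, 2)   # whole 2-month blocks of 61 days, then one 31-day month
    return d153 * 153 + d61 * 61 + d31 * 31


# Offsets obtained by simply summing the month lengths.
expected = [0, *accumulate(LENGTHS_FROM_MARCH)]
# Offsets obtained from the block arithmetic alone.
computed = [offset_from_march(m3) for m3 in range(12)]
```

Both lists come out identical, which is why no lookup table is needed once the year starts in March.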

The calculation can then go this way (see gregorian/__init__.py in the galuschka/gregorian repository on GitHub).
year-month-day → days past March 1st of year 0 (don’t worry about the nonexistence of a year “0”):

    m3     = (month + 9) % 12   # 1,2 -> 10,11 / 3,4,..,12 -> 0..9
    y_corr = year - (m3 // 10)  # jan/feb: 1 year before y
    mar1st = y_corr * 365 + (y_corr//4) - (y_corr//100) + (y_corr//400)

    d153 = m3 // 5  # 153 days every 5 months
    m5   = m3 %  5
    d61  = m5 // 2  # 61 days every 2 months
    d31  = m5 %  2  # mar,may,jul etc.: 31 days
    days_past_mar1 = (d153 * 153) + (d61 * 61) + (d31 * 31) + day - 1

    return mar1st + days_past_mar1  # Mar 1st plus just arithmetic calculation
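For anyone who wants to try the snippet above, here it is wrapped into a standalone function and compared with `datetime.date.toordinal()`. The function name `days_from_ymd` and the constant `OFFSET` are my own labels for this sketch. The two day counts use different reference dates, so they should differ by a fixed shift rather than be equal.

```python
from datetime import date


def days_from_ymd(year: int, month: int, day: int) -> int:
    """Days past March 1st of year 0, using integer arithmetic only."""
    m3 = (month + 9) % 12                 # 0 = Mar, ..., 9 = Dec, 10 = Jan, 11 = Feb
    y_corr = year - m3 // 10              # Jan/Feb belong to the previous shifted year
    mar1st = y_corr * 365 + y_corr // 4 - y_corr // 100 + y_corr // 400
    d153, m5 = divmod(m3, 5)              # 153 days every 5 months
    d61, d31 = divmod(m5, 2)              # 61 days every 2 months, then one 31-day month
    return mar1st + d153 * 153 + d61 * 61 + d31 * 31 + day - 1


# Shift between this day count and the proleptic Gregorian ordinal of datetime.
OFFSET = days_from_ymd(1, 1, 1) - date(1, 1, 1).toordinal()

# Spot check: the shift should be the same for every date.
samples = [date(1900, 2, 28), date(1900, 3, 1), date(2000, 2, 29), date(2024, 12, 31)]
shifts = {days_from_ymd(d.year, d.month, d.day) - d.toordinal() for d in samples}
```

If the arithmetic is right, `shifts` contains exactly one value, namely `OFFSET`.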

And the inverse function, days → year-month-day:

    y400  =     days // self._DAYS400Y
    days -=     y400 *  self._DAYS400Y
    y100  = min(days // self._DAYS100Y,3)  # 0..3: 146096/36524==4! - every 4th century has 1 more day
    days -=     y100 *  self._DAYS100Y
    y4    =     days // self._DAYS4Y       # 0..24: (every century has one day less - 24 is max. anyway)
    days -=     y4   *  self._DAYS4Y
    y1    = min(days // self._DAYS1Y, 3)   # 0..3: 1460/365==4! - every 4th year has 1 more day
    days -=     y1   *  self._DAYS1Y       # days here: days past Mar 1st (0..365)

    y_corr = (y400 * 400) + (y100 * 100) + (y4 * 4) + y1

    # print( f"{y_corr=} = {y400=}*400 + {y100=}*100 + {y4=}*4 + {y1=}" )

    m153  = days // 153  # 0..2: number of 5 months blocks 31,30,31,30,31
    days -= m153 *  153
    m61   = days //  61  # 0..2: number of 2 months blocks 31,30
    days -= m61  *   61
    m31   = days //  31  # 0..1: number of 31 days months
    days -= m31  *   31  # remaining days: days past 1st of month - also when February
    day   = days + 1                    # day of month 1..31
    m3 = (m153 * 5) + (m61 * 2) + m31   # 0=mar, .. 11=feb

    # print( f"{m3=} = {m153=}*5 + {m61=}*2 + {m31=} / remaining {days=}" )

    year = y_corr + (m3 // 10)          # revert y_corr for jan/feb

    month = ((m3 + 2) % 12) + 1  # 0->3, 1->4, ..., 9->12, 10->1, 11->2

    return year, month, day
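The inverse snippet refers to `self._DAYS400Y` and friends without showing their values. The sketch below assumes the usual Gregorian cycle lengths (365, 1461, 36524 and 146097 days), which match the hints in the comments above, and wraps the steps into a plain function. The name `ymd_from_days` and the module-level constants are labels chosen for this example, not taken from the repository.

```python
from datetime import date

# Assumed cycle lengths; the original snippet keeps them as class attributes.
DAYS_1Y = 365
DAYS_4Y = 4 * DAYS_1Y + 1        # 1461: four years including one leap day
DAYS_100Y = 25 * DAYS_4Y - 1     # 36524: a century drops one leap day
DAYS_400Y = 4 * DAYS_100Y + 1    # 146097: every 4th century keeps it


def ymd_from_days(days: int) -> tuple[int, int, int]:
    """Convert days past March 1st of year 0 back to (year, month, day)."""
    y400, days = divmod(days, DAYS_400Y)
    y100 = min(days // DAYS_100Y, 3)     # clamp: the last day of the cycle would give 4
    days -= y100 * DAYS_100Y
    y4, days = divmod(days, DAYS_4Y)
    y1 = min(days // DAYS_1Y, 3)         # clamp: the leap day would give 4
    days -= y1 * DAYS_1Y                 # now 0..365 days past March 1st
    y_corr = y400 * 400 + y100 * 100 + y4 * 4 + y1

    m153, days = divmod(days, 153)       # 5-month blocks
    m61, days = divmod(days, 61)         # 2-month blocks
    m31, days = divmod(days, 31)         # single 31-day month
    m3 = m153 * 5 + m61 * 2 + m31        # 0 = Mar, ..., 11 = Feb

    return y_corr + m3 // 10, (m3 + 2) % 12 + 1, days + 1


# datetime ordinals are shifted by 305 days relative to this day count:
# the forward formula maps 0001-01-01 to 306, while toordinal() gives 1.
leap_day = ymd_from_days(date(2000, 2, 29).toordinal() + 305)
```

Here `leap_day` should come back as `(2000, 2, 29)`, the case that exercises both `min(..., 3)` clamps.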

Hi, are you suggesting a change to the code in the standard library? You showed one line of existing code, then showed dozens of lines of your code, and described it as simplifying.

I don’t understand what you are proposing, or why. Is there a problem with the existing code?

The line I quoted is just one example standing in for many others (“… to see constructs such as…”!).

  • _DAYS_BEFORE_MONTH[month] → a lookup with further calculation behind it
  • month > 2 → an unnecessary conditional
  • _is_leap(year) → another hint that the scheme behind the Gregorian calendar is not being used.
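For readers who have not looked at that line, here is a rough reconstruction of what the three pieces do together. It is a sketch of the lookup style only, not a copy of the standard library source: the table is rebuilt here from month lengths, `calendar.isleap` stands in for `_is_leap`, and the un-prefixed names are my own.

```python
import calendar
from datetime import date
from itertools import accumulate

# Month lengths of a non-leap year, January first.
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Days before the 1st of each month in a non-leap year.
# Index 0 is a placeholder so that months can stay 1-based.
DAYS_BEFORE_MONTH = [None, 0, *accumulate(MONTH_LENGTHS[:-1])]


def days_before_month(year: int, month: int) -> int:
    """Days in `year` before the 1st of `month`, via table lookup plus leap fix."""
    # The boolean adds 1 only for March onwards in a leap year.
    return DAYS_BEFORE_MONTH[month] + (month > 2 and calendar.isleap(year))


# Cross-check against the day-of-year reported by datetime itself.
checks = [
    days_before_month(y, m) == date(y, m, 1).timetuple().tm_yday - 1
    for y in (1900, 2000, 2023, 2024)
    for m in range(1, 13)
]
```

The `month > 2` test is exactly the conditional that disappears once the year is counted from March.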

Once you unroll all the calls and nested calls behind datetime.datetime and its inverse routine, you will see that the implementation is unnecessarily complex.
Comparing code that has nested calls with code that has none is quite unfair. ;-)

To make a change to standard library code, you’ll need to make a strong case. Incorrect answers are the strongest justification. Inefficiency might be a reason. So far, it seems like your justification is, “there’s another way to do it.”

You mentioned conditionals: what’s wrong with a conditional? Or nested calls? Why is that a problem? Nested calls are often the best way to modularize code to express common operations once instead of sprinkling them throughout the module.

Also note datetime has both a C implementation that is used whenever possible, and a Python one that is used as a fallback.

The fallback applies, for example, if CPython is explicitly built without the C version, or when something like PyPy uses the pure-Python one. But in most cases the much faster C one is used, and the Python one is mostly there for historical/prototyping reasons, so there’s a higher bar to modify it.

I recall this algorithm from Calendrical Calculations by Dershowitz and Reingold (I had to look it up: equation 2.29 ;-)). The book also notes it is not particularly efficient. I predict any performance gain would be negligible, as our algorithm is quite fast already. Unless you have benchmarks that show otherwise?
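One possible way to produce such numbers is a quick `timeit` run like the sketch below. The helper names `days_from_ymd` and `bench` are made up for this example. Note the comparison is not apples to apples: on a regular CPython build `date.toordinal` normally runs in the C implementation, while the arithmetic version here is pure Python and is called through a lambda.

```python
import timeit
from datetime import date


def days_from_ymd(year: int, month: int, day: int) -> int:
    """Arithmetic-only day count from the first post (days past March 1st, year 0)."""
    m3 = (month + 9) % 12
    y_corr = year - m3 // 10
    d153, m5 = divmod(m3, 5)
    d61, d31 = divmod(m5, 2)
    return (y_corr * 365 + y_corr // 4 - y_corr // 100 + y_corr // 400
            + d153 * 153 + d61 * 61 + d31 * 31 + day - 1)


def bench(number: int = 100_000) -> tuple[float, float]:
    """Return (seconds for date.toordinal, seconds for days_from_ymd)."""
    d = date(2024, 2, 29)
    t_stdlib = timeit.timeit(d.toordinal, number=number)
    t_arith = timeit.timeit(lambda: days_from_ymd(2024, 2, 29), number=number)
    return t_stdlib, t_arith


if __name__ == "__main__":
    t_stdlib, t_arith = bench()
    print(f"date.toordinal: {t_stdlib:.4f}s   days_from_ymd: {t_arith:.4f}s")
```

A fairer test would time the pure-Python datetime fallback against the arithmetic version, and the C implementation against a C port of it.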

“Also note datetime has both a C implementation that is used whenever possible, and a Python one that is used as a fallback.”

We have the same algorithm in both implementations IIRC.
