Question about polars

I have a rather technical question about the polars library. I have looked for a polars-specific forum, but have not been able to find it. If someone would be able to point me to the proper place to ask this question, that would be very helpful.

For completeness, here is the question:


Edit

I realise I wrote the question a little poorly. The code probably isn’t as self-explanatory as I intended. The output of the scalar case (which works) is

basic  array
i32    array[i32, 4]
0      [0, 1, 2, 3]
1      [1, 2, 3, 4]
2      [2, 3, 4, 5]
3      [3, 4, 5, 6]

I’d like to be able to create a similar output for the datetime case.


I would like to be able to do something like the following:

from datetime import UTC, datetime, timedelta

import numpy as np
import polars as pl

start = datetime(2025, 4, 1, 15, 0, tzinfo=UTC)
series_1 = pl.Series(
    "datetime_3pm",
    [start + timedelta(minutes=30 * i) for i in range(4)]
)

array = np.array([timedelta(minutes=30 * i) for i in range(4)])
series_1 + array

but this raises SchemaError: failed to determine supertype of i32 and object.
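
For what it's worth, the `object` in that message can be traced to NumPy rather than to polars: when no dtype is given, `np.array` keeps Python `timedelta` values as generic objects. A minimal check, reusing only the construction from the snippet above:

```python
from datetime import timedelta

import numpy as np

# Same construction as in the question: no dtype is given.
array = np.array([timedelta(minutes=30 * i) for i in range(4)])

# NumPy does not infer timedelta64 from datetime.timedelta objects,
# so the array falls back to the generic object dtype.
print(array.dtype)     # object
print(type(array[0]))  # <class 'datetime.timedelta'>
```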

I think this should be possible, because for example the following code works:

import numpy as np
import polars as pl

basic_series = pl.Series(range(4), dtype=pl.Int32)
array = np.arange(4, dtype=np.int32).reshape((1, 4))
array_series = basic_series + array
df = pl.DataFrame({"basic": basic_series})
df.with_columns(
    array=pl.col("basic") + array
)

I’ve tried a lot of different things by this point.
I’ve got a workaround for my algorithm where I do an outer join instead of creating an array column. But it bothers me that I don’t know how to make this[1] work.
Any help would be appreciated :slight_smile:


  1. adding a timedelta array to a datetime column

I’d use polars’ own Datetime datatype instead of Python’s.

series_1 properly gets converted to polars datetime by the pl.Series constructor. I can’t imagine there’s a problem there.
I’ve inspected it. The column dtype is proper polars datetime.

The timedelta gives me trouble; there might be room for improvement there. pl.duration feels finicky and I’m never quite certain that I’m dealing with an actual equivalent of a timedelta. My most confident attempt was to construct 2 datetime arrays and take their difference. That way I’ve definitely got a polars-type timedelta series. Then to construct an array out of them… I’ve tried various things. I’d like to think at least one of them worked.
For example, series.to_numpy().reshape((4, 1)).
But any time I construct something that should be a pl.duration inside an array, it seems to register either as object or as an array of ints.

Explicit is better than implicit. Specifying dtype=polars.Datetime gives it a much bigger hint about which supertype to determine, instead of leaving it to the default logic, or to chance.

I’m sure making a datetime.datetime: start + timedelta(minutes=30 * i) is fine. If there were still a problem, I would also be explicit about how exactly to convert each datetime.datetime to a polars.Datetime.

@JamesParrott, look, I appreciate you’re trying to help, but I feel like you’re talking down to me. This isn’t a noob issue.

The following works:

from datetime import UTC, datetime, timedelta

import numpy as np
import polars as pl

start = datetime(2025, 4, 1, 15, 0, tzinfo=UTC)
series_1 = pl.Series(
    "time",
    [start + timedelta(minutes=30 * i) for i in range(4)]
)

array = np.array([timedelta(minutes=30 * i) for i in range(4)]).reshape((1, 4))
pl.DataFrame(series_1).with_columns(
    pl.col("time").map_elements(lambda x: x + array)
)

So there is no simple / fundamental problem with types.

The problem is that I want to do array broadcasting, because array broadcasting is more efficient than the .map_elements method.
And putting timedeltas in a numpy array seems to stop polars from detecting that they are timedeltas. I can’t figure out how to put them into a polars-native array, or at least not in a way that works.
So I’m looking for very specific knowledge about polars arrays.
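
To make the target concrete, here is a sketch of the same (M,1) + (1,N) broadcasting done purely in numpy. It is an illustration of the desired result, not a polars solution, and it assumes naive datetimes because numpy's datetime64 carries no time zone.

```python
from datetime import datetime, timedelta

import numpy as np

# Assumption: naive datetimes, since numpy's datetime64 has no time zone.
start = datetime(2025, 4, 1, 15, 0)
times = np.array(
    [start + timedelta(minutes=30 * i) for i in range(4)], dtype="datetime64[us]"
)
offsets = np.array(
    [timedelta(minutes=30 * i) for i in range(4)], dtype="timedelta64[us]"
)

# (M, 1) + (1, N) broadcasts to an (M, N) grid of datetimes.
grid = times.reshape((4, 1)) + offsets.reshape((1, 4))
print(grid.shape)  # (4, 4)
print(grid[1])     # the 15:30 row shifted by 0, 30, 60 and 90 minutes
```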

from datetime import UTC, datetime, timedelta

import numpy as np
import polars as pl

start = datetime(2025, 4, 1, 15, 0, tzinfo=UTC)
series_1 = pl.Series(
    "datetime_3pm",
    [start + timedelta(minutes=30 * i) for i in range(4)]
)

array = np.array([timedelta(minutes=30 * i) for i in range(4)], dtype=pl.duration).reshape((1, 4))
series_1 + array
TypeError: Cannot interpret '<function duration at 0x7f046803f740>' as a data type

Meanwhile, pl.duration returns an expression rather than a data type, so I don’t know how to specify the dtype.

I’m no polars expert, but I did some experimenting, and it seemed that specifying dtype=np.timedelta64 on the array gave the output I was expecting. array = np.array([timedelta(minutes=30 * i) for i in range(4)], dtype=np.timedelta64).

from datetime import UTC, datetime, timedelta

import numpy as np
import polars as pl

start = datetime(2025, 4, 1, 15, 0, tzinfo=UTC)
series_1 = pl.Series(
    "datetime_3pm",
    [start + timedelta(minutes=30 * i) for i in range(4)]
)
print(series_1)
array = np.array([timedelta(minutes=30 * i) for i in range(4)], dtype=np.timedelta64)
print(series_1 + array)

Output:

shape: (4,)
Series: 'datetime_3pm' [datetime[μs, UTC]]
[
	2025-04-01 15:00:00 UTC
	2025-04-01 15:30:00 UTC
	2025-04-01 16:00:00 UTC
	2025-04-01 16:30:00 UTC
]
shape: (4,)
Series: 'datetime_3pm' [datetime[μs, UTC]]
[
	2025-04-01 15:00:00 UTC
	2025-04-01 16:00:00 UTC
	2025-04-01 17:00:00 UTC
	2025-04-01 18:00:00 UTC
]

In case it’s helpful, here’s the process I went through.

  1. Run the code sample you gave. I noticed it gave a slightly different error than what’s in your post – I’m not sure if I’m using a different version of something, or if you copied the wrong error message. The message I received is polars.exceptions.SchemaError: failed to determine supertype of datetime[μs, UTC] and object
  2. This gave me a hint that the error was with the numpy array not giving any numpy-specific datatype to the list of timedelta it was passed.
  3. After doing some searching around about how datetime works in polars, I tried changing the numpy array to a polars Series. That worked, but not knowing how polars works, I wasn’t sure if it would be equivalent to what you wanted in terms of performance.
  4. After doing a similar search for numpy and reading up on timedelta64, I tried replacing the timedelta list with a list like [np.timedelta64(30 * i, 'm') for i in range(4)], but that failed with an error about polars only supporting some timedelta64 resolutions. I guess I could have made this work by multiplying the minutes by some amount to change it to one of the supported resolutions, but that felt like a lot of work, and you specifically wanted to use Python timedeltas, which might not just be some simple number of minutes.
  5. Did some searching about how to convert a Python timedelta to a numpy timedelta64, but couldn’t find anything. Tried the only reasonable thing I could think of, which is to just tell numpy that I expect the array to be dtype=timedelta64. That worked, surprisingly!

Since I was able to get the correct output, using the same Series and array data structures that you used, I felt confident enough that this is what you’re looking for to go ahead and post it. On the other hand, given my inexperience with polars and numpy, I may have missed something. If so, I apologize for wasting your time with my lengthy post.

No, thanks for looking at it.

Unfortunately that’s not quite what I need. You’re adding two (4,1) arrays together, which is possible in a variety of ways. Converting them both to pl.Series is actually a good solution.

But as you noted, I messed up copy-pasting my code somehow. It’s core to my problem that I need to add a (1,N) array to a (M,1) array.

So that would be

from datetime import UTC, datetime, timedelta

import numpy as np
import polars as pl

start = datetime(2025, 4, 1, 15, 0, tzinfo=UTC)
series_1 = pl.Series(
    "datetime_3pm",
    [start + timedelta(minutes=30 * i) for i in range(4)]
)
print(series_1)
array = np.array([timedelta(minutes=30 * i) for i in range(4)], dtype=np.timedelta64).reshape((1, 4))
print(series_1 + array)

which raises InvalidOperationError: cannot cast Array type (inner: 'Int64', to: 'Duration(Microseconds)')

Interestingly even

from datetime import UTC, datetime, timedelta

import numpy as np
import polars as pl

start = datetime(2025, 4, 1, 15, 0, tzinfo=UTC)
series_1 = pl.Series(
    "datetime_3pm",
    [start + timedelta(minutes=30 * i) for i in range(4)]
)
print(series_1)
array = np.array([timedelta(minutes=30 * i) for i in range(4)], dtype=np.timedelta64).reshape((1, 4))
(series_1.cast(int) + array)

raises InvalidOperationError: cannot cast Array type (inner: 'Int64', to: 'Duration(Microseconds)'), so I think that has to happen when polars tries to implicitly convert the numpy array. Indeed pl.Series(np.array([timedelta(minutes=30 * i) for i in range(4)], dtype=np.timedelta64).reshape((1, 4)), dtype=pl.Duration("us")) raises the same error. But so does pl.Series(np.array([timedelta(minutes=30 * i) for i in range(4)], dtype=np.timedelta64).reshape((1, 4)), dtype=pl.Int64).

pl.Series(np.array([i for i in range(4)]).reshape((1, 4))) works fine.

pl.Series(np.array([i for i in range(4)]).reshape((1, 4))).cast(pl.Array(pl.Duration("us"), 4))

works, which I only found because I was thinking about your reply.

As does

array = pl.Series(np.array([i*timedelta(minutes=15).total_seconds()*10**6 for i in range(4)]).reshape((1, 4))).cast(pl.Array(pl.Duration("us"), 4))
array

but

from datetime import UTC, datetime, timedelta

import numpy as np
import polars as pl

start = datetime(2025, 4, 1, 15, 0, tzinfo=UTC)
series_1 = pl.Series(
    "datetime_3pm",
    [start + timedelta(minutes=30 * i) for i in range(4)]
)
print(series_1)
array = pl.Series(np.array([i*timedelta(minutes=15).total_seconds()*10**6 for i in range(4)]).reshape((1, 4))).cast(pl.Array(pl.Duration("us"), 4))
print(series_1 + array)

once again results in ComputeError: cannot add non-numeric inner dtypes: (left: datetime[μs, UTC], right: duration[μs]). Which looks like BS. What else is duration[μs] for, if not to add it to datetime[μs, <timezone>] ?!