How to convert the Zulu datetime to UTC with offset in python and spark sql

I have the below date time in string type. I want to convert that into UTC with offset

> spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
>     print("Print statement-1")
>     schema = StructType([
>         StructField("author", StringType(), False),       
>         StructField("dt", StringType(), False)
>     ])
> 
>     data = [
>         ["author1", "2022-07-22T09:25:47.261Z"],
>         ["author2", "2022-07-22T09:26:47.291Z"],
>         ["author3", "2022-07-22T09:23:47.411Z"],
>         ["author4", "2022-07-224T09:25:47.291Z"]
>     ]
> 
>     df = spark.createDataFrame(data, schema)
> ```

I want to convert dt column as UTC with offset.

For example the first row value as 2022-07-22T09:25:47.2610000 +00:00

How to do that in pyspark and sparkSQL.

I can easily do that using regex_replace

df=df.withColumn(“UTC”,regexp_replace(‘dt’, ‘Z’, ‘000 +00:00’))


bcoz Z is same as +00:00. But I am not sure that regexp_replace is correct of doing the conversion. Is there any method which can do the correct conversion rather than regex_replace?

Hello, @suren, and welcome to Python!

You are in luck! Python has several modules devoted to handling date and time values, including the datetime and zoneinfo. I encourage you to read the documentation for those modules.

Some code using strptime(), like the following, might help (not tested):

from datetime import datetime as DT
utc_dt = "2022-07-22T09:25:47.261Z".replace("Z","UTC")
dt = DT.strptime(utc_dt, "%Y-%m-%dT%H:%M:%S.%f%Z")
# dt is a datetime instance with tzinfo set to timezone.UTC . 

In general, when I am working on a certain kind of data or interaction, my first question is usually, “Does the Python standard library already have a way of handling this?” The place to look is The Python Standard Library — Python 3.10.6 documentation . My second question is, “has someone else written a package to handle this?” The place to look is https://pypi.org/ .

I hope this is helpful. Good luck!
—Jim DeLaHunt, Vancouver, Canada

1 Like

Your orginal date and time strings are UTC. That is what the tailing Z means for an ISO-8601 date and time.

If you are passing the string to software that understands ISO-8601 no conversion should be needed.

1 Like

Hi Thank you for the quick response. We are using Spark 2.4 version with Python 2.7.

How can we do this conversion in this pyspark version.

Sorry, I can’t help you there. I don’t know anything about Spark. I do know that Python 2.7 is very old by now. It is no longer supported. If you have a way to move your project to Python 3 (preferably Python 3.10) you will perhaps save yourself some problems.

3 Likes