I have the below date time in string type. I want to convert that into UTC with offset
> spark = SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
> print("Print statement-1")
> schema = StructType([
> StructField("author", StringType(), False),
> StructField("dt", StringType(), False)
> data = [
> ["author1", "2022-07-22T09:25:47.261Z"],
> ["author2", "2022-07-22T09:26:47.291Z"],
> ["author3", "2022-07-22T09:23:47.411Z"],
> ["author4", "2022-07-224T09:25:47.291Z"]
> df = spark.createDataFrame(data, schema)
I want to convert dt column as UTC with offset.
For example the first row value as 2022-07-22T09:25:47.2610000 +00:00
How to do that in pyspark and sparkSQL.
I can easily do that using regex_replace
df=df.withColumn(“UTC”,regexp_replace(‘dt’, ‘Z’, ‘000 +00:00’))
bcoz Z is same as +00:00. But I am not sure that regexp_replace is correct of doing the conversion. Is there any method which can do the correct conversion rather than regex_replace?
Hello, @suren, and welcome to Python!
You are in luck! Python has several modules devoted to handling date and time values, including the datetime and zoneinfo. I encourage you to read the documentation for those modules.
Some code using
strptime(), like the following, might help (not tested):
from datetime import datetime as DT
utc_dt = "2022-07-22T09:25:47.261Z".replace("Z","UTC")
dt = DT.strptime(utc_dt, "%Y-%m-%dT%H:%M:%S.%f%Z")
# dt is a datetime instance with tzinfo set to timezone.UTC .
In general, when I am working on a certain kind of data or interaction, my first question is usually, “Does the Python standard library already have a way of handling this?” The place to look is The Python Standard Library — Python 3.10.6 documentation . My second question is, “has someone else written a package to handle this?” The place to look is https://pypi.org/ .
I hope this is helpful. Good luck!
—Jim DeLaHunt, Vancouver, Canada
Your orginal date and time strings are UTC. That is what the tailing Z means for an ISO-8601 date and time.
If you are passing the string to software that understands ISO-8601 no conversion should be needed.
Hi Thank you for the quick response. We are using Spark 2.4 version with Python 2.7.
How can we do this conversion in this pyspark version.
Sorry, I can’t help you there. I don’t know anything about Spark. I do know that Python 2.7 is very old by now. It is no longer supported. If you have a way to move your project to Python 3 (preferably Python 3.10) you will perhaps save yourself some problems.