By NaziaFarooqui via Discussions on Python.org at 18Apr2022 13:39:
I am trying to change Date format in an online CSV file;
given format: 20200301
to plot the graph, I want ‘Month-Year’. I have tried the following code but the outcome is not what I expected.
covid_cases['Date'] = pd.to_datetime(covid_cases['Date'], format='%Y %m %d')
covid_cases
out- 1970-01-01 00:00:00.020200301
Any expert advice?
I think you’re confusing saving a date in a particular format in a CSV
file with parsing a string representing a date. It looks to me like
you have a str in the variable covid_cases['Date'] and want to get a
date (or datetime) from it? Is that actually the case? Or do you have
a datetime in covid_cases['Date'] and want to write that out in your
preferred format (maybe "2020 03 01", but I am only guessing)?
Please elaborate on what you’re trying to do, and what you’re starting
with. For example, what is the output of:
print(type(covid_cases['Date']), repr(covid_cases['Date']))
before your call to pd.to_datetime()?
The docs for to_datetime() here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
show a function which is capable of taking a great many different
things-which-might-be-a-date and returning a Python datetime object.
Very handy for importing all sorts of weird stuff from CSV files, for
example.
I think your output above:
1970-01-01 00:00:00.020200301
is simply the datetime object which to_datetime() returns. It looks
quite strange - clearly it has misinterpreted what you gave it and got a
fractional second as the final component. Ah… I think it has decided
that your string 20200301 is nanoseconds.
The usual convention in Python is that, internally, times are stored
as “UNIX timestamps”, which are an offset in seconds from the start of
1970-01-01 UTC (pandas counts that offset in nanoseconds by default).
So that is where your 1970 above comes from: you have received a
datetime representing 20200301 nanoseconds beyond that starting point.
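To check that reading, here is a minimal sketch, assuming the value reached to_datetime() as an integer rather than a str (I can't tell from the post). The variable names are mine, purely for illustration:

```python
import pandas as pd

# A bare integer is read as a count of nanoseconds since 1970-01-01
# (the default unit for numeric input is "ns").
as_nanoseconds = pd.to_datetime(20200301)
print(as_nanoseconds)  # 1970-01-01 00:00:00.020200301

# The same digits as a str, with a format that has no spaces in it,
# are parsed as a calendar date instead.
as_date = pd.to_datetime("20200301", format="%Y%m%d")
print(as_date)  # 2020-03-01 00:00:00
```

If the first print matches what you saw, the column probably holds integers, and converting it to str before parsing should help.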
If your task is to take the string '20200301' and turn it into a
string formatted as Month-Year, for example 'March 2020', your best
approach is to use the datetime.strptime() function to decode your
source string and datetime.strftime() to write it out as desired. The
docs are here:
Note that the process I’m talking about here is: you have a string in
covid_cases['Date'] and you want a string in your desired format.
Personally, I would not rewrite covid_cases['Date'] in place. So
something like (untested):
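from datetime import datetime
# strptime() parses one string at a time, so take a single value
# from the column, e.g. '20200301'
date_str = str(covid_cases['Date'].iloc[0])
# get a datetime instance
dt = datetime.strptime(date_str, '%Y%m%d')
print("dt =", dt)
formatted = dt.strftime('%B %Y')
print("formatted =", formatted)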
and then do whatever with formatted.
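If you would rather convert the whole column at once, pandas can do the same parse-then-format round trip. This is only a sketch: the small DataFrame is a made-up stand-in for your CSV data, and the 'Parsed' and 'MonthYear' column names are my own invention. Only 'Date' comes from your post.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV data.
covid_cases = pd.DataFrame({"Date": [20200301, 20200302, 20200401]})

# Convert to str first so integer values are not read as nanoseconds,
# then parse with a format that matches the digits exactly.
parsed = pd.to_datetime(covid_cases["Date"].astype(str), format="%Y%m%d")

# Leave the original column alone and add new ones beside it.
covid_cases["Parsed"] = parsed
covid_cases["MonthYear"] = parsed.dt.strftime("%B %Y")
print(covid_cases)
```

For plotting, it is probably better to use the 'Parsed' column on the axis and keep 'MonthYear' for labels only, since the formatted strings sort alphabetically rather than by date.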
Cheers,
Cameron Simpson cs@cskk.id.au