I am trying to change Date format in an online CSV file

I am trying to change the date format in an online CSV file.
Given format: 20200301

To plot the graph, I want ‘Month-Year’. I have tried the following code, but the outcome is not what I expected.

covid_cases['Date'] = pd.to_datetime(covid_cases['Date'], format='%Y %m %d')
covid_cases

out- 1970-01-01 00:00:00.020200301

Any expert advice?

I would not call myself an ‘expert’, but I would simply do that in a downloaded spreadsheet.

With the date of 20200301 in a given cell (say a1) I’d put this =LEFT(A1,4)&" - "&MID(A1,5,2) in a2, which will display 2020 - 03

edit: Oops, just seen you want Month Year, but that’s easy to change.

Sorry. Ignore that as it’s probably not what you’re looking for.

By NaziaFarooqui via Discussions on Python.org at 18Apr2022 13:39:

I am trying to change the date format in an online CSV file.
Given format: 20200301

To plot the graph, I want ‘Month-Year’. I have tried the following code, but the outcome is not what I expected.

covid_cases['Date'] = pd.to_datetime(covid_cases['Date'], format='%Y %m %d')
covid_cases

out- 1970-01-01 00:00:00.020200301

Any expert advice?

I think you’re confusing saving a date in a particular format in a CSV
file with parsing a string representing a date. It looks to me like
you have a str in the variable covid_cases['Date'] and want to get a
date (or datetime) from it? Is that actually the case? Or do you have
a datetime in covid_cases['Date'] and want to write that out in your
preferred format (maybe "2020 03 01", but I am only guessing)?

Please elaborate on what you’re trying to do, and what you’re starting
with. For example, what is the output of:

print(type(covid_cases['Date']), repr(covid_cases['Date']))

before your call to pd.to_datetime()?

The docs for to_datetime() here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
show a function which is capable of taking a great many different
things-which-might-be-a-date and returning a Python datetime object.
Very handy for importing all sorts of weird stuff from CSV files, for
example.

I think your output above:

1970-01-01 00:00:00.020200301

is simply that datetime object which to_datetime() returns. It looks
quite strange - clearly it has misinterpreted what you gave it and got a
fractional second as the final component. Ah… I think it has decided
that your string 20200301 is nanoseconds.

A common convention is that, internally, times are stored as “UNIX
timestamps”, which are an offset in seconds from the start of
1970-01-01 UTC (pandas counts that offset in nanoseconds by default).
So that is where your 1970 above comes from: you have received a
datetime representing 20200301 nanoseconds beyond that starting point.
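
To make the two readings of 20200301 concrete, here is a minimal
standard-library sketch. The names EPOCH, raw, as_nanoseconds and
as_calendar are mine, purely for illustration, and this only mimics the
epoch arithmetic; it is not what pandas runs internally.

```python
from datetime import datetime, timedelta, timezone

# The UNIX epoch: the zero point that timestamp offsets are counted from.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

raw = 20200301  # the value from the CSV, treated as a plain integer

# Reading 1: a count of nanoseconds since the epoch. The stdlib datetime
# only resolves microseconds, so the last three digits are dropped here.
as_nanoseconds = EPOCH + timedelta(microseconds=raw // 1000)

# Reading 2: calendar digits, i.e. year, month, day.
as_calendar = datetime.strptime(str(raw), "%Y%m%d")

print(as_nanoseconds.isoformat())      # 1970-01-01T00:00:00.020200+00:00
print(as_calendar.date().isoformat())  # 2020-03-01
```

The first reading lands a few hundredths of a second after the start of
1970, which matches the shape of the output you saw.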

If your task is to take the string '20200301' and turn it into a
string formatted as Month-Year, for example 'March 2020', your best
approach is to use the datetime.strptime() function to decode your
source string and datetime.strftime to write it out as desired. The
docs are here:
https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

Note that the process I’m talking about here is: you have a string in
covid_cases['Date'] and you want a string in your desired format.
Personally, I would not rewrite covid_cases['Date'] in place. So
something like (untested):

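from datetime import datetime

# get a datetime instance
# (assumes covid_cases['Date'] is a single str such as '20200301')
dt = datetime.strptime(covid_cases['Date'], '%Y%m%d')
print("dt =", dt)

# write it out as e.g. 'March 2020'
formatted = dt.strftime('%B %Y')
print("formatted =", formatted)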

and then do whatever with formatted.
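
If covid_cases is actually a pandas DataFrame and 'Date' is a whole
column of integers like 20200301 (my guess from the pd.to_datetime()
call, not something the post confirms), the same idea might look like
the sketch below. The sample frame and the column names ParsedDate and
MonthYear are made up for illustration; the real CSV is not shown.

```python
import pandas as pd

# Hypothetical stand-in for the poster's data.
covid_cases = pd.DataFrame(
    {"Date": [20200301, 20200415, 20201231], "Cases": [1, 5, 9]}
)

# Convert to str first so the digits are parsed as text with the given
# format, rather than being read as a numeric offset from 1970.
parsed = pd.to_datetime(covid_cases["Date"].astype(str), format="%Y%m%d")

# Keep the original column and add new ones instead of overwriting it.
covid_cases["ParsedDate"] = parsed
covid_cases["MonthYear"] = parsed.dt.strftime("%b-%Y")

print(covid_cases)
```

Note that the format string has no spaces, matching the source data. For
plotting, the datetime column is probably the better x-axis value since
it sorts chronologically; the 'Mar-2020' style strings sort
alphabetically and are better kept for labels.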

Cheers,
Cameron Simpson cs@cskk.id.au
