I’m having trouble selecting a column by integer position in Pandas. The code below uses the syntax shown by the course instructor, and the data is loaded from the URL in the script. In the video this syntax works for them, but I don’t know which version of Python or Pandas they are using. Can someone help me understand why it fails for me?
import pandas as pd
# read data from URL and store in a DataFrame
url = "https://github.com/chendaniely/pandas_for_everyone/blob/master/data/gapminder.tsv?raw=true"
df = pd.read_csv(url, sep='\t')
# show data from dataframe
print("# --- --- --- --- --- --- --- --- --- --- --- --- ")
print(df.head())
print("# --- --- --- --- --- --- --- --- --- --- --- --- ")
# show 3rd column of DataFrame
print(df[[2]])
# AFAICT, line above should be equivalent to line below
# print(df[['year']])
Error Traceback
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3767, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 5876, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 5935, in _raise_if_missing
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([2], dtype='int64')] are in the [columns]"
I’m not sure why you have the line print(df[[2]]). The KeyError is saying that no column is labelled 2.
If you look at the head frame, you’ll see:
# --- --- --- --- --- --- --- --- --- --- --- ---
       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106
# --- --- --- --- --- --- --- --- --- --- --- ---
So you can select by any of the column names, for example print(df[['year']]) or print(df[['lifeExp']])
… or whatever column name is there.
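To illustrate the point about column names, here is a minimal sketch. It assumes a recent pandas release (1.x or 2.x) and uses a small hand-built frame as a stand-in for the gapminder data, so it runs without the download. The names lookup_failed, third_label and by_label are just for illustration.

```python
import pandas as pd

# Small stand-in for the gapminder data (values copied from the head output).
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan"],
    "continent": ["Asia", "Asia"],
    "year": [1952, 1957],
    "lifeExp": [28.801, 30.332],
})

# Square brackets with a list treat each item as a column label.
# Assumption: on a recent pandas this raises KeyError, as in the traceback.
try:
    df[[2]]
    lookup_failed = False
except KeyError:
    lookup_failed = True  # no column is labelled 2
print(lookup_failed)

# Turn the position into a label first, then select by label.
third_label = df.columns[2]
by_label = df[[third_label]]
print(third_label)
print(by_label.equals(df[["year"]]))
```

Going through df.columns[2] keeps the selection label-based, which is what the square brackets expect here.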
Caveat: I’m not a Pandas user, so I may be wrong.
To add: Just an FYI, in case you don’t know:
If you want to see which version of Pandas you are using, add the line print(pd.__version__). I usually put that at the beginning of a script, just after the import, so that I can see the version of the package.
AB! Thanks for your illustrative reply. It seems that ambiguity is what led to the approach being deprecated after version 0.19, with version 0.20 steering positional selection toward iloc. I had run into this in an old pandas course.
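For anyone landing here later, a minimal sketch of the positional version with iloc. It uses a small hand-built frame instead of the gapminder download, and the names third_col and as_series are just for illustration.

```python
import pandas as pd

# Small stand-in for the gapminder data (values copied from the head output).
df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan"],
    "continent": ["Asia", "Asia"],
    "year": [1952, 1957],
    "lifeExp": [28.801, 30.332],
})

# All rows, third column by position; the list keeps the result a DataFrame.
third_col = df.iloc[:, [2]]
print(third_col.columns.tolist())

# A bare integer instead of a list returns a Series.
as_series = df.iloc[:, 2]
print(type(as_series).__name__)
```

With this data, df.iloc[:, [2]] selects the same column as df[['year']], which is what the original df[[2]] line was meant to do.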