I looked up the 0th value of the series called ‘list’, but two values came out. For example, if I run list,
it comes out like this.
And if I check the type with type(list), it says it is a pandas Series. (If there is only one value at index 0, the result is a str, but in this case it is a Series.)
And if I print the list with print, I see 98 values, indexed from 0 to 97. However, if I check the length with len(list), I get 198, which is more than twice as many.
Any idea why this is happening?
As you have discovered, pandas does allow duplicate index labels, which can be useful in some cases. As you say, if you look up a label that appears multiple times, all the matching entries are returned together as a Series (or as a DataFrame if the original object is a DataFrame).
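Here is a minimal sketch of that behaviour, using made-up data rather than your actual series. It uses .loc so that the lookup is unambiguously by label:

```python
import pandas as pd

# Made-up data: the label 0 appears twice, the label 1 appears once.
s = pd.Series(["a", "b", "c"], index=[0, 1, 0])

unique_hit = s.loc[1]     # label 1 is unique, so this is a plain str
duplicate_hit = s.loc[0]  # label 0 is duplicated, so this is a Series

print(type(unique_hit))
print(type(duplicate_hit))
print(len(duplicate_hit))  # 2
```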
An index can be anything, and doesn’t have to be the integers in order 0, 1, 2,… It looks like in this case the first few indices are 0,1,2,3,4 and the last few are 93,95,95,96,97, but the ones in between could be anything at all! Seeing labels that run from 0 to 97 doesn’t mean there are only 98 elements, and in this case, as you say, there are 198 elements.
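To illustrate the difference between the labels and the number of elements, here is a small sketch with invented labels (not your data):

```python
import pandas as pd

# Invented labels: out of order, with gaps, and with 97 repeated.
s = pd.Series(range(5), index=[0, 5, 97, 97, 2])

n_elements = len(s)            # 5: counts the rows
largest_label = s.index.max()  # 97: says nothing about the row count
n_distinct = s.index.nunique() # 4: number of distinct labels

print(n_elements, largest_label, n_distinct)
```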
If you concatenate two series together, then the original indices are preserved, which is one possible way you can get duplication. I’d guess in this case the indices are something like 0,1…, 99,100,0,1,2… but you can check by looking at the index of the series.
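A rough sketch of how concatenation can produce duplicate labels, and one way to inspect them. The names first, second and combined are made up for the example:

```python
import pandas as pd

first = pd.Series(["a", "b", "c"])  # default labels 0, 1, 2
second = pd.Series(["d", "e"])      # default labels 0, 1

# pd.concat keeps each input's original labels by default.
combined = pd.concat([first, second])

labels = combined.index.tolist()      # [0, 1, 2, 0, 1]
is_unique = combined.index.is_unique  # False

# Labels that occur more than once.
mask = combined.index.duplicated(keep=False)
repeated = combined.index[mask].unique().tolist()  # [0, 1]

print(labels, is_unique, repeated)
```

If the original labels are not needed, pd.concat also accepts ignore_index=True, which numbers the result from 0 upwards.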
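If you want the index to be just the integers in order, you can do
series = series.reset_index(drop=True). (Without drop=True, the old index is kept as a column and the result is a DataFrame rather than a Series.)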
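As a quick sketch with made-up data, this is what resetting the index does, with and without the drop=True argument:

```python
import pandas as pd

# Made-up series with a duplicated label.
series = pd.Series(["a", "b", "c"], index=[0, 1, 0])

# Without drop=True the old labels become a column, giving a DataFrame.
as_frame = series.reset_index()

# With drop=True the old labels are discarded and the result stays a Series.
renumbered = series.reset_index(drop=True)

print(type(as_frame).__name__)    # DataFrame
print(renumbered.index.tolist())  # [0, 1, 2]
print(renumbered.loc[0])          # a
```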
Also, note that you have named a variable list, which shadows Python’s built-in list type and can cause unexpected behaviour. It is best to avoid reusing built-in names.
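For example, here is a small sketch of the kind of error shadowing can cause. The function demo_shadowing is made up, and the shadowing is kept inside it so the rest of the script is unaffected:

```python
def demo_shadowing():
    # Inside this function, the name list now refers to this value,
    # not to the built-in type.
    list = [1, 2, 3]
    try:
        list("abc")  # would normally build ['a', 'b', 'c']
    except TypeError as exc:
        return str(exc)
    return None


message = demo_shadowing()
print(message)  # 'list' object is not callable
```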