Pandas dataframe assignment with multi-index

john316 · May 3, 2024, 9:59pm

Hi,

import pandas as pd
test = pd.DataFrame({'val': [4, 5, 6]}, index=[('a', 1), ('b', 2), ('c', 3)])

I want to assign a value, but this does not work:

test.loc[('a', 1), 'val'] = 7

I get an error message, the last line of which is:

KeyError: "None of [Index(['a', 1], dtype='object')] are in the [index]"

However, this works:

test['val'][('a', 1)] = 7

Why does the first form not work, and what would be the right way to do this?
The complete error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-507-832a4fbdfb46> in <module>
----> 1 test.loc[('a', 1), 'val'] = 7

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    717         else:
    718             key = com.apply_if_callable(key, self.obj)
--> 719         indexer = self._get_setitem_indexer(key)
    720         self._has_valid_setitem_indexer(key)
    721 

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _get_setitem_indexer(self, key)
    658         if isinstance(key, tuple):
    659             with suppress(IndexingError):
--> 660                 return self._convert_tuple(key, is_setter=True)
    661 
    662         if isinstance(key, range):

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _convert_tuple(self, key, is_setter)
    783             self._validate_key_length(key)
    784             for i, k in enumerate(key):
--> 785                 idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
    786                 keyidx.append(idx)
    787 

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, key, axis, is_setter)
   1255                 return inds
   1256             else:
-> 1257                 return self._get_listlike_indexer(key, axis)[1]
   1258         else:
   1259             try:

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _get_listlike_indexer(self, key, axis)
   1312             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1313 
-> 1314         self._validate_read_indexer(keyarr, indexer, axis)
   1315 
   1316         if needs_i8_conversion(ax.dtype) or isinstance(

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\indexing.py in _validate_read_indexer(self, key, indexer, axis)
   1372                 if use_interval_msg:
   1373                     key = list(key)
-> 1374                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1375 
   1376             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Index(['a', 1], dtype='object')] are in the [index]"

kknechtel · May 3, 2024, 10:12pm

As described in the documentation, Pandas attempts to interpret the arguments to loc differently depending on their type (and capabilities).

('a', 1) is iterable (and not a string); so when .loc tries to use it for the row indices, it doesn’t look for a single row labelled that way, but for two separate rows labelled 'a' and 1. It’s handled the same way as “a list or array of labels” in the doc’s terminology.

We can fix this by adding another layer of wrapping:

>>> test.loc[[('a', 1)], 'val']
(a, 1)    4
Name: val, dtype: int64

(In short, it’s the same issue that %-style string formatting has when you want to format a single value that happens to be a tuple: Python interprets the elements as separate values to format. To fix it, you need to wrap the argument in a 1-tuple. Of course, this case is more annoying, since a list wrapper doesn’t work quite right either.)
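
To make the wrapping concrete, here is a minimal sketch that rebuilds the frame from the question. The names `selected`, `value` and `formatted` are only illustrative. It shows that the list-wrapped key selects one row and returns a one-element Series, so a further step is needed to get the bare cell value, and it includes the %-formatting analogy next to it.

```python
import pandas as pd

test = pd.DataFrame({'val': [4, 5, 6]}, index=[('a', 1), ('b', 2), ('c', 3)])

# Wrapping the tuple in a list asks .loc for "the rows with these labels",
# where the only label is the tuple ('a', 1) itself.
selected = test.loc[[('a', 1)], 'val']

# The result is a one-element Series, not a scalar, so pull the value out.
value = selected.iloc[0]

# The %-formatting analogy: a bare tuple would be unpacked into separate
# arguments, so it is wrapped in a 1-tuple to be formatted as one value.
formatted = "%s" % (('a', 1),)
```

The extra `.iloc[0]` is the price of the list wrapper: it always describes a set of rows, even when that set has exactly one member.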

Better yet, if the purpose is to locate a single cell (always), use .at instead:

>>> test.at[('a', 1), 'val'] # gives the cell value rather than a row
4
>>> test.at[('a', 1), 'val'] = 7
>>> test
        val
(a, 1)    7
(b, 2)    5
(c, 3)    6
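
One more observation, offered as a sketch rather than a definitive diagnosis. The output above prints `(a, 1)` as a single label, which suggests that passing a list of tuples as `index` builds a flat Index of tuple objects rather than a MultiIndex. Assuming a real MultiIndex is what was intended, it can be built explicitly with `pd.MultiIndex.from_tuples`, and then the original `.loc` form treats the tuple as one key. The names `flat`, `multi` and `idx` are only illustrative.

```python
import pandas as pd

# Flat index: each label is one tuple object (as in the question).
flat = pd.DataFrame({'val': [4, 5, 6]}, index=[('a', 1), ('b', 2), ('c', 3)])
flat_is_multi = isinstance(flat.index, pd.MultiIndex)

# Real MultiIndex: built explicitly from the same tuples.
idx = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)])
multi = pd.DataFrame({'val': [4, 5, 6]}, index=idx)

# With a MultiIndex, .loc reads the tuple as one key spanning both levels,
# so the assignment from the question goes through unchanged.
multi.loc[('a', 1), 'val'] = 7
result = multi.loc[('a', 1), 'val']
```

Which fix fits depends on the data: if the tuples really are two-level keys, the MultiIndex is likely the more natural structure; if they are just opaque labels, .at on the flat index is enough.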