I’m generating a dataframe df1, making a copy df2, modifying values in df2, then comparing them and attempting to return the differences as df3
The function that compares df1 and df2, returning the rows that differ between df1 and df2 as df3 is as follows:
def compare_large_dataframes(df1, df2):
if df1.shape != df2.shape:
raise ValueError("DataFrames must have the same number of rows and columns")
merged_df = pd.merge(df1, df2, how='outer', indicator=True).query('_merge == "right_only"').drop('_merge', axis=1)
return merged_df
When I run it I’m getting the following error:
df3 = compare_large_dataframes(df1, df2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/ztm.py", line 4586, in compare_large_dataframes
merged_df = pd.merge(df1, df2, how='outer', indicator=True).query('_merge == "right_only"').drop('_merge', axis=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 184, in merge
return op.get_result(copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 886, in get_result
join_index, left_indexer, right_indexer = self._get_join_info()
^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1151, in _get_join_info
(left_indexer, right_indexer) = self._get_join_indexers()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1125, in _get_join_indexers
return get_join_indexers(
^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1740, in get_join_indexers
zipped = zip(*mapped)
^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 1737, in <genexpr>
_factorize_keys(left_keys[n], right_keys[n], sort=sort)
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 2570, in _factorize_keys
llab, rlab = _sort_labels(uniques, llab, rlab)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/reshape/merge.py", line 2631, in _sort_labels
_, new_labels = algos.safe_sort(uniques, labels, use_na_sentinel=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/mypy/lib/python3.12/site-packages/pandas/core/algorithms.py", line 1543, in safe_sort
raise ValueError("values should be unique if codes is not None")
ValueError: values should be unique if codes is not None
df1 is created as follows:
df1 = pd.read_sql_query('SELECT rowid AS alib_rowid, col1, col2, col3, col4, col5, col6, col7 from table order by __path', conn)
col1 through col7 are all text fields from a SQLite table.
df2 is structurally identical to df1.
Any ideas what may be triggering this error?