Panda dataframe update existing table with to_sql

DevJMA · May 30, 2024, 7:06pm

Hi

I am new to Python and is trying to make my first python application. But I have some problem with panda.

How do I update an existing table with panda dataframe with out getting duplicate errors saying key already exists. Is it possible to skip record that already exists or what is best practice?

The only solution I can come up with is to write the data to a buffer table and then make an insert from that table for new records. But I think that this solution is very inefficient.

kknechtel · May 30, 2024, 9:48pm

It would be much easier to understand the problem if you could show a simple example of what you are trying, and a copied and pasted error. (In particular, your title mentions .to_sql, but I can’t see how it relates to the task you’re describing here - perhaps because “update” is too vague of a word to explain what you actually want to do.) Please read the pinned thread to understand the forum’s formatting tools, and format both code and error messages in separate code blocks.

DevJMA · May 31, 2024, 6:54am

def get_sma(df,MA):
    df['value'] = df['value'].rolling(int(MA)).mean()
    return df

db = SessionLocal()
query = 'select stock.id, indicator.code from stock,indicator where stock.blocked=false order by stock.id,indicator.code'
stocks = engine.execute(query)
for stock in stocks:
    query = 'select stock_id, \'' + stock.code + '\' as indicator_code, date, close as value from stock_exchange_rate where stock_id=' + format(stock.id)
    df = pd.read_sql(query, engine)
    match stock.code[:3]:
        case 'SMA':
            get_sma(df,200)
            df.to_sql(name='stock_indicator_value', con=engine, if_exists='append', index=False)
        case 'EMA':
            print('EMA ' + stock.code)
        case 'MAC':
            print('MAC ' + stock.code)
        case '_':
            print('indicator ' + stock.code + ' error')

it gives me this error.

EMA EMA10
EMA EMA150
EMA EMA5
MAC MACD
MAC MACDSignal
SMA SMA200
Traceback (most recent call last):
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/visitors.py", line 77, in _compiler_dispatch
    meth = getter(visitor)
AttributeError: 'PGCompiler_psycopg2' object has no attribute 'visit_on_duplicate_key_update'. Did you mean: 'visit_on_conflict_do_update'?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 1322, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 950, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/Users/jma/Documents/Python/FastAPI/stockscreen/technical_analysis.py", line 26, in insert_on_duplicate
    conn.execute(on_duplicate_key_stmt)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1306, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1490, in _execute_clauseelement
    compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 536, in _compile_w_cache
    compiled_sql = self._compiler(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 558, in _compiler
    return dialect.statement_compiler(dialect, self, **kw)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/compiler.py", line 776, in __init__
    Compiled.__init__(self, dialect, statement, **kwargs)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/compiler.py", line 451, in __init__
    self.string = self.process(self.statement, **compile_kwargs)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/compiler.py", line 486, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/visitors.py", line 82, in _compiler_dispatch
    return meth(self, **kw)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/compiler.py", line 3937, in visit_insert
    post_values_clause = self.process(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/compiler.py", line 486, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/visitors.py", line 79, in _compiler_dispatch
    return visitor.visit_unsupported_compilation(self, err, **kw)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/compiler.py", line 470, in visit_unsupported_compilation
    util.raise_(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
sqlalchemy.exc.UnsupportedCompilationError: Compiler <sqlalchemy.dialects.postgresql.psycopg2.PGCompiler_psycopg2 object at 0x140a65ab0> can't render element of type <class 'sqlalchemy.dialects.mysql.dml.OnDuplicateClause'> (Background on this error at: https://sqlalche.me/e/14/l7de)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jma/Documents/Python/FastAPI/stockscreen/technical_analysis.py", line 44, in <module>
    df.to_sql(name='stock_indicator_value', con=engine, if_exists='append', index=False, chunksize=4096, method=insert_on_duplicate)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/generic.py", line 2951, in to_sql
    return sql.to_sql(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 697, in to_sql
    return pandas_sql.to_sql(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 1739, in to_sql
    total_inserted = sql_engine.insert_records(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 1328, in insert_records
    err_text = str(err.orig)
AttributeError: 'UnsupportedCompilationError' object has no attribute 'orig'
jma@MBPtilhrendeJan stockscreen % python technical_analysis.py
EMA EMA10
EMA EMA150
EMA EMA5
MAC MACD
MAC MACDSignal
SMA SMA200
Traceback (most recent call last):
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1799, in _execute_context
    self.dialect.do_executemany(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 953, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/psycopg2/extras.py", line 1270, in execute_values
    cur.execute(b''.join(parts))
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "stock_indicator_value_pkey"
DETAIL:  Key (stock_id, indicator_code, date)=(1, SMA200, 2012-06-18) already exists.


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jma/Documents/Python/FastAPI/stockscreen/technical_analysis.py", line 43, in <module>
    df.to_sql(name='stock_indicator_value', con=engine, if_exists='append', index=False)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/core/generic.py", line 2951, in to_sql
    return sql.to_sql(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 697, in to_sql
    return pandas_sql.to_sql(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 1739, in to_sql
    total_inserted = sql_engine.insert_records(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 1332, in insert_records
    raise err
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 1322, in insert_records
    return table.insert(chunksize=chunksize, method=method)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 950, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/sql.py", line 857, in _execute_insert
    result = conn.execute(self.table.insert(), data)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1306, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1498, in _execute_clauseelement
    ret = self._execute_context(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1799, in _execute_context
    self.dialect.do_executemany(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 953, in do_executemany
    context._psycopg2_fetched_rows = xtras.execute_values(
  File "/Users/jma/.pyenv/versions/3.10.4/lib/python3.10/site-packages/psycopg2/extras.py", line 1270, in execute_values
    cur.execute(b''.join(parts))
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "stock_indicator_value_pkey"
DETAIL:  Key (stock_id, indicator_code, date)=(1, SMA200, 2012-06-18) already exists.

[SQL: INSERT INTO stock_indicator_value (stock_id, indicator_code, date, value) VALUES (%(stock_id)s, %(indicator_code)s, %(date)s, %(value)s)]
[parameters: ({'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 18), 'value': None}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 19), 'value': None}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 20), 'value': None}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 21), 'value': None}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 22), 'value': None}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 25), 'value': None}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 26), 'value': None}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2012, 6, 27), 'value': None}  ... displaying 10 of 3004 total bound parameter sets ...  {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2024, 5, 23), 'value': 2.8596}, {'stock_id': 1, 'indicator_code': 'SMA200', 'date': datetime.date(2024, 5, 24), 'value': 2.85875})]

MegaIng · May 31, 2024, 11:24am

The error is very descriptive

Your table has a uniqueness constraint, meaning you can’t use if_exists='append'. If you are ok with overwriting the original table, use if_exists='replace'. If you aren’t ok with that, you will need to rethink your solution.

But also, to_sql writes the entire table, which isn’t exactly what you want I think.

MegaIng · May 31, 2024, 11:30am

Or with other words: You can’t use to_sql, you would need to manually build your update sql statements.

DevJMA · May 31, 2024, 7:26pm

Thanks, no I do not want to write existing records, only new.

I have tried that also, but do not know how to loop through the data frame and insert data.

Is there a good guide somewhere? I havn’t found any.