What would be a good way to insert one to many into a relational DB using Pandas/SQLAlchemy?

I have a problem I am trying to solve / come up with a good method to solve.

I have a project that has a PostgreSQL back-end. I am building a package to do some data work to help do some initial tasks and some basic analysis to get a baseline in the beginning.

I have four tables. (names are simplified for post clarity)

  • body – has an FK to category
  • body_tag – has an FK to tag and an FK to body
  • category
  • tag

The body_tag table has a one-to-many to body. It has the columns - id, body_id, tag_id.

As I said above, each body row can have multiple tags. So body_tag will end up having data that looks like;

id - body_id - tag_id
1 - 2 - 3
2 - 2 - 6
3 - 2 - 8
4 - 3 - 3
5 - 3 - 7

I have an xlsx file and I am using to_sql() to insert four rows into the body table. This works very well. My problem is - I need a way to insert the body_tag records.

My xlsx file has four columns that go into the body table. My initial thought was to - add a fifth column called tags and make it a list. Then just iterate over the list and insert each value. However, Pandas doesn’t work well with lists in columns. And that isn’t really how Pandas is aimed at working. This is a fantastic read btw - Dealing with List Values in Pandas Dataframes | by Max Hilsdorf | Towards Data Science.

My second thought was to use something like the following.

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

Sheet1 being the body data and Sheet2 being the tag data. Then each row lining up. So Row 1-Sheet 1 matching Row 1 - Sheet 2.

Then building a loop that basically does;

  1. Insert body row from Sheet 1
  2. Grab the latest body id using something like,
    pd.read_sql_query('select COALESCE(max(id), 0) as id from table', con=cnx)
  3. Then grab the corresponding row from Sheet 2 and insert into tags each into their own row.

The second method would mean I wouldn’t have to deal with lists. I could just have as many columns as I did tags.

Can I get anyone to weigh in with their thoughts on a good approach here?

Again, my goal is -

Insert records into the body table and insert the corresponding tags into the body_tag table. Where body_tag is a one-to-many to body.

Thanks for reading.

EDIT

I was massively over-complicating this. I guess I was just tired. Took a nap and it hit me - just add several columns at the end and grab those columns in their own dataframe. I am going to write something with the SQLAlchemy ORM to streamline it a bit better.

I have a django app and I just modified the admin form to have all of this in one form and it works brilliantly. But I have to do it one by one. So I just looked at how it was doing it behind the scenes and got some ideas.

Thanks a lot to all of those that read.

1 Like