Easy connect to PostgreSQL!

Question is below. I just wanted to say how easy it was to get records from PostgreSQL. It took about 20 minutes from the time I started searching pages to the time I got results from actual code. That’s a record!

I used `import psycopg2`. I hope that’s a good one to use, i.e. is it updated often enough?
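For anyone curious, the basic pattern is something like this (a minimal sketch; the host, credentials, and the `records` table are placeholders, not my actual setup):

```python
import psycopg2

# Placeholder connection details: swap in your own host, db, and credentials.
conn = psycopg2.connect(host="localhost", dbname="mydb",
                        user="myuser", password="secret")
with conn, conn.cursor() as cur:  # commits on success, closes the cursor
    cur.execute("SELECT id, amount FROM records LIMIT 10")
    for row in cur.fetchall():
        print(row)
conn.close()
```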

EDIT:

  1. 56,000 records returned in 1.65 seconds.
  2. There will be times when I have to return 100,000+ records and process each one, adding up data.

psycopg2 is the industry standard library for Postgres.

But it’s not a bad idea to take a look at Psycopg 3, which the project is moving towards.

For simple stuff psycopg is great. However, if your needs become more complex, there is a great library for that: SQLAlchemy - https://www.sqlalchemy.org/

Python noob here. At what point should I use SQLAlchemy for Python? We have a case where Power BI cannot handle the huge number of records we select; it just runs out of memory on our VM (hosted by Microsoft). If I can rewrite that report in Python, the exec will be thrilled.

Our temporary solution is I told the exec to select a smaller date range.

At what point should I use SQLAlchemy for Python?

As soon as you’re running queries based on arbitrary user input, or you’re just plain sick of writing SQL and prefer an ORM instead.
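For example (just a sketch: the connection URL, the `records` table, and its columns are invented here), SQLAlchemy Core lets you build the query up from whichever filters the user actually supplied, with every value bound as a parameter:

```python
from sqlalchemy import MetaData, Table, create_engine, select

# Hypothetical connection URL and table, purely for illustration.
engine = create_engine("postgresql+psycopg2://myuser:secret@localhost/mydb")
metadata = MetaData()
records = Table("records", metadata, autoload_with=engine)

def fetch_records(min_amount=None, category=None):
    query = select(records)
    # Add WHERE clauses only for filters the user supplied; values are
    # bound as parameters, so there is no SQL injection to worry about.
    if min_amount is not None:
        query = query.where(records.c.amount >= min_amount)
    if category is not None:
        query = query.where(records.c.category == category)
    with engine.connect() as conn:
        return conn.execute(query).fetchall()
```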

You should try to write queries that return the minimum amount of data, not the entire record. An SQL-level library gives you more control over that, but I’d try SQLAlchemy first myself (I mainly use psycopg2 via Django, only using it directly for creating roles for non-Django services).

Then you could use Python to select the records in blocks that don’t crash the VM, extract what you need, and save/append that wherever your execs want it.
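Here’s a rough sketch of that block-wise approach (the table, columns, and output file are made up; a psycopg2 named cursor keeps the result set on the server, so only one block at a time sits in Python’s memory):

```python
import csv

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb",
                        user="myuser", password="secret")
with conn:
    # Named cursors are server-side: rows stay in Postgres until fetched.
    with conn.cursor(name="report_scan") as cur, \
            open("report.csv", "a", newline="") as f:
        writer = csv.writer(f)
        cur.execute("SELECT id, amount FROM records WHERE created >= %s",
                    ("2024-01-01",))
        while True:
            block = cur.fetchmany(10_000)  # ~10k rows in memory at once
            if not block:
                break
            writer.writerows(block)  # append the block, fetch the next
conn.close()
```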

Or if it’s an Azure VM, I’m sure Microsoft are able to charge you a little bit more for a bigger VM with more memory.

psycopg2 is an excellent choice. SQLAlchemy (or any other ORM) is a completely different way of thinking about your data; instead of performing queries, you would work with objects in memory. I’d recommend first working with SQL and doing queries, but then afterwards, learning about how SQLAlchemy might make your life easier (it also might not, depending on exactly what your workload is, hence it’s better to learn SQL first).
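To give a flavour of the ORM style (a sketch only; the `Customer` class and its columns are hypothetical):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# A made-up mapped class, purely for illustration.
class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

engine = create_engine("postgresql+psycopg2://myuser:secret@localhost/mydb")
with Session(engine) as session:
    # You filter and iterate over Customer objects; SQLAlchemy emits the SQL.
    for customer in session.query(Customer).filter(Customer.name.like("A%")):
        print(customer.id, customer.name)
```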

I don’t have any experience with psycopg3 yet but based on what I’m seeing, it looks like they’re following best practices in ways that are slightly backward incompatible with psycopg2, but will be better long-term. I’ll probably look into using it for my next project.

Seems like you are loading all the records into memory in one go.
If the processing can work 1 record at a time then do that.
All DBs support this mode with their cursors.
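In psycopg2 that means a named (server-side) cursor, which streams rows in batches instead of loading everything up front. A sketch, using a made-up table, of the “adding up data” case from the original post:

```python
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb",
                        user="myuser", password="secret")
with conn, conn.cursor(name="big_scan") as cur:  # named = server-side
    cur.itersize = 5_000  # rows pulled from the server per round trip
    cur.execute("SELECT amount FROM records")
    total = 0
    for (amount,) in cur:  # process one record at a time in Python
        total += amount
    print(total)
conn.close()
```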

We have a case where Power BI cannot handle the huge number of records we select, it just runs out of memory on our VM (hosted by Microsoft).

Have you tried Tableau? :slight_smile: The data analysts in my previous team seemed quite satisfied with Tableau, although I’m not an analyst.

One nice thing about SQLAlchemy is the expression language API - it lets you define queries as DB-independent objects.

This way, you can let SQLAlchemy translate the query definitions into whatever dialect your target DB speaks. So if you change your DB backend, e.g. from PostgreSQL to something else (MySQL, SQL Server etc.), your query definitions will still be usable. (Also, note that SQLAlchemy can’t really be compared with psycopg2, because psycopg2 is a DB driver, not a query toolkit.)
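For instance (a sketch; the `users` table is invented), the same query object compiles to whichever SQL the target dialect speaks:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select
from sqlalchemy.dialects import mssql, postgresql

metadata = MetaData()
# A made-up table definition, just to have something to query.
users = Table("users", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String(50)))

query = select(users).where(users.c.name == "alice").limit(5)

# One query definition, rendered per dialect.
print(query.compile(dialect=postgresql.dialect()))  # renders with LIMIT
print(query.compile(dialect=mssql.dialect()))       # renders with TOP instead
```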