I'm running some tests in a Python Jupyter notebook in Visual Studio Code, connecting PySpark to a PostgreSQL database on localhost that runs in a Docker container.
```python
from pyspark.sql import SparkSession

# Create a Spark instance
spark = SparkSession.builder \
    .appName("ETL_PostgreSQL") \
    .config("spark.master", "local") \
    .config("spark.jars.packages", "org.postgresql:postgresql:42.5.4") \
    .getOrCreate()

# Source PostgreSQL database connection settings
source_url = "jdbc:postgresql://localhost:5430/chinook"
source_properties = {
    "user": "root",
    "password": "root",
    "driver": "org.postgresql.Driver"
}

table_df = spark.read.jdbc(url=source_url, table="genre", properties=source_properties)
table_df.show()
spark.stop()
```
I get the following error on the spark.read command:
"…
Py4JJavaError: An error occurred while calling o1946.jdbc.
: java.lang.ClassNotFoundException: org.postgresql.Driver
…
"
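One thing that may be worth ruling out, offered as an assumption rather than a confirmed cause: in a notebook, `getOrCreate()` can return a session that was started by an earlier cell, and a `spark.jars.packages` value passed afterwards may then not be applied to the JVM that is already running. A minimal sketch of a check follows; the helper name `package_is_configured` is hypothetical, and the literal strings stand in for the value read from the live session.

```python
def package_is_configured(jars_packages: str, coordinate: str) -> bool:
    """Return True if `coordinate` is one of the entries in a comma-separated
    spark.jars.packages value (whitespace around entries is ignored)."""
    entries = [entry.strip() for entry in jars_packages.split(",")]
    return coordinate in entries


# Stand-in values for illustration only; in the notebook the first argument
# would be read from the running session instead of being a literal string.
wanted = "org.postgresql:postgresql:42.5.4"
print(package_is_configured("org.postgresql:postgresql:42.5.4", wanted))
print(package_is_configured("", wanted))
```

In the notebook, the first argument would be `spark.sparkContext.getConf().get("spark.jars.packages", "")`. A `False` result there would suggest the setting did not reach the session actually in use, in which case restarting the kernel before building the session may be worth trying.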
Can you help me please?