Can't connect to a specific database

This code gives me a list of all the databases created:

```python
# import required libraries
import pyspark

# create spark session
spark = pyspark.sql.SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config('spark.driver.extraClassPath', r"C:\Users\Anthony.DESKTOP-ES5HL78\Downloads\postgresql-42.5.4.jar") \
    .getOrCreate()

# read databases from db using spark jdbc
databases_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://34.30.48.164:5432/") \
    .option("user", "postgres") \
    .option("password", "Musica321") \
    .option("driver", "org.postgresql.Driver") \
    .option("query", "SELECT datname FROM pg_database WHERE datistemplate = false") \
    .load()

# print the data in databases_df
databases_df.show()
```

I want to show only one of these databases, and the code I am using is:

```python
# create spark session
spark = pyspark.sql.SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config('spark.driver.extraClassPath', r"C:\Users\Anthony.DESKTOP-ES5HL78\Downloads\postgresql-42.5.4.jar") \
    .getOrCreate()

# read table from db using spark jdbc
Marketing_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://34.30.48.164:5432/postgres") \
    .option("dbtable", "postgres") \
    .option("user", "postgres") \
    .option("password", "") \
    .option("driver", "org.postgresql.Driver") \
    .load()

# print the data in Marketing_df
Marketing_df.show()
```

I am getting this error:

```text
23/03/13 21:08:56 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/03/13 21:08:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "C:\Users\Anthony.DESKTOP-ES5HL78\Documents\Scrapy\projects\m.py", line 19, in <module>
    .load()
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\readwriter.py", line 184, in load
    return self._df(self._jreader.load())
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\utils.py", line 190, in deco
    return f(*a, **kw)
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o33.load.
: org.postgresql.util.PSQLException: ERROR: relation "order100" does not exist
  Position: 30
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
    at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:68)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1589)
```

Running `python m.py` a second time produces the same traceback. What am I doing wrong with the second code?

Before we get to the problem, please always enclose code between lines
of triple backticks, like this:

 ```python
 your code
 goes here
 ```

This preserves important formatting and punctuation.

And you should really avoid including specific publicly addressable IP
addresses and credentials such as usernames and passwords.

That said, to your problem. The exception you recite says:

    py4j.protocol.Py4JJavaError: An error occurred while calling o33.load.
    : org.postgresql.util.PSQLException: ERROR: relation "order100" does not exist
      Position: 30

This says that something's trying to access (e.g. SELECT from) a
database table (relation) called order100, which is a name I do not see
in the code.

It isn't clear to me exactly which .load() is triggering this because
you've cited a few bits of code here, but the traceback says it comes
from line 19 of your m.py source file.

Is the code you recite definitely what was run to get that traceback?

Can you run the same SELECT query from a native PostgreSQL client like
psql talking to your database?
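
For example, something along these lines (the host and database name below
are placeholders, not details taken from your setup):

```shell
# list the tables the database actually contains
psql -h <host> -p 5432 -U postgres -d <database> -c '\dt'

# then run the same query Spark is issuing
psql -h <host> -p 5432 -U postgres -d <database> -c 'SELECT * FROM order100 LIMIT 5'
```

If that SELECT fails from psql as well, the problem is on the database side
rather than anything to do with Spark.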

Cheers,
Cameron Simpson cs@cskk.id.au

I have amended the details and still am getting an error:

```python
# import required libraries
import pyspark

# create spark session
spark = pyspark.sql.SparkSession \
    .builder \
    .appName("Python Spark SQL") \
    .config('spark.driver.extraClassPath', r"C:\Users\Anthony.DESKTOP-ES5HL78\Downloads\postgresql-42.5.4.jar") \
    .getOrCreate()

# read table from db using spark jdbc
Marketing_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://34…164:5432/Marketing") \
    .option("dbtable", "PostgreSQLDestination") \
    .option("user", "postgres") \
    .option("password", "") \
    .option("driver", "org.postgresql.Driver") \
    .load()

# show the contents of the Marketing_df
Marketing_df.show()
```
The output is:

```text
23/03/14 13:37:31 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/03/14 13:37:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "C:\Users\Anthony.DESKTOP-ES5HL78\Documents\Scrapy\projects\m.py", line 19, in <module>
    .load()
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\readwriter.py", line 184, in load
    return self._df(self._jreader.load())
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\utils.py", line 190, in deco
    return f(*a, **kw)
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o33.load.
: org.postgresql.util.PSQLException: ERROR: relation "postgresqldestination" does not exist
  Position: 15
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
    at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:68)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1589)
```

The database with the table does exist (see the attached screenshot):

This needs to be a triple backtick, not a triple single quote. On my
keyboard that’s at top left below the tilde (~) symbol. There’s a
</> button in the compose window on the web forum to start one of
these sections for you.

I’m assuming the .load() below is line 19 of your m.py file:

 # read table from db using spark jdbc
 Marketing_df = spark.read \
     .format("jdbc") \
     .option("url", "jdbc:postgresql://34..164:5432/Marketing") \
     .option("dbtable", "PostgreSQLDestination") \
     .option("user", "postgres") \
     .option("password", "") \
     .option("driver", "org.postgresql.Driver") \
     .load()

Superficially this looks OK according to various examples on the web, and
your screenshot suggests that the table is indeed there under the names
you're using.

I'm not a pyspark person myself, and I think you might need to ask in a
pyspark-specific forum. Have you tried anywhere listed here?
https://spark.apache.org/community.html
Make sure you mention in the subject line that you're accessing
PostgreSQL, since your primary problem seems to be specifying the tables
at this point.
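
One thing that may be worth ruling out first (this is only a guess on my
part, and the connection details in the sketch below are placeholders): the
error reports the relation name in lower case, postgresqldestination, while
the table is named PostgreSQLDestination. PostgreSQL folds unquoted
identifiers to lower case, so a table created with a quoted mixed-case name
can only be reached by quoting the name again. In the Spark reader that
would look something like:

```python
# Hypothetical sketch: quote the mixed-case table name inside the dbtable
# option so PostgreSQL does not fold it to lower case.
# <host> and <password> are placeholders, not real connection details.
Marketing_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://<host>:5432/Marketing") \
    .option("dbtable", '"PostgreSQLDestination"') \
    .option("user", "postgres") \
    .option("password", "<password>") \
    .option("driver", "org.postgresql.Driver") \
    .load()
```

Alternatively, recreating the table with an all-lower-case name sidesteps
the quoting question entirely.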

Cheers,
Cameron Simpson cs@cskk.id.au