This code gives me a list of all the databases that have been created:

```python
# import required libraries
import pyspark

# create spark session
spark = pyspark.sql.SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config('spark.driver.extraClassPath', r"C:\Users\Anthony.DESKTOP-ES5HL78\Downloads\postgresql-42.5.4.jar") \
    .getOrCreate()

# read the list of databases from the server using the Spark JDBC source
databases_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://34.30.48.164:5432/") \
    .option("user", "postgres") \
    .option("password", "Musica321") \
    .option("driver", "org.postgresql.Driver") \
    .option("query", "SELECT datname FROM pg_database WHERE datistemplate = false") \
    .load()

# print the data in databases_df (show() prints the table itself and returns None)
databases_df.show()
```
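For reference, I can also pull those names into a plain Python list (a minimal sketch that reuses the `databases_df` from above):

```python
# collect the single datname column into a regular Python list of strings
database_names = [row.datname for row in databases_df.collect()]
print(database_names)  # e.g. ['postgres', ...]
```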
I want to show the contents of just one of these databases, and the code I am using is:

```python
# create spark session
spark = pyspark.sql.SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config('spark.driver.extraClassPath', r"C:\Users\Anthony.DESKTOP-ES5HL78\Downloads\postgresql-42.5.4.jar") \
    .getOrCreate()

# read table from db using spark jdbc
Marketing_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://34.30.48.164:5432/postgres") \
    .option("dbtable", "postgres") \
    .option("user", "postgres") \
    .option("password", "") \
    .option("driver", "org.postgresql.Driver") \
    .load()

# print the data in Marketing_df
Marketing_df.show()
```
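For what it's worth, my understanding of the JDBC options is that the `url` path selects the database and `dbtable` names a table (optionally schema-qualified) inside it, so I would expect a working reader to look roughly like the sketch below. The table name `public.my_table` is a placeholder, not a table I have verified exists:

```python
# sketch: the url path ("/postgres") picks the database,
# and dbtable should name a table inside it;
# "public.my_table" is a hypothetical placeholder table name
table_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://34.30.48.164:5432/postgres") \
    .option("dbtable", "public.my_table") \
    .option("user", "postgres") \
    .option("password", "Musica321") \
    .option("driver", "org.postgresql.Driver") \
    .load()
```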
Running the `Marketing_df` snippet gives me this error:

```
23/03/13 21:08:56 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/03/13 21:08:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "C:\Users\Anthony.DESKTOP-ES5HL78\Documents\Scrapy\projects\m.py", line 19, in <module>
    .load()
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\readwriter.py", line 184, in load
    return self._df(self._jreader.load())
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\pyspark\sql\utils.py", line 190, in deco
    return f(*a, **kw)
  File "C:\Users\Anthony.DESKTOP-ES5HL78\AppData\Local\Programs\Python\Python310\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o33.load.
: org.postgresql.util.PSQLException: ERROR: relation "order100" does not exist
  Position: 30
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:68)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:242)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:1589)
```

A second run (`python m.py`) fails with the same traceback.
What am I doing wrong in the second snippet?