Error: file not found

I was trying to install PySpark on my computer, and while testing it this error popped up:
  File "c:\Users\parsb\Downloads\DATA SCIENCE\Pedro Alvaro Rios Suxo (Big Data)\pruebaSpark.py", line 3, in <module>
    spark = SparkSession.builder.master("local").appName("PySpark Installation Test").getOrCreate()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\parsb\AppData\Roaming\Python\Python312\site-packages\pyspark\sql\session.py", line 497, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\parsb\AppData\Roaming\Python\Python312\site-packages\pyspark\context.py", line 515, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Users\parsb\AppData\Roaming\Python\Python312\site-packages\pyspark\context.py", line 201, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Users\parsb\AppData\Roaming\Python\Python312\site-packages\pyspark\context.py", line 436, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
                            ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\parsb\AppData\Roaming\Python\Python312\site-packages\pyspark\java_gateway.py", line 100, in launch_gateway
    proc = Popen(command, **popen_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\Python312\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified

and this is my code:
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").appName("PySpark Installation Test").getOrCreate()

df = spark.createDataFrame([(1, "Hello"), (2, "World")], ["id", "message"])

df.show()

You might not have the pyspark package installed on your computer. PySpark does not ship with Python; you have to install it manually using pip.

To get started:

  1. Open a Command Prompt window by typing CMD into the search box on the taskbar.
  2. Once it opens, type: pip install pyspark or pip install pyspark.sql (I am not familiar with the package, so try them in that order).
  3. Retry your code after installing the pyspark package; see the quick check below.
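As a minimal sketch of that check (pyspark exposes a __version__ attribute), you can run:

import pyspark  # fails with ModuleNotFoundError if the install did not work

print(pyspark.__version__)  # prints the installed PySpark version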

If that were the case, it would’ve raised an ImportError, but the traceback shows that it’s failing on the next line.
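For contrast, a sketch of what a missing package would look like; the failure would be at the import line itself, not at getOrCreate():

import pyspark
# -> ModuleNotFoundError: No module named 'pyspark'  (only if pyspark is not installed)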


Yes, I actually thought about that (I have run into that very same error myself a few times). But to start the debugging at ground zero, so to speak, especially since his script is only three lines (not counting the import), I figured the best first step was to make sure the packages were installed, at least to get the latest version (if he had an older version, better to upgrade).

This is a kind of sanity check too, analogous to toggling an I/O first and taking it from there.

I already installed pyspark but the error still persists.
It says: The system can not find the path specified.

Is there a non-Python executable that is also needed? I’m not familiar with pyspark but it looks from the traceback like it’s an interface to an external program, and that program is what can’t be found.

There is. It’s an interface to Apache Spark, written in Java.

Note that PySpark requires Java 8 or later with JAVA_HOME properly set. If using JDK 11, set -Dio.netty.tryReflectionSetAccessible=true for Arrow-related features, and refer to Downloading.
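Before retrying, here is a minimal sketch (standard library only) to verify the Java side from Python; PySpark finds Java through JAVA_HOME or PATH, so both are worth checking:

import os
import shutil

# Either of these printing None suggests PySpark cannot locate Java
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
print("java on PATH:", shutil.which("java"))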

The variables are like this:

[screenshot of the environment variables]

I don't know if this is correct.

JAVA_HOME should not have a ; at the end; it’s a single path, not a list of paths the way PATH is. And of course, you should make sure that you actually have a folder with that name, and that there’s a Java installation in there.

You might also have to check what’s in PATH. We can’t see all of it in this screenshot.
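A small sketch that checks for exactly those two problems (the java.exe location under bin is an assumption about a typical JDK layout on Windows):

import os

java_home = os.environ.get("JAVA_HOME", "")
if java_home.endswith(";"):
    print("JAVA_HOME ends with ';' -- remove it; it must be a single path")
if not os.path.isfile(os.path.join(java_home.rstrip(";"), "bin", "java.exe")):
    print("No java.exe under JAVA_HOME\\bin -- check that Java is actually installed there")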


The Java path is okay, I think; the problem is with pyspark.

No, you do need to fix the environment variables. The problem is that pyspark will try to run a Java command in a new process. You get a “file not found” error when the environment variables are wrong, because Windows needs them to find out where Java is before it can run the command. The “file” is Java itself.
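To see the mechanism in isolation, here is a minimal sketch of what pyspark’s launch_gateway step boils down to; with a broken environment, this raises the very same error:

from subprocess import Popen

# Windows raises "FileNotFoundError: [WinError 2]" here if it cannot
# locate the java executable through the environment variables
proc = Popen(["java", "-version"])
proc.wait()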