I am using an EMR instance and running a few PySpark scripts on it.
For one of the requirements, I need to read an Excel file in a PySpark script. I have two options for reading an Excel file in PySpark:
- Use Python packages such as openpyxl
- Use the spark-excel_2.12-0.13.5.jar file
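A minimal sketch of the first option (openpyxl), assuming the .xlsx file is available as a local path on the driver (a file on S3 would have to be downloaded first) and that openpyxl is installed on the driver node. The helper names `read_sheet_rows` and `to_spark_df` are hypothetical. It shows where the work happens: openpyxl parses the sheet inside the driver's Python process, and only the finished rows are handed to Spark.

```python
from openpyxl import load_workbook


def read_sheet_rows(path, sheet_name=None):
    """Hypothetical helper: read one worksheet on the driver as (header, rows)."""
    # read_only streams the file; data_only returns cached formula results, not formulas.
    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        rows = ws.iter_rows(values_only=True)
        first = next(rows, None)
        if first is None:
            # Empty sheet: no header and no data.
            return [], []
        # Use the first row as column names.
        header = [str(cell) for cell in first]
        data = [tuple(row) for row in rows]
    finally:
        wb.close()
    return header, data


def to_spark_df(spark, header, data):
    """Hypothetical helper: turn the parsed rows into a Spark DataFrame."""
    # Column types are inferred by Spark from the Python values.
    return spark.createDataFrame(data, schema=header)
```

Since every row passes through the driver before Spark sees it, this route is straightforward for small workbooks, but the parsing itself is not distributed across the cluster.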
Can you please suggest which option is better, and what the pros and cons of each are?
Is there any difference between using Python standard-library or third-party packages in PySpark and using Spark JAR files (e.g. spark-excel, spark-xml)?
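To make the JAR side of the comparison concrete, here is a sketch of reading through spark-excel, assuming version 0.13.5 built for Scala 2.12 (the JAR named above), which should match the Scala version of the cluster's Spark build. The format name `com.crealytics.spark.excel` and the `dataAddress`, `header` and `inferSchema` options follow the spark-excel README for the 0.13.x line; the helper names `spark_excel_options`, `build_session` and `read_excel_with_jar` and the default sheet `Sheet1` are hypothetical. Unlike openpyxl, the parsing here runs inside Spark's JVM data source rather than in a Python library.

```python
# Maven coordinates for the JAR mentioned in the question (Scala 2.12 build).
SPARK_EXCEL_PACKAGE = "com.crealytics:spark-excel_2.12:0.13.5"
SPARK_EXCEL_FORMAT = "com.crealytics.spark.excel"


def spark_excel_options(sheet_name="Sheet1", header=True):
    """Hypothetical helper: reader options for spark-excel 0.13.x."""
    return {
        # Sheet name and top-left cell to start reading from.
        "dataAddress": f"'{sheet_name}'!A1",
        # Treat the first row as column names.
        "header": "true" if header else "false",
        # Let the data source guess column types from the cell values.
        "inferSchema": "true",
    }


def build_session(app_name="excel-demo"):
    """Hypothetical helper: SparkSession that pulls spark-excel from Maven."""
    # Imported here so spark_excel_options can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        # Only takes effect if set before the session's JVM is started.
        .config("spark.jars.packages", SPARK_EXCEL_PACKAGE)
        .getOrCreate()
    )


def read_excel_with_jar(spark, path, sheet_name="Sheet1"):
    """Hypothetical helper: read one worksheet through the spark-excel data source."""
    return (
        spark.read.format(SPARK_EXCEL_FORMAT)
        .options(**spark_excel_options(sheet_name))
        .load(path)
    )
```

When submitting with `spark-submit` on EMR, passing `--packages com.crealytics:spark-excel_2.12:0.13.5` (or `--jars` with the downloaded JAR) is the usual way to put the data source on the classpath, instead of setting `spark.jars.packages` in code.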
Thanks in advance