Hi everyone,
I have a quick question. Thanks to a previous post:
Python: Extract Data from an Interactive Map on the Web, with Several Years, into a CSV file
I was able to extract JSON data from the web. Thanks again to @FelixLeg and @kknechtel for their help and advice; this would not have been possible without their insightful comments.
Now I would like to write that data to a CSV file, without the HTML markup (for example <br> / </br>, etc.) that appears in the original JSON file.
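For the markup part, here is a minimal sketch of one way to clean each text value. It assumes the fields are plain strings containing simple tags such as <br>, and uses a regular expression from Python's re module plus html.unescape. strip_html is a hypothetical helper name, and the sample string is made up rather than taken from result.json. A regex like this is fine for simple line-break tags but is not a general HTML parser.

```python
import html
import re

# Matches simple tags such as <br>, </br> or <br/>; not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Replace tags with spaces, decode entities like &amp;, collapse whitespace."""
    without_tags = TAG_RE.sub(" ", text)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


# Hypothetical value; the real strings in result.json may look different.
print(strip_html("Precio unitario medio<br>1.234 &euro;/m²</br>"))
# Precio unitario medio 1.234 €/m²
```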
I would like each column to be named after its variable (Precio unitario medio, Desviacion tipica, Superficie media, Rango de precios frequente, Poblacion, Renta media por persona and Renta media por hogar), as in the original map.
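For the column names, a minimal sketch using the standard-library csv module instead of pandas. It assumes each area in the JSON has already been turned into a dict of cleaned values. The raw keys precio, desviacion and superficie are placeholders, since the real structure of result.json is not shown here, and only three of the seven columns are listed for brevity; records_to_csv is likewise a hypothetical helper.

```python
import csv
import io

# Placeholder raw keys mapped to the column titles used on the map.
# Replace the keys with the real ones from result.json and add the other columns.
COLUMNS = {
    "precio": "Precio unitario medio",
    "desviacion": "Desviacion tipica",
    "superficie": "Superficie media",
}


def records_to_csv(records: list[dict]) -> str:
    """Return CSV text with one titled column per entry in COLUMNS."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(COLUMNS.values()))
    writer.writeheader()
    for rec in records:
        # Rename each known key to its title; missing keys become empty cells.
        writer.writerow({title: rec.get(key, "") for key, title in COLUMNS.items()})
    return buf.getvalue()


# Hypothetical record, not real data.
print(records_to_csv([{"precio": "1.234", "desviacion": "210", "superficie": "85"}]))
```

To save it, the returned string can be written to a file opened with open('csvfile.csv', 'w', encoding='utf-8', newline=''), where newline='' keeps the line endings the csv module produced.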
I tried the following code with the pandas library, but I always get an error:
import pandas as pd

with open('result.json', encoding='utf-8') as inputfile:
    df = pd.read_json(inputfile)

df.to_csv('csvfile.csv', encoding='utf-8', index=False)
Here is the error I obtained:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
Cell In[20], line 4
1 import pandas as pd
3 with open('result.json', encoding='utf-8') as inputfile:
----> 4 df = pd.read_json(inputfile)
6 df.to_csv('csvfile.csv', encoding='utf-8', index=False)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\json\_json.py:760, in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options, dtype_backend, engine)
757 if convert_axes is None and orient != "table":
758 convert_axes = True
--> 760 json_reader = JsonReader(
761 path_or_buf,
762 orient=orient,
763 typ=typ,
764 dtype=dtype,
765 convert_axes=convert_axes,
766 convert_dates=convert_dates,
767 keep_default_dates=keep_default_dates,
768 precise_float=precise_float,
769 date_unit=date_unit,
770 encoding=encoding,
771 lines=lines,
772 chunksize=chunksize,
773 compression=compression,
774 nrows=nrows,
775 storage_options=storage_options,
776 encoding_errors=encoding_errors,
777 dtype_backend=dtype_backend,
778 engine=engine,
779 )
781 if chunksize:
782 return json_reader
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\json\_json.py:862, in JsonReader.__init__(self, filepath_or_buffer, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, precise_float, date_unit, encoding, lines, chunksize, compression, nrows, storage_options, encoding_errors, dtype_backend, engine)
860 elif self.engine == "ujson":
861 data = self._get_data_from_filepath(filepath_or_buffer)
--> 862 self.data = self._preprocess_data(data)
File ~\AppData\Local\anaconda3\Lib\site-packages\pandas\io\json\_json.py:874, in JsonReader._preprocess_data(self, data)
872 if hasattr(data, "read") and not (self.chunksize or self.nrows):
873 with self:
--> 874 data = data.read()
875 if not hasattr(data, "read") and (self.chunksize or self.nrows):
876 data = StringIO(data)
File <frozen codecs>:322, in decode(self, input, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 552: invalid continuation byte
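On the error itself: the byte it names, 0xe9, is "é" in Latin-1 and Windows-1252, whereas in UTF-8 it can only start a three-byte sequence, so the decoder fails as soon as the next byte is not a continuation byte. One plausible reading, not confirmed by anything above, is that result.json was saved in a Windows code page rather than UTF-8 (for example by open(..., 'w') without an encoding argument on Windows). Below is a minimal sketch for checking which candidate encoding decodes the raw bytes; try_decode is a hypothetical helper and the sample text is made up.

```python
# 0xE9 is "é" in Windows-1252 / Latin-1, but a three-byte lead byte in UTF-8.
print(b"\xe9".decode("cp1252"))  # é


def try_decode(raw: bytes, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes raw cleanly."""
    for enc in encodings:
        try:
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reachable if latin-1 (which accepts every byte) is removed from the list.
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical bytes: Spanish text stored in cp1252, not real file content.
sample = "Desviación típica".encode("cp1252")
print(try_decode(sample))  # ('cp1252', 'Desviación típica')
```

With the real file, raw would come from open('result.json', 'rb').read(). A single-byte encoding can decode without error yet yield the wrong accented letters, so it is worth checking a few decoded values by eye before trusting the match.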
Does anyone have an idea how I can get around this problem?
Thank you very much in advance!