Hi Python Community,
I hope you are doing well. I have a task that is quite challenging, at least for me, as I am a novice in Python.
I need help extracting a large amount of data from an interactive map on a website and saving it to a .csv file.
The map in question is here.
- Is there a way to extract the data for all the municipalities?
Thank you in advance for your help.
Best wishes,
Michael
Edit: Here is what I tried, in vain, to load the data into a pandas DataFrame. I know that the data I want are there:
{
  "type": "MultiPolygon",
  "arcs": [[[-15410, 15526, -15491, -15236, -15379]]],
  "properties": {
    "NAMEUNIT": "<strong>Municipio: Villabrázaro\u003c/strong><br/>",
    "Unitario": "Precio unitario medio : 564 Euros/m<sup>2\u003c/sup><br/>",
    "Precio unitario medio del municipio": 564,
    "desviacion": "Desviacion tipica: 369<br/>",
    "superficie": "Superficie media: 311 m<sup>2\u003c/sup><br/>",
    "moda": "Rango de precio mas frecuente: 200-400 Euros/m<sup>2\u003c/sup><br/>",
    "poblacion": "Poblacion: 239<br/>",
    "renta_persona": "Renta media por persona: 13487 Euros/año<br/>",
    "renta_hogar": "Renta media por hogar : 31180 Euros/año<br/>"
  }
},
{
  "type": "MultiPolygon",
  "arcs": [[[-15348, -15345, -15267, -13840, -13292]]],
  "properties": {
    "NAMEUNIT": "<strong>Municipio: Villaescusa\u003c/strong><br/>",
    "Unitario": "Precio unitario medio : 580 Euros/m<sup>2\u003c/sup><br/>",
    "Precio unitario medio del municipio": 580,
    "desviacion": "Desviacion tipica: 660<br/>",
    "superficie": "Superficie media: 242 m<sup>2\u003c/sup><br/>",
    "moda": "Rango de precio mas frecuente: 100-200 Euros/m<sup>2\u003c/sup><br/>",
    "poblacion": "Poblacion: 235<br/>",
    "renta_persona": "Renta media por persona: 11123 Euros/año<br/>",
    "renta_hogar": "Renta media por hogar : 21876 Euros/año<br/>"
  }
},
and so on...
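Each property value above combines a label, the value itself and HTML markup, so splitting on ":" and removing fixed suffixes is fragile. Below is a minimal sketch of a more general cleanup, assuming the values have already been parsed from JSON (so that \u003c has become <). The helper names clean_value and leading_number are hypothetical, not part of any library.

```python
import html
import re

# Matches any HTML tag such as <strong>, </sup> or <br/>.
TAG_RE = re.compile(r"<[^>]+>")


def clean_value(raw):
    """Strip HTML tags and the leading 'Label:' part from one property string."""
    text = html.unescape(TAG_RE.sub("", raw)).strip()
    # Split on the first colon only; everything after it is the value.
    _, _, value = text.partition(":")
    return value.strip()


def leading_number(text):
    """Return the first integer found in text, or None if there is none."""
    match = re.search(r"-?\d+", text)
    return int(match.group()) if match else None


# Two fields copied from the sample above, written as they look after JSON parsing.
sample = {
    "NAMEUNIT": "<strong>Municipio: Villabrázaro</strong><br/>",
    "superficie": "Superficie media: 311 m<sup>2</sup><br/>",
    "moda": "Rango de precio mas frecuente: 200-400 Euros/m<sup>2</sup><br/>",
    "renta_persona": "Renta media por persona: 13487 Euros/año<br/>",
}

name = clean_value(sample["NAMEUNIT"])
superficie = leading_number(clean_value(sample["superficie"]))
moda = clean_value(sample["moda"])
renta_persona = leading_number(clean_value(sample["renta_persona"]))
print(name, superficie, moda, renta_persona)
```

Range fields such as "moda" are probably best kept as text, since leading_number would return only the lower bound.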
I tried the following code, but it raises an error that I cannot resolve:
import requests
import pandas as pd

# URL of the web page containing the data
url = "https://www.cohispania.com/wp-content/uploads/2024/01/mapa-2023.html"

# Fetch the HTML content
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Extract JSON data from HTML content
    json_data_list = response.json()
    # List to store extracted data for each municipality
    extracted_data = []
    # Iterate over each JSON object representing a municipality
    for json_data in json_data_list:
        # Extract relevant information from the JSON data
        properties = json_data["properties"]
        name = properties["NAMEUNIT"].split(":")[1].strip().replace("</strong><br/>", "")
        precio_unitario_medio = properties["Precio unitario medio del municipio"]
        desviacion = properties["desviacion"].split(":")[1].strip().replace("<br/>", "")
        superficie = properties["superficie"].split(":")[1].strip().replace(" m<sup>2</sup><br/>", "")
        moda = properties["moda"].split(":")[1].strip().replace("</sup><br/>", "")
        poblacion = properties["poblacion"].split(":")[1].strip().replace("<br/>", "")
        renta_persona = properties["renta_persona"].split(":")[1].strip().replace(" Euros/año<br/>", "")
        renta_hogar = properties["renta_hogar"].split(":")[1].strip().replace(" Euros/año<br/>", "")
        # Append extracted data to the list
        extracted_data.append({
            "Name": name,
            "Precio unitario medio": precio_unitario_medio,
            "Desviacion": desviacion,
            "Superficie": superficie,
            "Moda": moda,
            "Poblacion": poblacion,
            "Renta media por persona": renta_persona,
            "Renta media por hogar": renta_hogar
        })
    # Create DataFrame from the extracted data
    df = pd.DataFrame(extracted_data)
    # Display the DataFrame
    print(df)
else:
    print("Failed to fetch data from the web page")
- Could anyone please give me a hand with this?
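One likely cause of the error, stated as an assumption since the traceback is not shown: the URL ends in .html, so the response body is an HTML page, and response.json() only succeeds when the whole body is JSON. The sketch below instead scans the page text for every JSON object that follows a "properties" key, using json.JSONDecoder.raw_decode from the standard library. The names iter_properties, write_csv and demo_page are hypothetical, and the sketch assumes the records appear in the page source as literal JSON like the sample above.

```python
import csv
import io
import json
import re


def iter_properties(page_text):
    """Yield each JSON object that directly follows a "properties": key."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r'"properties"\s*:\s*', page_text):
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever text comes after it.
            obj, _ = decoder.raw_decode(page_text, match.end())
        except json.JSONDecodeError:
            continue
        # Keep only municipality records (assumed to carry a NAMEUNIT key).
        if isinstance(obj, dict) and "NAMEUNIT" in obj:
            yield obj


def write_csv(rows, out_file):
    """Write a sequence of dicts to out_file as CSV; return the row count."""
    rows = list(rows)
    if not rows:
        return 0
    writer = csv.DictWriter(
        out_file, fieldnames=list(rows[0]), extrasaction="ignore", restval=""
    )
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Tiny stand-in for the real page so the sketch runs without network access.
demo_page = (
    r'<html><script>var data = {"geometries": [{"type": "MultiPolygon", '
    r'"properties": {"NAMEUNIT": "<strong>Municipio: Villaescusa\u003c/strong><br/>", '
    r'"poblacion": "Poblacion: 235<br/>"}}]};</script></html>'
)

records = list(iter_properties(demo_page))
buffer = io.StringIO()
count = write_csv(records, buffer)
print(count)
print(buffer.getvalue())
```

With the real page, response.text from the requests call would be passed to iter_properties in place of demo_page, and a file opened with newline="" and encoding="utf-8" would replace the StringIO buffer. If the scan yields nothing, the data may be loaded from a separate file or stored in an escaped form, in which case the browser's network tab is the place to look.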