I have written code that paginates through a REST API and writes the results to a folder path. The code does not throw any error, but it runs for hours without writing any output. After running that long, the notebook stops with an internal error.
Any help on what I am doing wrong, or on how the code can be improved, is much appreciated.
My code is below:
import json
import requests

getURL = 'https://api.xxx.com/v3/direct-access/abc'
baseURL = 'https://api.xxx.com/v3/direct-access'
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer " + str(token)
}

def fetch_all():
    results = []
    response = requests.get(getURL, headers=headers)
    r = response.json()
    for i in r:
        results.append(i)
    # Keep following the 'next' link for as long as the API returns one
    while response.links.get('next'):
        response = requests.get(baseURL + response.links['next']['url'], headers=headers)
        r1 = response.json()
        for i in r1:  # iterate over the parsed JSON, not the Response object
            results.append(i)
    ##assert len(results) == requests.get(getURL[:-6]).json()
    return results

results = fetch_all()  # 'return' is only valid inside a function
# spark.read.json expects an RDD of JSON strings, so serialize each record first
rdd = spark.sparkContext.parallelize([json.dumps(rec) for rec in results])
print(rdd)
df = spark.read.option('multiline', 'true').json(rdd)
df.repartition(1).write.json(stagingpath, mode="overwrite")
To add, your code shows an error message at the end. Does that mean you copied the code after running it in an interactive IPython session? If so, could that be an indication of what goes wrong? (I have no idea what code -9 means.)
I have fixed the errors and modified the code, and it now runs. But the issue I am still facing is in the return results step below: it runs for hours and doesn't give any results. Any idea how I can write the results to a path?
This is what the documentation says about terminating the code when it reaches the last page. How do I incorporate this?
When paginating through your request if an empty set is returned, this indicates the end of your
requested dataset. When you receive the first empty set, your process has successfully completed,
and you may exit your program
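
One way to incorporate the quoted rule is to stop as soon as a page comes back empty, instead of relying only on the presence of a 'next' link. The following is a minimal sketch, not a tested fix for this particular API. It assumes each page is a JSON list of records; `paginate`, `fetch_page`, `fake_fetch`, and the `max_pages` safety cap are hypothetical names introduced here for illustration, and the fake pages stand in for real responses.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results: (records, url of the next page or None).
Page = Tuple[List[dict], Optional[str]]


def paginate(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 10_000,  # assumed safety cap so a bad 'next' link cannot loop forever
) -> Iterator[dict]:
    """Yield records page by page until the first empty set is returned."""
    next_url: Optional[str] = None  # None means "request the first page"
    for _ in range(max_pages):
        records, next_url = fetch_page(next_url)
        if not records:
            # Empty set: per the quoted documentation, the dataset is complete.
            return
        yield from records
        if next_url is None:
            # No 'next' link either, so there is nothing more to request.
            return
    raise RuntimeError(f"gave up after {max_pages} pages without reaching an empty set")


# Hypothetical stand-in for the real API: three pages, the last one empty.
FAKE_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "/abc?page=2"),
    "/abc?page=2": ([{"id": 3}], "/abc?page=3"),
    "/abc?page=3": ([], "/abc?page=4"),
}


def fake_fetch(next_url: Optional[str]) -> Page:
    return FAKE_PAGES[next_url]


if __name__ == "__main__":
    print(list(paginate(fake_fetch)))  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

In the real notebook, fetch_page would presumably wrap requests.get(url, headers=headers, timeout=...) and return response.json() together with the URL from response.links.get('next'), if any. Passing a timeout may also be worth trying, since requests waits indefinitely by default and a stalled connection could be one reason the job runs for hours.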