I am exporting a large dataset from the ‘/export’ endpoint of Power BI. Downloading it and writing it to a .pbix file of around 800 MB takes more than 5 hours. How can we reduce that to minutes?
with requests.get(url, headers=headers, params=params, stream=True) as response:
    response.raise_for_status()
    with open(pbix_fileName, 'wb') as report_file:
        for chunk in response.iter_content(chunk_size=1024 ** 2):  # 1 MB chunks
            if chunk:
                report_file.write(chunk)
                report_file.flush()
                os.fsync(report_file.fileno())
print("success!")
How rapidly can you do the same thing with curl or wget on the same machine? That’s the lower bound for how quickly this operation can be done, so you need to measure that first.
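To compare the script against curl or wget on equal terms, it may help to time the Python write loop in the same way. The following is only a minimal sketch: `measure_throughput` is a hypothetical helper (not part of requests), and the list of fake chunks stands in for `response.iter_content(...)` so the example runs without a network connection.

```python
import io
import time


def measure_throughput(chunks, out_file):
    """Write chunks to out_file; return (total_bytes, seconds, MB_per_second)."""
    total = 0
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty chunks
            out_file.write(chunk)
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small inputs.
    rate = (total / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


if __name__ == "__main__":
    # Stand-in for response.iter_content(chunk_size=1024 ** 2): four 1 MiB chunks.
    fake_chunks = [b"x" * 1024 ** 2 for _ in range(4)]
    total, elapsed, rate = measure_throughput(fake_chunks, io.BytesIO())
    print(f"{total} bytes in {elapsed:.3f} s ({rate:.1f} MB/s)")
```

In a real run you would pass `response.iter_content(chunk_size=1024 ** 2)` and the open file object, then compare the reported rate with what curl or wget achieves for the same URL.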
@AndersMunch I just tried it. Downloading 6 MB of data took around 4 minutes.
# excerpt from inside a function
with requests.get(url, headers=headers, params=params, stream=True) as response:
    response.raise_for_status()
    data = b''  # unused
    with open(pbix_fileName, 'wb') as report_file:
        for chunk in response.iter_content(chunk_size=1024 * 8 * 10):  # 80 KB chunks
            if chunk:
                report_file.write(chunk)
                # report_file.flush()
                os.fsync(report_file.fileno())
return "success"
As noted above, the use of ‘flush’ and ‘fsync’ will drastically reduce performance, and they should not be necessary. Beyond that, if you still can’t obtain the performance you want, you’ll need to consider one of the ‘async’ HTTP client libraries like requests-threads, httpx, etc. Using those will allow you to overlap reading from the HTTP server and writing to the local file.
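A minimal sketch of the write loop without per-chunk ‘flush’ and ‘fsync’, relying on Python's normal file buffering. `save_stream` and its `durable` flag are hypothetical names chosen for illustration; the optional single `os.fsync` at the end is only for the case where you really need the data forced to disk before continuing.

```python
import os
import tempfile


def save_stream(chunks, path, durable=False):
    """Write an iterable of byte chunks to path and return the byte count."""
    written = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)  # buffered; no flush or fsync per chunk
                written += len(chunk)
        if durable:
            # One flush and fsync for the whole file, not one per chunk.
            out.flush()
            os.fsync(out.fileno())
    return written


if __name__ == "__main__":
    # Local stand-in for response.iter_content(...), so no network is needed.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "report.pbix")
        print(save_stream([b"a" * 1024, b"b" * 1024], target))
```

With requests, the call would look like `save_stream(response.iter_content(chunk_size=1024 ** 2), pbix_fileName)`.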
@kpfleming Thank you for your quick response. Yes, I have tried removing flush and fsync, but it is still slow: downloading 6 MB takes around 4 minutes.
@kpfleming I just tried this:
import asyncio
from pathlib import Path
import httpx

pbix_fileName = Path(f'{save_path}/{filename}.pbix')  # save_path, filename defined elsewhere

async def main():
    with open(pbix_fileName, 'wb') as f:
        headers = {'Authorization': f'Bearer {access_token}'}
        async with httpx.AsyncClient() as client:
            async with client.stream('GET', url, headers=headers, timeout=None) as r:
                async for chunk in r.aiter_bytes():
                    f.write(chunk)

asyncio.run(main())
The code is fine. I can download a random file at 10MB/s using basically the same code. Whatever is slowing you down is specific to your computer or the server you are downloading from.
@AndersMunch But if the server were the problem, I should not be able to download quickly with PowerShell either, because I'm using the same machine and the same server for that download.
Not necessarily. The server could be saying “I recognise PowerShell, I will not limit the rate”, but with your script it goes “I have no idea what this bot is, so I will throttle it”.
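One way to probe that hypothesis is to send an explicit User-Agent header and see whether the download speed changes. This is only a sketch under assumptions: `build_headers` is a hypothetical helper, the User-Agent string below is a made-up placeholder, and nothing in this thread confirms that the server actually throttles by client identity.

```python
def build_headers(access_token, user_agent=None):
    """Return request headers, optionally overriding the User-Agent."""
    headers = {"Authorization": f"Bearer {access_token}"}
    if user_agent is not None:
        # Without this, requests identifies itself as "python-requests/<version>".
        headers["User-Agent"] = user_agent
    return headers


if __name__ == "__main__":
    # "example-client/1.0" is a placeholder value for illustration only.
    print(build_headers("TOKEN", user_agent="example-client/1.0"))
```

You could then pass the result as `headers=` to `requests.get(url, headers=..., params=params, stream=True)` and time the download once with the override and once without it. If the two timings are the same, the client identity is probably not the cause.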