Which parallelism approach should I use with a web framework?


I’m using the web framework Flask.

I would like to run something that will not block the main .py.

I’ve read docs.python.org/3/library/threading.html, where we can see the following:

… In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

For those who know Flask, I would like to do something like this:

# super basic, for demonstration only
import multiprocessing

def SendingEmail():
    pass  # the actual email-sending code would go here

def FlaskSomething():
    sample_process = multiprocessing.Process(target=SendingEmail)
    sample_process.start()
    return 'background process should run?'

Of course, I would also like to set a timeout, to kill the new process if it takes too long…
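For the timeout part, I imagine something along these lines: a rough stdlib-only sketch where `slow_task` and `run_with_timeout` are just placeholder names I made up, using `join(timeout=...)` and `terminate()`:

```python
# Sketch: start the worker process, wait at most `timeout` seconds,
# and terminate it if it is still alive afterwards.
import multiprocessing
import time

def slow_task():
    time.sleep(10)  # stand-in for a long-running job

def run_with_timeout(timeout):
    p = multiprocessing.Process(target=slow_task)
    p.start()
    p.join(timeout)      # blocks for at most `timeout` seconds
    if p.is_alive():     # still running after the timeout -> kill it
        p.terminate()
        p.join()         # reap the terminated process
        return 'killed'
    return 'finished'
```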

There are so many ways to do this that I’m lost :slight_smile: I’d rather use a native Python module than install another package… So what would be the best option, and why?

Or something else?

What kind of parallelism do you actually need, though? Sending an email doesn’t require a huge amount of CPU processing, it’s mostly going to be waiting for the remote server to accept the mail. That means you can use nearly anything. I would recommend looking into either asyncio or threading for this sort of job.
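A minimal sketch of the threading route (names like `send_email` and `flask_view` are placeholders, not real Flask API; the real work would be an `smtplib` call):

```python
# Fire off an I/O-bound job in a background thread so the request
# handler can return immediately.
import threading

def send_email(recipient):
    # In real code this would be an smtplib call; the GIL is released
    # while waiting on the network, so a thread works well here.
    print(f"sending email to {recipient}")

def flask_view():
    t = threading.Thread(target=send_email, args=("user@example.com",),
                         daemon=True)  # don't block interpreter exit
    t.start()
    return "email is being sent in the background"
```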


As @Rosuav wrote, you first have to ask whether (and if so, where) you need any kind of parallelism at all. That said, if your web API calls do trigger long-running tasks, you do not want to use blocking synchronous calls (at least not if you also want to use that API in an interactive website).

Using multiprocessing inside the API, as in your mock example (assuming that SendingEmail would be a slow call, which it generally isn’t), could also be problematic: the user of the API needs a way to check whether the background process is still running, whether it has finished successfully, and whether it perhaps needs to be restarted. You could use multiprocessing, threading, or asyncio, but either way the service also needs to manage all those triggered tasks. So you need some kind of task queue.

One that integrates very well with Flask and other libraries is Celery (see: Background Tasks with Celery — Flask Documentation (2.3.x)). The nice thing about Celery is that you can write your main API code as plain synchronous code and shift the burden of managing those special tasks entirely to Celery.
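To illustrate the task-queue idea with only the stdlib (Celery does essentially this, but across processes via a broker, with retries and result storage on top; all names below are illustrative):

```python
# Minimal in-process task queue: one worker thread consumes submitted
# jobs, and a status dict lets the caller poll for completion.
import queue
import threading
import uuid

tasks = queue.Queue()
status = {}  # task_id -> "pending" | "done"

def worker():
    while True:
        task_id, func, args = tasks.get()
        func(*args)                  # run the job
        status[task_id] = "done"     # record completion for pollers
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(func, *args):
    task_id = str(uuid.uuid4())
    status[task_id] = "pending"
    tasks.put((task_id, func, args))
    return task_id  # the API hands this back so the client can poll
```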

If you have long-running tasks where input and output can or should be streamed (for instance, speech as input, speech recognition on the server, and a stream of text as output), you could also consider using a different protocol instead of HTTP, for instance gRPC or websockets (see for instance: Tornado websockets).

Thanks @hansgeunsmeyer. Celery does seem great, but the learning curve is longer than using threading, and I’m afraid that for only one call there is too much overhead. Nevertheless, I plan to try Celery when time permits.

You are right that for only one call Celery may be overkill. It’s mainly useful if you want to develop a REST API that needs to support several different long-running tasks. It’s also true that Celery has a rather steep learning curve, because of the curious way its decorators are set up.
But also look at Using Celery With Flask - miguelgrinberg.com (written by the author of the Flask Mega-Tutorial) - it shows that it doesn’t need to be that difficult.

Before actually implementing your API, I would advise looking around a bit and searching for “task queues for long-running tasks in Python REST APIs”. For instance, this page might give some good hints:
Task Queues - Full Stack Python (it mentions several alternative tools you could use).