kiran.py
(kiran palika)
March 13, 2024, 2:14pm
1
I am trying to generate coverage for the code that is being executed by celery workers.
Coverage is generated when I set the pool to `solo`, but when I leave the default (`prefork`) no coverage data is generated. My issue is similar to the one described below.
Quoted GitHub issue (opened 06:13PM, 31 Aug 2016 UTC; closed 04:43PM, 6 Jul 2019 UTC). Component: Prefork Workers Pool.
I have a functional/integration test suite which I run against a running Celery + Django project, and I wanted to capture coverage data. I've followed all the documented steps for capturing coverage data from subprocesses, but since Celery uses a fork of multiprocessing, the coverage data is not being captured.
The above is not true when using `solo` or `threads` as `POOL_CLS` (`-P` command line option). When running using a different pool class the coverage data is being captured without any problem.
To clarify: by capturing coverage data I mean coverage information for the bodies of the Celery tasks. When running in `prefork` mode, everything except the Celery task bodies is captured.
This can be easily reproduced by following these steps:
Create a directory and create a `myapp.py` Python module with the same contents as https://github.com/celery/celery/blob/3.1/examples/app/myapp.py.
Create a virtual environment and install celery, coverage, and threadpool (I am using virtualenvwrapper):
``` console
mkvirtualenv -p python3 celery
pip install celery==3.1 coverage threadpool
```
Then create a `sitecustomize.py` with the following contents in the corresponding Python site-packages directory:
``` python
import coverage
coverage.process_startup()
```
In my case I did:
``` console
cat > ~/.virtualenvs/celery/lib/python3.4/site-packages/sitecustomize.py << EOF
import coverage
coverage.process_startup()
EOF
```
Next create a `.coveragerc` file:
``` console
cat > .coveragerc << EOF
[run]
include=myapp
source=.
concurrency =
    thread
    multiprocessing
EOF
```
Then export the `COVERAGE_PROCESS_START` environment variable with the path of the `.coveragerc` file; in my case it was located in the same directory:
``` console
export COVERAGE_PROCESS_START=.coveragerc
```
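For context, the hook in `sitecustomize.py` is gated on this variable: `coverage.process_startup()` does nothing unless `COVERAGE_PROCESS_START` names a config file. A simplified, hypothetical stand-in for that check (`process_startup_stub` is my own illustration, not coverage.py's actual code):

``` python
import os

def process_startup_stub():
    """Hypothetical stand-in for coverage.process_startup(): it is a
    no-op unless COVERAGE_PROCESS_START names a coverage config file."""
    config_file = os.environ.get("COVERAGE_PROCESS_START")
    if not config_file:
        return None  # measurement is not started
    # The real function would create a Coverage object from config_file
    # and start measuring here.
    return config_file

os.environ["COVERAGE_PROCESS_START"] = ".coveragerc"
print(process_startup_stub())  # → .coveragerc
```

This is why the variable must be exported before the worker starts: every Python process that inherits it (and imports `sitecustomize`) begins measuring itself.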
With all the above steps we are now set to check the behavior of the different `POOL_CLS` options. First let's start with the default, `prefork`:
Run celery on one terminal window:
``` console
celery -A myapp worker -l info
```
Then on a second window, open a python shell session and execute the `add` task:
``` console
$ python
>>> from myapp import add
>>> add.delay(2, 3)
```
After that, go back to the first window and press Ctrl+C to stop the celery process. If everything was set up properly, some `.coverage.*` files should have been created. Here are the contents I got for all the coverage data files:
```
!coverage.py: This is a private format, don't read it directly!{"lines": {"/home/elyezer/code/celery/examples/app/myapp.py": [35, 39, 24, 25, 27, 28, 29]}}
```
As you can see, line 37 is not listed there; this is the line containing the body of the `add` task function. The coverage data was not captured even though the task was executed.
The next step is to check how coverage data capturing works with `solo` or `threads` (in my local tests they behaved the same). We will use just `solo`, since it does not require any additional package to be installed.
Make sure to clean up the coverage data files before proceeding:
``` console
rm .coverage.*
```
Now we can run celery using `solo` as the `POOL_CLS`, on the first terminal window:
``` console
celery -A myapp worker -l info -P solo
```
Repeat the task call on the second terminal window and exit the celery process on the first window. Then check for the generated coverage data files, on my local test I got:
```
!coverage.py: This is a private format, don't read it directly!{"lines": {"/home/elyezer/code/celery/examples/app/myapp.py": [35, 37, 39, 24, 25, 27, 28, 29]}}
```
Now line 37 is included, as expected.
The documentation on capturing coverage data from subprocesses can be found at [1]. coverage.py also lets us specify a concurrency library; `multiprocessing` is one of the valid values, and coverage.py applies some patches [2] to the multiprocessing library when it is specified as one of the concurrency libraries.
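As a rough illustration of the technique in [2] (this is my own simplified sketch, not coverage.py's actual code): the patch swaps in a `Process` subclass whose `run()` brackets the child's work with start/save hooks. Since the patch targets the stdlib `multiprocessing` module, a fork of that library would not pick it up:

``` python
import multiprocessing as mp

class MeasuredProcess(mp.Process):
    """Simplified sketch of coverage.py's multiprocessing patch:
    a Process subclass whose run() wraps the child's work."""
    def run(self):
        # The real patch would create a Coverage object and start() it here.
        super().run()  # run the user's target function
        # The real patch would stop() and save() the coverage data here.

def add(x, y, queue):
    queue.put(x + y)

if __name__ == "__main__":
    mp.Process = MeasuredProcess  # monkeypatch, as the real patch does
    q = mp.Queue()
    p = mp.Process(target=add, args=(2, 3, q))
    p.start()
    p.join()
    print(q.get())  # → 5
```

Because billiard ships its own copy of `Process`, replacing `multiprocessing.Process` in this way has no effect on prefork workers.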
I saw that Billiard sets some information on the Python built-in multiprocessing library, but it seems that something is missing.
I wish I could capture coverage information on a running celery process no matter which `POOL_CLS` is used.
[1] http://coverage.readthedocs.io/en/coverage-4.2/subprocess.html
[2] https://bitbucket.org/ned/coveragepy/src/257e52793fb0f28853ddca679f67b158107262bf/coverage/multiproc.py
@nedbat Can you please suggest a solution for this issue?
nedbat
(Ned Batchelder)
March 13, 2024, 4:12pm
2