kiran.py
(kiran palika)
March 13, 2024, 2:14pm
1
I am trying to generate coverage for the code that is being executed by celery workers.
Coverage is generated when I set the pool to `solo`, but when I leave the default (`prefork`) no coverage data is generated. My issue is similar to the one described below.
Quoted GitHub issue (opened 06:13PM, 31 Aug 2016 UTC; closed 04:43PM, 6 Jul 2019 UTC). Component: Prefork Workers Pool.
I have a functional/integration test suite which I run against a running Celery + Django project, and I wanted to capture coverage data. I've followed all the documented steps for capturing coverage data from subprocesses, but since Celery uses a fork of multiprocessing, the coverage data is not being captured.
The above is not true when using `solo` or `threads` as `POOL_CLS` (`-P` command line option). When running using a different pool class the coverage data is being captured without any problem.
To clarify: by capturing coverage data I mean coverage information for the bodies of the Celery tasks. When running in `prefork` mode, everything except the Celery task bodies is captured.
This can be easily reproduced by following these steps:
Create a directory and create a `myapp.py` Python module with the same contents as https://github.com/celery/celery/blob/3.1/examples/app/myapp.py.
Create a virtual environment and install celery, coverage, and threadpool (I am using virtualenvwrapper):
``` console
mkvirtualenv -p python3 celery
pip install celery==3.1 coverage threadpool
```
Then create a `sitecustomize.py` with the following contents in the corresponding Python site-packages directory:
``` python
import coverage
coverage.process_startup()
```
In my case I did:
``` console
cat > ~/.virtualenvs/celery/lib/python3.4/site-packages/sitecustomize.py << EOF
import coverage
coverage.process_startup()
EOF
```
Next create a `.coveragerc` file:
``` console
cat > .coveragerc << EOF
[run]
include=myapp
source=.
concurrency =
    thread
    multiprocessing
EOF
```
Then export the `COVERAGE_PROCESS_START` environment variable with the path of the `.coveragerc` file; in my case it was located in the same directory:
``` console
export COVERAGE_PROCESS_START=.coveragerc
```
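For context, the hook in `sitecustomize.py` is gated on this variable: `coverage.process_startup()` does nothing unless `COVERAGE_PROCESS_START` names a config file. A simplified, hypothetical stand-in for that check (`process_startup_stub` is my own illustration, not coverage.py's actual code):

``` python
import os

def process_startup_stub():
    """Hypothetical stand-in for coverage.process_startup(): it is a
    no-op unless COVERAGE_PROCESS_START names a coverage config file."""
    config_file = os.environ.get("COVERAGE_PROCESS_START")
    if not config_file:
        return None  # measurement is not started
    # The real function would create a Coverage object from config_file
    # and start measuring here.
    return config_file

os.environ["COVERAGE_PROCESS_START"] = ".coveragerc"
print(process_startup_stub())  # → .coveragerc
```

This is why the variable must be exported before the worker starts: every Python process that inherits it (and imports `sitecustomize`) begins measuring itself.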
With all the above steps we are now set to check the behavior of the different `POOL_CLS` options. First let's start with the default, `prefork`:
Run celery on one terminal window:
``` console
celery -A myapp worker -l info
```
Then on a second window, open a python shell session and execute the `add` task:
``` console
$ python
>>> from myapp import add
>>> add.delay(2, 3)
```
After that, go back to the first window and press Ctrl+C to stop the celery process. If everything was set up properly, some `.coverage.*` files should have been created. Here are the contents I got for all the coverage data files:
```
!coverage.py: This is a private format, don't read it directly!{"lines": {"/home/elyezer/code/celery/examples/app/myapp.py": [35, 39, 24, 25, 27, 28, 29]}}
```
As you can see, line 37 is not listed there; this is the line containing the body of the `add` task function. The coverage data was not captured even though the task was executed.
The next step is to check how coverage data capturing works with `solo` or `threads` (in my local tests they behaved the same). We will use just `solo`, since it does not require any additional package to be installed.
Make sure to clean up the coverage data files before proceeding:
``` console
rm .coverage.*
```
Now we can run celery using `solo` as the `POOL_CLS`, on the first terminal window:
``` console
celery -A myapp worker -l info -P solo
```
Repeat the task call on the second terminal window and exit the celery process on the first window. Then check for the generated coverage data files, on my local test I got:
```
!coverage.py: This is a private format, don't read it directly!{"lines": {"/home/elyezer/code/celery/examples/app/myapp.py": [35, 37, 39, 24, 25, 27, 28, 29]}}
```
Now line 37 is included, as expected.
The documentation on capturing coverage data from subprocesses can be found at [1]. coverage.py also lets us specify a concurrency library; `multiprocessing` is one of the valid values, and coverage.py applies some patches [2] to the multiprocessing library when it is specified as one of the concurrency libraries.
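As a rough illustration of the technique in [2] (this is my own simplified sketch, not coverage.py's actual code): the patch swaps in a `Process` subclass whose `run()` brackets the child's work with start/save hooks. Since the patch targets the stdlib `multiprocessing` module, a fork of that library would not pick it up:

``` python
import multiprocessing as mp

class MeasuredProcess(mp.Process):
    """Simplified sketch of coverage.py's multiprocessing patch:
    a Process subclass whose run() wraps the child's work."""
    def run(self):
        # The real patch would create a Coverage object and start() it here.
        super().run()  # run the user's target function
        # The real patch would stop() and save() the coverage data here.

def add(x, y, queue):
    queue.put(x + y)

if __name__ == "__main__":
    mp.Process = MeasuredProcess  # monkeypatch, as the real patch does
    q = mp.Queue()
    p = mp.Process(target=add, args=(2, 3, q))
    p.start()
    p.join()
    print(q.get())  # → 5
```

Because billiard ships its own copy of `Process`, replacing `multiprocessing.Process` in this way has no effect on prefork workers.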
I saw that Billiard sets some information on the Python built-in multiprocessing library, but it seems that something is missing.
I wish I could capture coverage information on a running celery process no matter which `POOL_CLS` is used.
[1] http://coverage.readthedocs.io/en/coverage-4.2/subprocess.html
[2] https://bitbucket.org/ned/coveragepy/src/257e52793fb0f28853ddca679f67b158107262bf/coverage/multiproc.py
@nedbat Can you please suggest a solution for this issue?
nedbat
(Ned Batchelder)
March 13, 2024, 4:12pm
2