Hi there. I would like to ask for some help with an issue related to frame stack walking with Python 3.11 that was reported to us by PyPI. The following traceback
AttributeError: 'code' object has no attribute 'f_code'
File "ddtrace/internal/periodic.py", line 73, in run
self._target()
File "ddtrace/profiling/collector/__init__.py", line 42, in periodic
for events in self.collect():
File "ddtrace/profiling/collector/stack.pyx", line 492, in ddtrace.profiling.collector.stack.StackCollector.collect
File "ddtrace/profiling/collector/stack.pyx", line 353, in ddtrace.profiling.collector.stack.stack_collect
File "ddtrace/profiling/collector/_traceback.pyx", line 44, in ddtrace.profiling.collector._traceback.pyframe_to_frames
File "ddtrace/profiling/collector/_traceback.pyx", line 55, in ddtrace.profiling.collector._traceback.pyframe_to_frames
is being generated by the Datadog Python profiler when attempting to sample the current frame stack of each running thread. The stack_collect
method uses the following helper to collect the running threads and create a mapping between thread ID and top frame (among other details).
cdef collect_threads(thread_id_ignore_list, thread_time, thread_span_links) with gil:
cdef dict current_exceptions = {}
IF UNAME_SYSNAME != "Windows" and PY_MAJOR_VERSION >= 3 and PY_MINOR_VERSION >= 7:
cdef PyInterpreterState* interp
cdef PyThreadState* tstate
cdef _PyErr_StackItem* exc_info
cdef PyThread_type_lock lmutex = _PyRuntime.interpreters.mutex
cdef PyObject* exc_type
cdef PyObject* exc_tb
cdef dict running_threads = {}
# This is an internal lock but we do need it.
# See https://bugs.python.org/issue1021318
if PyThread_acquire_lock(lmutex, WAIT_LOCK) == PY_LOCK_ACQUIRED:
# Do not try to do anything fancy here:
# Even calling print() will deadlock the program has it will try
# to lock the GIL and somehow touching this mutex.
try:
interp = PyInterpreterState_Head()
while interp:
tstate = PyInterpreterState_ThreadHead(interp)
while tstate:
# The frame can be NULL
# Python 3.11 moved PyFrameObject to internal C API and cannot be directly accessed from tstate
IF PY_MINOR_VERSION >= 11:
frame = PyThreadState_GetFrame(tstate)
if frame:
running_threads[tstate.thread_id] = <object>frame
exc_info = _PyErr_GetTopmostException(tstate)
if exc_info and exc_info.exc_value and <object> exc_info.exc_value is not None:
# Python 3.11 removed exc_type, exc_traceback from exception representations,
# can instead derive exc_type and exc_traceback from remaining exc_value field
exc_type = Py_TYPE(exc_info.exc_value)
exc_tb = PyException_GetTraceback(exc_info.exc_value)
if exc_tb:
current_exceptions[tstate.thread_id] = (<object>exc_type, <object>exc_tb)
Py_XDECREF(exc_tb)
Py_XDECREF(frame)
ELSE:
frame = tstate.frame
if frame:
running_threads[tstate.thread_id] = <object>frame
exc_info = _PyErr_GetTopmostException(tstate)
if exc_info and exc_info.exc_type and exc_info.exc_traceback:
current_exceptions[tstate.thread_id] = (<object>exc_info.exc_type, <object>exc_info.exc_traceback)
tstate = PyThreadState_Next(tstate)
interp = PyInterpreterState_Next(interp)
finally:
PyThread_release_lock(lmutex)
ELSE:
cdef dict running_threads = <dict>_PyThread_CurrentFrames()
# Now that we own the ref via <dict> casting, we can safely decrease the default refcount
# so we don't leak the object
Py_DECREF(running_threads)
cdef dict cpu_times = thread_time(running_threads.keys())
return tuple(
(
pthread_id,
native_thread_id,
_threading.get_thread_name(pthread_id),
running_threads[pthread_id],
current_exceptions.get(pthread_id),
thread_span_links.get_active_span_from_thread_id(pthread_id) if thread_span_links else None,
cpu_time,
)
for (pthread_id, native_thread_id), cpu_time in cpu_times.items()
if pthread_id not in thread_id_ignore_list
)
The top frames thus collected are passed to the pyframe_to_frames
helper, which performs the stack unwinding from each top frame passed to it
cpdef pyframe_to_frames(frame, max_nframes):
"""Convert a Python frame to a list of frames.
:param frame: The frame object to serialize.
:param max_nframes: The maximum number of frames to return.
:return: The serialized frames and the number of frames present in the original traceback."""
frames = []
nframes = 0
while frame is not None:
nframes += 1
if len(frames) < max_nframes:
code = frame.f_code
lineno = 0 if frame.f_lineno is None else frame.f_lineno
frames.append((code.co_filename, lineno, code.co_name, _extract_class_name(frame)))
frame = frame.f_back
return frames, nframes
The line code = frame.f_code
is where we seem to get a Python object that is not a FrameType
object on some occasion, which is then responsible for the traceback shown above.
The full sources are available from GitHub - DataDog/dd-trace-py: Datadog Python APM Client (the two relevant Cython sources featured in the traceback are dd-trace-py/stack.pyx at 1.x · DataDog/dd-trace-py · GitHub and dd-trace-py/_traceback.pyx at 1.x · DataDog/dd-trace-py · GitHub)
We have double-checked all our reference counting and decremented them whenever the C API we used returned a strong reference. We haven’t been able to spot any other potential issues with our coding. In particular, we cannot see how the frame
variable in the pyframe_to_frames
helper could ever be something other than a FrameType
(or None
). Note that the traceback above is one instance of similar tracebacks, where instead of 'code'
we would have some other random Python objects (e.g. 'Project'
from SQLAlchemy etc…).
Assuming that we have not overlooked anything in our code, this has left us with essentially two possible conclusion:
- this is an issue with Cython support for CPython 3.11
- this is an issue with CPython 3.11 in the way that frame objects are created lazily.
We’re trying to reproduce this issue locally to narrow down which C API call might be returning the random object that we see in the traceback, but so far we haven’t been able to. I’m hoping that perhaps a core dev can think of something that could potentially cause this issue to appear when using certain C API functions around frame stack unwinding, while we try to gather more insight into it.