Segfault using Sage function with try...except AlarmInterrupt

If you do what too many times, loop through those pseudocode steps?

  File "/usr/lib/python3/dist-packages/sage/groups/free_group.py", line 878, in _element_constructor_
    P = x.parent()
AttributeError: 'list' object has no attribute 'parent'

Line 878 in free_group.py is trying to apply a parent() method to the list object x but ‘x’ has no such method. The question, of course, is what happened to make that line run with type(x) = 'list'.


Yes - if I run the MWE below, which runs through that pseudocode 1000 times, it fails at random places.

(BTW, I know the MWE is randomly generating stuff, but even when I supply it a predefined list, it fails on unpredictable cases - but always in the parent() location you identified).

Sure, I get that - but that’s buried in a Sagemath implementation of some GAP code, and is not likely something I can debug. I am mostly interested in making sure my use of try…except with the AlarmInterrupt is something that looks fine from a program control perspective. Could it be breaking uncleanly out of the interrupt loop, causing that line to execute with the wrong attribute?

How deep is the call stack at the point where the error trips?

Can you post it?

I don’t quite know how to do what you’re asking, but here’s my attempt.

I scattered some print("Max Depth is:", len(inspect.stack(0))) calls around the code, but the only thing they ever printed was “1”. Here’s the specific error again, with those additional lines:

[(x9*x11)^2, x3*x1*x8, x5*x6*x7, x11*x13*x12, x10*x0*x7^2, x3*x7*x8, x8*x2, x5*x13, x0*x10*x0, x11*x13*x4*x0, x10*x5, x11*x10*x8*x5] Finitely presented group < x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14 | (x9*x11)^2, x3*x1*x8, x5*x6*x7, x11*x13*x12, x10*x0*x7^2, x3*x7*x8, x8*x2, x5*x13, x0*x10*x0, x11*x13*x4*x0, x10*x5, x11*x10*x8*x5 >
Max Depth is: 1
Some Unhandled exception!
Max Depth is: 1
Group Type is: F4
40 [[10, 4, 1, 7], [9, 1, 7, 11], [11], [13, 9, 14, 9], [7, 7, 13], [13, 2, 7, 5], [1, 9, 14, 10], [3, 1], [4, 9, 11], [2, 14, 8], [6, 4], [10, 14]]
Max Depth is: 1
/usr/lib/python3/dist-packages/apport/report.py:13: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import fnmatch, glob, traceback, errno, sys, atexit, locale, imp, stat
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sage/groups/free_group.py", line 878, in _element_constructor_
    P = x.parent()
AttributeError: 'list' object has no attribute 'parent'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/cduston/Dropbox/Research/LQG/TopStatesSage/MWE-stacksize.sage.py", line 42, in <module>
    Y1=[H(Y[i]) for i in range(g*V)]
  File "/home/cduston/Dropbox/Research/LQG/TopStatesSage/MWE-stacksize.sage.py", line 42, in <listcomp>
    Y1=[H(Y[i]) for i in range(g*V)]
  File "sage/structure/parent.pyx", line 898, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:9458)
  File "sage/structure/coerce_maps.pyx", line 156, in sage.structure.coerce_maps.DefaultConvertMap_unique._call_ (build/cythonized/sage/structure/coerce_maps.c:4627)
  File "/usr/lib/python3/dist-packages/sage/groups/free_group.py", line 880, in _element_constructor_
    return self.element_class(self, x, **kwds)
  File "/usr/lib/python3/dist-packages/sage/groups/free_group.py", line 229, in __init__
    AbstractWordTietzeWord = libgap.eval('AbstractWordTietzeWord')
  File "sage/libs/gap/libgap.pyx", line 400, in sage.libs.gap.libgap.Gap.eval (build/cythonized/sage/libs/gap/libgap.c:4365)
  File "sage/libs/gap/util.pyx", line 388, in sage.libs.gap.util.gap_eval (build/cythonized/sage/libs/gap/util.c:6054)
cysignals.signals.SignalError: Segmentation fault

Is that close to what you wanted, or can gdb give us more information?

The only thing I notice is that it seems to be hitting my “other exception” case (which was just a call to except: with no further conditions), tries to continue, and fails. But I “pass” out of that back to beginning of the loop, so I’m not sure why a redefinition of the variables (definitions that worked before) would now cause some kind of type error.
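A note on the "Max Depth is: 1" lines: inspect.stack() counts Python frames only, and a call made at the top level of a script is one frame deep, so a print placed directly in the loop body reports 1 regardless of what the library does underneath. A minimal sketch of how the number changes with nesting (the helper names depth, inner and outer are made up for illustration):

```python
import inspect


def depth():
    # Number of Python frames above this helper (its own frame is excluded).
    return len(inspect.stack(0)) - 1


def inner():
    return depth()


def outer():
    return inner()


# Called from the same place, outer() is exactly two frames deeper than depth().
print("here:", depth(), "inside outer -> inner:", outer())
```

To see a larger number, the print would have to sit inside the function that is actually nested, which in this thread means inside Sage's own code.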

The call stack is the “trail of bread crumbs” (a la Hansel and Gretel / Hansel und Gretyl) that shows all of the nested calls that got you to where you currently are. This is a textbook case of its value since we may be deep in the woods and some stone we stepped on “back there” woke the Big Bad Wolf. [1]

The call stack is a debugging feature in Visual Studio Code and others. You can set breakpoints based on call events and other useful things. What editor are you using?

Here’s a link to the VS Code description.

I’m at lunch but will post a screenshot when I return to my desk.


  1. (Yes, this is a fairy tale version of a mixed metaphor, but I hope you find it to be a humorous analogy and also illustrative.)


This is a screenshot of the Call Stack pane in VS Code.

  1. Set a breakpoint at line 4 in ModuleTest2.py
  2. Run ModuleTest3.py that imports f() from ModuleTest2.py
  3. The Call stack traces the calls to f() and shows that f() is in ModuleTest2.py

If the break had been caused by an error, the call stack would show the call path to that error.


I write in emacs, typically, or just the IDLE default editor when that’s convenient. I’ll try a little VSCode magic…

My first attempt was to just run the code, which eventually segfaults and gives me the following…not much info there, I don’t think:

To try and answer the question I think you want (“size of the stack at the time of seg fault”), I tried to step through with the breakpoint set to line 39 - where the seg fault says the error is. That’s kinda tricky, since it just fails after some random number of times in the loop (47, 32, 51, etc…), but I can say that the stack size at that line is either 2 or 4. I can’t attach another image, but it looks the same, just with

<listcomp>
<module>
<listcomp>
<module>

in the Call Stack.

But I still don’t see how that’s helpful - my code is asking Sage to do something with , and it’s getting some kind of unhandled exception.

Is that close to the information you think might be helpful? Thanks for your help!

Well, I obviously overstated the case. I was hoping that the stack would trace the library calls, but evidently the call stack doesn’t parse the binaries–or maybe no libraries at all, and just independent modules…very disappointing. Coincidentally, I discovered this limit myself last night running an OpenCV function I had never used before. The resolution there was to browse the C++ source code at GitHub (free_group.py in your case) to learn more about the function’s arguments. That may be your best option unless one of the other ideas below is on target or someone comes up with a brighter idea. Often you can find an assertion or exception that shows where and how the fault is checked for. The checks and remedies at the end of this post are probably more promising than digging into source code, though (unless you’d like to get some under-the-hood understanding of the library functions and have time to do so), so read on before taking action.

Since you’re in VS Code, these are basic steps that tend to lead to better understanding (done during a debug BREAK):

  • Hover over the variable instances to check for unexpected values. Doing this, for example, you can sometimes find empty lists that are supposed to be populated or other unexpected values.
  • Use the ‘VARIABLES’ viewer above the CALL STACK pane to drill down into complex data objects and gain whatever insight is available there.
  • Change one value at a time (in the WATCH WINDOW) and run the offending line in the DEBUG TERMINAL to see which value (if any) is triggering the fault. (This probably won’t reveal much since SegFault is a memory usage error.)

It looks like the loops are accumulating objects in memory until the allocated memory is used up and tries to overflow into an adjacent (reserved) memory block. This definition of Segmentation Fault spells out the general situation well.

This StackExchange thread also contains some segmentation fault tracing methods (trace in stdlib, gdb, check your stack capacity, etc.) It also mentions a recursion limit setting that you could try reducing to see if this causes a recursion limit fault before the segmentation fault shows up.
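One of the standard-library tracing tools in this family is faulthandler, which can write the Python-level traceback when the process receives a fatal signal such as SIGSEGV. Below is a minimal sketch, assuming it is acceptable to log to a temporary file; the name crash_log is made up, and I have not checked how this interacts with the signal handling that Sage's cysignals already installs.

```python
import faulthandler
import tempfile

# Keep the file object open for the life of the program: faulthandler
# writes to its file descriptor directly when a fatal signal arrives.
crash_log = tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False)
faulthandler.enable(file=crash_log, all_threads=True)

# The same machinery can dump the current traceback on demand, which is a
# quick way to confirm that the log file is wired up correctly.
faulthandler.dump_traceback(file=crash_log)

with open(crash_log.name) as handle:
    dumped = handle.read()

print("traceback log:", crash_log.name)
```

Running the script as python -X faulthandler script.py enables the same handler on stderr without code changes.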

P.S. This appears to be an OS-level fault and the remedy might be in telling the OS to allocate more memory to your stack. [ Anyone who knows about these things is encouraged to chime in. ] Before you make more room in memory, though…

Do these lists accumulate? Should they? If they’re accumulating and shouldn’t, then you should be able to do some explicit garbage collecting and probably stay below your current stack size.


Segmentation fault is a reaction to SEGV (segmentation violation) signal which is sent to a process by the operating system when the process tries to access memory which it is not allowed to access (often unallocated memory).

In pure Python this should never happen unless there is a bug in the Python interpreter. In low-level languages like C this happens most often when you work with pointers incorrectly[1] or when you try to access heap memory which was already freed. But here we do not have (pure) Python…

Here I noticed the .pyx extension and cythonized. This points to the fact that this is a part of SageMath written in Cython, not Python, nor C.

You can try to:

  • examine the lines of the .c file and try to find their source in the .pyx file
  • insert some diagnostic outputs to the .pyx file
  • debug the Cython code using GNU debugger:

https://cython.readthedocs.io/en/latest/src/userguide/debugging.html


  1. e.g. you do not check whether you are writing past the end of the destination data structure, as Leland described


Thank you so much for the great info here. I am not all that interested in the underlying code (well…let’s just say I have interest but research goals in this project that are orthogonal to them!), so I will probably try exploring the error with VS Code as you’ve outlined here, and submit a bug report to the Sage project if I cannot solve the problem myself.

They should not be accumulating, and I believe they are not. At least, to try and force that I’ve added the lines

del Y
del G
del Y1

at the end of the loop. My understanding is that should be sufficient to make sure those variables are not being kept around for multiple iterations. I also threw in a

gc.collect(generation=2)

which I think should do even more…same exception errors.
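For what it's worth, one way to check whether objects really go away between iterations is to keep only weak references to them and count how many are still alive afterwards. This is a generic sketch, not the MWE: the Payload class and run_iterations function are invented stand-ins for whatever the loop builds.

```python
import gc
import weakref


class Payload:
    """Stand-in for the per-iteration objects (Y, G, Y1 in the post)."""

    def __init__(self):
        self.data = list(range(1000))


def run_iterations(count):
    watchers = []
    for _ in range(count):
        Y = Payload()
        watchers.append(weakref.ref(Y))  # a weak reference does not keep Y alive
        del Y  # drops the only strong reference
    gc.collect()  # only matters for objects caught in reference cycles
    # Count the objects that are still alive after the loop.
    return sum(1 for ref in watchers if ref() is not None)


print("objects still alive:", run_iterations(50))
```

In CPython, rebinding the name on the next iteration releases the old object anyway, so the explicit del mostly documents intent.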

I will report back with VS Code exploration, thanks again for all your advice here.

Oh holy moley, you’re right, Cython is like its own damn thing…

https://doc.sagemath.org/html/en/developer/coding_in_cython.html

In fact, originally designed by the Sage developers! This makes me think I’m in way over my head here…

I guess I’ll read a little about cysignals, but…

Yes, Cython is a different language which aims to be a superset of Python…

Looking at your program:

I would really be afraid that the AlarmInterrupt can jump in at any inappropriate time and leave Cython’s and Python’s data structures in an inconsistent state. The result can be what you see.

I have never used this interrupt in Python, but I would guess that all the data structures used in code which can be interrupted like that would need to be protected by some kind of transaction, so that an unfinished transaction can be rolled back. This is confirmed by the description in the documentation: Note on Signal Handlers and Exceptions. Basically it says that even Python’s own internal data structures and code cannot cope with such interrupts.

So, now I am not surprised at all that you are getting the exceptions and segfaults. Originally I did not see that you are using the alarm interrupt. Can you confirm that the program never fails before the first interrupt?

…and you should never do this:

    except:
        print("Some Unhandled exception!")
        pass

This is very important. Catch only exceptions you expect and know why they can show up. Never catch all the possible exceptions! This hides all the important information from you. Didn’t you add this just to cover the problems caused by the alarm interrupts?
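As a small illustration of the difference, here is a sketch in which only the one expected failure is caught and everything else propagates. The classify function and its inputs are hypothetical stand-ins, with TimeoutError playing the role of the alarm interrupt; this is not the Sage code.

```python
def classify(case):
    """Hypothetical stand-in for a computation that may time out."""
    if case == "slow":
        raise TimeoutError("gave up on this case")
    if case == "broken":
        raise AttributeError("'list' object has no attribute 'parent'")
    return case.upper()


def run_all(cases):
    results = []
    for case in cases:
        try:
            results.append(classify(case))
        except TimeoutError as exc:
            # The one failure we expect and understand: record it and move on.
            results.append(f"skipped: {exc}")
        # Anything else (such as the AttributeError) is NOT caught here, so
        # the loop stops at the first real bug with a full traceback.
    return results


print(run_all(["a", "slow", "b"]))
```

With a bare except:, the "broken" case would be swallowed as well, and the loop would keep running on top of whatever state the failure left behind.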

I see two possibilities:

  1. Use the solution from the documentation linked: Create your signal handler which will basically set a flag that you want to terminate the operation. Inside the code check for the flag and if it is set, terminate it gracefully. …but I guess that the code you want to interrupt is inside the Sage library :frowning: But maybe Sage has its own mechanism for interrupts which you can use…
  2. Split the program into two programs (if it is possible) which will run as separate processes:
    • First one will initialize the data structures which you need to keep between the iterations. It will serialize them (pickle, JSON…) and give to the second program.
    • The second program will do the long work you want to interrupt occasionally. When you kill the program you do not expect any output from it so any inconsistencies do not matter. When the program runs till the end, collect the data, serialize them and send them back to the first “main” program.
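Option 1 can be sketched with the standard signal module: the handler only sets a flag, and the long-running loop checks the flag at points where stopping is safe. This assumes a Unix system and code running in the main thread. The names run_with_deadline and count_steps are invented for the example, and it only helps when you control the loop being interrupted, which is not the case for code inside the Sage library.

```python
import signal
import time


def run_with_deadline(work, seconds):
    """Call work(should_stop); should_stop() becomes True after `seconds`."""
    flag = {"stop": False}

    def request_stop(signum, frame):
        # The handler only records the request; it never raises.
        flag["stop"] = True

    previous = signal.signal(signal.SIGALRM, request_stop)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return work(lambda: flag["stop"])
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel a pending alarm
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)  # restore the old handler


def count_steps(should_stop, max_steps=10_000):
    """Stand-in for a long loop that checks the flag between steps."""
    steps = 0
    while steps < max_steps and not should_stop():
        time.sleep(0.01)
        steps += 1
    return steps


print("steps completed before the deadline:", run_with_deadline(count_steps, 0.1))
```

Because the loop decides when to stop, it always finishes the current step, so nothing is left half-updated.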

The program never fails before the first interrupt, yes. I mean, the program never failed at all before I tried to do all this structure_description nonsense (the first line of the try: block).

Right - structure_description() was seg faulting before, which I considered expected behavior (it’s doing something called “solving the word problem”…which is not guaranteed to finish in finite time.). If I include that “extra” except: call, then I get through seg faults that reference the call to structure_description()…but eventually get halted by the seg faults I’ve got now.

So I get what you’re saying, and I’ll remove it, but it just means I have two seg fault problems to deal with now…although they may both be related to inappropriate AlarmInterrupts…

The only reason I’m learning Python is to write this Sage script, and now it turns out I’m learning the wrong language - and now I can’t get this thing to run as a Cython code - things like randrange don’t seem to exist when I follow instructions on running Cython from that Sage help page. “superset of Python” my ass…anyway, now I’ve got a few more things to learn before getting back to my original seg fault problem!

It is OK, if you need to catch an exception as a dirty workaround, but catch just the single type which should show up, print the message and expect problems… :slight_smile:

Python is a very mature general-purpose programming language. It is certainly useful to know it :slight_smile: You would need to secure the data structures against similar interrupts probably in every programming language.

I think the maturity and available features of Cython are certainly lower. The number of users of Cython is many orders of magnitude lower. Being a “superset of Python” is the aim of Cython, not reality.

We can’t help if you keep us in the dark. Please post the full traceback and/or Cython error message.

My guess is you need something like

from random import randrange

first. (Exactly as you would need in pure Python.)

I doubt very much that Cython is literally incapable of running randrange; this blog post shows an example of it doing so.

I think that you under-estimate how popular and mature Cython is. It was forked from Pyrex, which started in 2002, so it is 20 years old. It is used extensively in Sage, Numpy, Scipy, pandas and scikit-learn, which means it is indirectly used by nearly all scientific users of Python.

While it is not impossible for Cython to not run some pure Python code, my bet is that the error is a user mistake not a Cython limitation.


[Václav] maturity and available features of Cython are certainly lower.

[Chris Duston] “superset of Python”

Yes, “superset” is a stretch since the Python versions you can use lag what’s current, though Cython does an admirable job of keeping up. You can check their website to see what version of Python is baked into which version of Cython.

The current Cython is 3.0 which is now going on a year since release. It supports Python ‘3.4+’. I’m pretty sure that the ‘+’ is purely about Python’s built-in backward compatibility and means that nothing introduced after 3.4 will run in Cython 3.0. The next Cython release will “only” catch up to Python 3.6. I imagine keeping up is a big job and a moving target if you set your sights too high.

In short, Cython is “equivalent” to the embedded version of Python, understanding that it may be 99.xx% equivalent, but not 100%.

Cython will run native Python.

It’s not a different language, merely an extension of Python that allows data types to be explicitly declared so that some optimization can be done at the C-level. The primary benefit of this optimization is faster execution speed because the overhead involved with allowing flexibility of shifting data types on the fly is eliminated–but only for the data objects that you explicitly bind to a data type. You can just run your native Python code in Cython if you don’t care about the performance enhancement, as long as your code uses instructions in the embedded Python version.

Why would you just run native Python in Cython…?

Cython allows you to compile a binary executable that isn’t easily cracked into, so you can package commercial code and release it into the wild with less worry of reverse engineering, copying, and licensing/property violations.

An example on that theme:

I’m in the process of deploying a native Python+OpenCV application with Cython because the licensing and copy protection system that I’m using (from Thales) requires a compiled executable to harden and secure the licensing code. [1] Due to the bonus from Cython of increased speed, hopefully it will be worth the investment in effort to learn the few extra instructions and syntax around type declarations.

Type declaration and steps to compile/package/deploy are the only differences that I’ve found so far–but I’m just getting started. The Cython interpreter may deviate from CPython, but that’s the same scenario for PyPy, Jython and others. Ideally, the programmer and user don’t see much difference beyond the benefits that led to using these CPython alternatives in the first place.


  1. The current copy protection is an in-house design and works well but depends on the computer’s hardware signature and can’t float among a group of software installations.

It was not my intention to get you to help me run Cython code - the current problems are likely trivial in comparison to my original problem (my current problem is “_MWE_working_spyx_8.pyx:33:0: Mixed use of tabs and spaces”…making me ask why I ever try to do anything that’s not C…)

Anyway, apologies, this post is not about getting you folks to train me how to compile Cython code.

Sure - my specific problem was actually the interaction between sage, python, and cython. My sage script (blah.sage) runs perfectly fine when I do sage blah.sage, but in fact it did some kind of translation into python, since blah.sage.py appeared in my directory. So renaming my .sage into .spyx doesn’t work, but actually renaming .sage.py into .sage.spyx does indeed work (at least, now I’m back to seg faults!)

My point being I didn’t mean to suggest I found a counterexample to Cython running vanilla Python, only that there are …user space? … issues to deal with.

Yeah I get this. I mean, it’s frustrating to me since I literally don’t care how long computer programs take to do things. But…I know CS people who care, so fine, since they do the development, we all have to do what they tell us :slight_smile:


I noticed that there is an existing and much easier solution for my suggestion number 2:

Instead of creating two separate programs and passing the data between them in some complicated way, just fork the process. Sage has the @fork decorator for this. It even has a timeout argument!

https://doc.sagemath.org/html/en/reference/parallel/sage/parallel/decorate.html
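To show the idea without depending on Sage, here is a rough standard-library analogue: run the risky call in a forked child process and give up on it after a timeout. This is not the @fork implementation, and its exact usage is in the documentation linked above. The sketch assumes a Unix system (it uses the "fork" start method), and run_with_timeout, square and sleepy are names invented for the example.

```python
import multiprocessing as mp
import time


def _child(conn, func, args):
    # Runs in the forked child: compute and ship the result to the parent.
    conn.send(func(*args))
    conn.close()


def run_with_timeout(func, args=(), timeout=1.0):
    """Run func(*args) in a forked child process.

    Returns (True, result) on success, or (False, None) if the child
    timed out or died (for example from a segfault) before answering.
    """
    ctx = mp.get_context("fork")  # Unix only
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(writer, func, args))
    proc.start()
    writer.close()  # the parent only reads
    try:
        if reader.poll(timeout):
            return True, reader.recv()
        return False, None
    except EOFError:
        # The child exited without sending anything.
        return False, None
    finally:
        if proc.is_alive():
            proc.terminate()  # any half-updated state dies with the child
        proc.join()
        reader.close()


def square(n):
    return n * n


def sleepy(seconds):
    time.sleep(seconds)
    return "done"


print(run_with_timeout(square, (7,), timeout=5.0))
print(run_with_timeout(sleepy, (10,), timeout=0.2))
```

The point of the design is that an interrupted or crashed computation takes its inconsistent state with it, while the parent process and its data stay untouched.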

BTW the forum linked above is probably much better suited for solving similar problems with Sage.