Access application data when extending embedded Python

efternavn · July 31, 2023, 6:45am

I am writing an application that embeds a Python interpreter. The application also extends Python with a custom module. I do this along the lines of the “Extending and Embedding the Python Interpreter” part of the Python documentation. The section “Extending Embedded Python” describes how to give Python code access to functionality from the application, and its example module emb gets access to the argument count (argc) of the application’s main function through a static variable, numargs.

Now, my question is: can I somehow pass data (like argc in the example) to my module in a more encapsulated way than using global state?

In contrast to the emb example, I use multi-phase initialization and introduce module state in a struct via m_size in my PyModuleDef instance. So I would have liked to pass data to my Py_mod_exec slot function and use it to initialize my module state, but since the slot is registered as a plain C function pointer, I cannot see a way to pass data that way. Alternatively, I’ve thought about looking up the module after my Py_Initialize call and passing the data via PyModule_GetState, but I don’t see how I can look up my multi-phase initialized module. So my idea now is to set some global state somewhere on the interpreter and pass the data that way. But I have not looked much into that yet, as I was hoping for a better solution.

I’ve also asked this question on Stack Overflow (sorry, I can apparently only have two links in my post), but maybe I’ll have more luck with the audience here.

Rosuav · July 31, 2023, 7:17am

Depends on your design. It sounds like you have full control over this, so my recommendation is that you have the Python code define a function, and the external app then loads up Python, imports the module, finds a specific function by name, and calls it. That way, you can pass whatever you like as arguments to this entrypoint function - or even have multiple such entrypoints if that makes sense.

efternavn · July 31, 2023, 9:10am

Yes, that’s a good suggestion. I think I know how that should be implemented in both the module and the app that embeds Python. Thanks!
As I understand it, the implementation will require some wrapping and unwrapping of the function arguments, as the module has to define the entrypoint function like any other module function callable from Python code. And my entrypoint function could in fact be called from Python code, even though that would never make sense, right?
Anyway, I think this is better than the other alternatives that I have considered.

Rosuav · July 31, 2023, 9:37am

Correct on both counts. One way or another, you’ll need to shuffle data between your app and Python, so that’s the wrapping and unwrapping part.

efternavn · July 31, 2023, 2:35pm

Good, thanks for confirming my understanding.

I guess you’re right.
But since both the embedding app and the module are written in C(++), it could have been possible to avoid the wrapping, right? I had hoped there would be some way to pass data into the initialization function registered with PyImport_AppendInittab (which might have avoided the wrapping). This is probably not possible, and that’s fine.

And thanks again. I’ll start implementing your suggestion soon, I think.

Rosuav · July 31, 2023, 2:39pm

Not entirely sure I follow. Do you mean your architecture is “app calls Python code, Python calls app-provided module”? If so, you can cheat the shuffling a bit by creating Python objects that opaquely represent app data. But anything that’s going to be manipulated by Python has to have a Python object to represent it, and if you can do that with standard types like lists and strings and integers, it’ll be easier to debug.

efternavn · July 31, 2023, 2:52pm

Yes, exactly.

My app data should not be known to the Python code itself; it’s only going to be part of the internal module state. In my current case, the data is something like a session handle that my app-provided module needs to have access to.
I haven’t looked at how the wrapping is going to look but I don’t expect it to be a real problem.

Rosuav · July 31, 2023, 3:00pm

Ahh, yeah. That sounds like a job for an opaque type. Assuming that your app is the only thing that will ever create or destroy those sessions, you should be able to instantiate Python objects to represent them, with most of the data being buried away somewhere. It’s been a while since I did anything like this, but you should be able to (a) define a type in your app, (b) instantiate an object of that type, with a pointer to the rest of the session data, (c) hand that to Python, and (d) receive it back from Python, able to see the session data in memory.

You’ll have to cope with the question of what happens to the Python object when the underlying session is disposed of, unless that’s fundamentally impossible (which would make it a lot simpler). Otherwise, it’s not too difficult.

efternavn · July 31, 2023, 8:08pm

I guess (c) could be done by keeping the opaque object as an attribute on my module. That way (d) would also be almost “for free”.

efternavn · August 3, 2023, 9:43am

For future reference:
What I didn’t understand was that I get a reference to my module by importing it (after my Py_Initialize call). With the module reference, I can easily get the module state (PyModule_GetState) and fill it in with my plain, unwrapped C data.
I started out by wrapping my data in a Capsule as some sources made that sound like the preferred way to do it, but I don’t think I have any need for this wrapping after all.
The Capsule inspiration came from the Stack Overflow question “Passing a C pointer around with the Python/C API” and the first draft of PEP 489. Note also that the final version of PEP 489 doesn’t have Capsule helper functions; it seems there was some discussion on whether Capsule wrapping was generally necessary.