Creating a handler for importing a new file extension

I want to augment the import machinery to support importing from files with a .foo extension using FooLoader.

While I wasn’t expecting that to be super-easy, I assumed there was at least some legal-feeling way to do it.

After staring at "5. The import system" and the importlib reference ("The implementation of import") in the Python 3.14.2 documentation, scratching my head and digging around, the best strategy I could see was:

  • Search sys.path_hooks for an entry with a __qualname__ that begins 'FileFinder.path_hook.'
  • Extract sys.path_hooks[i].__closure__[1].cell_contents
  • Replace sys.path_hooks[i] with a new FileFinder.path_hook listing all the existing loader_details plus my new entry
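For concreteness, here is a minimal sketch of those three steps. It assumes the CPython detail that the hook is a closure over variables named cls and loader_details, and it looks the cell up by name rather than trusting index 1. The function name add_file_suffix and its hooks parameter are made up for illustration.

```python
import sys

HOOK_PREFIX = "FileFinder.path_hook."


def add_file_suffix(loader, suffixes, hooks=None):
    """Rebuild the stock FileFinder hook with one extra (loader, suffixes) pair.

    Returns True if a hook was replaced. Relies on CPython implementation
    details (the hook's __qualname__ and closure layout), so it may break.
    """
    hooks = sys.path_hooks if hooks is None else hooks
    for i, hook in enumerate(hooks):
        if not getattr(hook, "__qualname__", "").startswith(HOOK_PREFIX):
            continue
        # Map free-variable names to cells instead of trusting __closure__[1].
        cells = dict(zip(hook.__code__.co_freevars, hook.__closure__))
        finder_cls = cells["cls"].cell_contents
        details = list(cells["loader_details"].cell_contents)
        details.append((loader, list(suffixes)))
        hooks[i] = finder_cls.path_hook(*details)
        # Finders cached before the swap still carry the old suffix list.
        sys.path_importer_cache.clear()
        return True
    return False
```

Calling add_file_suffix(FooLoader, [".foo"]) once at startup would be the intended use. Because it appends to whatever list is already there, two libraries doing this independently should each end up registered.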

OK, an alternative would be to hard code the default loader_details into my code rather than jumping through hoops to extract them programmatically, but that feels even more likely to break in future versions of Python, plus has the significant disadvantage that it wouldn’t cope with multiple modules each attempting to add their own file extension.
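To show what the hard-coded variant would look like, here is a rough sketch built only from public names in importlib.machinery. The assumption that the stock list is exactly these three pairs is precisely the part that could drift between versions or platforms; default_loader_details and make_hook are illustrative names.

```python
from importlib import machinery


def default_loader_details():
    # Assumption: this mirrors what the interpreter registers by default.
    return [
        (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
        (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
        (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    ]


def make_hook(*extra):
    """Build a FileFinder hook from the assumed defaults plus extra pairs."""
    return machinery.FileFinder.path_hook(*default_loader_details(), *extra)
```

Something like make_hook((FooLoader, [".foo"])) would then replace the existing entry in sys.path_hooks, which is where the multiple-modules problem shows up: the second caller overwrites whatever the first one added.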

Have I overlooked some clean way to do this?

Without having gone into the details, this should be straightforward: add a custom finder, or a combined finder/loader, to sys.meta_path (or however they are registered). That object’s find_spec method can run any code desired, and takes a name argument. For example, it should be simple to loop over each dir_ in sys.path and check if f"{dir_}/{name}.foo" is a file (on POSIX anyway; use pathlib instead for platform independence). More code is needed for packages, e.g. to support __init__.foo (I would not assume that it automatically plays nicely with relative imports). From there, personally I’d copy the .foo to a temp folder in sys.path with a .py extension, and invoke the normal import machinery.
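A rough, untuned sketch of that idea follows. Two deliberate simplifications: it loads the .foo file in place with a plain SourceFileLoader instead of copying it to a temp folder, and it ignores packages (__init__.foo) entirely. FooMetaFinder is a made-up name.

```python
import sys
from importlib.abc import MetaPathFinder
from importlib.machinery import SourceFileLoader
from importlib.util import spec_from_file_location
from pathlib import Path


class FooMetaFinder(MetaPathFinder):
    """Hypothetical finder: resolves a module name to <dir>/<name>.foo."""

    def find_spec(self, fullname, path=None, target=None):
        # path is None for top-level imports, else the parent package's __path__.
        search = sys.path if path is None else path
        stem = fullname.rpartition(".")[2]
        for entry in search:
            if not isinstance(entry, str):
                continue
            candidate = Path(entry or ".") / f"{stem}.foo"
            if candidate.is_file():
                loader = SourceFileLoader(fullname, str(candidate))
                return spec_from_file_location(
                    fullname, str(candidate), loader=loader
                )
        return None


# Install with: sys.meta_path.append(FooMetaFinder())
```

Appending rather than inserting at the front means the normal finders get first go, so a module2.py would win over a module2.foo of the same name.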

James Murphy managed to import from the cloud https://www.youtube.com/watch?v=2f7YKoOU6_g (please note his security warnings). His code could be easily adapted (and made safer! :wink: ).

That doesn’t seem at all the right place to be implementing it: by my understanding, meta paths are for finding modules, not individual files within a module. The specialisation point for individual files (at least for modules located by PathFinder rather than the mechanisms for built-in modules, etc.) is sys.path_hooks.

The trouble is that, by default, sys.path_hooks contains just the ZIP file importer and the FileFinder hook, and the latter was given its list of file suffixes and corresponding loaders when it was constructed, long before your program begins running.

So I replace it with a new FileFinder that supports more suffixes. Which is fair enough, apart from the above-mentioned underhanded tricks for:

  • Identifying the FileFinder hook in sys.path_hooks (the hook is a closure, not an actual FileFinder or bound method)
  • Extracting the list of suffixes and loaders from the function’s closure

Those two steps are hacky and fragile, which is why I was hoping someone knew of an alternative.

by my understanding, meta paths are for finding modules, not individual files within a module

I suggest you reread the documentation you yourself posted.

Meta hooks are registered by adding new finder objects to sys.meta_path

When the named module is not found in sys.modules, Python next searches sys.meta_path, which contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module.

Like I said, I’ve not gone into the details. That’s your job.

I’ve given you an example that I trust works. Adapt it.

I’m sorry if you feel any steps are hacky, but the whole purpose of the exercise is to import a .foo file instead of following the normal naming convention (and messing around with the import system is always bug-prone).

I’ve stared at this even harder, and I remain convinced that implementing this via sys.meta_path is not the answer here.

To implement it there:

  • PathFinder has no specialisation points other than sys.path_hooks.
  • Therefore, to avoid using sys.path_hooks, one must either shadow the implementation of various PathFinder behaviours, or do without:
    • Iterate through sys.path and/or the provided path
    • Maintain a shadow of sys.path_importer_cache, to hold all our secondary file finders.
    • Recursively populate that cache at need, from our own shadow of sys.path_hooks
    • Attach to that shadow of sys.path_hooks a FileFinder.path_hook that knows about (solely) the .foo extension.
  • …and then you still need to implement your actual custom loader!

That’s a lot of existing code that has to be duplicated. Otherwise one will end up breaking top-level .foo modules, or .foo modules within namespace packages. Even then, having a second FileFinder instance per directory would double the amount of stat/listdir churn during import.

So why not at least mitigate much (though far from all) of that by adding a second FileFinder.path_hook to sys.path_hooks? Because any FileFinder.path_hook will give you a FileFinder that searches only for the extensions it understands.

Bear in mind that we might have a package like this:

my_package/
    __init__.py
    module1.py
    module2.foo

…in which case neither the standard FileFinder.path_hook already in sys.path_hooks nor the .foo-specific one would result in a finder that could find all submodules. The only solution I can see would be to have two FileFinders per directory, in two separate caches.
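A small experiment along these lines illustrates the point. It builds the package above in a temp directory and asks two hand-constructed FileFinder instances, one per suffix list, what they can see. Reusing SourceFileLoader as a stand-in for the .foo loader is just a placeholder choice.

```python
import tempfile
from importlib.machinery import SOURCE_SUFFIXES, FileFinder, SourceFileLoader
from pathlib import Path

results = {}
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "my_package"
    pkg.mkdir()
    for filename in ("__init__.py", "module1.py", "module2.foo"):
        (pkg / filename).write_text("")

    # One finder per suffix list, both pointed at the same directory.
    py_finder = FileFinder(str(pkg), (SourceFileLoader, SOURCE_SUFFIXES))
    foo_finder = FileFinder(str(pkg), (SourceFileLoader, [".foo"]))

    for label, finder in (("py", py_finder), ("foo", foo_finder)):
        for mod in ("module1", "module2"):
            spec = finder.find_spec(f"my_package.{mod}")
            results[label, mod] = spec is not None

# Each finder only reports the files whose suffix it was built with.
print(results)
```

On top of that, as far as I can tell PathFinder keeps the first hook that accepts a given directory, so a second FileFinder.path_hook appended to sys.path_hooks would not even be asked.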

Again, it feels as though the correct specialisation point for adding a new importable file type is the loader_details embedded in the FileFinder.path_hook which is already in sys.path_hooks; it makes fundamental sense for a list of supported suffixes to be the place to put a supported suffix. But that isn’t designed to be augmented, hence the evil jumping through hoops I’m intending to perpetrate unless there’s a realistic alternative.

James Murphy managed to import from the cloud https://www.youtube.com/watch?v=2f7YKoOU6_g (please note his security warnings). His code could be easily adapted (and made safer! :wink: ).

I feel I should emphasise that that example is inapplicable to what I’m doing:

  • That video is importing the same kind of thing from somewhere else
  • I want to import a different kind of thing from the same place

I think you may be doing us, and therefore indirectly yourself, a disservice by not sharing what the motivation is for this project. So far all I have is that you want to give valid Python files a nonstandard suffix. But I don’t know why you want this, and that may be the core thing at issue.

Except, why should this be a desirable extension point?

The typical use cases for import system extension are “find modules in a specialized storage system, not the filesystem”. Customization of the path hooks, rather than the meta path, is too late for the common case. And all use cases around customizing imports are relatively rare.

You already seem to have a good handle on the fact that you can duplicate code which wasn’t designed to be extended in the way that you want, and modify that to suit your needs. That sounds to me like a good solution. Can you explain why it’s not satisfactory?


Somewhat separately, I want to note that you provided an example, and it immediately raises two concerns in my mind.

my_package/
    __init__.py
    module1.py
    module2.foo

The first is that not every module is part of a package. e.g., move module2.foo up a level. I think this will require you to write a meta path finder?

The second is name conflicts. What happens if module2.py gets added to the above?
Naturally you can define a precedence order but it gets back to it being unclear why you are doing this.

In general terms, I want to define a file extension for content which, though not a Python program, can nonetheless be compiled into a Python module. This could happen by transforming the file contents into Python code, compiling an AST, or similar. In any case, the core of my approach is to make a class that inherits from SourceFileLoader and overrides source_to_code().

That gives some important advantages over techniques such as providing a Python function that loads a resource file and returns some kind of namespace:

  • The machinery for caching compiled .pycs works as normal, meaning the (potentially costly) source_to_code override isn’t run unnecessarily.
  • Similarly, the module finds its way into sys.modules like any other, saving reloading it when it’s used from multiple modules.
  • You get to use from ... import on it.
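As a toy illustration of that loader shape (not the GraphQL or template cases), here is a sketch where the .foo format is assumed to be lines of "name: value" and each one becomes a string constant. foo_to_python is a made-up translator; the only real API involved is SourceFileLoader and its source_to_code method.

```python
from importlib.machinery import SourceFileLoader


def foo_to_python(text):
    """Hypothetical translator: each 'name: value' line becomes a str constant."""
    lines = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        name, _, value = raw.partition(":")
        lines.append(f"{name.strip()} = {value.strip()!r}")
    return "\n".join(lines) + "\n"


class FooLoader(SourceFileLoader):
    def source_to_code(self, data, path, *args, **kwargs):
        # data is normally the raw bytes read from the .foo file.
        text = data.decode("utf-8") if isinstance(data, bytes) else str(data)
        # Hand the generated Python to the normal compile step; any extra
        # arguments the import machinery passes are forwarded untouched.
        return super().source_to_code(foo_to_python(text), path, *args, **kwargs)
```

The translation only runs when source_to_code is called, and the inherited get_code() is what decides whether a cached .pyc can be used instead.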

The two specific initial examples I have in mind are:

  • Facilitating making GraphQL queries from Python by taking a .gql file full of queries and exposing each as a function taking the required parameters (including various additional tricks like converting paginated queries into Python iterators).
  • Creating a templating system that leverages Python 3.14’s t-strings.

…though I can see plenty of other similar intriguing use cases.

Can you explain why it’s not satisfactory?

There are only two realistic options, both of which entail studying the source code of importlib:

  • Create classes which mimic parts of it
  • Specialise stuff at undocumented points not intended for specialisation

Both of those feel extremely fragile to changes between Python versions, and to other people doing the same kind of thing in other projects.

The first is that not every module is part of a package. e.g., move module2.foo up a level. I think this will require you to write a meta path finder?

There are two options:

  • Only allow this mechanism to be used from packages, and say __init__.py should import this mechanism before importing any modules within the package (so that it is certain to be in place before the FileFinder for its directory has been created).
  • sys.path_importer_cache.clear()

No, neither of those seems especially elegant, but they work.
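To illustrate why the second option is needed at all, this short check shows PathFinder reusing a cached finder for a directory until the cache entry is dropped. The module name is a deliberately nonexistent placeholder.

```python
import sys
import tempfile
from importlib.machinery import PathFinder

with tempfile.TemporaryDirectory() as directory:
    # Any lookup through PathFinder creates and caches a finder for the entry.
    PathFinder.find_spec("no_such_module_for_demo", [directory])
    cached = sys.path_importer_cache[directory]

    # A second lookup reuses that finder, whatever sys.path_hooks says now.
    PathFinder.find_spec("no_such_module_for_demo", [directory])
    reused = sys.path_importer_cache[directory] is cached

    # Dropping the entry (or calling .clear()) makes sys.path_hooks run again.
    del sys.path_importer_cache[directory]
    PathFinder.find_spec("no_such_module_for_demo", [directory])
    rebuilt = sys.path_importer_cache[directory] is not cached

    # Tidy up so the cache does not keep a finder for a deleted directory.
    del sys.path_importer_cache[directory]
```

So any directory that was searched before the hook was replaced keeps its old suffix list until it is evicted.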

Fair enough. There’s more than one way to approach so many problems. Especially in Python (despite The Zen). If you get it working, I’m interested to hear how you did it.

Can you not just create your own FooFinder and add it to sys.path_hooks? I don’t know what kind of loader you would return but it seems like what you’re describing would involve first defining a new finder.
