Reducing import time over several repeated calls in a Makefile

I’ve got a small script that takes an asset, does some simple processing on it, and outputs the processed file. Make calls this script once for every file in a particular directory.

Using Python 3.10 on WSL1 and Windows (I’ve tried 3.12 as well, but not 3.13 yet), I noticed that the import cost is quite large compared to the actual script time (20 milliseconds for the script versus 105 milliseconds for the imports).

I understand to a degree why this happens, but I’m wondering if, for my use case, there’s something like ccache for Python scripts run repeatedly from a Makefile, to reduce import times. Or, if there’s some way to have the relevant imports pre-loaded as part of application startup instead of on demand when the script is parsed…

python3 -m cProfile ./scripts/txt2map.py build/attribmaps/0115.map gfx/attribmaps/0115.txt ./gfx/prebuilt/attribmaps
         4350 function calls (4283 primitive calls) in 0.020 seconds
python3 -X importtime ./scripts/txt2map.py build/attribmaps/0115.map gfx/attribmaps/0115.txt ./gfx/prebuilt/attribmaps
import time: self [us] | cumulative | imported package
import time:       132 |        132 |   _io
import time:        35 |         35 |   marshal
import time:       331 |        331 |   posix
import time:      1438 |       1934 | _frozen_importlib_external
import time:       278 |        278 |   time
import time:       766 |       1043 | zipimport
import time:        96 |         96 |     _codecs
import time:       571 |        666 |   codecs
import time:       583 |        583 |   encodings.aliases
import time:      1515 |       2763 | encodings
import time:       616 |        616 | encodings.utf_8
import time:       126 |        126 | _signal
import time:        38 |         38 |     _abc
import time:       431 |        469 |   abc
import time:      5361 |       5830 | io
import time:        50 |         50 |       _stat
import time:       378 |        428 |     stat
import time:       953 |        953 |     _collections_abc
import time:       283 |        283 |       genericpath
import time:       836 |       1119 |     posixpath
import time:      1710 |       4208 |   os
import time:       460 |        460 |   _sitebuiltins
import time:      3632 |       3632 |     apport_python_hook
import time:       389 |       4021 |   sitecustomize
import time:       187 |        187 |   usercustomize
import time:     14291 |      23165 | site
import time:       471 |        471 |         types
import time:      1661 |       2131 |       enum
import time:       426 |        426 |         _sre
import time:       469 |        469 |           sre_constants
import time:       704 |       1173 |         sre_parse
import time:      4005 |       5603 |       sre_compile
import time:        89 |         89 |           itertools
import time:       408 |        408 |           keyword
import time:        70 |         70 |             _operator
import time:       889 |        958 |           operator
import time:       450 |        450 |           reprlib
import time:        57 |         57 |           _collections
import time:      4983 |       6944 |         collections
import time:        72 |         72 |         _functools
import time:     14217 |      21232 |       functools
import time:        80 |         80 |       _locale
import time:       502 |        502 |       copyreg
import time:      3014 |      32560 |     re
import time:      1169 |      33729 |   fnmatch
import time:        85 |         85 |   errno
import time:        82 |         82 |   zlib
import time:       436 |        436 |     _compression
import time:       885 |        885 |     _bz2
import time:      8526 |       9847 |   bz2
import time:       793 |        793 |     _lzma
import time:       875 |       1667 |   lzma
import time:      6587 |      51994 | shutil
import time:      1583 |       1583 | common
import time:       101 |        101 |     _struct
import time:      2241 |       2341 |   struct
import time:       724 |        724 |     _ast
import time:       809 |        809 |     contextlib
import time:      1679 |       3211 |   ast
import time:      6278 |      11829 | common.utils
import time:      1085 |       1085 | common.tilemaps
import time:      1859 |       1859 |   utils
import time:      1060 |       2919 | common.tilesets
import time:       353 |        353 | encodings.utf_8_sig

Here’s the script itself:

#!/bin/python

import os, sys
from shutil import copyfile
sys.path.append(os.path.join(os.path.dirname(__file__), 'common'))
from common import utils, tilemaps, tilesets

output_file = sys.argv[1]
input_file = sys.argv[2]
prebuilt_root = sys.argv[3]

fname = os.path.splitext(os.path.basename(input_file))[0]
char_table = {}

# 0xFE is a special character indicating a new line for tilemaps, it doesn't really belong in the tileset table but for this specifically it makes sense
char_table['\n'] = 0xFE

prebuilt = os.path.join(prebuilt_root, f"{fname}.map")
if os.path.isfile(prebuilt):
    print("\tUsing prebuilt {}".format(prebuilt))
    copyfile(prebuilt, output_file)
    os.utime(output_file, None)
    quit()

with open(input_file, 'r', encoding='utf-8-sig') as f:
    mode = int(f.readline().strip().strip('[]'), 16)
    tmap = [mode]
    if mode & 3:
        text = []
        for line in f:
            b = utils.txt2bin(line, char_table)
            text += b
        text.append(0xFF) # tmap compression expects 0xFF at the end
        tmap += tilemaps.compress_tmap(text)
    else:
        text = f.read().replace('\r\n','\n')
        tmap += utils.txt2bin(text, char_table)
        tmap.append(0xFF)
    with open(output_file, 'wb') as of:
        of.write(bytearray(tmap))

The common modules do more work internally; here are their imports:

import struct
from ast import literal_eval
from collections import OrderedDict

Why not just modify the script to take a list of files and iterate over it?
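
For concreteness, a minimal sketch of what that could look like (the argument layout and names here are illustrative, not the script’s actual interface):

# Sketch: amortize the ~100 ms of imports over many files by accepting
# several inputs per invocation. Argument layout is illustrative only.
import os, sys

output_dir = sys.argv[1]
prebuilt_root = sys.argv[2]

for input_file in sys.argv[3:]:
    fname = os.path.splitext(os.path.basename(input_file))[0]
    output_file = os.path.join(output_dir, f"{fname}.map")
    # ... the per-file work from the script above would go here ...
    print(f"would process {input_file} -> {output_file}")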

If you can change your invocation to python3 -m cProfile -m scripts.txt2map or similar (making the script runnable/importable as a “module”), that should help with runtime a bit: the compilation of the script from text file → Python interpreter bytecode will then be cached.
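
In case it’s useful, here’s a rough sketch of that reshuffle, assuming scripts/ can be turned into a package (an empty scripts/__init__.py next to the script); the main() wrapper and layout are illustrative:

# scripts/txt2map.py -- same logic as before, but the entry point moves into
# a function so the file can be imported as scripts.txt2map and its bytecode
# cached under scripts/__pycache__/. An empty scripts/__init__.py is needed
# alongside it.
import os, sys

def main():
    output_file = sys.argv[1]
    input_file = sys.argv[2]
    prebuilt_root = sys.argv[3]
    # ... rest of the existing script body, unchanged ...
    # (the sys.path tweak and the `from common import ...` line may need
    # adjusting once the script is run as a module from the project root)

if __name__ == "__main__":
    main()

# Run from the project root, e.g.:
#   python3 -m scripts.txt2map build/attribmaps/0115.map gfx/attribmaps/0115.txt ./gfx/prebuilt/attribmaps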

If your code has some conditionals, you can move the imports inside the condition, so they’re executed only if needed.

You should post your script so we can see what can be done.

(Edit: Sorry for the posts and edits, I’m still getting used to discourse…)

Appreciate your replies.

I believe the approach of one file processed per call is quite common, especially in a build system where you’d like to have fast iterative builds. Some alternatives I had considered:

  • Have one script that processes all the files at once → we give up parallelism and the benefit of incremental builds
  • Have the script process multiple files, passing ‘all changed files at once’ → we’d lose proper parallelism on clean builds, and it’s hard to determine, on an arbitrary machine, what a good use of system resources for the build would be (i.e., how to split the work over cores)

Posted the script, but it’s pretty straightforward I think.

Giving it a shot: runtime doesn’t change at all with cProfile, but testing import time does show that more modules get pulled in, which increases the initial import time by about 2 seconds; the cached time (second run onwards) is within a hundred or so milliseconds (I guess within the margin of error on my machine).

(As an aside, I did try Python 3.14 and it improves startup times quite a bit, but they’re still double the actual execution time of the script :frowning: )

You can try moving from shutil import copyfile inside the if os.path.isfile(prebuilt): block, and removing the tilesets import, which seems unused. I don’t know, however, whether there’s a noticeable improvement.
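
Applied to the posted script, that change would look roughly like this (only the top of the file and the prebuilt branch shown; whether the saving is noticeable would need measuring):

import os, sys
sys.path.append(os.path.join(os.path.dirname(__file__), 'common'))
from common import utils, tilemaps   # tilesets dropped if it's really unused

output_file = sys.argv[1]
input_file = sys.argv[2]
prebuilt_root = sys.argv[3]

fname = os.path.splitext(os.path.basename(input_file))[0]

prebuilt = os.path.join(prebuilt_root, f"{fname}.map")
if os.path.isfile(prebuilt):
    # shutil (~52 ms cumulative in the -X importtime trace above, mostly via
    # fnmatch/re/bz2/lzma) is now only imported when a prebuilt file exists.
    from shutil import copyfile
    print("\tUsing prebuilt {}".format(prebuilt))
    copyfile(prebuilt, output_file)
    os.utime(output_file, None)
    quit()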

Can you post the code?

(no imports here anyway)

Thanks for noticing this! I forgot I removed the need for the tilesets import and I’ll try moving the shutil bit as well. These are indeed helpful for reducing import times.

Actually, your post reminds me that I can also import just the specific definitions from the common modules.

Ok, it seems that in your script you use only a few of the functions defined in utils and tilemaps. You can try moving these functions into separate modules and importing those modules in the script. You can also import them back into utils and tilemaps, so if you use utils and tilemaps elsewhere you still have all the functions in one place.
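
For illustration, the split could look something like this (the module name textcodec is hypothetical, not an existing file):

# common/textcodec.py -- hypothetical small module holding only the function
# the script needs; it has no heavy imports of its own.

def txt2bin(line, char_table):
    # (body moved here verbatim from common/utils.py)
    ...

# In common/utils.py, re-export it so existing callers keep working:
#     from common.textcodec import txt2bin
# and in the script, import only the small module:
#     from common import textcodec
# (the exact import form depends on how sys.path is set up in the project)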

Importing only specific names will not improve the speed of importing, though. The code of the module is always evaluated entirely, even if you import only one name from it. from a import b is only syntactic sugar for

import a
b = a.b

You can process the list of files in parallel in the python script.
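
A minimal sketch of that idea with the standard library’s multiprocessing, assuming the per-file work is factored into a function (process_one here is hypothetical, and the command-line layout is illustrative):

import sys
from multiprocessing import Pool

def process_one(input_file):
    ...  # existing per-file conversion (prebuilt check, txt2bin, compress_tmap)

if __name__ == "__main__":
    # Pool() defaults to one worker process per CPU core.
    with Pool() as pool:
        pool.map(process_one, sys.argv[1:])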

Noted, thanks for the info.

I’ll try this, thanks for the info.

Indeed, this is an option, but I’m not trying to rewrite Make’s build-scheduling functionality in Python (Python is used for file processing, but other applications also run as part of the build). Executing in parallel within Python would need to be taken into account by the build system itself.

If you’re not using any third-party packages (which it doesn’t look like you are?), you can reduce the startup time a bit by passing the -S flag, i.e. python3 -S ..., which skips importing the site module.
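
For reference, in the -X importtime trace above, site accounts for roughly 23 ms cumulative (including sitecustomize and the apport_python_hook it loads), so skipping it should be measurable. A quick way to confirm the flag takes effect:

# With -S, the `site` module is not imported automatically, so it should be
# absent from sys.modules at interpreter startup:
#   python3 -S -c "import sys; print('site' in sys.modules)"   # prints False
#   python3    -c "import sys; print('site' in sys.modules)"   # prints True
import sys
print("site auto-imported:", "site" in sys.modules)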