Load ML model file once in Python

I’m working with the vosk (speech-to-text) library, integrating it into a Python project.

Vosk Link : https://github.com/alphacep/vosk-api

I have these lines of code:

import wave
import json
import subprocess

from vosk import Model, KaldiRecognizer, SetLogLevel
import Word as custom_Word

sample_rate = 16000  # must match the sample rate of the audio being recognized

model = Model("model")
rec = KaldiRecognizer(model, sample_rate)

The line model = Model("model") loads a large file into memory. I want to load this model only once so that every other instance can share it. For reference, here is the vosk source code that defines Model:


import os

import sys

from .vosk_cffi import ffi as _ffi

def open_dll():

    dlldir = os.path.abspath(os.path.dirname(__file__))

    if sys.platform == 'win32':

        # We want to load dependencies too

        os.environ["PATH"] = dlldir + os.pathsep + os.environ['PATH']

        if hasattr(os, 'add_dll_directory'):
            os.add_dll_directory(dlldir)

        return _ffi.dlopen(os.path.join(dlldir, "libvosk.dll"))

    elif sys.platform == 'linux':

        return _ffi.dlopen(os.path.join(dlldir, "libvosk.so"))

    elif sys.platform == 'darwin':

        return _ffi.dlopen(os.path.join(dlldir, "libvosk.dyld"))


    else:
        raise TypeError("Unsupported platform")

_c = open_dll()



class Model(object):

    def __init__(self, model_path):


        self._handle = _c.vosk_model_new(model_path.encode('utf-8'))

        if self._handle == _ffi.NULL:

            raise Exception("Failed to create a model")

    def __del__(self):
        _c.vosk_model_free(self._handle)


    def vosk_model_find_word(self, word):

        return _c.vosk_model_find_word(self._handle, word.encode('utf-8'))

    #def loadModel(self,model_path):

class SpkModel(object):

    def __init__(self, model_path):

        self._handle = _c.vosk_spk_model_new(model_path.encode('utf-8'))

        if self._handle == _ffi.NULL:

            raise Exception("Failed to create a speaker model")

    def __del__(self):
        _c.vosk_spk_model_free(self._handle)


class KaldiRecognizer(object):

    def __init__(self, *args):

        if len(args) == 2:

            self._handle = _c.vosk_recognizer_new(args[0]._handle, args[1])

        elif len(args) == 3 and type(args[2]) is SpkModel:

            self._handle = _c.vosk_recognizer_new_spk(args[0]._handle, args[1], args[2]._handle)

        elif len(args) == 3 and type(args[2]) is str:

            self._handle = _c.vosk_recognizer_new_grm(args[0]._handle, args[1], args[2].encode('utf-8'))


        else:
            raise TypeError("Unknown arguments")

        if self._handle == _ffi.NULL:

            raise Exception("Failed to create a recognizer")

    def __del__(self):
        _c.vosk_recognizer_free(self._handle)


    def SetMaxAlternatives(self, max_alternatives):

        _c.vosk_recognizer_set_max_alternatives(self._handle, max_alternatives)

    def SetWords(self, enable_words):

        _c.vosk_recognizer_set_words(self._handle, 1 if enable_words else 0)

    def SetSpkModel(self, spk_model):

        _c.vosk_recognizer_set_spk_model(self._handle, spk_model._handle)

    def AcceptWaveform(self, data):

        res = _c.vosk_recognizer_accept_waveform(self._handle, data, len(data))

        if res < 0:

            raise Exception("Failed to process waveform")

        return res

    def Result(self):

        return _ffi.string(_c.vosk_recognizer_result(self._handle)).decode('utf-8')

    def PartialResult(self):

        return _ffi.string(_c.vosk_recognizer_partial_result(self._handle)).decode('utf-8')

    def FinalResult(self):

        return _ffi.string(_c.vosk_recognizer_final_result(self._handle)).decode('utf-8')

    def Reset(self):

        return _c.vosk_recognizer_reset(self._handle)

def SetLogLevel(level):

    return _c.vosk_set_log_level(level)

def GpuInit():
    _c.vosk_gpu_init()

def GpuThreadInit():
    _c.vosk_gpu_thread_init()


It’s a little unclear what instances are sharing what; do you control the Model class (i.e., is vosk your project)? Or are you seeking to adapt its Model class for your own use, to cache the model once loaded (which I’m guessing happens in the self._handle line)? Without these details or a more specific problem statement, it’s difficult to know exactly what you need here. However, the basic idea is: wrap the expensive function/method in a function, apply the functools.lru_cache decorator to it (which will store and return the same return value whenever it is called again with the same arguments), and then modify/subclass the Model object to have its __init__ call that instead. You could also manually build a lookup table keyed by model_path as a class variable/attribute and store/get the results from there.
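A minimal sketch of that lru_cache idea, using a stand-in class rather than vosk’s actual Model (the helper name load_model is hypothetical):

```python
import functools

class Model:
    """Stand-in for vosk.Model; imagine __init__ being very expensive."""
    def __init__(self, model_path):
        self.model_path = model_path  # the real class loads ~GBs here

@functools.lru_cache(maxsize=1)
def load_model(model_path):
    # The first call per path constructs a Model; repeated calls with
    # the same argument return the cached instance without reloading.
    return Model(model_path)

m1 = load_model("model")
m2 = load_model("model")
assert m1 is m2  # both names refer to the single cached object
```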

@CAM-Gerlach Thanks for your reply. Vosk is an offline, open-source speech recognition toolkit. I integrated it into my Python application and it works, but the issue is that it loads a model file of around 2.5 GB every time I run the application, so I need some way to load that model once and reuse it every time. I have added more code to my question; please check.

Okay, thanks for the additional detail. Again, there are a few things I’m trying to get clarification on.

Is the code in the last code block (that defines the Model class) your code that you want to modify to cache the result of loading the model (which I’m presuming is what is stored as self._handle)? And are the “instances” you are referring to multiple different instances of Model? If so, again, wrap _c.vosk_model_new(model_path.encode('utf-8')) in a function, use the @functools.lru_cache() decorator on it (with a very low maxsize, perhaps even 1 if you only ever load one model), and then call that wrapped function when setting self._handle.

If not, and the last code block is just the vosk code you do not control, provided for reference, you have a few options: pass around the same instance of Model (I’m not sure there’s a reason to create multiple instances); wrap it in a function get_model(path) that just does return Model(path), apply @functools.lru_cache to that, and call get_model instead of Model directly whenever you need a Model instance; or, if you need truly distinct Model instances backed by the same data, subclass Model and override __init__ to operate as above.

Again, there’s a number of other ways to cache it, but that’s probably the simplest.

Thanks, yes, the last code block is vosk’s code and the initial code block is mine. Should I create this caching method in my own code or in the vosk code? This is a simple model reference: model = Model("model")

If the second code block is vosk’s code, you could consider making a feature request/pull request to the project to cache the model, but for now, you can just do what I suggest in the second paragraph.

Thanks for the reply. One more point I want to ask: will @functools.lru_cache handle all the cache-related tasks, or do I have to handle them myself? Suppose I have loaded the model file the first time; will it automatically load from the cache the second time, and will the data stay in the cache until I clear it?

I was trying to implement like this :

import functools

@functools.lru_cache()
def get_model():
    return Model("model")

model = get_model()

in my custom Python file, not the vosk one, but the second time I run my code it still takes time. Can you please let me know how to resolve this issue?

One important thing to keep in mind is that the caching happens in memory. It will work if you call get_model() multiple times within the same Python interpreter, i.e. within the same script, across different modules within the same program, or at different times within the same interactive interpreter session. It will not work if you run a script via python spam.py, it completes, and then you run the script again via python spam.py. (It will work if you do so inside an IDE like Spyder, though, if you set it to use the same session.)

If Model objects support serialization via pickle, you can use pickle.dump() to save them to disk and pickle.load() to reload them, but you are ultimately limited by your disk’s read speed: for a fast SSD this may be only a few seconds, while for a slow HDD it could be minutes. Depending on exactly what _c.vosk_model_new is doing, this may save a significant amount of time, or it could even be slower in some circumstances. Note that some objects cannot be serialized via pickle; in that case this approach won’t be possible unless vosk offers its own serialization system or you explore alternatives like cloudpickle and dill.
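A sketch of that pickle round-trip, with a toy class standing in for Model (whether vosk’s Model actually pickles is unverified, and the cache filename is hypothetical):

```python
import os
import pickle

class ToyModel:
    """Toy stand-in; assume constructing it is slow."""
    def __init__(self, path):
        self.path = path

CACHE_FILE = "model.pkl"  # hypothetical on-disk cache location

def get_model(path):
    # Reload a previously saved model from disk if one exists;
    # otherwise build it the slow way and save it for next time.
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    model = ToyModel(path)
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(model, f)
    return model
```

Unlike lru_cache, this survives across separate runs of the script, at the cost of a disk read each time.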

One fancier, and potentially simpler and more elegant, way of doing disk-based caching is joblib.Memory, though that of course requires installing Joblib.

One other thing to note: with caching set up this way, versus subclassing and overriding __init__, you are not only loading the same data but actually returning the same model object, so any changes you make to it will be reflected everywhere it is used. Of course, for the same effect, you could simply pass around the same object.
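To make that shared-object behaviour concrete (toy names again, not vosk’s real API):

```python
import functools

class Model:
    def __init__(self, path):
        self.path = path
        self.options = {}

@functools.lru_cache(maxsize=1)
def get_model(path):
    return Model(path)

a = get_model("model")
b = get_model("model")
a.options["beam"] = 10          # mutate through one reference...
assert b.options["beam"] == 10  # ...and it shows through the other,
assert a is b                   # because there is only one object
```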

I tested your implementation on some simple example functions, e.g.

import functools

class LongClass:
    def __init__(self, output):
        self._output = output

@functools.lru_cache()
def wrapper_function():
    return LongClass("output")

And indeed, it works as expected, though it is a good idea to pass your model name through the wrapper function instead of hardcoding it, i.e.

import functools

@functools.lru_cache()
def get_model(model_name):
    return Model(model_name)

model = get_model("model")

In case you want to use different or multiple models; lru_cache will cache each one separately.
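For instance, under the same toy setup (the model names below are purely illustrative), each distinct argument gets its own cache slot:

```python
import functools

@functools.lru_cache(maxsize=4)
def get_model(model_name):
    # Stand-in for `return Model(model_name)`.
    return {"name": model_name}

small = get_model("model-a")
large = get_model("model-b")
assert small is get_model("model-a")  # each argument is cached separately
assert large is get_model("model-b")
```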