Need help with audio frame length using pvporcupine

Bexell · October 24, 2024, 3:07am

I’m super new to Python and know practically nothing, but I’ve been working on this project with the help of ChatGPT, using OpenAI and Porcupine to make a Jarvis-esque virtual assistant. I’m using Python 3.12, and I keep getting the following response when I attempt to execute my code:

```
Assistant is ready. Say 'G P T' to wake me up.
Expected Frame Length: 512
Traceback (most recent call last):
  File "D:\Bexell Stuff\GPT chatbot\GPTcode.py", line 70, in <module>
    keyword_index = porcupine.process(pcm)  # Process the audio data directly
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bexel\AppData\Local\Programs\Python\Python312\Lib\site-packages\pvporcupine\_porcupine.py", line 236, in process
    raise ValueError("Invalid frame length. expected %d but received %d" % (self.frame_length, len(pcm)))
ValueError: Invalid frame length. expected 512 but received 1024
```

Here’s my code:

```
import pvporcupine
import pyaudio
import speech_recognition as sr
import pyttsx3
import openai
import time

# Initialize pyttsx3 for text-to-speech
tts_engine = pyttsx3.init()
tts_engine.setProperty('rate', 150)

# OpenAI setup
openai.api_key = ""

# Picovoice access key
ACCESS_KEY = ""

# Path to your custom wake word model
keyword_path = "D:\\Bexell Stuff\\GPT chatbot\\gpt.ppn"

# Initialize Porcupine with custom wake word
porcupine = pvporcupine.create(
    access_key=ACCESS_KEY,
    keyword_paths=[keyword_path]
)

# Set up PyAudio with a buffer size matching the expected frame length
audio_stream = pyaudio.PyAudio().open(
    rate=porcupine.sample_rate,  # Ensure 16000 Hz sample rate
    channels=1,  # Mono audio
    format=pyaudio.paInt16,  # 16-bit audio
    input=True,
    frames_per_buffer=porcupine.frame_length  # Set buffer size to the expected frame length (512 samples)
)

recognizer = sr.Recognizer()

def speak(text):
    tts_engine.say(text)
    tts_engine.runAndWait()

def listen_for_command():
    with sr.Microphone() as source:
        print("Listening...")
        audio = recognizer.listen(source)
        try:
            command = recognizer.recognize_google(audio)
            print(f"You said: {command}")
            return command
        except sr.UnknownValueError:
            print("I didn't catch that.")
            return None

def get_openai_response(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message['content']

print("Assistant is ready. Say 'G P T' to wake me up.")
print(f"Expected Frame Length: {porcupine.frame_length}")

while True:
    # Read audio from the stream
    pcm = audio_stream.read(porcupine.frame_length, exception_on_overflow=False)  # Read 512 samples (1024 bytes)

    # Ensure we only process the required amount of data
    if len(pcm) == porcupine.frame_length * 2:  # Check for 1024 bytes
        keyword_index = porcupine.process(pcm)  # Process the audio data directly

        if keyword_index >= 0:  # Wake word detected!
            print("Wake word detected!")
            speak("Yes, how can I assist?")

            command = listen_for_command()
            if command:
                response = get_openai_response(command)
                print(f"G P T: {response}")
                speak(response)

            time.sleep(1)  # Avoid immediate re-triggering
    else:
        print(f"Warning: Expected {porcupine.frame_length * 2} bytes but got {len(pcm)}.")
```

jeff5 · October 24, 2024, 6:55pm

Hi, and welcome to the forum. Also, thank you for taking the time to post nicely marked-up code.

In this circumstance, I would write another program with just the code that reads audio and checks the frame length, and have it print out the lengths. Almost certainly the problem is a misunderstanding of the pyaudio or pvporcupine API: frames versus bytes, or maybe the number of channels (stereo vs mono). Printing out some of the pcm data (on a silent mic, I suggest) might also let you guess whether it really is 16-bit. Is the pcm object really a bytes object, or maybe an array of int16? (Print out type(pcm).)
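A minimal sketch of that type-and-length check, run on a synthetic buffer instead of a live microphone so it works anywhere. The 512 comes from the traceback. The working assumption (not confirmed in this thread) is that pvporcupine's process() wants 512 int16 sample values, whereas audio_stream.read() returns 1024 raw bytes, which would match "expected 512 but received 1024".

```python
import struct

FRAME_LENGTH = 512  # the value porcupine.frame_length printed in the post

# Stand-in for audio_stream.read(FRAME_LENGTH): PyAudio returns raw bytes,
# and with format=paInt16 and channels=1 each sample occupies 2 bytes.
# The fake "audio" here is just the sample values 0, 1, ..., 511 packed as int16.
pcm = struct.pack(f"{FRAME_LENGTH}h", *range(FRAME_LENGTH))
print(type(pcm), len(pcm))          # <class 'bytes'> 1024  (bytes, not samples)

# Unpack the buffer into FRAME_LENGTH signed 16-bit integers in native byte
# order, which is how PortAudio delivers paInt16 samples.
samples = struct.unpack_from(f"{FRAME_LENGTH}h", pcm)
print(type(samples), len(samples))  # <class 'tuple'> 512
print(samples[:4])                  # (0, 1, 2, 3)
```

If the live stream prints the same thing (bytes, 1024), passing samples rather than pcm to porcupine.process() is the obvious next experiment.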

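I don’t know this API myself, but wave (in the standard library) works somewhat that way: readframes(n) counts in frames but returns a bytes object, so with 16-bit mono audio each frame is 2 bytes.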
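The wave comparison can be made concrete with a tiny in-memory WAV file. The settings (mono, 16-bit, 16 kHz, one second of silence) are chosen only to mirror the Porcupine setup in the post; nothing here touches pvporcupine or PyAudio.

```python
import io
import wave

FRAME_LENGTH = 512

# Write one second of 16-bit mono silence at 16 kHz into an in-memory WAV file.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)        # mono
    w.setsampwidth(2)        # 2 bytes = 16 bits per sample
    w.setframerate(16000)
    w.writeframes(bytes(2 * 16000))

# Read it back: ask for 512 frames, get a bytes object of 512 * 2 bytes.
buf.seek(0)
with wave.open(buf, "rb") as r:
    chunk = r.readframes(FRAME_LENGTH)
    frame_size = r.getsampwidth() * r.getnchannels()

print(type(chunk), len(chunk), frame_size)  # <class 'bytes'> 1024 2
```

The request is in frames (512) but the result is measured in bytes (1024), the same split that shows up in the Porcupine error.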

If a simplified program without the AI stuff hasn’t already made the problem obvious, it would at least be easier for others to help with.
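A stripped-down, wake-word-only loop along those lines might look like the sketch below. It has not been run against pvporcupine; it assumes process() accepts a sequence of int16 samples, and the helper names bytes_to_samples and listen_for_wake_word are hypothetical. The Porcupine handle and the audio stream are passed in as parameters, so the loop can be exercised with stand-in objects: anything with a frame_length attribute and a process() method, plus any stream with a PyAudio-style read().

```python
import struct


def bytes_to_samples(pcm, frame_length):
    """Hypothetical helper: turn one raw paInt16 mono buffer (2 bytes per
    sample) into a tuple of `frame_length` signed 16-bit integers."""
    if len(pcm) != frame_length * 2:
        raise ValueError(f"expected {frame_length * 2} bytes, got {len(pcm)}")
    return struct.unpack_from(f"{frame_length}h", pcm)


def listen_for_wake_word(porcupine, stream, max_frames=None):
    """Hypothetical helper: read frames from `stream` and feed them to
    `porcupine` until process() reports a keyword (returns True) or
    `max_frames` frames have been read (returns False).

    `porcupine` needs a `frame_length` attribute and a `process()` method,
    as a pvporcupine handle has; `stream` needs a PyAudio-style `read()`.
    """
    frames_read = 0
    while max_frames is None or frames_read < max_frames:
        pcm = stream.read(porcupine.frame_length, exception_on_overflow=False)
        samples = bytes_to_samples(pcm, porcupine.frame_length)
        frames_read += 1
        # Print the byte and sample counts once, as a sanity check.
        if frames_read == 1:
            print(f"first frame: {len(pcm)} bytes -> {len(samples)} samples")
        # Assumption: process() returns -1 for "no keyword", otherwise an index.
        if porcupine.process(samples) >= 0:
            print(f"Wake word detected after {frames_read} frame(s)")
            return True
    return False
```

With the objects from the original script, the call would be listen_for_wake_word(porcupine, audio_stream), with both created exactly as before. On a live microphone max_frames stays at None; it exists mainly so the loop can run a fixed number of times while testing.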

Bexell · October 25, 2024, 6:58pm

Thank you so much. ChatGPT has been helping me annotate the code, and it’s been super helpful in understanding what’s actually going on. I’ll try to shorten the code and run some tests on just the essentials asap, maybe adding a few more outputs so that I know exactly what info is coming in.