Real-Time Frequency Filter for Audio playing Periodic Static Beeps

_auxerre · May 17, 2024, 2:19pm

Hi, I’m pretty new to coding and am trying to create something that takes in live audio from the microphone, finds the frequency with the highest amplitude, clears everything outside a ±40 Hz range around that frequency, and then outputs the result.

This is the code I have written for it:

import pyaudio
import struct
import numpy as np
import matplotlib.pyplot as plt
import time
from tkinter import TclError

# Constants
CHUNK = 1024 * 8             # Samples per frame
FORMAT = pyaudio.paInt16     # Audio format
CHANNELS = 2                 # Single channel for microphone
RATE = 44100                 # Samples per second

# PyAudio class instance
p = pyaudio.PyAudio()
# Stream object to get data from microphone
inputstream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=True,frames_per_buffer=CHUNK)
# Stream object to output antisound
outputstream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=True,frames_per_buffer=CHUNK)

frame_count = 0
start_time = time.time()
plt.show()
while True:

    # Binary input data
    data = inputstream.read(CHUNK)
    # Convert data to integers
    data_int = struct.unpack(str(2 * CHUNK) + 'B', data)
    # Convert data to np array 
    data_np = np.array(data_int, dtype='b')[::2]
    
    # FFT the data
    fft_data = np.fft.fft(data_np)
    freqs = np.fft.fftfreq(len(data_np),d = 1./RATE) #takes the frequencies of the data and stores them in an array
    mag_fft_data = np.abs(fft_data) #fourier coefficients are complex numbers that tell you the magnitude of the wave. Use np.abs to find magnitude of complex number and then store those in the array
    threshold = freqs(np.argmax(mag_fft_data)) #finds frequency of max amplitude 
    indices_to_zero = np.where((np.abs(freqs) < np.abs(threshold) - 40) | (np.abs(freqs) > np.abs(threshold) + 40)) #bandpass filter
    '''
    threshold  = max(mag_fft_data)/2 #finding the maximum amplitude so far, and creating a threshold that is half of the max
    indices_to_zero = np.where(mag_fft_data < threshold) #finds every point in the array where the data is less than the threshold
    '''
    fft_data_clean = np.copy(fft_data) #copies th fourier data for clean up
    fft_data_clean[indices_to_zero] = 0 #makes all the points where the magnitude was below the threshold 0
    np_int_band_pass = np.fft.ifft(fft_data_clean) #inverse transforms it to put back into function
    list_int_band_pass = np_int_band_pass.tolist() #converts np array to python list
    bin_band_pass = struct.pack(str(CHUNK) + 'h', *list_int_band_pass) #converts to binary
    # Print
    print('Frequencies: ', freqs)
    print('Fourier Coefficients: ', fft_data)

    outputstream.write(bin_band_pass)

Right now, it just seems to be outputting periodic static beeps and I have absolutely no clue why.
I’d appreciate it greatly if someone could tell me what is wrong with my code.

onePythonUser · May 17, 2024, 5:11pm

Hi,

from a practical standpoint, what sound are you expecting? The audio range is approximately 16 Hz to 20 kHz. If you sample the audio stream, select only the peak, and filter out every frequency more than +/- 40 Hz away from that peak (a band-pass filter), you are left with a relatively narrow set of frequencies. You also have this code in a loop, meaning you repeat the same process on every chunk. Maybe the output sound is not surprising.

To be honest, I wouldn’t expect anything different, since by your own script’s requirements you are keeping only a very (maybe even extremely) narrow set of frequencies. And since these are audio frequencies, you can’t really expect a range of sounds.

For perspective, here is a time domain vs. frequency domain comparison. Assume you have two signals, as shown here explicitly (though a signal can be composed of one or more frequencies via superposition), and you filter out everything else with a band-pass filter. What you have left is only a tiny slice of the audio spectrum (+/- 40 Hz around the center peak is rather narrow), which will sound quite monotone.

[Image: bp_filter]

Here is a small experiment that you can perform to test whether your code is working as expected. Open up the band-pass filter to +/- 100 kHz, then slowly start narrowing it. At that width it passes ALL potential audio frequencies. If the audio output is a replica of the input audio stream, then it can be surmised that your code is working. The narrower you make the band-pass filter, the more monotone the output audio will become.
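The same experiment can also be run offline on a synthetic signal, without a microphone. This is only a minimal sketch, assuming NumPy; `bandpass_around_peak` and its parameters are hypothetical names that do not appear in the original script, and it uses `np.fft.rfft`/`np.fft.irfft` (rather than the script's `fft`/`ifft`) so that the result is real-valued.

```python
import numpy as np

RATE = 44100          # samples per second, as in the original script
CHUNK = 1024 * 8      # samples per block, as in the original script

def bandpass_around_peak(samples, rate, half_width_hz):
    """Keep only the FFT bins within half_width_hz of the strongest bin."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    peak_hz = freqs[np.argmax(np.abs(spectrum))]
    keep = np.abs(freqs - peak_hz) <= half_width_hz   # boolean mask per bin
    filtered = np.fft.irfft(spectrum * keep, n=len(samples))
    return filtered, peak_hz

# Test signal: a loud 440 Hz tone plus a quieter 3 kHz tone.
t = np.arange(CHUNK) / RATE
signal = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)

wide, peak_hz = bandpass_around_peak(signal, RATE, 100_000)
narrow, _ = bandpass_around_peak(signal, RATE, 40)

print("peak (Hz):", peak_hz)
print("wide filter reproduces input:", np.allclose(wide, signal))
print("narrow filter reproduces input:", np.allclose(narrow, signal))
```

With the wide setting every bin is kept, so the output should match the input; with ±40 Hz the 3 kHz component is removed and only the region around the loudest tone survives.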

_auxerre · May 17, 2024, 7:54pm

Ah right, thank you very much. Your test pointed out that my code was not working at all: even when the microphone input was turned all the way down, it was still playing said beeps. I’ve somewhat resolved it, as I realised the outputstream object did not take the datatype that I had expected. However, it now plays a very muffled, albeit filtered, version of what I have just said into the microphone.
Many Regards, Aditya
If you have any advice on how I can fix the very muffled, quiet audio, I would appreciate it, as I’m not very familiar with the pyaudio library.
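One thing worth ruling out here is the conversion from the filtered array back to bytes. `np.fft.ifft` returns a complex array, while a `paInt16` stream expects raw 16-bit integers. The following is a sketch under that assumption, not a diagnosis of the actual cause; `to_int16_bytes` is a hypothetical helper name.

```python
import numpy as np

def to_int16_bytes(filtered):
    """Convert filter output (possibly complex) to raw 16-bit PCM bytes."""
    real = np.real(filtered)                          # drop the (near-zero) imaginary part
    clipped = np.clip(np.rint(real), -32768, 32767)   # round, then stay inside the int16 range
    return clipped.astype(np.int16).tobytes()

samples = np.array([0, 1000, -1000, 40000, -40000], dtype=np.float64)
restored = np.fft.ifft(np.fft.fft(samples))   # round trip; dtype is complex
raw = to_int16_bytes(restored)

print(restored.dtype, len(raw))
print(np.frombuffer(raw, dtype=np.int16))
```

The clipping step matters because a value outside the 16-bit range would otherwise wrap around when cast, which is heard as distortion.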

onePythonUser · May 17, 2024, 8:19pm

Hi,

as stated above, the upper limit of the audio frequency range is approximately 20 kHz. To replicate a given signal, you should in theory sample at a minimum of 2x the highest frequency in the input signal, i.e. 40 kHz here. However, I have seen sampling right at that limit in practice and it really didn’t work. What you want to do is oversample. Your current sampling rate of 44.1 kHz is only slightly above that minimum. Try larger values, multiples of the upper bound: 60 kHz (3x), 80 kHz (4x), 100 kHz (5x), …, 10x, etc.

RATE = 60000  #  Try different values

Try this experiment and observe the different results. See if that helps in replicating your audio.

Here is a related article to get you familiar with the concept.
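To make the 2x rule concrete, here is a small sketch of the arithmetic. The helper names `nyquist_hz` and `alias_hz` are hypothetical, and the model is the idealised one (a pure tone, no anti-aliasing filter). Note that whether a sound device accepts a given `RATE` depends on the hardware and driver.

```python
def nyquist_hz(rate):
    """Highest frequency that a given sampling rate can represent."""
    return rate / 2

def alias_hz(tone_hz, rate):
    """Apparent frequency of a pure tone after sampling at `rate`."""
    folded = tone_hz % rate
    return min(folded, rate - folded)

for rate in (44100, 60000):
    print(rate, nyquist_hz(rate), alias_hz(20000, rate), alias_hz(30000, rate))
```

At 44100 Hz the limit is 22050 Hz, so a 20 kHz tone is still represented correctly, while a 30 kHz tone would fold back to 14100 Hz.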

kknechtel · May 17, 2024, 11:23pm

Your constants (FORMAT = pyaudio.paInt16, CHANNELS = 2) say that you are expecting stereo (i.e., two channels, despite the comment) input where every sample is a 2-byte integer per channel.

Aditya Das Gupta :

    # Binary input data
    data = inputstream.read(CHUNK)
    # Convert data to integers
    data_int = struct.unpack(str(2 * CHUNK) + 'B', data)
    # Convert data to np array 
    data_np = np.array(data_int, dtype='b')[::2]

This says that you will read some data, interpret 16,384 bytes worth of that data as a tuple of 16,384 unsigned single-byte integer values, convert that to a NumPy array and take every other value. I guess with the last part you’re trying to extract one channel; but your sample value type does not make sense.
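The difference between the two interpretations is easy to see on a tiny hand-made buffer. This sketch uses only the standard library and fake data in place of `inputstream.read(CHUNK)`. It assumes interleaved little-endian 16-bit stereo frames and a `CHUNK` of 4 purely for illustration.

```python
import struct

CHUNK = 4              # tiny block for illustration (the script uses 1024 * 8)
CHANNELS = 2
BYTES_PER_SAMPLE = 2   # 16-bit samples, as with paInt16

# Fake interleaved stereo frames: (left, right) pairs of 16-bit samples.
frames = [(1000, -1000), (2000, -2000), (3000, -3000), (4000, -4000)]
flat = [sample for frame in frames for sample in frame]
data = struct.pack(f'<{CHUNK * CHANNELS}h', *flat)

print(len(data))       # CHUNK * CHANNELS * BYTES_PER_SAMPLE bytes

# Reading the buffer as unsigned bytes splits each sample in two ...
as_bytes = struct.unpack(f'{len(data)}B', data)
# ... while 'h' recovers the original signed 16-bit values.
as_int16 = struct.unpack(f'<{CHUNK * CHANNELS}h', data)
left = as_int16[0::CHANNELS]   # every CHANNELS-th sample is one channel

print(as_bytes[:4])
print(left)
```

Taking every other value of `as_bytes` keeps only the low byte of each sample, which is not a usable audio signal.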

If you want to unpack 8,192 two-byte values with struct.unpack, the format string should look like str(CHUNK) + 'h' (or, you know, f'{CHUNK}h'). Uppercase letters in struct are for unsigned types, just like with NumPy dtypes; and fundamentally your data is supposed to be using 2-byte values. But better yet, don’t use struct for this, because NumPy already handles it. Use np.frombuffer to create the array.
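A minimal sketch of the `np.frombuffer` approach, using a fake buffer in place of the microphone data. `decode_block` and `encode_block` are hypothetical helper names, and the layout is assumed to be interleaved 16-bit stereo as configured by the constants.

```python
import numpy as np

CHANNELS = 2

def decode_block(data, channels=CHANNELS):
    """Raw 16-bit PCM bytes -> int16 array of shape (frames, channels)."""
    samples = np.frombuffer(data, dtype=np.int16)
    return samples.reshape(-1, channels)

def encode_block(block):
    """Array of shape (frames, channels) -> raw 16-bit PCM bytes."""
    return np.ascontiguousarray(block, dtype=np.int16).tobytes()

# Stand-in for inputstream.read(CHUNK): three stereo frames.
fake = np.array([[1000, -1000], [2000, -2000], [3000, -3000]], dtype=np.int16)
data = fake.tobytes()

block = decode_block(data)
left = block[:, 0]             # first channel only

print(block.shape, left)
print(encode_block(block) == data)
```

The array returned by `np.frombuffer` is read-only, so any filtering should write to a copy or a new array before it is encoded back to bytes.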