Whisper AI can't transcribe multiple audio files to text

I tried to transcribe two or more audio files to text with whisper in a single command, but it fails. What is the problem?

D:\Whisper>whisper Vernon 歌曲见证分享.mp3 VALERIE 【小雨的回忆见证】更容易听见圣灵的感动.mp3
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 58, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', 'Vernon', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 4294967294.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 478, in cli
    result = transcribe(model, audio_path, temperature=temperature, **args)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 122, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 140, in log_mel_spectrogram
    audio = load_audio(audio)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\audio.py", line 60, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 6.1.1-essentials_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12.2.0 (Rev10, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --pkg-config=pkgconf --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-dxva2 --enable-d3d11va --enable-libvpl --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil 58. 29.100 / 58. 29.100
libavcodec 60. 31.102 / 60. 31.102
libavformat 60. 16.100 / 60. 16.100
libavdevice 60. 3.100 / 60. 3.100
libavfilter 9. 12.100 / 9. 12.100
libswscale 7. 5.100 / 7. 5.100
libswresample 4. 12.100 / 4. 12.100
libpostproc 57. 3.100 / 57. 3.100
[in#0 @ 00000236a04cae80] Error opening input: No such file or directory
Error opening input file Vernon.
Error opening input files: No such file or directory

The issue seems to be that a file isn’t found:

Error opening input file Vernon.

Look at the command line you typed: how many file names do you believe it contains, and what are they?

This is not an issue with Python; it is a matter of how the command line works. When Windows runs your program, the command line has to be split into separate arguments. It is split at every space that is not inside quotes, so the whisper program is told that it got four arguments: Vernon, 歌曲见证分享.mp3, VALERIE and 【小雨的回忆见证】更容易听见圣灵的感动.mp3.

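If you meant that there should be two files named Vernon 歌曲见证分享.mp3 and VALERIE 【小雨的回忆见证】更容易听见圣灵的感动.mp3, then each name needs to be wrapped in double quotes so that it is passed as a single argument:

D:\Whisper>whisper "Vernon 歌曲见证分享.mp3" "VALERIE 【小雨的回忆见证】更容易听见圣灵的感动.mp3"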
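
To see the difference the quotes make, the splitting can be approximated in Python. This is only a sketch: shlex.split follows POSIX shell rules rather than the exact Windows rules, but for plain spaces and double quotes like the ones here the two should give the same result. The variable names are made up for illustration.

```python
import shlex

# The command as originally typed: no quotes around the file names.
unquoted = "whisper Vernon 歌曲见证分享.mp3 VALERIE 【小雨的回忆见证】更容易听见圣灵的感动.mp3"

# The same command with each file name wrapped in double quotes.
quoted = 'whisper "Vernon 歌曲见证分享.mp3" "VALERIE 【小雨的回忆见证】更容易听见圣灵的感动.mp3"'

# Drop the program name (element 0) and keep only the arguments.
unquoted_args = shlex.split(unquoted)[1:]
quoted_args = shlex.split(quoted)[1:]

print(len(unquoted_args), unquoted_args)
print(len(quoted_args), quoted_args)
```

The first list has four entries and begins with Vernon, which matches the name ffmpeg reported it could not open. The second list has the two intended file names.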

So does that mean that the next time I need to transcribe two or more audio files to text, the command line should be C:\Whisper> whisper "testing1.mp3 --language German --model large-v2" "testing2.mp3 --language Chinese --model large-v2"?

No. Options such as --language German --model large-v2 are not part of the file name; they are separate arguments to the command, so they must stay outside the quotes.

If I have two audio files, testing1.mp3 and testing2.mp3, and I need to generate text from both at once, what is the correct command to type?

Then you can type the names normally, and whatever other arguments you need. You should put quotes around the file names themselves, if there is a space in the name. In general, you will need quotes for anything that is a single argument for the command, but that has spaces in it. The problem happened in your first example because e.g. Vernon 歌曲见证分享.mp3 has a space in it (between Vernon and 歌曲见证分享.mp3). testing1.mp3 does not have a space in it.
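
The quoting rule can also be checked mechanically. As a rough sketch, Python's subprocess.list2cmdline helper (used internally by subprocess on Windows) turns a list of arguments into a Windows-style command line and adds double quotes only where an argument contains a space. The --model large-v2 option is copied from the question above; whether it is the right option for your case is an assumption, not something verified here.

```python
import subprocess

# Each list element is exactly one argument, spaces included.
no_spaces = ["whisper", "testing1.mp3", "testing2.mp3", "--model", "large-v2"]
with_spaces = ["whisper", "Vernon 歌曲见证分享.mp3", "testing2.mp3", "--model", "large-v2"]

# list2cmdline quotes only the arguments that need it.
plain_cmd = subprocess.list2cmdline(no_spaces)
quoted_cmd = subprocess.list2cmdline(with_spaces)

print(plain_cmd)
print(quoted_cmd)
```

Only the file name containing a space ends up in quotes; the options and the names without spaces are left as they are.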

I don’t know anything about the whisper program; I am only explaining about how to use the command line properly.

Thank you for your help