The XVF3800 is a 4-mic array that only exposes a 6-channel stream at the ALSA level (4 raw mics + 2 DSP-processed channels). PyAudio/SpeechRecognition asks for 1 channel, so ALSA rejects the request.
What works is: arecord -D plughw:2,0 -c 6 captures a fixed-duration WAV with all 6 channels.
Python reads the raw PCM, reshapes it with numpy into (frames × 6), and slices out channel 0.
In my code, that mono buffer is wrapped in sr.AudioData and sent directly to Google STT. The transcript then goes to a local LLM running on a Hailo H10 device attached to a Raspberry Pi 5.
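
Roughly, the steps above look like this in Python. This is a minimal sketch, not the exact code: the function names, the 16 kHz sample rate, the S16_LE format, and the 5-second duration are assumptions, and the plughw:2,0 card index may differ on another system (check with arecord -l).

```python
import subprocess
import wave

import numpy as np

DEVICE = "plughw:2,0"  # card,device from the post; may differ on your system
CHANNELS = 6           # 4 raw mics + 2 DSP-processed channels
RATE = 16000           # assumed sample rate
SAMPLE_WIDTH = 2       # bytes per sample for S16_LE (assumed format)


def record_wav(path, seconds=5):
    """Capture a fixed-duration 6-channel WAV with arecord."""
    subprocess.run(
        ["arecord", "-D", DEVICE, "-c", str(CHANNELS), "-r", str(RATE),
         "-f", "S16_LE", "-d", str(seconds), path],
        check=True,
    )


def extract_channel(pcm, channels=CHANNELS, channel=0):
    """De-interleave 16-bit PCM bytes and return one channel as bytes."""
    samples = np.frombuffer(pcm, dtype="<i2")
    frames = samples.reshape(-1, channels)  # shape: (frames, channels)
    return frames[:, channel].tobytes()


def transcribe(path):
    """Read the 6-channel WAV, keep channel 0, and send it to Google STT."""
    # Imported here so the helpers above work without SpeechRecognition installed.
    import speech_recognition as sr

    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != CHANNELS or wf.getsampwidth() != SAMPLE_WIDTH:
            raise ValueError("unexpected WAV layout")
        rate = wf.getframerate()
        pcm = wf.readframes(wf.getnframes())

    mono = extract_channel(pcm)
    audio = sr.AudioData(mono, rate, SAMPLE_WIDTH)
    return sr.Recognizer().recognize_google(audio)
```

Calling record_wav("capture.wav") followed by transcribe("capture.wav") would run the whole chain. Because the recording is fixed-duration, nothing is transcribed until arecord exits.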