Saving the audio input of the Android stock speech recognition engine

Date: 2021-12-24 19:42:59

I am trying to save to a file the audio data that Android's speech recognition service listens to.

I implement RecognitionListener as explained here: Speech to Text on Android

save the data into a buffer as illustrated here: Capturing audio sent to Google's speech recognition server

and write the buffer to a WAV file, as shown here: Android Record raw bytes into WAVE file for Http Streaming

My problem is how to determine the right audio settings to write into the WAV file's header. When I play the WAV file I hear only strange noise with these parameters:

    short nChannels = 2;   // number of audio channels
    int sRate = 44100;     // sample rate in Hz
    short bSamples = 16;   // bits per sample

or nothing at all with these:

    short nChannels = 1;   // number of audio channels
    int sRate = 8000;      // sample rate in Hz
    short bSamples = 16;   // bits per sample

What confuses me is that, in the logcat output for the speech recognition task, I first see the PLAYBACK sample rate set to 44100 Hz:

    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK PCM format to S16_LE (Signed 16 bit Little Endian)
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Using 2 channels for PLAYBACK.
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK sample rate to 44100 HZ
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Buffer size: 2048
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Latency: 46439

and then aInfo.SampleRate = 8000 when it plays the file to send to the Google server:

    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::InitWavParser
    12-20 14:41:36.152: DEBUG/(2364): File open Succes
    12-20 14:41:36.152: DEBUG/(2364): File SEEK End Succes
    ...
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): Data Read = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = fmt 
    ...
    12-20 14:41:36.152: DEBUG/(2364): PVWAVPARSER_OK
    12-20 14:41:36.156: DEBUG/(2364): aInfo.AudioFormat = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumChannels = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.SampleRate = 8000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.ByteRate = 16000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BlockAlign = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BitsPerSample = 16
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BytesPerSample = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumSamples = 2258

So, how can I find the right parameters to save the audio buffer as a valid WAV file?

3 answers

#1


6  

You haven't included the code that actually writes out the PCM data, so it's hard to diagnose, but if you are hearing strange noises, the most likely cause is the wrong endianness when writing the data, or the wrong number of channels. Getting the sample rate wrong only makes the audio sound slower or faster; if it sounds completely garbled, the mistake is probably in the number of channels or the endianness of your byte stream.
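To see why a byte-order mistake produces noise rather than a merely distorted recording, here is a minimal sketch that decodes the same four bytes as 16-bit samples in both byte orders. The class name EndianDemo and the sample values are made up for illustration; only the standard java.nio classes are used.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianDemo {
    /** Decodes 16-bit PCM samples from raw bytes using the given byte order. */
    static short[] decode(byte[] pcm, ByteOrder order) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(order).asShortBuffer().get(samples);
        return samples;
    }

    public static void main(String[] args) {
        // Two very quiet samples (1 and -2) stored little-endian, as a WAV file expects.
        byte[] pcm = {0x01, 0x00, (byte) 0xFE, (byte) 0xFF};

        short[] right = decode(pcm, ByteOrder.LITTLE_ENDIAN);
        short[] wrong = decode(pcm, ByteOrder.BIG_ENDIAN);

        System.out.println(right[0] + ", " + right[1]); // prints "1, -2"
        System.out.println(wrong[0] + ", " + wrong[1]); // prints "256, -257"
    }
}
```

Read in the wrong order, the low byte of each sample becomes the high byte, so near-silent input turns into large, essentially random amplitudes.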

To know for sure, stream your bytes directly to a file without any header (raw PCM data). This rules out errors in writing the file header. Then use Audacity to import the raw data (File->Import->Raw Data...), experimenting with the different options (bit depth, endianness, channels) until you get an audio file that sounds correct (only one combination will be right).
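A headerless dump like this could be sketched as follows. This is only an illustration: RawPcmDump and append are hypothetical names, and it assumes each captured buffer arrives as a byte[] that you pass in yourself (for example from your onBufferReceived callback).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

final class RawPcmDump {
    /** Appends one captured buffer to the file exactly as received, with no header. */
    static void append(File out, byte[] buffer) throws IOException {
        // The second constructor argument (true) opens the file in append mode.
        try (FileOutputStream stream = new FileOutputStream(out, true)) {
            stream.write(buffer);
        }
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("capture", ".raw");
        append(out, new byte[] {0x01, 0x00, 0x02, 0x00}); // first buffer
        append(out, new byte[] {0x03, 0x00});             // second buffer
        System.out.println(out + ": " + out.length() + " bytes");
    }
}
```

Reopening the file for every buffer keeps the sketch short; for a long recording you would probably keep a single stream open and close it when recognition ends.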

Once you have identified your byte format this way, you only have to worry about whether you are setting the headers correctly. For the file format, you might want to refer to http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html. Or see these existing Java solutions for writing audio files: Java - reading, manipulating and writing WAV files, or FMJ, although I guess these might not be usable on Android.

If you have to roll your own WAV/RIFF writer, remember that Java's stream and buffer classes (DataOutputStream, and ByteBuffer by default) write multi-byte values in big-endian order, so any multi-byte primitives you write to your file must be written in reverse byte order to match RIFF's little-endianness.
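One way to avoid reversing bytes by hand is to let a ByteBuffer set to little-endian order do it. The sketch below builds the canonical 44-byte header for uncompressed PCM; the class and method names (WavWriter, header, wav) are made up for illustration, and it assumes plain PCM (format code 1) with no extra chunks.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WavWriter {
    /** Builds a 44-byte PCM WAV header describing dataLength bytes of audio. */
    static byte[] header(int sampleRate, int channels, int bitsPerSample, int dataLength) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second

        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataLength);            // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                         // length of the PCM fmt chunk
        b.putShort((short) 1);                // audio format 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataLength);                 // number of PCM bytes that follow
        return b.array();
    }

    /** Returns the header followed by the unmodified PCM bytes. */
    static byte[] wav(byte[] pcm, int sampleRate, int channels, int bitsPerSample) {
        byte[] head = header(sampleRate, channels, bitsPerSample, pcm.length);
        byte[] out = new byte[head.length + pcm.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(pcm, 0, out, head.length, pcm.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] file = wav(new byte[4516], 8000, 1, 16); // silent 8000 Hz mono clip
        System.out.println(file.length + " bytes");     // prints "4560 bytes"
    }
}
```

The chunk identifiers are plain ASCII bytes, so the byte order setting affects only the numeric fields. The PCM samples themselves are copied untouched, which is correct only if they are already little-endian.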

#2


2  

8000 Hz, little-endian, 16-bit PCM, mono did the trick.
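These values agree with the aInfo lines in the question's log. As a quick sanity check, the derived header fields can be recomputed from the three basic settings using the standard PCM relations. The class PcmFormat is a hypothetical helper, and the duration figure assumes that NumSamples in the log counts sample frames.

```java
final class PcmFormat {
    /** Bytes per sample frame: one sample for every channel. */
    static int blockAlign(int channels, int bitsPerSample) {
        return channels * bitsPerSample / 8;
    }

    /** Bytes of audio per second. */
    static int byteRate(int sampleRate, int channels, int bitsPerSample) {
        return sampleRate * blockAlign(channels, bitsPerSample);
    }

    /** Clip length in seconds for a given number of sample frames. */
    static double seconds(int numSamples, int sampleRate) {
        return (double) numSamples / sampleRate;
    }

    public static void main(String[] args) {
        System.out.println(blockAlign(1, 16));     // prints "2", matching aInfo.BlockAlign
        System.out.println(byteRate(8000, 1, 16)); // prints "16000", matching aInfo.ByteRate
        System.out.println(seconds(2258, 8000));   // roughly 0.28 seconds of audio
    }
}
```

With the first set of parameters from the question (44100 Hz, 2 channels, 16 bits) the same formulas give a block align of 4 and a byte rate of 176400, which does not match what the recognizer logged.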

#3


0  

In the latest version onBufferReceived does not work; you can use record/save audio from voice recognition intent instead.
