1. PaddleSpeech ASR speech-to-text transcription
Reference:
/PaddlePaddle/PaddleSpeech
After installing, you may hit numpy-related errors at runtime, likely because the python/numpy versions are too new; what finally worked for me was python 3.10 with numpy 1.22.0.
pip install paddlepaddle -i /pypi/simple
pip install paddlespeech
1) Code
By default the models are downloaded to C:\Users\\models
from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="")  # the first run automatically downloads the model
print(result)
### Punctuation restoration
!paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
## or
from paddlespeech.cli.text.infer import TextExecutor
text_punc = TextExecutor()
result = text_punc(text="今天的天气真不错啊你下午有空吗我想约你一起去吃饭")
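The two steps chain naturally: feed the ASR output straight into the punctuation model. A minimal sketch using the two executors above (the audio path is illustrative):

from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.text.infer import TextExecutor

asr = ASRExecutor()
text_punc = TextExecutor()

raw_text = asr(audio_file="demo.wav")  # plain transcript, no punctuation
print(text_punc(text=raw_text))        # same transcript with punctuation restored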
2) Real-time speech transcription
References: /chenkui164/p/
/PaddlePaddle/PaddleSpeech/blob/develop/demos/streaming_asr_server/
/PaddlePaddle/PaddleSpeech/tree/develop/demos/streaming_asr_server/web
paddlespeech_server stats --task asr  ## lists the supported models; to switch models, edit the corresponding yaml file
## First start the ASR server
# Start the streaming speech recognition service
cd PaddleSpeech/demos/streaming_asr_server
paddlespeech_server start --config_file conf/ws_conformer_wenetspeech_application_faster.yaml
Once it is running, test it with the web page under \demos\streaming_asr_server\web\.
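The server can also be tested from the command line with the client bundled with paddlespeech. A sketch, assuming the service listens on the port set in the yaml (8090 in the default demo config) and an illustrative 16 kHz wav file:

paddlespeech_client asr_online --server_ip 127.0.0.1 --port 8090 --input zh.wav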
2. Alibaba FunASR
/alibaba-damo-academy/FunASR/blob/main/runtime/docs/SDK_advanced_guide_online_zh.md
In my tests it was not particularly fast and English recognition was not very accurate; its strength is built-in punctuation and sentence segmentation.
Run the service directly with Docker:
## 1. Pull the image
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2
mkdir -p ./funasr-runtime-resources/models
## 2. Run the container, then start the service inside it
sudo docker run -p 10095:10095 -it --privileged=true -v ./funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2
## Start the service; the models are downloaded automatically. This launches the funasr-wss-server-2pass program
cd FunASR/funasr/runtime
nohup bash run_server_2pass.sh \
--download-model-dir /workspace/models \
--vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
--model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx \
--online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx \
--punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
  --itn-dir thuduj12/fst_itn_zh > log.txt 2>&1 &
# To disable ssl, add the argument: --certfile 0
# To deploy with the timestamp or hotword model, set --model-dir to the corresponding model:
# damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx (timestamps)
# or damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx (hotwords)
## 3. Run the client (python, C++, html web page, Java, and C# versions available); code download: wget /ics/MaaS/ASR/sample/funasr_samples.
## Run the python script
python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
## The server steps above can be folded into a single command (nohup must be dropped here, otherwise it won't start):
docker run -p 10095:10095 -d --privileged=true -v D:\funasr-runtime-resources\models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2 /bin/bash -c "cd /workspace/FunASR/funasr/runtime && bash run_server_2pass.sh --download-model-dir /workspace/models --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx --itn-dir thuduj12/fst_itn_zh "
With docker image versions funasr:funasr-runtime-sdk-online-cpu 0.1.6, 0.1.7, or 0.1.8, the container needs a while after starting before the service is ready (running the client before the service is up may raise ConnectionResetError):
docker run -p 10095:10095 -d --privileged=true -v D:\funasr-runtime-resources\models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.2 /bin/bash -c "cd /workspace/FunASR/funasr/runtime && bash run_server_2pass.sh --download-model-dir /workspace/models --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx --itn-dir thuduj12/fst_itn_zh "
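To tell when the service inside the container has finished downloading models and is actually listening, you can tail the container's log (the container id placeholder below is whatever docker ps prints):

docker logs -f <container_id>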
Then run the client:
(code download: wget /ics/MaaS/ASR/sample/funasr_samples.)
Code: /alibaba-damo-academy/FunASR/blob/main/runtime/python/websocket/funasr_wss_client.py
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass
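Besides streaming from the microphone, the same websocket client can transcribe a local file via its --audio_in argument (argument name per the FunASR client script; the file path is illustrative):

python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode 2pass --audio_in demo.wav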
3. sherpa real-time speech transcription
1) ncnn version
Reference: /k2-fsa/sherpa-ncnn
/video/BV1K44y197Fg
Version: sherpa-ncnn-2.1.7
Install:
pip install sherpa-ncnn sounddevice -i /pypi/simple
Downloads:
a. Clone the project: git clone /k2-fsa/
b. Download the model:
/marcoyang/sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23
Download all 7 files there (tokens.txt plus the encoder/decoder/joiner .param and .bin pairs referenced in the code below).
a-1. Real-time microphone transcription
/k2-fsa/sherpa-ncnn/blob/master/python-api-examples/
/sherpa/ncnn/python/#start-recording
#!/usr/bin/env python3
# Real-time speech recognition from a microphone with sherpa-ncnn Python API
#
# Please refer to
# /sherpa/ncnn/pretrained_models/
# to download pre-trained models
import sys

try:
    import sounddevice as sd
except ImportError as e:
    print("Please install sounddevice first. You can use")
    print()
    print("  pip install sounddevice")
    print()
    print("to install it")
    sys.exit(-1)

import sherpa_ncnn


def create_recognizer():
    # Please replace the model files if needed.
    # See /sherpa/ncnn/pretrained_models/
    # for download links.
    recognizer = sherpa_ncnn.Recognizer(
        tokens="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/tokens.txt",
        encoder_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.param",
        encoder_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.bin",
        decoder_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.param",
        decoder_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.bin",
        joiner_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.param",
        joiner_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.bin",
        num_threads=4,
    )
    return recognizer


def main():
    print("Started! Please speak")
    recognizer = create_recognizer()
    sample_rate = recognizer.sample_rate
    samples_per_read = int(0.1 * sample_rate)  # 0.1 second = 100 ms
    last_result = ""
    with sd.InputStream(channels=1, dtype="float32", samplerate=sample_rate) as s:
        while True:
            samples, _ = s.read(samples_per_read)  # a blocking read
            samples = samples.reshape(-1)
            recognizer.accept_waveform(sample_rate, samples)
            result = recognizer.text
            if last_result != result:
                last_result = result
                print("\r{}".format(result), end="", flush=True)


if __name__ == "__main__":
    devices = sd.query_devices()
    print(devices)
    default_input_device_idx = sd.default.device[0]
    print(f'Use default device: {devices[default_input_device_idx]["name"]}')
    try:
        main()
    except KeyboardInterrupt:
        print("\nCaught Ctrl + C. Exiting")
** Tweak the result printing to avoid duplicated output: print only the newly appended text on each loop iteration, instead of re-printing everything already recognized. Replace the if last_result != result: block with the following (initialize i = 0 before the while loop; this assumes the recognized text only grows within a segment):
if last_result != result:
    if i == 0:
        print("{}".format(result), end="")
        last_result = result
        i = i + 1
    else:
        last_result_len = len(last_result)
        new_word = result[last_result_len:]
        # print(last_result, result, new_word)
        print("{}".format(new_word), end="", flush=True)
        last_result = result
a-2. Real-time microphone transcription with endpoint detection (sentence breaks)
Reference: /k2-fsa/sherpa-ncnn/blob/master/python-api-examples/
#!/usr/bin/env python3
# Real-time speech recognition from a microphone with sherpa-ncnn Python API
# with endpoint detection.
#
# Please refer to
# https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html
# to download pre-trained models
import sys
try:
import sounddevice as sd
except ImportError as e:
print("Please install sounddevice first. You can use")
print()
print(" pip install sounddevice")
print()
print("to install it")
sys.exit(-1)
import sherpa_ncnn
# def create_recognizer():
# # Please replace the model files if needed.
# # See https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html
# # for download links.
# recognizer = sherpa_ncnn.Recognizer(
# tokens="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/tokens.txt",
# encoder_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.param",
# encoder_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.bin",
# decoder_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.param",
# decoder_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.bin",
# joiner_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.param",
# joiner_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.bin",
# num_threads=4,
# decoding_method="modified_beam_search",
# enable_endpoint_detection=True,
# rule1_min_trailing_silence=2.4,
# rule2_min_trailing_silence=1.2,
# rule3_min_utterance_length=300,
# )
# return recognizer
def create_recognizer():
# Please replace the model files if needed.
# See https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html
# for download links.
# base_file = "sherpa-ncnn-conv-emformer-transducer-2022-12-06"
# base_file = "sherpa-ncnn-lstm-transducer-small-2023-02-13"
base_file = "sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13"
# base_file = "sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16"
# base_file = "sherpa-ncnn-streaming-zipformer-20M-2023-02-17"
recognizer = sherpa_ncnn.Recognizer(
tokens="./{}/tokens.txt".format(base_file),
encoder_param="./{}/encoder_jit_trace-pnnx.ncnn.param".format(base_file),
encoder_bin="./{}/encoder_jit_trace-pnnx.ncnn.bin".format(base_file),
decoder_param="./{}/decoder_jit_trace-pnnx.ncnn.param".format(base_file),
decoder_bin="./{}/decoder_jit_trace-pnnx.ncnn.bin".format(base_file),
joiner_param="./{}/joiner_jit_trace-pnnx.ncnn.param".format(base_file),
joiner_bin="./{}/joiner_jit_trace-pnnx.ncnn.bin".format(base_file),
num_threads=4,
decoding_method="modified_beam_search",
enable_endpoint_detection=True,
rule1_min_trailing_silence=2.4,
rule2_min_trailing_silence=1.2,
rule3_min_utterance_length=300,
hotwords_file="",
hotwords_score=1.5,
)
return recognizer
def main():
print("Started! Please speak")
recognizer = create_recognizer()
sample_rate = recognizer.sample_rate
samples_per_read = int(0.1 * sample_rate) # 0.1 second = 100 ms
last_result = ""
segment_id = 0
with sd.InputStream(channels=1, dtype="float32", samplerate=sample_rate) as s:
while True:
samples, _ = s.read(samples_per_read) # a blocking read
samples = samples.reshape(-1)
recognizer.accept_waveform(sample_rate, samples)
is_endpoint = recognizer.is_endpoint
result = recognizer.text
if result and (last_result != result):
last_result = result
print("\r{}:{}".format(segment_id, result), end="", flush=True)
if is_endpoint:
if result:
print("\r{}:{}".format(segment_id, result), flush=True)
segment_id += 1
recognizer.reset()
if __name__ == "__main__":
devices = sd.query_devices()
print(devices)
default_input_device_idx = sd.default.device[0]
print(f'Use default device: {devices[default_input_device_idx]["name"]}')
try:
main()
except KeyboardInterrupt:
print("\nCaught Ctrl + C. Exiting")
2) onnx version (recommended)
References: /sherpa/onnx/python/
/k2-fsa/sherpa-onnx/blob/master/python-api-examples/
Install:
pip install sherpa-onnx
a. Latest model (recommended)
Model download: /k2-fsa/sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tree/main
Run:
python .\speech-recognition-from-microphone-onnx.py --tokens=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tokens.txt --encoder=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/encoder-epoch-20-avg-1-chunk-16-left-128.onnx --decoder=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/decoder-epoch-20-avg-1-chunk-16-left-128.onnx --joiner=sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/joiner-epoch-20-avg-1-chunk-16-left-128.onnx
## Code
#!/usr/bin/env python3
# Real-time speech recognition from a microphone with sherpa-onnx Python API
#
# Please refer to
# https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
# to download pre-trained models
import argparse
import sys
from pathlib import Path
from typing import List
try:
import sounddevice as sd
except ImportError:
print("Please install sounddevice first. You can use")
print()
print(" pip install sounddevice")
print()
print("to install it")
sys.exit(-1)
import sherpa_onnx
def assert_file_exists(filename: str):
assert Path(filename).is_file(), (
f"{filename} does not exist!\n"
"Please refer to "
"https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html to download it"
)
def get_args():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter
)
parser.add_argument(
"--tokens",
type=str,
required=True,
help="Path to tokens.txt",
)
parser.add_argument(
"--encoder",
type=str,
required=True,
help="Path to the encoder model",
)
parser.add_argument(
"--decoder",
type=str,
required=True,
help="Path to the decoder model",
)
parser.add_argument(
"--joiner",
type=str,
help="Path to the joiner model",
)
parser.add_argument(
"--decoding-method",
type=str,
default="greedy_search",
help="Valid values are greedy_search and modified_beam_search",
)
parser.add_argument(
"--max-active-paths",
type=int,
default=4,
help="""Used only when --decoding-method is modified_beam_search.
It specifies number of active paths to keep during decoding.
""",
)
parser.add_argument(
"--provider",
type=str,
default="cpu",
help="Valid values: cpu, cuda, coreml",
)
parser.add_argument(
"--hotwords-file",
type=str,
default="",
help="""
The file containing hotwords, one words/phrases per line, and for each
phrase the bpe/cjkchar are separated by a space. For example:
▁HE LL O ▁WORLD
你 好 世 界
""",
)
parser.add_argument(
"--hotwords-score",
type=float,
default=1.5,
help="""
The hotword score of each token for biasing word/phrase. Used only if
--hotwords-file is given.
""",
)
parser.add_argument(
"--blank-penalty",
type=float,
default=0.0,
help="""
The penalty applied on blank symbol during decoding.
Note: It is a positive value that would be applied to logits like
this `logits[:, 0] -= blank_penalty` (suppose logits.shape is
[batch_size, vocab] and blank id is 0).
""",
)
return parser.parse_args()
def create_recognizer(args):
assert_file_exists(args.encoder)
assert_file_exists(args.decoder)
assert_file_exists(args.joiner)
assert_file_exists(args.tokens)
# Please replace the model files if needed.
# See https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
# for download links.
recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
tokens=args.tokens,
encoder=args.encoder,
decoder=args.decoder,
joiner=args.joiner,
num_threads=1,
sample_rate=16000,
feature_dim=80,
decoding_method=args.decoding_method,
max_active_paths=args.max_active_paths,
provider=args.provider,
hotwords_file=args.hotwords_file,
hotwords_score=args.hotwords_score,
blank_penalty=args.blank_penalty,
)
return recognizer
def main():
args = get_args()
devices = sd.query_devices()
if len(devices) == 0:
print("No microphone devices found")
sys.exit(0)
print(devices)
default_input_device_idx = sd.default.device[0]
print(f'Use default device: {devices[default_input_device_idx]["name"]}')
recognizer = create_recognizer(args)
print("Started! Please speak")
# The model is using 16 kHz, we use 48 kHz here to demonstrate that
# sherpa-onnx will do resampling inside.
sample_rate = 48000
samples_per_read = int(0.1 * sample_rate) # 0.1 second = 100 ms
last_result = ""
stream = recognizer.create_stream()
with sd.InputStream(channels=1, dtype="float32", samplerate=sample_rate) as s:
while True:
samples, _ = s.read(samples_per_read) # a blocking read
samples = samples.reshape(-1)
stream.accept_waveform(sample_rate, samples)
while recognizer.is_ready(stream):
recognizer.decode_stream(stream)
result = recognizer.get_result(stream)
if last_result != result:
last_result = result
print("\r{}".format(result), end="", flush=True)
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("\nCaught Ctrl + C. Exiting")
b. Download the model:
/csukuangfj/sherpa-onnx-streaming-conformer-zh-2023-05-23/tree/main
Code:
Run: python ./ --tokens=./sherpa-onnx-streaming-conformer-zh-2023-05-23/ --encoder=./sherpa-onnx-streaming-conformer-zh-2023-05-23/ --decoder=./sherpa-onnx-streaming-conformer-zh-2023-05-23/ --joiner=./sherpa-onnx-streaming-conformer-zh-2023-05-23/
#!/usr/bin/env python3
# Real-time speech recognition from a microphone with sherpa-onnx Python API
#
# Please refer to
# /sherpa/onnx/pretrained_models/
# to download pre-trained models
import argparse
import sys
from pathlib import Path

try:
    import sounddevice as sd
except ImportError:
    print("Please install sounddevice first. You can use")
    print()
    print("  pip install sounddevice")
    print()
    print("to install it")
    sys.exit(-1)

import sherpa_onnx


def assert_file_exists(filename: str):
    assert Path(filename).is_file(), (
        f"{filename} does not exist!\n"
        "Please refer to "
        "/sherpa/onnx/pretrained_models/ to download it"
    )


def get_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "--tokens",
        type=str,
        help="Path to tokens.txt",
    )
    parser.add_argument(
        "--encoder",
        type=str,
        help="Path to the encoder model",
    )
    parser.add_argument(
        "--decoder",
        type=str,
        help="Path to the decoder model",
    )
    parser.add_argument(
        "--joiner",
        type=str,
        help="Path to the joiner model",
    )
    parser.add_argument(
        "--decoding-method",
        type=str,
        default="greedy_search",
        help="Valid values are greedy_search and modified_beam_search",
    )
    return parser.parse_args()


def create_recognizer():
    args = get_args()
    assert_file_exists(args.tokens)
    assert_file_exists(args.encoder)
    assert_file_exists(args.decoder)
    assert_file_exists(args.joiner)
    # Please replace the model files if needed.
    # See /sherpa/onnx/pretrained_models/
    # for download links.
    recognizer = sherpa_onnx.OnlineRecognizer(
        tokens=args.tokens,
        encoder=args.encoder,
        decoder=args.decoder,
        joiner=args.joiner,
        num_threads=1,
        sample_rate=16000,
        feature_dim=80,
        decoding_method=args.decoding_method,
    )
    return recognizer


def main():
    recognizer = create_recognizer()
    print("Started! Please speak")
    # The model is using 16 kHz, we use 48 kHz here to demonstrate that
    # sherpa-onnx will do resampling inside.
    sample_rate = 48000
    samples_per_read = int(0.1 * sample_rate)  # 0.1 second = 100 ms
    last_result = ""
    stream = recognizer.create_stream()
    with sd.InputStream(channels=1, dtype="float32", samplerate=sample_rate) as s:
        while True:
            samples, _ = s.read(samples_per_read)  # a blocking read
            samples = samples.reshape(-1)
            stream.accept_waveform(sample_rate, samples)
            while recognizer.is_ready(stream):
                recognizer.decode_stream(stream)
            result = recognizer.get_result(stream)
            if last_result != result:
                last_result = result
                print("\r{}".format(result), end="", flush=True)


if __name__ == "__main__":
    devices = sd.query_devices()
    print(devices)
    default_input_device_idx = sd.default.device[0]
    print(f'Use default device: {devices[default_input_device_idx]["name"]}')
    try:
        main()
    except KeyboardInterrupt:
        print("\nCaught Ctrl + C. Exiting")
3) Offline transcription of wav files
Note: if the local audio is not 256 kbps it needs to be converted first. Bitrate is the number of bits per second in an audio or video file; for audio it is given in kbps (kilobits per second), and a higher bitrate generally means better quality at the cost of a larger file. Here 256 kbps is simply 16000 Hz x 16 bit x 1 channel = 256000 bits per second, i.e. 16 kHz mono 16-bit PCM, which is what these models expect.
Also, for installing sox on Windows see: /yyy430/article/details/88408273
sox input.wav -r 16k -c 1 output.wav  # file names illustrative
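Before converting, you can check whether a wav file already matches this format (16 kHz, mono, 16-bit). A quick sketch using Python's built-in wave module (file name illustrative):

import wave

with wave.open("test.wav") as f:
    print("sample rate :", f.getframerate())   # want 16000
    print("channels    :", f.getnchannels())   # want 1
    print("sample width:", f.getsampwidth())   # want 2 bytes = 16-bit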
## Official code
#!/usr/bin/env python3
"""
This file demonstrates how to use sherpa-ncnn Python API to recognize
a single file.
Please refer to
/sherpa/ncnn/
to install sherpa-ncnn and to download the pre-trained models
used in this file.
"""
import time
import wave

import numpy as np
import sherpa_ncnn


def main():
    # Please refer to /sherpa/ncnn/
    # to download the model files
    # recognizer = sherpa_ncnn.Recognizer(
    #     tokens="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/tokens.txt",
    #     encoder_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.param",
    #     encoder_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/encoder_jit_trace-pnnx.ncnn.bin",
    #     decoder_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.param",
    #     decoder_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/decoder_jit_trace-pnnx.ncnn.bin",
    #     joiner_param="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.param",
    #     joiner_bin="./sherpa-ncnn-conv-emformer-transducer-2022-12-06/joiner_jit_trace-pnnx.ncnn.bin",
    #     num_threads=4,
    # )
    base_file = "sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13"
    # base_file = "sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16"
    # base_file = "sherpa-ncnn-streaming-zipformer-20M-2023-02-17"
    recognizer = sherpa_ncnn.Recognizer(
        tokens="./{}/tokens.txt".format(base_file),
        encoder_param="./{}/encoder_jit_trace-pnnx.ncnn.param".format(base_file),
        encoder_bin="./{}/encoder_jit_trace-pnnx.ncnn.bin".format(base_file),
        decoder_param="./{}/decoder_jit_trace-pnnx.ncnn.param".format(base_file),
        decoder_bin="./{}/decoder_jit_trace-pnnx.ncnn.bin".format(base_file),
        joiner_param="./{}/joiner_jit_trace-pnnx.ncnn.param".format(base_file),
        joiner_bin="./{}/joiner_jit_trace-pnnx.ncnn.bin".format(base_file),
        num_threads=4,
    )
    filename = r"D:\sound\test.wav"  # file name illustrative; it was elided in the original
    with wave.open(filename) as f:
        # Note: If wave_file_sample_rate is different from
        # recognizer.sample_rate, we will do resampling inside sherpa-ncnn
        wave_file_sample_rate = f.getframerate()
        num_channels = f.getnchannels()
        assert f.getsampwidth() == 2, f.getsampwidth()  # it is in bytes
        num_samples = f.getnframes()
        samples = f.readframes(num_samples)
        samples_int16 = np.frombuffer(samples, dtype=np.int16)
        samples_int16 = samples_int16.reshape(-1, num_channels)[:, 0]
        samples_float32 = samples_int16.astype(np.float32)
        samples_float32 = samples_float32 / 32768
    # simulate streaming
    chunk_size = int(0.1 * wave_file_sample_rate)  # 0.1 seconds
    start = 0
    while start < samples_float32.shape[0]:
        end = start + chunk_size
        end = min(end, samples_float32.shape[0])
        recognizer.accept_waveform(wave_file_sample_rate, samples_float32[start:end])
        start = end
        text = recognizer.text
        if text:
            print(text)
        # simulate streaming by sleeping
        time.sleep(0.1)
    tail_paddings = np.zeros(int(wave_file_sample_rate * 0.5), dtype=np.float32)
    recognizer.accept_waveform(wave_file_sample_rate, tail_paddings)
    recognizer.input_finished()
    text = recognizer.text
    if text:
        print(text)


if __name__ == "__main__":
    main()
4) Alternatively, use ffmpeg to read mp4/wav files offline or an rtsp stream over the network; this version I put together myself is the one I recommend
import subprocess

import numpy as np
import sounddevice as sd
from sklearn.preprocessing import MinMaxScaler

import sherpa_ncnn


def create_recognizer():
    # Please replace the model files if needed.
    # See /sherpa/ncnn/pretrained_models/
    # for download links.
    # base_file = "sherpa-ncnn-conv-emformer-transducer-2022-12-06"
    # base_file = "sherpa-ncnn-lstm-transducer-small-2023-02-13"
    base_file = r"D:\llm\sherpa*******mples\sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13"
    # base_file = "sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16"
    # base_file = "sherpa-ncnn-streaming-zipformer-20M-2023-02-17"
    recognizer = sherpa_ncnn.Recognizer(
        tokens="{}\\tokens.txt".format(base_file),
        encoder_param="{}\\encoder_jit_trace-pnnx.ncnn.param".format(base_file),
        encoder_bin="{}\\encoder_jit_trace-pnnx.ncnn.bin".format(base_file),
        decoder_param="{}\\decoder_jit_trace-pnnx.ncnn.param".format(base_file),
        decoder_bin="{}\\decoder_jit_trace-pnnx.ncnn.bin".format(base_file),
        joiner_param="{}\\joiner_jit_trace-pnnx.ncnn.param".format(base_file),
        joiner_bin="{}\\joiner_jit_trace-pnnx.ncnn.bin".format(base_file),
        num_threads=4,
    )
    return recognizer


print("Started! Please speak")
recognizer = create_recognizer()
# sample_rate = recognizer.sample_rate
# samples_per_read = int(0.1 * sample_rate)  # 0.1 second = 100 ms

# URL of the audio source (wav/mp4 files and rtsp streams all work)
# url = "your_rtsp_url"
# url = r'D:\sound\'
url = r"D:\sound\222.mp4"

# FFmpeg arguments: decode the input to 16 kHz mono 16-bit PCM on stdout
ffmpeg_cmd = [
    "ffmpeg",
    "-i", url,
    "-f", "s16le",
    "-acodec", "pcm_s16le",
    "-ar", "16000",
    "-ac", "1",
    "-",
]

# Start the FFmpeg process
process = subprocess.Popen(
    ffmpeg_cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,  # the original value was elided; DEVNULL discards ffmpeg's log output
    bufsize=1600,
)

# Sample rate, channel count, and number of samples per read of the audio stream
sample_rate = 16000
channels = 1
frames_per_read = 1600  # 1600 samples at 16 kHz = 100 ms
last_result = ""
i = 0

# Read and process the audio data
while True:
    # Read raw audio bytes from the FFmpeg process
    data = process.stdout.read(frames_per_read * channels * 2)  # 16 bits per sample, hence * 2
    if not data:
        break
    # Convert the bytes to a numpy array
    samples = np.frombuffer(data, dtype=np.int16)
    samples = samples.astype(np.float32)
    # samples = MinMaxScaler(feature_range=(-1, 1)).fit_transform(samples.reshape(-1, 1))
    samples /= 32768.0  # normalize to the [-1, 1] range

    # Feed the audio into the recognizer and print only the newly appended text
    recognizer.accept_waveform(sample_rate, samples)
    result = recognizer.text
    # print("result:", result, "last_result:", last_result)
    if last_result != result:
        if i == 0:
            print("{}".format(result), end="")
            last_result = result
            i = i + 1
        else:
            last_result_len = len(last_result)
            new_word = result[last_result_len:]
            # print(last_result, result, new_word)
            print("{}".format(new_word), end="", flush=True)
            last_result = result

# Shut down the FFmpeg process
process.stdout.close()
process.wait()
5) Reading the local microphone in real time with ffmpeg
ffmpeg -list_devices true -f dshow -i dummy lists the dshow devices (including microphones) available on the local machine.
import subprocess

import numpy as np
import sounddevice as sd
from sklearn.preprocessing import MinMaxScaler

import sherpa_ncnn


def create_recognizer():
    # Please replace the model files if needed.
    # See /sherpa/ncnn/pretrained_models/
    # for download links.
    # base_file = "sherpa-ncnn-conv-emformer-transducer-2022-12-06"
    # base_file = "sherpa-ncnn-lstm-transducer-small-2023-02-13"
    base_file = r"D:\llm\sherpa-ncnn-master\sherpa-ncnn-master\python-api-examples\sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13"
    # base_file = "sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16"
    # base_file = "sherpa-ncnn-streaming-zipformer-20M-2023-02-17"
    recognizer = sherpa_ncnn.Recognizer(
        tokens="{}\\tokens.txt".format(base_file),
        encoder_param="{}\\encoder_jit_trace-pnnx.ncnn.param".format(base_file),
        encoder_bin="{}\\encoder_jit_trace-pnnx.ncnn.bin".format(base_file),
        decoder_param="{}\\decoder_jit_trace-pnnx.ncnn.param".format(base_file),
        decoder_bin="{}\\decoder_jit_trace-pnnx.ncnn.bin".format(base_file),
        joiner_param="{}\\joiner_jit_trace-pnnx.ncnn.param".format(base_file),
        joiner_bin="{}\\joiner_jit_trace-pnnx.ncnn.bin".format(base_file),
        num_threads=4,
    )
    return recognizer


print("Started! Please speak")
recognizer = create_recognizer()
# sample_rate = recognizer.sample_rate
# samples_per_read = int(0.1 * sample_rate)  # 0.1 second = 100 ms

# URL of a remote RTSP audio stream (unused here; the dshow microphone input below is used instead)
# url = "your_rtsp_url"
# url = r'D:\sound\'
# url = r'D:\sound\222.mp4'
url = "rtsp://admin:jc123456@192.168.63.88/Streaming/Channels/2?tcp"
# FFmpeg arguments
# ffmpeg_cmd = [
#     "ffmpeg",
#     "-i", url,
#     "-f", "s16le",
#     "-acodec", "pcm_s16le",
#     "-ar", "16000",
#     "-ac", "1",
#     "-",
# ]
ffmpeg_cmd = [
    "ffmpeg",
    "-f", "dshow",  # use dshow (DirectShow) audio capture on Windows
    "-i", "audio=麦克风阵列 (适用于数字麦克风的英特尔® 智音技术)",  # the default audio input device (microphone)
    "-f", "s16le",
    "-acodec", "pcm_s16le",
    "-ar", "16000",
    "-ac", "1",
    "-",
]

# Start the FFmpeg process
process = subprocess.Popen(
    ffmpeg_cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,  # the original value was elided; DEVNULL discards ffmpeg's log output
    bufsize=1600,
)

# Sample rate, channel count, and number of samples per read of the audio stream
sample_rate = 16000
channels = 1
frames_per_read = 1600  # 1600 samples at 16 kHz = 100 ms
last_result = ""
i = 0

# Read and process the audio data
while True:
    # Read raw audio bytes from the FFmpeg process
    data = process.stdout.read(frames_per_read * channels * 2)  # 16 bits per sample, hence * 2
    if not data:
        break
    # Convert the bytes to a numpy array
    samples = np.frombuffer(data, dtype=np.int16)
    samples = samples.astype(np.float32)
    # samples = MinMaxScaler(feature_range=(-1, 1)).fit_transform(samples.reshape(-1, 1))
    samples /= 32768.0  # normalize to the [-1, 1] range

    # Feed the audio into the recognizer and print only the newly appended text
    recognizer.accept_waveform(sample_rate, samples)
    result = recognizer.text
    # print("result:", result, "last_result:", last_result)
    if last_result != result:
        if i == 0:
            print("{}".format(result), end="")
            last_result = result
            i = i + 1
        else:
            last_result_len = len(last_result)
            new_word = result[last_result_len:]
            # print(last_result, result, new_word)
            print("{}".format(new_word), end="", flush=True)
            last_result = result

# Shut down the FFmpeg process
process.stdout.close()
process.wait()