Changing the pitch of an audio file

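Goal
* Use SoundTouch to change the pitch of the PCM data decoded from a file, then save the result to a file.
* Breadth: besides SoundTouch, are there other audio-effect libraries that can change pitch?
I have not found a comparable open-source project so far, which suggests SoundTouch sets the bar in this area.
* Breadth: a related effect shows up when raw PCM is opened with the wrong parameters. Audition, for example, asks for the sample rate, channel count and sample format when it imports a headerless .pcm file. Declaring a higher sample rate or more channels than the data really has makes the audio sound higher-pitched and shorter. This is a reinterpretation of the same bytes, not real resampling.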
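The arithmetic behind this observation can be checked with a small C++ sketch. Raw PCM has no header, so the player computes duration as bytes / (sample rate x channels x bytes per sample) from whatever parameters it is told. The function names below are illustrative only and do not come from any library.

```cpp
#include <cassert>
#include <cstdint>

// Duration in seconds that `bytes` of headerless PCM lasts when it is played
// back with the given (declared) parameters.
double pcm_duration_seconds(std::uint64_t bytes, int sample_rate, int channels,
                            int bytes_per_sample) {
    assert(sample_rate > 0 && channels > 0 && bytes_per_sample > 0);
    double bytes_per_second =
        static_cast<double>(sample_rate) * channels * bytes_per_sample;
    return static_cast<double>(bytes) / bytes_per_second;
}

// How much faster the data is consumed when it is read with the declared
// parameters instead of the real ones. For a wrong sample rate this factor is
// also the pitch ratio.
double misread_speed_factor(int actual_rate, int actual_channels,
                            int declared_rate, int declared_channels) {
    assert(actual_rate > 0 && actual_channels > 0);
    return (static_cast<double>(declared_rate) * declared_channels) /
           (static_cast<double>(actual_rate) * actual_channels);
}
```

For example, 10 s of 44100 Hz stereo s16 is 1,764,000 bytes. Declared as 88200 Hz, or as 4 channels, the same bytes play for 5 s, which is a speed factor of 2.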

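Questions
* Given a local file path, decode it frame by frame to get PCM data.
* How do you decode frame by frame to get PCM data?
Use FFmpeg. The basic flow is: avformat_open_input -> avformat_find_stream_info -> avcodec_find_decoder + avcodec_open2 -> a loop of av_read_frame and avcodec_decode_audio4.
* Which FFmpeg version is this based on? Do the interfaces differ between versions?
This uses FFmpeg 3.0. The interfaces do differ between versions, and the naming in 3.0 seems more accurate than in 2.x. For example, 3.1 later deprecated AVStream::codec in favor of codecpar and added avcodec_send_packet/avcodec_receive_frame.
* Can the decoded PCM data be passed to SoundTouch directly?
No. SoundTouch expects interleaved samples of its SAMPLETYPE (short in this setup). FFmpeg decoders often output other formats, such as planar float, so the frames must be converted first, e.g. resampled to 44100 Hz, 2 channels, s16.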
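This C++ sketch shows the sample-format part of that conversion. It turns planar float (one array per channel, nominal range [-1, 1]) into interleaved 16-bit samples. FFmpeg's native MP3 and AAC decoders commonly output planar float (AV_SAMPLE_FMT_FLTP), but check frame->format for your own input.

The sketch is a simplified stand-in for one part of what swr_convert does:
* It does not change the sample rate or the channel count.
* Scaling by 32767 is an assumed convention, not necessarily the exact one libswresample uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert planar float samples (planes[c][i] = channel c, frame i) into
// interleaved signed 16-bit samples laid out as L0 R0 L1 R1 ...
// Only the format/layout changes; rate and channel count stay the same.
std::vector<std::int16_t> planar_float_to_interleaved_s16(
        const std::vector<std::vector<float> >& planes) {
    assert(!planes.empty());
    const std::size_t channels = planes.size();
    const std::size_t frames = planes[0].size();
    std::vector<std::int16_t> out(frames * channels);
    for (std::size_t c = 0; c < channels; ++c) {
        assert(planes[c].size() == frames);
        for (std::size_t i = 0; i < frames; ++i) {
            float v = planes[c][i];
            if (v > 1.0f) v = 1.0f;    // clip out-of-range input
            if (v < -1.0f) v = -1.0f;
            // scale to the int16 range and round to the nearest integer
            long s = std::lround(v * 32767.0f);
            out[i * channels + c] = static_cast<std::int16_t>(s);
        }
    }
    return out;
}
```

Interleaving matters because SoundTouch's putSamples takes a single pointer to interleaved frames, while a planar AVFrame keeps each channel in its own data[c] array.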
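* Create a SoundTouch instance and use it to change the pitch of the PCM.
* Where do you get the source or library?
Download the SoundTouch source from the official site, http://soundtouch.surina.net/.
* Should SoundTouch be used as a library or as source?
For this example, source is enough. After downloading SoundTouch, copy the files from its include and source/SoundTouch directories into the project, then adjust the build script. The Makefile in Assumptions expects them under soundtouch/include and soundtouch/source.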
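* How do you build SoundTouch for the Mac?
Compile the SoundTouch source files directly into the program; see Assumptions.
Rename soundtouch_config.h.in to soundtouch_config.h, and change #undef SOUNDTOUCH_INTEGER_SAMPLES to #define SOUNDTOUCH_INTEGER_SAMPLES 1.
This makes samples 16-bit fixed point (short), i.e. SAMPLETYPE becomes short instead of the default float. That matters because the s16 data from FFmpeg is passed to SoundTouch as SAMPLETYPE*.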
* How do you call SoundTouch?
Create an instance with new SoundTouch() and call its methods.
* How do you configure SoundTouch to change pitch, rate (speed) or tempo?
For example:
soundtouch->setPitch(0.5);
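* How do you process PCM data with SoundTouch?
First feed data in with putSamples, then fetch the processed data with receiveSamples. SoundTouch buffers internally, so the first several putSamples calls may produce nothing from receiveSamples. Call receiveSamples repeatedly until it returns 0. At the end of the input, call flush() so the samples still held inside are released.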
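The pitch value is a ratio, and a small C++ sketch makes the arithmetic concrete. In equal temperament a shift of n semitones is a ratio of 2^(n/12), so setPitch(0.5) means one octave (12 semitones) down. SoundTouch also provides setPitchSemiTones() and setPitchOctaves(), which take these units directly.

The combined_effect helper below is illustrative, not part of SoundTouch. It models how the three settings combine, following SoundTouch's own description of them:
* rate changes speed and pitch together;
* tempo changes only speed;
* pitch changes only pitch.

```cpp
#include <cassert>
#include <cmath>

// Equal-temperament pitch ratio for a shift of `semitones` (may be negative).
double semitones_to_ratio(double semitones) {
    return std::pow(2.0, semitones / 12.0);
}

// Inverse: how many semitones a pitch ratio corresponds to.
double ratio_to_semitones(double ratio) {
    assert(ratio > 0.0);
    return 12.0 * std::log2(ratio);
}

struct NetEffect {
    double duration_factor; // output length / input length
    double pitch_factor;    // output pitch / input pitch
};

// Net effect of setTempo/setRate/setPitch used together (illustrative model):
// speed is scaled by tempo * rate, pitch by rate * pitch.
NetEffect combined_effect(double tempo, double rate, double pitch) {
    assert(tempo > 0.0 && rate > 0.0 && pitch > 0.0);
    NetEffect e;
    e.duration_factor = 1.0 / (tempo * rate);
    e.pitch_factor = rate * pitch;
    return e;
}
```

With the values used in the reference code (tempo 0.5, rate 3.0, pitch 0.5), this model gives output about 0.67x as long and 1.5x higher in pitch. For a pure pitch change, only the pitch setting should differ from 1.0.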

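Position
* Decoding the file, calling SoundTouch and saving the output are all straightforward, judging from both experience and theory. In other words, the goal is easy to reach.
* In practice, estimating difficulty means looking not only at overall feasibility but also at the difficulty of the details. Many projects slip because the details were misjudged and the estimate was too optimistic.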

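Information and experience
* Decode with FFmpeg: first determine the container format and locate the audio stream, then enter the decoding loop.
* In the decoding loop, run each decoded frame of PCM through SoundTouch and write the PCM it outputs to the file.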
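The put/receive/flush pattern in that loop can be sketched in C++ without FFmpeg or SoundTouch. HypotheticalProcessor below is a made-up stand-in with the same call shape as SoundTouch, not SoundTouch itself. It does no audio processing: it only holds back a fixed number of mono samples, imitating SoundTouch's internal latency. The point is the loop structure: drain after every put, then flush and drain once more at end of input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL stand-in with the same put/receive/flush shape as SoundTouch.
// It holds back `latency` samples (mono) until flush() is called.
class HypotheticalProcessor {
public:
    explicit HypotheticalProcessor(std::size_t latency)
        : latency_(latency), flushing_(false) {}

    void putSamples(const short* samples, std::size_t n) {
        fifo_.insert(fifo_.end(), samples, samples + n);
    }

    // Copies up to `max` ready samples into `out`; returns how many were copied.
    std::size_t receiveSamples(short* out, std::size_t max) {
        std::size_t ready = fifo_.size() > latency_ ? fifo_.size() - latency_ : 0;
        if (flushing_) ready = fifo_.size();
        std::size_t n = std::min(ready, max);
        std::ptrdiff_t step = static_cast<std::ptrdiff_t>(n);
        std::copy(fifo_.begin(), fifo_.begin() + step, out);
        fifo_.erase(fifo_.begin(), fifo_.begin() + step);
        return n;
    }

    // After flush, everything still buffered becomes available.
    void flush() { flushing_ = true; }

private:
    std::size_t latency_;
    bool flushing_;
    std::vector<short> fifo_;
};

// Receive until nothing is ready, appending everything to `output`.
template <class Processor>
void drain(Processor& p, std::vector<short>& output) {
    short buf[256];
    std::size_t got;
    while ((got = p.receiveSamples(buf, 256)) > 0) {
        output.insert(output.end(), buf, buf + got);
    }
}

// Put each decoded chunk and drain what is ready; at end of input,
// flush and drain once more so the buffered tail is not lost.
template <class Processor>
std::vector<short> run_pipeline(Processor& p,
                                const std::vector<std::vector<short> >& chunks) {
    std::vector<short> output;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (!chunks[i].empty()) p.putSamples(&chunks[i][0], chunks[i].size());
        drain(p, output);
    }
    p.flush();
    drain(p, output);
    return output;
}
```

Consider a latency of 250 samples and three 100-sample chunks. Nothing comes out until the third chunk, and without the final flush only 50 of the 300 samples would reach the file.

Real SoundTouch differs from the stand-in in two ways:
* Its sample counts are per frame (all channels together).
* Its flush() may append a little silence at the end.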

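Assumptions
* The FFmpeg libraries are already built and ready to program against.
* How do you build FFmpeg for the Mac?
Write a build script that sets the configure options for trimming the build and specifies the x86_64 architecture.
FFmpeg 3.x autodetects and enables VideoToolbox (hardware decoding on iOS/macOS) and VDA (the older macOS Video Decode Acceleration framework). To avoid having to link extra system frameworks when building the program, you can disable both with --disable-videotoolbox --disable-vda.
* You know how to build, run and debug programs on the Mac.
* How do you build the program and link the FFmpeg libraries?
You can write a Makefile, for example: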
------------------------------
OUT=change_pcm_pitch
objs=change_pcm_pitch.cpp soundtouch/source/*.cpp
$(OUT):$(objs)
	g++ -o $(OUT) $(objs) -Iffmpeg/include/ -Lffmpeg/lib/ -Isoundtouch/include -lffmpeg -liconv -lz -lfdk-aac -lcrypto -lssl -g
.PHONY: clean
clean:
	rm -f $(OUT) *.o
------------------------------
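This assumes an FFmpeg build for the Mac has already been produced and copied into the project directory (./ffmpeg/). -lffmpeg assumes the FFmpeg static libraries were combined into a single libffmpeg; with a stock build, link -lavformat -lavcodec -lswresample -lavutil instead. The Makefile also assumes fdk-aac, iconv and the other listed libraries were enabled in the FFmpeg build. The -g option generates debug symbols. Recipe lines in a Makefile must start with a tab.
Note that with FFmpeg 3.x, if VideoToolbox hardware decoding is left enabled (it can be turned off with --disable-videotoolbox), the VideoToolbox framework must also be linked when building the program. To keep things simple, VideoToolbox and VDA were disabled when FFmpeg was built here.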
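* How do you debug?
Build with -g, then debug with gdb: run gdb change_pcm_pitch, set a breakpoint with b 20, and step and inspect with commands such as n and p. On recent macOS versions lldb is the default debugger, and it accepts the same b, n and p commands.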

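Basic concepts
* FFmpeg concepts.
* FFmpeg 3.x supports hardware video decoding on iOS. libavcodec contains videotoolbox.h, the header for the VideoToolbox-based decoder, which uses VideoToolbox components such as VTDecompressionSessionRef. Android MediaCodec decoding was added in FFmpeg 3.1.
* Do you need to set the probe size before calling avformat_find_stream_info?
Usually the defaults are enough for a local audio file. If a stream is not detected correctly, the probesize and analyzeduration options (AVFormatContext::probesize and max_analyze_duration) can be raised.
* SoundTouch concepts.
* File I/O concepts.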

Conclusion
* The goal can be reached; the hard parts are using the FFmpeg libraries and resampling.

Significance of the conclusion
* None

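Inference and questions
* How are decoding, the SoundTouch calls and saving the file connected?
While decoding, putSamples and receiveSamples are called on SoundTouch repeatedly, and the data received is written to the PCM file.
* Each element of this reasoning can be examined critically on its own, and the Conclusion section can keep being extended.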

-----------------------------------------------------------------------------------------
Reference code:

extern "C" {
#include "ffmpeg/include/libavcodec/avcodec.h"
#include "ffmpeg/include/libavformat/avformat.h"
#include "ffmpeg/include/libswresample/swresample.h"
#include "ffmpeg/include/libavutil/samplefmt.h"
#include "ffmpeg/include/libavutil/mathematics.h"
}
#include <cstdio>
#include <vector>
#include "SoundTouch.h"
using namespace soundtouch;

// Format fed to SoundTouch: 44100 Hz, stereo, interleaved s16.
// Assumes SoundTouch was built with SOUNDTOUCH_INTEGER_SAMPLES, so SAMPLETYPE is short.
static const int kTargetChannels = 2;
static const int kTargetRate = 44100;
static const enum AVSampleFormat kTargetFmt = AV_SAMPLE_FMT_S16;

// Write out everything SoundTouch has ready. receiveSamples() returns at most
// maxSamples per call, so keep calling it until it returns 0.
static void drain_soundtouch(SoundTouch* soundtouch, FILE* file) {
    const int kChunk = 4096; // sample frames per call
    std::vector<SAMPLETYPE> buf(kChunk * kTargetChannels);
    int got;
    while ((got = (int)soundtouch->receiveSamples(&buf[0], kChunk)) > 0) {
        fwrite(&buf[0], sizeof(SAMPLETYPE), got * kTargetChannels, file);
    }
}

// Convert one decoded frame to the target format if needed, then run it through SoundTouch.
static void process_frame(AVFrame* frame, SwrContext** swr, SoundTouch* soundtouch, FILE* file) {
    if (av_frame_get_channels(frame) == kTargetChannels && frame->sample_rate == kTargetRate
        && frame->format == kTargetFmt) {
        // Already packed s16 stereo at 44100 Hz: data[0] holds interleaved samples
        soundtouch->putSamples((const SAMPLETYPE*)frame->data[0], frame->nb_samples);
        drain_soundtouch(soundtouch, file);
        return;
    }
    if (*swr == NULL) {
        // Created once, from the first frame's parameters
        *swr = swr_alloc_set_opts(NULL,
            av_get_default_channel_layout(kTargetChannels), kTargetFmt, kTargetRate,
            av_get_default_channel_layout(av_frame_get_channels(frame)),
            (enum AVSampleFormat)frame->format, frame->sample_rate, 0, NULL);
        if (*swr == NULL || swr_init(*swr) < 0) {
            printf("swr_init failed\n");
            swr_free(swr);
            return;
        }
    }
    // Upsampling yields more output samples than input, and swr may still hold
    // samples from the previous call, so size the output buffer for both.
    int out_count = (int)av_rescale_rnd(swr_get_delay(*swr, frame->sample_rate) + frame->nb_samples,
                                        kTargetRate, frame->sample_rate, AV_ROUND_UP);
    uint8_t* out_buffer = NULL;
    if (av_samples_alloc(&out_buffer, NULL, kTargetChannels, out_count, kTargetFmt, 0) < 0) {
        return;
    }
    int converted = swr_convert(*swr, &out_buffer, out_count,
                                (const uint8_t**)frame->extended_data, frame->nb_samples);
    if (converted > 0) {
        soundtouch->putSamples((const SAMPLETYPE*)out_buffer, converted);
        drain_soundtouch(soundtouch, file);
    }
    av_freep(&out_buffer);
}

void change_pcm_pitch(const char* filepath) {
    av_register_all();
    av_log_set_level(AV_LOG_DEBUG);
    AVFormatContext* formatContext = NULL; // allocated by avformat_open_input
    AVCodecContext* codecContext = NULL;
    int audioindex = -1;
    bool success = false;
    if (avformat_open_input(&formatContext, filepath, NULL, NULL) == 0
        && avformat_find_stream_info(formatContext, NULL) >= 0) {
        for (unsigned int i = 0; i < formatContext->nb_streams; i++) {
            if (formatContext->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
                audioindex = (int)i;
                break;
            }
        }
        if (audioindex > -1) {
            // FFmpeg 3.0: decode with the stream's own AVCodecContext
            codecContext = formatContext->streams[audioindex]->codec;
            AVCodec* codec = avcodec_find_decoder(codecContext->codec_id);
            if (codec && avcodec_open2(codecContext, codec, NULL) == 0) {
                success = true;
            }
        }
    }
    if (success) {
        av_dump_format(formatContext, 0, filepath, 0);
        av_log(NULL, AV_LOG_DEBUG, "format and decoder opened, now decoding each frame\n");
        printf("sample_rate=%d, channels=%d\n", codecContext->sample_rate, codecContext->channels);
        SoundTouch* soundtouch = new SoundTouch();
        printf("soundtouch version=%s\n", soundtouch->getVersionString());
        // SoundTouch receives the resampled data, so configure it for the target
        // format rather than for the decoder's native rate and channel count.
        soundtouch->setSampleRate(kTargetRate);
        soundtouch->setChannels(kTargetChannels);
        soundtouch->setTempo(0.5); // tempo: 1.0 is normal, >1.0 faster, <1.0 slower; pitch is kept, the amount of PCM changes
        soundtouch->setRate(3.0);  // rate: 1.0 is normal; changes speed and pitch together
        soundtouch->setPitch(0.5); // pitch: 1.0 is normal; changes pitch without changing duration
        // Combined: duration x 1/(0.5*3.0) (about 0.67), pitch x 3.0*0.5 = 1.5.
        // For a pure pitch change, change only setPitch().
        AVFrame* frame = av_frame_alloc();
        SwrContext* swr = NULL;
        char outfile[512];
        snprintf(outfile, sizeof(outfile), "%s.pcm", filepath); // raw 44100 Hz / 2 ch / s16 output
        FILE* file = fopen(outfile, "wb");
        if (file && frame) {
            AVPacket packet;
            av_init_packet(&packet);
            packet.data = NULL;
            packet.size = 0;
            int status;
            while ((status = av_read_frame(formatContext, &packet)) >= 0) {
                if (packet.stream_index == audioindex) {
                    // One packet may contain several frames: advance a shallow copy through it
                    AVPacket pkt = packet;
                    while (pkt.size > 0) {
                        int gotframe = 0;
                        int consumed = avcodec_decode_audio4(codecContext, frame, &gotframe, &pkt);
                        if (consumed < 0) {
                            av_log(NULL, AV_LOG_WARNING, "decode failed, skipping the rest of this packet\n");
                            break;
                        }
                        if (gotframe) {
                            process_frame(frame, &swr, soundtouch, file);
                        }
                        if (consumed == 0) {
                            break; // no progress
                        }
                        consumed = FFMIN(consumed, pkt.size);
                        pkt.data += consumed;
                        pkt.size -= consumed;
                    }
                }
                av_packet_unref(&packet);
            }
            if (status != AVERROR_EOF) {
                av_log(NULL, AV_LOG_ERROR, "av_read_frame failed: %d\n", status);
            }
            // Release the samples SoundTouch is still holding.
            // (For brevity, the decoder and swr themselves are not flushed.)
            soundtouch->flush();
            drain_soundtouch(soundtouch, file);
        }
        if (file) {
            fclose(file);
        }
        av_frame_free(&frame);
        swr_free(&swr);
        delete soundtouch;
        avcodec_close(codecContext);
    }
    avformat_close_input(&formatContext);
}

int main(int argc, const char *argv[])
{
    // Input file from the command line, defaulting to test2.mp3
    const char* filepath = argc > 1 ? argv[1] : "test2.mp3";
    change_pcm_pitch(filepath);
    return 0;
}
-----------------------------------------------------------------------------------------

秒客网
