I need to be able to play an MP3 file at different speeds without affecting the pitch (or to correct the pitch after speeding up, whichever works). The transition between speeds also needs to be as seamless as possible.
Obviously there are apps that do this, so it is possible, but it does not seem to be possible with the MediaPlayer API, and the SoundPool API can only change pitch and rate together (unless I am missing something).
Any ideas on how to achieve this? Are there any APIs or 3rd party libraries that could help?
Thanks.
1 Solution
#1
1
There is a general technique called Time Scale Modification (TSM) that can do this. Here's an available tool, which I haven't evaluated: http://sourceforge.net/projects/mffmtimescale/.
If you zoom in on a timeline of audio, it looks a lot like an old heartbeat monitor: a wiggly pattern of peaks and valleys. For vowels the pattern is quasi-stationary, which roughly means it's repetitive, like a healthy heartbeat pulse. A single "ahhhh" vowel sound may repeat its pattern 3-7 times in normal speech. To speed audio up, a TSM algorithm deletes some of those repetitions, and needs to use a filter to smooth the artifacts introduced by clipping/joining imperfect repetitions. Empty spaces can be reduced as well, but care needs to be taken not to delete all empty space: in English the word "football" actually has a gap between "foot" and "ball" (say it slowly out loud). TSM can also do the reverse, slowing audio down by pumping in empty space at the right spots or adding pitch-period repetitions to vowels. This all adds up to something fairly complex and somewhat language dependent that requires a lot of tuning, which for most applications means you won't want to develop your own.
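To make the "delete some repetitions" idea concrete, here is a minimal Python sketch. It is an illustration under strong assumptions, not how any particular TSM tool works: the pitch period is assumed to be known and constant (real algorithms estimate it continuously), the input is a synthetic tone rather than decoded MP3 data, and the names `sine` and `speed_up` are made up for this example. After every `keep` periods it merges the next two periods into one with a linear crossfade, which stands in for the smoothing filter mentioned above.

```python
import math

def sine(freq_hz, n_samples, rate_hz=8000):
    """Synthetic test tone standing in for a sustained vowel (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n_samples)]

def speed_up(samples, period, keep=2):
    """Shorten audio by merging pitch periods instead of resampling.

    `period` is the pitch period in samples and is assumed to be known and
    constant. Each pass copies `keep` periods unchanged, then blends the
    following two periods into one, so keep + 2 periods of input become
    keep + 1 periods of output.
    """
    out = []
    pos = 0
    while pos + (keep + 2) * period <= len(samples):
        # Copy `keep` periods untouched.
        out.extend(samples[pos:pos + keep * period])
        pos += keep * period
        # Blend two neighbouring periods into one: fade the first out while
        # fading the second in, so the join has no abrupt jump.
        first = samples[pos:pos + period]
        second = samples[pos + period:pos + 2 * period]
        for i in range(period):
            w = i / period
            out.append((1 - w) * first[i] + w * second[i])
        pos += 2 * period
    # Whatever is left over is passed through unchanged.
    out.extend(samples[pos:])
    return out

# A 100 Hz tone at 8000 samples/s has a period of 80 samples.
tone = sine(100, 800)
fast = speed_up(tone, period=80, keep=2)
print(len(tone), len(fast))  # 800 640
```

Because whole periods are removed, the waveform still repeats every 80 samples, so the pitch is unchanged while the clip gets shorter. With real speech the neighbouring periods are never identical, which is exactly where the tuning effort described above comes in.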