Is it possible to use an FFT to find an occurrence of a short WAV sample inside a longer WAV, if it is known that the exact sample exists somewhere in the longer file (but may be mixed with other sounds)?
Edit (after receiving two responses): What if I have a library of all the known sounds that can appear in the larger WAV, and I want to find occurrences of each of them within it? In other words, I know every possible sound that can be mixed into the big WAV, and I want to find where each of them occurs.
4 answers
#1
I assume that by "exact" you don't mean sample-value exact. If it were sample-value exact, finding it would be a simple matter of searching for the sample values directly, which is fast and efficient.
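A minimal sketch of that direct search, assuming both files are uncompressed PCM WAVs with identical format (sample width, channel count, byte order). It uses Python's standard `wave` module to pull out the raw frame bytes and `bytes.find` to locate bit-for-bit matches; `read_pcm` and `find_exact` are hypothetical helper names, not from the original answer.

```python
from __future__ import annotations

import wave


def read_pcm(source) -> tuple[bytes, int]:
    """Return (raw frame bytes, bytes per frame) for a PCM WAV path or binary file object."""
    with wave.open(source, "rb") as w:
        frame_size = w.getsampwidth() * w.getnchannels()
        return w.readframes(w.getnframes()), frame_size


def find_exact(haystack: bytes, needle: bytes, frame_size: int) -> list[int]:
    """Return every frame index where `needle` occurs bit-for-bit in `haystack`.

    Byte-level hits that do not start on a frame boundary are skipped, since
    they would straddle two samples rather than match whole frames.
    """
    hits = []
    start = haystack.find(needle)
    while start != -1:
        if start % frame_size == 0:
            hits.append(start // frame_size)
        start = haystack.find(needle, start + 1)
    return hits
```

Typical use would be to call `read_pcm` on both files and pass the two byte strings plus the frame size to `find_exact`. Any mixing, gain change or resampling breaks a bit-for-bit match, so this only covers the truly exact case.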
If you are looking for bits of sound that contribute to the mix, the best approach is a mathematical process called cross-correlation (closely related to convolution: it is convolution with the time-reversed sample). Basically, take the sample you are trying to find, effectively place it alongside the big recording, and correlate the two. Do this for every sample position. This gives you a curve with distinct spikes wherever the sample occurs. It's quite computationally intensive, but computers have gotten quite fast, so it's feasible.
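A naive sketch of the slide-and-correlate loop, assuming mono signals already loaded as NumPy float arrays at the same sample rate. Dividing each dot product by the energy of the window and of the sample (normalized cross-correlation) is an addition not in the original answer; it keeps loud passages of the recording from producing spurious spikes. The function name is hypothetical.

```python
import numpy as np


def sliding_correlation(signal: np.ndarray, template: np.ndarray,
                        normalized: bool = True) -> np.ndarray:
    """Score the template against the signal at every sample offset.

    Entry k is the dot product of template with signal[k:k+m]; when
    `normalized` is True it is divided by both vectors' lengths (cosine
    similarity), so a perfect match scores 1.0 regardless of loudness.
    """
    n, m = len(signal), len(template)
    if m > n:
        raise ValueError("template must not be longer than the signal")
    t_norm = np.linalg.norm(template)
    scores = np.zeros(n - m + 1)
    for k in range(n - m + 1):
        window = signal[k:k + m]
        score = float(np.dot(window, template))
        if normalized:
            denom = float(np.linalg.norm(window)) * t_norm
            score = score / denom if denom > 0 else 0.0
        scores[k] = score
    return scores
```

The loop costs O(n·m) multiply-adds, which is the "computationally intensive" part; the index of the largest score is the sample offset of the best match.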
But this assumes that the sample came from the same recording in both cases. Miking a drum sound, even the same drum sound, from two different locations will not produce very good correlation.
Hope that helps.
#2
It depends on exactly what you're trying to find and what you're trying to find it in.
- If you're looking for a sample that's exactly the same as a chunk of a larger WAV file, bit-for-bit, then you can search for the values directly.
- If it's exactly the same sound, but not sample-accurate (matching a clip of an MP3 to a WAV of the same song, for instance), you can easily find it using cross-correlation. Cross-correlation can be sped up significantly by using an FFT method instead of a "naive" method that explicitly multiplies and sums the samples.
- If you're looking for a short sample that's been mixed with other sounds, it might still be possible to use cross-correlation, but it depends on whether the other sounds affect the match. For a digital piano with simple samples and no effects, recorded straight into a digital recorder, this might work.
- If the sound has been through any type of filtering, polarity reversal, or phase shift, however, this will not work very well, since the wave shapes will be changed. So if the piano was played through speakers and then recorded with microphones, this isn't a viable solution.
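A sketch of the FFT speed-up, assuming mono NumPy float arrays. `fft_correlation` computes the same values as the naive multiply-and-sum (zero-padding to at least n + m - 1 samples so the circular correlation does not wrap), and `normalized_fft_correlation` divides by each window's energy using a running sum of squares. `find_in_library` is a hypothetical extension aimed at the edit in the question: it simply repeats the search for every known sound and reports offsets whose normalized score passes an assumed threshold of 0.7. Like plain cross-correlation, it only works in the digital, effect-free case.

```python
import numpy as np


def fft_correlation(signal: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Valid-mode cross-correlation sum(signal[k:k+m] * template), computed via FFT."""
    n, m = len(signal), len(template)
    if m > n:
        raise ValueError("template must not be longer than the signal")
    size = 1 << (n + m - 2).bit_length()  # next power of two >= n + m - 1
    spec = np.fft.rfft(signal, size) * np.conj(np.fft.rfft(template, size))
    return np.fft.irfft(spec, size)[: n - m + 1]


def normalized_fft_correlation(signal: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Scores in [-1, 1]: correlation divided by window energy and template energy."""
    m = len(template)
    raw = fft_correlation(signal, template)
    # Running sum of squares gives each length-m window's energy in O(n).
    energy = np.concatenate(([0.0], np.cumsum(signal.astype(float) ** 2)))
    window_norm = np.sqrt(np.maximum(energy[m:] - energy[:-m], 0.0))
    denom = window_norm * np.linalg.norm(template)
    return np.divide(raw, denom, out=np.zeros_like(raw), where=denom > 0)


def find_in_library(signal, library, threshold=0.7):
    """For each named template, return the offsets whose normalized score >= threshold."""
    return {
        name: np.flatnonzero(normalized_fft_correlation(signal, tpl) >= threshold)
        for name, tpl in library.items()
    }
```

Neighboring offsets of a real (non-white) sound will usually also pass the threshold, so in practice you would keep only local maxima. The FFT version costs roughly O((n + m) log(n + m)) per template instead of O(n·m).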
What might work better in this case is to create a spectrogram of the recording using the short-time Fourier transform (STFT), and a spectrogram of the thing you're looking for, and then do a time-wise cross-correlation of the two images. A spectrogram is a 2D image of the amplitude of a sound's spectrum over time, which you can then match. (This is probably a roundabout way of doing something for which there are more specialized algorithms, but I don't know what they would be called.) ;)
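A rough sketch of the spectrogram idea, assuming mono NumPy arrays at the same sample rate. The frame size (512), hop (256), Hann window, and the mean-removed normalized correlation used as the match score are illustrative choices, not something the answer specifies. Because it compares magnitudes only, a polarity flip leaves the spectrogram unchanged, and moderate phase shifts change it only slightly.

```python
import numpy as np


def magnitude_spectrogram(x: np.ndarray, frame: int = 512, hop: int = 256) -> np.ndarray:
    """|STFT| with a Hann window; shape (time frames, frequency bins)."""
    if len(x) < frame:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def match_spectrogram(recording: np.ndarray, template: np.ndarray,
                      frame: int = 512, hop: int = 256) -> np.ndarray:
    """Slide the template's spectrogram along the recording's time axis.

    Score k compares recording frames k .. k+len(T)-1 with the template's
    frames using mean-removed normalized correlation (1.0 = identical shape).
    """
    R = magnitude_spectrogram(recording, frame, hop)
    T = magnitude_spectrogram(template, frame, hop)
    if len(T) > len(R):
        raise ValueError("template is longer than the recording")
    T = T - T.mean()
    t_norm = np.linalg.norm(T)
    scores = np.zeros(len(R) - len(T) + 1)
    for k in range(len(scores)):
        patch = R[k:k + len(T)]
        patch = patch - patch.mean()
        denom = np.linalg.norm(patch) * t_norm
        scores[k] = np.sum(patch * T) / denom if denom > 0 else 0.0
    return scores
```

The best match in samples is `np.argmax(scores) * hop`, so the time resolution is limited to one hop.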
Can you upload some sound clips somewhere?
#3
If you know the exact nature of the sample (length in bits, etc.), then it is very possible. If it has been altered in any way, then you are going to have a lot of work to do first.
Keep in mind how WAV files are encoded: multi-channel audio is interleaved by track (channel), so you get the first sample for the first track, then the first sample for the second track, then the second sample for the first track, and so on.
This pattern repeats for however many tracks there are. If you know that the sound you are looking for is encoded specifically in one of these tracks, you can isolate each track and perform operations on it separately.
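A minimal sketch of isolating the tracks, assuming a 16-bit PCM WAV read with Python's `wave` module. Because samples are stored frame by frame (track 1, track 2, track 1, ...), reshaping the flat sample array to (frames, channels) and transposing gives one row per track. `read_channels` and the test helper `interleave_to_wav` are hypothetical names.

```python
import io
import wave

import numpy as np


def read_channels(source) -> np.ndarray:
    """Read a 16-bit PCM WAV (path or binary file) into shape (channels, frames)."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    samples = np.frombuffer(raw, dtype="<i2")  # WAV PCM is little-endian
    # Interleaved layout: ch0[0], ch1[0], ch0[1], ch1[1], ... -> (frames, channels)
    return samples.reshape(-1, n_channels).T


def interleave_to_wav(channels, rate: int = 8000) -> io.BytesIO:
    """Hypothetical test helper: pack (channels, frames) int16 data into an in-memory WAV."""
    data = np.asarray(channels, dtype="<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(data.shape[0])
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data.T.tobytes())  # transpose back to interleaved frame order
    buf.seek(0)
    return buf
```

Each row returned by `read_channels` can then be searched on its own.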
Obviously, if your sample differs in speed, tempo, pitch, etc., it's going to have a different bit signature, so you will have to normalise the tracks first.
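One simple case of normalising, sketched under a strong assumption: the sample was only sped up or slowed down (tape-style, so pitch and tempo changed together) by a known factor. Linear interpolation then stretches it back to its original length before matching. Changing tempo without pitch, or pitch without tempo, needs a proper time-stretching or pitch-shifting algorithm, which this does not attempt. `undo_speed_change` is a hypothetical name.

```python
import numpy as np


def undo_speed_change(x: np.ndarray, speed: float) -> np.ndarray:
    """Undo a playback-speed change of `speed` (2.0 = played twice as fast).

    Output sample j is read from position j / speed of the sped-up signal,
    using linear interpolation; positions past the end reuse the last sample.
    """
    n_out = int(round(len(x) * speed))
    positions = np.arange(n_out) / speed
    return np.interp(positions, np.arange(len(x)), x)
```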
#4
Not precisely as you have defined it, if it is mixed with other sounds, and here's the reason: consider the effect of a wave mixed precisely with its inverse; the result is a flat response (silence). Mixing waves can effectively mask one wave with another in such a way that the first is unrecoverable.
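A tiny numerical illustration of that argument, assuming NumPy: if the other sound in the mix happens to be the exact inverse of the sample, the mix is all zeros and no correlation-based search can find anything. The random "sample" is made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)   # stand-in for the sound we want to find
other = -sample                      # another sound that is its exact inverse
mix = sample + other                 # what ends up in the composite file

# Every correlation score is zero: nothing of `sample` is left to detect.
scores = np.correlate(mix, sample, mode="valid")
print(np.abs(mix).max(), np.abs(scores).max())  # both values are 0.0
```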
That said, there is likely a way of characterizing the "signature" of a wave so that it can still be detected in a resulting composite wave file, but that signature would depend on the length of the wave file and, to some extent, on what kinds of combinations were expected to be applied to it.
Your question probably has something to do with determining whether samples of one work exist within another, composite work. In general, yes, FFTs are useful for determining a "signature" for a given wave and for finding that "signature" in another wave. They're good for some things (a frequency shift, for example, just shows up as a displacement on the FFT) but not so great for others (time-varying frequency modulation, for one, or heavy or uneven bandwidth compression of the original signal). To put it another way: FFTs are a good way to detect "naive" use of samples, but a determined resampler can modify the original sample to make it hard to detect via FFT if they know that is the detection technique being used.
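A small sketch of the "frequency shift shows up as a displacement" point, assuming NumPy and a synthetic two-tone signal. Shifting every component up by 60 Hz (a frequency shift, not a musical pitch shift, which would multiply frequencies instead) simply slides the magnitude spectrum along the frequency axis, so cross-correlating the two spectra recovers the shift. `spectral_shift_hz` is a hypothetical helper and assumes both signals have the same length.

```python
import numpy as np


def spectral_shift_hz(a: np.ndarray, b: np.ndarray, sr: int) -> float:
    """Estimate how far b's magnitude spectrum is displaced from a's, in Hz."""
    window = np.hanning(len(a))
    A = np.abs(np.fft.rfft(a * window))
    B = np.abs(np.fft.rfft(b * window))
    corr = np.correlate(B, A, mode="full")   # index i <-> lag i - (len(A) - 1)
    shift_bins = int(np.argmax(corr)) - (len(A) - 1)
    return shift_bins * sr / len(a)          # bin spacing is sr / len(a) Hz


sr = 8000
t = np.arange(sr) / sr                       # one second -> 1 Hz bins
original = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
shifted = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1060 * t)
print(spectral_shift_hz(original, shifted, sr))  # 60.0
```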