How do I get samples from a .wav file?

Time: 2021-04-18 19:45:01

I want to know how to get samples out of a .wav file in order to perform a windowed join of two .wav files.

Can anyone please tell me how to do this?

4 answers

#1


13  

The wave module of the standard library is the key. After import wave at the top of your code, wave.open('the.wav', 'rb') returns a Wave_read object from which you can read frames with the .readframes method. That method returns a bytes object holding the samples in whatever format the wave file stores them. The two parameters you need to decompose frames into samples are the number of channels, from the .getnchannels method, and the number of bytes per sample, from .getsampwidth.

The best way to turn the bytes into a sequence of numeric values is with the array module, using a typecode of (respectively) 'B', 'h', 'i' for 1, 2, 4 bytes per sample. 8-bit WAV samples are unsigned and wider ones are signed, which is why the lowercase codes are used for 2 and 4 bytes. The size of 'i' (like that of 'l') depends on the platform, so use the itemsize value of your array object to double-check it. WAV data is little-endian, so call .byteswap() on the array if you are on a big-endian machine. If you have a sample width that array cannot provide (3 bytes, for example), you'll need to slice up the byte string (padding each little slice appropriately with bytes worth 0) and use the struct module instead; that's clunkier and slower, so use array if you can.
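As a rough sketch of the array approach described above, something like the following should work on Python 3. The names read_all_samples and split_channels are made up for this example, and it only handles the widths that array covers directly (1, 2 and 4 bytes); any other width raises a KeyError.

```python
import array
import sys
import wave

# Typecodes by sample width. "i" is 4 bytes on common platforms; checked below.
_TYPECODES = {1: "B", 2: "h", 4: "i"}

def read_all_samples(path):
    """Return (nchannels, samples); samples is an array of interleaved values."""
    with wave.open(path, "rb") as wav:
        nchannels = wav.getnchannels()
        sampwidth = wav.getsampwidth()
        raw = wav.readframes(wav.getnframes())
    samples = array.array(_TYPECODES[sampwidth])
    if samples.itemsize != sampwidth:
        raise ValueError("array typecode does not match the sample width")
    samples.frombytes(raw)
    if sys.byteorder == "big" and sampwidth > 1:
        samples.byteswap()  # WAV data is little-endian
    return nchannels, samples

def split_channels(samples, nchannels):
    """De-interleave: channel c owns every nchannels-th sample starting at c."""
    return [samples[c::nchannels] for c in range(nchannels)]
```

For a stereo file, split_channels(samples, 2) gives the left and right channels as two separate arrays.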

#2


2  

You can use the wave module. First you should read the metadata, such as the sample size or the number of channels. Using the readframes() method, you can read samples, but only as a byte string. Based on the sample format, you have to convert them to samples using struct.unpack().
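To illustrate the struct.unpack() step, here is a minimal sketch. It assumes frame_data is what readframes() returned and sample_width is what getsampwidth() returned; decode_samples is just a name picked for this example. The 3-byte branch pads each sample with a zero low-order byte, because struct has no 3-byte format code.

```python
import struct

def decode_samples(frame_data, sample_width):
    """Decode little-endian PCM bytes into a list of ints."""
    count = len(frame_data) // sample_width
    if sample_width == 1:
        return list(struct.unpack("%dB" % count, frame_data))  # unsigned
    if sample_width == 2:
        return list(struct.unpack("<%dh" % count, frame_data))
    if sample_width == 4:
        return list(struct.unpack("<%di" % count, frame_data))
    if sample_width == 3:
        # Pad with a zero low-order byte, unpack as signed 32-bit,
        # then shift the padding back out (the shift keeps the sign).
        out = []
        for i in range(0, count * 3, 3):
            (value,) = struct.unpack("<i", b"\x00" + frame_data[i:i + 3])
            out.append(value >> 8)
        return out
    raise ValueError("unsupported sample width: %d" % sample_width)
```

The returned list is still interleaved by channel, so for stereo data the even indices are one channel and the odd indices the other.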

Alternatively, if you want the samples as a NumPy array, you can use SciPy's io.wavfile module. Its read() function returns the sample rate together with the data, in the integer or floating-point type stored in the file.

#3


2  

Here's a function to read samples from a wave file (tested with mono & stereo):

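import struct

def read_samples(wave_file, nb_frames):
    """Read up to nb_frames frames and return the samples as a tuple of ints."""
    frame_data = wave_file.readframes(nb_frames)
    if not frame_data:
        return ()
    sample_width = wave_file.getsampwidth()
    nb_samples = len(frame_data) // sample_width
    # 8-bit WAV samples are unsigned ("B"); 16- and 32-bit are signed little-endian.
    fmt = {1: "%dB", 2: "<%dh", 4: "<%dl"}[sample_width] % nb_samples
    return struct.unpack(fmt, frame_data)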

And here's the full script that does windowed mixing or concatenating of multiple .wav files. All input files need to have the same params (number of channels, sample width, frame rate and compression type).

import argparse
import itertools
import struct
import sys
import wave

def _struct_format(sample_width, nb_samples):
    # 8-bit WAV samples are unsigned ("B"); 16- and 32-bit are signed little-endian.
    return {1: "%dB", 2: "<%dh", 4: "<%dl"}[sample_width] % nb_samples

def _mix_samples(samples):
    # Average the samples that share one position across all inputs.
    return sum(samples) // len(samples)

def read_samples(wave_file, nb_frames):
    frame_data = wave_file.readframes(nb_frames)
    if frame_data:
        sample_width = wave_file.getsampwidth()
        nb_samples = len(frame_data) // sample_width
        fmt = _struct_format(sample_width, nb_samples)
        return struct.unpack(fmt, frame_data)
    else:
        return ()

def write_samples(wave_file, samples, sample_width):
    fmt = _struct_format(sample_width, len(samples))
    frame_data = struct.pack(fmt, *samples)
    wave_file.writeframes(frame_data)

def compatible_input_wave_files(input_wave_files):
    # nframes may differ between files; every other parameter must match.
    nchannels, sampwidth, framerate, nframes, comptype, compname = input_wave_files[0].getparams()
    for input_wave_file in input_wave_files[1:]:
        nc, sw, fr, nf, ct, cn = input_wave_file.getparams()
        if (nc, sw, fr, ct, cn) != (nchannels, sampwidth, framerate, comptype, compname):
            return False
    return True

def mix_wave_files(output_wave_file, input_wave_files, buffer_size):
    output_wave_file.setparams(input_wave_files[0].getparams())
    sampwidth = input_wave_files[0].getsampwidth()
    # Shorter inputs are padded with silence: 128 for unsigned 8-bit, else 0.
    silence = 128 if sampwidth == 1 else 0
    max_nb_frames = max(input_wave_file.getnframes() for input_wave_file in input_wave_files)
    for frame_window in range(max_nb_frames // buffer_size + 1):
        all_samples = [read_samples(wave_file, buffer_size) for wave_file in input_wave_files]
        mixed_samples = [_mix_samples(samples)
                         for samples in itertools.zip_longest(*all_samples, fillvalue=silence)]
        write_samples(output_wave_file, mixed_samples, sampwidth)

def concatenate_wave_files(output_wave_file, input_wave_files, buffer_size):
    output_wave_file.setparams(input_wave_files[0].getparams())
    sampwidth = input_wave_files[0].getsampwidth()
    for input_wave_file in input_wave_files:
        nb_frames = input_wave_file.getnframes()
        for frame_window in range(nb_frames // buffer_size + 1):
            samples = read_samples(input_wave_file, buffer_size)
            if samples:
                write_samples(output_wave_file, samples, sampwidth)

def argument_parser():
    parser = argparse.ArgumentParser(description='Mix or concatenate multiple .wav files')
    parser.add_argument('command', choices=("mix", "concat"), help='command')
    parser.add_argument('output_file', help='output .wav file')
    parser.add_argument('input_files', metavar="input_file", help='input .wav files', nargs="+")
    parser.add_argument('--buffer_size', type=int, help='nb of frames to read per iteration', default=1000)
    return parser

if __name__ == '__main__':
    args = argument_parser().parse_args()

    input_wave_files = [wave.open(name, "rb") for name in args.input_files]
    if not compatible_input_wave_files(input_wave_files):
        print("ERROR: mixed wave files must have the same params.", file=sys.stderr)
        sys.exit(2)

    output_wave_file = wave.open(args.output_file, "wb")
    if args.command == "mix":
        mix_wave_files(output_wave_file, input_wave_files, args.buffer_size)
    elif args.command == "concat":
        concatenate_wave_files(output_wave_file, input_wave_files, args.buffer_size)

    output_wave_file.close()
    for input_wave_file in input_wave_files:
        input_wave_file.close()
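The script above mixes or concatenates, but it does not blend the boundary between two files. If "windowed join" in the question means a crossfade (that is an assumption on my part), a rough sketch could look like this. crossfade_join is a hypothetical helper, it uses a simple linear window, and it expects mono sample sequences such as the tuples returned by read_samples.

```python
def crossfade_join(first, second, overlap):
    """Join two mono sample sequences, blending `overlap` samples linearly."""
    overlap = min(overlap, len(first), len(second))
    if overlap == 0:
        return list(first) + list(second)
    head = list(first[:len(first) - overlap])
    tail = list(second[overlap:])
    blended = []
    for i in range(overlap):
        # The weight ramps up across the window: `first` fades out, `second` fades in.
        w = (i + 1) / (overlap + 1)
        a = first[len(first) - overlap + i]
        b = second[i]
        blended.append(int(round((1 - w) * a + w * b)))
    return head + blended + tail
```

For stereo data you would de-interleave the channels first and crossfade each channel separately. A different window shape (Hann, for instance) only changes how w is computed.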

#4


0  

After reading the samples (for example with the wave module), you may want to scale the values to between -1 and 1 (this is the convention for audio signals).

In this case, you can add:

# scale to -1.0 -- 1.0
max_nb_bit = float(2**(nb_bits-1))  
samples = signal_int / (max_nb_bit + 1.0) 

where nb_bits is the bit depth and signal_int holds the integer sample values, in a type that supports element-wise division (such as a NumPy array).
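If the samples are in a plain Python list or tuple rather than a NumPy array, the same scaling can be sketched as below. Note that this version divides by 2**(nb_bits-1), which is the more common convention, rather than adding 1 as in the snippet above; both keep the values within -1 and 1. The to_float name and the unsigned flag (for 8-bit WAV data, which is stored unsigned) are assumptions made for this example.

```python
def to_float(samples, nb_bits, unsigned=False):
    """Scale integer samples to floats in the range [-1.0, 1.0)."""
    full_scale = float(2 ** (nb_bits - 1))   # 32768.0 for 16-bit audio
    # Unsigned data is centred on full_scale (128 for 8-bit), so shift it first.
    offset = full_scale if unsigned else 0.0
    return [(s - offset) / full_scale for s in samples]
```

For example, to_float(samples, 16) suits 16-bit files and to_float(samples, 8, unsigned=True) suits 8-bit ones.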
