How to normalize a NumPy array to within a certain range?

After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:

# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)

Is there a less verbose, convenience function way to do this? matplotlib.colors.Normalize() doesn't seem to be related.

6 Solutions

#1

audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())

Using /= and *= allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so

image *= 255.0/image.max()    # Uses 1 division and image.size multiplications

is marginally faster than

image /= image.max()/255.0    # Uses 1+image.size divisions

Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.
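
A minimal sketch of the in-place approach, assuming audio is a float array of shape (samples, channels) and image is a float array. The sample values are made up for illustration. In-place operators keep the array's dtype, so an integer array would have to be converted to float first.

```python
import numpy as np

# Made-up stereo signal: rows are samples, columns are channels.
audio = np.array([[0.5, -4.0],
                  [-2.0, 1.0],
                  [1.0, 2.0]])
# Per-channel peak (axis=0), broadcast across rows; modifies audio in place.
audio /= np.max(np.abs(audio), axis=0)

# Made-up float image with values in [0, 4].
image = np.array([[0.0, 1.0],
                  [2.0, 4.0]])
# One scalar division, then one in-place multiplication per element.
image *= 255.0 / image.max()
```

Each audio channel now peaks at magnitude 1.0, and the brightest pixel of image is 255.0.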

#2

You can also rescale using sklearn. The advantages are that you can normalize the standard deviation in addition to mean-centering the data, and that you can do this along either axis: by features or by records.

from sklearn.preprocessing import scale
X = scale( X, axis=0, with_mean=True, with_std=True, copy=True )

The keyword arguments axis, with_mean, and with_std are self-explanatory and are shown with their default values. Setting copy=False asks for the operation to be performed in place. See the sklearn.preprocessing.scale documentation.
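
For readers without scikit-learn at hand, the same centering and scaling can be approximated in plain NumPy. This is only a sketch, not scikit-learn's implementation: standardize is a hypothetical helper, and treating zero-variance features as having scale 1 is an assumption made here to avoid dividing by zero.

```python
import numpy as np

def standardize(X, axis=0):
    """Center X to zero mean and scale to unit variance along `axis`."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)  # population std (ddof=0)
    std[std == 0] = 1.0  # assumption: constant features are only centered
    return (X - mean) / std

# Made-up data: the second column is constant.
X = np.array([[1.0, 10.0],
              [2.0, 10.0],
              [3.0, 10.0]])
Z = standardize(X, axis=0)
```

With axis=0 each column (feature) is standardized; with axis=1 each row (record) is.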

#3

You can use the in-place ("i", as in idiv, imul, ...) version of the operator, and it doesn't look half bad:

image /= (image.max()/255.0)

For the other case you can write a function to normalize a 2-D array by columns:

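def normalize_columns(arr):
    # Scale each column of a 2-D float array in place to a peak magnitude of 1
    rows, cols = arr.shape
    for col in range(cols):  # range replaces Python 2's xrange
        arr[:,col] /= abs(arr[:,col]).max()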
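
The column loop can also be written as a single broadcast operation. The sketch below assumes a 2-D float array; normalize_columns_vectorized is a hypothetical name, and leaving all-zero columns unchanged is an assumption rather than something the answer specifies.

```python
import numpy as np

def normalize_columns_vectorized(arr):
    """Scale each column of a 2-D float array in place to peak magnitude 1."""
    peaks = np.abs(arr).max(axis=0, keepdims=True)  # shape (1, cols)
    peaks[peaks == 0] = 1.0  # assumption: all-zero columns stay as they are
    arr /= peaks             # broadcasts the per-column peak over all rows
    return arr

# Made-up data with one all-zero column.
data = np.array([[1.0, -8.0, 0.0],
                 [-2.0, 4.0, 0.0]])
normalize_columns_vectorized(data)
```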

#4

If the array contains both positive and negative data, I'd go with:

import numpy as np

a = np.random.rand(3,2)

# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)

# Normalised [0,255] as integer
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)

# Normalised [-1,1]
d = 2*(a - np.min(a))/np.ptp(a)-1

Also worth mentioning, even though it is not what the OP asked about, is standardization:

e = (a - np.mean(a)) / np.std(a)
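
The three range formulas above share one pattern, which can be wrapped in a small helper. This is a sketch under stated assumptions: rescale is a hypothetical function, a constant array is mapped to new_min to avoid dividing by zero, and the integer case rounds to the nearest value, whereas astype(int) on its own truncates.

```python
import numpy as np

def rescale(a, new_min=0.0, new_max=1.0):
    """Linearly map the values of `a` onto [new_min, new_max]."""
    a = np.asarray(a, dtype=float)
    span = np.ptp(a)  # max - min
    if span == 0:
        # Assumption: a constant array collapses to the lower bound.
        return np.full_like(a, new_min)
    return (a - a.min()) / span * (new_max - new_min) + new_min

a = np.array([-2.0, 0.0, 6.0])
unit = rescale(a)             # values in [0, 1]
signed = rescale(a, -1, 1)    # values in [-1, 1]
as_bytes = np.round(rescale(a, 0, 255)).astype(np.uint8)  # values in [0, 255]
```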

#5

A simple solution is using the scalers offered by the sklearn.preprocessing library.

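# sk is assumed to be an alias for sklearn.preprocessing
from sklearn import preprocessing as sk

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Check that the transform can be inverted
X_rec = scaler.inverse_transform(X_scaled)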

The error X_rec - X will be zero, up to floating-point rounding. You can adjust feature_range to your needs, or use a standard scaler, sk.StandardScaler(), instead.
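
To make the round trip concrete, here is a NumPy-only sketch of per-feature min-max scaling and its inverse. It is not scikit-learn's code: the data is made up, the variable names are illustrative, and every column is assumed to have a non-zero range.

```python
import numpy as np

# Made-up data: rows are records, columns are features.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [5.0, 200.0]])
lo, hi = 0.0, 250.0  # target range, matching feature_range=(0, 250)

col_min = X.min(axis=0)
col_range = X.max(axis=0) - col_min  # assumed non-zero for every column

# Forward transform: each column is mapped onto [lo, hi].
X_scaled = (X - col_min) / col_range * (hi - lo) + lo
# Inverse transform: undo the steps in reverse order.
X_rec = (X_scaled - lo) / (hi - lo) * col_range + col_min

max_error = np.abs(X_rec - X).max()
```

Here max_error is zero or on the order of floating-point rounding, which is why comparing with np.allclose is safer than testing for exact equality.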

#6

I tried following this, and got the error

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

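The NumPy array I was trying to normalize was an integer array. Around version 1.10, in-place operations stopped casting a float result back into an integer array (the default casting rule became 'same_kind'), so you have to produce a new float array instead, for example with numpy.true_divide().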

arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)

img was a PIL.Image object.
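
The failure and two ways around it can be reproduced without PIL. In this sketch a small uint8 array with made-up values stands in for np.array(img).

```python
import numpy as np

# Stand-in for np.array(img): 8-bit pixel data.
arr = np.array([[0, 51], [102, 255]], dtype=np.uint8)

try:
    arr /= 255.0  # in place: the float result cannot be stored in a uint8 array
    in_place_failed = False
except TypeError:
    in_place_failed = True

# Out-of-place division allocates a new float64 array.
scaled = arr / 255.0

# Alternatively, convert to float first; in-place operations then work.
scaled_in_place = arr.astype(np.float64)
scaled_in_place /= 255.0
```

Both results hold values in [0, 1] with dtype float64, while arr itself is left untouched.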
