Python for Data Analysis: NumPy Basics, Part 2

Date: 2021-12-16 00:24:22

 

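NumPy data types include:

int8, uint8, int16, uint16, int32, uint32, int64, uint64, float16, float32, float64, float128, complex64, complex128, complex256, bool, object, string_, unicode_

float128 and complex256 are only available on some platforms, and the string_ and unicode_ aliases were removed in NumPy 2.0 in favor of bytes_ and str_.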

astype

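astype is the method for explicitly converting an array to another data type (dtype). It returns a new array and leaves the original unchanged.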
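The original example for astype did not survive, so here is a minimal sketch of typical conversions. The array contents and variable names (float_arr, frac, num_strings, parsed) are made up for illustration.

```python
import numpy as np

# Integer array; the exact default integer dtype is platform-dependent.
arr = np.array([1, 2, 3, 4, 5])

# astype returns a new array with the requested dtype.
float_arr = arr.astype(np.float64)
print(float_arr.dtype)   # float64

# Converting float to int truncates the fractional part.
frac = np.array([3.7, -1.2, 0.5])
int_arr = frac.astype(np.int32)
print(int_arr)           # [ 3 -1  0]

# Strings that look like numbers can be converted as well.
num_strings = np.array(["1.25", "-9.6", "42"])
parsed = num_strings.astype(np.float64)
print(parsed)
```

If a string cannot be parsed as a number, astype raises a ValueError instead of silently producing a value.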


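Indexing and slicing NumPy arrays

Indexing

For one-dimensional arrays, indexing works much like it does for Python lists. Multidimensional arrays additionally accept comma-separated indices such as arr[1,1].

Slicing

Changing a value in a slice of a NumPy array changes the source array too, because a slice is a view of the source array, not a newly copied array. In the example below, after arr[1,1]=0 the array arr changes, and the value at the corresponding position in data changes as well. Note that In [105], arr = 0, only rebinds the name arr to the integer 0 and leaves data untouched.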

In [101]: data = np.random.randn(4,4)

In [102]: data
Out[102]:
array([[-1.68867271, -0.89369286, -0.0288363 ,  0.73855122],
       [-0.13084603,  0.43972144,  0.73542583,  1.99925332],
       [ 0.04291022, -0.91963212,  3.09214837, -0.6070068 ],
       [-0.01416294, -1.46576298,  1.42196278,  0.84758994]])

In [103]: arr = data[2:,1:]

In [104]: arr
Out[104]:
array([[-0.91963212,  3.09214837, -0.6070068 ],
       [-1.46576298,  1.42196278,  0.84758994]])

In [105]: arr = 0

In [106]: data
Out[106]:
array([[-1.68867271, -0.89369286, -0.0288363 ,  0.73855122],
       [-0.13084603,  0.43972144,  0.73542583,  1.99925332],
       [ 0.04291022, -0.91963212,  3.09214837, -0.6070068 ],
       [-0.01416294, -1.46576298,  1.42196278,  0.84758994]])

In [107]: arr
Out[107]: 0

In [108]: arr = data[2:,1:]

In [109]: arr
Out[109]:
array([[-0.91963212,  3.09214837, -0.6070068 ],
       [-1.46576298,  1.42196278,  0.84758994]])

In [110]: arr == 0
Out[110]:
array([[False, False, False],
       [False, False, False]], dtype=bool)

In [111]: arr
Out[111]:
array([[-0.91963212,  3.09214837, -0.6070068 ],
       [-1.46576298,  1.42196278,  0.84758994]])

In [112]: arr[1,1]=0

In [113]: arr
Out[113]:
array([[-0.91963212,  3.09214837, -0.6070068 ],
       [-1.46576298,  0.        ,  0.84758994]])

In [114]: data
Out[114]:
array([[-1.68867271, -0.89369286, -0.0288363 ,  0.73855122],
       [-0.13084603,  0.43972144,  0.73542583,  1.99925332],
       [ 0.04291022, -0.91963212,  3.09214837, -0.6070068 ],
       [-0.01416294, -1.46576298,  0.        ,  0.84758994]])

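To get an independent copy of a NumPy array or slice rather than a view, copy it explicitly with copy(). Plain assignment (arr = data) only creates a second name for the same array.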

In [116]: data
Out[116]:
array([[-1.68867271, -0.89369286, -0.0288363 ,  0.73855122],
       [-0.13084603,  0.43972144,  0.73542583,  1.99925332],
       [ 0.04291022, -0.91963212,  3.09214837, -0.6070068 ],
       [-0.01416294, -1.46576298,  0.        ,  0.84758994]])

In [117]: arr = data

In [118]: arr
Out[118]:
array([[-1.68867271, -0.89369286, -0.0288363 ,  0.73855122],
       [-0.13084603,  0.43972144,  0.73542583,  1.99925332],
       [ 0.04291022, -0.91963212,  3.09214837, -0.6070068 ],
       [-0.01416294, -1.46576298,  0.        ,  0.84758994]])

In [119]: arr = np.copy(data)

In [120]: arr
Out[120]:
array([[-1.68867271, -0.89369286, -0.0288363 ,  0.73855122],
       [-0.13084603,  0.43972144,  0.73542583,  1.99925332],
       [ 0.04291022, -0.91963212,  3.09214837, -0.6070068 ],
       [-0.01416294, -1.46576298,  0.        ,  0.84758994]])
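The session above copies the whole array, so the difference between a view and a copy is not visible in the output. The following sketch, which uses a small np.arange array as a stand-in for the random data, copies a slice and checks with np.shares_memory which of the two results is still tied to the source.

```python
import numpy as np

data = np.arange(16, dtype=np.float64).reshape(4, 4)

view = data[2:, 1:]            # slice: a view on data
copied = data[2:, 1:].copy()   # explicit copy: independent memory

print(np.shares_memory(data, view))     # True
print(np.shares_memory(data, copied))   # False

copied[1, 1] = -1.0   # does not touch data
print(data[3, 2])     # 14.0

view[1, 1] = 0.0      # writes through to data
print(data[3, 2])     # 0.0
```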

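Boolean indexing

Suppose each string in names corresponds to one row of the data array. Note that the boolean array must have the same length as the axis it indexes.

Selecting values with a boolean index works as follows: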

In [140]: names = np.array(['aaa','bbb','ccc','ddd','eee','fff'])

In [141]: data = np.random.randn(6,4)

In [142]: names
Out[142]:
array(['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff'],
       dtype='<U3')

In [143]: data
Out[143]:
array([[ 0.49394026, -0.65887621, -0.26946242,  0.22042355],
        [-1.11606179, -1.94945158, -0.4866134 ,  0.67712409],
        [-2.33792045,  0.01639887, -0.46020647,  0.84180777],
        [-1.99622938,  1.937877  , -0.17134376,  0.56915872],
        [ 1.50980905,  0.07244016, -0.95650922,  1.23508517],
        [ 0.74706519, -0.03149619, -0.38235363,  0.69786257]])

In [144]: names == 'aaa'
Out[144]: array([ True, False, False, False, False, False], dtype=bool)

In [145]: data[names=='aaa']
Out[145]: array([[ 0.49394026, -0.65887621, -0.26946242,  0.22042355]])

In [146]: names =='ccc'
Out[146]: array([False, False,  True, False, False, False], dtype=bool)

In [147]: data[names=='ccc']
Out[147]: array([[-2.33792045,  0.01639887, -0.46020647,  0.84180777]])

A boolean index can be combined with an integer index or a slice on the other axis:

In [148]: data[names=='aaa',2]
Out[148]: array([-0.26946242])

In [149]: data[names=='aaa',2:]
Out[149]: array([[-0.26946242,  0.22042355]])

In [150]: data[names=='aaa',1:]
Out[150]: array([[-0.65887621, -0.26946242,  0.22042355]])

Negated selection

In [155]: names !='aaa'
Out[155]: array([False,  True,  True,  True,  True,  True], dtype=bool)

In [156]: data[names!='aaa']
Out[156]:
array([[-1.11606179, -1.94945158, -0.4866134 ,  0.67712409],
       [-2.33792045,  0.01639887, -0.46020647,  0.84180777],
       [-1.99622938,  1.937877  , -0.17134376,  0.56915872],
       [ 1.50980905,  0.07244016, -0.95650922,  1.23508517],
       [ 0.74706519, -0.03149619, -0.38235363,  0.69786257]])

Combined conditions

In [171]: mask = (names == 'aaa')|(names == 'ccc')

In [172]: mask
Out[172]: array([ True, False,  True, False, False, False], dtype=bool)

In [173]: data[mask]
Out[173]:
array([[ 0.49394026, -0.65887621, -0.26946242,  0.22042355],
       [-2.33792045,  0.01639887, -0.46020647,  0.84180777]])
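As a supplement to the combined lookup above, here is a small sketch of the element-wise operators that work on boolean masks (|, & and ~), and of what happens when the Python keyword or is used instead. The np.arange array is only a stand-in for the random data, and the names either and neither are chosen for illustration.

```python
import numpy as np

names = np.array(["aaa", "bbb", "ccc", "ddd", "eee", "fff"])
data = np.arange(24).reshape(6, 4)   # stand-in for the random data above

either = (names == "aaa") | (names == "ccc")   # element-wise OR
neither = ~either                               # element-wise NOT
print(data[either].shape)    # (2, 4)
print(data[neither].shape)   # (4, 4)

# Assigning through a boolean mask modifies data in place.
data[either] = 0
print(data[0], data[2])

# The keywords `and` / `or` try to reduce each array to one bool and fail.
try:
    (names == "aaa") or (names == "ccc")
except ValueError as exc:
    print("ValueError:", exc)
```

The parentheses around each comparison matter, because | and & bind more tightly than ==.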

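Fancy indexing

Fancy indexing simply means indexing with integer lists or integer arrays. Unlike slicing, fancy indexing copies the selected data into a new array.

Integer list

Create a two-dimensional array arr and pass in [3,1]. The result contains the rows arr[3,:] and arr[1,:], in that order.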

In [203]: arr = np.array(([1,2,3,4],[2,3,4,5],[3,4,5,6],[7,8,9,10]))

In [204]: arr
Out[204]:
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 7,  8,  9, 10]])

In [205]: arr[[3,1]]
Out[205]:
array([[ 7,  8,  9, 10],
       [ 2,  3,  4,  5]])
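To check the claim that fancy indexing copies data, the sketch below reuses the same arr and modifies the selected rows. The name picked is only illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [7, 8, 9, 10]])

picked = arr[[3, 1]]                   # rows 3 and 1, in that order
print(np.shares_memory(arr, picked))   # False

picked[0, 0] = 99     # changes only the copy
print(arr[3, 0])      # 7

# Negative indices count from the end: -1 is the last row.
print(arr[[-1, 0]])
```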

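Passing multiple integer arrays

Passing several integer arrays at once returns a one-dimensional array. The index arrays are paired element by element, so arr[[3,1],[0,2]] picks arr[3,0] and arr[1,2].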
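The source gives no session for this case, so here is a minimal sketch using the same arr as in the integer list example. It also shows two ways to select a rectangular block of rows and columns, which is often what people expect from this syntax; np.ix_ is a standard NumPy helper, while the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [7, 8, 9, 10]])

# Index arrays are paired up: picks arr[3, 0], arr[1, 2], arr[0, 3].
pairs = arr[[3, 1, 0], [0, 2, 3]]
print(pairs)          # [7 4 4]

# To select the block of rows [3, 1] and columns [0, 2] instead,
# index in two steps or build an open mesh with np.ix_.
block = arr[[3, 1]][:, [0, 2]]
same_block = arr[np.ix_([3, 1], [0, 2])]
print(block)
```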

Array transposition and axis swapping

Transposing an array swaps the rows and columns of the original array A to give a new array: element (i, j) of the transpose is element (j, i) of A, so a 2×3 array becomes a 3×2 array.

Method 1: T

In [227]: arr = np.random.randn(10)

In [228]: arr
Out[228]:
array([-1.42853867,  1.54300781, -0.74079757, -1.20272388, -1.00416459,
       -0.59571731,  1.16744662,  0.05739806,  1.01660691, -0.84625494])

In [229]: arr.T
Out[229]:
array([-1.42853867,  1.54300781, -0.74079757, -1.20272388, -1.00416459,
       -0.59571731,  1.16744662,  0.05739806,  1.01660691, -0.84625494])

In [230]: arr = np.random.randn(3,5)

In [231]: arr
Out[231]:
array([[ 1.36114118,  0.48455027,  0.64847485,  0.01691785, -0.03622465],
       [-2.31302164,  1.14992892, -1.47836923,  1.08003907, -1.33663009],
       [-0.38005499,  1.3517217 ,  2.52024026, -0.3576492 ,  0.46016645]])

In [232]: arr.T
Out[232]:
array([[ 1.36114118, -2.31302164, -0.38005499],
       [ 0.48455027,  1.14992892,  1.3517217 ],
       [ 0.64847485, -1.47836923,  2.52024026],
       [ 0.01691785,  1.08003907, -0.3576492 ],
       [-0.03622465, -1.33663009,  0.46016645]])

Method 2: transpose

A three-dimensional array arr: four 3×4 arrays

In [275]: arr = np.arange(48).reshape(4,3,4)

In [276]: arr
Out[276]:
array([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]],

       [[24, 25, 26, 27],
         [28, 29, 30, 31],
         [32, 33, 34, 35]],

       [[36, 37, 38, 39],
         [40, 41, 42, 43],
         [44, 45, 46, 47]]])

     
 

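The arguments to transpose are indices into the shape tuple, that is, axis numbers.

In [278]: arr.shape
Out[278]: (4, 3, 4)

The axis numbers of arr are 0, 1 and 2.

Below, the axes are reordered as 2, 0, 1, which gives a result of shape (4, 4, 3):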

In [277]: arr.transpose(2,0,1)
Out[277]:
array([[[ 0,  4,  8],
        [12, 16, 20],
        [24, 28, 32],
        [36, 40, 44]],

       [[ 1,  5,  9],
        [13, 17, 21],
        [25, 29, 33],
        [37, 41, 45]],

       [[ 2,  6, 10],
        [14, 18, 22],
        [26, 30, 34],
        [38, 42, 46]],

       [[ 3,  7, 11],
        [15, 19, 23],
        [27, 31, 35],
        [39, 43, 47]]])

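Passing the axis numbers in their original order 0, 1, 2 leaves the array as it was. Since transpose returns a new view and arr itself was never reassigned, this call simply shows the original layout again: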

In [279]: arr.transpose(0,1,2)
Out[279]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31],
        [32, 33, 34, 35]],

       [[36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47]]])
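The effect of the axis order is easier to verify on shapes and single elements than on the full printout. This sketch uses the same arr; the names moved and restored are illustrative, and (1, 2, 0) is the inverse of the permutation (2, 0, 1).

```python
import numpy as np

arr = np.arange(48).reshape(4, 3, 4)

moved = arr.transpose(2, 0, 1)
print(moved.shape)    # (4, 4, 3)

# Element mapping: moved[i, j, k] == arr[j, k, i]
print(moved[0, 1, 0], arr[1, 0, 0])   # 12 12

# Undo the reordering with the inverse permutation (1, 2, 0).
restored = moved.transpose(1, 2, 0)
print(np.array_equal(restored, arr))  # True

# transpose returns a view, not a copy.
print(np.shares_memory(arr, moved))   # True
```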

Method 3: swapaxes

swapaxes returns a view of the source array.

Whereas transpose takes a full tuple of axis numbers, swapaxes takes just the one pair of axis numbers to exchange.

In [283]: arr.swapaxes(2,1)
Out[283]:
array([[[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]],

       [[12, 16, 20],
        [13, 17, 21],
        [14, 18, 22],
        [15, 19, 23]],

       [[24, 28, 32],
        [25, 29, 33],
        [26, 30, 34],
        [27, 31, 35]],

       [[36, 40, 44],
        [37, 41, 45],
        [38, 42, 46],
        [39, 43, 47]]])
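As a quick check of the two statements about swapaxes, the sketch below compares it with the equivalent transpose call and writes through the result to confirm that it is a view. It uses the same arr; the name swapped is illustrative.

```python
import numpy as np

arr = np.arange(48).reshape(4, 3, 4)

swapped = arr.swapaxes(2, 1)
print(swapped.shape)   # (4, 4, 3)

# Swapping axes 1 and 2 equals transposing with the order (0, 2, 1).
print(np.array_equal(swapped, arr.transpose(0, 2, 1)))   # True

# Because swapped is a view, writing to it changes arr.
swapped[0, 3, 2] = -1
print(arr[0, 2, 3])    # -1
```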