Loading and preprocessing the data
import keras
from keras import layers
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
Using TensorFlow backend.
df = pd.read_csv('./data/Iris.csv')
df.head()
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
df.Species.unique()
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
# Map the species names to the integer labels 0, 1, 2
spec_dict = {'Iris-setosa':0, 'Iris-versicolor':1, 'Iris-virginica':2}
df['Species'] = df.Species.map(spec_dict)
df.head()
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
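One pitfall with `Series.map` and a dict: any value that is not a key in the dict becomes NaN, so a misspelled species name silently turns into a missing label. A minimal check, assuming a small hand-made frame (`toy` is hypothetical) in place of the real Iris.csv:

```python
import pandas as pd

# Hypothetical toy frame standing in for the Iris Species column
toy = pd.DataFrame({'Species': ['Iris-setosa', 'Iris-virginica',
                                'Iris-versicolor', 'Iris-setosa']})
spec_dict = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

labels = toy.Species.map(spec_dict)
# map() yields NaN for names missing from spec_dict, so check before training
assert labels.notna().all(), 'unmapped species names found'
labels = labels.astype('int64')
print(labels.tolist())  # [0, 2, 1, 0]
```

`pd.factorize` would also produce integer codes, but it numbers classes in order of first appearance, which changes after shuffling; the explicit dict keeps the label-to-class mapping fixed.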
Shuffle the rows:
# Random permutation of the row positions 0 .. len(df)-1 (each appears exactly once)
index = np.random.permutation(len(df))
# Reorder the rows using the shuffled positions
df = df.iloc[index, :]
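`np.random.permutation(n)` returns the integers 0..n-1 in random order, and `iloc` reorders whole rows by position while keeping each row's original index label, which is why labels such as 89 and 10 appear in the output below. A minimal sketch on a hypothetical toy frame, with a fixed seed added as an assumption so the shuffle is reproducible; `DataFrame.sample(frac=1)` is a common equivalent:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: feature and label are linked so alignment can be checked
toy = pd.DataFrame({'feature': [10, 20, 30, 40, 50], 'label': [0, 1, 2, 3, 4]})

rng = np.random.RandomState(42)     # fixed seed (assumption) for reproducibility
index = rng.permutation(len(toy))   # positions 0..4 in random order
shuffled = toy.iloc[index, :]

# Rows move as units: each feature stays paired with its own label
assert (shuffled.feature == (shuffled.label + 1) * 10).all()

# Equivalent one-liner (its order need not match the permutation above)
shuffled2 = toy.sample(frac=1, random_state=42)
print(shuffled.index.tolist())
```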
Split into features x and labels y:
x = df.iloc[:, 1:-1]
y = df.Species
x.head(), y.head()
(     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
 89             5.5           2.5            4.0           1.3
 10             5.4           3.7            1.5           0.2
 149            5.9           3.0            5.1           1.8
 88             5.6           3.0            4.1           1.3
 17             5.1           3.5            1.4           0.3,
 89     1
 10     0
 149    2
 88     1
 17     0
 Name: Species, dtype: int64)
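`df.iloc[:, 1:-1]` selects columns by position: it skips column 0 (`Id`, a row counter that carries no information about the species) and the last column (`Species`, the label). Dropping the columns by name gives the same result and does not depend on column order. A sketch with a hypothetical toy frame that has the same column layout as Iris.csv (illustrative values):

```python
import pandas as pd

# Hypothetical toy frame with the Iris.csv column layout
toy = pd.DataFrame({
    'Id': [1, 2],
    'SepalLengthCm': [5.1, 7.0],
    'SepalWidthCm': [3.5, 3.2],
    'PetalLengthCm': [1.4, 4.7],
    'PetalWidthCm': [0.2, 1.4],
    'Species': [0, 1],
})

x_pos = toy.iloc[:, 1:-1]                      # positional: drop first and last columns
x_named = toy.drop(columns=['Id', 'Species'])  # by name: same columns, order-independent
y = toy.Species

assert x_pos.equals(x_named)
print(list(x_pos.columns))
```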
Building the model
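Note that whether the labels are one-hot encoded or integer-encoded as they are here, a softmax multiclass classifier has one output per class. So although the label is a single column, the model's output is still 3-dimensional, because softmax is computed over 3 logits, one per class.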
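For the logits $z$ of one sample, softmax gives $p_k = e^{z_k} / \sum_j e^{z_j}$: one probability per class, summing to 1. A NumPy sketch with made-up logits for two samples (the `softmax` helper is written here for illustration, not taken from Keras):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],   # made-up logits, 3 per sample
                   [0.0, 0.0, 3.0]])
probs = softmax(logits)
print(probs.round(3))
print(probs.argmax(axis=1))  # predicted class per sample: [0 2]
```

The predicted class is the index of the largest probability, which is exactly what an integer label such as `df.Species` is compared against.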
model = keras.Sequential()
model.add(layers.Dense(3, input_dim=4, activation='softmax'))
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 3) 15
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
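The 15 parameters in the summary are the 4×3 weight matrix (4 inputs, 3 units) plus one bias per unit: 4·3 + 3 = 15. A small helper to check this arithmetic (`dense_param_count` is a hypothetical name, not a Keras function):

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Parameters of a fully connected layer: weight matrix plus optional bias vector."""
    return input_dim * units + (units if use_bias else 0)

print(dense_param_count(4, 3))  # 15, matching model.summary()
```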
Compiling the model
# With integer-encoded labels, use the sparse_categorical_crossentropy loss
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['acc']
)
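`sparse_categorical_crossentropy` takes the integer label directly and averages $-\log p_{\text{true class}}$; `categorical_crossentropy` computes the same quantity from one-hot labels. A NumPy sketch showing the two agree, with made-up probabilities (`sparse_ce` and `categorical_ce` are hypothetical helpers, not the Keras implementations, which additionally clip probabilities for numerical safety):

```python
import numpy as np

def sparse_ce(y_int, probs):
    """Mean of -log p[true class], using integer labels."""
    return -np.mean(np.log(probs[np.arange(len(y_int)), y_int]))

def categorical_ce(y_onehot, probs):
    """Mean of -sum(y * log p) over classes, using one-hot labels."""
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])   # made-up softmax outputs for two samples
y_int = np.array([0, 2])              # integer labels, like df.Species
y_onehot = np.eye(3)[y_int]           # the same labels one-hot encoded

print(sparse_ce(y_int, probs), categorical_ce(y_onehot, probs))
```

The sparse form skips building the one-hot matrix, which is why it pairs naturally with the integer labels produced by `map` above.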
Training the model
history = model.fit(x, y, epochs=300, verbose=0)
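A single `Dense(3, activation='softmax')` layer on 4 inputs is multinomial logistic regression. To illustrate what `fit` is minimizing, here is a NumPy sketch that trains the same 15-parameter model on synthetic 3-cluster data (an assumption standing in for Iris). It swaps in plain full-batch gradient descent for the Adam optimizer and Keras' default mini-batches of 32, so it is an illustration of the technique, not a reproduction of the run above:

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic stand-in for Iris (assumption): 3 Gaussian clusters in 4-D, 50 samples each
centers = np.array([[0, 0, 0, 0], [3, 3, 0, 0], [0, 3, 3, 3]], dtype=float)
X = np.vstack([c + rng.randn(50, 4) * 0.5 for c in centers])
y = np.repeat([0, 1, 2], 50)

W = np.zeros((4, 3))   # 4x3 kernel, like the Dense layer's weights
b = np.zeros(3)        # one bias per class: 15 parameters in total
Y = np.eye(3)[y]       # one-hot targets, used only to form the gradient
lr = 0.1
losses = []
for epoch in range(300):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)                         # softmax probabilities
    losses.append(-np.mean(np.log(P[np.arange(len(y)), y])))  # sparse cross-entropy
    grad = (P - Y) / len(y)                                   # d(mean loss)/d(logits)
    W -= lr * X.T @ grad
    b -= lr * grad.sum(axis=0)

acc = np.mean(P.argmax(axis=1) == y)  # accuracy from the last forward pass
print(f'final loss {losses[-1]:.3f}, train acc {acc:.3f}')
```

The recorded `losses` list plays the same role as `history.history['loss']`: one value per epoch, ready to plot.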
Plotting the loss and accuracy curves
plt.plot(range(300), history.history.get('loss'))
plt.plot(range(300), history.history.get('acc'))