This question already has an answer here:
- How to plot confusion matrix with string axis rather than integer in python (3 answers)
I am using scikit-learn to classify 22,000 text documents into 100 classes. I compute the confusion matrix with scikit-learn's confusion_matrix function.
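from sklearn import metrics
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt

# matrix/labels are the training features and labels; test_matrix/test_labels are the held-out set
model1 = LogisticRegression()
model1 = model1.fit(matrix, labels)
pred = model1.predict(test_matrix)
cm = metrics.confusion_matrix(test_labels, pred)
print(cm)
plt.imshow(cm, cmap='binary')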
This is what my confusion matrix looks like:
[[3962 325 0 ..., 0 0 0]
[ 250 2765 0 ..., 0 0 0]
[ 2 8 17 ..., 0 0 0]
...,
[ 1 6 0 ..., 5 0 0]
[ 1 1 0 ..., 0 0 0]
[ 9 0 0 ..., 0 0 9]]
However, the resulting plot is not clear or legible. Is there a better way to do this?
3 Answers
#1
54
You can use plt.matshow() instead of plt.imshow(), or you can use the seaborn module's heatmap (see documentation) to plot the confusion matrix:
import seaborn as sn
import pandas as pd
import matplotlib.pyplot as plt
array = [[33,2,0,0,0,0,0,0,0,1,3],
[3,31,0,0,0,0,0,0,0,0,0],
[0,4,41,0,0,0,0,0,0,0,1],
[0,1,0,30,0,6,0,0,0,0,1],
[0,0,0,0,38,10,0,0,0,0,0],
[0,0,0,3,1,39,0,0,0,0,4],
[0,2,2,0,4,1,31,0,0,0,2],
[0,1,0,0,0,0,0,36,0,2,0],
[0,0,0,0,0,0,1,5,37,5,1],
[3,0,0,0,0,0,0,0,0,39,0],
[0,0,0,0,0,0,0,0,0,0,38]]
df_cm = pd.DataFrame(array, index=list("ABCDEFGHIJK"),
                     columns=list("ABCDEFGHIJK"))
plt.figure(figsize=(10, 7))
sn.heatmap(df_cm, annot=True)
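When a few classes dominate, as in the question's matrix, raw counts tend to wash out the smaller rows. Below is a minimal sketch of the plt.matshow() route, assuming that row normalization is acceptable for your use case. The normalize_rows helper, the 3x3 sample values, and the output file name are illustrative assumptions, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def normalize_rows(cm):
    """Divide each row by its total so non-empty rows sum to 1 (empty rows stay 0)."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums != 0)


# Hypothetical 3x3 confusion matrix standing in for the real 100x100 one.
cm = np.array([[3962, 325, 0],
               [250, 2765, 0],
               [2, 8, 17]])
cm_norm = normalize_rows(cm)

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.matshow(cm_norm, cmap="Blues", vmin=0.0, vmax=1.0)
fig.colorbar(im, ax=ax, label="fraction of true class")
ax.set_xlabel("predicted label")
ax.set_ylabel("true label")
fig.savefig("confusion_matrix.png", dpi=150)  # assumed file name; use plt.show() interactively
```

With 100 classes, a larger figsize and no per-cell annotations will probably be easier to read than an annotated heatmap.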
#2
26
@bninopaul's answer is not entirely beginner-friendly. Here is code you can copy and run:
import seaborn as sn
import pandas as pd
import matplotlib.pyplot as plt
array = [[13,1,1,0,2,0],
[3,9,6,0,1,0],
[0,0,16,2,0,0],
[0,0,0,13,0,0],
[0,0,0,0,15,0],
[0,0,1,0,0,15]]
df_cm = pd.DataFrame(array, range(6), range(6))
# plt.figure(figsize=(10, 7))
sn.set(font_scale=1.4)  # label size
sn.heatmap(df_cm, annot=True, annot_kws={"size": 16})  # annotation font size
plt.show()
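The array above is typed in by hand. As a sketch of how the same DataFrame could be built from real predictions, the example below passes an explicit class order to scikit-learn's confusion_matrix via its labels parameter. The label values and the names y_true, y_pred, and class_names are hypothetical stand-ins for the question's test_labels and pred.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical labels standing in for test_labels and pred from the question.
y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "bird", "dog", "cat"]

# Fix the class order explicitly so rows and columns line up with the names.
class_names = ["bird", "cat", "dog"]
cm = confusion_matrix(y_true, y_pred, labels=class_names)

# Rows are true classes, columns are predicted classes.
df_cm = pd.DataFrame(cm, index=class_names, columns=class_names)
print(df_cm)
```

The resulting df_cm can be passed to sn.heatmap(df_cm, annot=True) in the same way as the hand-written one.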
#3
5
If you want more data in your confusion matrix, including a totals column, a totals row, and percentages (%) in each cell, like the MATLAB default (see image below), along with the heatmap and other options, you may like the following module, shared on GitHub:
https://github.com/wcipriano/pretty-print-confusion-matrix
This module handles the task easily and produces the output above, with many parameters for customizing your confusion matrix.