R: Frequency Tables, Contingency Tables, and an Introduction to the CrossTable Function

Contingency tables (CrossTable)

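A contingency table is useful not only for simple descriptive statistics but also in machine learning, where it lets you read off a classifier's accuracy, false positive rate (FPR), true positive rate (TPR) and similar metrics, so you can compare different ML models or tuning settings.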

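A 2×2 contingency table produced by CrossTable() typically looks like the one below, which compares predicted classes (lda.class) with the true labels (test_cancer$diagnosis). Each cell lists the count N, then N / row total, N / column total and N / table total.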

```
Total Observations in Table:  143 

             | test_cancer$diagnosis 
   lda.class |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |        82 |        11 |        93 | 
             |     0.882 |     0.118 |     0.650 | 
             |     0.988 |     0.183 |           | 
             |     0.573 |     0.077 |           | 
-------------|-----------|-----------|-----------|
           1 |         1 |        49 |        50 | 
             |     0.020 |     0.980 |     0.350 | 
             |     0.012 |     0.817 |           | 
             |     0.007 |     0.343 |           | 
-------------|-----------|-----------|-----------|
Column Total |        83 |        60 |       143 | 
             |     0.580 |     0.420 |           | 
-------------|-----------|-----------|-----------|
```
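To connect the table to the metrics mentioned at the start, here is a minimal base-R sketch that computes accuracy, TPR and FPR from the counts in the LDA table. It assumes class 1 is the positive class; the counts are typed in by hand rather than taken from the author's objects, and the names conf, tp, fp and so on are illustrative.

```r
# Counts copied by hand from the LDA table: rows = predicted class, columns = true class.
# Assumption: class "1" is treated as the positive class.
conf <- matrix(c(82, 1, 11, 49), nrow = 2,
               dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

tp <- conf["1", "1"]  # predicted 1, actually 1
tn <- conf["0", "0"]  # predicted 0, actually 0
fp <- conf["1", "0"]  # predicted 1, actually 0
fn <- conf["0", "1"]  # predicted 0, actually 1

accuracy <- (tp + tn) / sum(conf)
tpr <- tp / (tp + fn)  # true positive rate (sensitivity, recall)
fpr <- fp / (fp + tn)  # false positive rate (1 - specificity)

round(c(accuracy = accuracy, TPR = tpr, FPR = fpr), 3)
```

With these counts, TPR = 49/60 ≈ 0.817 and FPR = 1/83 ≈ 0.012, which are exactly the N / Col Total proportions in the predicted-1 row of the table, and accuracy = 131/143 ≈ 0.916 is the sum of the two diagonal N / Table Total values (0.573 + 0.343).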

Creating a contingency table with CrossTable

The recommended tool is the CrossTable() function from the R package "gmodels".

Example

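```r
library(class)    # knn()
library(gmodels)  # CrossTable()

## Predict the test set with a k-nearest-neighbours model (k = 1);
## train_cancer and test_cancer are the author's training and test data (not shown)
knn_pred_1 <- knn(train_cancer[, 2:4], test_cancer[, 2:4], train_cancer$diagnosis, k = 1)

## Build a contingency table to check the predictions
CrossTable(x = knn_pred_1, y = test_cancer$diagnosis, prop.chisq = FALSE)
```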

Output:

```
   Cell Contents
|-------------------------|
|                       N |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  143 

             | test_cancer$diagnosis 
  knn_pred_1 |         0 |         1 | Row Total | 
-------------|-----------|-----------|-----------|
           0 |        77 |         8 |        85 | 
             |     0.906 |     0.094 |     0.594 | 
             |     0.928 |     0.133 |           | 
             |     0.538 |     0.056 |           | 
-------------|-----------|-----------|-----------|
           1 |         6 |        52 |        58 | 
             |     0.103 |     0.897 |     0.406 | 
             |     0.072 |     0.867 |           | 
             |     0.042 |     0.364 |           | 
-------------|-----------|-----------|-----------|
Column Total |        83 |        60 |       143 | 
             |     0.580 |     0.420 |           | 
-------------|-----------|-----------|-----------|
```

Notes

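In CrossTable(), the argument prop.chisq defaults to TRUE, which adds each cell's chi-square contribution to the output. Most use cases don't need it, so you can switch it off by passing prop.chisq = FALSE.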
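The per-cell chi-square contribution is (O − E)² / E, where O is the observed count and E is the count expected if the two variables were independent (row total × column total / grand total). A minimal base-R sketch, using the k-NN counts from the example typed in by hand, shows the calculation and checks that the contributions add up to Pearson's chi-square statistic without continuity correction. The names obs, expected and contrib are illustrative.

```r
# Counts from the k-NN example (rows = predicted, columns = actual), typed in by hand.
obs <- matrix(c(77, 6, 8, 52), nrow = 2,
              dimnames = list(knn_pred_1 = c("0", "1"), diagnosis = c("0", "1")))

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Per-cell chi-square contribution (O - E)^2 / E
contrib <- (obs - expected)^2 / expected
round(contrib, 3)

# The contributions sum to Pearson's chi-square statistic (no continuity correction)
sum(contrib)
chisq.test(obs, correct = FALSE)$statistic
```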

Function signature:

```r
CrossTable(x, y, digits=3, max.width = 5, expected=FALSE, prop.r=TRUE, prop.c=TRUE,
           prop.t=TRUE, prop.chisq=TRUE, chisq = FALSE, fisher=FALSE, mcnemar=FALSE,
           resid=FALSE, sresid=FALSE, asresid=FALSE,
           missing.include=FALSE,
           format=c("SAS","SPSS"), dnn = NULL, ...)
```

Arguments:

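x, y: the two vectors to cross-tabulate (they become the rows and columns of the table)

digits: number of decimal places shown for the proportions

prop.r: if TRUE, include row proportions (N / Row Total)

prop.c: if TRUE, include column proportions (N / Col Total)

prop.t: if TRUE, include table proportions (N / Table Total)

prop.chisq: if TRUE, include each cell's chi-square contribution

chisq: if TRUE, include the results of a chi-square test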
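The three proportion rows controlled by prop.r, prop.c and prop.t can be reproduced with base R's prop.table(), which helps when checking what each line of CrossTable's output means. This sketch types in the k-NN counts from the example by hand; tab, row_prop, col_prop and table_prop are illustrative names.

```r
# Counts from the k-NN example (rows = knn_pred_1, columns = diagnosis), typed in by hand.
tab <- as.table(matrix(c(77, 6, 8, 52), nrow = 2,
                       dimnames = list(knn_pred_1 = c("0", "1"),
                                       diagnosis = c("0", "1"))))

row_prop   <- prop.table(tab, 1)  # N / Row Total   (what prop.r adds)
col_prop   <- prop.table(tab, 2)  # N / Col Total   (what prop.c adds)
table_prop <- prop.table(tab)     # N / Table Total (what prop.t adds)

round(row_prop, 3)
round(col_prop, 3)
round(table_prop, 3)
```

For the top-left cell this gives 77/85 ≈ 0.906, 77/83 ≈ 0.928 and 77/143 ≈ 0.538, matching the three proportion lines under the count 77 in the CrossTable output.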

Frequency tables

A frequency table gives the number of times each value of a variable occurs. The example below uses CO2, a dataset that ships with R:

```r
head(CO2)

# Frequency table of the conc variable
table(CO2$conc)
```

Result:

```
  95  175  250  350  500  675 1000 
  12   12   12   12   12   12   12 
```
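Counts are often easier to interpret next to relative frequencies. A minimal sketch building on the same table() call and using only base R; freq, rel and freq_df are illustrative names:

```r
# CO2 comes with base R (package "datasets"), so nothing needs installing.
freq <- table(CO2$conc)   # absolute frequencies
rel  <- prop.table(freq)  # relative frequencies (each count / total count)

# Put values, counts and proportions side by side in a data frame
freq_df <- data.frame(conc = names(freq),
                      n    = as.vector(freq),
                      prop = round(as.vector(rel), 3))
freq_df
```

Since every concentration appears 12 times among the 84 rows, each relative frequency is 1/7.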

Supplement: generating various contingency tables in R

Here is the code:

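```r
library(vcd)      # provides the Arthritis data set
head(Arthritis)

# Two-way table with table()
table(Arthritis$Treatment, Arthritis$Improved)
with(Arthritis, table(Treatment, Improved))

# Two-way table with xtabs() and a formula
mytable <- xtabs(~ Treatment + Improved, data = Arthritis)
with(Arthritis, xtabs(~ Treatment + Improved))

margin.table(mytable, 2)               # column totals (summed over rows)
prop.table(mytable, 2)                 # proportions within each column
prop.table(mytable)                    # proportions of the grand total
addmargins(mytable)                    # add a Sum row and a Sum column
addmargins(mytable, 1)                 # add a Sum row only
addmargins(prop.table(mytable, 2), 1)  # column proportions plus a Sum row

library(gmodels)
CrossTable(Arthritis$Treatment, Arthritis$Improved)  # SAS-style output (the default format)
```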
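If vcd is not installed, the same base-R functions can be tried on the built-in mtcars data. The sketch also makes the margin argument explicit: 1 refers to rows and 2 to columns, so margin.table(tab, 1) returns row totals and prop.table(tab, 2) divides each count by its column total. The variable name tab is illustrative.

```r
# Built-in mtcars data: cross-tabulate number of cylinders by number of gears.
tab <- xtabs(~ cyl + gear, data = mtcars)

margin.table(tab, 1)               # row totals: cars per cylinder count
margin.table(tab, 2)               # column totals: cars per gear count
addmargins(tab)                    # counts with a Sum row and a Sum column
addmargins(prop.table(tab, 2), 1)  # column proportions; the added Sum row is all 1s
```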

The above is based on personal experience, and I hope it serves as a useful reference. If there are errors or omissions, corrections are welcome.

Original article: https://blog.csdn.net/Yann_YU/article/details/107359130
