I'm trying to create a confusion matrix, preferably with the confusionMatrix() function, but I'm getting this error:
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
I also tried using the table() function but got the same error.
Below is my entire code:
# install and load libraries
install.packages('MASS')
install.packages('tree')
install.packages("e1071")
install.packages("caret")
library('MASS')
library('tree')
library('e1071')
library('caret')
set.seed(1985)
#GET DATA
training <- read.csv("C:/Users/anaim/data_minig_project/pml-training.csv",header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
training_df <- data.frame(training,stringsAsFactors=FALSE)
nrow(training_df)
ncol(training_df)
#create train & test set splits
inTrain <- createDataPartition(y=training_df$classe, p=0.75, list=FALSE)
training_data <- training_df[inTrain,]
testing_data<- training_df[-inTrain,]
#FEATURE SELECTION & DATA CLEANING
# the number of features is quite large (160 columns), so we start from studies such as paper #1 to reduce it
# subset to the features mentioned in those studies
training_data_subset <- subset(training_data, select=c("avg_roll_belt","var_roll_belt","var_total_accel_belt","amplitude_roll_belt","max_roll_belt","var_roll_belt",
"var_accel_arm","magnet_arm_x","magnet_arm_y","magnet_arm_z","accel_dumbbell_y","accel_dumbbell_z","magnet_dumbbell_x","gyros_dumbbell_x",
"gyros_dumbbell_y","gyros_dumbbell_z","pitch_forearm","gyros_forearm_x","gyros_forearm_y","classe"))
# subset to the features mentioned in those studies
testing_data_subset <- subset(testing_data, select=c("avg_roll_belt","var_roll_belt","var_total_accel_belt","amplitude_roll_belt","max_roll_belt","var_roll_belt",
"var_accel_arm","magnet_arm_x","magnet_arm_y","magnet_arm_z","accel_dumbbell_y","accel_dumbbell_z","magnet_dumbbell_x","gyros_dumbbell_x",
"gyros_dumbbell_y","gyros_dumbbell_z","pitch_forearm","gyros_forearm_x","gyros_forearm_y","classe"))
#all NAs to 0
testing_data_subset[is.na(testing_data_subset)] <- 0
training_data_subset[is.na(training_data_subset)] <- 0
#load library(e1071) before using skewness()
#investigate skewness
# Interpretation of skewness - http://www.tc3.edu/instruct/sbrown/stat/shape.htm#SkewnessCompute
skewness_result <- apply(training_data_subset[, sapply(training_data_subset, is.numeric)], 2, skewness)
skewness_df <- data.frame(skewness_result)
#remove highly skewed columns
remove <- c("var_roll_belt","var_total_accel_belt","amplitude_roll_belt","var_roll_belt","var_roll_belt.1","magnet_dumbbell_x")
training_data_subset <- training_data_subset[, !(colnames(training_data_subset) %in% remove), drop=FALSE]
testing_data_subset <- testing_data_subset[, !(colnames(testing_data_subset) %in% remove), drop=FALSE]
# verify that the columns were removed
ncol(training_data_subset)
ncol(testing_data_subset)
#BUILD MODEL
#1)decision tree
exercise.model <- tree(formula = classe ~ ., data = training_data_subset)
summary(exercise.model)
plot(exercise.model)
text(exercise.model ,pretty =0)
#MODEL EVALUATION
exercise.prediction <- predict(exercise.model,newdata = testing_data_subset, type="tree")
# THIS IS WHERE I GET THE ERROR
confusionMatrix(exercise.prediction,testing_data_subset[['classe']])
confusionMatrix(exercise.prediction,testing_data_subset$classe)
# I also tried table() just to get the raw (true positives + true negatives) / total values, but I got the same error
table(exercise.prediction, testing_data_subset[['classe']])
table(exercise.prediction,testing_data_subset$classe)
Any help in creating the confusion matrix with confusionMatrix() would be appreciated.
Thanks
1 answer
#1
1
tree() comes from the tree package, not from base R, whereas confusionMatrix() is part of the caret package. The prediction I produced from the tree() model was not in a form that confusionMatrix() or table() could use. When I replaced tree() with
exercise.model <- train(classe ~ ., preProcess = c("center", "scale", "BoxCox", "pca"), data = training_data_subset, method = "rpart")
both confusionMatrix() and table() worked.
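A likely explanation for the original error, offered as a sketch rather than a confirmed diagnosis of the exact data: predict() on a tree object with type = "tree" returns another tree object (a list), not a vector of class labels, so table() and confusionMatrix() have nothing atomic to sort. Asking for type = "class" returns a factor of predicted labels instead. The example below uses the built-in iris data as a stand-in for pml-training.csv, which is an assumption made only to keep it self-contained. It needs the tree, caret, and e1071 packages installed.

```r
library(tree)   # tree() and its predict() method
library(caret)  # createDataPartition(), confusionMatrix()

set.seed(1985)

# iris is a stand-in for the pml-training data (assumption for illustration)
in_train  <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)
train_set <- iris[in_train, ]
test_set  <- iris[-in_train, ]

fit <- tree(Species ~ ., data = train_set)

# type = "tree" gives back a tree object, which is a list under the hood;
# passing it to table() is what triggers the sort.list() error
pred_tree <- predict(fit, newdata = test_set, type = "tree")
print(is.list(pred_tree))

# type = "class" gives back a factor of predicted class labels
pred_class <- predict(fit, newdata = test_set, type = "class")
print(is.factor(pred_class))

# a factor of predictions and a factor of true labels is what
# confusionMatrix() expects (caret uses e1071 internally here)
cm <- confusionMatrix(pred_class, test_set$Species)
print(cm$table)
```

With the objects from the question, the equivalent change would be predict(exercise.model, newdata = testing_data_subset, type = "class"), assuming classe is stored as a factor in both subsets.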