【Feature Selection】1 - Recursive Feature Elimination (RFE)

Date: 2024-12-10 22:54:33

Sensible feature (set) selection can benefit a model's performance, interpretability, and simplicity. This post demonstrates the process with a gradient boosting machine and a support vector machine.

Applied Machine Learning Using mlr3 in R - 6  Feature Selection

mlr-org - Recursive Feature Elimination on the Sonar Data Set

‘Currently, RFE works with support vector machines (SVM), decision tree algorithms and gradient boosting machines (GBM). Supported learners are tagged with the "importance" property.’ (In other words, RFE ranks features by importance.)
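As a quick check (a minimal sketch, not taken from the quoted sources), whether a learner qualifies for RFE can be verified via its $properties field:

    library(mlr3verse)

    # RFE requires learners that can report feature importance,
    # i.e. learners whose properties include "importance".
    learner = lrn("classif.rpart")
    "importance" %in% learner$properties  # TRUE: decision trees report importance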

RFE-CV is a variant of RFE. ‘RFE-CV estimates the optimal number of features with cross-validation first. Then one more RFE is carried out on the complete dataset with the optimal number of features as the final feature set size.’

Typical RFE:

  1. Create the optimizer and store the relevant parameter settings
  2. Create the machine learning task
  3. Create the learner
  4. Define the feature selection problem
  5. Pass the feature selection problem to the optimizer
  6. Train the final model on the full data set with the best feature set and evaluate its performance on a test set (see the sketch after the code below)

    library(mlr3verse)
    library(mlr3extralearners)  # provides lrn("classif.gbm")

    # 1. Retrieve the RFE optimizer with the fs() function.
    optimizer = fs("rfe",
      n_features = 1,
      feature_number = 1,
      aggregation = "rank")
    # The optimizer stops when the number of features equals n_features.
    # The parameters feature_number, feature_fraction and subset_size determine
    # the number of features that are removed in each iteration.

    # 2. Create the task.
    task = tsk("sonar")

    # 3. Create the learner (a gradient boosting machine).
    learner = lrn("classif.gbm",
      distribution = "bernoulli",
      predict_type = "prob")

    # 4. Define the feature selection problem.
    instance = fsi(
      task = task,
      learner = learner,
      resampling = rsmp("cv", folds = 6),  # resampling strategy: 6-fold CV
      measures = msr("classif.auc"),       # performance measure: AUC
      terminator = trm("none"))            # no extra terminator: the optimizer stops once n_features is reached

    # 5. Pass the feature selection problem to the optimizer.
    optimizer$optimize(instance)
    instance$result
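Step 6 of the workflow (training the final model on the best feature set and checking its performance on held-out data) is not part of the snippet above. A minimal sketch of it follows; the 80/20 holdout split is an illustrative assumption, and the task is cloned so that the full Sonar task stays available for the RFE-CV example below.

    # 6. Train the final model on the selected features and evaluate on a held-out test set.
    task_best = task$clone()
    task_best$select(instance$result_feature_set)
    split = partition(task_best, ratio = 0.8)  # simple 80/20 holdout split (illustrative choice)
    learner$train(task_best, row_ids = split$train)
    prediction = learner$predict(task_best, row_ids = split$test)
    prediction$score(msr("classif.auc"))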

 

Visualizing the feature selection process

    library(ggplot2)
    library(viridisLite)
    library(mlr3misc)
    library(data.table)  # for as.data.table()

    data = as.data.table(instance$archive)
    data[, n := map_int(importance, length)]  # number of features in each evaluated subset

    ggplot(data, aes(x = n, y = classif.auc)) +
      geom_line(
        color = viridis(1, begin = 0.5),
        linewidth = 1) +
      geom_point(
        fill = viridis(1, begin = 0.5),
        shape = 21,
        size = 3,
        stroke = 0.5,
        alpha = 0.8) +
      xlab("Number of Features") +
      scale_x_reverse() +
      theme_minimal()

Optimization path of the feature selection. The performance first increases as the number of features decreases; as soon as informative features start to be removed, the performance drops.

RFE-CV:

Principle: RFE-CV first determines the optimal number of features via cross-validation, and only then selects the final feature set.

RFE-CV estimates the optimal number of features before selecting a feature set. For this, an RFE is run in each resampling iteration and the number of features with the best mean performance is selected. Then one more RFE is carried out on the complete dataset with the optimal number of features as the final feature set size.

 

 

    # The RFE-CV optimizer; no aggregation of importance ranks is needed because the
    # optimal number of features is estimated by cross-validation.
    optimizer = fs("rfecv",
      n_features = 1,
      feature_number = 1)

    # The learner: a support vector machine with a linear kernel.
    learner = lrn("classif.svm",
      type = "C-classification",
      kernel = "linear",
      predict_type = "prob")

    instance = fsi(
      task = task,
      learner = learner,
      resampling = rsmp("cv", folds = 6),  # 6-fold CV to estimate the optimal feature set size
      measures = msr("classif.auc"),
      terminator = trm("none"),
      callback = clbk("mlr3fselect.svm_rfe"))  # callback that supplies importance scores for the linear SVM

    optimizer$optimize(instance)

    library(ggplot2)
    library(viridisLite)
    library(mlr3misc)
    # Keep only the evaluations from the cross-validation stage.
    data = as.data.table(instance$archive)[!is.na(iteration), ]
    # Mean AUC per batch (one batch per feature subset size).
    aggr = data[, list("y" = mean(unlist(.SD))), by = "batch_nr", .SDcols = "classif.auc"]
    # Convert the batch number into the number of features (the Sonar task has 60 features).
    aggr[, batch_nr := 61 - batch_nr]
    data[, n := map_int(importance, length)]
    ggplot(aggr, aes(x = batch_nr, y = y)) +
      geom_line(
        color = viridis(1, begin = 0.5),
        linewidth = 1) +
      geom_point(
        fill = viridis(1, begin = 0.5),
        shape = 21,
        size = 3,
        stroke = 0.5,
        alpha = 0.8) +
      geom_vline(
        xintercept = aggr[y == max(y)]$batch_nr,
        colour = viridis(1, begin = 0.33),
        linetype = 3) +
      xlab("Number of Features") +
      ylab("Mean AUC") +
      scale_x_reverse() +
      theme_minimal()

    # Subset the task to the optimal feature set and train the learner.
    task$select(instance$result_feature_set)
    learner$train(task)
    # The trained model can now be used to predict new, external data.

 Estimation of the optimal number of features. The best mean performance is achieved with 19 features (blue line).
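To illustrate the last comment in the code above, a minimal sketch of predicting on external data follows; reusing the first five rows of the (already subset) Sonar task as stand-in external observations is purely an illustrative assumption.

    # Predict on "new" data with the trained linear SVM; the stand-in rows
    # come from the task itself for demonstration purposes only.
    new_data = task$data(rows = 1:5, cols = task$feature_names)
    prediction = learner$predict_newdata(new_data)
    prediction$prob  # predicted class probabilities (predict_type = "prob")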