I am using xgboost in R. I built an xgb.DMatrix from a matrix without problems, but when I reduce the number of columns in the data, I get the following error: Error in xgb.setinfo(dmat, names(p), p[[1]]) : The length of labels must equal to the number of rows in the input data
Here is the R code:
xgbmat1 = xgb.DMatrix(Matrix(data.matrix(ctt1)), label = as.matrix(as.numeric(data$V2)) - 1)
xgbmat1 = xgb.DMatrix(Matrix(data.matrix(ctt1[,nr])), label = as.matrix(as.numeric(data$V2)) - 1)
The first call works fine, though.
dim(ctt1[,nr])
[1] 6401 1048
dim(ctt1)
[1] 6401 5901
3 Solutions
#1
2
It turns out that after removing some columns, some rows contain only zeros, and those rows cannot contribute to the model.
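A quick way to check for this situation is to look for rows that have no non-zero entries once the columns are subset. The sketch below uses a small made-up matrix `x` and a made-up column selection `keep_cols` as stand-ins for `ctt1` and `nr`; neither comes from the question.

```r
# Hypothetical stand-in for ctt1: 3 rows, 3 columns
x <- matrix(c(1, 0, 2,
              0, 0, 0,
              3, 0, 0), nrow = 3, byrow = TRUE)

# Hypothetical stand-in for nr: keep only columns 2 and 3
keep_cols <- c(2, 3)
x_sub <- x[, keep_cols, drop = FALSE]

# Rows that contain only zeros after the column subset
zero_rows <- which(rowSums(x_sub != 0) == 0)
zero_rows
```

If `zero_rows` includes the last rows of the matrix, the sparse conversion described in the next answer is a likely cause of the label length error.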
#2
1
For sparse matrices, the xgboost R interface uses the CSC format creation method. The problem currently is that this method infers the number of rows from the stored non-zero values, so any completely empty rows at the end are not counted. A similar loss of completely empty columns at the end can happen with the CSR sparse format. For more details, see xgboost issue #1223 and the Wikipedia article on sparse matrix formats.
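To see why trailing empty rows can go missing, it helps to look at what a CSC matrix actually stores. The sketch below uses only the Matrix package and a made-up 4 x 3 matrix; it does not call xgboost, and it only illustrates the row-count inference described above rather than reproducing xgboost's internal code.

```r
library(Matrix)

# Hypothetical 4 x 3 matrix whose last two rows are entirely zero
dense <- matrix(c(1, 0, 2,
                  0, 3, 0,
                  0, 0, 0,
                  0, 0, 0), nrow = 4, byrow = TRUE)
sp <- Matrix(dense, sparse = TRUE)   # column-compressed (CSC) storage

# Slot @i holds the 0-based row index of each stored non-zero entry
rows_seen <- max(sp@i) + 1L   # row count implied by the stored entries alone
rows_true <- nrow(sp)         # row count recorded in the matrix dimensions

trailing_empty <- rows_true - rows_seen
trailing_empty
```

A non-zero `trailing_empty` means a reader that trusts only the stored indices would see fewer rows than there are labels. One possible workaround, not verified against every xgboost version, is to pass the dense matrix (`data.matrix(ctt1[,nr])`) to `xgb.DMatrix` without wrapping it in `Matrix()`, at the cost of more memory.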
#3
1
In my case, I fixed this error by changing how the labels are assigned:
labels <- df_train$target_feature
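The answer does not show what the assignment looked like before the change. One possibility, assumed here purely for illustration, is that the labels were taken as a one-column data frame, whose `length()` is the number of columns rather than the number of rows. The data frame below is made up; only the names `df_train` and `target_feature` come from the answer.

```r
# Hypothetical training data
df_train <- data.frame(f1 = c(0.5, 1.2, 0.0),
                       f2 = c(1, 0, 0),
                       target_feature = c(0, 1, 1))
features <- data.matrix(df_train[, c("f1", "f2")])

labels_df  <- df_train["target_feature"]   # one-column data frame (assumed old form)
labels_vec <- df_train$target_feature      # plain numeric vector (form used in the answer)

length(labels_df)    # counts columns, not rows
length(labels_vec)   # counts elements, one per row

# Sanity check worth running before building the DMatrix
label_ok <- is.null(dim(labels_vec)) && length(labels_vec) == nrow(features)
label_ok
```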