R: How do I factor a data table?

Date: 2022-02-14 14:58:19

I'm trying to use apriori, but I need to convert my data to factors first. I have a data table. Some of the columns have fewer than 3 distinct values (true/false, 0/1), and others are continuous variables.

It appears that I need to factor the table as follows:

1) skip the key variables
2) leave the true/false columns alone
3) convert a column to a factor directly if it has fewer than 6 unique values
4) for 6 or more unique values, bin the column by quantile and factor the bins

The code example below meets these goals, and apriori now runs.

However, I get a warning that I have trouble understanding. Can someone explain the warning and how to correct it?

library(data.table)
library(Hmisc)  # provides cut2()
nSamples = 5000
set.seed(13)
dat <- data.table(id1=sample(seq(10000,10300),nSamples,replace=T),
                  id2=sample(100,nSamples,replace=T),
                  tfvar = sample(c(T,F),nSamples,replace=T),
                  contvar = runif(nSamples,1,2.3), 
                  disvar = sample(c(1,2),nSamples,replace=T))
setkey(dat,id1,id2)
colsToFactor <- setdiff(names(dat),key(dat))
cdat <- copy(dat)  # copy() so that := below does not also modify dat by reference
myfact<-function(x) {
  if (typeof(x)== 'logical') {
    return(x)
  }
  nux <- length(unique(x))
  if (nux<6) {  # fewer than 6 unique values: factor as-is (goal 3)
    cx <- factor(x)
  } else {
    cx <- cut2(x,g=5)
  }
  return(cx)
}

myprint<-function(xl) {
  if (is.factor(xl)) {
    print(levels(xl))
  } else {
    print('not a factor')
  }
}
cdat[,(colsToFactor):=lapply(.SD, myfact),.SDcols=colsToFactor]
jnk<-cdat[, lapply(.SD, myprint)]
print(cdat)

Here is the output:

[1] "not a factor"
[1] "not a factor"
[1] "not a factor"
[1] "[1.00,1.27)" "[1.27,1.53)" "[1.53,1.79)" "[1.79,2.04)" "[2.04,2.30]"
[1] "1" "2"
        id1 id2 tfvar     contvar disvar
   1: 10000   4 FALSE [1.53,1.79)      2
   2: 10000  15 FALSE [2.04,2.30]      2
   3: 10000  18 FALSE [1.53,1.79)      2
   4: 10000  22  TRUE [1.00,1.27)      1
   5: 10000  22 FALSE [1.00,1.27)      2
  ---                                   
4996: 10300  81 FALSE [1.00,1.27)      2
4997: 10300  89  TRUE [1.79,2.04)      2
4998: 10300  89  TRUE [1.79,2.04)      1
4999: 10300  90  TRUE [1.79,2.04)      1
5000: 10300  93 FALSE [1.00,1.27)      1

And the warning message is:

Warning message:
In as.data.table.list(jval) :
  Item 5 is of size 2 but maximum size is 5 (recycled leaving a remainder of 1 items)

How can I get rid of this warning?

1 solution

#1


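The warning comes from the print statement in myprint. print() returns its argument invisibly, so lapply(.SD, myprint) hands data.table a list whose items have lengths 1, 1, 1, 5 and 2 (the five levels of contvar and the two levels of disvar). data.table tries to recycle the shorter items up to the maximum length of 5, and 2 does not divide 5 evenly, hence the warning about item 5.

Changing

print(levels(xl))

to

print(paste('factor(s) are',paste(levels(xl),collapse=', ')))

makes every call return a single string, and that removes the warning.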
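
As an alternative to changing the print statement, one option is to keep the printing out of the data.table j expression altogether, so that nothing has to be assembled into a result table. The following is only a minimal sketch on a made-up table: toy and describe are hypothetical stand-ins for cdat and myprint, not the asker's actual objects.

```r
library(data.table)

# Hypothetical toy table standing in for cdat
toy <- data.table(
  flag = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  bin  = factor(c("a", "b", "c", "d", "e", "a")),  # 5 levels
  grp  = factor(c(1, 2, 1, 2, 1, 2))               # 2 levels
)

# Same shape as myprint: prints, and (like any print call)
# invisibly returns whatever it printed
describe <- function(x) {
  if (is.factor(x)) print(levels(x)) else print("not a factor")
}

# The return values have different lengths; inside toy[, lapply(.SD, describe)]
# data.table would have to recycle these into columns of one table
lens <- lengths(lapply(toy, describe))
print(lens)

# A data.table is also a list of columns, so it can be iterated over directly.
# Nothing is assembled into a table here, so no recycling takes place.
invisible(lapply(toy, describe))
```

Because the second call never goes through the j expression, the differing lengths of the returned level vectors do not matter.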
