Printing dates without scientific notation in an rpart classification tree

Time: 2022-10-04 09:21:52

When I create an rpart tree that uses a date cutoff at a node, the print methods I use - both rpart.plot and fancyRpartPlot - print the dates in scientific notation, which makes it hard to interpret the result. Here's the fancyRpartPlot:


[Image: fancyRpartPlot of the tree, with the date split values printed in scientific notation]

Is there a way to print this tree with more interpretable date values? This tree plot is meaningless as all those dates look the same.


Here's my code for creating the tree and plotting two ways:


library(rpart)
library(rpart.plot)
library(rattle)  # provides fancyRpartPlot()
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree)
fancyRpartPlot(my_tree)

Using this data:


# define a random date/time selection function
generate_days <- function(N, st = "2012/01/01", et = "2012/12/31") {
  st <- as.POSIXct(as.Date(st))
  et <- as.POSIXct(as.Date(et))
  # length of the interval in seconds
  dt <- as.numeric(difftime(et, st, units = "secs"))
  # N uniform offsets (in seconds) added to the start time
  ev <- runif(N, 0, dt)
  st + ev
}

set.seed(1)
dat <- data.frame(
  a = runif(100),
  b = rpois(100, 5),
  c = sample(c("hi","med","lo"), 100, TRUE),
  d = generate_days(100)
)

3 solutions

#1


3  

From a practical standpoint, perhaps you'd like to just use days from the start of the data:


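# explicit units guarantee days; plain subtraction lets R pick the units automatically
dat$d <- as.numeric(difftime(dat$d, as.POSIXct(as.Date("2012/01/01")), units = "days"))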
my_tree <- rpart(a ~ ., data = dat)
rpart.plot(my_tree,branch=1,extra=101,type=1,nn=TRUE)

[Image: rpart.plot of the tree fitted on days since the start of the data]

This reduces the number to something manageable and meaningful (though perhaps not as meaningful as a specific date). You may even want to round it to the nearest day or week. (I can't install GTK+ on my computer, so I can't use fancyRpartPlot.)
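As a rough sketch of the rounding idea, the helpers below turn a date-time into whole days since a start date and map a split value in days back to a calendar date. The names `days_since` and `days_to_date` are made up for illustration, and the start date and UTC time zone are assumptions that should match your own data.

```r
# Hypothetical helpers; `start` is assumed to be the first day covered by the data
start <- as.POSIXct("2012-01-01", tz = "UTC")

# Whole days elapsed since `start`, returned as a plain numeric vector
days_since <- function(x, start) {
  floor(as.numeric(difftime(x, start, units = "days")))
}

# Map a split value expressed in days back to a calendar date
days_to_date <- function(days, start) {
  as.Date(start) + floor(days)
}

x <- as.POSIXct(c("2012-01-01 12:00:00", "2012-08-31 22:53:15"), tz = "UTC")
days_since(x, start)
days_to_date(243.5, start)
```

A split such as `d >= 243.5` in the plot can then be read as "on or after the date returned by `days_to_date(243.5, start)`".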


#2


1  

One possible way is to use the digits option in print to examine the tree, then as.POSIXlt to convert a split value to a date:


> print(my_tree,digits=100)
n= 100

node), split, n, deviance, yval
      * denotes terminal node

 1) root 100 7.0885590 0.5178471
   2) d>=1346478795.049611568450927734375 33 1.7406368 0.4136051
     4) b>=4.5 23 1.0294497 0.3654257 *
     5) b< 4.5 10 0.5350040 0.5244177 *
   3) d< 1346478795.049611568450927734375 67 4.8127122 0.5691901
     6) d< 1340921905.3460228443145751953125 55 4.1140164 0.5368048
      12) c=hi 28 1.8580913 0.4779574
        24) d< 1335890083.3241622447967529296875 18 0.7796261 0.3806526 *
        25) d>=1335890083.3241622447967529296875 10 0.6012662 0.6531062 *
      13) c=lo,med 27 2.0584052 0.5978317
        26) d>=1337494347.697483539581298828125 8 0.4785274 0.3843749 *
        27) d< 1337494347.697483539581298828125 19 1.0618892 0.6877082 *
     7) d>=1340921905.3460228443145751953125 12 0.3766236 0.7176229 *

## Get date on first node
> as.POSIXlt(1346478795.049611568450927734375,origin="1970-01-01")
[1] "2012-08-31 22:53:15 PDT"

I also checked that the digits option is available in rpart.plot and fancyRpartPlot:


rpart.plot(my_tree,digits=10)
fancyRpartPlot(my_tree, digits=10)
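
To avoid converting the split values one at a time, they can be converted as a vector. This is only a sketch: the numbers are copied (rounded) from the print output above, and `tz = "UTC"` is an assumption, so the results differ from the PDT value shown earlier by the UTC offset.

```r
# Split values on `d` copied from the print() output (seconds since 1970-01-01)
cuts <- c(1346478795.0496, 1340921905.3460, 1335890083.3242, 1337494347.6975)

# tz = "UTC" is an assumption; use the time zone your data was recorded in
cut_dates <- as.POSIXct(cuts, origin = "1970-01-01", tz = "UTC")
format(cut_dates, "%Y-%m-%d %H:%M")

# The same numbers are stored in the fitted object, in the "index" column of
# my_tree$splits (rows are named by variable). That matrix also lists
# competitor and surrogate splits, so it holds more rows than the plot shows:
# sp <- my_tree$splits
# sp[rownames(sp) == "d", "index"]
```

The formatted strings can be read alongside the plot to interpret each node.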

#3


0  

I don't know how important the specific chronological date is in your classification, but an alternative would be to break your dates down by their characteristics. In other words, create indicator variables coded [1,0] for the "year" (2012, 2013, 2014...), the "Day of the Week" (Mon, Tues, Wed, Thurs, Fri...), and maybe even the "Day of Month" (1, 2, 3, 4, 5...31). This adds many more variables to classify by, but it avoids having to work with a fully formatted date.
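
A minimal sketch of this breakdown in base R follows. The helper name `date_features` is hypothetical, and it returns ordinary columns and a factor instead of explicit [1,0] indicators, on the assumption that the tree is fitted with rpart, which accepts factor predictors directly.

```r
# Hypothetical helper: expand a POSIXct vector into calendar features
date_features <- function(x) {
  lt <- as.POSIXlt(x)
  data.frame(
    year  = lt$year + 1900,  # POSIXlt stores years since 1900
    month = lt$mon + 1,      # POSIXlt months run 0-11
    mday  = lt$mday,         # day of month, 1-31
    # wday runs 0-6 starting on Sunday; fixed labels avoid locale-dependent names
    wday  = factor(lt$wday, levels = 0:6,
                   labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
  )
}

x <- as.POSIXct(c("2012-01-01 12:00:00", "2012-08-31 22:53:15"), tz = "UTC")
date_features(x)
```

The resulting columns could be bound to the data with something like `cbind(dat, date_features(dat$d))` before fitting, dropping `d` itself if only the derived features are wanted.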

