How do I create a data frame from csv data columns in R?

Time: 2021-10-12 09:10:50

Below are the first five rows of the imported data in R:

data[1:5,]

    user event_date day_of_week
1 00002781A2ADA816CDB0D138146BD63323CCDAB2 2010-09-04    Saturday
2 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-04    Saturday
3 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-07     Tuesday
4 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-08   Wednesday
5 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-17      Friday
  distinct_events_a_count total_events_a_count
1                             2                          2
2                             2                          2
3                             1                          3
4                             1                          1
5                             1                          1
  events_a_duration distinct_events_b_count total_events_b_count
1                     615                       1                    1
2                      77                       1                    1
3                     201                       1                    1
4                      44                       1                    1
5                       3                       1                    1
  events_b_duration
1                      47
2                      43
3                     117
4                      74
5                      18

The problem is that columns 6 and 9 are read as factors rather than numerics, so I can't perform math operations on them. To convert the imported data to the appropriate format, I tried to create the dataset as follows:

dataset<-data.frame(events_a_duration=as.numeric(c(data[,6])), events_b_duration=as.numeric(c(data[,9])))

But when I checked the values, I noticed that the data frame doesn't contain the appropriate values:

 dataset[1,]


events_a_duration events_b_duration
1                   10217                    6184

The values should be 615 and 47.

What I don't know is how to create a data frame from the imported data columns. I would be very thankful if anyone could show how to build the appropriate data structure.

2 Solutions

#1


4  

Your problem is that calling as.numeric() on a factor returns the internal integer codes of its levels instead of the values those levels represent. You can check that the levels are numbered in ascending order of the values:

> as.numeric(factor(c(615,47,42)))
[1] 3 2 1
> as.numeric(factor(c(615,42,47)))
[1] 3 1 2
> as.numeric(factor(c(615,42,47,37)))
[1] 4 2 3 1
> as.numeric(factor(c(615,42,37,47)))
[1] 4 2 1 3

Use as.numeric(as.character(MyFactor)) instead, which converts the level labels rather than the codes. For instance:

> as.numeric(as.character(factor(c(615,42,37,47))))
[1] 615  42  37  47
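
Applied to the data frame from the question, the same conversion might look like the sketch below. The small `data` frame here is a hypothetical stand-in built from the first five duration values shown in the question, and the columns are addressed by name rather than by position (6 and 9) as an assumption for readability.

```r
# Hypothetical stand-in for the imported data: two factor columns
# holding the first five duration values shown in the question.
data <- data.frame(
  events_a_duration = factor(c("615", "77", "201", "44", "3")),
  events_b_duration = factor(c("47", "43", "117", "74", "18"))
)

# Convert each factor to character first, then to numeric, so the
# level labels are parsed instead of the internal level codes.
dataset <- data.frame(
  events_a_duration = as.numeric(as.character(data$events_a_duration)),
  events_b_duration = as.numeric(as.character(data$events_b_duration))
)

# The first row now holds 615 and 47, as the question expects.
dataset[1, ]
```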

#2


1  

data <- read.csv ("data.csv", stringsAsFactors=FALSE)
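
Note that since R 4.0.0, stringsAsFactors=FALSE is already the default for read.csv. A column that looks numeric is usually read as a factor or character only when at least one entry cannot be parsed as a number, so it may help to inspect the column classes after importing. The sketch below uses an inline CSV in place of data.csv; the "n/a" entry is invented purely for illustration and is not taken from the question's data.

```r
# Hypothetical inline CSV standing in for data.csv; the "n/a" entry is
# invented to show why a numeric-looking column may not parse as numeric.
csv_text <- "user,events_a_duration,events_b_duration
u1,615,47
u2,77,43
u3,n/a,117"

data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect how each column was read: events_a_duration comes back as
# character because of the "n/a" entry, events_b_duration as integer.
sapply(data, class)

# Declaring "n/a" as a missing-value marker lets read.csv parse the
# column as numeric, with NA in place of the unparseable entry.
data2 <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                  na.strings = c("NA", "n/a"))
sapply(data2, class)
```

If the real file contains a different non-numeric marker, the same na.strings argument can list it; which marker applies here is unknown from the question alone.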
