Reshaping from long to wide format in R by matching start and end record pairs

Time: 2021-12-23 04:29:03

I have a dataframe of student enrollment records (transactions) that are in long format.

Sample:

ID   Date     Type
123  2/1/14   Entry
123  2/5/14   Exit
123  3/1/14   Entry
123  4/4/14   Exit
234  3/2/14   Entry
234  3/20/14  Exit
234  4/3/14   Entry

And I need to convert to wide format by matching pairs of entry and exit records.

Sample:

ID   Entry.Date   Exit.Date
123  2/1/14       2/5/14
123  3/1/14       4/4/14
234  3/2/14       3/20/14
234  4/3/14

There's nothing inherent in the data that I can use to key together the starting record with the ending record. It's simply ordered by student and then date. Some records are open ended (no matching exit record).

I'm looking at some of the conversion functions such as reshape but don't know if/how I can use those to convert to wide format and limit it to the date range pair. Would you recommend one of those or should I pursue something less elegant? Thanks!

1 solution

#1

Here's one way using data.table. The idea is to group by ID and Type, and add an extra column that numbers the rows within each group, so that the n-th Entry and the n-th Exit of a student share the same number. This assumes that each Entry record is immediately followed by its matching Exit record, except where one of the two is missing.

library(data.table) ## >= 1.9.0
setDT(dat)          ## dat is your data; setDT converts it to a data.table by reference

dat[, ID2 := seq_len(.N), by=list(ID, Type)]
# dat 
#     ID    Date  Type ID2
# 1: 123  2/1/14 Entry   1
# 2: 123  2/5/14  Exit   1
# 3: 123  3/1/14 Entry   2
# 4: 123  4/4/14  Exit   2
# 5: 234  3/2/14 Entry   1
# 6: 234 3/20/14  Exit   1
# 7: 234  4/3/14 Entry   2
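
The numbering above depends on the rows already being in chronological order within each student. As a minimal sketch of a variant, assuming the dates are m/d/yy strings as in the sample, the snippet below parses and sorts them first and then numbers the pairs with a running count of Entry rows. The names dat2, Parsed and PairID are made up for illustration. Unlike seq_len(.N), a running count attaches an Exit to the most recent Entry, which may behave better when an open-ended Entry sits in the middle of a student's history; it still assumes every Exit has an Entry before it.

```r
library(data.table)

# Sample records from the question, rebuilt by hand so the snippet runs on its own.
dat2 <- data.table(
  ID   = c(123, 123, 123, 123, 234, 234, 234),
  Date = c("2/1/14", "2/5/14", "3/1/14", "4/4/14", "3/2/14", "3/20/14", "4/3/14"),
  Type = c("Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry")
)

# Parse the m/d/yy strings so that sorting is chronological rather than alphabetical.
dat2[, Parsed := as.Date(Date, format = "%m/%d/%y")]
setorder(dat2, ID, Parsed)

# Each Entry opens a new pair within its ID; the Exit that follows inherits that number.
dat2[, PairID := cumsum(Type == "Entry"), by = ID]
```

On the sample data this gives the same numbering as ID2 above, so the cast step that follows works the same way with PairID in place of ID2.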

Now cast it to wide format using dcast. You could also use dcast from reshape2, but data.table now has its own, faster implementation, so I'll use that here.

dcast.data.table(dat, ID + ID2 ~ Type, value.var="Date")
#     ID ID2  Entry    Exit
# 1: 123   1 2/1/14  2/5/14
# 2: 123   2 3/1/14  4/4/14
# 3: 234   1 3/2/14 3/20/14
# 4: 234   2 4/3/14      NA
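
Since the question asks about reshape, here is a rough base R sketch of the same idea, with no extra packages. It is an illustration under the same ordering assumption, not a tested drop-in; df, Pair and wide are names chosen here, and the pair index is a running count of Entry rows per student.

```r
# Sample records from the question as a plain data.frame.
df <- data.frame(
  ID   = c(123, 123, 123, 123, 234, 234, 234),
  Date = c("2/1/14", "2/5/14", "3/1/14", "4/4/14", "3/2/14", "3/20/14", "4/3/14"),
  Type = c("Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry"),
  stringsAsFactors = FALSE
)

# Pair index per student: number of Entry rows seen so far within each ID.
df$Pair <- ave(as.integer(df$Type == "Entry"), df$ID, FUN = cumsum)

# Spread Date into one column per Type; each output row is one ID + Pair combination.
wide <- reshape(df, idvar = c("ID", "Pair"), timevar = "Type",
                v.names = "Date", direction = "wide")

# reshape names the new columns Date.Entry / Date.Exit; flip them to Entry.Date / Exit.Date.
names(wide) <- sub("^Date\\.(.*)$", "\\1.Date", names(wide))
```

The open-ended record for student 234 comes out with NA in Exit.Date, matching the blank in the desired output.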

HTH
