I have a dataframe of student enrollment records (transactions) that are in long format.
Sample:
ID Date Type
123 2/1/14 Entry
123 2/5/14 Exit
123 3/1/14 Entry
123 4/4/14 Exit
234 3/2/14 Entry
234 3/20/14 Exit
234 4/3/14 Entry
And I need to convert to wide format by matching pairs of entry and exit records.
Sample:
ID Entry.Date Exit.Date
123 2/1/14 2/5/14
123 3/1/14 4/4/14
234 3/2/14 3/20/14
234 4/3/14
There's nothing inherent in the data that I can use to key together the starting record with the ending record. It's simply ordered by student and then date. Some records are open ended (no matching exit record).
I'm looking at some of the conversion functions such as reshape but don't know if/how I can use those to convert to wide format and limit it to the date range pair. Would you recommend one of those or should I pursue something less elegant? Thanks!
1 solution
#1
Here's one way using data.table. The idea is to group by ID and Type, and add an additional column that identifies the Entry/Exit pairs. This assumes that each Entry is immediately followed by its matching Exit, except where one of the two is missing.
library(data.table) ## >= 1.9.0
setDT(dat) ## dat is your data; setDT() converts it to a data.table by reference
dat[, ID2 := seq_len(.N), by=list(ID, Type)]
# dat
# ID Date Type ID2
# 1: 123 2/1/14 Entry 1
# 2: 123 2/5/14 Exit 1
# 3: 123 3/1/14 Entry 2
# 4: 123 4/4/14 Exit 2
# 5: 234 3/2/14 Entry 1
# 6: 234 3/20/14 Exit 1
# 7: 234 4/3/14 Entry 2
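If an Exit can be missing in the middle of a student's records (not only at the end), counting rows within ID and Type would pair the wrong records. A rough sketch of an alternative key, assuming every pair starts with an Entry: take a running count of Entry rows within each ID. The table `dat2` below is made-up data for illustration only, and an Exit with no Entry before it would get key 0 under this scheme.

```r
library(data.table)

# Hypothetical data: the first Entry has no matching Exit
dat2 <- data.table(
  ID   = c(123, 123, 123),
  Date = c("2/1/14", "3/1/14", "4/4/14"),
  Type = c("Entry", "Entry", "Exit")
)

# Every Entry starts a new pair; an Exit inherits the key of the Entry before it
dat2[, ID2 := cumsum(Type == "Entry"), by = ID]
```

With this key the Exit on 4/4/14 is grouped with the Entry on 3/1/14, and the Entry on 2/1/14 is left open.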
Now cast it to wide format using dcast. You could also use the one from reshape2, but data.table has its own implementation now and is faster, so I'll use it here.
dcast.data.table(dat, ID + ID2 ~ Type, value.var="Date")
# ID ID2 Entry Exit
# 1: 123 1 2/1/14 2/5/14
# 2: 123 2 3/1/14 4/4/14
# 3: 234 1 3/2/14 3/20/14
# 4: 234 2 4/3/14 NA
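Since the question mentions reshape, here is a sketch of the same idea in base R, under the same adjacency assumption. The data frame is rebuilt from the sample in the question, and `wide` is just a name picked for the result.

```r
# Sample data from the question
dat <- data.frame(
  ID   = c(123, 123, 123, 123, 234, 234, 234),
  Date = c("2/1/14", "2/5/14", "3/1/14", "4/4/14", "3/2/14", "3/20/14", "4/3/14"),
  Type = c("Entry", "Exit", "Entry", "Exit", "Entry", "Exit", "Entry"),
  stringsAsFactors = FALSE
)

# Pair counter: the n-th Entry and the n-th Exit of a student share the same ID2
dat$ID2 <- ave(seq_along(dat$ID), dat$ID, dat$Type, FUN = seq_along)

# One row per (ID, ID2); the Date values are spread into Date.Entry and Date.Exit
wide <- reshape(dat, idvar = c("ID", "ID2"), timevar = "Type", direction = "wide")
```

The open-ended record for ID 234 ends up with NA in Date.Exit.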
HTH