Let's say I have the following dataframe:
personid date measurement
1        x    23
1        x    32
2        y    21
3        x    23
3        z    23
3        y    23
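To make the snippets below reproducible, here is a minimal sketch that rebuilds the example data frame. The column types (numeric personid and measurement, character date) are an assumption, since the question only shows the printed values.

```r
# Hypothetical reconstruction of the example data; column types are assumed
df <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)
```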
I want to sort this data frame by the measurement column and then create a new column, id, that numbers the distinct measurement values in sorted order (equal measurements get the same number), like so:
personid date measurement id
1        x    23          2
1        x    32          3
2        y    21          1
3        x    23          2
3        z    23          2
3        y    23          2
My first instinct was to do something like:
unique_measurements <- data.frame(measurement = sort(unique(df$measurement)))
unique_measurements$counter <- seq_len(nrow(unique_measurements))
Now I have a data frame that maps each measurement to its counter. I suspect this is not the right approach, but (1) how would I actually use this mapping to get the result above, and (2) what is the right way to do this?
2 answers
#1
Using factor as an intermediate:
df$id = as.integer(factor(df$measurement))
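A small sketch of why this works, on the example measurements: factor() builds its levels from the sorted unique values, and as.integer() returns each element's position in those levels, so equal measurements share a code. One caveat worth checking: if measurement were stored as character rather than numeric, the levels would sort as strings (so "10" would come before "9").

```r
measurement <- c(23, 32, 21, 23, 23, 23)

# Levels are the sorted distinct values, stored as strings
f <- factor(measurement)
levels(f)      # "21" "23" "32"

# Integer codes are positions in levels(f): 21 -> 1, 23 -> 2, 32 -> 3
as.integer(f)  # 2 3 1 2 2 2
```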
If you want to use your method, just use merge (note that merge may reorder the rows; dplyr::left_join keeps the row order of the original data frame instead):
unique_measurements <- data.frame(measurement = sort(unique(df$measurement)))
unique_measurements$id <- seq_len(nrow(unique_measurements))
merge(df, unique_measurements)  # joins on the shared "measurement" column
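If row order matters and you want to stay in base R, one option (a sketch, not the only way) is to use the same mapping table as a lookup instead of a join: match() finds, for each row of df, the position of its measurement in the mapping, and indexing the id column by that position copies the right id without touching df's row order. The example data below is the assumed reconstruction of the question's data frame.

```r
df <- data.frame(
  personid    = c(1, 1, 2, 3, 3, 3),
  date        = c("x", "x", "y", "x", "z", "y"),
  measurement = c(23, 32, 21, 23, 23, 23)
)

# Mapping table: one row per distinct measurement, numbered in sorted order
unique_measurements <- data.frame(measurement = sort(unique(df$measurement)))
unique_measurements$id <- seq_len(nrow(unique_measurements))

# Look up each row's measurement in the mapping and copy its id;
# df keeps its original row order because only the mapping is indexed
df$id <- unique_measurements$id[match(df$measurement, unique_measurements$measurement)]
```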
#2
Here's a simpler way to do this:
df$id <- match(df$measurement, sort(unique(df$measurement)))
#   personid date measurement id
# 1        1    x          23  2
# 2        1    x          32  3
# 3        2    y          21  1
# 4        3    x          23  2
# 5        3    z          23  2
# 6        3    y          23  2
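Both answers compute what is often called a dense rank: equal values share an id and the ids run from 1 to the number of distinct values with no gaps. A quick sketch on the example measurements (assumed to be numeric) checks that the two one-liners agree.

```r
df <- data.frame(measurement = c(23, 32, 21, 23, 23, 23))

# Answer #2: position of each value among the sorted distinct values
id_match  <- match(df$measurement, sort(unique(df$measurement)))

# Answer #1: integer codes of a factor whose levels are the sorted distinct values
id_factor <- as.integer(factor(df$measurement))

# Both return the same integer vector
stopifnot(identical(id_match, id_factor))
```

If dplyr is already in use, dplyr::dense_rank(df$measurement) should give the same ids.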