The purpose is to collapse/re-assign levels as part of cleaning a dataset.
Here is the example:
df <- data.frame(V1 = c("cat", "lion", "cat", "beast", "cat"),
                 V2 = c("nice and grumpy", "angry", "old,but also nice", "empty", "has friends"),
                 stringsAsFactors = FALSE)
> df
V1 V2
1 cat nice and grumpy
2 lion angry
3 cat old,but also nice
4 beast empty
5 cat has friends
The level of interest is "cat"; these are the entries:
parse1 <- df$V1[grepl("cat", df$V1)]
#[1] "cat" "cat" "cat"
From there, the idea is to search V2 for an attribute, "nice", upon which the level "cat" will be renamed to "nice cat". This search locates 2 entries of interest in V2:
df.sub <- subset(df,V1=="cat",select=V1:V2)
parse2 <- df.sub$V2[grep("([Nn]ice)",df.sub$V2)]
#[1] "nice and grumpy" "old,but also nice"
The ideal final result would have df transformed to:
        V1                V2
1 nice cat   nice and grumpy
2     lion             angry
3 nice cat old,but also nice
4    beast             empty
5      cat       has friends
Any thoughts on how to achieve this? Many thanks.
2 solutions
#1
An ifelse seems to be enough for this:
df$V1 <- ifelse(grepl("([Nn]ice)", df$V2),
sub('cat', 'nice cat', df$V1),
df$V1 )
Output:
> df
        V1                V2
1 nice cat   nice and grumpy
2     lion             angry
3 nice cat old,but also nice
4    beast             empty
5      cat       has friends
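One caveat with the ifelse version: sub() matches "cat" anywhere in the string, so a hypothetical value such as "catfish" in a nice row would also be rewritten. If only exact "cat" entries should be renamed, a logical index may be a safer variant. The following is a minimal sketch using the example data from the question; the name is_nice_cat is made up for illustration.

```r
# Example data from the question
df <- data.frame(V1 = c("cat", "lion", "cat", "beast", "cat"),
                 V2 = c("nice and grumpy", "angry", "old,but also nice", "empty", "has friends"),
                 stringsAsFactors = FALSE)

# TRUE only where V1 is exactly "cat" AND V2 mentions nice/Nice
# (is_nice_cat is an illustrative name, not from the original answer)
is_nice_cat <- df$V1 == "cat" & grepl("[Nn]ice", df$V2)

# Overwrite just those rows; all other rows keep their value
df$V1[is_nice_cat] <- "nice cat"

df$V1
#[1] "nice cat" "lion"     "nice cat" "beast"    "cat"
```

Unlike the sub() call, running this a second time leaves the data unchanged, because "nice cat" no longer equals "cat".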
#2
You could use data.table:
df <- data.frame(V1 = c("cat", "lion", "cat", "beast", "cat"),
                 V2 = c("nice and grumpy", "angry", "old,but also nice", "empty", "has friends"),
                 stringsAsFactors = FALSE)
library(data.table)
DT <- data.table(df)
# All the nice animals: new column V3 (NA where V2 does not mention "nice")
DT[grepl("[Nn]ice", V2), V3 := paste0("nice ", V1)]
# All the nice cats: new column V4 (NA for every other row)
DT[grepl("[Nn]ice", V2) & V1 == "cat", V4 := paste0("nice ", V1)]
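The calls above add new columns (V3, V4) rather than changing V1, which is what the question asks for. A possible variant, sketched here on the assumption that the data.table package is installed, assigns to V1 itself by reference for the matching rows only:

```r
library(data.table)

# Example data from the question
df <- data.frame(V1 = c("cat", "lion", "cat", "beast", "cat"),
                 V2 = c("nice and grumpy", "angry", "old,but also nice", "empty", "has friends"),
                 stringsAsFactors = FALSE)
DT <- as.data.table(df)

# Update V1 in place, only where V1 is "cat" and V2 mentions nice/Nice
DT[grepl("[Nn]ice", V2) & V1 == "cat", V1 := "nice cat"]

DT$V1
#[1] "nice cat" "lion"     "nice cat" "beast"    "cat"
```

Because := modifies DT by reference, no reassignment (DT <- ...) is needed; the original df is not affected.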