Using a for loop to partially match factor levels in a character string? (duplicate)

Time: 2022-06-22 22:53:24

This question already has an answer here:

I have a dataset of plots and subplots in which I measured tree species presence. I am trying to run through the data and identify which species were present at each of the plot-subplot combinations.

I've succeeded in creating a dataframe that identifies which species are present in each plot-subplot combination, but am now trying to append columns for each species with indicator variables (values of 1) that show their presence.

The initial code and data frame look like this:

f = aggregate(Species ~ Subplot + Plot, data = live.trees, 
              FUN=function(x) paste(unique(x), collapse=', '))

a=rep(0, 35)
b=cbind(a,a,a,a,a,a,a,a,a,a,a,a)
colnames(b) = levels(live.trees$Species)
freq = as.data.frame(cbind(f, b))

Species = as.factor(live.trees$Species)

#Only showing 2 of 7 plots here...

freq[1:10,]
   Subplot Plot                    Species AA AM AO BC BG BP EA RA RM SH XG XM
1        1    1                         RA  0  0  0  0  0  0  0  0  0  0  0  0
2        2    1 EA, BP, XM, BC, AA, XG, RA  0  0  0  0  0  0  0  0  0  0  0  0
3        3    1         EA, XG, AA, AM, RA  0  0  0  0  0  0  0  0  0  0  0  0
4        4    1             AA, XM, RA, EA  0  0  0  0  0  0  0  0  0  0  0  0
5        5    1             EA, BC, RA, AA  0  0  0  0  0  0  0  0  0  0  0  0
6        1    2             XM, BC, RA, AM  0  0  0  0  0  0  0  0  0  0  0  0
7        2    2                     RM, RA  0  0  0  0  0  0  0  0  0  0  0  0
8        3    2                 XM, BC, RA  0  0  0  0  0  0  0  0  0  0  0  0
9        4    2                     RA, XM  0  0  0  0  0  0  0  0  0  0  0  0
10       5    2     XM, XG, AA, BC, BG, RA  0  0  0  0  0  0  0  0  0  0  0  0

I am now trying to write a for loop that runs through the table and puts a "1" in each of the individual species columns (AA, AM, AO, etc.) if the two-character species code is matched in the freq$Species column. The for loop I have written so far is:

#Manually going through and assigning a 1 value for each species 
#using a partial string match with grepl()

for(k in 1:nrow(freq))
  if(grepl("AA", freq$Species[[k]]) == "TRUE")
    (freq$AA[k] = 1) else
  if(grepl("AM", freq$Species[[k]]) == "TRUE")
    (freq$AM[k] = 1) else
  if(grepl("AO", freq$Species[[k]]) == "TRUE")
    (freq$AO[k] = 1) else
  if(grepl("BC", freq$Species[[k]]) == "TRUE")
    (freq$BC[k] = 1)
  #.... etc. (cutting off here to save space)

The code works to a degree, but is overwriting each previous Species column, and is also quite clunky.

   Subplot Plot                    Species AA AM AO BC BG BP EA RA RM SH XG XM
1        1    1                         RA  0  0  0  0  0  0  0  0  0  0  0  0
2        2    1 EA, BP, XM, BC, AA, XG, RA  1  0  0  0  0  0  0  0  0  0  0  0
3        3    1         EA, XG, AA, AM, RA  1  0  0  0  0  0  0  0  0  0  0  0
4        4    1             AA, XM, RA, EA  1  0  0  0  0  0  0  0  0  0  0  0
5        5    1             EA, BC, RA, AA  1  0  0  0  0  0  0  0  0  0  0  0
6        1    2             XM, BC, RA, AM  0  1  0  0  0  0  0  0  0  0  0  0
7        2    2                     RM, RA  0  0  0  0  0  0  0  0  0  0  0  0
8        3    2                 XM, BC, RA  0  0  0  1  0  0  0  0  0  0  0  0
9        4    2                     RA, XM  0  0  0  0  0  0  0  0  0  0  0  0
10       5    2     XM, XG, AA, BC, BG, RA  1  0  0  0  0  0  0  0  0  0  0  0

How do I:

1) Get the for loop to stop overwriting species presence indicators in prior columns?

2) Write the for loop in a more elegant manner? I thought I could create a factor variable called "Species" and loop over the elements of that (within the first for loop)... however my novice-rated experience started showing.

Any help or suggestions would be hugely appreciated!

I know that this is not a reproducible example, but I am looking for general suggestions or tips that might point me in the right direction. In the meantime, I will try to find a default dataset within R that I can coerce to replicate my troubles.

Thank you in advance!

Note: The Species column was created as a string and is thus of class character.

1 answer

#1


3  

Try

library(qdapTools)
res <- cbind(freq[1:3], mtabulate(strsplit(freq$Species, ', ')))
rowsum(res[,4:ncol(res)], group= res$Plot)
#  AA AM BC BG BP EA RA RM XG XM
#1  4  1  2  0  1  4  5  0  2  2
#2  1  1  3  1  0  0  5  1  1  4
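Note that `res` itself already holds one 0/1 column per species for every plot-subplot row, which is what the question asks for. If installing qdapTools is not an option, roughly the same table can be built in base R. This is only a sketch on a small stand-in for `freq`; the names `species_list`, `codes`, `indicators` and `res_base` are made up for illustration.

```r
# Small stand-in for the freq data frame from the question (first three columns only).
freq <- data.frame(
  Subplot = c(1L, 2L, 3L),
  Plot    = c(1L, 1L, 2L),
  Species = c("RA", "EA, BP, RA", "XM, BC, RA"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a vector of species codes.
species_list <- strsplit(freq$Species, ", ", fixed = TRUE)

# All codes that occur anywhere, sorted so the columns come out alphabetically.
codes <- sort(unique(unlist(species_list)))

# One row per plot-subplot combination, one 0/1 column per code.
indicators <- t(vapply(species_list,
                       function(x) as.integer(codes %in% x),
                       integer(length(codes))))
colnames(indicators) <- codes

# Bind the indicator columns onto the original identifying columns.
res_base <- cbind(freq, indicators)
print(res_base)
```

Unlike the pre-built zero columns in the question, this only creates columns for species that actually occur, so a level with no records (such as AO or SH in the sample data) would not appear.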

Or

aggregate(.~Plot, res[c(2,4:ncol(res))], FUN=sum)
#   Plot AA AM BC BG BP EA RA RM XG XM
#1    1  4  1  2  0  1  4  5  0  2  2
#2    2  1  1  3  1  0  0  5  1  1  4

Or

library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr; across() replaces them
res %>%
   group_by(Plot) %>%
   summarise(across(-c(Subplot, Species), sum))

Or

library(data.table)
setDT(res)[, lapply(.SD, sum), by =Plot, .SDcols=4:ncol(res)]
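As for the loop in the question: because the tests are chained with `else`, each row stops at the first species that matches, so only one column per row ever gets a 1. If you would rather keep a loop, one possible sketch is to loop over the species codes instead of the rows and let `grepl()` handle all rows at once. The small `freq` and the `codes` vector below are stand-ins for illustration; in your data `codes` would presumably be `levels(live.trees$Species)`.

```r
# Small stand-in for the freq data frame from the question.
freq <- data.frame(
  Subplot = c(1L, 2L, 3L),
  Plot    = c(1L, 1L, 2L),
  Species = c("RA", "EA, BP, RA", "XM, BC, RA"),
  stringsAsFactors = FALSE
)

# Codes to check (assumed here; take them from your factor levels in practice).
codes <- c("BC", "BP", "EA", "RA", "XM")

for (code in codes) {
  # grepl() is vectorised over rows, so no inner loop over k is needed.
  # \\b anchors the match to a whole code rather than any substring.
  freq[[code]] <- as.integer(grepl(paste0("\\b", code, "\\b"), freq$Species))
}
print(freq)
```

Each species is tested independently, so a match in one column no longer prevents the others from being set.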

data

freq <- structure(list(Subplot = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 
5L), Plot = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), Species = c("RA", 
"EA, BP, XM, BC, AA, XG, RA", "EA, XG, AA, AM, RA", "AA, XM, RA, EA", 
"EA, BC, RA, AA", "XM, BC, RA, AM", "RM, RA", "XM, BC, RA", "RA, XM", 
"XM, XG, AA, BC, BG, RA"), AA = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L), AM = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    AO = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BC = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), BG = c(0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L), BP = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), EA = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    ), RA = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), RM = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), SH = c(0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L), XG = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), XM = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    )), .Names = c("Subplot", "Plot", "Species", "AA", "AM", 
"AO", "BC", "BG", "BP", "EA", "RA", "RM", "SH", "XG", "XM"), 
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
