Creating new columns in data frames from a function in R

Time: 2021-12-03 22:58:15

I have a set of data frames that look like this (they have the same columns, but not the same number of rows):

df1 <- data.frame(v = c("banana", "apple", "orange", "grape", "kiwi fruit", "pear"), x = rnorm(6, 0.06, 0.01))
df2 <- data.frame(v = c("table", "chair", "couch", "dresser", "night stand"), x = rnorm(5, 0.06, 0.01))
df3 <- data.frame(v = c("white", "blue", "pink", "bright red", "orange", "dark green", "black"), x = rnorm(7, 0.06, 0.01))

I have a range of operations (counting things about the words in df1$v, df2$v, df3$v) that I would like to perform on these data frames. One solution I found is to put the data frames in a list and then use lapply to apply a function to every data frame in the list:

ls <- list(df1, df2, df3)

func1 <- function(dat){
  dat$complex <- sapply(strsplit(as.character(dat$v), " "), length)
}

ls_func1 <- lapply(ls, FUN = func1)

ls_func1
[[1]]
[1] 1 1 1 1 2 1
[[2]]
[1] 1 1 1 1 2
[[3]]
[1] 1 1 1 2 1 2 1

At least this gets me the number of words in each entry of v, which I can then combine back into a data frame or whatever.

The problem is that this does not seem to work for every function. This, for instance, works fine when done for a single data frame:

for(i in 1:length(df1$v)){
  string <- strsplit(as.character(df1$v[i]), "")
  counter <- 0
  for(j in 1:length(string[[1]])){
    if(grepl("a|b|c|d|e", string[[1]][j])){
      counter <- counter + 1
    }
  }
  df1$length[i] <- counter
}

df1
           v          x length
1     banana 0.05233752      4
2      apple 0.08564292      2
3     orange 0.04679124      2
4      grape 0.06655950      2
5 kiwi fruit 0.05684803      0
6       pear 0.07654617      2

But when I turn it into a function, it does not work:

func2 <- function(dat){
  for(i in 1:length(dat$v)){
    string <- strsplit(as.character(dat$v[i]), "")
    counter <- 0
    for(j in 1:length(string[[1]])){
      if(grepl("a|b|c|d|e", string[[1]][j])){
        counter <- counter + 1
      }
    }
    dat$length[i] <- counter
  }
}

ls_func2 <- lapply(ls, FUN = func2)

ls_func2
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL

What am I doing wrong here? And is there any way to create new columns in my existing data frames using these functions and lapply? In other words, to create the following by first applying the first function and then applying the second function:

ls
[[1]]
           v          x complex length
1     banana 0.05233752       1      4
2      apple 0.08564292       1      2
3     orange 0.04679124       1      2
4      grape 0.06655950       1      2
5 kiwi fruit 0.05684803       2      0
6       pear 0.07654617       1      2

[[2]]
           v          x complex length
1      table 0.65790811       1      2
....
[[3]]
....

etc.?

2 solutions

#1

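Is this what you're after? Add return(dat) before the closing brace in each function. An R function returns the value of its last evaluated expression: in func1 that is the assignment, which evaluates to the assigned vector, and in func2 it is the for loop, which evaluates to NULL. In both cases dat is only modified as a local copy inside the function, so it has to be returned explicitly.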
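The return-value behaviour behind this can be illustrated with a minimal sketch that uses only base R. The three toy functions and the small data frame d are hypothetical and exist only for illustration; they are not part of the question.

```r
# Hypothetical toy data and functions, for illustration only.
d <- data.frame(v = c("kiwi fruit", "pear"), stringsAsFactors = FALSE)

# Last expression is an assignment: the function returns the assigned vector.
returns_assignment <- function(dat) {
  dat$n <- nchar(dat$v)
}

# Last expression is a for loop: the function returns NULL.
returns_loop <- function(dat) {
  for (i in seq_along(dat$v)) {
    dat$n[i] <- nchar(dat$v[i])
  }
}

# Returning dat explicitly hands the modified copy back to the caller.
returns_dat <- function(dat) {
  dat$n <- nchar(dat$v)
  return(dat)
}

res_assignment <- returns_assignment(d)  # an integer vector
res_loop <- returns_loop(d)              # NULL
res_dat <- returns_dat(d)                # a data frame with the new column n
```

In all three cases the original d is left untouched, because each function works on its own copy of the argument.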

df1 <- data.frame(v = c("banana", "apple", "orange", "grape", "kiwi fruit", "pear"), x = rnorm(6, 0.06, 0.01))
df2 <- data.frame(v = c("table", "chair", "couch", "dresser", "night stand"), x = rnorm(5, 0.06, 0.01))
df3 <- data.frame(v = c("white", "blue", "pink", "bright red", "orange", "dark green", "black"), x = rnorm(7, 0.06, 0.01))
ls <- list(df1, df2, df3)

# Count the space-separated words in each entry of v and return the modified data frame.
func1 <- function(dat){
  dat$complex <- sapply(strsplit(as.character(dat$v), " "), length)
  return(dat)
}

ls_func1 <- lapply(ls, FUN = func1)
ls_func1

# Count the characters a-e in each entry of v and return the modified data frame.
func2 <- function(dat){
  for(i in seq_along(dat$v)){
    string <- strsplit(as.character(dat$v[i]), "")
    counter <- 0
    for(j in seq_along(string[[1]])){
      if(grepl("a|b|c|d|e", string[[1]][j])){
        counter <- counter + 1
      }
    }
    dat$length[i] <- counter
  }
  return(dat)
}

# Apply func2 to the output of func1 so that both new columns are kept.
ls_func2 <- lapply(ls_func1, FUN = func2)
ls_func2
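As a possible alternative to the nested loops, both counts can be computed in a vectorised way and added in one pass. The following is only a sketch under the assumption that "length" is meant to be the number of characters a to e in each entry; the helper names count_words, count_a_to_e and add_columns, the list name dfs and the x values are hypothetical.

```r
# Hypothetical helper names; only base R functions are used.
count_words <- function(x) {
  # Number of space-separated pieces in each string.
  lengths(strsplit(as.character(x), " ", fixed = TRUE))
}

count_a_to_e <- function(x) {
  x <- as.character(x)
  # gregexpr() locates every match of the character class [a-e],
  # regmatches() extracts the matches, lengths() counts them per string.
  lengths(regmatches(x, gregexpr("[a-e]", x)))
}

# Add both columns and return the modified data frame.
add_columns <- function(dat) {
  dat$complex <- count_words(dat$v)
  dat$length <- count_a_to_e(dat$v)
  dat
}

# Small example list; the x values are arbitrary placeholders.
dfs <- list(
  data.frame(v = c("banana", "apple", "kiwi fruit"), x = c(0.05, 0.08, 0.06)),
  data.frame(v = c("table", "night stand"), x = c(0.07, 0.06))
)

# Overwrite the list with the extended data frames.
dfs <- lapply(dfs, add_columns)
```

Assigning the result of lapply back to the same name is what makes the new columns appear in the "existing" list; the data frames inside the list are replaced by their extended copies.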

#2

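I added show(dat) as the last statement of the function, which prints each data frame as it is processed. Note that this only displays the result: show() is documented to return NULL invisibly, so ls_func2 itself will still hold NULLs unless dat is also returned.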

func2 <- function(dat){
  for(i in 1:length(dat$v)){
    string <- strsplit(as.character(dat$v[i]), "")
    counter <- 0
    for(j in 1:length(string[[1]])){
      if(grepl("a|b|c|d|e", string[[1]][j])){
        counter <- counter + 1
      }
    }
    dat$length[i] <- counter
  }
  show(dat)
}

> ls_func2 <- lapply(ls, FUN = func2)
           v          x length
1     banana 0.05708859      4
2      apple 0.06938091      2
3     orange 0.04796599      2
4      grape 0.05912616      2
5 kiwi fruit 0.06250885      0
6       pear 0.05291484      2
            v          x length
1       table 0.06554054      3
2       chair 0.07783138      2
3       couch 0.06127833      2
4     dresser 0.05443105      3
5 night stand 0.06257048      2
           v          x length
1      white 0.06287645      1
2       blue 0.07196960      2
3       pink 0.05659455      0
4 bright red 0.05996639      3
5     orange 0.05826371      2
6 dark green 0.04892694      4
7      black 0.06830055      3
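If the goal is both to see each data frame and to keep it in the list, one option is to print first and return dat afterwards. This is a sketch, not the answerer's code: func2_verbose, demo and out are hypothetical names, the x values are arbitrary, and the counting is done with gregexpr() and regmatches() instead of the explicit loop purely for brevity.

```r
# Hypothetical variant: display each data frame and also return it.
func2_verbose <- function(dat) {
  v <- as.character(dat$v)
  # Count the characters a-e in each entry of v.
  dat$length <- lengths(regmatches(v, gregexpr("[a-e]", v)))
  print(dat)  # side effect: display the data frame
  dat         # return value: what lapply() collects
}

# Small example list; the x values are arbitrary placeholders.
demo <- list(data.frame(v = c("white", "pink"), x = c(0.06, 0.05)))
out <- lapply(demo, func2_verbose)
```

Here out holds the extended data frames, so the printed output and the stored result stay in step.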
