Apply a function to each data.frame row and update multiple column values

Date: 2021-02-06 18:37:26

I have a data.frame where each row is a tweet, and each column is an attribute ("text", "user", etc.).

I have written a function "processTweet()" that takes in a row of the data.frame and changes 3 columns in the tweet ("X", "Y" and "Z") and returns this modified single-row data.frame.

I'm currently trying to find out how to use something like dplyr or an apply-like function to actually reflect these modifications back in the original data.frame.

I'm aware that I could split the processTweet function into 3, but this would be inefficient since I'd have to do the same logical lookup multiple times.

I've tried using dplyr with rowwise(), but I'm obviously doing something wrong, as the changes are not reflected in the tweets data.frame, and mutate seems to let me modify one column but not several:

tweets %>% rowwise() %>% processTweet()

2 Answers

#1


I seem to have found an answer using plyr:

tweets = adply(.data = tweets, .margins = 1, .fun = processTweet)
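
For readers without plyr, the same split-apply-combine idea can be sketched in base R. The processTweet() body below is a hypothetical stand-in (the real one is not shown in the question), and the sample tweets are made up. This mirrors what the adply() call is meant to achieve, not how plyr implements it.

```r
# Hypothetical stand-in for processTweet(): takes a one-row data.frame,
# does one shared lookup, and fills the three columns X, Y and Z.
processTweet <- function(row) {
  has_link <- grepl("http", row$text, fixed = TRUE)  # shared lookup, done once
  row$X <- nchar(row$text)
  row$Y <- has_link
  row$Z <- if (has_link) "link" else "plain"
  row
}

# Made-up sample data with placeholder values in X, Y and Z
tweets <- data.frame(
  text = c("hello world", "see http://example.com"),
  user = c("a", "b"),
  X = NA_integer_, Y = NA, Z = NA_character_,
  stringsAsFactors = FALSE
)

# Apply the function to each row, then stack the one-row results
rows <- lapply(seq_len(nrow(tweets)), function(i) {
  processTweet(tweets[i, , drop = FALSE])
})
tweets <- do.call(rbind, rows)
```

Assigning the result back to tweets is what makes the changes visible; calling the function on each row without reassigning leaves the original data.frame untouched.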

but the dplyr implementation is still a mystery.

The following question/answer works when the result is saved into a single column, but it is unclear what to do when the function should return a whole data.frame: "Applying a function to every row of a table using dplyr?"

#2


After some trial and a lot of error, the dplyr way that seems to work is:

tweets = as.data.frame(tweets %>% rowwise() %>% do(processTweet(.)) %>% rbind())
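
Since do() is marked as superseded in recent dplyr releases, here is a minimal sketch of the same idea without it: split the data.frame into one-row pieces, apply the function, and stack the results with bind_rows(). The processTweet() body and the sample data are hypothetical stand-ins, since the real function is not shown in the question.

```r
library(dplyr)

# Hypothetical stand-in; the real processTweet() is not shown in the question
processTweet <- function(row) {
  n <- nchar(row$text)          # shared lookup, computed once per row
  row$X <- n
  row$Y <- n > 15
  row$Z <- toupper(row$user)
  row
}

# Made-up sample data
tweets <- data.frame(
  text = c("hello world", "see http://example.com"),
  user = c("a", "b"),
  stringsAsFactors = FALSE
)

tweets <- split(tweets, seq_len(nrow(tweets))) %>%  # one single-row data.frame per tweet
  lapply(processTweet) %>%                          # modify each row independently
  bind_rows()                                       # stack back into one data.frame
```

If the per-row logic can be written in vectorised form, a single mutate() call that sets X, Y and Z together is usually simpler and faster than any row-by-row approach.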
