I have a dataset x with a bunch of text (columns: title, location, contents)
in about 3000 rows.
EDIT: an example.
title | location | contents
... | DUBAI | ....
... | DUBAI | ....
... | KHARTOUM | ....
... | KHARTOUMSUDAN | ....
... | JAKARTA | ....
I have a list of locations:
locations <- c("DUBAI", "KHARTOUM", "JAKARTA", "Paris")
Now I want to write a loop that starts with Dubai, counts how many rows it occurs in, and stores that count in a variable. Then I want to move on to the next word in the locations list (Khartoum) and do the same thing.
So in this case I would expect to see: Dubai = 2, Khartoum = 2, Jakarta = 1.
I have this so far, but I don't know how to generalize it and make it into a loop:
numberDUBAI <- nrow(dplyr::filter(x, grepl(' DUBAI ', location)))
and then I repeat it for each word
numberLOCATIONS <- c(numberDUBAI, numberKHARTOUM, numberJAKARTA, numberPARIS)
but this feels very inefficient, help? :D
1 solution
#1
4
We can do this with tidyverse, using map to iterate over the patterns:
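library(tidyverse)
# For each pattern in 'locations', count the rows of 'x' whose 'location' matches it
map(locations, ~
  x %>%
    # case-insensitive matching is requested by wrapping the pattern in regex()
    summarise(n = sum(str_detect(location, regex(.x, ignore_case = TRUE))))
)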
NOTE: This assumes that 'x' is the dataset, 'location' is the column, and 'locations' (from the OP's post) is a vector of patterns.
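If a single named vector of counts is more convenient than a list of one-row data frames, the same idea can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame below is hypothetical, modeled on the example rows in the question, and the name counts is made up for this sketch.

```r
# Hypothetical toy data modeled on the example in the question
x <- data.frame(
  title    = c("t1", "t2", "t3", "t4", "t5"),
  location = c("DUBAI", "DUBAI", "KHARTOUM", "KHARTOUMSUDAN", "JAKARTA"),
  contents = c("c1", "c2", "c3", "c4", "c5"),
  stringsAsFactors = FALSE
)
locations <- c("DUBAI", "KHARTOUM", "JAKARTA", "Paris")

# For each pattern, count the rows whose 'location' contains it (case-insensitive).
# vapply() names the result after the patterns because 'locations' is a character vector.
counts <- vapply(
  locations,
  function(p) sum(grepl(p, x$location, ignore.case = TRUE)),
  integer(1)
)

print(counts)
# DUBAI = 2, KHARTOUM = 2, JAKARTA = 1, Paris = 0
```

Because the patterns are matched as substrings, KHARTOUMSUDAN counts toward KHARTOUM, which agrees with the expected output in the question. Individual counts can then be looked up by name, e.g. counts["DUBAI"].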