Extract words marked by a symbol from strings in an R vector

Time: 2022-09-13 16:02:20

Hi, I have a vector of strings in R in which some words are prefixed with @. I want to extract the words that follow each @. Example:

tweets <- c(" @john @tom it is wonderful ", "@neel it is awesome ", "it is awesome")

As output, I want a matrix/data.frame containing only the names, with none of the other text, like this:

X1 <- c("john", "tom")
X2 <- c("neel", NA)
X3 <- c(NA, NA)
df <- data.frame(X1, X2, X3)

How can I do it?

1 solution

#1


A base R option is to extract the names with gregexpr/regmatches, pad each list element with NA to a common length using length<-, and then rbind the elements into a matrix

lst <- regmatches(tweets, gregexpr("(?<=@)\\w+", tweets, perl = TRUE))
do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
#     [,1]   [,2] 
#[1,] "john" "tom"
#[2,] "neel" NA   
#[3,] NA     NA   
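
Since the question also mentions a data.frame, the same steps can be wrapped in a small function. This is only a sketch: the name extract_mentions is hypothetical (it does not come from the answer above), and the max(1L, ...) guard is an added assumption so that input containing no @ at all still produces a column of NA.

```r
# Sample input from the question
tweets <- c(" @john @tom it is wonderful ", "@neel it is awesome ", "it is awesome")

# Hypothetical helper (not part of the original answer):
# returns a data.frame with one row per input string
extract_mentions <- function(x) {
  # Words that immediately follow an "@" (Perl-style lookbehind)
  lst <- regmatches(x, gregexpr("(?<=@)\\w+", x, perl = TRUE))
  # Assumption: keep at least one column, even if nothing matched anywhere
  width <- max(1L, lengths(lst))
  # Pad every element with NA up to the common width, then stack the rows
  m <- do.call(rbind, lapply(lst, `length<-`, width))
  as.data.frame(m, stringsAsFactors = FALSE)
}

mentions <- extract_mentions(tweets)
```

For the sample tweets, mentions has three rows (one per tweet) and two columns with the default names V1 and V2; rows with fewer names than the longest one are filled with NA.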
