I have this character vector (for example):
str <- c("this is a string current trey",
"feather rtttt",
"tusla",
"laq")
To count the number of words in each element of this vector, I used the following (taken from "Count the number of words in a string in R?", which is a possible duplicate, but my issue is different):
No_words <- sapply(gregexpr("\\W+", str), length) + 1
but it returns
6 2 2 2
The last two elements ("tusla" and "laq") each contain only one word,
so it should return
6 2 1 1
How do I get around this problem?
2 Answers
#1
12
You can count the runs of non-whitespace characters instead (here x is the character vector from the question):
sapply(gregexpr("\\S+", x), length)
## [1] 6 2 1 1
Or, as suggested in the comments, you can split on whitespace and count the pieces:
sapply(strsplit(x, "\\s+"), length)
## [1] 6 2 1 1
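For context on why the original attempt reports 2 for a single word: gregexpr() returns -1 (a vector of length 1) when the pattern does not match at all, so length() cannot tell "no separator" from "one separator". The following is only a sketch of that behaviour, using the vector from the question stored in x; the name n_sep is made up for illustration.

```r
x <- c("this is a string current trey",
       "feather rtttt",
       "tusla",
       "laq")

# gregexpr() gives one vector of match positions per element;
# when nothing matches, that vector is the single value -1
m <- gregexpr("\\W+", x)
m[[3]][1]
## [1] -1

# length() counts that -1 as if it were a match, hence the wrong result
sapply(m, length) + 1
## [1] 6 2 2 2

# Counting only real matches (positions > 0) gives the expected answer
n_sep <- sapply(m, function(pos) sum(pos > 0))
n_sep + 1
## [1] 6 2 1 1
```

Matching "\\S+" sidesteps the problem for non-empty strings because every word produces at least one match, so no "+ 1" correction is needed.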
#2
7
Use the stringi package and stri_count:
require(stringi)
str <- c(
"this is a string current trey",
"nospaces",
"multiple spaces",
" leadingspaces",
"trailingspaces ",
" leading and trailing ",
"just one space each")
stri_count(str, regex = "\\S+")
## [1] 6 1 2 1 1 3 4
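If adding a package is not an option, a base R version that handles the same edge cases might look like the sketch below. The helper name count_words and the test vector y are hypothetical, chosen for illustration. Unlike the strsplit() approach, it is not thrown off by leading whitespace, and it returns 0 for an empty string.

```r
y <- c(" leadingspaces",
       "trailingspaces ",
       " leading and trailing ",
       "")

# strsplit() keeps an empty first piece when the string starts with a
# separator, so leading whitespace inflates the count
sapply(strsplit(y[1:3], "\\s+"), length)
## [1] 2 1 4

# Hypothetical helper: extract the actual matches and count them.
# regmatches() returns character(0) when there is no match, so an
# empty string is counted as 0 words.
count_words <- function(s) lengths(regmatches(s, gregexpr("\\S+", s)))

count_words(y)
## [1] 1 1 3 0
```

lengths() requires R 3.2.0 or later; on older versions, sapply(..., length) does the same job.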