I'm writing a small function in R as follows:
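tags.out <- as.character(tags.out)
tags.out.unique <- unique(tags.out)
z <- NROW(tags.out.unique)
tags.count <- integer(0)  # start with an empty vector to append counts to
for (i in 1:10) {
  # number of elements of tags.out that match tags.out.unique[i] as a regular expression
  l <- length(grep(tags.out.unique[i], x = tags.out))
  tags.count <- append(x = tags.count, values = l)
}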
Basically, I want to take each element of the unique character vector (tags.out.unique) and count its occurrences in the original vector (tags.out, i.e. before unique() was applied).
The code above works correctly. However, when I replace for (i in 1:10) with for (i in 1:z), or even with some number larger than 10 (18000, for example), I get the following error:
Error in grep(tags.out.unique[i], x = tags.out) : invalid regular expression 'c++', reason 'Invalid use of repetition operators'
I would be extremely grateful if anyone were able to help me understand what's going on here.
Many thanks.
2 solutions
#1
It would appear that one of the elements of tags.out.unique is c++, which is (as the error message plainly states) an invalid regular expression.
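A minimal way to confirm this diagnosis, using a made-up one-element input: with R's default regex engine the pattern "c++" fails to compile, so grep() stops with an error. tryCatch() is used here only to capture the message instead of halting. This is presumably also why the loop runs fine for 1:10: "c++" is not among the first ten unique tags.

```r
# Hypothetical check: use "c++" as a regular expression (no fixed = TRUE)
msg <- tryCatch(
  # suppressWarnings() hides the accompanying compilation warning; the error still propagates
  suppressWarnings(grep("c++", x = "c++")),
  error = function(e) conditionMessage(e)
)
print(msg)
```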
You are currently programming inefficiently. The R Inferno is worth a read, especially its section on growing objects: extending a vector on every iteration, as append() does in your loop, is generally bad form and can be extremely inefficient in some cases. If you are going to have a blanket rule, then "don't grow objects" is a better one than "avoid loops".
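To illustrate the growing-objects point, here is a sketch of the question's loop with the result vector allocated once at its final length instead of being extended by append() on every iteration. The sample tags.out is made up, and exact comparison with == is swapped in for grep(), which assumes "count its occurrences" means exact matches.

```r
# Made-up sample data standing in for the question's tags.out
tags.out <- c("r", "c++", "r", "python", "c++", "r")
tags.out.unique <- unique(tags.out)

# Allocate the result once, at its final length, instead of growing it
tags.count <- integer(length(tags.out.unique))
for (i in seq_along(tags.out.unique)) {
  # exact comparison, so no regular expression is involved
  tags.count[i] <- sum(tags.out == tags.out.unique[i])
}
names(tags.count) <- tags.out.unique
print(tags.count)
```

seq_along() is used rather than 1:z because it also behaves correctly when the vector is empty (1:0 would iterate over 1 and 0).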
Given that you are simply trying to count the number of times each value occurs, there is no need for the loop or a regex:
counts <- table(tags.out)
# the unique values
names(counts)
should give you the results you want.
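For example, with a made-up vector (values only for illustration), table() returns one count per distinct value and names() gives those distinct values:

```r
tags.out <- c("r", "c++", "r", "python", "c++", "r")

counts <- table(tags.out)   # one count per distinct value, no regex involved
names(counts)               # the unique values, sorted
counts[["c++"]]             # count for a single tag: 2
as.vector(counts)           # the counts as a plain integer vector
```

One difference worth keeping in mind: table() counts exact matches, whereas the original grep() loop counted every element that merely contains the tag (so a tag "c" would also have counted the "c++" entries).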
#2
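The "+" in "c++" (which you're passing to grep as a pattern string) has a special meaning: in a regular expression it is a repetition operator ("one or more of the preceding item"), so "c++" applies one repetition operator directly to another, which R's default regex engine rejects. You want the "+" to be interpreted literally as the character "+", so instead of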
grep(pattern="c++", x="this string contains c++")
you should do
grep(pattern="c++", x="this string contains c++", fixed=TRUE)
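A quick check of the fixed = TRUE call above, plus an alternative the answer does not mention: escaping each "+" with a backslash (written \\+ inside an R string) also makes it match literally while keeping regex mode.

```r
x <- "this string contains c++"

# fixed = TRUE: the pattern is matched as a literal string
grep(pattern = "c++", x = x, fixed = TRUE)

# Alternative: escape the special characters in the regular expression
grep(pattern = "c\\+\\+", x = x)
```

Both calls return 1, the index of the matching element of x.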
If you google [regex special characters] or something similar, you'll see that "+", "*" and many others have a special meaning. In your case you want them to be interpreted literally; see ?grep, in particular its fixed argument.
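Applying fixed = TRUE to the question's loop, a sketch might look like this. The sample data is made up, and vapply() is swapped in for the grown vector. Note that it still counts every element that contains the tag as a substring, so the tag "c" also picks up the "c++" entries; that is the behaviour of the original grep() approach, not of table().

```r
# Made-up sample data standing in for the question's tags.out
tags.out <- c("r", "c++", "r", "python", "c++", "c")
tags.out.unique <- unique(tags.out)

# For each unique tag, count the elements of tags.out that contain it literally
tags.count <- vapply(
  tags.out.unique,
  function(tag) length(grep(tag, x = tags.out, fixed = TRUE)),
  integer(1)  # each call must return a single integer
)
print(tags.count)
```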