Let's imagine you have a string:
strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"
Is there a function that extracts the numbers into a vector, producing the following result?
result <- c(0, 3000, -500, 0, 2.25, -1200)
For example, result[3] should be -500.
Note that the numbers are in accounting format, so negative numbers appear in parentheses. You can also assume that only numbers appear to the right of the first occurrence of a number. I'm not very good with regular expressions, so I'd appreciate help if a regex is required. Finally, I don't want to assume the string is always the same, so I want to strip out all words (and any special characters) before the first number.
3 answers
#1 (score: 26)
library(stringr)
# runs of digits, commas and periods, optionally wrapped in parentheses
x <- str_extract_all(strLine, "\\(?[0-9,.]+\\)?")[[1]]
> x
[1] "0" "3,000" "(500)" "0" "2.25" "(1,200)"
Then turn the parenthesized values into negatives:
# capture what is inside the parentheses and prefix it with a minus sign
x <- gsub("\\((.+)\\)", "-\\1", x)
x
[1] "0" "3,000" "-500" "0" "2.25" "-1,200"
Then finish with taRifx::destring, or with as.numeric() once the commas are removed (the next version of destring will support negatives by default, so the keep option won't be necessary):
library(taRifx)
destring(x, keep = "0-9.-")
[1] 0 3000 -500 0 2.25 -1200
OR:
as.numeric(gsub(",","",x))
[1] 0 3000 -500 0 2.25 -1200
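The steps in this answer can be wrapped into one helper that also covers the question's requirement of ignoring the text before the first number: provided that prefix contains no digits, commas or periods, the pattern never matches it. A minimal base R sketch; the function name parse_accounting is hypothetical, and it assumes that only plain or parenthesized numbers follow the first number, as the question states.

```r
# Hypothetical helper (not part of stringr or taRifx): parse every
# accounting-style number in a single string into a numeric vector.
parse_accounting <- function(s) {
  # Tokens such as "0", "3,000", "(500)", "2.25", "(1,200)"
  tokens <- regmatches(s, gregexpr("\\(?[0-9,.]+\\)?", s))[[1]]
  # "(500)" -> "-500"
  tokens <- gsub("^\\((.+)\\)$", "-\\1", tokens)
  # Remove thousands separators, then convert
  as.numeric(gsub(",", "", tokens))
}

strLine <- "The transactions (on your account) were as follows: 0 3,000 (500) 0 2.25 (1,200)"
parse_accounting(strLine)

# A differently worded (made-up) line works the same way
parse_accounting("Balance changes: (10) 20.5")
```

If no number is found, gregexpr() reports no match and the helper returns numeric(0) rather than failing.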
#2 (score: 18)
Here's the base R way, for completeness:
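# extract tokens such as "3,000" and "(500" (the closing parenthesis is not captured)
x <- unlist(regmatches(strLine, gregexpr('\\(?[0-9,.]+', strLine)))
# drop commas, turn the leading "(" into "-", then convert
x <- as.numeric(gsub('\\(', '-', gsub(',', '', x)))
x
[1] 0.00 3000.00 -500.00 0.00 2.25 -1200.00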
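One caveat with the [0-9,.]+ character class used in both answers: it also matches a lone comma or a sentence-ending period, and as.numeric() turns such a token into NA. A stricter pattern that requires each token to begin with a digit avoids this. This is a sketch, not taken from either answer; the pattern and the example string are assumptions for illustration.

```r
# Assumed, stricter pattern: an optional "(", a leading digit, more digits or
# commas, an optional decimal part, and an optional ")".
strict_pattern <- "\\(?[0-9][0-9,]*(\\.[0-9]+)?\\)?"

# Hypothetical input with a comma and a final period in the prose
s <- "Totals, as follows: 0 3,000 (500) 2.25 (1,200)."

tokens <- unlist(regmatches(s, gregexpr(strict_pattern, s)))
tokens <- gsub("^\\((.+)\\)$", "-\\1", tokens)  # "(500)" -> "-500"
result <- as.numeric(gsub(",", "", tokens))     # drop thousands separators
result  # 0 3000 -500 2.25 -1200
```

The trade-off is that a number written without a leading digit, such as ".5", is no longer recognized.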
#3 (score: 0)
What worked well for me when processing strings stored in a data frame (one string per row, all in the same column) was the following:
library(taRifx)
DataFrame$Numbers <- as.character(destring(DataFrame$Strings, keep = "0-9.-"))
The results are stored in a new column of the same data frame.
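When a cell can contain several numbers, one option is to keep them together as a list column holding one numeric vector per row. A minimal base R sketch reusing the token pattern from the first answer; the sample data and the helper name extract_numbers are hypothetical, and the column names Strings and Numbers simply follow the answer above.

```r
# Made-up data for illustration
DataFrame <- data.frame(
  Strings = c("Opening entries: 0 3,000 (500)",
              "Closing entries: 2.25 (1,200)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: extract, negate parenthesized values, convert
extract_numbers <- function(s) {
  x <- regmatches(s, gregexpr("\\(?[0-9,.]+\\)?", s))[[1]]
  x <- gsub("^\\((.+)\\)$", "-\\1", x)
  as.numeric(gsub(",", "", x))
}

# One numeric vector per row, stored as a list column
DataFrame$Numbers <- lapply(DataFrame$Strings, extract_numbers)
DataFrame$Numbers[[1]]
DataFrame$Numbers[[2]]
```

Because each row gets its own vector, rows may hold different counts of numbers.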