I am trying to replace certain patterns in a string with different replacement patterns.
Example:
string test = "test replacing \"these characters\"";
What I want to do is replace every ' ' with '_' and every other character that is not a letter or digit with an empty string. I have created the following regex, and it seems to tokenize correctly, but I am not sure how (or whether it is even possible) to perform a conditional replace using regex_replace.
string test = "test replacing \"these characters\"";
regex reg("(\\s+)|(\\W+)");
The expected result after the replacement would be:
string result = "test_replacing_these_characters";
EDIT: I cannot use Boost, which is why I left it out of the tags, so please no answers that involve Boost. I have to do this with the standard library. It may be that a different regex would accomplish the goal, or that I am just stuck doing two passes.
EDIT2: When I wrote my original regex I did not remember which characters are included in \w; after looking it up, I have simplified the expression further. Again, the goal is that anything matching \s+ is replaced with '_' and anything matching \W+ is replaced with an empty string.
Answer #1 (score: 23)
The C++ (0x, 11, TR1) regular expressions do not really work (*) in every case (look up the phrase "regex" on this page for gcc), so it is better to use Boost for a while.
You can try whether your compiler supports the regular expressions needed:
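#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main(int argc, char * argv[]) {
    string test = "test replacing \"these characters\"";
    // Every run of non-word characters (spaces and quotes alike) becomes '_'.
    regex reg("[^\\w]+");
    test = regex_replace(test, reg, "_");
    // Prints: test_replacing_these_characters_ (the closing quote also becomes '_')
    cout << test << endl;
}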
The above works in Visual Studio 2012 RC.
Edit 1: Replacing with two different strings in one pass (depending on what matched) won't work here, I think. In Perl, this could easily be done with an evaluated replacement expression (the /e switch).
Therefore, you'll need two passes, as you already suspected:
...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+"), "_");
test = regex_replace(test, regex("\\W+"), "");
...
Edit 2:
If it were possible to pass a callback function tr() to regex_replace, then you could choose the substitution there, like:
string output = regex_replace(test, regex("\\s+|\\W+"), tr);
with tr() doing the replacement work:
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
the problem would be solved. Unfortunately, std::regex_replace in C++11 has no such overload (it only accepts a format string), but Boost has one. The following works with Boost and uses one pass:
...
#include <boost/regex.hpp>
using namespace boost;
...
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+|\\W+"), tr); // <= works in Boost
...
Maybe some day this will work with C++11 or whatever number comes next.
Regards
rbo