How to escape a string for use in Boost Regex?

Date: 2022-01-08 05:25:48

I'm just getting my head around regular expressions, and I'm using the Boost Regex library.

I have a need to use a regex that includes a specific URL, and it chokes because obviously there are characters in the URL that are reserved for regex and need to be escaped.

Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.

Alternatively, is there a list of all characters that would need to be escaped?

4 answers

#1


34  

. ^ $ | ( ) [ ] { } * + ? \

Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.

const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");
const std::string rep("\\\\&");
std::string result = regex_replace(url_to_escape, esc, rep,
                                   boost::match_default | boost::format_sed);

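(The flag boost::format_sed selects sed's replacement-string format. In sed, an unescaped & outputs whatever the whole expression matched and \\ outputs a literal backslash, so the runtime replacement string \\& (written "\\\\&" in C++ source) writes a backslash followed by the matched character.)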
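
For comparison, the same sed-format trick can be sketched with the standard <regex> header instead of Boost. This is a swapped-in library, used only so the sketch builds without Boost: std::regex_constants::format_sed stands in for boost::format_sed, and the name regex_escape_std is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (a std::regex port of the Boost answer, not a Boost API):
// prefixes each of . ^ $ | ( ) [ ] { } * + ? and the backslash with a backslash.
std::string regex_escape_std(const std::string& text) {
    // Same character class as the Boost version; the ECMAScript grammar accepts it.
    static const std::regex esc(R"([.^$|()\[\]{}*+?\\])");
    // sed-style format: "\\" emits a literal backslash, "&" emits the whole match.
    return std::regex_replace(text, esc, "\\\\&", std::regex_constants::format_sed);
}
```

For example, regex_escape_std("example.com") yields the text example\.com.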

Or, if you are not comfortable with sed's replacement-string format, change the flag to boost::format_perl and use the familiar $& to refer to whatever the whole expression matched:

const std::string rep("\\\\$&");
std::string result = regex_replace(url_to_escape, esc, rep,
                                   boost::match_default | boost::format_perl);
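
The question's real goal is to embed a literal URL inside a larger pattern. A minimal sketch of that, again assuming std::regex in place of Boost: std::regex's default (ECMAScript) replacement format also understands $&, and the helpers escape_for_regex and contains_link_to are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: escapes regex metacharacters using the ECMAScript
// replacement syntax, where "$&" stands for the whole match.
std::string escape_for_regex(const std::string& text) {
    static const std::regex esc(R"([.^$|()\[\]{}*+?\\])");
    return std::regex_replace(text, esc, "\\$&");
}

// Hypothetical use case: does `html` contain href="<url>" for exactly this URL?
bool contains_link_to(const std::string& html, const std::string& url) {
    const std::regex link("href=\"" + escape_for_regex(url) + "\"");
    return std::regex_search(html, link);
}
```

Without the escaping step, the . in the URL would match any character and the ? would make the preceding / optional, so the pattern would accept URLs it should reject.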

#2


12  

Using Dav's code (plus a fix from the comments), I created an ASCII/Unicode function regex_escape():

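#include <string>
#include <boost/regex.hpp>

// Prefixes each regex metacharacter with a backslash (wide-string version).
std::wstring regex_escape(const std::wstring& string_to_escape) {
    static const boost::wregex re_boostRegexEscape( L"[.^$|()\\[\\]{}*+?\\\\]" );
    const std::wstring rep( L"\\\\&" );  // sed format: "\\" -> backslash, "&" -> whole match
    std::wstring result = boost::regex_replace(string_to_escape, re_boostRegexEscape, rep, boost::match_default | boost::format_sed);
    return result;
}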

For an ASCII version, use std::string/boost::regex instead of std::wstring/boost::wregex.
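
Instead of keeping separate narrow and wide copies, one function template can serve both. A minimal sketch, assuming the standard <regex> header rather than Boost and a hypothetical name basic_regex_escape; the pattern and format strings are plain ASCII, so they are widened to CharT one character at a time.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: one template serving both std::string and std::wstring.
template <typename CharT>
std::basic_string<CharT> basic_regex_escape(const std::basic_string<CharT>& text) {
    // ASCII-only pattern and sed-style format, defined once as narrow strings.
    static const std::string narrow_pattern = R"([.^$|()\[\]{}*+?\\])";
    static const std::string narrow_format = R"(\\&)";  // "\\" -> backslash, "&" -> whole match
    // Widen each char to CharT via the iterator-range constructor.
    static const std::basic_regex<CharT> esc(
        std::basic_string<CharT>(narrow_pattern.begin(), narrow_pattern.end()));
    static const std::basic_string<CharT> fmt(narrow_format.begin(), narrow_format.end());
    return std::regex_replace(text, esc, fmt, std::regex_constants::format_sed);
}
```

Pass a std::string or std::wstring explicitly, e.g. basic_regex_escape(std::wstring(L"a.b")); a bare string literal will not deduce CharT.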

#3


4  

The same approach with boost::xpressive:

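#include <string>
#include <boost/xpressive/xpressive.hpp>

// Group 1 captures one metacharacter, now including { and } (escaping '/' is harmless).
const boost::xpressive::sregex re_escape_text = boost::xpressive::sregex::compile("([\\^\\.\\$\\|\\(\\)\\[\\]\\{\\}\\*\\+\\?\\/\\\\])");

std::string regex_escape(std::string text){
    // Replacement: a backslash followed by the captured character ($1).
    text = boost::xpressive::regex_replace( text, re_escape_text, std::string("\\$1") );
    return text;
}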
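
If compiling a regex just to escape a string feels heavy, a plain loop over the metacharacter list from the first answer does the same job. This is a different technique from the xpressive answer, sketched under the assumption that the default Perl/ECMAScript-style syntax is the target; escape_by_loop is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: escapes without building a regex, using the list
// . ^ $ | ( ) [ ] { } * + ? plus the backslash itself.
std::string escape_by_loop(const std::string& text) {
    static const std::string metachars = ".^$|()[]{}*+?\\";
    std::string out;
    out.reserve(text.size() * 2);  // worst case: every character is escaped
    for (char c : text) {
        if (metachars.find(c) != std::string::npos) {
            out += '\\';  // prefix the metacharacter with a backslash
        }
        out += c;
    }
    return out;
}
```

The loop makes a single pass over the input, and the escaped set is spelled out in one place, which makes it easy to audit against whichever regex syntax you are targeting.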

#4


1  

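In C++11, you can use raw string literals so that backslashes in the regex do not have to be doubled for the C++ compiler (regex metacharacters such as . still need their backslash):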

std::string myRegex = R"(something\.com)";

See http://en.cppreference.com/w/cpp/language/string_literal, item (6).
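
A small sketch of what the raw literal does and does not change: the string the regex engine sees is identical to the doubled-backslash version, and the regex-level \. is still required to match a literal dot. It uses std::regex for self-containment (an assumption; Boost.Regex would receive the same string), and is_something_com is a hypothetical helper. The custom-delimiter form R"re(...)re" is the variant to reach for when the pattern itself contains )".

```cpp
#include <regex>
#include <string>

// With a raw literal the backslash reaches the regex engine unchanged,
// so R"(something\.com)" and "something\\.com" are the same string.
const std::string raw_pattern = R"(something\.com)";
const std::string cooked_pattern = "something\\.com";

// A custom delimiter (here "re") lets the pattern itself contain )" safely.
const std::string quoted_pattern = R"re(href="([^"]*)")re";

// Hypothetical helper: true only if `text` is exactly the literal host name.
bool is_something_com(const std::string& text) {
    static const std::regex re(raw_pattern);
    return std::regex_match(text, re);
}
```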
