How to efficiently check whether a string has special characters in C++?

I am trying to find out whether there is a better way to check if a string has special characters. In my case, anything other than alphanumeric characters and '_' is considered a special character. Currently, I have a string that contains the special characters, such as std::string = "!@#$%^&". I then use the std::find_first_of() algorithm to check if any of the special characters are present in the string.

I was wondering how to do it based on whitelisting. I want to specify the lowercase/uppercase characters, digits and an underscore in a string (I don't want to list them; is there any way I can specify an ASCII range of some sort, like [a-zA-Z0-9_]?). How can I achieve this? Then I plan to use std::find_first_not_of(). That way I can state what I actually want and check for the opposite.

8 solutions

#1

Try:

std::string x(/*Load*/);
if (x.find_first_not_of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_") != std::string::npos)
{
    std::cerr << "Error\n";
}

Or try boost regular expressions:

// Note: \w matches any word character (alphanumeric plus "_")
boost::regex test("\\w+", boost::regex::perl);
if (!boost::regex_match(x.begin(), x.end(), test))
{
    std::cerr << "Error\n";
}

// The equivalent to \w should be:
boost::regex test("[A-Za-z0-9_]+", boost::regex::perl);
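
Since C++11 the standard library has its own regex engine in <regex>, so the same whole-string match can probably be written without Boost. The following is only a sketch under that assumption; the function name only_word_chars is made up for illustration and is not part of the answer above.

```cpp
#include <regex>
#include <string>

// Returns true when x is non-empty and every character is in [A-Za-z0-9_].
// only_word_chars is an illustrative name, not from the original answer.
bool only_word_chars(const std::string& x)
{
    // regex_match succeeds only if the pattern covers the whole string.
    // '+' mirrors the Boost snippet, so the empty string is rejected.
    static const std::regex pattern("[A-Za-z0-9_]+");
    return std::regex_match(x, pattern);
}
```

A caller would report an error when only_word_chars(x) returns false, just as the Boost version does.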

#2

The first thing that you need to consider is "is this ASCII only?" If your answer is yes, I would encourage you to really consider whether or not you should allow ASCII only. I currently work for a company that is having real headaches getting into foreign markets because we didn't think to support Unicode from the get-go.

That being said, ASCII makes it really easy to check for non-alphanumeric characters. Take a look at the ASCII chart.

http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters

Iterate through each character
Check if the character is decimal value 48 - 57, 65 - 90, 97 - 122, or 95 (underscore)
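
The two steps above could be sketched roughly as follows. This assumes an ASCII-compatible character set and a C++11 compiler (for the range-based for loop); the name has_special_char is illustrative only.

```cpp
#include <string>

// Sketch of the two steps above; has_special_char is an illustrative name.
// Assumes an ASCII-compatible execution character set.
bool has_special_char(const std::string& s)
{
    for (unsigned char c : s) {
        const bool digit = c >= 48 && c <= 57;   // '0'..'9'
        const bool upper = c >= 65 && c <= 90;   // 'A'..'Z'
        const bool lower = c >= 97 && c <= 122;  // 'a'..'z'
        if (!(digit || upper || lower || c == 95))  // 95 is '_'
            return true;  // found a character outside the whitelist
    }
    return false;
}
```

Reading each character as unsigned char keeps bytes above 127 from turning into negative values, so they are simply reported as special.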

#3

There's no way in standard C, or in C++ before C++11 added <regex>, to do that using character ranges; you have to list out all of the characters. For C strings, you can use strspn(3) and strcspn(3) to find the first character in a string that is not a member of, or is a member of, a given character set, respectively. For example:

// Test if the given string has anything not in A-Za-z0-9_
bool HasSpecialCharacters(const char *str)
{
    return str[strspn(str, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_")] != 0;
}

For C++ strings, you can equivalently use the find_first_of and find_first_not_of member functions.

Another option is to use isalnum(3) and the related functions from <ctype.h> to test if a given character is alphanumeric or not; note that these functions are locale-dependent, so their behavior can (and does) change in other locales. If you do not want that behavior, then don't use them. If you do choose to use them, you'll also have to test for underscores separately, since there's no function that tests "alphabetic, numeric, or underscore", and you'll have to code your own loop to search the string (or use std::find_if with an appropriate function object).
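
As a rough illustration of the function-object route, the sketch below uses the two-argument std::isalnum from <locale> with the classic "C" locale, which is one way to keep the result from changing with the global locale. The names IsSpecial and has_special_char are hypothetical.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Function object: true for anything other than a letter, digit, or '_'.
// Classifies with the classic "C" locale, so the result does not depend
// on the global locale. IsSpecial is an illustrative name.
struct IsSpecial {
    bool operator()(char c) const
    {
        return !(std::isalnum(c, std::locale::classic()) || c == '_');
    }
};

// Illustrative wrapper: searches the string for the first special character.
bool has_special_char(const std::string& s)
{
    return std::find_if(s.begin(), s.end(), IsSpecial()) != s.end();
}
```

If locale-aware classification is actually wanted, passing a different std::locale object in place of std::locale::classic() would be the natural change.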

#4

I think I'd do the job just a bit differently, treating the std::string as a collection, and using an algorithm. Using a C++0x lambda, it would look something like this:

bool has_special_char(std::string const &str) {
    // Take unsigned char: passing a negative char value to isalnum is undefined behavior.
    return std::find_if(str.begin(), str.end(),
        [](unsigned char ch) { return !(isalnum(ch) || ch == '_'); }) != str.end();
}

At least when you're dealing with char (not wchar_t), isalnum will typically use a table look up, so it'll usually be (quite a bit) faster than anything based on find_first_of (which will normally use a linear search instead). IOW, this is O(N) (N=str.size()), where something based on find_first_of will be O(N*M), (N=str.size(), M=pattern.size()).
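
To make the table-lookup idea concrete without relying on how a particular library implements isalnum, here is a hand-rolled sketch that builds a 256-entry table once and then does one lookup per character. It assumes 8-bit char; make_allowed_table and has_special_char are illustrative names.

```cpp
#include <array>
#include <string>

// Build a 256-entry table once: allowed[c] is true for A-Z, a-z, 0-9, '_'.
// make_allowed_table is an illustrative name. Assumes 8-bit char.
std::array<bool, 256> make_allowed_table()
{
    std::array<bool, 256> allowed{};  // value-initialized: all false
    const std::string ok =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";
    for (unsigned char c : ok)
        allowed[c] = true;
    return allowed;
}

bool has_special_char(const std::string& s)
{
    static const std::array<bool, 256> allowed = make_allowed_table();
    for (unsigned char c : s)
        if (!allowed[c])  // one table lookup per character: O(N) overall
            return true;
    return false;
}
```

The cost of building the table is paid once, on the first call, and is independent of the length of the strings checked afterwards.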

If you want to do the job with pure C, you can use scanf with a scanset conversion that's theoretically non-portable, but supported by essentially all recent/popular compilers:

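char junk;
/* Compare with 1: when the whole string matches, sscanf may return EOF, which is nonzero. */
/* Caveat: if str starts with a special character the scanset fails and 0 is returned. */
if (sscanf(str, "%*[A-Za-z0-9_]%c", &junk) == 1)
    puts("it has at least one \"special\" character");
else
    puts("no special character after the leading run");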

The basic idea here is pretty simple: the scanset skips across all consecutive non-special characters (but doesn't assign the result to anything, because of the *), then we try to read one more character. If that succeeds, it means there was at least one character that was not skipped, so we must have at least one special character. If it fails, it means the scanset conversion matched the whole string, so all the characters were "non-special". One caveat: a scanset has to match at least one character, so a string that begins with a special character (or is empty) stops the scan before %c is tried, and its first character needs a separate check.
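
A possible variant of the same idea uses %n to record how far the scanset got, instead of reading one more character with %c. This sidesteps the question of what sscanf returns, and it also covers a string whose first character is special, where the scanset matches nothing. It is only a sketch: it depends on the same range syntax inside the scanset, and has_special_char is an illustrative name.

```cpp
#include <cstdio>

// Variant of the scanset idea using %n; has_special_char is an illustrative name.
// Relies on the same implementation-defined range syntax ("A-Z") in the scanset.
bool has_special_char(const char* str)
{
    int consumed = 0;  // stays 0 if the scanset matches nothing
    // %*[...] skips the leading run of allowed characters,
    // %n then records how many characters have been consumed so far.
    std::sscanf(str, "%*[A-Za-z0-9_]%n", &consumed);
    // Anything left after the allowed run is a special character.
    return str[consumed] != '\0';
}
```

The return value of sscanf is deliberately ignored here; only the position stored by %n is used.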

Officially, the C standard says that trying to put a range in a scanset conversion like this isn't portable (a '-' anywhere but the beginning or end of the scanset gives implementation-defined behavior). There have even been a few compilers (from Borland) that would fail for this: they would treat A-Z as matching exactly three possible characters, 'A', '-' and 'Z'. Most current compilers (or, more accurately, standard library implementations) take the approach this assumes: "A-Z" matches any upper-case character.

#5

The functions (macros) are subject to locale settings, but you should investigate isalnum() and relatives from <ctype.h> or <cctype>.

#6

I would just use the built-in C facility here. Iterate over each character in the string and check if it's _ or if isalnum(ch) is true (isalpha(ch) alone would reject digits). If so then it's valid, otherwise it's a special character.
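
A short sketch of that loop, written with std::all_of instead of a hand-written for loop and assuming a C++11 compiler. It uses isalnum so that digits are accepted as well, per the question's definition; the name is_clean is illustrative.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True when every character is '_' or alphanumeric (in the current C locale).
// is_clean is an illustrative name, not from the original answer.
bool is_clean(const std::string& s)
{
    // The lambda takes unsigned char because std::isalnum must not be
    // called with a negative value.
    return std::all_of(s.begin(), s.end(), [](unsigned char ch) {
        return ch == '_' || std::isalnum(ch);
    });
}
```

A string has a special character exactly when is_clean returns false; an empty string counts as clean.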

#7

If you want this, but don't want to go the whole hog and use regexps, and given your test is for ASCII chars, just create a function to generate the string for find_first_not_of...

#include <iostream>
#include <string>

std::string expand(const char* p)
{
    std::string result;
    while (*p)
        if (p[1] == '-' && p[2])
        {
            for (int c = p[0]; c <= p[2]; ++c)
                result += (char)c;
            p += 3;
        }
        else
            result += *p++;
    return result;
}

int main()
{
    std::cout << expand("A-Za-z0-9_") << '\n';
}
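
The expanded string can then be handed to find_first_not_of. A minimal sketch of that last step follows; expand is repeated from the answer so the snippet compiles on its own, and has_special_char is a hypothetical wrapper name. Like expand itself, it assumes the letters and digits are contiguous in the character set, as they are in ASCII.

```cpp
#include <string>

// Same expansion as in the answer above, repeated so this sketch compiles alone.
std::string expand(const char* p)
{
    std::string result;
    while (*p)
        if (p[1] == '-' && p[2])
        {
            for (int c = p[0]; c <= p[2]; ++c)
                result += (char)c;
            p += 3;
        }
        else
            result += *p++;
    return result;
}

// has_special_char is an illustrative name, not part of the original answer.
bool has_special_char(const std::string& s)
{
    // Expand the whitelist once and reuse it across calls.
    static const std::string allowed = expand("A-Za-z0-9_");
    return s.find_first_not_of(allowed) != std::string::npos;
}
```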

#8

Using

    s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());

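    bool my_predicate(char c)
    {
     // isalnum rather than isalpha, so that digits are kept as well;
     // the cast avoids undefined behavior for negative char values.
     return !(isalnum(static_cast<unsigned char>(c)) || c=='_');
    }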

will get you a clean string s.

The erase call strips all the special characters from the string, and the behavior is easy to customise through the my_predicate function.
