What is the best way to check whether a String contains only certain characters?

Time: 2021-07-28 01:41:46

I have this problem: I have a String, but I need to make sure that it only contains letters A-Z and numbers 0-9. Here is my current code:

boolean valid = true;
for (char c : string.toCharArray()) {
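    int type = Character.getType(c);
    // 1 = Character.UPPERCASE_LETTER, 2 = Character.LOWERCASE_LETTER, 9 = Character.DECIMAL_DIGIT_NUMBER
    if (type == 2 || type == 1 || type == 9) {
        // the character is either a letter or a digit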
    } else {
        valid = false;
        break;
    }
}

But what is the best and the most efficient way to implement it?

8 answers

#1


13  

Since no one else has worried about "fastest" yet, here is my contribution:

boolean valid = true;

char[] a = s.toCharArray();

for (char c: a)
{
    valid = ((c >= 'a') && (c <= 'z')) || 
            ((c >= 'A') && (c <= 'Z')) || 
            ((c >= '0') && (c <= '9'));

    if (!valid)
    {
        break;
    }
}

return valid;

Full test code below:

public static void main(String[] args)
{
    String[] testStrings = {"abcdefghijklmnopqrstuvwxyz0123456789", "", "00000", "abcdefghijklmnopqrstuvwxyz0123456789&", "1", "q", "test123", "(#*$))&v", "ABC123", "hello", "supercalifragilisticexpialidocious"};

    long startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericOriginal(testString);
    }

    System.out.println("Time for isAlphaNumericOriginal: " + (System.nanoTime() - startNanos) + " ns"); 

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericFast(testString);
    }

    System.out.println("Time for isAlphaNumericFast: " + (System.nanoTime() - startNanos) + " ns");

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericRegEx(testString);
    }

    System.out.println("Time for isAlphaNumericRegEx: " + (System.nanoTime() - startNanos) + " ns");

    startNanos = System.nanoTime();

    for (String testString: testStrings)
    {
        isAlphaNumericIsLetterOrDigit(testString);
    }

    System.out.println("Time for isAlphaNumericIsLetterOrDigit: " + (System.nanoTime() - startNanos) + " ns");      
}

private static boolean isAlphaNumericOriginal(String s)
{
    boolean valid = true;
    for (char c : s.toCharArray()) 
    {
        int type = Character.getType(c);
        if (type == 2 || type == 1 || type == 9) 
        {
            // the character is either a letter or a digit
        }
        else 
        {
            valid = false;
            break;
        }
    }

    return valid;
}

private static boolean isAlphaNumericFast(String s)
{
    boolean valid = true;

    char[] a = s.toCharArray();

    for (char c: a)
    {
        valid = ((c >= 'a') && (c <= 'z')) || 
                ((c >= 'A') && (c <= 'Z')) || 
                ((c >= '0') && (c <= '9'));

        if (!valid)
        {
            break;
        }
    }

    return valid;
}

private static boolean isAlphaNumericRegEx(String s)
{
    return Pattern.matches("[\\dA-Za-z]+", s);
}

private static boolean isAlphaNumericIsLetterOrDigit(String s)
{
    boolean valid = true;
    for (char c : s.toCharArray()) { 
        if(!Character.isLetterOrDigit(c))
        {
            valid = false;
            break;
        }
    }
    return valid;
}

Produces this output for me:

Time for isAlphaNumericOriginal: 164960 ns
Time for isAlphaNumericFast: 18472 ns
Time for isAlphaNumericRegEx: 1978230 ns
Time for isAlphaNumericIsLetterOrDigit: 110315 ns
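
Note that the four timed methods do not accept exactly the same inputs, and that the timings come from a single un-warmed pass (so they include class loading and JIT compilation and are only a rough indication). The sketch below restates the four checks in a made-up class, AlnumVariants, and prints how they treat an empty string and a non-ASCII letter; the class and method names are assumptions for illustration, not part of the answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the checks mirror the four methods timed above.
final class AlnumVariants {
    // Character.getType check from the question (1, 2 and 9 as named constants).
    static boolean original(String s) {
        for (char c : s.toCharArray()) {
            int type = Character.getType(c);
            if (type != Character.LOWERCASE_LETTER
                    && type != Character.UPPERCASE_LETTER
                    && type != Character.DECIMAL_DIGIT_NUMBER) {
                return false;
            }
        }
        return true;
    }

    // Plain ASCII range check.
    static boolean fast(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Regular expression; the + quantifier requires at least one character.
    static boolean regex(String s) {
        return Pattern.matches("[\\dA-Za-z]+", s);
    }

    // Unicode-aware check.
    static boolean letterOrDigit(String s) {
        for (char c : s.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "caf\u00e9" ends with e-acute (U+00E9), a lowercase letter outside ASCII.
        for (String s : new String[] {"ABC123", "", "caf\u00e9"}) {
            System.out.println("[" + s + "] original=" + original(s) + " fast=" + fast(s)
                    + " regex=" + regex(s) + " letterOrDigit=" + letterOrDigit(s));
        }
    }
}
```

For the empty string only the regex version returns false, and for the accented word only the ASCII range check and the regex return false, so the choice between these methods is about accepted input as well as speed.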

#2


9  

If you want to avoid regex, then the Character class can help:

boolean valid = true;
for (char c : string.toCharArray()) { 
    if(!Character.isLetterOrDigit(c))
    {
        valid = false;
        break;
    }
}

If you need the letters to be upper case, use the following if statement instead:

if(!((Character.isLetter(c) && Character.isUpperCase(c)) || Character.isDigit(c)))
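
As a rough sketch, the uppercase condition can be wrapped into a complete method like this. The class and method names (UpperOrDigitCheck, isUpperOrDigit) are made up for illustration. One thing to keep in mind is that the Character methods follow Unicode, so non-ASCII uppercase letters and non-ASCII decimal digits pass as well.

```java
// Hypothetical helper class wrapping the uppercase-or-digit condition.
final class UpperOrDigitCheck {
    static boolean isUpperOrDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!((Character.isLetter(c) && Character.isUpperCase(c)) || Character.isDigit(c))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUpperOrDigit("ABC123")); // true
        System.out.println(isUpperOrDigit("abc123")); // false
        // U+00C9 is an uppercase E-acute, U+0663 is the Arabic-Indic digit three.
        System.out.println(isUpperOrDigit("\u00c9\u0663")); // true
    }
}
```

If only ASCII A-Z and 0-9 should be accepted, an explicit range check is the stricter choice.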

#3


3  

You could use Apache Commons Lang:

StringUtils.isAlphanumeric(String)

#4


3  

In addition to all the other answers, here's a Guava approach:

boolean valid = CharMatcher.JAVA_LETTER_OR_DIGIT.matchesAllOf(string);

More on CharMatcher: https://code.google.com/p/guava-libraries/wiki/StringsExplained#CharMatcher

#5


2  

Use a regular expression:

Pattern.matches("[\\dA-Z]+", string)

[\\dA-Z]+: At least one occurrence (+) of digits or uppercase letters.

If you want to include lowercase letters, replace [\\dA-Z]+ with [\\dA-Za-z]+.
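
Pattern.matches compiles the expression on every call, so when the check runs often it may be worth compiling the pattern once. A minimal sketch, assuming a made-up holder class UpperAlnumRegex (only the pattern string comes from this answer):

```java
import java.util.regex.Pattern;

// Hypothetical holder class for a precompiled pattern.
final class UpperAlnumRegex {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern UPPER_ALNUM = Pattern.compile("[\\dA-Z]+");

    static boolean isUpperAlnum(String s) {
        return UPPER_ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUpperAlnum("ABC123")); // true
        System.out.println(isUpperAlnum("abc"));    // false
        System.out.println(isUpperAlnum(""));       // false, because + requires at least one character
    }
}
```

Use * instead of + if an empty string should count as valid.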

#6


2  

The following approach takes longer to implement than a regular expression, but I think it is one of the most efficient solutions at run time, because it uses bitwise operations, which are really fast.

My solution is more complex and harder to read and maintain, but I think it is another valid way to do what you want.

A good way to test that a string contains only numbers or capital letters is a simple 128-bit bitmask (two longs) representing the ASCII table.

So, for the standard ASCII table, there is a 1 bit for every character we want to keep (bits 48 to 57 and bits 65 to 90).

Thus, you can test that a char is a:

  1. Number with this mask: 0x3FF000000000000L (if the character code < 65)
  2. Uppercase letter with this mask: 0x3FFFFFFL, applied to the character code minus 65 (if the character code >= 65)

So the following method should work:

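public boolean validate(String aString) {
    for (int i = 0; i < aString.length(); i++) {
        char c = aString.charAt(i);

        // Java uses only the low 6 bits of a long shift distance, so anything
        // above 'Z' (90) must be rejected before shifting; otherwise a char
        // such as U+0081 (129 - 65 = 64, which shifts like 0) would pass as 'A'.
        if (c > 90
                || (c <= 64 && (0x3FF000000000000L & (1L << c)) == 0)
                || (c > 64 && (0x3FFFFFFL & (1L << (c - 65))) == 0)) {
            return false;
        }
    }

    return true;
}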
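
To see where the two mask constants come from, here is a small sketch that builds them bit by bit. AsciiMasks and maskFor are made-up names for illustration. It also shows a Java detail that matters for this technique: the shift distance of a long is reduced to its low six bits, so shifting by 64 behaves like shifting by 0, which is why character codes above 'Z' need a range guard before the shift.

```java
// Hypothetical demo class: derives the two masks used by the bitmask check.
final class AsciiMasks {
    // Sets one bit per character in [from, to], at position (code - offset).
    static long maskFor(char from, char to, int offset) {
        long mask = 0L;
        for (char c = from; c <= to; c++) {
            mask |= 1L << (c - offset);
        }
        return mask;
    }

    public static void main(String[] args) {
        // Digits occupy bits 48..57 of the first long.
        System.out.println(Long.toHexString(maskFor('0', '9', 0)));  // 3ff000000000000
        // Uppercase letters occupy bits 0..25 once 65 is subtracted.
        System.out.println(Long.toHexString(maskFor('A', 'Z', 65))); // 3ffffff
        // Shift distances wrap modulo 64 for long values.
        System.out.println((1L << 64) == 1L);                        // true
    }
}
```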

#7


1  

The best way in terms of maintainability and simplicity is the regular expression that has already been posted. Once you are familiar with this technique, you know what to expect, and it is very easy to widen the criteria if needed. The downside is performance.

The fastest way to go is the array approach. Checking whether a character's numerical value falls in the wanted ASCII ranges A-Z and 0-9 is extremely fast. But maintainability is bad, and the simplicity is gone.

You could also use a Java 7 switch-case on the char, but that's just as bad as the second approach.

In the end, since we are talking about Java, I would strongly suggest using regular expressions.

#8


0  

StringUtils in Apache Commons Lang 3 has a containsOnly method, https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html

The implementation should be fast enough.
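
For readers without the library on the classpath, the same idea (every character must appear in a set of allowed characters) can be sketched in plain Java. This is a hand-written illustration with made-up names (AllowedChars, containsOnly), not the Commons Lang implementation, and its handling of edge cases such as null is not meant to match the library.

```java
// Hypothetical dependency-free helper; not the Commons Lang code.
final class AllowedChars {
    static final String UPPER_ALNUM = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Returns true when every char of s occurs somewhere in allowed.
    static boolean containsOnly(String s, String allowed) {
        for (int i = 0; i < s.length(); i++) {
            if (allowed.indexOf(s.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsOnly("ABC123", UPPER_ALNUM)); // true
        System.out.println(containsOnly("abc123", UPPER_ALNUM)); // false
        System.out.println(containsOnly("", UPPER_ALNUM));       // true, nothing to reject
    }
}
```

The linear indexOf scan is fine for a short allowed set like this one; for large sets a lookup table or a bitmask would scale better.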
