Regular expression matching fully qualified class names

Time: 2022-04-27 20:23:01

What is the best way to match fully qualified Java class name in a text?

Examples: java.lang.Reflect, java.util.ArrayList, org.hibernate.Hibernate.

8 answers

#1


62  

A fully qualified Java class name (let's call each dot-separated part "N") has the structure

N.N.N.N

The "N" part must be a Java identifier. Java identifiers cannot start with a digit, but after the initial character they may use any combination of letters, digits, underscores, or dollar signs:

([a-zA-Z_$][a-zA-Z\d_$]*\.)*[a-zA-Z_$][a-zA-Z\d_$]*
------------------------    -----------------------
          N                           N

Identifiers also cannot be reserved words (such as import, true, or null). If you only want to check plausibility, the pattern above is enough; if you also want to check validity, you must additionally check each part against the list of reserved words.
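A minimal Java sketch of the two levels described above, assuming java.util.regex as the engine: the ASCII pattern alone is the plausibility check, and additionally rejecting any dot-separated part that is a reserved word approximates the validity check. The class name FqcnCheck is hypothetical, and the word list (the Java 8 keywords plus the literals true, false and null) is written out here for illustration rather than taken from the answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class FqcnCheck {

    // ASCII-only pattern from the answer: (N.)*N with N = [a-zA-Z_$][a-zA-Z\d_$]*
    private static final Pattern PLAUSIBLE =
            Pattern.compile("([a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*");

    // Java 8 keywords plus the literals true, false and null (illustrative list).
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "abstract", "assert", "boolean", "break", "byte", "case", "catch",
            "char", "class", "const", "continue", "default", "do", "double",
            "else", "enum", "extends", "final", "finally", "float", "for",
            "goto", "if", "implements", "import", "instanceof", "int",
            "interface", "long", "native", "new", "package", "private",
            "protected", "public", "return", "short", "static", "strictfp",
            "super", "switch", "synchronized", "this", "throw", "throws",
            "transient", "try", "void", "volatile", "while",
            "true", "false", "null"));

    /** Plausibility: the string has the shape of a qualified name. */
    public static boolean isPlausible(String name) {
        return PLAUSIBLE.matcher(name).matches();
    }

    /** Closer to validity: plausible, and no part is a reserved word. */
    public static boolean isValid(String name) {
        if (!isPlausible(name)) {
            return false;
        }
        for (String part : name.split("\\.")) {
            if (RESERVED.contains(part)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPlausible("org.hibernate.Hibernate")); // true
        System.out.println(isPlausible("com.example.class"));       // true
        System.out.println(isValid("com.example.class"));           // false
    }
}
```

Splitting on the dot is safe inside isValid because a string that passed the pattern has no empty parts.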

Java identifiers may contain any Unicode letter, not only Latin ones. If you want to allow those as well, use Unicode character classes:

([\p{Letter}_$][\p{Letter}\p{Number}_$]*\.)*[\p{Letter}_$][\p{Letter}\p{Number}_$]*

or, for short

([\p{L}_$][\p{L}\p{N}_$]*\.)*[\p{L}_$][\p{L}\p{N}_$]*
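To see the difference between the ASCII and the Unicode patterns, here is a small Java sketch (the class name UnicodeFqcn is hypothetical) that runs both against a name containing accented letters. It uses the short \p{L}/\p{N} form, which java.util.regex accepts; the non-ASCII letters are written as Unicode escapes so the source file stays plain ASCII.

```java
import java.util.regex.Pattern;

public class UnicodeFqcn {

    // ASCII-only version: letters restricted to a-z and A-Z.
    static final Pattern ASCII =
            Pattern.compile("([a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*");

    // Short-form Unicode version: \p{L} is any letter, \p{N} is any number.
    static final Pattern UNICODE =
            Pattern.compile("([\\p{L}_$][\\p{L}\\p{N}_$]*\\.)*[\\p{L}_$][\\p{L}\\p{N}_$]*");

    public static void main(String[] args) {
        // The string org.cafe.Menu with accented e and u, written as escapes
        String name = "org.caf\u00e9.Men\u00fc";
        System.out.println(ASCII.matcher(name).matches());   // false
        System.out.println(UNICODE.matcher(name).matches()); // true
    }
}
```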

The Java Language Specification (section 3.8) has all the details about valid identifier names.
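As a rough cross-check: JLS 3.8 defines Java letters and letter-or-digits through Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int), and those sets differ from \p{L}/\p{N} at the edges. The sketch below (hypothetical class JlsVsRegex) shows two such cases: a currency symbol such as the euro sign is a valid identifier start for Java but is not a letter, while a superscript digit is in \p{N} but is not a Java identifier part.

```java
import java.util.regex.Pattern;

public class JlsVsRegex {

    // Short-form Unicode regex from the answer, for a single identifier.
    static final Pattern UNICODE_ID = Pattern.compile("[\\p{L}_$][\\p{L}\\p{N}_$]*");

    // Identifier check built on the Character methods that JLS 3.8 refers to.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        for (int i = Character.charCount(s.codePointAt(0)); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String euro = "\u20acPrice";    // starts with the euro sign (a currency symbol)
        String superscript = "x\u00b2"; // ends with SUPERSCRIPT TWO (category No)
        System.out.println(UNICODE_ID.matcher(euro).matches());        // false
        System.out.println(isJavaIdentifier(euro));                     // true
        System.out.println(UNICODE_ID.matcher(superscript).matches()); // true
        System.out.println(isJavaIdentifier(superscript));              // false
    }
}
```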

Also see the answers to this question: Java Unicode variable names

#2


7  

Here is a fully working class with tests, based on the excellent comment from @alan-moore

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.regex.Pattern;

import org.junit.Test;

public class ValidateJavaIdentifier {

    private static final String ID_PATTERN = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";
    private static final Pattern FQCN = Pattern.compile(ID_PATTERN + "(\\." + ID_PATTERN + ")*");

    public static boolean validateJavaIdentifier(String identifier) {
        return FQCN.matcher(identifier).matches();
    }


    @Test
    public void testJavaIdentifier() throws Exception {
        assertTrue(validateJavaIdentifier("C"));
        assertTrue(validateJavaIdentifier("Cc"));
        assertTrue(validateJavaIdentifier("b.C"));
        assertTrue(validateJavaIdentifier("b.Cc"));
        assertTrue(validateJavaIdentifier("aAa.b.Cc"));
        assertTrue(validateJavaIdentifier("a.b.Cc"));

        // after the initial character identifiers may use any combination of
        // letters and digits, underscores or dollar signs
        assertTrue(validateJavaIdentifier("a.b.C_c"));
        assertTrue(validateJavaIdentifier("a.b.C$c"));
        assertTrue(validateJavaIdentifier("a.b.C9"));

        assertFalse("cannot start with a dot", validateJavaIdentifier(".C"));
        assertFalse("cannot have two dots following each other",
                validateJavaIdentifier("b..C"));
        assertFalse("cannot start with a number ",
                validateJavaIdentifier("b.9C"));
    }
}
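The class above validates a whole string with matches(). Since the question asks about matching names inside a text, here is a hedged sketch of a find() variant. Two choices are assumptions of this sketch, not part of the answer: it requires at least one dot (so ordinary words are not reported), and it adds a negative lookbehind so a match cannot start in the middle of an identifier. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindQualifiedNames {

    // One Java identifier, as in the answer above.
    private static final String ID =
            "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";

    // Assumption: require at least one dot so that plain words are skipped,
    // and refuse to start a match right after an identifier character.
    private static final Pattern QUALIFIED = Pattern.compile(
            "(?<!\\p{javaJavaIdentifierPart})" + ID + "(\\." + ID + ")+");

    /** Returns every dotted name found in the text, in order of appearance. */
    public static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = QUALIFIED.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Examples: java.lang.Reflect, java.util.ArrayList, org.hibernate.Hibernate.";
        System.out.println(findAll(text));
        // prints [java.lang.Reflect, java.util.ArrayList, org.hibernate.Hibernate]
    }
}
```

A sentence-ending period after a name is not included in the match, because the pattern requires an identifier after every dot.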

#3


4  

The pattern originally provided by Renaud, of the form (ID\.)*ID, works. But, as far as I can tell, it always backtracks at the end.

To optimize it, you can essentially swap the first half with the last. Note that the dot match has to change as well: it moves from after the identifier to before it.

The following is my version, which runs about twice as fast as the original:

String ID_PATTERN = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";
Pattern FQCN = Pattern.compile(ID_PATTERN + "(\\." + ID_PATTERN + ")*");
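For comparison, here is a sketch of both orderings side by side (hypothetical class PatternOrder). The (ID\.)*ID form is inferred from the description above ("swap the first half with the last", moving the dot). Both forms accept exactly the same strings; only the amount of backtracking differs. The sketch checks that the results agree and does not measure speed.

```java
import java.util.regex.Pattern;

public class PatternOrder {

    static final String ID = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";

    // (ID\.)*ID: the dot closes each repetition, so after the last identifier the
    // engine first tries to read it as another "ID." and has to give it back.
    static final Pattern DOT_LAST = Pattern.compile("(" + ID + "\\.)*" + ID);

    // ID(\.ID)*: the dot opens each repetition, so a missing dot ends the loop at once.
    static final Pattern DOT_FIRST = Pattern.compile(ID + "(\\." + ID + ")*");

    public static void main(String[] args) {
        String[] inputs = {"C", "a.b.Cc", "a.b.C$c", ".C", "b..C", "b.9C", "a.b."};
        for (String s : inputs) {
            boolean dotLast = DOT_LAST.matcher(s).matches();
            boolean dotFirst = DOT_FIRST.matcher(s).matches();
            // Both forms give the same answer for every input.
            System.out.println(s + " -> " + dotLast + " / " + dotFirst);
        }
    }
}
```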

I cannot write comments, so I decided to write an answer instead.

#4


3  

I arrived (on my own) at an answer similar to Tomalak's, of the form M.M.M.N:

([a-z][a-z_0-9]*\.)*[A-Z_]($[A-Z_]|[\w_])*

where

M = ([a-z][a-z_0-9]*\.)*
N = [A-Z_]($[A-Z_]|[\w_])*

However, unlike Tomalak's answer, this regular expression makes additional assumptions:

  1. The package name (the M part) is lowercase only: each package segment starts with a lowercase letter, and the rest can mix underscores, lowercase letters, and digits.

  2. The class name (the N part) always starts with an uppercase letter or an underscore; the rest can mix underscores, letters, and digits. Inner class names always start with a dollar sign ($) and must otherwise obey the class name rules above.

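Note: the regex above uses XSD syntax. In XSD, \w does not include the underscore (_), which is why [\w_] is used, and $ is an ordinary character. In Java and other Perl-style flavors, \w already includes the underscore, and a literal dollar sign must be written as \$, because an unescaped $ is an end-of-line anchor.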
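A sketch of the same M.M.N pattern translated to Java's java.util.regex (hypothetical class ConventionalFqcn). The only change is escaping the dollar sign as \$, since in Java an unescaped $ is a boundary matcher rather than a literal; [\w_] is kept as written even though Java's \w already includes the underscore.

```java
import java.util.regex.Pattern;

public class ConventionalFqcn {

    // M: lowercase package segments, each followed by a dot.
    static final String M = "([a-z][a-z_0-9]*\\.)*";

    // N: class name starting with an uppercase letter or underscore; nested
    // parts start with an escaped dollar sign followed by the same first-char rule.
    static final String N = "[A-Z_](\\$[A-Z_]|[\\w_])*";

    static final Pattern FQCN = Pattern.compile(M + N);

    public static void main(String[] args) {
        System.out.println(FQCN.matcher("java.util.ArrayList").matches()); // true
        System.out.println(FQCN.matcher("java.util.Map$Entry").matches()); // true
        System.out.println(FQCN.matcher("java.util.arrayList").matches()); // false
        System.out.println(FQCN.matcher("Com.example.Foo").matches());     // false
    }
}
```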

Hope this helps.

#5


0  

The following expression works perfectly fine for me.

^[a-z][a-z0-9_]*(\.[a-z0-9_]+)+$
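A quick Java sketch (hypothetical class LowercaseDotted) of what this expression accepts. As written it only matches all-lowercase names with at least one dot, i.e. package-style names, so a class name such as java.util.ArrayList is rejected, and segments after the first may start with a digit.

```java
import java.util.regex.Pattern;

public class LowercaseDotted {

    // The expression from the answer, unchanged.
    static final Pattern P = Pattern.compile("^[a-z][a-z0-9_]*(\\.[a-z0-9_]+)+$");

    public static void main(String[] args) {
        System.out.println(P.matcher("org.hibernate").matches());       // true
        System.out.println(P.matcher("java.util.ArrayList").matches()); // false: uppercase letters
        System.out.println(P.matcher("java").matches());                // false: needs a dot
        System.out.println(P.matcher("com.example.9abc").matches());    // true: later segment starts with a digit
    }
}
```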

#6


0  

The following class checks whether a provided package name is valid:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ValidationUtils {

    // All Java reserved words (keywords plus the literals true, false and null)
    // that must not be used as an element of a valid package name.
    private static final Set<String> RESERVED = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "abstract", "assert", "boolean", "break", "byte", "case",
            "catch", "char", "class", "const", "continue", "default",
            "do", "double", "else", "enum", "extends", "false",
            "final", "finally", "float", "for", "if", "goto",
            "implements", "import", "instanceof", "int", "interface", "long",
            "native", "new", "null", "package", "private", "protected",
            "public", "return", "short", "static", "strictfp", "super",
            "switch", "synchronized", "this", "throw", "throws", "transient",
            "true", "try", "void", "volatile", "while")));

    /**
     * Checks if the provided string is a valid Java package name: every element is a valid
     * Java identifier (as defined by Character.isJavaIdentifierStart/isJavaIdentifierPart),
     * elements are separated by a single '.', and no element is one of Java's reserved words.
     *
     * @param name The package name that needs to be validated.
     * @return <b>true</b> if the package name is valid, <b>false</b> if it is not valid.
     */
    public static boolean isValidPackageName(String name) {
        if (name == null) {
            return false;
        }
        // A limit of -1 keeps trailing empty strings, so a name ending in '.' is rejected.
        String[] parts = name.split("\\.", -1);
        for (String part : parts) {
            if (RESERVED.contains(part) || !validPart(part)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Checks that a part (a word between dots) is a valid part to be used in a Java package name.
     * @param part The part between dots (e.g. *PART*.*PART*.*PART*.*PART*).
     * @return <b>true</b> if the part is valid, <b>false</b> if it is not valid.
     */
    private static boolean validPart(String part) {
        if (part == null || part.isEmpty()) {
            // Package part is null or empty
            return false;
        }
        if (!Character.isJavaIdentifierStart(part.charAt(0))) {
            // Package part does not begin with a valid Java identifier character
            return false;
        }
        for (int i = 1; i < part.length(); i++) {
            if (!Character.isJavaIdentifierPart(part.charAt(i))) {
                // Package part contains a character not allowed in a Java identifier
                return false;
            }
        }
        return true;
    }
}
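Not part of the answer above, but worth knowing: the JDK itself ships a similar check. javax.lang.model.SourceVersion (in the java.compiler module) has static isName, isIdentifier and isKeyword methods; isName returns true when every dot-separated part is an identifier and none is a keyword or a boolean/null literal in the latest source version known to the running JDK. A minimal sketch, with a hypothetical class name:

```java
import javax.lang.model.SourceVersion;

public class JdkNameCheck {

    public static void main(String[] args) {
        // isName: every dot-separated part is an identifier and none is a keyword
        // or literal, according to the latest source version of the running JDK.
        System.out.println(SourceVersion.isName("java.util.ArrayList")); // true
        System.out.println(SourceVersion.isName("com.example.class"));   // false
        System.out.println(SourceVersion.isName("com..example"));        // false
        System.out.println(SourceVersion.isName("com.1abc"));            // false
        // isKeyword checks a single word, including the literals true, false and null.
        System.out.println(SourceVersion.isKeyword("null"));             // true
    }
}
```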

#7


0  

A shorter version of a working regexp:

\p{Alnum}[\p{Alnum}._]+\p{Alnum}
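A Java sketch (hypothetical class AlnumCheck) showing how loose this expression is. In java.util.regex, \p{Alnum} is the ASCII class [a-zA-Z0-9], so the pattern accepts consecutive dots and leading digits, rejects $ and non-ASCII letters, and needs at least three characters.

```java
import java.util.regex.Pattern;

public class AlnumCheck {

    // The expression from the answer; \p{Alnum} is ASCII [a-zA-Z0-9] here.
    static final Pattern P = Pattern.compile("\\p{Alnum}[\\p{Alnum}._]+\\p{Alnum}");

    public static void main(String[] args) {
        System.out.println(P.matcher("java.util.ArrayList").matches()); // true
        System.out.println(P.matcher("a..b").matches());                // true: consecutive dots
        System.out.println(P.matcher("9abc").matches());                // true: leading digit
        System.out.println(P.matcher("java.util.Map$Entry").matches()); // false: '$' not allowed
        System.out.println(P.matcher("ab").matches());                  // false: fewer than 3 chars
    }
}
```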

#8


-3  

I'd say something like ([\w]+\.)*[\w]+

But I could be more specific if I knew what you want to do with it ;)
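A Java sketch (hypothetical class WordCheck) of the expression above. In java.util.regex, \w is [a-zA-Z_0-9], so it accepts parts that start with a digit and rejects the $ used in nested class binary names.

```java
import java.util.regex.Pattern;

public class WordCheck {

    // The expression from the answer; in Java, \w is [a-zA-Z_0-9].
    static final Pattern P = Pattern.compile("([\\w]+\\.)*[\\w]+");

    public static void main(String[] args) {
        System.out.println(P.matcher("org.hibernate.Hibernate").matches()); // true
        System.out.println(P.matcher("9lives.Cat").matches());              // true: leading digit accepted
        System.out.println(P.matcher("java.util.Map$Entry").matches());     // false: '$' rejected
    }
}
```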
