
Time: 2023-01-07 14:22:16




  • Pattern class

  • Matcher class

  • PatternSyntaxException class

    PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.


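A Pattern object is the compiled representation of a regular expression. The Pattern class has no public constructor. To create a Pattern object, you call one of its public static compile methods, which returns a Pattern object. The method takes a regular expression as its first argument.

Pattern implements the Serializable marker interface from java.io.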
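As a small illustration of the points above, the following sketch compiles a regular expression through the static compile method and reports whether its syntax is valid. The class name CompileSketch and the helper isValidRegex are hypothetical names chosen for this example; only Pattern.compile and PatternSyntaxException come from java.util.regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CompileSketch {

    // Hypothetical helper: true if the regex compiles, false if its syntax is invalid.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked exception, so catching it is optional
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d{3}")); // true
        System.out.println(isValidRegex("(abc"));   // false: unclosed group
    }
}
```

Because the exception is unchecked, code that compiles user-supplied expressions is the usual place where catching it is worthwhile.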


  • compile() method:


    /**
     * Compiles the given regular expression into a pattern.
     *
     * @param regex The expression to be compiled
     * @return the given regular expression compiled into a pattern
     * @throws PatternSyntaxException If the expression's syntax is invalid
     */
    public static Pattern compile(String regex) {
        return new Pattern(regex, 0);
    }

    public static Pattern compile(String regex, int flags) {
        return new Pattern(regex, flags);
    }

    The flags parameter selects the match mode. Its possible values, listed below, are all static final constants of the Pattern class:

     * @param  flags
     *         Match flags, a bit mask that may include
     *         {@link #CASE_INSENSITIVE}, {@link #MULTILINE}, {@link #DOTALL},
     *         {@link #UNICODE_CASE}, {@link #CANON_EQ}, {@link #UNIX_LINES},
     *         {@link #LITERAL}, {@link #UNICODE_CHARACTER_CLASS}
     *         and {@link #COMMENTS}
  • toString() method: returns the string form of the pattern, that is, the regular expression it was compiled from.

    /**
     * <p>Returns the string representation of this pattern. This
     * is the regular expression from which this pattern was
     * compiled.</p>
     *
     * @return The string representation of this pattern
     * @since 1.5
     */
    public String toString() {
        return pattern;
    }
  • matcher(): returns a Matcher object; the method takes the character sequence to be matched as its argument. Here compiled is a boolean field, initially false, that records whether the pattern has already been compiled.

    /**
     * Creates a matcher that will match the given input against this pattern.
     *
     * @param input The character sequence to be matched
     * @return A new matcher for this pattern
     */
    public Matcher matcher(CharSequence input) {
        // Compile lazily on first use (double-checked locking)
        if (!compiled) {
            synchronized(this) {
                if (!compiled)
                    compile();
            }
        }
        Matcher m = new Matcher(this, input);
        return m;
    }
  • matches(): attempts to match the entire input and returns a boolean. The matches method of the String class simply calls this method.

    public static boolean matches(String regex, CharSequence input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        return m.matches();
    }
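Since the static matches method compiles the expression on every call, a pattern that is used repeatedly is usually compiled once and reused. The sketch below compares the three routes; the class name MatchesSketch and its helper methods are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class MatchesSketch {

    // Compiled once and reused; Pattern instances are immutable
    private static final Pattern THREE_DIGITS = Pattern.compile("\\d{3}");

    static boolean viaStaticMethod(String s) {
        return Pattern.matches("\\d{3}", s);      // compiles on every call
    }

    static boolean viaString(String s) {
        return s.matches("\\d{3}");               // delegates to Pattern.matches
    }

    static boolean viaPrecompiled(String s) {
        return THREE_DIGITS.matcher(s).matches(); // reuses the compiled pattern
    }

    public static void main(String[] args) {
        System.out.println(viaStaticMethod("123"));  // true
        System.out.println(viaString("123"));        // true
        System.out.println(viaPrecompiled("1234"));  // false: whole input must match
    }
}
```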


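The Matcher class implements the MatchResult interface.

Matcher has no public constructor and no static factory method; a Matcher object is obtained by calling the matcher method of a Pattern object, for example:

Pattern pattern = Pattern.compile("regExp");
Matcher matcher = pattern.matcher("string");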



The Matcher constructors are package-private rather than public, so there is no public default constructor and a Matcher has to be obtained through Pattern's matcher() method. The second constructor shows that, once the state storage has been allocated, the object is put into its initial state by a call to reset().

/** * No default constructor. */
Matcher() {
}

/** * All matchers have the state used by Pattern during a match. */
Matcher(Pattern parent, CharSequence text) {
    this.parentPattern = parent;
    this.text = text;

    // Allocate state storage
    int parentGroupCount = Math.max(parent.capturingGroupCount, 10);
    groups = new int[parentGroupCount * 2];
    locals = new int[parent.localCount];

    // Put fields into initial states
    reset();
}
reset ( )


/**
 * Resets this matcher.
 * <p> Resetting a matcher discards all of its explicit state information
 * and sets its append position to zero. The matcher's region is set to the
 * default region, which is its entire character sequence. The anchoring
 * and transparency of this matcher's region boundaries are unaffected.
 * @return  This matcher
 */
public Matcher reset() {
    first = -1;
    last = 0;
    oldLast = -1;
    for(int i=0; i<groups.length; i++)
        groups[i] = -1;
    for(int i=0; i<locals.length; i++)
        locals[i] = -1;
    lastAppendPosition = 0;
    from = 0;
    to = getTextLength();
    return this;
}

/**
 * Resets this matcher with a new input sequence.
 * <p> Resetting a matcher discards all of its explicit state information
 * and sets its append position to zero.  The matcher's region is set to
 * the default region, which is its entire character sequence.  The
 * anchoring and transparency of this matcher's region boundaries are
 * unaffected.
 * @param  input
 *         The new input character sequence
 * @return  This matcher
 */
public Matcher reset(CharSequence input) {
    text = input;
    return reset();
}
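One practical use of reset(CharSequence) is to reuse a single Matcher for several inputs instead of creating a new one each time. The following is a minimal sketch of that idea; the class name ResetSketch and the helper countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetSketch {

    // Hypothetical helper: points the matcher at a new input and counts the matches in it.
    static int countMatches(Matcher m, CharSequence input) {
        m.reset(input);          // discards previous state and swaps in the new text
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d+").matcher("");
        System.out.println(countMatches(m, "1 22 333"));  // 3
        System.out.println(countMatches(m, "no digits")); // 0
        System.out.println(countMatches(m, "7 and 8"));   // 2
    }
}
```

A Matcher is not safe for use by multiple threads, so this kind of reuse fits single-threaded loops.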



  • boolean matches()

  • boolean lookingAt()

  • boolean find()

  • boolean find(int start)


find ( )



    /**
     * Attempts to find the next subsequence of the input sequence that matches
     * the pattern.
     * <p> This method starts at the beginning of this matcher's region, or, if
     * a previous invocation of the method was successful and the matcher has
     * not since been reset, at the first character not matched by the previous
     * match.
     * <p> If the match succeeds then more information can be obtained via the
     * <tt>start</tt>, <tt>end</tt>, and <tt>group</tt> methods.  </p>
     * @return  <tt>true</tt> if, and only if, a subsequence of the input
     *          sequence matches this matcher's pattern
     */
    public boolean find() {
        int nextSearchIndex = last;
        if (nextSearchIndex == first)
            nextSearchIndex++;

        // If next search starts before region, start it at region
        if (nextSearchIndex < from)
            nextSearchIndex = from;

        // If next search starts beyond region then it fails
        if (nextSearchIndex > to) {
            for (int i = 0; i < groups.length; i++)
                groups[i] = -1;
            return false;
        }
        return search(nextSearchIndex);
    }

    /**
     * Resets this matcher and then attempts to find the next subsequence of
     * the input sequence that matches the pattern, starting at the specified
     * index.
     * <p> If the match succeeds then more information can be obtained via the
     * <tt>start</tt>, <tt>end</tt>, and <tt>group</tt> methods, and subsequent
     * invocations of the {@link #find()} method will start at the first
     * character not matched by this match.  </p>
     * @param start the index to start searching for a match
     * @throws  IndexOutOfBoundsException
     *          If start is less than zero or if start is greater than the
     *          length of the input sequence.
     * @return  <tt>true</tt> if, and only if, a subsequence of the input
     *          sequence starting at the given index matches this matcher's
     *          pattern
     */
    public boolean find(int start) {
        int limit = getTextLength();
        if ((start < 0) || (start > limit))
            throw new IndexOutOfBoundsException("Illegal start index");
        reset();
        return search(start);
    }
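To see how find(int start) and find() cooperate, the sketch below starts the search at a given index and then lets plain find() continue from the end of each match. The class name FindFromSketch and the helper findAllFrom are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindFromSketch {

    // Hypothetical helper: collects every match found at or after index start.
    static List<String> findAllFrom(String regex, String input, int start) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find(start)) {           // resets the matcher, then searches from start
            do {
                out.add(m.group());
            } while (m.find());        // continues after the previous match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAllFrom("\\d{3}", "123 456 789", 0)); // [123, 456, 789]
        System.out.println(findAllFrom("\\d{3}", "123 456 789", 1)); // [456, 789]
    }
}
```

Starting at index 1 skips the first number because only "23" remains of it, which is too short for \d{3}.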


package com.general;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {

        Pattern pattern = Pattern.compile("\\d{3}");

        String s="123 456 789 4d56";

        Matcher matcher = pattern.matcher(s);
        int count=0;

        // Each successful find() moves on to the next three-digit run
        while(matcher.find()) {
             count++;
             System.out.println("Match number: "+count);
             System.out.print("start(): "+matcher.start());
             System.out.println(", end(): "+matcher.end());
             System.out.println(matcher.group());
        }
    }
}

Match number: 1
start(): 0, end(): 3
123
Match number: 2
start(): 4, end(): 7
456
Match number: 3
start(): 8, end(): 11
789

group ( )

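A group is a part of the regular expression enclosed in parentheses that later parts of the expression can refer to. Group 0 is the entire expression, group 1 is the first parenthesized group (counting opening parentheses from left to right), and so on. So the expression

A((BC)(D))E

contains four groups: group 0 is ABCDE, group 1 is BCD, group 2 is BC, and group 3 is D.


public int groupCount() returns the number of capturing groups in the matcher's pattern. Group 0 is not included in the count.

public String group() returns group 0 (the whole match) of the previous match operation (for example find()).

public String group(int i) returns the given group of the previous match operation. If the match succeeded but that group did not take part in it, null is returned.

public int start(int group) returns the start index of the given group in the previous match (the argument is the group index).

public int end(int group) returns the end index of the given group in the previous match, that is, the index of its last character plus one (the argument is the group index).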
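The group methods above can be sketched as follows. The expression A((BC)(D))E is one nesting that yields exactly the four groups listed earlier, and (a)|(b) shows a group that returns null because it did not take part in the match. The class name GroupSketch and the helper groupsOf are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSketch {

    // Hypothetical helper: returns groups 0..groupCount() of the first match, or null if none.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1]; // +1 for group 0
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints [ABCDE, BCD, BC, D]
        System.out.println(Arrays.toString(groupsOf("A((BC)(D))E", "ABCDE")));
        // Prints [b, null, b]: group 1 did not participate in the match
        System.out.println(Arrays.toString(groupsOf("(a)|(b)", "b")));
    }
}
```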


package com.general;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    // Note: "/n" below is a literal slash followed by 'n', not a line break,
    // so \S+ treats e.g. "toves/nDid" as a single token (see the output).
    public static final String poem =
            "Twas brillig, and the slithy toves/n" +
            "Did gyre and gimble in the wabe./n" +
            "All mimsy were the borogoves,/n" +
            "And the mome raths outgrabe./n/n" +
            "Beware the Jabberwock, my son,/n" +
            "The jaws that bite, the claws that catch./n" +
            "Beware the Jubjub bird, and shun/n" +
            "The frumious Bandersnatch.";

    public static void main(String[] args) {
        // Three whitespace-separated tokens; groups 2 to 4 cover the last two
        Matcher m = Pattern.compile("(\\S+)\\s+((\\S+)\\s+(\\S+))").matcher(poem);
        while(m.find()) {
            for (int j = 0; j <= m.groupCount(); j++) {
                System.out.println("group "+j+" [" + m.group(j) + "]");

                if(j==1) {
                    System.out.println("group1' start: "+m.start(1)+" , group1' end: "+m.end(1));
                }
            }
            System.out.println();
        }
    }
}


group 0 [Twas brillig, and]
group 1 [Twas]
group1' start: 0 , group1' end: 4
group 2 [brillig, and]
group 3 [brillig,]
group 4 [and]

group 0 [the slithy toves/nDid]
group 1 [the]
group1' start: 18 , group1' end: 21
group 2 [slithy toves/nDid]
group 3 [slithy]
group 4 [toves/nDid]

group 0 [gyre and gimble]
group 1 [gyre]
group1' start: 40 , group1' end: 44
group 2 [and gimble]
group 3 [and]
group 4 [gimble]

group 0 [in the wabe./nAll]
group 1 [in]
group1' start: 56 , group1' end: 58
group 2 [the wabe./nAll]
group 3 [the]
group 4 [wabe./nAll]

group 0 [mimsy were the]
group 1 [mimsy]
group1' start: 74 , group1' end: 79
group 2 [were the]
group 3 [were]
group 4 [the]

group 0 [borogoves,/nAnd the mome]
group 1 [borogoves,/nAnd]
group1' start: 89 , group1' end: 104
group 2 [the mome]
group 3 [the]
group 4 [mome]

group 0 [raths outgrabe./n/nBeware the]
group 1 [raths]
group1' start: 114 , group1' end: 119
group 2 [outgrabe./n/nBeware the]
group 3 [outgrabe./n/nBeware]
group 4 [the]

group 0 [Jabberwock, my son,/nThe]
group 1 [Jabberwock,]
group1' start: 144 , group1' end: 155
group 2 [my son,/nThe]
group 3 [my]
group 4 [son,/nThe]

group 0 [jaws that bite,]
group 1 [jaws]
group1' start: 169 , group1' end: 173
group 2 [that bite,]
group 3 [that]
group 4 [bite,]

group 0 [the claws that]
group 1 [the]
group1' start: 185 , group1' end: 188
group 2 [claws that]
group 3 [claws]
group 4 [that]

group 0 [catch./nBeware the Jubjub]
group 1 [catch./nBeware]
group1' start: 200 , group1' end: 214
group 2 [the Jubjub]
group 3 [the]
group 4 [Jubjub]

group 0 [bird, and shun/nThe]
group 1 [bird,]
group1' start: 226 , group1' end: 231
group 2 [and shun/nThe]
group 3 [and]
group 4 [shun/nThe]

start( ) and end( )

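If the match succeeded, start() returns the index where the match begins and end() returns the index where it ends, that is, the index of the last matched character plus one. If the previous match attempt failed (or no match has been attempted yet), calling either start() or end() throws an IllegalStateException. The following program also demonstrates matches() and lookingAt():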

package com.general;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {

         String[] input = new String[] {
                 "Java has regular expressions in 1.4",
                 "regular expressions now expressing in Java",
                 "Java represses oracular expressions"
         };
         Pattern p1 = Pattern.compile("re\\w*");
         Pattern p2 = Pattern.compile("Java.*");
         for(int i = 0; i < input.length; i++) {
             System.out.println("input[" + i + "]: " + input[i]);

             Matcher m1 = p1.matcher(input[i]);
             Matcher m2 = p2.matcher(input[i]);

             // find() locates every occurrence anywhere in the input
             while(m1.find())
                 System.out.println("m1.find() '" + m1.group() + "' start = "+ m1.start() + " end = " + m1.end());

             while(m2.find())
                 System.out.println("m2.find() '" + m2.group() + "' start = "+ m2.start() + " end = " + m2.end());

             // lookingAt() succeeds only if the input starts with a match
             if(m1.lookingAt())
                 System.out.println("m1.lookingAt() start = " + m1.start() + " end = " + m1.end());
             if(m2.lookingAt())
                 System.out.println("m2.lookingAt() start = " + m2.start() + " end = " + m2.end());

             // matches() succeeds only if the entire input matches
             if(m1.matches())
                 System.out.println("m1.matches() start = " + m1.start() + " end = " + m1.end());
             if(m2.matches())
                 System.out.println("m2.matches() start = " + m2.start() + " end = " + m2.end());
         }
    }
}



input[0]: Java has regular expressions in 1.4
m1.find() 'regular' start = 9 end = 16
m1.find() 'ressions' start = 20 end = 28
m2.find() 'Java has regular expressions in 1.4' start = 0 end = 35
m2.lookingAt() start = 0 end = 35
m2.matches() start = 0 end = 35
input[1]: regular expressions now expressing in Java
m1.find() 'regular' start = 0 end = 7
m1.find() 'ressions' start = 11 end = 19
m1.find() 'ressing' start = 27 end = 34
m2.find() 'Java' start = 38 end = 42
m1.lookingAt() start = 0 end = 7
input[2]: Java represses oracular expressions
m1.find() 'represses' start = 5 end = 14
m1.find() 'ressions' start = 27 end = 35
m2.find() 'Java represses oracular expressions' start = 0 end = 35
m2.lookingAt() start = 0 end = 35
m2.matches() start = 0 end = 35
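The IllegalStateException case mentioned above does not show up in that output, so here is a separate minimal sketch of it. The class name MatchStateSketch and its helpers are hypothetical; the only library behavior relied on is that start() throws IllegalStateException when no successful match is available.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchStateSketch {

    // True if start() throws before any match has been attempted.
    static boolean throwsBeforeAnyMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return startThrows(m);
    }

    // True if start() throws after a find() that returned false.
    static boolean throwsAfterFailedFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            return false; // the match succeeded, so start() is legal here
        }
        return startThrows(m);
    }

    private static boolean startThrows(Matcher m) {
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsBeforeAnyMatch("re\\w*", "regular"));  // true
        System.out.println(throwsAfterFailedFind("re\\w*", "Java"));    // true
        System.out.println(throwsAfterFailedFind("re\\w*", "regular")); // false
    }
}
```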

split( )


String[] split(CharSequence charseq)
String[] split(CharSequence charseq, int limit)



package com.general;

import java.util.Arrays;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String input = "This!!unusual use!!of exclamation!!points";

        /*Split at every occurrence of "!!"*/
        System.out.println(Arrays.asList(Pattern.compile("!!").split(input)));

        /*Only do the first three*/
        System.out.println(Arrays.asList(Pattern.compile("!!").split(input, 3)));

        /*String's function split*/
        System.out.println(Arrays.asList("Aha! String has a split() built in!".split(" ")));
    }
}


[This, unusual use, of exclamation, points]
[This, unusual use, of exclamation!!points]
[Aha!, String, has, a, split(), built, in!]
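The effect of the limit argument is easier to see on an input with empty fields. The sketch below, with the hypothetical class SplitSketch and helper splitToString, shows the three cases: a limit of zero drops trailing empty strings, a negative limit keeps them, and a positive limit n produces at most n pieces.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitSketch {

    private static final Pattern COMMA = Pattern.compile(",");

    // Hypothetical helper: splits on commas with the given limit and renders the array.
    static String splitToString(String input, int limit) {
        return Arrays.toString(COMMA.split(input, limit));
    }

    public static void main(String[] args) {
        String input = "a,b,,c,,";
        System.out.println(splitToString(input, 0));  // [a, b, , c]
        System.out.println(splitToString(input, -1)); // [a, b, , c, , ]
        System.out.println(splitToString(input, 2));  // [a, b,,c,,]
    }
}
```

Calling split(input) without a limit behaves like a limit of zero.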


compile() has a second overload that takes an additional argument controlling how the regular expression is matched.

    /**
     * @param  flags
     *         Match flags, a bit mask that may include
     *         {@link #CASE_INSENSITIVE}, {@link #MULTILINE}, {@link #DOTALL},
     *         {@link #UNICODE_CASE}, {@link #CANON_EQ}, {@link #UNIX_LINES},
     *         {@link #LITERAL}, {@link #UNICODE_CHARACTER_CLASS}
     *         and {@link #COMMENTS}
     */
    public static Pattern compile(String regex, int flags) {
        return new Pattern(regex, flags);
    }



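Compile flag    Description
CANON_EQ    Two characters are considered to match if, and only if, their full canonical decompositions match. With this flag, for example, the expression "a\u030A" matches the string "\u00E5" (å). By default, canonical equivalence is not taken into account.
CASE_INSENSITIVE    Enables case-insensitive matching. By default, case-insensitive matching applies only to characters in the US-ASCII character set. To match Unicode characters case-insensitively, combine this flag with UNICODE_CASE.
COMMENTS    In this mode, whitespace in the regular expression is ignored (this means literal spaces, tabs, and line breaks in the expression, not the "\s" construct). Comments start with # and run to the end of the line. Comments mode can also be enabled with the embedded flag expression (?x).
DOTALL    In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
MULTILINE    In this mode, '^' and '$' match at the beginning and end of a line, respectively. In addition, '^' still matches at the beginning of the input and '$' at its end. By default, these expressions match only at the beginning and end of the entire input.
UNICODE_CASE    In this mode, if CASE_INSENSITIVE is also enabled, case-insensitive matching is applied to Unicode characters. By default, case-insensitive matching applies only to characters in the US-ASCII character set.
UNIX_LINES    In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.

These flags can be combined with the bitwise OR operator ('|').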
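To make the effect of combining flags concrete, the sketch below counts how often "^java" matches a three-line input under different flag settings, including the embedded form (?im), which is the in-pattern equivalent of CASE_INSENSITIVE | MULTILINE. The class name FlagSketch and the helper countMatches are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagSketch {

    // Hypothetical helper: number of matches of regex in text under the given flags.
    static int countMatches(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "java\nJava\nJAVA";
        int both = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;

        System.out.println(countMatches("^java", 0, text));                        // 1
        System.out.println(countMatches("^java", Pattern.CASE_INSENSITIVE, text)); // 1
        System.out.println(countMatches("^java", Pattern.MULTILINE, text));        // 1
        System.out.println(countMatches("^java", both, text));                     // 3
        System.out.println(countMatches("(?im)^java", 0, text));                   // 3
    }
}
```

Neither flag alone is enough here: without MULTILINE the anchor only matches at the start of the input, and without CASE_INSENSITIVE only the lowercase line matches.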



import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternFlags {

    public static void main(String[] args) {

        // Match "java" in any case at the start of every line
        Pattern p = Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher("java has regex\nJava has regex\n" + "JAVA has pretty good regular expressions\n"
                + "Regular expressions are in Java");
        while (m.find())
            System.out.println(m.group());
    }
}






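  • replaceAll(String replacement): replaces every substring of the input that matches the pattern with the given replacement string.

  • replaceFirst(String replacement): replaces the first substring of the input that matches the pattern with the given replacement string.

  • appendReplacement(StringBuffer sb, String replacement): replaces the current match with the given string, and appends to the StringBuffer the text between the end of the previous match and the start of the current one, followed by the replacement.

  • appendTail(StringBuffer sb): appends to the StringBuffer whatever remains of the input after the last match.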
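The four methods above can be sketched as follows. The class name ReplaceSketch, the helper names, and the sample strings are assumptions made for this illustration. The appendReplacement loop computes a different replacement for each match, which is the main reason to use it instead of replaceAll; Matcher.quoteReplacement guards against '$' and '\' being interpreted in the replacement text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceSketch {

    static String replaceAllCats(String s) {
        return Pattern.compile("cat").matcher(s).replaceAll("dog");
    }

    static String replaceFirstCat(String s) {
        return Pattern.compile("cat").matcher(s).replaceFirst("dog");
    }

    // Doubles every integer in the input using appendReplacement/appendTail.
    static String doubleNumbers(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text since the previous match, then the replacement
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        m.appendTail(sb); // appends the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllCats("cat cat cat"));       // dog dog dog
        System.out.println(replaceFirstCat("cat cat cat"));      // dog cat cat
        System.out.println(doubleNumbers("3 cats and 10 dogs")); // 6 cats and 20 dogs
    }
}
```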



import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class ReplaceTest {

    /* Test of the replaceAll method */
    public static void replaceAllTest() {
        Pattern pattern = Pattern.compile("cat");
        Matcher matcher = pattern.matcher("cat cat cat cat");
        boolean flag=matcher.find();
        if(flag) {
            // "dog" is a replacement string chosen for illustration
            System.out.println(matcher.replaceAll("dog"));
        }
    }

    /* Test of the replaceFirst method */
    public static void replaceFirstTest() {
        Pattern pattern = Pattern.compile("cat");
        Matcher matcher = pattern.matcher("cat cat cat cat");
        boolean flag=matcher.find();
        if(flag) {
            // "dog" is a replacement string chosen for illustration
            System.out.println(matcher.replaceFirst("dog"));
        }
    }

    /* Test of appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) */
    public static void appendTest() {
        Pattern pattern = Pattern.compile("Kelvin");
        Matcher matcher = pattern.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company");
        StringBuffer sb = new StringBuffer();
        int i=0;
        boolean result = matcher.find();
        while(result) {
            i++;
            matcher.appendReplacement(sb, "Sakila");
            System.out.println("Content of sb after match " + i + ": " + sb);
            result = matcher.find();
        }
        // Copy the text that follows the last match
        matcher.appendTail(sb);
        System.out.println("Final content of sb after calling matcher.appendTail(sb): "+ sb.toString());
    }

    public static void main(String[] args) {
        replaceAllTest();
        replaceFirstTest();
        appendTest();
    }
}
