Contents

1. Quantifiers
2. Under the hood of String's matches method
3. Other methods of the Matcher class
4. Greedy, reluctant, possessive

1. Quantifiers

A quantifier specifies how many times the element it follows (a character, character class, or group) may occur.

Greedy	Reluctant	Possessive	Meaning
X{n}	X{n}?	X{n}+	X occurs exactly n times
X{n,m}	X{n,m}?	X{n,m}+	X occurs at least n and at most m times
X{n,}	X{n,}?	X{n,}+	X occurs at least n times
X?	X??	X?+	X occurs 0 or 1 times
X*	X*?	X*+	X occurs 0 or more times
X+	X+?	X++	X occurs 1 or more times
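
As a quick check of the counts in the table, here is a minimal sketch that tests a few greedy quantifiers against whole strings. The class name QuantifierDemo and the helper fullMatch are illustrative names, not part of the original article; the helper simply wraps Pattern.matches.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns true only if the WHOLE input matches the regex.
    // (fullMatch is an illustrative helper name.)
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a{3}", "aaa"));   // true: exactly 3
        System.out.println(fullMatch("a{3}", "aaaa"));  // false: 4 is not exactly 3
        System.out.println(fullMatch("a{2,4}", "aaa")); // true: between 2 and 4
        System.out.println(fullMatch("a{2,}", "a"));    // false: fewer than 2
        System.out.println(fullMatch("a?", ""));        // true: 0 occurrences allowed
        System.out.println(fullMatch("a*", "aaaaa"));   // true: 0 or more
        System.out.println(fullMatch("a+", ""));        // false: at least 1 required
    }
}
```

When the whole string is tested like this, the greedy, reluctant, and possessive forms of a single quantifier accept the same strings; the three modes only start to differ once more pattern follows the quantifier or when searching for substrings.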

Greedy, reluctant, and possessive quantifiers behave very differently; they are covered in section 4.

2. Under the hood of String's matches method

String's matches method is implemented on top of the Pattern and Matcher classes.

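[Image: 39. Understanding quantifiers, greedy, reluctant, and possessive matching in Java regular expressions, and the internals of String's matches method (well worth studying, in the author's opinion)]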

public class TestDemo {
    public static void main(String[] args) {
        String input1 = "111庆222";
        String input2 = "111庆";
        String r = "1{3}庆";

        // Matcher's matches method matches against the ENTIRE string:
        // "match the entire region against the pattern."

        // input1.matches(r): checks whether input1 matches the regex r in full
        System.out.println(input1.matches(r)); // false
        
        System.out.println(input2.matches(r)); // true
    }
}
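
To make the delegation explicit, the sketch below compares three routes that should always agree: String's matches, the static Pattern.matches, and a hand-written compile / matcher / matches sequence. The class name, the helper matchesByHand, and the ASCII inputs (stand-ins for the article's strings) are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesEquivalenceDemo {
    // Spells out the steps behind String.matches (illustrative helper name).
    static boolean matchesByHand(String input, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        return matcher.matches(); // true only if the entire input matches
    }

    public static void main(String[] args) {
        String r = "1{3}a";
        for (String input : new String[] {"111a222", "111a"}) {
            boolean viaString = input.matches(r);
            boolean viaPattern = Pattern.matches(r, input);
            boolean viaMatcher = matchesByHand(input, r);
            System.out.println(input + " -> " + viaString + ", " + viaPattern + ", " + viaMatcher);
        }
        // 111a222 -> false, false, false
        // 111a -> true, true, true
    }
}
```

Because String's matches compiles the regex on every call, code that applies the same regex many times is usually better off compiling a Pattern once and reusing it.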

3. Other methods of the Matcher class

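The previous section covered Matcher's matches method (it returns true only if the entire input string matches the regular expression), which is what String's matches method delegates to.

The code below leads into the other methods of the Matcher class: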

public class TestDemo {
    public static void main(String[] args) {
        String input0 = "520";
        String input1 = "888";
        String input2 = "111_222_666";

        String r = "\\d{3}";
        System.out.println(input0.matches(r)); // true
        System.out.println(input1.matches(r)); // true

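        // (1) String's matches delegates to Matcher's matches, which tests the entire string
        // (2) The whole string must match the regex pattern
        // (3) input2 contains groups of three digits, but it is more than just three digits, so it cannot match the regex in full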
        System.out.println(input2.matches(r)); // false
    }
}

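input2 in the code above does not match the regex r in full (this is the result of the Matcher matches call behind String's matches).
matches tests the entire string: the whole string must match the regular expression.
Matcher's find method instead looks for a subsequence of the string that matches the regex and returns true if it finds one.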


(1) find, start, end, group

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) {
        String input = "111_222_666";

        String r = "\\d{3}";

        System.out.println(input.matches(r)); // false

        // If the regex r has a syntax error, Pattern.compile throws PatternSyntaxException
        Pattern pattern = Pattern.compile(r);
        Matcher matcher = pattern.matcher(input);
        // Calling matches here returns the same result as String's matches method
        System.out.println(matcher.matches()); // false

        // Returns true if some subsequence of input matches the regex r
        System.out.println(matcher.find()); // true
    }
}

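find: returns true if a subsequence matching the regex can be found in the input (the given string)
After a successful match, the start, end, and group methods provide more information
Each search skips the range already covered by previous searches

start: returns the start index of the previous successful match

end: returns the end index of the previous successful match (the offset just past its last character)

group: returns the subsequence matched by the previous successful match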

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) {
        String input = "111_222_666";
        String r = "\\d{3}";

        Pattern pattern = Pattern.compile(r);
        Matcher matcher = pattern.matcher(input);

        boolean findResult = matcher.find();
        System.out.println(findResult); // true

        // group, start, and end may only be called after a successful match
        if (findResult) {
            System.out.println("Matched subsequence: " + matcher.group()); // 111
            System.out.println("Start index: " + matcher.start());         // 0
            System.out.println("End index: " + matcher.end());             // 3
        }
    }
}
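
The relationship between start, end, and group can be sketched as follows: since end is exclusive, input.substring(start, end) yields the same text as group. The class name MatchPositionDemo and the helper describe are illustrative names introduced for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositionDemo {
    // Describes every match as "text[start,end)" (illustrative helper name).
    static List<String> describe(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // end() is exclusive, so this substring is the same text as m.group()
            String text = input.substring(m.start(), m.end());
            out.add(text + "[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("111_222_666", "\\d{3}"));
        // [111[0,3), 222[4,7), 666[8,11)]
    }
}
```

Calling start, end, or group before any match has been attempted, or after a failed one, throws IllegalStateException, which is why the calls sit inside the find loop.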

(2) Details of find

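find starts matching at the beginning of the given input (its first character)
However, if the previous call to find succeeded and the matcher has not been reset since, it starts at the first character not matched by the previous match

"The matcher has not been reset" means that no resetting call such as reset() or reset(CharSequence) has been made since the matcher was created from the input (find(int) and region also reset it)
Java strings are immutable, so the input itself cannot change; the matcher goes back to the start of the input only when it is reset

Each call to find performs one search through the input (front to back)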

Extracting every subsequence that matches the regular expression:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) {
        String input = "520_222_666";
        String r = "\\d{3}";

        Pattern pattern = Pattern.compile(r);
        Matcher matcher = pattern.matcher(input);

        // The loop ends when no further subsequence matching the regex can be found
        while (matcher.find()) {
            System.out.println(matcher.group());
        }

        /*
            520
            222
            666
         */
    }
}
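
To illustrate what resetting means for find, the following sketch calls find twice, resets the matcher, and searches again. After reset() the search starts over from the first character; reset(CharSequence) additionally swaps in a new input. The class name FindResetDemo, the helper run, and the replacement input "999" are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindResetDemo {
    // Records what find() returns before and after resetting (illustrative helper).
    static List<String> run() {
        Matcher m = Pattern.compile("\\d{3}").matcher("520_222_666");
        List<String> seen = new ArrayList<>();

        if (m.find()) seen.add(m.group()); // first match
        if (m.find()) seen.add(m.group()); // continues after the first match

        m.reset();                          // back to the start of the same input
        if (m.find()) seen.add(m.group()); // first match again

        m.reset("999");                     // same pattern, new input
        if (m.find()) seen.add(m.group());

        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [520, 222, 520, 999]
    }
}
```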

(3) Wrapping it up: finding the substrings of a string that match a regex

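import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    /**
     * Returns every substring of the given string that matches the regular expression.
     *
     * @param input the string to search
     * @param regex the regular expression
     * @param flags match flags passed to Pattern.compile (e.g. Pattern.CASE_INSENSITIVE), or 0 for none
     * @return a List of the matching substrings, or null if input or regex is null
     */
    public static List<String> okMatchRegexSubstrList(String input, String regex, int flags) {
        if (input == null || regex == null) return null;

        List<String> subStrings = new ArrayList<>();

        Pattern p = Pattern.compile(regex, flags);
        Matcher m = p.matcher(input);

        // Each successful find() yields the next matching substring
        while (m.find()) {
            subStrings.add(m.group());
        }

        return subStrings;
    }
}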
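
The flags parameter of the helper is passed straight to Pattern.compile. As a small usage sketch, the same search is run below with no flags and with Pattern.CASE_INSENSITIVE; the class name FlagsDemo and the sample input are assumptions, and the helper body is a copy of the one above so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // Copy of the article's helper, repeated here to keep the sketch self-contained.
    public static List<String> okMatchRegexSubstrList(String input, String regex, int flags) {
        if (input == null || regex == null) return null;
        List<String> subStrings = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            subStrings.add(m.group());
        }
        return subStrings;
    }

    public static void main(String[] args) {
        String input = "Good good GOOD";
        // No flags: matching is case-sensitive
        System.out.println(okMatchRegexSubstrList(input, "good", 0));
        // [good]
        // CASE_INSENSITIVE: all three spellings match
        System.out.println(okMatchRegexSubstrList(input, "good", Pattern.CASE_INSENSITIVE));
        // [Good, good, GOOD]
    }
}
```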


4. Greedy, reluctant, possessive

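Greedy:
① The quantified part first "swallows" as much of the input as it can (for .* that is the entire input) and the match is attempted
② If the match fails, it gives back the last character and the match is attempted again
③ This repeats until the match succeeds or there is nothing left to give back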

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) {
        String regex = ".*good";
        String input = "庆の6good8浩のgoodMorning";
        List<String> ret = okMatchRegexSubstrList(input, regex, 0);
        // Greedy: [庆の6good8浩のgood]
        printList("Greedy", ret, ", ");
    }

    /**
     * Prints a List of strings as "desc: [a, b, c]".
     */
    public static void printList(String desc, List<String> list, String divider) {
        if (list == null || list.isEmpty()) {
            System.out.println("Empty list");
            return;
        }

        // String.join puts the divider between elements only, so nothing is printed twice
        System.out.println(desc + ": [" + String.join(divider, list) + "]");
    }

    public static List<String> okMatchRegexSubstrList(String input, String regex, int flags) {
        if (input == null || regex == null) return null;

        List<String> subStrings = new ArrayList<>();

        Pattern p = Pattern.compile(regex, flags);
        Matcher m = p.matcher(input);

        while (m.find()) {
            subStrings.add(m.group());
        }

        return subStrings;
    }
}

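Reluctant:
① The quantified part first "swallows" as little of the input as it can and the match is attempted
② If the match fails, it swallows one more character and the match is attempted again
③ This repeats until the match succeeds or the input runs out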

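public class TestDemo {
    public static void main(String[] args) {
        String regex = ".*?good";
        String input = "庆の6good8浩のgoodMorning";
        List<String> ret = okMatchRegexSubstrList(input, regex, 0);
        // Each find resumes after the previous match, so the second match starts at "8"
        // Reluctant: [庆の6good, 8浩のgood]
        printList("Reluctant", ret, ", ");
    }

    // printList and okMatchRegexSubstrList are the same as in the greedy example
}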

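Possessive:
The quantified part "swallows" as much of the input as it can and never gives anything back, so the rest of the pattern gets exactly one attempt. With .*+good, the .*+ swallows the entire input, including every "good", so the trailing good can never match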

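public class TestDemo {
    public static void main(String[] args) {
        String regex = ".*+good";
        String input = "庆の6good8浩のgoodMorning";
        List<String> ret = okMatchRegexSubstrList(input, regex, 0);
        // ret is empty, so printList reports an empty list
        printList("Possessive", ret, ", ");
    }

    // printList and okMatchRegexSubstrList are the same as in the greedy example
}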
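
To compare the three modes side by side, the sketch below looks only at the first match each pattern finds in the same input. It also shows that a possessive quantifier is not doomed to fail: it matches as long as the rest of the pattern does not need characters it has already consumed. The class name QuantifierModesDemo, the helper firstMatch, and the ASCII inputs are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModesDemo {
    // Returns the first match of regex in input, or null if there is none
    // (illustrative helper name).
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xgoodygoodz";
        System.out.println(firstMatch(".*good", input));  // xgoodygood (backs off to the last "good")
        System.out.println(firstMatch(".*?good", input)); // xgood (stops at the first "good")
        System.out.println(firstMatch(".*+good", input)); // null (.*+ keeps everything)

        // \d++ cannot consume letters, so "good" is still available afterwards
        System.out.println(firstMatch("\\d++good", "123good"));  // 123good
        // \d++ keeps the "3" that the rest of the pattern needs and never gives it back
        System.out.println(firstMatch("\\d++3good", "123good")); // null
        // The greedy form backs off one digit and succeeds
        System.out.println(firstMatch("\\d+3good", "123good"));  // 123good
    }
}
```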

That's all. If you spot any mistakes, please let me know.
