Java Basics: Regular Expressions

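Date: 2022-01-05 11:50:58
1. Introduction to Regular Expressions
Uses: string matching, string searching, string replacement
Examples: checking whether an IP address is well formed, pulling email addresses out of a web page, pulling links out of a web page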

Classes:

java.lang.String    

java.util.regex.Pattern (a compiled regular expression)

java.util.regex.Matcher (the engine that matches a Pattern against an input)

public class Test {
    public static void main(String args[]) {
        // A first look at what a regular expression does
        System.out.println("abc".matches("..."));
    }
}

Output: true. matches("...") tests whether the whole string consists of exactly three characters; each dot stands for one character.
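import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String args[]) {
        p("abc".matches("..."));
        p("a4865a".replaceAll("\\d", "-")); // replace every digit with "-"
        Pattern p = Pattern.compile("[a-z]{3}"); // compile the pattern once
        Matcher m = p.matcher("fgh");
        p(m.matches()); // true if the whole input matches the pattern
        p("fgh".matches("[a-z]{3}"));
        // Same result as the three lines above, but Pattern and Matcher
        // expose many more methods than String.matches
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}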

Result: true a----a true true
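String.matches is equivalent to Pattern.matches, which compiles the pattern on every call. When the same pattern is applied to many inputs, compiling it once and reusing the Pattern is a common choice. A minimal sketch, where the class name CompileOnceDemo and the helper countMatching are illustrative names, not part of the original notes:

```java
import java.util.regex.Pattern;

public class CompileOnceDemo {
    // Compiled a single time and shared by every call below
    private static final Pattern THREE_LOWER = Pattern.compile("[a-z]{3}");

    // Counts how many inputs consist of exactly three lower-case letters
    static int countMatching(String... inputs) {
        int count = 0;
        for (String s : inputs) {
            if (THREE_LOWER.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatching("fgh", "ab", "xyz", "ABC")); // 2
    }
}
```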
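2. Meta Characters

"." matches any single character

"*" matches zero or more occurrences

"+" matches one or more occurrences

"?" matches zero or one occurrence

X{n} matches X exactly n times, X{n,} at least n times, X{n,m} between n and m times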
public class Test {
    public static void main(String args[]) {
        p("a".matches("."));
        p("aa".matches("aa"));
        p("aaaa".matches("a*"));
        p("aaaa".matches("a+"));
        p("".matches("a*"));
        p("aaaa".matches("a?"));
        p("".matches("a?"));
        p("a".matches("a?"));
        p("214523145234532".matches("\\d{3,100}"));
        p("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}")); // \\. is a literal "." and \\d is one digit
        p("192".matches("[0-2][0-9][0-9]"));
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
Result: true true true true true false true true true false true
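The \\d{1,3} pattern above only checks the shape of an address, so it would also accept 999.999.999.999. A possible stricter check limits each part to 0-255. This is a sketch under that assumption; the names Ipv4Check, OCTET and isIpv4 are made up for the illustration:

```java
import java.util.regex.Pattern;

public class Ipv4Check {
    // One part of the address: 250-255, 200-249, 100-199, or 0-99 without a leading zero
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    // Four parts separated by literal dots
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    static boolean isIpv4(String s) {
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.1"));   // true
        System.out.println(isIpv4("999.168.0.1"));   // false
        System.out.println(isIpv4("192.168.0.aaa")); // false
    }
}
```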
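Other Meta Characters
\d a digit [0-9]     \D a non-digit
\s a whitespace character [ \t\n\x0B\f\r] (space, tab, newline, \x0B vertical tab, \f form feed, \r carriage return)     \S a non-whitespace character
\w a word character [a-zA-Z_0-9]     \W a non-word character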
public class Test {
    public static void main(String args[]) {
        p(" \n\r\t".matches("\\s{4}"));
        p(" ".matches("\\S"));
        p("a_8".matches("\\w{3}"));
        p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"));
        p("\\".matches("\\\\")); // "\\\\" in Java source is the regex \\, which matches one literal backslash
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
Result: true false true true true
POSIX character classes (US-ASCII only)
\p{Lower}       A lower-case alphabetic character: [a-z]
\p{Upper}       An upper-case alphabetic character:[A-Z]
\p{ASCII}         All ASCII:[\x00-\x7F]
\p{Alpha}       An alphabetic character:[\p{Lower}\p{Upper}]
\p{Digit}        A decimal digit: [0-9]
\p{Alnum}     An alphanumeric character:[\p{Alpha}\p{Digit}]
\p{Punct}     Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}     A visible character: [\p{Alnum}\p{Punct}]
\p{Print}       A printable character: [\p{Graph}\x20]
\p{Blank}      A space or a tab: [ \t]
\p{Cntrl}       A control character: [\x00-\x1F\x7F]
\p{XDigit}     A hexadecimal digit: [0-9a-fA-F]
\p{Space}     A whitespace character: [ \t\n\x0B\f\r]
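The classes in this table are used like any other character class. A small sketch that labels a one-character string with the first class it belongs to; the class name PosixClassDemo and the method classify are illustrative names. By default these classes cover US-ASCII only, so a letter such as é falls through to "other":

```java
public class PosixClassDemo {
    // Returns a label for a one-character string, based on the POSIX classes above
    static String classify(String ch) {
        if (ch.matches("\\p{Lower}")) return "lower";
        if (ch.matches("\\p{Upper}")) return "upper";
        if (ch.matches("\\p{Digit}")) return "digit";
        if (ch.matches("\\p{Punct}")) return "punct";
        if (ch.matches("\\p{Space}")) return "space";
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify("a"));      // lower
        System.out.println(classify("Z"));      // upper
        System.out.println(classify("7"));      // digit
        System.out.println(classify("|"));      // punct
        System.out.println(classify("\t"));     // space
        System.out.println(classify("\u00e9")); // other (not ASCII)
    }
}
```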
3. Ranges
public class Test {
    public static void main(String args[]) {
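        p("a".matches("[abc]")); // [] matches any one of the listed characters
        p("a".matches("[^abc]")); // any character except a, b, or c
        p("A".matches("[a-zA-Z]"));
        p("A".matches("[a-z]|[A-Z]"));
        p("A".matches("[a-z[A-Z]]")); // these three forms all mean "any upper- or lower-case letter"
        p("R".matches("[A-Z&&[RFG]]")); // intersection: in A-Z and also one of R, F, G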
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}
Result:
true false true true true true
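The && operator can also be combined with a negated class to subtract characters from a range. A short sketch with illustrative helper names (ClassOpsDemo, isConsonant, isLowerHex):

```java
public class ClassOpsDemo {
    // a-z minus the vowels: intersection with a negated class
    static boolean isConsonant(String s) {
        return s.matches("[a-z&&[^aeiou]]");
    }

    // Union of two ranges, written as a nested class
    static boolean isLowerHex(String s) {
        return s.matches("[a-f[0-9]]+");
    }

    public static void main(String[] args) {
        System.out.println(isConsonant("b"));     // true
        System.out.println(isConsonant("a"));     // false
        System.out.println(isLowerHex("c0ffee")); // true
        System.out.println(isLowerHex("xyz"));    // false
    }
}
```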
4. Boundaries
^ The beginning of a line (outside square brackets it anchors the match to the start; inside square brackets it means "not")
$ The end of a line
\b A word boundary: the position between a word character and a non-word character, such as a space or a line break
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input but for the final terminator, if any
\z The end of the input
Example:
public class Test {
    public static void main(String args[]) {
        p("hello sir".matches("^h.*"));
        p("hello sir".matches(".*ir$"));
        p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));
        p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));
        p(" \n".matches("^[\\s&&[^\\n]]*\\n$")); // a blank line: whitespace only, then a newline
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}

Result: true true true false true
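The boundary markers are often more useful with find() than with matches(). The sketch below counts whole-word occurrences with \b and uses the Pattern.MULTILINE flag so that ^ matches at the start of every line. The class and method names are illustrative, and Pattern.quote is used so the search text is treated literally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Counts occurrences of word that have a word boundary on both sides
    static int countWholeWord(String text, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // With MULTILINE, ^ matches at the start of each line, not only at the start of the input
    static int countLinesStartingWith(String text, String prefix) {
        Matcher m = Pattern.compile("^" + Pattern.quote(prefix), Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("java javascript java", "java")); // 2
        System.out.println(countLinesStartingWith("hello\nhi\nbye", "h"));   // 2
    }
}
```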
5. matches, find, lookingAt, start, end
Email pattern:
[\w[.-]]+@[\w[.-]]+\.[\w]+
Example:
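import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String args[]) {
        Pattern p = Pattern.compile("\\d{3,5}");
        String s = "123-34345-234-00";
        Matcher m = p.matcher(s);
        p(m.matches()); // matches() must match the whole input; it fails at the first "-"
        // The failed attempt can still leave the matcher's position advanced past "123",
        // so without reset() the following find() calls may skip the first match
        m.reset(); // move the matcher back to the start of the input
        p(m.find()); // find the next substring that matches the pattern
        p(m.start() + "-" + m.end()); // print where that match starts and ends
        p(m.find());
        p(m.start() + "-" + m.end());
        p(m.find());
        p(m.start() + "-" + m.end());
        p(m.find()); // false: "00" has only two digits
        //p(m.start() + "-" + m.end());
        p(m.lookingAt()); // lookingAt() always matches from the start of the input
        p(m.lookingAt());
        p(m.lookingAt());
        p(m.lookingAt());
    }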

    public static void p(Object o) {
        System.out.println(o);
    }
}

Result: false true 0-3 true 4-9 true 10-13 false true true true true
start() and end() are valid only after a successful match; uncommenting the commented-out line throws an IllegalStateException.
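A common way to stay clear of that exception is to call start(), end() and group() only inside a while (m.find()) loop. A minimal sketch, with FindAllDemo, findAll and startThrowsWithoutMatch as illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllDemo {
    // Collects every match as "start-end:text"
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) { // start/end/group are only read after find() returned true
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    // Shows that start() throws when the last find() failed
    static boolean startThrowsWithoutMatch() {
        Matcher m = Pattern.compile("\\d+").matcher("abc");
        m.find(); // returns false, there are no digits
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d{3,5}", "123-34345-234-00")); // [0-3:123, 4-9:34345, 10-13:234]
        System.out.println(startThrowsWithoutMatch());               // true
    }
}
```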
6. String Replacement
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String args[]) {
        Pattern p = Pattern.compile("java", Pattern.CASE_INSENSITIVE); // CASE_INSENSITIVE: ignore case when matching
        Matcher m = p.matcher("java Java JAVa JaVa IloveJAVA you hateJava afasdfasdf");
        StringBuffer buf = new StringBuffer();
        int i = 0;
        while (m.find()) {
            i++;
            if (i % 2 == 0) {
                m.appendReplacement(buf, "java"); // even-numbered match: replace the matched text with "java"
            } else {
                m.appendReplacement(buf, "JAVA"); // odd-numbered match: replace it with "JAVA"
            }
        }
        m.appendTail(buf); // append the text that follows the last match
        p(buf);
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}

Result: JAVA java JAVA java IloveJAVA you hatejava afasdfasdf
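In a replacement string, $ and \ are special: $1 refers to group 1. The sketch below shows both sides of that: a group reference used on purpose, and Matcher.quoteReplacement used when the replacement should be taken literally. The class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    // Reorders a yyyy-MM-dd date using group references in the replacement
    static String swapDate(String date) {
        return date.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // Replaces word (ignoring case) with replacement taken literally,
    // even if the replacement contains "$" or "\"
    static String replaceLiteral(String text, String word, String replacement) {
        Matcher m = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE).matcher(text);
        return m.replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(swapDate("2022-01-05"));                    // 05/01/2022
        System.out.println(replaceLiteral("java Java", "java", "$1")); // $1 $1
    }
}
```

Without quoteReplacement, the second call would treat "$1" as a reference to a group that the pattern does not have.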
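7. Groups

Groups are created with parentheses and are numbered by the position of their opening parenthesis: the first "(" opens group 1, the second opens group 2.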

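import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String args[]) {
        Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
        String s = "123aa-34345bb-234cc-00";
        Matcher m = p.matcher(s);
        while (m.find()) {
            p(m.group(1)); // group 1 is the digits; group(2) would be the two letters
        }
    }

    public static void p(Object o) {
        System.out.println(o);
    }
}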

Result: 123 34345 234. With group(2) the result is: aa bb cc
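Groups can also be given names with (?<name>...), which reads more easily than counting parentheses; group(0) is always the whole match. A sketch using the same input, where the group names digits and letters and the method describe are chosen for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    // Builds "whole=digits+letters;" for every match
    static String describe(String s) {
        Matcher m = Pattern.compile("(?<digits>\\d{3,5})(?<letters>[a-z]{2})").matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group(0)).append('=')          // group 0: the whole match
              .append(m.group("digits")).append('+')   // same as group(1)
              .append(m.group("letters")).append(';'); // same as group(2)
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("123aa-34345bb-234cc-00"));
        // 123aa=123+aa;34345bb=34345+bb;234cc=234+cc;
    }
}
```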
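8. Extracting Email Addresses from a Web Page
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailSpider {
    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("D:\\email.html"))) {
            String line;
            while ((line = br.readLine()) != null) {
                parse(line);
            }
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
        }
    }

    private static void parse(String line) {
        // Print every email-like substring found in the line
        Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
        Matcher m = p.matcher(line);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}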
Another pattern for email addresses, written as a Java string literal:
(\\w+)(\\.|_)?(\\w*)@(\\w+)(\\.(\\w+))+
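The extraction step can be tried without a file by running the first pattern over an in-memory string. A sketch, where the sample text and its addresses are made up for the illustration and EmailExtractDemo and extractEmails are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractDemo {
    // Same pattern as in the spider above
    private static final Pattern EMAIL = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    // Returns every email-like substring of text, in order of appearance
    static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample text
        String text = "Contact alice.smith@example.com or bob@mail.example.org today";
        System.out.println(extractEmails(text));
        // [alice.smith@example.com, bob@mail.example.org]
    }
}
```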
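9. A Small Line-Counting Program
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class CodeCounter {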
    static long normalLines = 0;
    static long commentLines = 0;
    static long whiteLines = 0;

    public static void main(String[] args) {
        File f = new File("E:\\JAVA");
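        File[] codeFiles = f.listFiles();
        if (codeFiles == null) { // not a directory, or it could not be read
            System.out.println("Cannot list " + f);
            return;
        }
        for (File child : codeFiles) {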
            if (child.getName().matches(".*\\.java$")) {
                parse(child);
            }
        }
        System.out.println("normalLines:" + normalLines);
        System.out.println("commentLines:" + commentLines);
        System.out.println("whiteLines:" + whiteLines);
    }

    private static void parse(File f) {
        BufferedReader br = null;
        boolean comment = false;
        try {
            br = new BufferedReader(new FileReader(f));
            String line = "";
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.matches("^[\\s&&[^\\n]]*$")) {
                    whiteLines++;
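                } else if (line.startsWith("/*") && !line.endsWith("*/")) {
                    // a block comment opens here and continues on later lines
                    commentLines++;
                    comment = true;
                } else if (line.startsWith("/*") && line.endsWith("*/")) {
                    // a block comment that opens and closes on the same line
                    commentLines++;
                } else if (true == comment) {
                    // Constant-first comparison: if == is mistyped as =, "true = comment" is a
                    // compile error, while "comment = true" would compile and always hold.
                    // Behavior is otherwise identical to "comment == true".
                    commentLines++;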
                    if (line.endsWith("*/")) {
                        comment = false;
                    }
                } else if (line.startsWith("//")) {
                    commentLines++;
                } else {
                    normalLines++;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                    br = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Result:
normalLines:384
commentLines:2
whiteLines:37
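The classification logic can be checked on its own by moving it into a method that works on a list of lines instead of files. The sketch below is a variation, not the program above: it tests the "inside a block comment" state first, and the names LineClassifier and count are illustrative:

```java
import java.util.Arrays;
import java.util.List;

public class LineClassifier {
    // Returns {normal, comment, blank} line counts for the given source lines
    static long[] count(List<String> lines) {
        long normal = 0, comment = 0, blank = 0;
        boolean inBlock = false; // true while inside an unclosed /* ... */ comment
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                blank++;
            } else if (inBlock) {
                comment++;
                if (line.endsWith("*/")) {
                    inBlock = false;
                }
            } else if (line.startsWith("/*")) {
                comment++;
                if (!line.endsWith("*/")) {
                    inBlock = true;
                }
            } else if (line.startsWith("//")) {
                comment++;
            } else {
                normal++;
            }
        }
        return new long[] {normal, comment, blank};
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "/* header", " * more", " */", "",
            "int x = 1; // trailing", "// note", "  ",
            "/* one-line */", "return x;");
        System.out.println(Arrays.toString(count(sample))); // [2, 5, 2]
    }
}
```

As in the original program, a line that has code followed by a trailing comment is counted as a normal line.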