Java Strings: Regular Expressions

Date: 2022-01-14 14:42:00

Regular Expressions


Basics

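In a Java string literal, "\\" inserts a single backslash into the regular expression, which gives the character that follows it a special meaning (for example, one digit is written "\\d").
Newline: "\n"
One or more of the preceding expression: "+"; a literal plus sign: "\\+"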
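
The escaping rules above are easy to get wrong, so here is a minimal sketch of them in action. The class name EscapeDemo and the helper isSignedInteger are made up for illustration; the regex itself is the one used later in this article.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if s is an optionally signed run of digits.
    static boolean isSignedInteger(String s) {
        // "\\+" is a literal plus sign; "\\d+" is one or more digits.
        return s.matches("(-|\\+)?\\d+");
    }

    public static void main(String[] args) {
        System.out.println(isSignedInteger("+42"));    // true
        System.out.println(isSignedInteger("4+2"));    // false: matches() must cover the whole string
        // Four backslashes in source become the regex \\ which matches one literal backslash.
        System.out.println("a\\b".matches("a\\\\b"));  // true
        // Pattern.quote() escapes every metacharacter, so "+" is taken literally.
        System.out.println("1+1".matches(Pattern.quote("1+1"))); // true
    }
}
```

When the text to match comes from user input, Pattern.quote() is usually safer than escaping by hand.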

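The simplest way to apply a regular expression is through the functionality built into the String class, which offers these regex-related methods:
String.matches(String regex): whether the entire string matches the regular expression
String.split(String regex): splits the string around matches of the regular expression
String.replaceFirst(String regex, String replacement): replaces only the first matching substring
String.replaceAll(String regex, String replacement): replaces every matching substring
Example: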

import java.util.Arrays;

public class IntegerMath {
    public static void main(String[] args) {
        System.out.println("12345".matches("\\d+"));          // digits only
        System.out.println("12345".matches("(-|\\+)?\\d+"));  // optional sign, then digits
        System.out.println("12345".matches("-\\d+"));         // requires a leading minus
        System.out.println(Arrays.toString("A1B2C3D4E5F".split("\\d")));
        System.out.println("12345".replaceFirst("\\d", "A"));
        System.out.println("12345".replaceAll("\\d", "A"));
    }
}

Output:

true
true
false
[A, B, C, D, E, F]
A2345
AAAAA

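Character classes

.             any character (line terminators only in DOTALL mode)
[abc]         any one of a, b, or c (same as a|b|c)
[^abc]        any character except a, b, and c
[A-Za-z]      any letter from A to Z or a to z (range)
[abc[hij]]    any of a, b, c, h, i, j (union)
[a-z&&[hij]]  any of h, i, j (intersection)
\s            a whitespace character
\S            a non-whitespace character
\d            a digit: [0-9]
\D            a non-digit: [^0-9]
\w            a word character: [a-zA-Z_0-9]
\W            a non-word character: [^\w]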
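
A few of the classes above can be checked directly with String.matches(). This is only a sketch; CharClassDemo and its accepts helper are hypothetical names introduced for the example.

```java
class CharClassDemo {
    // Hypothetical helper: true if the one-character string ch is accepted by charClass.
    static boolean accepts(String charClass, String ch) {
        return ch.matches(charClass);
    }

    public static void main(String[] args) {
        // Union: [abc[hij]] accepts any of a, b, c, h, i, j.
        System.out.println(accepts("[abc[hij]]", "h"));    // true
        // Intersection: [a-z&&[hij]] accepts only h, i, j.
        System.out.println(accepts("[a-z&&[hij]]", "a"));  // false
        System.out.println(accepts("[a-z&&[hij]]", "i"));  // true
        // \w includes the underscore; \W is its complement.
        System.out.println(accepts("\\w", "_"));           // true
        System.out.println(accepts("\\W", "-"));           // true
        // \D is any non-digit, i.e. [^0-9].
        System.out.println(accepts("\\D", "x"));           // true
        System.out.println(accepts("\\D", "7"));           // false
    }
}
```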

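Quantifiers: describe how a pattern absorbs input text.

Greedy (the default): matches as much input as possible and backs off only if the rest of the pattern fails: X?, X*, X+
Reluctant: specified by appending a question mark; matches the fewest characters needed to satisfy the pattern: X??, X*?, X+?
Possessive: specified by appending a plus sign; never gives back what it has matched, which keeps the regular expression from running away with backtracking: X?+, X*+, X++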
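
The difference between the three quantifier types shows up most clearly on the same input. The sketch below is an illustration only; QuantifierDemo, firstMatch, and the sample input are assumptions, not part of the original text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ grabs everything, then backs off just enough to find the last '>'.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Reluctant: .+? stops at the first '>' that completes the pattern.
        System.out.println(firstMatch("<.+?>", input));  // <a>
        // Possessive: .++ swallows the final '>' and never gives it back, so nothing matches.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```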


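Pattern and Matcher

The Pattern class creates more powerful regular expression objects:
Pattern p = Pattern.compile(String regex) compiles the regex into a Pattern object
Matcher m = p.matcher(CharSequence s) creates a Matcher object for the input
m.find() finds successive matches, moving forward through the input like an iterator (returns boolean)
Groups: a group is a part of the regular expression set off by parentheses. Group 0 is the entire expression, group 1 is the first parenthesized group, and so on.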

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Groups {
    public static final String POEM =
        "Twas brillig, and the slithy toves\n" +
        "Did gyre and gimble in the wabe.\n" +
        "All mimsy were the borogoves,\n";

    public static void main(String[] args) {
        // Capture the last three words of every line.
        Matcher m =
            Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$")
                .matcher(POEM);
        while (m.find()) {
            // Group 0 is the whole match; groups 1..groupCount() are the parenthesized parts.
            for (int j = 0; j <= m.groupCount(); j++)
                System.out.print("[" + m.group(j) + "]");
            System.out.println();
        }
    }
}
Output:
[the slithy toves][the][slithy toves][slithy][toves]
[in the wabe.][in][the wabe.][the][wabe.]
[were the borogoves,][were][the borogoves,][the][borogoves,]

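Normally $ matches only the end of the entire input sequence, so we must explicitly tell the regular expression to pay attention to the newlines in the input. The pattern flag "(?m)" at the start of the expression does this.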
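
To see the effect of "(?m)" in isolation, the same expression can be run with and without the flag. This is a minimal sketch; MultilineDemo, findAll, and the three-line sample input are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find())
            result.add(m.group());
        return result;
    }

    public static void main(String[] args) {
        String input = "one two\nthree four\nfive six";
        // Without (?m), $ matches only at the end of the whole input.
        System.out.println(findAll("\\S+$", input));      // [six]
        // With (?m), $ also matches just before each line terminator.
        System.out.println(findAll("(?m)\\S+$", input));  // [two, four, six]
    }
}
```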


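start() and end()

After a successful match, start() returns the index of the first matched character and end() returns the index just past the last matched character.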

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartEnd2 {
    public static String s = "As long as there is injustice,whenever a\n" +
        "Targathian baby cries out,wherever a distress\n" +
        "signal sounds among the stars ...We'll be there.\n" +
        "This fine ship, and this fine crew ...\n" +
        "Never give up! Never surrender!";

    public static void main(String[] args) {
        // Every word that contains "ere".
        Pattern p1 = Pattern.compile("\\w*ere\\w*");
        Matcher m = p1.matcher(s);
        while (m.find()) {
            System.out.println(m.group() + " start=" + m.start() + " end=" + m.end());
        }
    }
}
Output:
there start=11 end=16
wherever start=67 end=75
there start=129 end=134
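
Because end() is exclusive, the pair start()/end() can be passed straight to substring(). The sketch below illustrates this under assumed names (StartEndDemo, spans) and a made-up input string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEndDemo {
    // Hypothetical helper: each match formatted as "text@start-end".
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // substring(start, end) yields the same text as group().
            String text = input.substring(m.start(), m.end());
            result.add(text + "@" + m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(spans("\\w*ere\\w*", "there, where, here"));
        // [there@0-5, where@7-12, here@14-18]
    }
}
```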

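Pattern flags

Pattern Pattern.compile(String regex, int flags)

Pattern.CASE_INSENSITIVE (?i): matching ignores case (by default only for US-ASCII characters)
Pattern.COMMENTS (?x): whitespace in the pattern is ignored, and comments starting with # are ignored up to the end of the line
Pattern.MULTILINE (?m): in multiline mode, ^ and $ match at the beginning and end of each line; by default they match only at the beginning and end of the entire input
Pattern.DOTALL (?s): the expression "." matches any character, including line terminators (which it does not match by default).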

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReFlags {

    public static void main(String[] args) {
        // Combine flags with |: match "java" at the start of any line, in any case.
        Pattern p = Pattern.compile("^java",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        Matcher m = p.matcher(
            "java has regex\nJava has regex\n" +
            "JAVA has pretty good regular expressions\n" +
            "Regular expressions are in Java\n" + "JAva");
        while (m.find())
            System.out.println(m.group());
    }
}
Output:
java
Java
JAVA
JAva
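
The example above covers CASE_INSENSITIVE and MULTILINE. As a complement, here is a small sketch of DOTALL, COMMENTS, and the embedded form of a flag. FlagDemo, matchesWith, and the sample patterns are assumptions introduced for illustration.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // Hypothetical helper: true if the whole input matches regex compiled with the given flags.
    static boolean matchesWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // By default "." does not match a line terminator; DOTALL changes that.
        System.out.println(matchesWith("a.b", 0, "a\nb"));               // false
        System.out.println(matchesWith("a.b", Pattern.DOTALL, "a\nb"));  // true

        // COMMENTS lets a pattern be laid out over several lines with # remarks.
        String decimal =
            "\\d+   # integer part\n" +
            "\\.    # literal dot\n" +
            "\\d+   # fraction";
        System.out.println(matchesWith(decimal, Pattern.COMMENTS, "3.14")); // true

        // An embedded flag such as (?i) works without the second argument.
        System.out.println(matchesWith("(?i)java", 0, "JAVA"));          // true
    }
}
```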

split()

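Splits the input string into an array of String objects around matches of the pattern. The optional second argument limits the number of pieces.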

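import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        String input =
            "This!!unusual use!!of exclamation!!points";
        // Split at every "!!".
        System.out.println(Arrays.toString(
            Pattern.compile("!!").split(input)));
        // Split into at most 3 pieces; the last piece keeps the rest of the input.
        System.out.println(Arrays.toString(
            Pattern.compile("!!").split(input, 3)));
    }
}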
Output:
[This, unusual use, of exclamation, points]
[This, unusual use, of exclamation!!points]
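
The limit argument also controls what happens to trailing empty strings, which the example above does not show. The sketch below uses assumed names (SplitLimitDemo, splitWith) and a made-up comma-separated input.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Hypothetical helper: split input on commas with the given limit.
    static String[] splitWith(String input, int limit) {
        return Pattern.compile(",").split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a,b,,";
        // limit 0 (same as no limit argument): trailing empty strings are dropped.
        System.out.println(Arrays.toString(splitWith(input, 0)));   // [a, b]
        // Negative limit: every piece is kept, including trailing empty strings.
        System.out.println(Arrays.toString(splitWith(input, -1)));  // [a, b, , ]
        // Positive limit n: at most n pieces; the last one holds the unsplit rest.
        System.out.println(Arrays.toString(splitWith(input, 2)));   // [a, b,,]
    }
}
```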


----------

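Replacement operations

replaceFirst(String replacement): replaces the first successful match with replacement
replaceAll(String replacement): replaces every successful match with replacement
appendReplacement(StringBuffer sbuf, String replacement): performs step-by-step replacement. It lets you call other methods to generate or process replacement, so you can programmatically break the target into groups and build more powerful replacements
appendTail(StringBuffer sbuf): called after one or more invocations of appendReplacement() to copy the remainder of the input string (the part after the last match) into sbuf.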


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TheReplacements {
    public static void main(String[] args) {
        String s = "/*!Here's a block of text to use as input to\n" +
            "the regular expression matcher. Note that we'll\n" +
            "first extract the block of text by looking for\n" +
            "the special delimiters, then process the\n" +
            "extracted block. !*/";
        // Extract the text between the /*! and !*/ delimiters; DOTALL lets "." span lines.
        Matcher mInput =
            Pattern.compile("/\\*!(.*)!\\*/", Pattern.DOTALL).matcher(s);
        if (mInput.find())
            s = mInput.group(1);
        s = s.replaceAll(" {2,}", " ");   // collapse two or more spaces into one
        s = s.replaceAll("(?m)^ +", "");  // remove spaces at the start of each line
        System.out.println(s);
        s = s.replaceFirst("[aeiou]", "(VOWEL1)");
        System.out.println(s);
        StringBuffer sbuf = new StringBuffer();
        Pattern p = Pattern.compile("[aeiou]");
        Matcher m = p.matcher(s);
        // Replace each lowercase vowel with its uppercase form, one match at a time.
        while (m.find())
            m.appendReplacement(sbuf, m.group().toUpperCase());
        m.appendTail(sbuf);  // copy the text that follows the last match
        System.out.println(sbuf);
    }
}
Output:
Here's a block of text to use as input to
the regular expression matcher. Note that we'll
first extract the block of text by looking for
the special delimiters, then process the
extracted block.
H(VOWEL1)re's a block of text to use as input to
the regular expression matcher. Note that we'll
first extract the block of text by looking for
the special delimiters, then process the
extracted block.
H(VOWEL1)rE's A blOck Of tExt tO UsE As InpUt tO
thE rEgUlAr ExprEssIOn mAtchEr. NOtE thAt wE'll
fIrst ExtrAct thE blOck Of tExt by lOOkIng fOr
thE spEcIAl dElImItErs, thEn prOcEss thE
ExtrActEd blOck.

When the replacement text needs special processing for each match, such as the conversion to uppercase here, use the appendReplacement() method.
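
As another illustration of a computed replacement, the sketch below doubles every number in a string. The names ComputedReplacement and doubleNumbers are hypothetical. Matcher.quoteReplacement() is used as a precaution, since "$" and "\" have special meaning in replacement strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComputedReplacement {
    // Hypothetical helper: replace every run of digits with twice its value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sbuf = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // quoteReplacement() makes the replacement text literal.
            m.appendReplacement(sbuf, Matcher.quoteReplacement(String.valueOf(doubled)));
        }
        m.appendTail(sbuf);  // copy whatever follows the last match
        return sbuf.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears"));
        // 6 apples and 24 pears
    }
}
```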


reset()

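reset(CharSequence) applies an existing Matcher object to a new character sequence. Called without arguments, reset() sets the Matcher back to the beginning of the current sequence.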

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Resetting {
    public static void main(String[] args) throws Exception {
        Matcher m = Pattern.compile("[frb][aiu][gx]")
            .matcher("fix the rug with bags");
        while (m.find())
            System.out.print(m.group() + " ");
        System.out.println();
        // Reuse the same Matcher on a different input.
        m.reset("fix the rig with rags");
        while (m.find())
            System.out.print(m.group() + " ");
    }
}
Output:
fix rug bag
fix rig rag
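
The no-argument form of reset() is handy for scanning the same input again from the start. A minimal sketch, with ResetDemo and remaining as assumed names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Hypothetical helper: collect the matches still ahead of the matcher's current position.
    static List<String> remaining(Matcher m) {
        List<String> result = new ArrayList<>();
        while (m.find())
            result.add(m.group());
        return result;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("[frb][aiu][gx]")
            .matcher("fix the rug with bags");
        m.find();                              // consumes "fix"
        System.out.println(remaining(m));      // [rug, bag]
        m.reset();                             // rewind to the start of the same input
        System.out.println(remaining(m));      // [fix, rug, bag]
        m.reset("fix the rig with rags");      // switch to a new input
        System.out.println(remaining(m));      // [fix, rig, rag]
    }
}
```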