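Regular expressions are built into Python as a module; to use them, import the re module.
Import the re module:
import re
findall() syntax:
re.findall(pattern, string) returns a list of all non-overlapping matches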
import re
a = "123abc456deg"
b = re.findall("abc", a)
print(b)
['abc']
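A minimal sketch of what the returned list looks like when the pattern occurs several times or not at all. The sample string is made up for illustration.

```python
import re

text = "123abc456abc"

hits = re.findall("abc", text)    # every non-overlapping occurrence
misses = re.findall("xyz", text)  # no occurrence at all

print(hits)    # ['abc', 'abc']
print(misses)  # []
```

An empty list, not None, signals "no match", so the result can be looped over without a guard.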
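Metacharacters
- \d ----- matches a decimal digit
- \s ----- matches a whitespace character
- \w ----- matches letters, digits, and the underscore, including letters from other languages (Unicode)
- {n} ----- repetition
[] ----- commonly used to specify a character set or a range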
import re
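a = "aac;abc;acc-"
# List the characters directly; a comma inside [] would be a literal comma
rule = "a[abc]c"
b = re.findall(rule, a)
print(b)
a = "aac8abc9acc-"
# a-z and 0-9 are ranges
rule = "a[a-z][a-z][0-9][a-z][a-z]c"
b = re.findall(rule, a)
print(b)
a = "a?.bac"
# ? and . are literal characters inside []
rule = "a[?.][?.]b"
b = re.findall(rule, a)
print(b)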
['aac', 'abc', 'acc']
['aac8abc']
['a?.b']
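Inside [ ], most metacharacters lose their special meaning and are treated as ordinary characters. The exceptions are ^ as the first character (negation), - between two characters (range), ] and \.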
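The behaviour of special characters inside a character set can be checked with a small sketch. The sample text and variable names are illustrative only.

```python
import re

text = "a,c abc a^c a-c axc"

# A comma inside [] is a literal comma, not a separator.
with_comma = re.findall(r"a[a,b]c", text)

# "-" between two characters is a range; "^" as the FIRST character negates the set.
not_lower = re.findall(r"a[^a-z]c", text)

print(with_comma)  # ['a,c', 'abc']
print(not_lower)   # ['a,c', 'a^c', 'a-c']
```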
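^ ----- matches text that starts with XXXX: it anchors the match at the start of the string (or of each line in multiline mode). Example: starts with 1 ---- ^1
import re
a = "aac156abc78acc4"
rule = "^aac15"
b = re.findall(rule, a)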
print(b)
['aac15']
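By default ^ only matches at the very start of the string. A rough sketch of how the re.MULTILINE flag changes that, using a made-up three-line string:

```python
import re

text = "aac15\nabc78\naac99"

# Default: ^ matches only at the beginning of the whole string.
default = re.findall(r"^aac\d\d", text)

# re.MULTILINE: ^ also matches right after each newline.
per_line = re.findall(r"^aac\d\d", text, re.MULTILINE)

print(default)   # ['aac15']
print(per_line)  # ['aac15', 'aac99']
```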
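$ ----- matches text that ends with XXXX: it anchors the match at the end of the string (or of each line in multiline mode)
import re
a = "ac156abc78acc"
b = re.findall("acc$", a)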
print(b)
['acc']
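Combining ^ and $ is a common way to check a whole string rather than a fragment. A small sketch follows; the helper name is_all_digits is hypothetical and not part of the re module.

```python
import re

def is_all_digits(s):
    # ^ anchors the start and $ anchors the end,
    # so every character in between must be a digit.
    return re.search(r"^\d+$", s) is not None

print(is_all_digits("12345"))  # True
print(is_all_digits("123a5"))  # False
```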
\
- \d ----- matches a decimal digit
import re
a = "ac156abc78acc"
b = re.findall(r"ac\d\d\da", a)
print(b)
['ac156a']
- \s ----- matches a whitespace character
import re
# four consecutive spaces after "ac"
a = "ac    6abc78acc"
b = re.findall(r"\s\s", a)
print(b)
['  ', '  ']
- \w ----- matches letters, digits, and the underscore, including letters from other languages (Unicode)
import re
a = "ac6abc78acc"
b = re.findall(r"\w\w", a)
print(b)
['ac', '6a', 'bc', '78', 'ac']
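To illustrate that \w covers letters from other languages, here is a short sketch comparing the default Unicode behaviour with the re.ASCII flag. The sample text is invented.

```python
import re

text = "hello 世界 foo_bar 42"

# For str patterns, \w is Unicode-aware by default.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['hello', '世界', 'foo_bar', '42']
print(ascii_words)    # ['hello', 'foo_bar', '42']
```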
\ is also the escape character: it turns a metacharacter into an ordinary character
import re
a = "ac6^abc78acc"
b = re.findall(r"\^abc", a)
print(b)
['^abc']
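When the text to search for comes from a variable, escaping by hand is error-prone. One option is re.escape(), sketched below with a made-up price string.

```python
import re

text = "price: 3.50 (approx) vs 3x50"
literal = "3.50"

# Unescaped, "." matches any character, so "3x50" is found too.
unescaped = re.findall(literal, text)

# re.escape() puts a backslash in front of the special characters.
escaped = re.findall(re.escape(literal), text)

print(unescaped)  # ['3.50', '3x50']
print(escaped)    # ['3.50']
```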
{n} ----- repetition
{n} repeats the preceding item exactly n times, which keeps the pattern short
import re
a = "ac6^abc78acc"
b = re.findall(r"\w{2}", a)
print(b)
['ac', 'ab', 'c7', '8a', 'cc']
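A typical use of {n} is a fixed-width format. The sketch below assumes dates written as YYYY-MM-DD; the sample sentence is made up.

```python
import re

text = "released 2021-03-15, patched 2021-4-2"

# Exactly 4 digits, a hyphen, 2 digits, a hyphen, 2 digits.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

print(dates)  # ['2021-03-15']
```

The second date is skipped because its month and day have only one digit each.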
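* ----- matches zero or more times, and matches as many times as possible
What does "as many times as possible" mean? The match keeps extending as long as the following characters still fit the item before the * (here \d)
Note that zero repetitions are allowed
import re
a = "ac688abc78acc"
b = re.findall(r"ac\d*", a)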
print(b)
['ac688', 'ac']
+ ----- matches one or more times
Note that zero repetitions are not allowed
import re
a = "ac688abc78acc"
b = re.findall(r"ac\d+", a)
print(b)
['ac688']
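The difference between * and + shows up only when there are zero repetitions. A small sketch with re.fullmatch(), which requires the whole string to match; the sample list is illustrative.

```python
import re

samples = ["ac", "ac6", "ac688"]

# \d* accepts zero digits, so plain "ac" passes.
star_ok = [s for s in samples if re.fullmatch(r"ac\d*", s)]

# \d+ needs at least one digit, so plain "ac" is rejected.
plus_ok = [s for s in samples if re.fullmatch(r"ac\d+", s)]

print(star_ok)  # ['ac', 'ac6', 'ac688']
print(plus_ok)  # ['ac6', 'ac688']
```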
. ----- matches any character (except the newline \n)
import re
a = "ac688\nbc78acc"
b = re.findall(".", a)
print(b)
['a', 'c', '6', '8', '8', 'b', 'c', '7', '8', 'a', 'c', 'c']
import re
a = "ac688\nbc78acc"
b = re.findall("..", a)
print(b)
['ac', '68', 'bc', '78', 'ac']
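If the dot should match the newline as well, the re.DOTALL flag can be passed. A minimal sketch using the same string as above:

```python
import re

text = "ac688\nbc78acc"

# Default: "." refuses to match "\n", so nothing bridges the line break.
default = re.findall(r"8.b", text)

# re.DOTALL: "." matches every character, including "\n".
dotall = re.findall(r"8.b", text, re.DOTALL)

print(default)  # []
print(dotall)   # ['8\nb']
```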
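? ----- matches zero or one time. It is also used to switch a quantifier from greedy mode (match as much as possible) to non-greedy mode (match as little as possible)
import re
a = "ac688bc78acc"
b = re.findall(r"ac\d?", a)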
print(b)
['ac6', 'ac']
Greedy mode: match as much as possible. This is the default for a quantifier, such as the * after \d here
import re
a = "ac688bc78acc"
b = re.findall(r"ac\d*", a)
print(b)
['ac688', 'ac']
Non-greedy mode: match as little as possible. Add ? after the quantifier, e.g. \d*?
import re
a = "ac688bc78acc"
b = re.findall(r"ac\d*?", a)
print(b)
['ac', 'ac']
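Greedy and non-greedy matching differ most visibly when a delimiter appears more than once. A rough sketch on a made-up tag-like string (not a recommendation to parse HTML with regular expressions):

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST ">" in the string.
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the FIRST ">" it can.
lazy = re.findall(r"<.*?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```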
{m,n} ----- repeats at least m times and at most n times (append ? to match as few times as possible)
import re
a = "ac688bc78acc"
b = re.findall(r"ac\d{0,2}", a)
print(b)
['ac68', 'ac']
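The effect of appending ? to {m,n} can be sketched on the same string. The bounds 1 and 3 are chosen only for illustration.

```python
import re

text = "ac688bc78acc"

# Greedy: take up to 3 digits when they are available.
greedy = re.findall(r"ac\d{1,3}", text)

# Non-greedy: stop as soon as the minimum of 1 digit is reached.
lazy = re.findall(r"ac\d{1,3}?", text)

print(greedy)  # ['ac688']
print(lazy)    # ['ac6']
```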
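Compiling regular expressions
In the re module, a pattern can be compiled with the compile() method
compile() compiles a regular expression into a pattern object, whose methods such as findall(), match() and search() can then be called
import re
a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
print(ruleCompile)
aCompile = ruleCompile.findall(a)
print(aCompile)
re.compile('abc\\d*')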
['abc123', 'abc555555555']
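Compiling pays off mainly when the same pattern is applied many times. A small sketch, assuming a hypothetical list of input lines:

```python
import re

# Compile once, reuse for every line.
abc_digits = re.compile(r"abc\d*")

lines = ["abc123hjk788", "no match here", "abc555abc7"]
results = [abc_digits.findall(line) for line in lines]

print(results)  # [['abc123'], [], ['abc555', 'abc7']]
```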
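Regular expressions offer other methods as well
findall: finds every substring that matches the pattern and returns them as a list
search: finds the first position where the pattern matches
import re
a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
print(ruleCompile)
aCompile = ruleCompile.search(a)
print(aCompile)
re.compile('abc\\d*')
<re.Match object; span=(0, 6), match='abc123'>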
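match: checks whether the pattern matches at the very beginning of the string
import re
a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
print(ruleCompile)
aCompile = ruleCompile.match(a)
print(aCompile)
re.compile('abc\\d*')
<re.Match object; span=(0, 6), match='abc123'>
Both methods return a Match object, or None when nothing matches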
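In the examples above the target happens to sit at the start of the string, so search() and match() give the same result. A sketch with a made-up string where they differ:

```python
import re

pattern = re.compile(r"abc\d*")
text = "hjk788abc555"

found = pattern.search(text)    # scans the whole string
at_start = pattern.match(text)  # only tries position 0

# Both can return None, so check before calling group().
print(found.group() if found else None)  # abc555
print(at_start)                          # None
```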
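Match object operations
group() returns the matched string
start() returns the position where the match starts
end() returns the position where the match ends
span() returns (start, end) as a tuple
import re
a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
aCompile = ruleCompile.search(a)
print(aCompile.group())
print(aCompile.start())
print(aCompile.end())
print(aCompile.span())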
abc123
0
6
(0, 6)
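The positions reported by a Match object are ordinary string indices, with the end index exclusive. A short sketch; the pattern hjk\d+ is chosen only so that the match does not start at 0.

```python
import re

text = "abc123hjk788abc555555555"
m = re.search(r"hjk\d+", text)

if m is not None:
    start, end = m.span()
    # Slicing with span() gives back exactly what group() returns.
    sliced = text[start:end]
    print(m.group(), start, end)  # hjk788 6 12
    print(sliced == m.group())    # True
```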
1. findall()
2. sub(pattern, replacement, target string)
import re
a = "abc123hjk788abc555555555"
x = re.sub("abc", "xxx", a)
print(x)
xxx123hjk788xxx555555555
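The first argument of sub() is a full regular expression, and the optional count argument limits how many replacements are made. A minimal sketch on the same string:

```python
import re

text = "abc123hjk788abc555555555"

# Replace every run of digits with "#".
masked = re.sub(r"\d+", "#", text)

# count=1 replaces only the first match.
first_only = re.sub(r"\d+", "#", text, count=1)

print(masked)      # abc#hjk#abc#
print(first_only)  # abc#hjk788abc555555555
```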
3. subn(pattern, replacement, target string), which returns a tuple (new string, number of replacements)
import re
a = "abc123hjk788abc5abc5"
x = re.subn("abc", "xxx", a)
print(x)
('xxx123hjk788xxx5xxx5', 3)
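Since subn() returns a tuple, it is convenient to unpack it straight away. A small sketch that strips the digits and reports how many runs were removed:

```python
import re

text = "abc123hjk788abc5abc5"

# Unpack the (new string, number of replacements) tuple.
stripped, count = re.subn(r"\d+", "", text)

print(stripped)  # abchjkabcabc
print(count)     # 4
```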
4. split(separator pattern, string)
import re
a = "abc123hjk788abc5abc5"
x = re.split("abc", a)
print(x)
['', '123hjk788', '5', '5']
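Unlike str.split(), the separator of re.split() is a pattern, so several delimiters can be handled at once. A rough sketch: the first input string is made up, the second is the one used above.

```python
import re

# Split on any run of commas, semicolons or whitespace.
fields = re.split(r"[;,\s]+", "a, b;c  d")

# maxsplit limits the number of splits performed.
head_tail = re.split("abc", "abc123hjk788abc5abc5", maxsplit=1)

print(fields)     # ['a', 'b', 'c', 'd']
print(head_tail)  # ['', '123hjk788abc5abc5']
```

The leading empty string appears because the input starts with the separator.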