Regular Expressions in Python

Time: 2025-02-11 22:13:58

Regular expressions are built into Python as a standard-library module; to use them, import the re module

Importing the re module

import re

findall() syntax:

re.findall(pattern, string) returns a list of every non-overlapping match

import re

a = "123abc456deg"
b = re.findall("abc", a)
print(b)

['abc']
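
As a minimal sketch of how the return value behaves (the extra patterns below are illustrative and not from the original example): findall returns an empty list when nothing matches, and when the pattern contains one capturing group it returns the text of that group instead of the whole match.

```python
import re

text = "123abc456deg"

# No match: the result is an empty list, not None
no_match = re.findall("xyz", text)
print(no_match)

# Every run of digits in the string
digits = re.findall(r"\d+", text)
print(digits)

# With one capturing group, findall returns the group's text only
letters_after_digits = re.findall(r"\d+([a-z]+)", text)
print(letters_after_digits)
```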

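Metacharacters

1. \
  • \d ----- matches a decimal digit
  • \s ----- matches a whitespace character
  • \w ----- matches letters, digits, and the underscore, including letters from other languages (Unicode)
2. . ----- matches any character except the newline \n
3. ^ ----- matches at the start, usually the start of a line; for example, "starts with 1" is written ^1
4. $ ----- matches at the end, usually the end of a line
5. [] ----- specifies a character set or range
6. * ----- matches zero or more times, as many as possible
7. + ----- matches one or more times
8. ? ----- matches zero or one time; appended to another quantifier it switches from greedy mode (match as much as possible) to non-greedy mode (match as little as possible)
9. {m,n} ----- repeats at least m and at most n times (append ? to match as few as possible)
10. {n} ----- repeats exactly n times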

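[] ----- specifies a character set or range

import re

a = "aac;abc;acc-"
rule = "a[a,b,c]c"   # the commas are literal members of the set, not separators
b = re.findall(rule, a)
print(b)
a = "aac8abc9acc-"
rule = "a[a-z][a-z][0-9][a-z][a-z]c"
b = re.findall(rule, a)
print(b)
a = "a?.bac"
rule = "a[?,.][?,.]b"
b = re.findall(rule, a)
print(b)

['aac', 'abc', 'acc']
['aac8abc']
['a?.b']

Inside [ ], most metacharacters lose their special meaning and are treated as ordinary characters; the exceptions are ^ as the first character (negation), - between two characters (range), ] and \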
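
The rule above can be checked with a short sketch. The sample string is made up for illustration; it shows a dot being literal inside a class, a leading ^ negating the class, and a trailing hyphen being literal.

```python
import re

sample = "a.c abc a^c a-c"

# Inside a class the dot is literal, so only "a.c" matches
dot_literal = re.findall(r"a[.]c", sample)
print(dot_literal)

# A leading ^ negates the class: any single character except b
not_b = re.findall(r"a[^b]c", sample)
print(not_b)

# An escaped ^ and a hyphen placed last are both literal characters
caret_or_hyphen = re.findall(r"a[\^-]c", sample)
print(caret_or_hyphen)
```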

^ ----- matches at the start, usually the start of a line; for example, "starts with 1" is written ^1

import re

a = "aac156abc78acc4"
rule = "^aac15"
b = re.findall(rule, a)
print(b)

['aac15']
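
By default ^ anchors only at the very start of the string. A small sketch, assuming a made-up multi-line input, shows how the re.MULTILINE flag makes it match at the start of every line:

```python
import re

lines = "aac15\nbbc\naac99"

# Default: ^ matches only at the beginning of the whole string
start_only = re.findall(r"^aac\d+", lines)
print(start_only)

# re.MULTILINE: ^ also matches immediately after each newline
each_line = re.findall(r"^aac\d+", lines, flags=re.MULTILINE)
print(each_line)
```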

$ ----- matches at the end, usually the end of a line

import re

a = "ac156abc78acc"
b = re.findall("acc$", a)
print(b)

['acc']
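
The same distinction applies to $. In this sketch (the input text is an assumption chosen for illustration) the string does not end with "acc", so the default mode finds nothing, while re.MULTILINE lets $ match at the end of each line:

```python
import re

log = "ac156acc\nabc78acc\nxyz"

# Default: $ matches only at the end of the whole string
end_only = re.findall(r"acc$", log)
print(end_only)

# re.MULTILINE: $ also matches just before each newline
each_line_end = re.findall(r"acc$", log, flags=re.MULTILINE)
print(each_line_end)
```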

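\

  • \d ----- matches a decimal digit
import re

a = "ac156abc78acc"
b = re.findall(r"ac\d\d\da", a)   # raw string keeps the backslashes intact
print(b)

['ac156a']

  • \s ----- matches a whitespace character
import re

a = "ac     6abc78acc"
b = re.findall(r"\s\s", a)
print(b)

['  ', '  ']

  • \w ----- matches letters, digits, and the underscore, including letters from other languages (Unicode)
import re

a = "ac6abc78acc"
b = re.findall(r"\w\w", a)
print(b)

['ac', '6a', 'bc', '78', 'ac']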
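
A brief sketch of two related points, using a made-up sample string: \w is Unicode-aware by default for str patterns and can be limited to ASCII with the re.ASCII flag, and each class has an uppercase complement (\D, \S, \W) that matches everything the lowercase form does not.

```python
import re

text = "ab_1 中文"

# Default (Unicode) mode: \w also matches non-ASCII letters
unicode_words = re.findall(r"\w+", text)
print(unicode_words)

# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
print(ascii_words)

# \D is the complement of \d: runs of non-digit characters
non_digits = re.findall(r"\D+", "ac156abc78")
print(non_digits)
```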

The \ character is also the escape character: it turns a metacharacter into an ordinary character

import re

a = "ac6^abc78acc"
b = re.findall(r"\^abc", a)
print(b)

['^abc']
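
When the text to search for comes from elsewhere and may contain several metacharacters, escaping each one by hand is error-prone. A minimal sketch using the standard re.escape function (the sample strings are assumptions for illustration):

```python
import re

literal = "1+1=2?"
haystack = "is 1+1=2? yes"

# Unescaped, + and ? act as quantifiers, so the literal text is not found
unescaped = re.findall(literal, haystack)
print(unescaped)

# re.escape backslash-escapes the metacharacters for us
escaped = re.findall(re.escape(literal), haystack)
print(escaped)
```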

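{n} ----- repeats exactly n times

{n} repeats the preceding element exactly n times, which shortens the pattern: \w{2} is equivalent to \w\w

import re

a = "ac6^abc78acc"
b = re.findall(r"\w{2}", a)
print(b)

['ac', 'ab', 'c7', '8a', 'cc']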
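
As one possible use of {n}, the sketch below looks for dates written as four digits, two digits, two digits. The pattern name and the sample sentence are hypothetical; the pattern only checks the shape, not whether the date is valid.

```python
import re

# Hypothetical pattern: YYYY-MM-DD shape only
date_pattern = r"\d{4}-\d{2}-\d{2}"

dates = re.findall(date_pattern, "released 2025-02-11, patched 2025-3-1")
print(dates)  # the second date has one-digit fields, so it is skipped
```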

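* ----- matches zero or more times, as many as possible

What does "as many as possible" mean? The quantifier keeps consuming consecutive characters that match the element before it (here \d) until one no longer matches

Note that it may also match zero times

import re

a = "ac688abc78acc"
b = re.findall(r"ac\d*", a)
print(b)

['ac688', 'ac']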
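
The "zero times" case is easy to overlook. A short sketch with made-up input shows that a pattern ending in * still matches when the repeated element is absent, and that a pattern consisting only of a starred element matches even the empty string:

```python
import re

# b* may match zero b's, so a bare "a" still matches
zero_or_more = re.findall(r"ab*", "a ab abbb")
print(zero_or_more)

# Because zero repetitions are allowed, \d* matches the empty string
empty_ok = re.fullmatch(r"\d*", "") is not None
print(empty_ok)
```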

+ ----- matches one or more times

Note that it cannot match zero times

import re

a = "ac688abc78acc"
b = re.findall(r"ac\d+", a)
print(b)

['ac688']

. ----- matches any character except the newline \n

import re

a = "ac688\nbc78acc"
b = re.findall(".", a)
print(b)

['a', 'c', '6', '8', '8', 'b', 'c', '7', '8', 'a', 'c', 'c']

import re

a = "ac688\nbc78acc"
b = re.findall("..", a)   # the pair "8\n" is skipped because . does not match \n
print(b)

['ac', '68', 'bc', '78', 'ac']
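
If the newline should be matched as well, the standard re.DOTALL flag changes the behaviour of the dot. A minimal sketch on the same string:

```python
import re

a = "ac688\nbc78acc"

# Default: . skips the newline
default_pairs = re.findall("..", a)
print(default_pairs)

# re.DOTALL: . matches every character, including \n
dotall_pairs = re.findall("..", a, flags=re.DOTALL)
print(dotall_pairs)
```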

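? ----- matches zero or one time; appended to another quantifier it selects non-greedy mode (greedy: match as much as possible; non-greedy: match as little as possible)

import re

a = "ac688bc78acc"
b = re.findall(r"ac\d?", a)   # at most one digit after "ac"
print(b)

['ac6', 'ac']

Greedy mode: match as much as possible; this is the default when \d is followed by a quantifier such as *

import re

a = "ac688bc78acc"
b = re.findall(r"ac\d*", a)
print(b)

['ac688', 'ac']

Non-greedy mode: match as little as possible; append ? to the quantifier, as in \d*? (a bare \d? only makes the digit optional)

import re

a = "ac688bc78acc"
b = re.findall(r"ac\d*?", a)   # \d*? takes as few digits as possible, here none
print(b)

['ac', 'ac']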
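
The difference between the two modes is clearer when something has to follow the quantifier. The sketch below uses a made-up HTML-like string: the greedy .* runs to the last >, while the non-greedy .*? stops at the first one.

```python
import re

html = "<b>bold</b><i>it</i>"

# Greedy: .* extends to the last ">" in the string
greedy = re.findall(r"<.*>", html)
print(greedy)

# Non-greedy: .*? stops at the first ">" it can
lazy = re.findall(r"<.*?>", html)
print(lazy)
```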

{m,n} ----- repeats at least m and at most n times (append ? to match as few as possible)

import re

a = "ac688bc78acc"
b = re.findall(r"ac\d{0,2}", a)
print(b)

['ac68', 'ac']
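
A short sketch of the "append ?" remark, using the same string with a lower bound of 1 (chosen here for illustration so that the non-greedy form still has to take one digit):

```python
import re

a = "ac688bc78acc"

# Greedy: take up to three digits
greedy_range = re.findall(r"ac\d{1,3}", a)
print(greedy_range)

# Non-greedy: take only the minimum of one digit
lazy_range = re.findall(r"ac\d{1,3}?", a)
print(lazy_range)
```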

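Compiling regular expressions

The re module provides the compile() method for compiling a pattern

re.compile() compiles a regular expression into a pattern object, whose methods such as match(), search(), and findall() can then be called

import re

a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
print(ruleCompile)
aCompile = ruleCompile.findall(a)
print(aCompile)

re.compile('abc\\d*')
['abc123', 'abc555555555']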
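
Compiling is mostly useful when one pattern is applied to many strings. A minimal sketch, assuming a made-up list of inputs and adding the standard re.IGNORECASE flag to show that flags are fixed at compile time:

```python
import re

# Compile once, with case-insensitive matching
pattern = re.compile(r"abc\d*", flags=re.IGNORECASE)

inputs = ["abc123", "ABC7 abc", "xyz"]

# Reuse the same compiled object for every input
results = [pattern.findall(s) for s in inputs]
print(results)
```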

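Regular expressions offer other methods as well

findall: finds every substring that matches the pattern and returns them as a list

search: finds the first position where the pattern matches

import re

a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
print(ruleCompile)
aCompile = ruleCompile.search(a)
print(aCompile)

re.compile('abc\\d*')
<re.Match object; span=(0, 6), match='abc123'>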

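match: checks whether the pattern matches at the very beginning of the string

import re

a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
print(ruleCompile)
aCompile = ruleCompile.match(a)
print(aCompile)

re.compile('abc\\d*')
<re.Match object; span=(0, 6), match='abc123'>

Both search() and match() return a Match object, or None when there is no match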
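
In the examples above the string happens to start with "abc", so search() and match() give the same result. The sketch below uses a made-up string where they differ, checks for None before using the result, and adds the standard fullmatch() method for comparison.

```python
import re

pattern = re.compile(r"abc\d*")
text = "hjk788abc555"

# search scans the whole string for the first match
found = pattern.search(text)
search_span = found.span() if found is not None else None
print(search_span)

# match only tries at position 0, so it fails here and returns None
at_start = pattern.match(text)
print(at_start)

# fullmatch succeeds only if the entire string matches
whole = pattern.fullmatch("abc555")
print(whole is not None)
```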

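Match object operations

group() returns the substring that matched the pattern

start() returns the position where the match starts

end() returns the position where the match ends

span() returns the (start, end) positions as a tuple

import re

a = "abc123hjk788abc555555555"
rule = r"abc\d*"
ruleCompile = re.compile(rule)
aCompile = ruleCompile.match(a)
print(aCompile.group())
print(aCompile.start())
print(aCompile.end())
print(aCompile.span())

abc123
0
6
(0, 6)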
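
These methods also accept a group number or name when the pattern contains capturing groups. A minimal sketch, where the group names word and num are hypothetical labels chosen for this example:

```python
import re

# Two named groups: a run of letters followed by a run of digits
m = re.search(r"(?P<word>[a-z]+)(?P<num>\d+)", "abc123hjk788")

whole = m.group(0)         # the entire match
word = m.group("word")     # text of the first group
num = m.group("num")       # text of the second group
both = m.groups()          # all groups as a tuple
num_span = m.span("num")   # position of the second group

print(whole, word, num, both, num_span)
```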

1. findall()

2. sub(pattern to replace, replacement string, target string)

import re

a = "abc123hjk788abc555555555"
x = re.sub("abc", "xxx", a)
print(x)

xxx123hjk788xxx555555555
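
re.sub also takes a count argument, backreferences in the replacement, and a function as the replacement. A brief sketch on a shortened, made-up variant of the string above:

```python
import re

a = "abc123hjk788abc555"

# count=1 replaces only the first occurrence
first_only = re.sub("abc", "xxx", a, count=1)
print(first_only)

# \1 and \2 refer to the captured groups: swap letters and digits
swapped = re.sub(r"([a-z]+)(\d+)", r"\2\1", a)
print(swapped)

# A function receives each Match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), a)
print(doubled)
```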

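3. subn(pattern to replace, replacement string, target string); like sub(), but returns a tuple of (new string, number of replacements)

import re

a = "abc123hjk788abc5abc5"
x = re.subn("abc", "xxx", a)
print(x)

('xxx123hjk788xxx5xxx5', 3)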

4. split(separator pattern, string)

import re

a = "abc123hjk788abc5abc5"
x = re.split("abc", a)
print(x)

['', '123hjk788', '5', '5']
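
A short sketch of a few re.split variations: limiting the number of splits with maxsplit, keeping the separators by wrapping the pattern in a capturing group, and splitting on a character class (the last input string is made up for illustration).

```python
import re

a = "abc123hjk788abc5abc5"

# maxsplit=1 splits at the first separator only
limited = re.split("abc", a, maxsplit=1)
print(limited)

# A capturing group keeps the separators in the result
with_sep = re.split("(abc)", a)
print(with_sep)

# Split on any run of commas, semicolons, or whitespace
fields = re.split(r"[,;\s]+", "a, b;c  d")
print(fields)
```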