excel支持正则表达式提取字符函数(支持RegExp捕获分组)

时间:2022-04-19 18:50:01

一、要让excel脚本支持Microsoft VBScript Regular Expressions 5.5 ,按快捷键alt+F11,出现下图界面,操作如图示:

excel支持正则表达式提取字符函数(支持RegExp捕获分组)

excel支持正则表达式提取字符函数(支持RegExp捕获分组)

二.添加VBA代码:

excel支持正则表达式提取字符函数(支持RegExp捕获分组)

excel支持正则表达式提取字符函数(支持RegExp捕获分组)

 代码添加完毕后,关闭该窗口。

 

 

Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer

With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With

Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber

If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function

 VBA脚本代码参数说明:

  1. A text to use the regular expression on.(第一个参数为被应用的字符串,即要从中提取的长字符串)
  2. A regular expression.(第二个参数为匹配的正则表达式,外侧需要加“”,支持捕获分组)
  3. A format string specifying how the result should look. It can contain $0$1$2, and so on. $0 is the entire match, $1 and up correspond to the respective match groups in the regular expression. Defaults to $0.(第三个参数为要捕获字符的字符分组,“$0”表示全部匹配捕获分组,“$1”,"$2"......表示捕获分组序号)

函数调用示例:

 

=regex("Peter Gordon: some@email.com, 47", "^(.+): (.+), (\d+)$", "$1")
结果:
Peter Gordon

  

  教程翻译自:https://*.com/a/28176749/2109599