I have been trying to create a regular expression pattern that matches any reference in any Excel formula, including absolute, relative, and external references. I need to return the entire reference, including the worksheet and workbook name.
I haven't been able to find exhaustive documentation about Excel A1-notation, but with a lot of testing I have determined the following:
- Formulas begin with an equal sign "="
- Strings within formulas are enclosed in double quotes and need to be removed before looking for real references; otherwise =A1&"A1" would break the regex
- Worksheet names can be up to 31 characters long and cannot contain \ / ? * [ ] :
- Worksheet names in external references must be followed by an exclamation mark (bang): =Sheet1!A1
- Workbook names in external references must be enclosed in square brackets: =[Book1.xlsx]Sheet1!A1
- Workbook paths, which Excel adds if a reference is to a range in a closed workbook, are always enclosed in single quotes and placed to the left of the brackets for the workbook name: 'C:\[Book1.xlsx]Sheet1'!A1
- Some characters (non-breaking space, for example) cause Excel to enclose the workbook and worksheet name in an external reference in single quotes, but I don't know specifically which characters: ='[Book 1.xlsx]Sheet 1'!A1
- Even if R1C1-notation is enabled, Range.Formula still returns references in A1-notation. Range.FormulaR1C1 returns references in R1C1-notation.
- The 3D reference style allows a range of sheet names in one workbook: =SUM([Book5]Sheet1:Sheet3!A1)
- Named ranges can be specified in formulas:
- The first character of a name must be a letter, an underscore character (_), or a backslash (\). Remaining characters in the name can be letters, numbers, periods, and underscore characters.
- You cannot use the uppercase and lowercase characters "C", "c", "R", or "r" as a defined name, because they are all used as a shorthand for selecting a row or column for the currently selected cell when you enter them in a Name or Go To text box.
- Names cannot be the same as a cell reference, such as Z$100 or R1C1.
- Spaces are not allowed as part of a name.
- A name can be up to 255 characters in length.
- Names can contain uppercase and lowercase letters. Excel does not distinguish between uppercase and lowercase characters in names.
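The naming rules listed above can be turned into small validators, which is a convenient way to sanity-check them one at a time. The following is a minimal sketch in Python rather than VBA (so it can be run outside Excel). The function names are hypothetical, "letter" is assumed to mean an ASCII letter, and the cell-reference check reuses the 1-3 letter / 1-7 digit bounds from the question rather than Excel's real grid limits, so it is stricter than Excel itself.

```python
import re

# Characters the list above says a worksheet name may not contain.
_SHEET_FORBIDDEN = set('\\/?*[]:')

# First character: letter, underscore, or backslash; the rest: letters,
# digits, periods, underscores; 255 characters at most.
# Assumption: only ASCII letters are treated as letters here.
_NAME_RE = re.compile(r'[A-Za-z_\\][A-Za-z0-9_.]{0,254}')

# Approximate shapes of cell references (assumed bounds, not Excel's grid limits).
_A1_RE = re.compile(r'[A-Za-z]{1,3}[0-9]{1,7}')
_R1C1_RE = re.compile(r'[Rr][0-9]+[Cc][0-9]+')


def is_valid_sheet_name(name: str) -> bool:
    """1 to 31 characters, none of them \\ / ? * [ ] :"""
    return 1 <= len(name) <= 31 and not (_SHEET_FORBIDDEN & set(name))


def is_valid_defined_name(name: str) -> bool:
    """Apply the defined-name rules from the list above."""
    if not _NAME_RE.fullmatch(name):
        return False                      # bad characters, spaces, or too long
    if name.upper() in ('C', 'R'):
        return False                      # reserved row/column shorthand
    if _A1_RE.fullmatch(name) or _R1C1_RE.fullmatch(name):
        return False                      # looks like a cell reference
    return True
```

For instance, `is_valid_defined_name("_total.2010")` passes every rule, while `"R1C1"`, `"my name"`, and `"r"` are each rejected by a different rule.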
Here is what I came up with wrapped in a VBA procedure for testing. I updated the code to handle names as well:
' Requires a reference to "Microsoft VBScript Regular Expressions 5.5".
Sub ReturnFormulaReferences()
    ' One worksheet name: 1 to 31 characters, none of : \ / ? * [ ]
    Const SHEET_NAME As String = "[^\:\\\/\?\*\[\]]{1,31}"
    ' Sheet qualifier ending in "!": quoted (optional path and [workbook]) or unquoted,
    ' each with an optional "FirstSheet:" part for 3D references
    Const SHEET_PREFIX As String = "(\'.*(\[.*\])?(" & SHEET_NAME & "\:)?" & SHEET_NAME & "\'\!" _
        & "|(\[.*\])?(" & SHEET_NAME & "\:)?" & SHEET_NAME & "\!)"
    ' Cell, cell range, whole-column range, or whole-row range
    Const A1_REF As String = "(\$?[a-z]{1,3}\$?[0-9]{1,7}(\:\$?[a-z]{1,3}\$?[0-9]{1,7})?" _
        & "|\$[a-z]{1,3}\:\$[a-z]{1,3}" _
        & "|[a-z]{1,3}\:[a-z]{1,3}" _
        & "|\$[0-9]{1,7}\:\$[0-9]{1,7}" _
        & "|[0-9]{1,7}\:[0-9]{1,7})"
    ' Defined name
    Const DEFINED_NAME As String = "[a-z_\\][a-z0-9_\.]{0,254}"

    Dim objRegExp As New VBScript_RegExp_55.RegExp
    Dim objCell As Range
    Dim objStringMatches As Object
    Dim objReferenceMatches As Object
    Dim objMatch As Object
    Dim lngIndex As Long
    Dim strReference As String
    Dim booIsReference As Boolean
    Dim objName As Name
    Dim booNameFound As Boolean

    With objRegExp
        .MultiLine = True
        .Global = True
        .IgnoreCase = True
    End With

    For Each objCell In Selection.Cells
        If Left(objCell.Formula, 1) = "=" Then
            ' Locate string literals one at a time; the greedy \".*\" would swallow
            ' everything between the first and the last quote in the formula
            objRegExp.Pattern = """[^""]*"""
            Set objStringMatches = objRegExp.Execute(objCell.Formula)

            ' Candidate references: A1-style references or names, optionally sheet-qualified
            objRegExp.Pattern = SHEET_PREFIX & "?(" & A1_REF & "|" & DEFINED_NAME & ")"
            Set objReferenceMatches = objRegExp.Execute(objCell.Formula)

            Debug.Print objCell.Formula
            For lngIndex = objReferenceMatches.Count - 1 To 0 Step -1
                strReference = objReferenceMatches(lngIndex).Value
                booIsReference = True

                ' Discard candidates that start inside a string literal
                For Each objMatch In objStringMatches
                    If objReferenceMatches(lngIndex).FirstIndex > objMatch.FirstIndex _
                        And objReferenceMatches(lngIndex).FirstIndex < objMatch.FirstIndex + objMatch.Length Then
                        booIsReference = False
                        Exit For
                    End If
                Next

                If booIsReference Then
                    ' Anchored so that a name such as Sales2010 is not taken for a cell reference
                    objRegExp.Pattern = "^" & SHEET_PREFIX & "?" & A1_REF & "$"
                    If Not objRegExp.Test(strReference) Then 'reference is not A1
                        objRegExp.Pattern = "^" & SHEET_PREFIX & DEFINED_NAME & "$"
                        If Not objRegExp.Test(strReference) Then 'name is not external
                            booNameFound = False
                            ' Workbook-level names; Excel ignores case when comparing names
                            For Each objName In objCell.Worksheet.Parent.Names
                                If StrComp(strReference, objName.Name, vbTextCompare) = 0 Then
                                    booNameFound = True
                                    Exit For
                                End If
                            Next
                            If Not booNameFound Then
                                ' Worksheet-level names carry a sheet qualifier; strip it before comparing
                                objRegExp.Pattern = "^" & SHEET_PREFIX
                                For Each objName In objCell.Worksheet.Names
                                    If StrComp(strReference, objRegExp.Replace(objName.Name, ""), vbTextCompare) = 0 Then
                                        booNameFound = True
                                        Exit For
                                    End If
                                Next
                            End If
                            booIsReference = booNameFound
                        End If
                    End If
                End If

                If booIsReference Then
                    Debug.Print " " & strReference _
                        & " (" & objReferenceMatches(lngIndex).FirstIndex & ", " _
                        & objReferenceMatches(lngIndex).Length & ")"
                End If
            Next lngIndex
            Debug.Print
        End If
    Next

    Set objRegExp = Nothing
    Set objStringMatches = Nothing
    Set objReferenceMatches = Nothing
    Set objMatch = Nothing
    Set objCell = Nothing
    Set objName = Nothing
End Sub
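The core idea of the procedure above (find the string literals, find the candidate references, then discard every candidate that starts inside a literal) is easier to experiment with outside Excel. Here is a minimal sketch of that filtering step in Python. The patterns are deliberately simplified assumptions, much narrower than the VBA pattern: they cover only cells and cell ranges with an optional sheet qualifier, and `find_references` is a hypothetical name.

```python
import re

# Simplified patterns (assumptions; narrower than the VBA pattern above).
STRING_RE = re.compile(r'"[^"]*"')              # one string literal at a time
REF_RE = re.compile(
    r"(?:'[^']+'!|[A-Za-z0-9_.\[\]]+!)?"        # optional sheet qualifier
    r"\$?[A-Za-z]{1,3}\$?[0-9]{1,7}"            # first cell
    r"(?::\$?[A-Za-z]{1,3}\$?[0-9]{1,7})?"      # optional end of range
)


def find_references(formula):
    """Return (text, start, length) for each reference outside a string literal."""
    string_spans = [m.span() for m in STRING_RE.finditer(formula)]
    refs = []
    for m in REF_RE.finditer(formula):
        # Skip candidates whose first character lies inside a string literal.
        inside = any(start <= m.start() < end for start, end in string_spans)
        if not inside:
            refs.append((m.group(), m.start(), len(m.group())))
    return refs
```

Calling `find_references('=A1&"A1"')` keeps only the first `A1`. One limitation this sketch shares with any purely lexical pattern of this kind: a function name such as LOG10 has the same shape as a cell reference, so it would need to be excluded separately, for example by checking whether the match is followed by an opening parenthesis.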
Can anyone break or improve this? Without exhaustive documentation on Excel's formula syntax it is difficult to know if this is correct.
Thanks!
2 Solutions
#1
3
jtolle steered me in the right direction. As far as I can tell, this is what I was trying to do. I've been testing and it seems to work.
stringOriginFormula = rangeOrigin.Formula
rangeOrigin.Cut rangeDestination
rangeOrigin.Formula = stringOriginFormula
Thanks jtolle!
#2
0
Thanks, Ben. (I'm new to posting here, even though this site has held my attention for years for its high-quality technical content, so I'm not sure I read this page correctly as to who the author is.)
I tried the posted solutions (the test procedure, the updated test procedure, and the one using Range.Precedents, which, as correctly pointed out, does not cover references to other sheets or other workbooks) and found a minor flaw: the external sheet name is enclosed in single quotation marks only if it is a number or if it contains a space (and possibly other characters, as Ben (?) listed in the original post). A simple addition to the regex, an opening "[", corrects this (see the code below). In addition, for my own purposes I converted the sub to a function that returns a comma-separated list with duplicates removed (note that this removes only identical reference notation, not cells that are included in multiple ranges):
Public Function CellReflist(Optional r As Range) ' single cell
    Dim result As Object, objRegEx As Object
    Dim testExpression As String
    Dim i As Long, j As Long
    Dim dbl As Boolean

    If r Is Nothing Then Set r = ActiveCell ' or pass the cell explicitly, e.g. Range("A1")
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.IgnoreCase = True
    objRegEx.Global = True

    ' Remove string literals so their contents are not mistaken for references
    objRegEx.Pattern = """.*?"""
    testExpression = CStr(r.Formula)
    testExpression = objRegEx.Replace(testExpression, "")

    'objRegEx.Pattern = "(([A-Z])+(\d)+)" 'grab the address
    objRegEx.Pattern = "(['\[].*?['!])?([[A-Z0-9_]+[!])?(\$?[A-Z]+\$?(\d)+(:\$?[A-Z]+\$?(\d)+)?|\$?[A-Z]+:\$?[A-Z]+|(\$?[A-Z]+\$?(\d)+))"
    If objRegEx.Test(testExpression) Then
        Set result = objRegEx.Execute(testExpression)
        If result.Count > 0 Then CellReflist = result(0).Value
        For i = 1 To result.Count - 1
            dbl = False ' skip references that already appeared earlier in the list
            For j = 0 To i - 1
                If result(i).Value = result(j).Value Then dbl = True
            Next j
            If Not dbl Then CellReflist = CellReflist & "," & result(i).Value
        Next i
    End If
End Function
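The duplicate removal in the function above compares each match against all earlier ones. As a side-by-side illustration, here is a minimal sketch of the same idea in Python: strip the string literals, collect the matches, and keep the first occurrence of each. It assumes a simplified pattern (unqualified cells and cell ranges only), and `cell_ref_list` is a hypothetical name. As in the VBA version, only identical notation counts as a duplicate, so A1 and A1:B2 are both kept.

```python
import re

# Simplified patterns (assumptions): unqualified cells and cell ranges only.
STRING_RE = re.compile(r'"[^"]*"')
CELL_RE = re.compile(r"\$?[A-Z]+\$?\d+(?::\$?[A-Z]+\$?\d+)?", re.IGNORECASE)


def cell_ref_list(formula):
    """Comma-separated references in order of first appearance, duplicates removed."""
    stripped = STRING_RE.sub("", formula)    # drop string literals first
    refs = CELL_RE.findall(stripped)         # no capturing groups, so full matches
    unique = dict.fromkeys(refs)             # dict keys keep insertion order
    return ",".join(unique)
```

With this sketch, `cell_ref_list('=A1+B2*A1&"C3"+SUM(A1:B2)')` returns `A1,B2,A1:B2`: the repeated A1 is dropped, and C3 is ignored because it sits inside a string literal.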