Most efficient way to split a string on a delimiter while ignoring certain instances of that delimiter, using Excel VBA

Time: 2021-07-23 21:37:20

I have a very redundant piece of old code that I wrote when I first discovered and decided to learn VBA (and man, have I come a long way). I was using it to cycle through cells that contain multiple values delimited with commas. However, in some cases I can't simply use something like the Split(string, ",") function, because some values contain commas themselves (example value: [blah blah, so blah blah]). Every value that contains a comma is wrapped in brackets, and for those cases I devised the rather long-winded approach that is my old method: split the values properly, dump them into an array and then carry on with my other tasks. Now I've decided to revisit the code and improve its accuracy. Here is some background.

Sample Data that can be found in one cell:
Please Note: This is data that suppliers send us; we have no control over what they enter or how they enter it. This is a simple example to show the gist of how the data is typically provided in some cases.

Available on 2 sides: Silkscreen,[full: color, covers entire face],Pad Print: One color,[heat transfer, may bleed]
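For comparison, a plain Split on this sample produces six pieces rather than four, because two of the five commas sit inside brackets. The small sketch below prints the pieces so the breakage is visible; the procedure and function names are illustrative, not from the original post.

```vba
' Illustrative only: what a plain Split() does with the sample cell text.
Public Function NaivePieceCount(ByVal cellText As String) As Long
    Dim parts() As String
    parts = Split(cellText, ",")              ' splits on EVERY comma, bracketed or not
    NaivePieceCount = UBound(parts) - LBound(parts) + 1
End Function

Public Sub DemoNaiveSplit()
    Const SAMPLE As String = "Available on 2 sides: Silkscreen,[full: color, covers entire face]," & _
                             "Pad Print: One color,[heat transfer, may bleed]"
    Dim parts() As String, i As Long
    parts = Split(SAMPLE, ",")
    For i = LBound(parts) To UBound(parts)
        ' "[full: color" and " covers entire face]" come out as separate pieces
        Debug.Print i & ": " & parts(i)
    Next i
    Debug.Print NaivePieceCount(SAMPLE) & " pieces"   ' 6, although the cell holds 4 values
End Sub
```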

Values are:

  • Available on 2 sides: Silkscreen
  • [full: color, covers entire face]
  • Pad Print: One color
  • [heat transfer, may bleed]

What I'm looking for:
I'm looking for a more efficient and simpler method to be able to split up values correctly (while keeping the brackets for the values that have them).

I believe I have managed to create a much more efficient and compact method to handle the instances that don't include brackets, using the following code:

New Code (Under Construction): I don't know how to efficiently and accurately split the cells that contain brackets.

Sub Test()
    Dim rngXid As Range, RegularColons As New Collection, UpchargeColons As New Collection, additionals As Range, upcharges As Range, Colon, UpchargeColon
    Dim Values() As String, endRange As Long, xidMap As Object, xid As String, NumberofValues As Integer, ColorLocation As Long
    endRange = ActiveSheet.Range("A" & Rows.Count).End(xlUp).Row

    Set xidMap = getXidMap(ActiveSheet.Range("A2:A" & ActiveSheet.UsedRange.Rows.Count)) 'Map products for quicker navigation
    Set additionals = ActiveSheet.Range("AJ:AK"): Set upcharges = ActiveSheet.Range("CS:CT")
    Set RegularColons = FindAllMatches(additionals, ":") 'This returns all instances/cells that contain a colon in the specified columns
    If Not RegularColons Is Nothing Then
        For Each Colon In RegularColons
            xid = ActiveSheet.Range("A" & Colon.Row).Value
            If InStr(1, Colon.Value, "[") = 0 Then 'If no brackets then simply split
                Values = Split(Trim(Colon.Value), ",")
            Else
                'This is where I'm at a loss for a more effective method
                '-----------Populate Values array with Colon.Value while watching out for brackets--------
            End If
            Set rngXid = xidMap(xid).EntireRow.Columns(upcharges) 'set to this specific product
            For ColorLocation = LBound(Values) To UBound(Values) 'cycle through each value in Values array
                If Not InStr(1, Values(ColorLocation), ":") = 0 Then 'Only proceed if the value has a colon
                    Set UpchargeColons = FindAllMatches(rngXid, Values(ColorLocation)) 'Searching other columns for this value
                    If Not UpchargeColons Is Nothing Then
                        For Each UpchargeColon In UpchargeColons 'If found in other columns proceed to replace colon
                            UpchargeColon.Value = Replace(UpchargeColon.Value, ":", " ")
                            Log UpchargeColon.Range, "Removed Colon from Additional Color/Location Upcharge", "Corrected" 'This is a custom sub of mine to record the change
                        Next UpchargeColon
                    End If
                    Values(ColorLocation) = Replace(Values(ColorLocation), ":", " ")
                End If
            Next ColorLocation
            Log Colon.Range, "Removed Colon(s) from Additional Color/Location Value(s)", "Corrected"
        Next Colon
    End If
End Sub
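One way to fill the empty Else branch without regex is a single left-to-right scan that treats a comma as a delimiter only when it is not inside brackets. This is a sketch under assumptions: SplitOutsideBrackets is a hypothetical helper name, brackets are assumed to be balanced, and each value is trimmed. It returns a String array, so it can be assigned straight to the Values() As String array.

```vba
' Hypothetical helper: split src on commas that are NOT inside [...] and keep
' the brackets on the values that have them. Assumes balanced brackets.
Public Function SplitOutsideBrackets(ByVal src As String) As String()
    Dim parts() As String
    Dim n As Long, depth As Long, startPos As Long, i As Long

    ReDim parts(0 To Len(src))        ' never more than Len(src) + 1 pieces
    startPos = 1
    For i = 1 To Len(src)
        Select Case Mid$(src, i, 1)
            Case "["
                depth = depth + 1     ' entering a bracketed value
            Case "]"
                If depth > 0 Then depth = depth - 1
            Case ","
                If depth = 0 Then     ' a real delimiter: close off the current value
                    parts(n) = Trim$(Mid$(src, startPos, i - startPos))
                    n = n + 1
                    startPos = i + 1
                End If
        End Select
    Next i

    parts(n) = Trim$(Mid$(src, startPos))   ' text after the last delimiter
    ReDim Preserve parts(0 To n)            ' drop the unused slots
    SplitOutsideBrackets = parts
End Function
```

In the Else branch the call would be Values = SplitOutsideBrackets(Colon.Value). Because the scan also handles cells without brackets, the If/Else could arguably collapse into this one call; counting depth instead of toggling a flag also keeps a nested [ ] pair from flipping the state the wrong way.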

I've been browsing possible ways to do this and the one that keeps sticking out is Regex, which admittedly I have absolutely no experience with although I have previously heard of it. So, I tried reading up on it a bit using sites like this and of course the msdn documentation. My observations/thoughts while trying to learn a little more about this method are:

  1. This is absolutely, terrifyingly complex and overwhelming. So much so that I have to refrain from crawling into a corner, assuming the fetal position, and bawling my eyes out.
  2. I can't seem to find anything that suggests how I could use this function to split the string appropriately, at least as far as I can see. But it may just be that I get so overwhelmed by all the seemingly random symbol sequences.

So, my question is:
What would be the most efficient way to accurately split up the values in cells that contain brackets?

4 solutions

#1


2  

Regular Expressions (aka "regex") are indeed scary-looking, but they're also a powerful tool, and VBA supports them, if you add a reference to the Microsoft VBScript Regular Expressions 5.5 library.

With it you can create a RegExp object whose Execute method gives you a MatchCollection, which is, well, a collection of Match objects.

Here's how to use them:

Sub Test()
    'Requires a reference to Microsoft VBScript Regular Expressions 5.5 (Tools > References)
    Const value As String = _
    "Available on 2 sides: Silkscreen,[full: color, covers entire face],Pad Print: One color,[heat transfer, may bleed]"

    Const pattern As String = _
    "(\[[^\]]+\]|[^,]+)"

    Dim regex As New RegExp
    regex.Global = True
    regex.pattern = pattern

    Dim matches As MatchCollection
    Set matches = regex.Execute(value)

    Dim m As Match
    For Each m In matches
        Debug.Print Trim(m.value) 'value will preserve any leading/trailing spaces
    Next

End Sub

Notice the pattern is essentially the one in Laurel's answer, minus the trailing ,? part:

(\[[^\]]+\]|[^,]+)

Because the pattern never asks for a comma, the commas between values are never part of a match (whether they are there or not); they are simply skipped. Hence the above code outputs this:

Available on 2 sides: Silkscreen
[full: color, covers entire face]
Pad Print: One color
[heat transfer, may bleed]

You can easily iterate a MatchCollection to populate an array if you need one.
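As a sketch of that last step, the function below wraps the same pattern and copies the matches into a zero-based String array. It assumes Windows Excel and uses late binding (CreateObject("VBScript.RegExp")) instead of the Tools > References entry; RegexSplit is an illustrative name, not part of the answer.

```vba
' Illustrative sketch: same pattern as above, late-bound (Windows only),
' returning a zero-based String array of trimmed values.
Public Function RegexSplit(ByVal src As String) As String()
    Dim regex As Object, matches As Object
    Dim result() As String, i As Long

    Set regex = CreateObject("VBScript.RegExp")   ' no library reference needed
    regex.Global = True                           ' collect every match, not just the first
    regex.Pattern = "(\[[^\]]+\]|[^,]+)"          ' a [...] value, or a run of non-commas
    Set matches = regex.Execute(src)              ' a MatchCollection

    If matches.Count = 0 Then
        RegexSplit = Split(vbNullString, ",")     ' empty array (UBound = -1), like Split("")
        Exit Function
    End If

    ReDim result(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        result(i) = Trim$(matches.Item(i).Value)  ' Value keeps surrounding spaces
    Next i
    RegexSplit = result
End Function
```

Late binding gives up IntelliSense and compile-time checking on the RegExp object, but the workbook no longer depends on the reference being set.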

#2


3  

There are other ways, but this regex seems to be pretty fast:

(\[[^\]]+\]|[^,]+),?

Explanation:

\[ and \] are escaped versions of [ and ].

Essentially, it looks for a \[, then takes every character that is not a closing bracket ([^\]]+), then the \]. Otherwise (|), it takes a run of characters that are not commas ([^,]+). The surrounding () makes the whole thing a capture group, and ,? means there may or may not be a comma after it.
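One practical consequence of the trailing ,? is that Match.Value then includes the comma that follows each value, while the capture group does not. A minimal sketch, assuming late binding and a hypothetical function name GroupValues, reads each value from SubMatches(0) instead:

```vba
' Hypothetical helper: collect values via capture group 1 when the pattern ends in ",?".
Public Function GroupValues(ByVal src As String) As Collection
    Dim regex As Object, m As Object
    Dim result As New Collection

    Set regex = CreateObject("VBScript.RegExp")
    regex.Global = True
    regex.Pattern = "(\[[^\]]+\]|[^,]+),?"
    For Each m In regex.Execute(src)
        ' m.Value would be e.g. "[b, c]," (trailing comma included);
        ' m.SubMatches(0) is just capture group 1: "[b, c]"
        result.Add Trim$(m.SubMatches(0))
    Next m
    Set GroupValues = result
End Function
```

Without the ,? the comma is never consumed, so Match.Value and the capture group hold the same text.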

#3


3  

One way is to take the bracketed commas and replace them with Chr(184) characters. These little guys look a lot like commas.

Once the bracketed commas have been replaced, you can use a normal Split(). Here is some code to do the replacement:

Sub parser()
    Dim s As String, s1 As String, s2 As String, pseudo As String
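    Dim switch As Boolean, temp As String, CH As String, i As Long

    pseudo = Chr(184)            'a cedilla in Western code pages; looks like a comma
    s1 = "["
    s2 = "]"
    s = [A1]                     'text to parse is read from cell A1
    switch = False               'True while inside [...]
    temp = ""

    For i = 1 To Len(s)
        CH = Mid(s, i, 1)
        If CH = s1 Or CH = s2 Then switch = Not switch   'entering or leaving a bracketed value
        If switch Then CH = Replace(CH, ",", pseudo)     'mask commas inside brackets only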
        temp = temp & CH
    Next i

    Range("A2").Value = temp
    MsgBox s & vbCrLf & temp
End Sub
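The procedure above only produces the masked string in A2. A sketch of the remaining steps, assuming Chr(184) never occurs in the supplier data: mask the bracketed commas, Split on the real commas, then put the commas back in each piece. SplitWithPlaceholder is an illustrative name, and it swaps the on/off toggle for a depth counter so a nested [ ] pair cannot flip the state the wrong way.

```vba
' Illustrative sketch of the placeholder technique as a reusable function.
' Assumes Chr(184) never appears in the real data.
Public Function SplitWithPlaceholder(ByVal s As String) As String()
    Dim masked As String, parts() As String
    Dim i As Long, depth As Long, pseudo As String

    pseudo = Chr$(184)
    masked = s
    For i = 1 To Len(masked)
        Select Case Mid$(masked, i, 1)
            Case "["
                depth = depth + 1
            Case "]"
                If depth > 0 Then depth = depth - 1
            Case ","
                ' Mask only the commas that sit inside brackets
                If depth > 0 Then Mid$(masked, i, 1) = pseudo
        End Select
    Next i

    parts = Split(masked, ",")                      ' only top-level commas remain
    For i = LBound(parts) To UBound(parts)
        parts(i) = Replace(parts(i), pseudo, ",")   ' restore the bracketed commas
    Next i
    SplitWithPlaceholder = parts
End Function
```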

#4


1  

Function Splitter(ByVal s As String) As Variant
    'Splits s on commas that are not inside [...]; c tracks the bracket depth
    Dim p As Long, c As Long, l As String, arr

    If InStr(s, "[") = 0 Then
        arr = Split(s, ",")
    Else
        c = 0
        For p = 1 To Len(s)
            l = Mid(s, p, 1)
            If l = "," And c = 0 Then
                Mid(s, p, 1) = vbNullChar        'mark a top-level comma with Chr(0)
            Else
                If l = "[" Then c = c + 1
                If l = "]" Then c = c - 1
            End If
        Next p
        arr = Split(s, vbNullChar)
    End If
    Splitter = arr
End Function
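A note on the placeholder character: it has to be something that cannot appear in the data, and vbNull is not one. vbNull is the VarType constant for Null (value 1), so used as a string it becomes "1" and Split(s, vbNull) splits on the digit 1; vbNullChar (Chr(0)) is the actual null character. A small illustrative check (the procedure name is made up):

```vba
' Illustrative check of the two similarly named constants.
Public Sub ShowPlaceholderConstants()
    Debug.Print "vbNull as text: " & CStr(vbNull)        ' 1 (a VarType constant, not a character)
    Debug.Print "vbNullChar code: " & Asc(vbNullChar)    ' 0 (the null character)
    ' Splitting on vbNull really splits on "1": pieces are "Size " and ",Size 2"
    Debug.Print UBound(Split("Size 1,Size 2", vbNull))   ' 1
End Sub
```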
