识别以逗号分隔的字符串的新增内容

时间:2021-11-15 16:57:13

I have an excel spreadsheet with 50 rows of comma-delimited data. The number of features contained within the comma-delimited data increases from bottom to top i.e row 50 (the last row) always has the fewest delimiters, and row 1 (the first row) always has the most delimiters. The number of features increases randomly, and each feature can be either unique or duplicated. Either multiple or single features can be added to the string on each row. The features are placed randomly into the previous rows' comma-delimited string i.e they can be placed into the middle of the string on the previous row, or at the beginning or the end of the previous string. If there are multiple added to a row, they may not be placed together. For example:

我有一个包含50行逗号分隔数据的Excel电子表格。逗号分隔数据中包含的特征数量从下到上增加,即第50行(最后一行)总是具有最少的分隔符,第1行(第一行)总是具有最多分隔符。功能的数量随机增加,每个功能可以是唯一的或重复的。可以将多个或单个要素添加到每行的字符串中。这些特征随机放入前面行的逗号分隔字符串中,即它们可以放在前一行的字符串中间,或者放在前一个字符串的开头或结尾。如果有多个添加到一行,它们可能不会放在一起。例如:

1  fish,pig,cat,dog,fish,mouse,fish,cow
2  pig,cat,dog,fish,mouse,fish
3  pig,cat,dog,fish,mouse
4  pig,cat,dog,mouse
5  pig,cat,dog,mouse
6  cat,dog,mouse
7  cat,mouse
8  cat,mouse
9  cat 
10 

I need to extract the feature(s) that have been added to the comma-delimited string on each row, preferably using a UDF. The desired output from the above example would be:

我需要提取已添加到每行上逗号分隔字符串的功能,最好使用UDF。上例中的所需输出为:

1  fish,cow
2  fish
3  fish
4  
5  pig
6  dog
7  
8  mouse
9  cat
10 

I have had some success using a UDF that compares adjacent rows, and extracts any unique values between the two rows in an adjacent column (i.e if the UDF is used on rows 4 and 5 in B4, B4 will be blank; however, if the UDF is used on rows 3 and 4 in B3, B3 will have the value "fish") . However, this causes problems since some of the features are duplicated (see rows 1 and 2 in the above example). This results in the UDF returning a blank value when a duplicate has been added to the string.

我使用比较相邻行的UDF取得了一些成功,并提取了相邻列中两行之间的任何唯一值(即如果在B4中的行4和5上使用UDF,则B4将为空;但是,如果UDF用于B3中的第3行和第4行,B3将具有值“fish”)。但是,这会导致问题,因为某些功能是重复的(请参阅上例中的第1行和第2行)。这会导致UDF在将重复项添加到字符串时返回空值。

I have had most success with these (very slightly adjusted) UDFs that I found on stack exchange, particularly the former:

我在堆栈交换中发现的这些(非常轻微调整的)UDF取得了最大的成功,尤其是前者:

Function NotThere(BaseText As String, TestText As String) As String
  Dim V As Variant, BaseWords() As String
  NotThere = "" & TestText & ","
  For Each V In Split(BaseText, ",")
    NotThere = Replace(NotThere, V & ",", ",")
  Next
  NotThere = Mid(Application.Trim(NotThere), 3, Len(NotThere) - 0)
End Function

and

Function Dups(R1 As String, R2 As String) As String
    Dim nstr As String, R As Variant
        For Each R In Split(R2, ",")
            If InStr(R1, Trim(R)) = 0 Then
                nstr = nstr & IIf(nstr = "", R, "," & R)
            End If
        Next R
    Dups = nstr
    End Function

I have also tried the method suggested here: http://www.ozgrid.com/VBA/array-differences.htm, but continually get #VALUE errors.

我也试过这里建议的方法:http://www.ozgrid.com/VBA/array-differences.htm,但不断得到#VALUE错误。

3 个解决方案

#1


4  

iterate both arrays and remove as duplicates are found. When done return what is left:

迭代两个数组并删除,因为找到了重复项。完成后返回剩下的内容:

Function newadd(rng1 As String, rng2 As String) As String
    If rng1 = "" Then
        newadd = rng2
        Exit Function
    End If

    Dim spltStr1() As String
    spltStr1 = Split(rng1, ",")

    Dim spltstr2() As String
    spltstr2 = Split(rng2, ",")

    Dim i As Long, j As Long
    Dim temp As String
    For i = LBound(spltstr2) To UBound(spltstr2)
        For j = LBound(spltStr1) To UBound(spltStr1)
            If spltStr1(j) = spltstr2(i) Then
                spltStr1(j) = ""
                spltstr2(i) = ""
                Exit For
            End If
        Next j
        If spltstr2(i) <> "" Then
            temp = temp & "," & spltstr2(i)
        End If
    Next i


    newadd = Mid(temp, 2)
End Function

识别以逗号分隔的字符串的新增内容

#2


4  

Try a scripting dictionary to track your duplicates.

尝试使用脚本字典来跟踪重复项。

Option Explicit

Function NotThere(BaseText As String, TestText As String, _
                  Optional delim As String = ",") As String
    Static dict As Object
    Dim bt As Variant, tt As Variant, i As Long, tmp As String

    If dict Is Nothing Then
        Set dict = CreateObject("scripting.dictionary")
    Else
        dict.RemoveAll
    End If
    dict.CompareMode = vbTextCompare

    tt = Split(TestText, delim)
    bt = Split(BaseText, delim)

    For i = LBound(tt) To UBound(tt)
        If Not dict.exists(tt(i)) Then
            dict.Item(tt(i)) = 1
        Else
            dict.Item(tt(i)) = dict.Item(tt(i)) + 1
        End If
    Next i

    For i = LBound(bt) To UBound(bt)
        If Not dict.exists(bt(i)) Then
            tmp = tmp & delim & bt(i)
        Else
            dict.Item(bt(i)) = dict.Item(bt(i)) - 1
            If Not CBool(dict.Item(bt(i))) Then dict.Remove bt(i)
        End If
    Next i

    NotThere = Mid(tmp, Len(delim) + 1)

End Function

识别以逗号分隔的字符串的新增内容

#3


2  

edited to account for possible features as substrings of other features

编辑以考虑可能的功能作为其他功能的子串

you could use this UDF:

你可以使用这个UDF:

Public Function NewFeatures(ByVal txt1 As String, txt2 As String) As String
    Dim feat As Variant
    txt1 = "," & txt1 & ","
    For Each feat In Split(txt2, ",")
        txt1 = Replace(txt1, "," & feat & ",", ",,", , 1)
    Next
    NewFeatures = Replace(WorksheetFunction.Trim(Join(Split(txt1, ","), " ")), " ", ",")
End Function

#1


4  

iterate both arrays and remove as duplicates are found. When done return what is left:

迭代两个数组并删除,因为找到了重复项。完成后返回剩下的内容:

Function newadd(rng1 As String, rng2 As String) As String
    If rng1 = "" Then
        newadd = rng2
        Exit Function
    End If

    Dim spltStr1() As String
    spltStr1 = Split(rng1, ",")

    Dim spltstr2() As String
    spltstr2 = Split(rng2, ",")

    Dim i As Long, j As Long
    Dim temp As String
    For i = LBound(spltstr2) To UBound(spltstr2)
        For j = LBound(spltStr1) To UBound(spltStr1)
            If spltStr1(j) = spltstr2(i) Then
                spltStr1(j) = ""
                spltstr2(i) = ""
                Exit For
            End If
        Next j
        If spltstr2(i) <> "" Then
            temp = temp & "," & spltstr2(i)
        End If
    Next i


    newadd = Mid(temp, 2)
End Function

识别以逗号分隔的字符串的新增内容

#2


4  

Try a scripting dictionary to track your duplicates.

尝试使用脚本字典来跟踪重复项。

Option Explicit

Function NotThere(BaseText As String, TestText As String, _
                  Optional delim As String = ",") As String
    Static dict As Object
    Dim bt As Variant, tt As Variant, i As Long, tmp As String

    If dict Is Nothing Then
        Set dict = CreateObject("scripting.dictionary")
    Else
        dict.RemoveAll
    End If
    dict.CompareMode = vbTextCompare

    tt = Split(TestText, delim)
    bt = Split(BaseText, delim)

    For i = LBound(tt) To UBound(tt)
        If Not dict.exists(tt(i)) Then
            dict.Item(tt(i)) = 1
        Else
            dict.Item(tt(i)) = dict.Item(tt(i)) + 1
        End If
    Next i

    For i = LBound(bt) To UBound(bt)
        If Not dict.exists(bt(i)) Then
            tmp = tmp & delim & bt(i)
        Else
            dict.Item(bt(i)) = dict.Item(bt(i)) - 1
            If Not CBool(dict.Item(bt(i))) Then dict.Remove bt(i)
        End If
    Next i

    NotThere = Mid(tmp, Len(delim) + 1)

End Function

识别以逗号分隔的字符串的新增内容

#3


2  

edited to account for possible features as substrings of other features

编辑以考虑可能的功能作为其他功能的子串

you could use this UDF:

你可以使用这个UDF:

Public Function NewFeatures(ByVal txt1 As String, txt2 As String) As String
    Dim feat As Variant
    txt1 = "," & txt1 & ","
    For Each feat In Split(txt2, ",")
        txt1 = Replace(txt1, "," & feat & ",", ",,", , 1)
    Next
    NewFeatures = Replace(WorksheetFunction.Trim(Join(Split(txt1, ","), " ")), " ", ",")
End Function