I have an excel spreadsheet with 50 rows of comma-delimited data. The number of features contained within the comma-delimited data increases from bottom to top i.e row 50 (the last row) always has the fewest delimiters, and row 1 (the first row) always has the most delimiters. The number of features increases randomly, and each feature can be either unique or duplicated. Either multiple or single features can be added to the string on each row. The features are placed randomly into the previous rows' comma-delimited string i.e they can be placed into the middle of the string on the previous row, or at the beginning or the end of the previous string. If there are multiple added to a row, they may not be placed together. For example:
我有一个包含50行逗号分隔数据的Excel电子表格。逗号分隔数据中包含的特征数量从下到上增加,即第50行(最后一行)总是具有最少的分隔符,第1行(第一行)总是具有最多分隔符。功能的数量随机增加,每个功能可以是唯一的或重复的。可以将多个或单个要素添加到每行的字符串中。这些特征随机放入前面行的逗号分隔字符串中,即它们可以放在前一行的字符串中间,或者放在前一个字符串的开头或结尾。如果有多个添加到一行,它们可能不会放在一起。例如:
1 fish,pig,cat,dog,fish,mouse,fish,cow
2 pig,cat,dog,fish,mouse,fish
3 pig,cat,dog,fish,mouse
4 pig,cat,dog,mouse
5 pig,cat,dog,mouse
6 cat,dog,mouse
7 cat,mouse
8 cat,mouse
9 cat
10
I need to extract the feature(s) that have been added to the comma-delimited string on each row, preferably using a UDF. The desired output from the above example would be:
我需要提取已添加到每行上逗号分隔字符串的功能,最好使用UDF。上例中的所需输出为:
1 fish,cow
2 fish
3 fish
4
5 pig
6 dog
7
8 mouse
9 cat
10
I have had some success using a UDF that compares adjacent rows, and extracts any unique values between the two rows in an adjacent column (i.e if the UDF is used on rows 4 and 5 in B4, B4 will be blank; however, if the UDF is used on rows 3 and 4 in B3, B3 will have the value "fish") . However, this causes problems since some of the features are duplicated (see rows 1 and 2 in the above example). This results in the UDF returning a blank value when a duplicate has been added to the string.
我使用比较相邻行的UDF取得了一些成功,并提取了相邻列中两行之间的任何唯一值(即如果在B4中的行4和5上使用UDF,则B4将为空;但是,如果UDF用于B3中的第3行和第4行,B3将具有值“fish”)。但是,这会导致问题,因为某些功能是重复的(请参阅上例中的第1行和第2行)。这会导致UDF在将重复项添加到字符串时返回空值。
I have had most success with these (very slightly adjusted) UDFs that I found on stack exchange, particularly the former:
我在堆栈交换中发现的这些(非常轻微调整的)UDF取得了最大的成功,尤其是前者:
Function NotThere(BaseText As String, TestText As String) As String
Dim V As Variant, BaseWords() As String
NotThere = "" & TestText & ","
For Each V In Split(BaseText, ",")
NotThere = Replace(NotThere, V & ",", ",")
Next
NotThere = Mid(Application.Trim(NotThere), 3, Len(NotThere) - 0)
End Function
and
和
Function Dups(R1 As String, R2 As String) As String
Dim nstr As String, R As Variant
For Each R In Split(R2, ",")
If InStr(R1, Trim(R)) = 0 Then
nstr = nstr & IIf(nstr = "", R, "," & R)
End If
Next R
Dups = nstr
End Function
I have also tried the method suggested here: http://www.ozgrid.com/VBA/array-differences.htm, but continually get #VALUE errors.
我也试过这里建议的方法:http://www.ozgrid.com/VBA/array-differences.htm,但不断得到#VALUE错误。
3 个解决方案
#1
4
iterate both arrays and remove as duplicates are found. When done return what is left:
迭代两个数组并删除,因为找到了重复项。完成后返回剩下的内容:
Function newadd(rng1 As String, rng2 As String) As String
If rng1 = "" Then
newadd = rng2
Exit Function
End If
Dim spltStr1() As String
spltStr1 = Split(rng1, ",")
Dim spltstr2() As String
spltstr2 = Split(rng2, ",")
Dim i As Long, j As Long
Dim temp As String
For i = LBound(spltstr2) To UBound(spltstr2)
For j = LBound(spltStr1) To UBound(spltStr1)
If spltStr1(j) = spltstr2(i) Then
spltStr1(j) = ""
spltstr2(i) = ""
Exit For
End If
Next j
If spltstr2(i) <> "" Then
temp = temp & "," & spltstr2(i)
End If
Next i
newadd = Mid(temp, 2)
End Function
#2
4
Try a scripting dictionary to track your duplicates.
尝试使用脚本字典来跟踪重复项。
Option Explicit
Function NotThere(BaseText As String, TestText As String, _
Optional delim As String = ",") As String
Static dict As Object
Dim bt As Variant, tt As Variant, i As Long, tmp As String
If dict Is Nothing Then
Set dict = CreateObject("scripting.dictionary")
Else
dict.RemoveAll
End If
dict.CompareMode = vbTextCompare
tt = Split(TestText, delim)
bt = Split(BaseText, delim)
For i = LBound(tt) To UBound(tt)
If Not dict.exists(tt(i)) Then
dict.Item(tt(i)) = 1
Else
dict.Item(tt(i)) = dict.Item(tt(i)) + 1
End If
Next i
For i = LBound(bt) To UBound(bt)
If Not dict.exists(bt(i)) Then
tmp = tmp & delim & bt(i)
Else
dict.Item(bt(i)) = dict.Item(bt(i)) - 1
If Not CBool(dict.Item(bt(i))) Then dict.Remove bt(i)
End If
Next i
NotThere = Mid(tmp, Len(delim) + 1)
End Function
#3
2
edited to account for possible features as substrings of other features
编辑以考虑可能的功能作为其他功能的子串
you could use this UDF:
你可以使用这个UDF:
Public Function NewFeatures(ByVal txt1 As String, txt2 As String) As String
Dim feat As Variant
txt1 = "," & txt1 & ","
For Each feat In Split(txt2, ",")
txt1 = Replace(txt1, "," & feat & ",", ",,", , 1)
Next
NewFeatures = Replace(WorksheetFunction.Trim(Join(Split(txt1, ","), " ")), " ", ",")
End Function
#1
4
iterate both arrays and remove as duplicates are found. When done return what is left:
迭代两个数组并删除,因为找到了重复项。完成后返回剩下的内容:
Function newadd(rng1 As String, rng2 As String) As String
If rng1 = "" Then
newadd = rng2
Exit Function
End If
Dim spltStr1() As String
spltStr1 = Split(rng1, ",")
Dim spltstr2() As String
spltstr2 = Split(rng2, ",")
Dim i As Long, j As Long
Dim temp As String
For i = LBound(spltstr2) To UBound(spltstr2)
For j = LBound(spltStr1) To UBound(spltStr1)
If spltStr1(j) = spltstr2(i) Then
spltStr1(j) = ""
spltstr2(i) = ""
Exit For
End If
Next j
If spltstr2(i) <> "" Then
temp = temp & "," & spltstr2(i)
End If
Next i
newadd = Mid(temp, 2)
End Function
#2
4
Try a scripting dictionary to track your duplicates.
尝试使用脚本字典来跟踪重复项。
Option Explicit
Function NotThere(BaseText As String, TestText As String, _
Optional delim As String = ",") As String
Static dict As Object
Dim bt As Variant, tt As Variant, i As Long, tmp As String
If dict Is Nothing Then
Set dict = CreateObject("scripting.dictionary")
Else
dict.RemoveAll
End If
dict.CompareMode = vbTextCompare
tt = Split(TestText, delim)
bt = Split(BaseText, delim)
For i = LBound(tt) To UBound(tt)
If Not dict.exists(tt(i)) Then
dict.Item(tt(i)) = 1
Else
dict.Item(tt(i)) = dict.Item(tt(i)) + 1
End If
Next i
For i = LBound(bt) To UBound(bt)
If Not dict.exists(bt(i)) Then
tmp = tmp & delim & bt(i)
Else
dict.Item(bt(i)) = dict.Item(bt(i)) - 1
If Not CBool(dict.Item(bt(i))) Then dict.Remove bt(i)
End If
Next i
NotThere = Mid(tmp, Len(delim) + 1)
End Function
#3
2
edited to account for possible features as substrings of other features
编辑以考虑可能的功能作为其他功能的子串
you could use this UDF:
你可以使用这个UDF:
Public Function NewFeatures(ByVal txt1 As String, txt2 As String) As String
Dim feat As Variant
txt1 = "," & txt1 & ","
For Each feat In Split(txt2, ",")
txt1 = Replace(txt1, "," & feat & ",", ",,", , 1)
Next
NewFeatures = Replace(WorksheetFunction.Trim(Join(Split(txt1, ","), " ")), " ", ",")
End Function