is there built in functionality in vba to get unique values from a one-dimensional array? what about just getting rid of duplicates?
在vba中是否有内置的功能来从一维数组中获得唯一的值?去掉重复的东西怎么样?
if not, then how would i get the unique values from an array?
如果没有,那么如何从数组中获得唯一值?
8 个解决方案
#1
47
This post contains 2 examples. I like the 2nd one:
这篇文章包含两个例子。我喜欢第二点:
Sub unique()
Dim arr As New Collection, a
Dim aFirstArray() As Variant
Dim i As Long
aFirstArray() = Array("Banana", "Apple", "Orange", "Tomato", "Apple", _
"Lemon", "Lime", "Lime", "Apple")
On Error Resume Next
For Each a In aFirstArray
arr.Add a, a
Next
For i = 1 To arr.Count
Cells(i, 1) = arr(i)
Next
End Sub
#2
35
There's no built-in functionality to remove duplicates from arrays. Raj's answer seems elegant, but I prefer to use dictionaries.
没有内置的功能可以从数组中删除副本。拉杰的回答似乎很优雅,但我更喜欢用字典。
Dim d As Object
Set d = CreateObject("Scripting.Dictionary")
'Set d = New Scripting.Dictionary
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
d(myArray(i)) = 1
Next i
Dim v As Variant
For Each v In d.Keys()
'd.Keys() is a Variant array of the unique values in myArray.
'v will iterate through each of them.
Next v
EDIT: I changed the loop to use LBound
and UBound
as per Tomalak's suggested answer. EDIT: d.Keys()
is a Variant array, not a Collection.
编辑:我改变了循环,使用了LBound和UBound,按照Tomalak的建议答案。EDIT: d.Keys()是一个变体数组,不是集合。
#3
15
Update (6/15/16)
I have created much more thorough benchmarks. First of all, as @ChaimG pointed out, early binding makes a big difference (I originally used @eksortso's code above verbatim which uses late binding). Secondly, my original benchmarks only included the time to create the unique object, however, it did not test the efficiency of using the object. My point in doing this is, it doesn't really matter if I can create an object really fast if the object I create is clunky and slows me down moving forward.
我创造了更彻底的基准。首先,正如@ChaimG指出的,早期绑定会产生很大的不同(我最初使用的是@eksortso的代码,它使用了后期绑定)。其次,我的原始基准只包括创建唯一对象的时间,但是它没有测试使用对象的效率。我这样做的目的是,如果我创建的对象是笨拙的,并且让我放慢前进的速度,那么我是否能够快速创建一个对象并不重要。
Old Remark: It turns out, that looping over a collection object is highly inefficient
老注:事实证明,在集合对象上循环是非常低效的。
It turns out that looping over a collection can be quite efficient if you know how to do it (I didn't). As @ChaimG (yet again), pointed out in the comments, using a For Each
construct is ridiculously superior to simply using a For
loop. To give you an idea, before changing the loop construct, the time for Collection2
for the Test Case Size = 10^6
was over 1400s (i.e. ~23 minutes). It is now a meager 0.195s (over 7000x faster).
事实证明,如果你知道如何去做(我没有),那么在一个集合上循环是非常有效的。正如@ChaimG(再次)在评论中指出的那样,使用For Each构造比简单地使用For循环要优越得多。之前给你一个想法,改变循环结构,测试用例的时间Collection2大小= 10 ^ 6在1400年代(例如~ 23分钟)。现在只有0.195(比7000倍快)。
For the Collection
method there are two times. The first (my original benchmark Collection1
) show the time to create the unique object. The second part (Collection2
) shows the time to loop over the object (which is very natural) to create a returnable array as the other functions do.
对于集合方法,有两次。第一个(我的原始基准集合1)显示了创建唯一对象的时间。第二部分(Collection2)显示了对对象进行循环的时间(这是非常自然的),以创建一个可返回的数组,就像其他函数一样。
In the chart below, a yellow background indicates that it was the fastest for that test case, and red indicates the slowest ("Not Tested" algorithms are excluded). The total time for the Collection
method is the sum of Collection1
and Collection2
. Turquoise indicates that is was the fastest regardless of original order.
在下面的图表中,黄色的背景表明它是测试用例中最快的,红色表示最慢的(“未测试”的算法除外)。集合方法的总时间是Collection1和Collection2的总和。绿松石表明这是最快的,不管原来的顺序。
Below is the original algorithm I created (I have modified it slightly e.g. I no longer instantiate my own data type). It returns the unique values of an array with the original order in a very respectable time and it can be modified to take on any data type. Outside of the IndexMethod
, it is the fastest algorithm for very large arrays.
下面是我创建的原始算法(我已经稍微修改过了,我不再实例化我自己的数据类型)。它在一个非常值得尊敬的时间内返回一个数组的惟一值,并且可以修改它以接受任何数据类型。在IndexMethod之外,它是非常大的数组中最快的算法。
Here are the main ideas behind this algorithm:
下面是这个算法的主要思想:
- Index the array
- 索引数组
- Sort by values
- 按值排序
- Place identical values at the end of the array and subsequently "chop" them off.
- 将相同的值放在数组的末尾,然后将它们“切碎”。
- Finally, sort by index.
- 最后,通过索引。
Below is an example:
下面是一个例子:
Let myArray = (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
1. (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
(1 , 2, 3, 4, 5, 6, 7, 8, 9, 10) <<-- Indexing
2. (19, 19, 19, 33, 33, 86, 100, 100, 703, 703) <<-- sort by values
(4, 7, 10, 3, 5, 1, 2, 8, 6, 9)
3. (19, 33, 86, 100, 703) <<-- remove duplicates
(4, 3, 1, 2, 6)
4. (86, 100, 33, 19, 703)
( 1, 2, 3, 4, 6) <<-- sort by index
Here is the code:
这是代码:
Function SortingUniqueTest(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
Dim MyUniqueArr() As Long, i As Long, intInd As Integer
Dim StrtTime As Double, Endtime As Double, HighB As Long, LowB As Long
LowB = LBound(myArray): HighB = UBound(myArray)
ReDim MyUniqueArr(1 To 2, LowB To HighB)
intInd = 1 - LowB 'Guarantees the indices span 1 to Lim
For i = LowB To HighB
MyUniqueArr(1, i) = myArray(i)
MyUniqueArr(2, i) = i + intInd
Next i
QSLong2D MyUniqueArr, 1, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
Call UniqueArray2D(MyUniqueArr)
If bOrigIndex Then QSLong2D MyUniqueArr, 2, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
SortingUniqueTest = MyUniqueArr()
End Function
Public Sub UniqueArray2D(ByRef myArray() As Long)
Dim i As Long, j As Long, Count As Long, Count1 As Long, DuplicateArr() As Long
Dim lngTemp As Long, HighB As Long, LowB As Long
LowB = LBound(myArray, 2): Count = LowB: i = LowB: HighB = UBound(myArray, 2)
Do While i < HighB
j = i + 1
If myArray(1, i) = myArray(1, j) Then
Do While myArray(1, i) = myArray(1, j)
ReDim Preserve DuplicateArr(1 To Count)
DuplicateArr(Count) = j
Count = Count + 1
j = j + 1
If j > HighB Then Exit Do
Loop
QSLong2D myArray, 2, i, j - 1, 2
End If
i = j
Loop
Count1 = HighB
If Count > 1 Then
For i = UBound(DuplicateArr) To LBound(DuplicateArr) Step -1
myArray(1, DuplicateArr(i)) = myArray(1, Count1)
myArray(2, DuplicateArr(i)) = myArray(2, Count1)
Count1 = Count1 - 1
ReDim Preserve myArray(1 To 2, LowB To Count1)
Next i
End If
End Sub
Here is the sorting algorithm I use (more about this algo here).
这里是我使用的排序算法(更多关于这个algo)。
Sub QSLong2D(ByRef saArray() As Long, bytDim As Byte, lLow1 As Long, lHigh1 As Long, bytNum As Byte)
Dim lLow2 As Long, lHigh2 As Long
Dim sKey As Long, sSwap As Long, i As Byte
On Error GoTo ErrorExit
If IsMissing(lLow1) Then lLow1 = LBound(saArray, bytDim)
If IsMissing(lHigh1) Then lHigh1 = UBound(saArray, bytDim)
lLow2 = lLow1
lHigh2 = lHigh1
sKey = saArray(bytDim, (lLow1 + lHigh1) \ 2)
Do While lLow2 < lHigh2
Do While saArray(bytDim, lLow2) < sKey And lLow2 < lHigh1: lLow2 = lLow2 + 1: Loop
Do While saArray(bytDim, lHigh2) > sKey And lHigh2 > lLow1: lHigh2 = lHigh2 - 1: Loop
If lLow2 < lHigh2 Then
For i = 1 To bytNum
sSwap = saArray(i, lLow2)
saArray(i, lLow2) = saArray(i, lHigh2)
saArray(i, lHigh2) = sSwap
Next i
End If
If lLow2 <= lHigh2 Then
lLow2 = lLow2 + 1
lHigh2 = lHigh2 - 1
End If
Loop
If lHigh2 > lLow1 Then QSLong2D saArray(), bytDim, lLow1, lHigh2, bytNum
If lLow2 < lHigh1 Then QSLong2D saArray(), bytDim, lLow2, lHigh1, bytNum
ErrorExit:
End Sub
Below is a special algorithm that is blazing fast if your data contains integers. It makes use of indexing and the Boolean data type.
下面是一种特殊的算法,如果你的数据包含整数,它就会快速的燃烧。它利用了索引和布尔数据类型。
Function IndexSort(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
'' Modified to take both positive and negative integers
Dim arrVals() As Long, arrSort() As Long, arrBool() As Boolean
Dim i As Long, HighB As Long, myMax As Long, myMin As Long, OffSet As Long
Dim LowB As Long, myIndex As Long, count As Long, myRange As Long
HighB = UBound(myArray)
LowB = LBound(myArray)
For i = LowB To HighB
If myArray(i) > myMax Then myMax = myArray(i)
If myArray(i) < myMin Then myMin = myArray(i)
Next i
OffSet = Abs(myMin) '' Number that will be added to every element
'' to guarantee every index is non-negative
If myMax > 0 Then
myRange = myMax + OffSet '' E.g. if myMax = 10 & myMin = -2, then myRange = 12
Else
myRange = OffSet
End If
If bOrigIndex Then
ReDim arrSort(1 To 2, 1 To HighB)
ReDim arrVals(1 To 2, 0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(1, myIndex) = myArray(i)
If arrVals(2, myIndex) = 0 Then arrVals(2, myIndex) = i
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(1, count) = arrVals(1, i)
arrSort(2, count) = arrVals(2, i)
End If
Next i
QSLong2D arrSort, 2, 1, count, 2
ReDim Preserve arrSort(1 To 2, 1 To count)
Else
ReDim arrSort(1 To HighB)
ReDim arrVals(0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(myIndex) = myArray(i)
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(count) = arrVals(i)
End If
Next i
ReDim Preserve arrSort(1 To count)
End If
ReDim arrVals(0)
ReDim arrBool(0)
IndexSort = arrSort
End Function
Here are the Collection (by @DocBrown) and Dictionary (by @eksortso) Functions.
这里是集合(由@DocBrown)和Dictionary(通过@eksortso)函数。
Function CollectionTest(ByRef arrIn() As Long, Lim As Long) As Variant
Dim arr As New Collection, a, i As Long, arrOut() As Variant, aFirstArray As Variant
Dim StrtTime As Double, EndTime1 As Double, EndTime2 As Double, count As Long
On Error Resume Next
ReDim arrOut(1 To UBound(arrIn))
ReDim aFirstArray(1 To UBound(arrIn))
StrtTime = Timer
For i = 1 To UBound(arrIn): aFirstArray(i) = CStr(arrIn(i)): Next i '' Convert to string
For Each a In aFirstArray ''' This part is actually creating the unique set
arr.Add a, a
Next
EndTime1 = Timer - StrtTime
StrtTime = Timer ''' This part is writing back to an array for return
For Each a In arr: count = count + 1: arrOut(count) = a: Next a
EndTime2 = Timer - StrtTime
CollectionTest = Array(arrOut, EndTime1, EndTime2)
End Function
Function DictionaryTest(ByRef myArray() As Long, Lim As Long) As Variant
Dim StrtTime As Double, Endtime As Double
Dim d As Scripting.Dictionary, i As Long '' Early Binding
Set d = New Scripting.Dictionary
For i = LBound(myArray) To UBound(myArray): d(myArray(i)) = 1: Next i
DictionaryTest = d.Keys()
End Function
Here is the Direct approach provided by @IsraelHoletz.
这是@以色列- holetz提供的直接方法。
Function ArrayUnique(ByRef aArrayIn() As Long) As Variant
Dim aArrayOut() As Variant, bFlag As Boolean, vIn As Variant, vOut As Variant
Dim i As Long, j As Long, k As Long
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
Function DirectTest(ByRef aArray() As Long, Lim As Long) As Variant
Dim aReturn() As Variant
Dim StrtTime As Long, Endtime As Long, i As Long
aReturn = ArrayUnique(aArray)
DirectTest = aReturn
End Function
Here is the benchmark function that compares all of the functions. You should note that the last two cases are handled a little bit different because of memory issues. Also note, that I didn't test the Collection
method for the Test Case Size = 10,000,000
. For some reason, it was returning incorrect results and behaving unusual (I'm guessing the collection object has a limit on how many things you can put in it. I searched and I couldn't find any literature on this).
这里是比较所有函数的基准函数。您应该注意到,由于内存问题,最后两个案例处理起来有点不同。还要注意,我没有测试测试用例大小的集合方法= 10,000,000。由于某些原因,它返回不正确的结果和行为异常(我猜测集合对象对您可以放入多少东西有限制)。我找遍了,我找不到任何有关这方面的资料。
Function UltimateTest(Lim As Long, bTestDirect As Boolean, bTestDictionary, bytCase As Byte) As Variant
Dim dictionTest, collectTest, sortingTest1, indexTest1, directT '' all variants
Dim arrTest() As Long, i As Long, bEquality As Boolean, SizeUnique As Long
Dim myArray() As Long, StrtTime As Double, EndTime1 As Variant
Dim EndTime2 As Double, EndTime3 As Variant, EndTime4 As Double
Dim EndTime5 As Double, EndTime6 As Double, sortingTest2, indexTest2
ReDim myArray(1 To Lim): Rnd (-2) '' If you want to test negative numbers,
'' insert this to the left of CLng(Int(Lim... : (-1) ^ (Int(2 * Rnd())) *
For i = LBound(myArray) To UBound(myArray): myArray(i) = CLng(Int(Lim * Rnd() + 1)): Next i
arrTest = myArray
If bytCase = 1 Then
If bTestDictionary Then
StrtTime = Timer: dictionTest = DictionaryTest(arrTest, Lim): EndTime1 = Timer - StrtTime
Else
EndTime1 = "Not Tested"
End If
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
If bTestDirect Then
arrTest = myArray: StrtTime = Timer: directT = DirectTest(arrTest, Lim): EndTime3 = Timer - StrtTime
Else
EndTime3 = "Not Tested"
End If
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
bEquality = True
For i = LBound(sortingTest1, 2) To UBound(sortingTest1, 2)
If Not CLng(collectTest(0)(i)) = sortingTest1(1, i) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = sortingTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = indexTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
If bTestDirect Then
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = directT(i + 1) Then
bEquality = False
Exit For
End If
Next i
End If
UltimateTest = Array(bEquality, EndTime1, EndTime2, EndTime3, EndTime4, _
EndTime5, EndTime6, collectTest(1), collectTest(2), SizeUnique)
ElseIf bytCase = 2 Then
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
UltimateTest = Array(collectTest(1), collectTest(2))
ElseIf bytCase = 3 Then
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
UltimateTest = Array(EndTime2, SizeUnique)
ElseIf bytCase = 4 Then
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
UltimateTest = EndTime4
ElseIf bytCase = 5 Then
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
UltimateTest = EndTime5
ElseIf bytCase = 6 Then
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
UltimateTest = EndTime6
End If
End Function
And finally, here is the sub that produces the table above.
最后,这是生成上面表格的子。
Sub GetBenchmarks()
Dim myVar, i As Long, TestCases As Variant, j As Long, temp
TestCases = Array(1000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000)
For j = 0 To 11
If j < 6 Then
myVar = UltimateTest(CLng(TestCases(j)), True, True, 1)
ElseIf j < 10 Then
myVar = UltimateTest(CLng(TestCases(j)), False, True, 1)
ElseIf j < 11 Then
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, 0, 0, 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 2)
myVar(7) = temp(0): myVar(8) = temp(1)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
Else
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, "Not Tested", "Not Tested", 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
End If
Cells(4 + j, 6) = TestCases(j)
For i = 1 To 9: Cells(4 + j, 6 + i) = myVar(i - 1): Next i
Cells(4 + j, 17) = myVar(9)
Next j
End Sub
Summary
From the table of results, we can see that the Dictionary
method works really well for cases less than about 500,000, however, after that, the IndexMethod
really starts to dominate. You will notice that when order doesn't matter and your data is made up of positive integers, there is no comparison to the IndexMethod
algorithm (it returns the unique values from an array containing 10 million elements in less than 1 sec!!! Incredible!). Below I have a breakdown of which algorithm is preferred in various cases.
从结果表中可以看出,字典方法对于小于50万的案例非常有效,然而,在此之后,IndexMethod真正开始占据主导地位。您会注意到,当顺序无关紧要,而您的数据是由正整数组成时,没有与IndexMethod算法进行比较(它从一个包含1,000万个元素的数组中返回唯一值,在不到1秒!!!)难以置信!)下面我列出了在各种情况下首选算法的分类。
Case 1
Your Data contains integers (i.e. whole numbers, both positive and negative): IndexMethod
案例1你的数据包含整数(即整数,正负):IndexMethod。
Case 2
Your Data contains non-integers (i.e. variant, double, string, etc.) with less than 200000 elements: Dictionary Method
Case 2您的数据包含了非整数(即变量、双、字符串等),其中包含了少于20万个元素:Dictionary方法。
Case 3
Your Data contains non-integers (i.e. variant, double, string, etc.) with more than 200000 elements: Collection Method
案例3您的数据包含有超过20万个元素的非整数(即变体、双、字符串等):集合方法。
If you had to choose one algorithm, in my opinion, the Collection
method is still the best as it only requires a few lines of code, it's super general, and it's fast enough.
如果您必须选择一个算法,在我看来,集合方法仍然是最好的,因为它只需要几行代码,它是超级通用的,而且速度足够快。
#4
3
No, nothing built-in. Do it yourself:
不,没有内置的。做自己:
- Instantiate a
Scripting.Dictionary
object - 实例化一个脚本。字典对象
- Write a
For
loop over your array (be sure to useLBound()
andUBound()
instead of looping from 0 to x!) - 为您的数组写一个For循环(请务必使用LBound()和UBound()而不是从0循环到x!)
- On each iteration, check
Exists()
on the dictionary. Add every array value (that doesn't already exist) as a key to the dictionary (useas I've just learned, keys can be of any type in aCStr()
since keys must be stringsScripting.Dictionary
), also store the array value itself into the dictionary. - 在每次迭代中,检查字典上的Exists()。将每个数组值(不存在的)添加为字典的键(使用CStr(),因为键必须是字符串,因为我刚学过,键可以是任何类型的脚本。字典),也将数组值本身存储到字典中。
- When done, use
Keys()
(orItems()
) to return all values of the dictionary as a new, now unique array. - 当完成时,使用Keys()(或Items())将字典的所有值作为一个新的、现在唯一的数组返回。
- In my tests, the Dictionary keeps original order of all added values, so the output will be ordered like the input was. I'm not sure if this is documented and reliable behavior, though.
- 在我的测试中,字典保留了所有添加值的原始顺序,因此输出将像输入一样有序。但我不确定这是否有记录和可靠的行为。
#5
2
I don't know of any built-in functionality in VBA. The best would be to use a collection using the value as key and only add to it if a value doesn't exist.
我不知道VBA中有什么内置功能。最好的方法是使用值作为键的集合,如果值不存在,则只添加该值。
#6
2
No, VBA does not have this functionality. You can use the technique of adding each item to a collection using the item as the key. Since a collection does not allow duplicate keys, the result is distinct values that you can copy to an array, if needed.
不,VBA没有这个功能。您可以使用将项作为键将每个项添加到集合中的技术。由于集合不允许重复键,所以结果是不同的值,如果需要,您可以复制到一个数组。
You may also want something more robust. See Distinct Values Function at http://www.cpearson.com/excel/distinctvalues.aspx
你可能还想要一些更健壮的东西。在http://www.cpearson.com/excel/distinctvalues.aspx查看不同的值函数。
Distinct Values Function
不同值函数
A VBA Function that will return an array of the distinct values in a range or array of input values.
一个VBA函数,它将返回一个数组中的不同值的数组。
Excel has some manual methods, such as Advanced Filter, for getting a list of distinct items from an input range. The drawback of using such methods is that you must manually refresh the results when the input data changes. Moreover, these methods work only with ranges, not arrays of values, and, not being functions, cannot be called from worksheet cells or incorporated into array formulas. This page describes a VBA function called DistinctValues that accepts as input either a range or an array of data and returns as its result an array containing the distinct items from the input list. That is, the elements with all duplicates removed. The order of the input elements is preserved. The order of the elements in the output array is the same as the order in the input values. The function can be called from an array entered range on a worksheet (see this page for information about array formulas), or from in an array formula in a single worksheet cell, or from another VB function.
Excel有一些手动方法,比如高级筛选器,用于从输入范围获取不同项的列表。使用这种方法的缺点是,当输入数据发生变化时,必须手动刷新结果。此外,这些方法只适用于范围,而不是值的数组,而不是函数,不能从工作表单元中调用或合并到数组公式中。这个页面描述了一个VBA函数,称为distinct tvalues,它接受一个范围或一个数据数组作为输入,并返回一个数组,该数组包含来自输入列表的不同项。也就是说,所有复制的元素都被删除了。输入元素的顺序被保留。输出数组中元素的顺序与输入值中的顺序相同。该函数可以从工作表中输入的数组中调用(请参阅该页面以获取有关数组公式的信息),或者从单个工作表单元格中的数组公式中,或者从另一个VB函数中调用。
#7
0
If the order of the deduplicated array does not matter to you, you can use my pragmatic function:
如果重复的数组的顺序对你不重要,你可以使用我的语用功能:
Function DeDupArray(ia() As String)
Dim newa() As String
ReDim newa(999)
ni = -1
For n = LBound(ia) To UBound(ia)
dup = False
If n <= UBound(ia) Then
For k = n + 1 To UBound(ia)
If ia(k) = ia(n) Then dup = True
Next k
If dup = False And Trim(ia(n)) <> "" Then
ni = ni + 1
newa(ni) = ia(n)
End If
End If
Next n
If ni > -1 Then
ReDim Preserve newa(ni)
Else
ReDim Preserve newa(1)
End If
DeDupArray = newa
End Function
Sub testdedup()
Dim m(5) As String
Dim m2() As String
m(0) = "Horse"
m(1) = "Cow"
m(2) = "Dear"
m(3) = "Horse"
m(4) = "Joke"
m(5) = "Cow"
m2 = DeDupArray(m)
t = ""
For n = LBound(m2) To UBound(m2)
t = t & n & "=" & m2(n) & " "
Next n
MsgBox t
End Sub
From the test function, it will result in the following deduplicated array:
从测试函数中,它将导致以下被重复的数组:
"0=Dear 1=Horse 2=Joke 3=Cow "
“0=亲爱的1=马2=笑话3=牛”
#8
-1
The Collection and Dictionary solutions are all nice and shine for a short approach, but if you want speed try using a more direct approach:
收集和字典解决方案都很好,而且很适合做一个简短的方法,但是如果你想要速度,尝试使用更直接的方法:
Function ArrayUnique(ByVal aArrayIn As Variant) As Variant
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' ArrayUnique
' This function removes duplicated values from a single dimension array
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim aArrayOut() As Variant
Dim bFlag As Boolean
Dim vIn As Variant
Dim vOut As Variant
Dim i%, j%, k%
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
Calling it:
叫它:
Sub Test()
Dim aReturn As Variant
Dim aArray As Variant
aArray = Array(1, 2, 3, 1, 2, 3, "Test", "Test")
aReturn = ArrayUnique(aArray)
End Sub
For speed comparasion, this will be 100x to 130x faster then the dictionary solution, and about 8000x to 13000x faster than the collection one.
对于速度比较而言,这将是100x到130x的速度,然后是字典解决方案,比集合1的速度快8000x到13000x。
#1
47
This post contains 2 examples. I like the 2nd one:
这篇文章包含两个例子。我喜欢第二点:
Sub unique()
Dim arr As New Collection, a
Dim aFirstArray() As Variant
Dim i As Long
aFirstArray() = Array("Banana", "Apple", "Orange", "Tomato", "Apple", _
"Lemon", "Lime", "Lime", "Apple")
On Error Resume Next
For Each a In aFirstArray
arr.Add a, a
Next
For i = 1 To arr.Count
Cells(i, 1) = arr(i)
Next
End Sub
#2
35
There's no built-in functionality to remove duplicates from arrays. Raj's answer seems elegant, but I prefer to use dictionaries.
没有内置的功能可以从数组中删除副本。拉杰的回答似乎很优雅,但我更喜欢用字典。
Dim d As Object
Set d = CreateObject("Scripting.Dictionary")
'Set d = New Scripting.Dictionary
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
d(myArray(i)) = 1
Next i
Dim v As Variant
For Each v In d.Keys()
'd.Keys() is a Variant array of the unique values in myArray.
'v will iterate through each of them.
Next v
EDIT: I changed the loop to use LBound
and UBound
as per Tomalak's suggested answer. EDIT: d.Keys()
is a Variant array, not a Collection.
编辑:我改变了循环,使用了LBound和UBound,按照Tomalak的建议答案。EDIT: d.Keys()是一个变体数组,不是集合。
#3
15
Update (6/15/16)
I have created much more thorough benchmarks. First of all, as @ChaimG pointed out, early binding makes a big difference (I originally used @eksortso's code above verbatim which uses late binding). Secondly, my original benchmarks only included the time to create the unique object, however, it did not test the efficiency of using the object. My point in doing this is, it doesn't really matter if I can create an object really fast if the object I create is clunky and slows me down moving forward.
我创造了更彻底的基准。首先,正如@ChaimG指出的,早期绑定会产生很大的不同(我最初使用的是@eksortso的代码,它使用了后期绑定)。其次,我的原始基准只包括创建唯一对象的时间,但是它没有测试使用对象的效率。我这样做的目的是,如果我创建的对象是笨拙的,并且让我放慢前进的速度,那么我是否能够快速创建一个对象并不重要。
Old Remark: It turns out, that looping over a collection object is highly inefficient
老注:事实证明,在集合对象上循环是非常低效的。
It turns out that looping over a collection can be quite efficient if you know how to do it (I didn't). As @ChaimG (yet again), pointed out in the comments, using a For Each
construct is ridiculously superior to simply using a For
loop. To give you an idea, before changing the loop construct, the time for Collection2
for the Test Case Size = 10^6
was over 1400s (i.e. ~23 minutes). It is now a meager 0.195s (over 7000x faster).
事实证明,如果你知道如何去做(我没有),那么在一个集合上循环是非常有效的。正如@ChaimG(再次)在评论中指出的那样,使用For Each构造比简单地使用For循环要优越得多。之前给你一个想法,改变循环结构,测试用例的时间Collection2大小= 10 ^ 6在1400年代(例如~ 23分钟)。现在只有0.195(比7000倍快)。
For the Collection
method there are two times. The first (my original benchmark Collection1
) show the time to create the unique object. The second part (Collection2
) shows the time to loop over the object (which is very natural) to create a returnable array as the other functions do.
对于集合方法,有两次。第一个(我的原始基准集合1)显示了创建唯一对象的时间。第二部分(Collection2)显示了对对象进行循环的时间(这是非常自然的),以创建一个可返回的数组,就像其他函数一样。
In the chart below, a yellow background indicates that it was the fastest for that test case, and red indicates the slowest ("Not Tested" algorithms are excluded). The total time for the Collection
method is the sum of Collection1
and Collection2
. Turquoise indicates that is was the fastest regardless of original order.
在下面的图表中,黄色的背景表明它是测试用例中最快的,红色表示最慢的(“未测试”的算法除外)。集合方法的总时间是Collection1和Collection2的总和。绿松石表明这是最快的,不管原来的顺序。
Below is the original algorithm I created (I have modified it slightly e.g. I no longer instantiate my own data type). It returns the unique values of an array with the original order in a very respectable time and it can be modified to take on any data type. Outside of the IndexMethod
, it is the fastest algorithm for very large arrays.
下面是我创建的原始算法(我已经稍微修改过了,我不再实例化我自己的数据类型)。它在一个非常值得尊敬的时间内返回一个数组的惟一值,并且可以修改它以接受任何数据类型。在IndexMethod之外,它是非常大的数组中最快的算法。
Here are the main ideas behind this algorithm:
下面是这个算法的主要思想:
- Index the array
- 索引数组
- Sort by values
- 按值排序
- Place identical values at the end of the array and subsequently "chop" them off.
- 将相同的值放在数组的末尾,然后将它们“切碎”。
- Finally, sort by index.
- 最后,通过索引。
Below is an example:
下面是一个例子:
Let myArray = (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
1. (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
(1 , 2, 3, 4, 5, 6, 7, 8, 9, 10) <<-- Indexing
2. (19, 19, 19, 33, 33, 86, 100, 100, 703, 703) <<-- sort by values
(4, 7, 10, 3, 5, 1, 2, 8, 6, 9)
3. (19, 33, 86, 100, 703) <<-- remove duplicates
(4, 3, 1, 2, 6)
4. (86, 100, 33, 19, 703)
( 1, 2, 3, 4, 6) <<-- sort by index
Here is the code:
这是代码:
Function SortingUniqueTest(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
Dim MyUniqueArr() As Long, i As Long, intInd As Integer
Dim StrtTime As Double, Endtime As Double, HighB As Long, LowB As Long
LowB = LBound(myArray): HighB = UBound(myArray)
ReDim MyUniqueArr(1 To 2, LowB To HighB)
intInd = 1 - LowB 'Guarantees the indices span 1 to Lim
For i = LowB To HighB
MyUniqueArr(1, i) = myArray(i)
MyUniqueArr(2, i) = i + intInd
Next i
QSLong2D MyUniqueArr, 1, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
Call UniqueArray2D(MyUniqueArr)
If bOrigIndex Then QSLong2D MyUniqueArr, 2, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
SortingUniqueTest = MyUniqueArr()
End Function
Public Sub UniqueArray2D(ByRef myArray() As Long)
Dim i As Long, j As Long, Count As Long, Count1 As Long, DuplicateArr() As Long
Dim lngTemp As Long, HighB As Long, LowB As Long
LowB = LBound(myArray, 2): Count = LowB: i = LowB: HighB = UBound(myArray, 2)
Do While i < HighB
j = i + 1
If myArray(1, i) = myArray(1, j) Then
Do While myArray(1, i) = myArray(1, j)
ReDim Preserve DuplicateArr(1 To Count)
DuplicateArr(Count) = j
Count = Count + 1
j = j + 1
If j > HighB Then Exit Do
Loop
QSLong2D myArray, 2, i, j - 1, 2
End If
i = j
Loop
Count1 = HighB
If Count > 1 Then
For i = UBound(DuplicateArr) To LBound(DuplicateArr) Step -1
myArray(1, DuplicateArr(i)) = myArray(1, Count1)
myArray(2, DuplicateArr(i)) = myArray(2, Count1)
Count1 = Count1 - 1
ReDim Preserve myArray(1 To 2, LowB To Count1)
Next i
End If
End Sub
Here is the sorting algorithm I use (more about this algo here).
这里是我使用的排序算法(更多关于这个algo)。
Sub QSLong2D(ByRef saArray() As Long, bytDim As Byte, lLow1 As Long, lHigh1 As Long, bytNum As Byte)
Dim lLow2 As Long, lHigh2 As Long
Dim sKey As Long, sSwap As Long, i As Byte
On Error GoTo ErrorExit
If IsMissing(lLow1) Then lLow1 = LBound(saArray, bytDim)
If IsMissing(lHigh1) Then lHigh1 = UBound(saArray, bytDim)
lLow2 = lLow1
lHigh2 = lHigh1
sKey = saArray(bytDim, (lLow1 + lHigh1) \ 2)
Do While lLow2 < lHigh2
Do While saArray(bytDim, lLow2) < sKey And lLow2 < lHigh1: lLow2 = lLow2 + 1: Loop
Do While saArray(bytDim, lHigh2) > sKey And lHigh2 > lLow1: lHigh2 = lHigh2 - 1: Loop
If lLow2 < lHigh2 Then
For i = 1 To bytNum
sSwap = saArray(i, lLow2)
saArray(i, lLow2) = saArray(i, lHigh2)
saArray(i, lHigh2) = sSwap
Next i
End If
If lLow2 <= lHigh2 Then
lLow2 = lLow2 + 1
lHigh2 = lHigh2 - 1
End If
Loop
If lHigh2 > lLow1 Then QSLong2D saArray(), bytDim, lLow1, lHigh2, bytNum
If lLow2 < lHigh1 Then QSLong2D saArray(), bytDim, lLow2, lHigh1, bytNum
ErrorExit:
End Sub
Below is a special algorithm that is blazing fast if your data contains integers. It makes use of indexing and the Boolean data type.
下面是一种特殊的算法,如果你的数据包含整数,它就会快速的燃烧。它利用了索引和布尔数据类型。
Function IndexSort(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
'' Modified to take both positive and negative integers
Dim arrVals() As Long, arrSort() As Long, arrBool() As Boolean
Dim i As Long, HighB As Long, myMax As Long, myMin As Long, OffSet As Long
Dim LowB As Long, myIndex As Long, count As Long, myRange As Long
HighB = UBound(myArray)
LowB = LBound(myArray)
For i = LowB To HighB
If myArray(i) > myMax Then myMax = myArray(i)
If myArray(i) < myMin Then myMin = myArray(i)
Next i
OffSet = Abs(myMin) '' Number that will be added to every element
'' to guarantee every index is non-negative
If myMax > 0 Then
myRange = myMax + OffSet '' E.g. if myMax = 10 & myMin = -2, then myRange = 12
Else
myRange = OffSet
End If
If bOrigIndex Then
ReDim arrSort(1 To 2, 1 To HighB)
ReDim arrVals(1 To 2, 0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(1, myIndex) = myArray(i)
If arrVals(2, myIndex) = 0 Then arrVals(2, myIndex) = i
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(1, count) = arrVals(1, i)
arrSort(2, count) = arrVals(2, i)
End If
Next i
QSLong2D arrSort, 2, 1, count, 2
ReDim Preserve arrSort(1 To 2, 1 To count)
Else
ReDim arrSort(1 To HighB)
ReDim arrVals(0 To myRange)
ReDim arrBool(0 To myRange)
For i = LowB To HighB
myIndex = myArray(i) + OffSet
arrBool(myIndex) = True
arrVals(myIndex) = myArray(i)
Next i
For i = 0 To myRange
If arrBool(i) Then
count = count + 1
arrSort(count) = arrVals(i)
End If
Next i
ReDim Preserve arrSort(1 To count)
End If
ReDim arrVals(0)
ReDim arrBool(0)
IndexSort = arrSort
End Function
Here are the Collection (by @DocBrown) and Dictionary (by @eksortso) Functions.
这里是集合(由@DocBrown)和Dictionary(通过@eksortso)函数。
Function CollectionTest(ByRef arrIn() As Long, Lim As Long) As Variant
Dim arr As New Collection, a, i As Long, arrOut() As Variant, aFirstArray As Variant
Dim StrtTime As Double, EndTime1 As Double, EndTime2 As Double, count As Long
On Error Resume Next
ReDim arrOut(1 To UBound(arrIn))
ReDim aFirstArray(1 To UBound(arrIn))
StrtTime = Timer
For i = 1 To UBound(arrIn): aFirstArray(i) = CStr(arrIn(i)): Next i '' Convert to string
For Each a In aFirstArray ''' This part is actually creating the unique set
arr.Add a, a
Next
EndTime1 = Timer - StrtTime
StrtTime = Timer ''' This part is writing back to an array for return
For Each a In arr: count = count + 1: arrOut(count) = a: Next a
EndTime2 = Timer - StrtTime
CollectionTest = Array(arrOut, EndTime1, EndTime2)
End Function
Function DictionaryTest(ByRef myArray() As Long, Lim As Long) As Variant
Dim StrtTime As Double, Endtime As Double
Dim d As Scripting.Dictionary, i As Long '' Early Binding
Set d = New Scripting.Dictionary
For i = LBound(myArray) To UBound(myArray): d(myArray(i)) = 1: Next i
DictionaryTest = d.Keys()
End Function
Here is the Direct approach provided by @IsraelHoletz.
这是@以色列- holetz提供的直接方法。
Function ArrayUnique(ByRef aArrayIn() As Long) As Variant
Dim aArrayOut() As Variant, bFlag As Boolean, vIn As Variant, vOut As Variant
Dim i As Long, j As Long, k As Long
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
Function DirectTest(ByRef aArray() As Long, Lim As Long) As Variant
Dim aReturn() As Variant
Dim StrtTime As Long, Endtime As Long, i As Long
aReturn = ArrayUnique(aArray)
DirectTest = aReturn
End Function
Here is the benchmark function that compares all of the functions. You should note that the last two cases are handled a little bit different because of memory issues. Also note, that I didn't test the Collection
method for the Test Case Size = 10,000,000
. For some reason, it was returning incorrect results and behaving unusual (I'm guessing the collection object has a limit on how many things you can put in it. I searched and I couldn't find any literature on this).
这里是比较所有函数的基准函数。您应该注意到,由于内存问题,最后两个案例处理起来有点不同。还要注意,我没有测试测试用例大小的集合方法= 10,000,000。由于某些原因,它返回不正确的结果和行为异常(我猜测集合对象对您可以放入多少东西有限制)。我找遍了,我找不到任何有关这方面的资料。
Function UltimateTest(Lim As Long, bTestDirect As Boolean, bTestDictionary, bytCase As Byte) As Variant
Dim dictionTest, collectTest, sortingTest1, indexTest1, directT '' all variants
Dim arrTest() As Long, i As Long, bEquality As Boolean, SizeUnique As Long
Dim myArray() As Long, StrtTime As Double, EndTime1 As Variant
Dim EndTime2 As Double, EndTime3 As Variant, EndTime4 As Double
Dim EndTime5 As Double, EndTime6 As Double, sortingTest2, indexTest2
ReDim myArray(1 To Lim): Rnd (-2) '' If you want to test negative numbers,
'' insert this to the left of CLng(Int(Lim... : (-1) ^ (Int(2 * Rnd())) *
For i = LBound(myArray) To UBound(myArray): myArray(i) = CLng(Int(Lim * Rnd() + 1)): Next i
arrTest = myArray
If bytCase = 1 Then
If bTestDictionary Then
StrtTime = Timer: dictionTest = DictionaryTest(arrTest, Lim): EndTime1 = Timer - StrtTime
Else
EndTime1 = "Not Tested"
End If
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
If bTestDirect Then
arrTest = myArray: StrtTime = Timer: directT = DirectTest(arrTest, Lim): EndTime3 = Timer - StrtTime
Else
EndTime3 = "Not Tested"
End If
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
bEquality = True
For i = LBound(sortingTest1, 2) To UBound(sortingTest1, 2)
If Not CLng(collectTest(0)(i)) = sortingTest1(1, i) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = sortingTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = indexTest1(1, i + 1) Then
bEquality = False
Exit For
End If
Next i
If bTestDirect Then
For i = LBound(dictionTest) To UBound(dictionTest)
If Not dictionTest(i) = directT(i + 1) Then
bEquality = False
Exit For
End If
Next i
End If
UltimateTest = Array(bEquality, EndTime1, EndTime2, EndTime3, EndTime4, _
EndTime5, EndTime6, collectTest(1), collectTest(2), SizeUnique)
ElseIf bytCase = 2 Then
arrTest = myArray
collectTest = CollectionTest(arrTest, Lim)
UltimateTest = Array(collectTest(1), collectTest(2))
ElseIf bytCase = 3 Then
arrTest = myArray
StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
SizeUnique = UBound(sortingTest1, 2)
UltimateTest = Array(EndTime2, SizeUnique)
ElseIf bytCase = 4 Then
arrTest = myArray
StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
UltimateTest = EndTime4
ElseIf bytCase = 5 Then
arrTest = myArray
StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
UltimateTest = EndTime5
ElseIf bytCase = 6 Then
arrTest = myArray
StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
UltimateTest = EndTime6
End If
End Function
And finally, here is the sub that produces the table above.
最后,这是生成上面表格的子。
Sub GetBenchmarks()
Dim myVar, i As Long, TestCases As Variant, j As Long, temp
TestCases = Array(1000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000)
For j = 0 To 11
If j < 6 Then
myVar = UltimateTest(CLng(TestCases(j)), True, True, 1)
ElseIf j < 10 Then
myVar = UltimateTest(CLng(TestCases(j)), False, True, 1)
ElseIf j < 11 Then
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, 0, 0, 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 2)
myVar(7) = temp(0): myVar(8) = temp(1)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
Else
myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, "Not Tested", "Not Tested", 0)
temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
myVar(2) = temp(0): myVar(9) = temp(1)
myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
End If
Cells(4 + j, 6) = TestCases(j)
For i = 1 To 9: Cells(4 + j, 6 + i) = myVar(i - 1): Next i
Cells(4 + j, 17) = myVar(9)
Next j
End Sub
Summary
From the table of results, we can see that the Dictionary
method works really well for cases less than about 500,000, however, after that, the IndexMethod
really starts to dominate. You will notice that when order doesn't matter and your data is made up of positive integers, there is no comparison to the IndexMethod
algorithm (it returns the unique values from an array containing 10 million elements in less than 1 sec!!! Incredible!). Below I have a breakdown of which algorithm is preferred in various cases.
从结果表中可以看出,字典方法对于小于50万的案例非常有效,然而,在此之后,IndexMethod真正开始占据主导地位。您会注意到,当顺序无关紧要,而您的数据是由正整数组成时,没有与IndexMethod算法进行比较(它从一个包含1,000万个元素的数组中返回唯一值,在不到1秒!!!)难以置信!)下面我列出了在各种情况下首选算法的分类。
Case 1
Your Data contains integers (i.e. whole numbers, both positive and negative): IndexMethod
案例1你的数据包含整数(即整数,正负):IndexMethod。
Case 2
Your Data contains non-integers (i.e. variant, double, string, etc.) with less than 200000 elements: Dictionary Method
Case 2您的数据包含了非整数(即变量、双、字符串等),其中包含了少于20万个元素:Dictionary方法。
Case 3
Your Data contains non-integers (i.e. variant, double, string, etc.) with more than 200000 elements: Collection Method
案例3您的数据包含有超过20万个元素的非整数(即变体、双、字符串等):集合方法。
If you had to choose one algorithm, in my opinion, the Collection
method is still the best as it only requires a few lines of code, it's super general, and it's fast enough.
如果您必须选择一个算法,在我看来,集合方法仍然是最好的,因为它只需要几行代码,它是超级通用的,而且速度足够快。
#4
3
No, nothing built-in. Do it yourself:
不,没有内置的。做自己:
- Instantiate a
Scripting.Dictionary
object - 实例化一个脚本。字典对象
- Write a
For
loop over your array (be sure to useLBound()
andUBound()
instead of looping from 0 to x!) - 为您的数组写一个For循环(请务必使用LBound()和UBound()而不是从0循环到x!)
- On each iteration, check
Exists()
on the dictionary. Add every array value (that doesn't already exist) as a key to the dictionary (useas I've just learned, keys can be of any type in aCStr()
since keys must be stringsScripting.Dictionary
), also store the array value itself into the dictionary. - 在每次迭代中,检查字典上的Exists()。将每个数组值(不存在的)添加为字典的键(使用CStr(),因为键必须是字符串,因为我刚学过,键可以是任何类型的脚本。字典),也将数组值本身存储到字典中。
- When done, use
Keys()
(orItems()
) to return all values of the dictionary as a new, now unique array. - 当完成时,使用Keys()(或Items())将字典的所有值作为一个新的、现在唯一的数组返回。
- In my tests, the Dictionary keeps original order of all added values, so the output will be ordered like the input was. I'm not sure if this is documented and reliable behavior, though.
- 在我的测试中,字典保留了所有添加值的原始顺序,因此输出将像输入一样有序。但我不确定这是否有记录和可靠的行为。
#5
2
I don't know of any built-in functionality in VBA. The best would be to use a collection using the value as key and only add to it if a value doesn't exist.
我不知道VBA中有什么内置功能。最好的方法是使用值作为键的集合,如果值不存在,则只添加该值。
#6
2
No, VBA does not have this functionality. You can use the technique of adding each item to a collection using the item as the key. Since a collection does not allow duplicate keys, the result is distinct values that you can copy to an array, if needed.
不,VBA没有这个功能。您可以使用将项作为键将每个项添加到集合中的技术。由于集合不允许重复键,所以结果是不同的值,如果需要,您可以复制到一个数组。
You may also want something more robust. See Distinct Values Function at http://www.cpearson.com/excel/distinctvalues.aspx
你可能还想要一些更健壮的东西。在http://www.cpearson.com/excel/distinctvalues.aspx查看不同的值函数。
Distinct Values Function
不同值函数
A VBA Function that will return an array of the distinct values in a range or array of input values.
一个VBA函数,它将返回一个数组中的不同值的数组。
Excel has some manual methods, such as Advanced Filter, for getting a list of distinct items from an input range. The drawback of using such methods is that you must manually refresh the results when the input data changes. Moreover, these methods work only with ranges, not arrays of values, and, not being functions, cannot be called from worksheet cells or incorporated into array formulas. This page describes a VBA function called DistinctValues that accepts as input either a range or an array of data and returns as its result an array containing the distinct items from the input list. That is, the elements with all duplicates removed. The order of the input elements is preserved. The order of the elements in the output array is the same as the order in the input values. The function can be called from an array entered range on a worksheet (see this page for information about array formulas), or from in an array formula in a single worksheet cell, or from another VB function.
Excel有一些手动方法,比如高级筛选器,用于从输入范围获取不同项的列表。使用这种方法的缺点是,当输入数据发生变化时,必须手动刷新结果。此外,这些方法只适用于范围,而不是值的数组,而不是函数,不能从工作表单元中调用或合并到数组公式中。这个页面描述了一个VBA函数,称为distinct tvalues,它接受一个范围或一个数据数组作为输入,并返回一个数组,该数组包含来自输入列表的不同项。也就是说,所有复制的元素都被删除了。输入元素的顺序被保留。输出数组中元素的顺序与输入值中的顺序相同。该函数可以从工作表中输入的数组中调用(请参阅该页面以获取有关数组公式的信息),或者从单个工作表单元格中的数组公式中,或者从另一个VB函数中调用。
#7
0
If the order of the deduplicated array does not matter to you, you can use my pragmatic function:
如果重复的数组的顺序对你不重要,你可以使用我的语用功能:
Function DeDupArray(ia() As String)
Dim newa() As String
ReDim newa(999)
ni = -1
For n = LBound(ia) To UBound(ia)
dup = False
If n <= UBound(ia) Then
For k = n + 1 To UBound(ia)
If ia(k) = ia(n) Then dup = True
Next k
If dup = False And Trim(ia(n)) <> "" Then
ni = ni + 1
newa(ni) = ia(n)
End If
End If
Next n
If ni > -1 Then
ReDim Preserve newa(ni)
Else
ReDim Preserve newa(1)
End If
DeDupArray = newa
End Function
Sub testdedup()
Dim m(5) As String
Dim m2() As String
m(0) = "Horse"
m(1) = "Cow"
m(2) = "Dear"
m(3) = "Horse"
m(4) = "Joke"
m(5) = "Cow"
m2 = DeDupArray(m)
t = ""
For n = LBound(m2) To UBound(m2)
t = t & n & "=" & m2(n) & " "
Next n
MsgBox t
End Sub
From the test function, it will result in the following deduplicated array:
从测试函数中,它将导致以下被重复的数组:
"0=Dear 1=Horse 2=Joke 3=Cow "
“0=亲爱的1=马2=笑话3=牛”
#8
-1
The Collection and Dictionary solutions are all nice and shine for a short approach, but if you want speed try using a more direct approach:
收集和字典解决方案都很好,而且很适合做一个简短的方法,但是如果你想要速度,尝试使用更直接的方法:
Function ArrayUnique(ByVal aArrayIn As Variant) As Variant
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' ArrayUnique
' This function removes duplicated values from a single dimension array
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim aArrayOut() As Variant
Dim bFlag As Boolean
Dim vIn As Variant
Dim vOut As Variant
Dim i%, j%, k%
ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
i = LBound(aArrayIn)
j = i
For Each vIn In aArrayIn
For k = j To i - 1
If vIn = aArrayOut(k) Then bFlag = True: Exit For
Next
If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
bFlag = False
Next
If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
ArrayUnique = aArrayOut
End Function
Calling it:
叫它:
Sub Test()
Dim aReturn As Variant
Dim aArray As Variant
aArray = Array(1, 2, 3, 1, 2, 3, "Test", "Test")
aReturn = ArrayUnique(aArray)
End Sub
For speed comparasion, this will be 100x to 130x faster then the dictionary solution, and about 8000x to 13000x faster than the collection one.
对于速度比较而言,这将是100x到130x的速度,然后是字典解决方案,比集合1的速度快8000x到13000x。