This is problem 4-11 in Skiena. The standard way to find a majority element (one repeated more than half the time) is the majority algorithm. Can we use it to find all numbers repeated more than n/4 times?
4 Answers
#1
3
Misra and Gries describe a couple approaches. I don't entirely understand their paper, but a key idea is to use a bag.
Boyer and Moore's original majority algorithm paper has a lot of incomprehensible proofs and discussion of formal verification of FORTRAN code, but it has a very good start of an explanation of how the majority algorithm works. The key concept starts with the idea that if the majority of the elements are A and you remove, one at a time, a copy of A and a copy of something else, then in the end you will have only copies of A. Next, it should be clear that removing two different items, neither of which is A, can only increase the majority that A holds. Therefore it's safe to remove any pair of items, as long as they're different.
This idea can then be made concrete. Take the first item out of the list and stick it in a box. Take the next item out and compare it with what is in the box. If they're the same (or the box is empty), put the new item in the box as well. If the new one is different, throw it away, along with one item from the box. Repeat until all items are either in the box or in the trash. Since the box is only allowed to hold one kind of item at a time, it can be represented very efficiently as a pair (item type, count).
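The box-and-trash procedure above can be sketched in a few lines of Python. This is only an illustration under my own naming (`majority_candidate` and `majority` are hypothetical names, not from the paper). It uses `None` to mean "no candidate", so it assumes `None` is not itself an element of the input.

```python
from typing import Any, Optional, Sequence


def majority_candidate(items: Sequence[Any]) -> Optional[Any]:
    """Return the only element that could be a majority, or None."""
    box_item = None  # the single kind of item currently in the box
    box_count = 0    # how many copies of it the box holds
    for item in items:
        if box_count == 0:
            # Empty box: the arriving item starts a new pile.
            box_item, box_count = item, 1
        elif item == box_item:
            box_count += 1
        else:
            # Different item: trash it together with one item from the box.
            box_count -= 1
    return box_item if box_count > 0 else None


def majority(items: Sequence[Any]) -> Optional[Any]:
    """Second pass: confirm the candidate occurs more than n/2 times."""
    candidate = majority_candidate(items)
    if candidate is not None and 2 * items.count(candidate) > len(items):
        return candidate
    return None
```

The second function matters because the first pass only narrows the field to one candidate. Whatever is left in the box at the end is not necessarily a majority unless one is known to exist.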
Generalizing this to find all items that may occur more than n/k times is simple, but explaining why it works is a little harder. The basic idea is that we can find and destroy groups of k distinct elements without changing which elements are popular. Why? If w > n/k, then w-1 > (n-k)/k. That is, if we take away one copy of a popular element together with k-1 other elements, then the popular element remains popular!
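Spelled out, the inequality is one line of algebra. Here n is the number of items before a group is removed and w is the count of some popular element, as in the paragraph above:

```latex
w > \frac{n}{k}
\;\Longrightarrow\;
w - 1 > \frac{n}{k} - 1 = \frac{n - k}{k}
```

A popular element that is not part of the removed group keeps its count w, and w > n/k > (n-k)/k, so it stays popular as well.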
Implementation: instead of only allowing one kind of item in the box, allow k-1 of them. Whenever a group of k different items shows up (that is, there are k-1 types in the box, and the one arriving doesn't match any of them), you throw one of each type in the trash, including the one that just arrived.
What data structure should we use for this "box"? Well, a bag, of course! As Misra and Gries explain, if the elements can be ordered, a tree-based bag with O(log k) basic operations will give the whole algorithm a complexity of O(n log k). One point to note is that the operation of removing one of each element is expensive (I think O(k log k)), but that cost is amortized over the arrivals of those elements, so it's no big deal. Of course, if your elements are hashable rather than orderable, you can use a hash-based bag instead, which under certain common assumptions will give even better asymptotic performance (but it's not guaranteed). If your elements are drawn from a small finite set, you can guarantee that performance. If they can only be compared for equality, then your bag gets much more expensive and I'm pretty sure you end up with something like O(nk) instead.
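Here is a minimal sketch of the k-1 box, assuming hashable elements and using a plain Python `dict` as the hash-based bag. The function name `frequent_candidates` is my own, not from Misra and Gries. It returns candidates only, so a separate counting pass is still needed to confirm which of them really occur more than n/k times.

```python
from typing import Dict, Hashable, Iterable


def frequent_candidates(items: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One pass; every element occurring more than n/k times ends up as a key."""
    if k < 2:
        raise ValueError("k must be at least 2")
    box: Dict[Hashable, int] = {}  # holds at most k-1 kinds of item
    for item in items:
        if item in box:
            box[item] += 1
        elif len(box) < k - 1:
            box[item] = 1
        else:
            # k distinct kinds are present: trash the arrival and one of each kind.
            for kind in list(box):
                box[kind] -= 1
                if box[kind] == 0:
                    del box[kind]
    return box
```

The values left in the box are leftover counts, not true frequencies. Only the keys are meaningful as candidates.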
#2
2
See http://www.cs.yale.edu/homes/el327/datamining2011aFiles/ASimpleAlgorithmForFindingFrequentElementsInStreamsAndBags.pdf for a solution that uses constant memory and runs in linear time, and finds up to 3 candidates for elements that occur more than n/4 times. Note that if your data arrives as a stream that you can only read once, producing candidates is the best you can do: to confirm that each of the 3 candidates really occurs more than n/4 times, you have to go through the stream one more time. However, if you know a priori that there are 3 elements that each occur more than n/4 times, a single pass suffices, so you get a linear-time online algorithm that only requires constant storage.
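To make the two-pass point concrete, here is a rough sketch for the n/4 case. It is my own illustration rather than code from the linked paper, and `more_than_quarter` is a hypothetical name. It assumes the data can be read twice (a list, say) and that elements are hashable.

```python
from typing import Dict, Hashable, List, Sequence


def more_than_quarter(data: Sequence[Hashable]) -> List[Hashable]:
    """Return the elements that occur more than len(data)/4 times."""
    # Pass 1: keep at most 3 candidate slots (constant memory).
    slots: Dict[Hashable, int] = {}
    for x in data:
        if x in slots:
            slots[x] += 1
        elif len(slots) < 3:
            slots[x] = 1
        else:
            # Four distinct kinds: drop the arrival and one of each candidate.
            slots = {c: n - 1 for c, n in slots.items() if n > 1}
    # Pass 2: count only the surviving candidates and keep the real ones.
    counts = {c: 0 for c in slots}
    for x in data:
        if x in counts:
            counts[x] += 1
    return [c for c, n in counts.items() if 4 * n > len(data)]
```

On `[1, 2, 3, 4, 5]` the first pass ends with 5 as a candidate even though no element occurs more than n/4 times. The second pass is what filters it out.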
#3
1
Find the majority element (the one that appears more than n/2 times) using the Moore voting algorithm.
See method 3 of the given link for Moore's Voting Algo (http://www.geeksforgeeks.org/majority-element/).
Time:O(n)
Now, after finding the majority element, scan the array again and remove the majority element or set it to -1.
Time:O(n)
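Now apply the Moore voting algorithm to the remaining elements of the array (ignoring -1, since it has already been accounted for). The new candidate may appear more than n/4 times in the original array; confirm this with a counting pass, because the vote alone does not guarantee it.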
Time:O(n)
Total Time:O(n)
Extra Space:O(1)
You can repeat this for elements appearing more than n/8, n/16, ... times.
EDIT:
There may exist a case when there is no majority element in the array:
For example, if the input array is {3, 1, 2, 2, 1, 2, 3, 3}, then the output should be [2, 3].
Given an array of size n and a number k, find all elements that appear more than n/k times.
See this link for the answer: https://*.com/a/24642388/3714537
References:
http://www.cs.utexas.edu/~moore/best-ideas/mjrty/
#4
0
As you didn't mention space complexity, one possible solution is to use a hash table that maps each element to its count, incrementing the count each time the element is seen.
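A short sketch of that idea, using `collections.Counter` from the Python standard library as the hash table. The function name `frequent_by_counting` and the default `k=4` are my own choices for illustration. Note that this uses O(n) extra space in the worst case, unlike the constant-memory approaches above.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def frequent_by_counting(items: Iterable[Hashable], k: int = 4) -> List[Hashable]:
    """Return every element that occurs more than n/k times."""
    counts = Counter(items)   # element -> number of occurrences
    n = sum(counts.values())  # total number of items seen
    # c * k > n is the integer-safe form of c > n / k.
    return [x for x, c in counts.items() if c * k > n]
```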