Description
描述
Given an Array of size (n*k+b)
where n elements occur k times and one element occurs b times, in other words there are n+1
distinct Elements. Given that 0 < b < k
find the element occurring b times.
给定一个大小为(n*k+b)的数组,其中n个元素出现k次,一个元素出现b次,换句话说,有n+1个不同的元素。假设0 < b < k,找到发生b次的元素。
My Attempted solutions
我试图解决方案
-
Obvious solution will be using hashing but it will not work if the numbers are very large. Complexity is
O(n)
显然的解决方案是使用散列,但是如果数字很大,它就不能工作。复杂度是O(n)
-
Using map to store the frequencies of each element and then traversing map to find the element occurring b times.As Map's are implemented as height balanced trees Complexity will be
O(nlogn)
.使用map来存储每个元素的频率,然后遍历map以查找发生b次的元素。当Map被实现为高度平衡树时,复杂度将是O(nlogn)。
Both of my solution were accepted but the interviewer wanted a linear solution without using hashing and hint he gave was make the height of tree constant in tree in which you are storing frequencies, but I am not able to figure out the correct solution yet.
我的两个解决方案都被接受了,但是面试官想要一个线性的解决方案而没有使用散列和暗示,他给出的是使树的树的高度恒定,在树中存储频率,但是我还不能找到正确的解决方案。
I want to know how to solve this problem in linear time without hashing?
我想知道如何在线性时间内不哈希地解决这个问题?
EDIT:
编辑:
Sample:
示例:
Input: n=2 b=2 k=3
输入:n = 2 b = 2 k = 3
Aarray: 2 2 2 3 3 3 1 1
数组:2 2 2 2 3 3 3 1 1
Output: 1
输出:1
5 个解决方案
#1
9
An idea using cyclic groups.
使用循环群的想法。
To guess i-th bit of answer, follow this procedure:
要猜测第i位答案,请按照以下步骤:
- Count how many numbers in array has i-th bit set, store as
cnt
- 数数数组中有多少个数字设置为第i位,存储为cnt
- If
cnt % k
is non-zero, then i-th bit of answer is set. Otherwise it is clear. - 如果cnt % k是非零的,那么就设置了i-th比特,否则就很清楚了。
To guess whole number, repeat the above for every bit.
要猜出整个数字,请对每一比特重复以上内容。
This solution is technically O((n*k+b)*log max N)
, where max N
is maximal value in the table, but because number of bits is usually constant, this solution is linear in array size.
这个解在技术上是O((n*k+b)*log max n),其中max n是表中的最大值,但是因为位的个数通常是常数,所以这个解在数组大小上是线性的。
No hashing, memory usage is O(log k * log max N)
.
没有哈希,内存使用是O(log k * log max N)。
Example implementation:
示例实现:
from random import randint, shuffle
def generate_test_data(n, k, b):
k_rep = [randint(0, 1000) for i in xrange(n)]
b_rep = [randint(0, 1000)]
numbers = k_rep*k + b_rep*b
shuffle(numbers)
print "k_rep: ", k_rep
print "b_rep: ", b_rep
return numbers
def solve(data, k):
cnts = [0]*10
for number in data:
bits = [number >> b & 1 for b in xrange(10)]
cnts = [cnts[i] + bits[i] for i in xrange(10)]
return reduce(lambda a,b:2*a+(b%k>0), reversed(cnts), 0)
print "Answer: ", solve(generate_test_data(10, 15, 13), 3)
#2
11
I assume:
我认为:
- The elements of the array are comparable.
- 数组的元素是可比较的。
- We know the values of n and k beforehand.
- 我们事先知道n和k的值。
- A solution O(n*k+b) is good enough.
- 解O(n*k+b)就足够了。
Let the number occuring only b times be S. We are trying to find the S in an array of n*k+b size.
让只出现b的数为S,我们试图在n*k+b的数组中找到S。
Recursive Step: Find the median element of the current array slice as in Quick Sort in lineer time. Let the median element be M.
递归步骤:找到当前数组切片的中值元素,就像在lineer时间中快速排序一样。中位数是M。
After the recursive step you have an array where all elements smaller than M occur on the left of the first occurence of M. All M elements are next to each other and all element larger than M are on the right of all occurences of M.
在递归步骤之后,你会有一个数组,其中所有小于M的元素都出现在M的第一次出现的左边,所有M元素都是相邻的,所有大于M的元素都在M的所有发生的右边。
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
看最左边的M的指数,计算S
So you are doing a quick sort but delving only one part of the divisions at any time. You will recurse O(logN) times but each time with 1/2, 1/4, 1/8, .. sizes of the original array, so the total time will still be O(n).
所以你在做一个快速排序,但在任何时候只进行一个部分的划分。递归O(logN)次但每次都是1/2,1/4,1/8。原始数组的大小,所以总时间仍然是O(n)
Clarification: Let's say n=20 and k = 10. Then, there are 21 distinct elements in the array, 20 of which occur 10 times and the last occur let's say 7 times. I find the medium element, let's say it is 1111. If the S<1111 than the index of the leftmost occurence of 1111 will be less than 11*10. If S>=1111 then the index will be equal to 11*10.
说明:假设n=20 k = 10。然后,数组中有21个不同的元素,其中20个发生了10次,最后一次发生了7次。找到中间元素,假设是1111。如果S<1111,则最左发生率为1111的指数将小于11*10。如果S>=1111,那么指数将等于11*10。
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5} After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4} There are 6 elements on the left of 3. 6 is a multiple of k=3. So each element must be occuring 3 times there. So S>=3. Recurse on the right side. And so on.
完整的例子:n = 4。k = 3。数组={1、2、3、4、5、1、2、4、5、1、2、3、5}在第一个递归步骤之后,我发现中位数元素是3,数组是这样的:{1、2、1、2、2、3、3、5、5、5、4}在3的左边有6个元素。6是k=3的倍数。所以每个元素必须出现3次。所以年代> = 3。在右边递归。等等。
#3
4
In order to have a constant height B-tree containing n
distinct elements, with height h
constant, you need z=n^(1/h)
children per nodes: h=log_z(n)
, thus h=log(n)/log(z)
, thus log(z)=log(n)/h
, thus z=e^(log(n)/h)
, thus z=n^(1/h)
.
为了有一个恒定的高度b -树包含n个不同的元素,用高度h常数,你需要z = n ^(1 / h)孩子每节点:h = log_z(n),因此h = log(n)/ log(z),因此日志(z)= log(n)/ h,因此z = e ^(log(n)/ h),因此z = n ^(1 / h)。
Example, with n=1000000
, h=10
, z=3.98
, that is z=4
.
例如,n=1000000, h=10, z=3.98,即z=4。
The time to reach a node in that case is O(h.log(z))
. Assuming h
and z
to be "constant" (since N=n.k
, then log(z)=log(n^(1/h))=log(N/k^(1/h))=ct
by properly choosing h
based on k
, you can then say that O(h.log(z))=O(1)
... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
在这种情况下,到达节点的时间是O(h.log(z))。假设h和z是“常数”(因为N= N。k,然后log(z)= log(n ^(1 / h))= log(n / k ^(1 / h))= ct通过正确选择基于k h,然后说O(h.log(z))= O(1)……这听起来有点牵强,但也许这正是面试官想要听到的?
#4
2
UPDATE: this one use hashing, so it's not a good answer :(
更新:这个使用散列,所以它不是一个好的答案:(
in python this would be linear time (set
will remove the duplicates):
在python中,这是线性时间(set将删除副本):
result = (sum(set(arr))*k - sum(arr)) / (k - b)
#5
0
If 'k' is even and 'b' is odd, then XOR will do. :)
如果k是偶数,b是奇数,那么x就可以了。:)
#1
9
An idea using cyclic groups.
使用循环群的想法。
To guess i-th bit of answer, follow this procedure:
要猜测第i位答案,请按照以下步骤:
- Count how many numbers in array has i-th bit set, store as
cnt
- 数数数组中有多少个数字设置为第i位,存储为cnt
- If
cnt % k
is non-zero, then i-th bit of answer is set. Otherwise it is clear. - 如果cnt % k是非零的,那么就设置了i-th比特,否则就很清楚了。
To guess whole number, repeat the above for every bit.
要猜出整个数字,请对每一比特重复以上内容。
This solution is technically O((n*k+b)*log max N)
, where max N
is maximal value in the table, but because number of bits is usually constant, this solution is linear in array size.
这个解在技术上是O((n*k+b)*log max n),其中max n是表中的最大值,但是因为位的个数通常是常数,所以这个解在数组大小上是线性的。
No hashing, memory usage is O(log k * log max N)
.
没有哈希,内存使用是O(log k * log max N)。
Example implementation:
示例实现:
from random import randint, shuffle
def generate_test_data(n, k, b):
k_rep = [randint(0, 1000) for i in xrange(n)]
b_rep = [randint(0, 1000)]
numbers = k_rep*k + b_rep*b
shuffle(numbers)
print "k_rep: ", k_rep
print "b_rep: ", b_rep
return numbers
def solve(data, k):
cnts = [0]*10
for number in data:
bits = [number >> b & 1 for b in xrange(10)]
cnts = [cnts[i] + bits[i] for i in xrange(10)]
return reduce(lambda a,b:2*a+(b%k>0), reversed(cnts), 0)
print "Answer: ", solve(generate_test_data(10, 15, 13), 3)
#2
11
I assume:
我认为:
- The elements of the array are comparable.
- 数组的元素是可比较的。
- We know the values of n and k beforehand.
- 我们事先知道n和k的值。
- A solution O(n*k+b) is good enough.
- 解O(n*k+b)就足够了。
Let the number occuring only b times be S. We are trying to find the S in an array of n*k+b size.
让只出现b的数为S,我们试图在n*k+b的数组中找到S。
Recursive Step: Find the median element of the current array slice as in Quick Sort in lineer time. Let the median element be M.
递归步骤:找到当前数组切片的中值元素,就像在lineer时间中快速排序一样。中位数是M。
After the recursive step you have an array where all elements smaller than M occur on the left of the first occurence of M. All M elements are next to each other and all element larger than M are on the right of all occurences of M.
在递归步骤之后,你会有一个数组,其中所有小于M的元素都出现在M的第一次出现的左边,所有M元素都是相邻的,所有大于M的元素都在M的所有发生的右边。
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
看最左边的M的指数,计算S
So you are doing a quick sort but delving only one part of the divisions at any time. You will recurse O(logN) times but each time with 1/2, 1/4, 1/8, .. sizes of the original array, so the total time will still be O(n).
所以你在做一个快速排序,但在任何时候只进行一个部分的划分。递归O(logN)次但每次都是1/2,1/4,1/8。原始数组的大小,所以总时间仍然是O(n)
Clarification: Let's say n=20 and k = 10. Then, there are 21 distinct elements in the array, 20 of which occur 10 times and the last occur let's say 7 times. I find the medium element, let's say it is 1111. If the S<1111 than the index of the leftmost occurence of 1111 will be less than 11*10. If S>=1111 then the index will be equal to 11*10.
说明:假设n=20 k = 10。然后,数组中有21个不同的元素,其中20个发生了10次,最后一次发生了7次。找到中间元素,假设是1111。如果S<1111,则最左发生率为1111的指数将小于11*10。如果S>=1111,那么指数将等于11*10。
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5} After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4} There are 6 elements on the left of 3. 6 is a multiple of k=3. So each element must be occuring 3 times there. So S>=3. Recurse on the right side. And so on.
完整的例子:n = 4。k = 3。数组={1、2、3、4、5、1、2、4、5、1、2、3、5}在第一个递归步骤之后,我发现中位数元素是3,数组是这样的:{1、2、1、2、2、3、3、5、5、5、4}在3的左边有6个元素。6是k=3的倍数。所以每个元素必须出现3次。所以年代> = 3。在右边递归。等等。
#3
4
In order to have a constant height B-tree containing n
distinct elements, with height h
constant, you need z=n^(1/h)
children per nodes: h=log_z(n)
, thus h=log(n)/log(z)
, thus log(z)=log(n)/h
, thus z=e^(log(n)/h)
, thus z=n^(1/h)
.
为了有一个恒定的高度b -树包含n个不同的元素,用高度h常数,你需要z = n ^(1 / h)孩子每节点:h = log_z(n),因此h = log(n)/ log(z),因此日志(z)= log(n)/ h,因此z = e ^(log(n)/ h),因此z = n ^(1 / h)。
Example, with n=1000000
, h=10
, z=3.98
, that is z=4
.
例如,n=1000000, h=10, z=3.98,即z=4。
The time to reach a node in that case is O(h.log(z))
. Assuming h
and z
to be "constant" (since N=n.k
, then log(z)=log(n^(1/h))=log(N/k^(1/h))=ct
by properly choosing h
based on k
, you can then say that O(h.log(z))=O(1)
... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
在这种情况下,到达节点的时间是O(h.log(z))。假设h和z是“常数”(因为N= N。k,然后log(z)= log(n ^(1 / h))= log(n / k ^(1 / h))= ct通过正确选择基于k h,然后说O(h.log(z))= O(1)……这听起来有点牵强,但也许这正是面试官想要听到的?
#4
2
UPDATE: this one use hashing, so it's not a good answer :(
更新:这个使用散列,所以它不是一个好的答案:(
in python this would be linear time (set
will remove the duplicates):
在python中,这是线性时间(set将删除副本):
result = (sum(set(arr))*k - sum(arr)) / (k - b)
#5
0
If 'k' is even and 'b' is odd, then XOR will do. :)
如果k是偶数,b是奇数,那么x就可以了。:)