从数字的差异中得到尽可能低的和。

时间:2022-06-24 11:37:29

I have to find the lowest possible sum from numbers' difference.

我必须从数字的差异中找出最小可能的和。

Let's say I have 4 numbers. 1515, 1520, 1500 and 1535. The lowest sum of difference is 30, because 1535 - 1520 = 15 && 1515 - 1500 = 15 and 15 + 15 = 30. If I would do like this: 1520 - 1515 = 5 && 1535 - 1500 = 35 it would be 40 in sum.

假设有4个数。1515,1520,1500和1535。最小的差值是30,因为1535 - 1520 = 15,1515 - 1500 = 15,15 + 15 = 30。如果我这样做:1520 - 1515 = 5 & 1535 - 1500 = 35,总共是40。

Hope you got it, if not, ask me.

希望你得到了,如果不是,问我。

Any ideas how to program this? I just found this online, tried to translate from my language to English. It sounds interesting. I can't do bruteforce, because it would take ages to compile. I don't need code, just ideas how to program or little fragment of code.

有什么计划吗?我刚刚在网上找到了这个,试着把我的语言翻译成英语。它听起来很有趣。我不能做bruteforce,因为编译需要很长的时间。我不需要代码,只是想法如何编程或代码片段。

Thanks.

谢谢。

Edit: I didn't post everything... One more edition:

编辑:我没有把所有的…一个版本:

I have let's say 8 possible numbers. But I have to take only 6 of them to make the smallest sum. For instance, numbers 1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731, the smallest sum will be 48, but here I have to take only 6 numbers from 8.

假设有8个可能的数字。但我只需要6个,就能得到最小的和。例如,数字1731,1572,2041,1561,1682,1572,1609,1731,最小的和是48,但这里我只能从8取6个数。

10 个解决方案

#1


13  

The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem — it's a pretty standard DP problem — but, just for completeness, here's a proof that it works, and actual code for the problem. [@marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]

marcog的解决方案是一个正确的、非递归的、多项式时间的解决方案,它是一个非常标准的DP问题——但是,为了完整起见,这里有一个证明它可以工作,以及实际的问题代码。如果你愿意,可以*地将这个答案的任何部分复制到你自己的地方;我将删除这个。)

Proof

Let the list be x1, …, xN. Assume wlog that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list, such that the sum of their differences is minimised.

让列表为x1,…,xN,假设wlog列表已排序。我们试图从列表中找到K (disjoint)对元素,这样它们之间的差异就可以最小化。

Claim: An optimal solution always consists of the differences of consecutive elements.
Proof: Suppose you fix the subset of elements whose differences are taken. Then by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference xj-xi where j>i+1. Then, we can replace xj with xi+1 to get a smaller difference, since
xi ≤ xi+1 ≤ xj ⇒ xi+1-xi ≤ xj-xi.
(Needless to say, if xi+1=xj, then taking xi+1 is indistinguishable from taking xj.) This proves the claim.

声明:一个最优解总是由连续元素的不同组成。证明:假设您修复了所使用的元素的子集。然后通过Jonas Kolker给出的证明,这个子集的最优解包含了列表中连续元素的不同。现在假设有一个解对应于一个子集,它不构成连续的元素对,也就是说,解涉及到一个不同的xj-xi,其中j>i+1。然后,我们可以代替xj与*+ 1得到一个较小的差异,因为*≤*+ 1≤xj⇒xi + 1-xi≤xj-xi。(不用说,如果xi+1=xj,那么取xi+1与取xj是没有区别的。)这证明了这一说法。

The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element in which case it's the difference xn-xn-1 plus the optimal solution using k-1 pairs from the first n-2.

剩下的只是常规动态规划的东西:使用k条最优解决方案从第n个元素要么不使用第n个元素(在这种情况下,它只是使用k条最优解决方案从第n - 1),或者使用第n个元素在这种情况下,它是区别xn-xn-1加上使用从第一个n - k - 1对最优解。

The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)

整个程序运行时间O(N log N + NK),就像marcog说的。(排序+ DP)。

Code

Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; this is a small log(N) factor over using actual arrays.

这是一个完整的程序。我很懒,初始化数组并使用dicts编写Python代码;这是一个使用实际数组的小日志(N)因子。

'''
The minimum possible sum|x_i - x_j| using K pairs (2K numbers) from N numbers
'''
import sys
def ints(): return [int(s) for s in sys.stdin.readline().split()]

N, K = ints()
num = sorted(ints())

best = {} #best[(k,n)] = minimum sum using k pairs out of 0 to n
def b(k,n):
    if best.has_key((k,n)): return best[(k,n)]
    if k==0: return 0
    return float('inf')

for n in range(1,N):
    for k in range(1,K+1):
        best[(k,n)] = min([b(k,n-1),                      #Not using num[n]
                           b(k-1,n-2) + num[n]-num[n-1]]) #Using num[n]

print best[(K,N-1)]

Test it:

测试:

Input
4 2
1515 1520 1500 1535
Output
30

Input
8 3
1731 1572 2041 1561 1682 1572 1609 1731
Output
48

#2


13  

Taking the edit into account:

将编辑考虑在内:

Start by sorting the list. Then use a dynamic programming solution, with state i, n representing the minimum sum of n differences when considering only the first i numbers in the sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: outer loop looping through i from 1 to N, inner loop looping through n from 0 to R (3 in your example case in your edit - this uses 3 pairs of numbers which means 6 individual numbers). Your recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).

首先对列表进行排序。然后使用动态规划解决方案,在只考虑序列中第一个i号的情况下,用状态i, n表示最小和n的差异。初始状态:dp[*][0] = 0,其他一切=无穷大。使用两个循环:外部循环遍历i从1到N,内部循环从0循环到N,从0到R(在您的编辑示例中是3),它使用3对数字,这意味着6个单独的数字。你的递归关系dp[我][n]= min(dp(张)[n],dp[我2](n - 1)+ seq[我]- seq[张])。

You have to be aware of handling boundary cases which I've ignored, but the general idea should work and will run in O(N log N + NR) and use O(NR) space.

您必须注意处理我忽略的边界情况,但是一般的想法应该工作,并将在O(N log N + NR)中运行,并使用O(NR)空间。

#3


9  

I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.

我假设一般的问题是这样的:给定一个2n个整数的列表,输出一个n对的列表,这样|x - y|对所有对(x, y)的和就越小越好。

In that case, the idea would be:

在这种情况下,想法是:

  • sort the numbers
  • 对数字进行排序
  • emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.
  • 发射(数字[2k],数字[2k+1]) k = 0,…,n - 1所示。

This works. Proof:

这个作品。证明:

Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then

假设您有x_1 < x_2 < x_3 < x_4(可能是它们之间的其他值)和输出(x_1、x_3)和(x_2, x_4)。然后

|x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.

| x_4——x_2 | + | x_3——x_1 | = | x_4——x_3 | + | x_3——x_2 | + | x_3——x_2 | + | x_2——x_1 | > = | x_4——x_3 | + | x_2——x_1 |。

In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.

换句话说,输出(x_1、x_2)和(x_3, x_4)总是更好的,因为您没有冗余地覆盖x_2和x_3两次的空间。通过归纳法,2n中最小的数必须与第二个最小的数配对;通过对列表其余部分的归纳,对最小的邻居进行配对总是最优的,所以我提出的算法草图是正确的。

#4


6  

Order the list, then do the difference calculation.

依次排序,然后进行差分计算。

EDIT: hi @hey

编辑:你好@hey

You can solve the problem using dynamic programming.

你可以用动态规划来解决这个问题。

Say you have a list L of N integers, you must form k pairs (with 2*k <= N)

假设你有一个列表L的N个整数,你必须形成k对(2*k <= N)

Build a function that finds the smallest difference within a list (if the list is sorted, it will be faster ;) call it smallest(list l)

构建一个函数,该函数在列表中发现最小的差异(如果列表被排序,它将会更快;)调用它最小(list l)

Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l)

构建另一个同样适用于两对(可能比较棘手,但可行)的方法,并将其命名为smallest2(list l)

Let's define best(int i, list l) the function that gives you the best result for i pairs within the list l

让我们定义最佳(int i, list l)函数,它为我在列表l中对i组提供最好的结果。

The algorithm goes as follows:

该算法如下:

  1. best(1, L) = smallest(L)
  2. 最好的最小(L)=(左)
  3. best(2, L) = smallest2(L)
  4. 最好(2 L)= smallest2(左)
  5. for i from 1 to k:
  6. i从1到k:

loop

循环

compute min ( 
    stored_best(i-2) - smallest2( stored_remainder(i-2) ),
    stored_best(i-1) - smallest( stored_remainder(i-1) 
) and store as best(i)
store the remainder as well for the chosen solution

Now, the problem is once you have chosen a pair, the two ints that form the boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guaranty you have allowed switching candidates.

现在,问题是一旦你选择了一对,形成边界的两个ints是保留的,不能用来形成更好的解决方案。但是,如果你把两层都看回去,你就可以保证你已经允许交换候选人了。

(The switching work is done by smallest2)

(切换工作由smallest2完成)

#5


2  

Step 1: Calculate pair differences

步骤1:计算对差。

I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each adjacent pair of numbers. These differences are the "candidate" differences contributing to the minimal difference sum. Using the numbers from your example would lead to:

我认为很明显,正确的方法是对数字进行排序,然后在相邻的两个数字之间取不同的值。这些差异是“候选”差异导致最小差异和。使用你的例子中的数字会导致:

Number Diff
====== ====
1561
        11
1572
         0
1572
        37
1609
        73
1682
        49
1731
         0
1731
       310
2041

Save the differences into an array or table or some other data structure where you can maintain the differences and the two numbers that contributed to each difference. Call this the DiffTable. It should look something like:

将差异保存到一个数组或表或其他数据结构中,这样您就可以保持差异,并且两个数字对每个差异都有贡献。称之为DiffTable。它应该是这样的:

Index Diff Number1 Number2
===== ==== ======= =======
  1     11    1561    1572
  2      0    1572    1572
  3     37    1572    1609
  4     73    1609    1682
  5     49    1682    1731
  6      0    1731    1731
  7    310    1731    2041

Step 2: Choose minimal Differences

第二步:选择最小的差异。

If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pair for odd numbered indices: 1, 3, 5, 7. This is the correct answer. However, the problem states that a subset of pairs are chosen and this complicates the problem quite a bit. In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:

如果要选择所有数字,我们可以在第1步中选择奇数项的数字对:1、3、5、7。这是正确答案。然而,问题是选择了一个子集,这使问题变得复杂了一些。在你的例子中,需要选择3个不同(6个数字= 3对= 3个差异):

  • The sum of the differences is minimal
  • 差异的总和是最小的。
  • The numbers participating in any chosen difference are removed from the list.
  • 参与任何选择的差异的数字将从列表中删除。

The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance of 1572 is left. Whenever a Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of numbers from a list containing eight numbers.

第二点意味着如果我们选择了Diff 11 (Index = 1),那么1561和1572的数字将从列表中删除,因此,索引2中的下一个Diff不能被使用,因为只剩下1572的一个实例。当选择一个Diff时,将删除相邻的Diff值。这就是为什么只有一种方法可以从包含8个数字的列表中选择4对数字。

About the only method I can think of to minimize the sum of the Diff above is to generate and test.

关于我能想到的唯一方法,我能想到的唯一方法就是生成和测试上面的Diff。

The following pseudo code outlines a process to generate all 'legal' sets of index values for a DiffTable of arbitrary size where an arbitrary number of number pairs are chosen. One (or more) of the generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.

下面的pseudo代码概括了一个过程,该过程生成一个任意大小的、任意数量的数字对的所有“合法”的索引值集合。生成的索引集的一个(或多个)将包含索引到可产生最小Diff和的扩展中。

/* Global Variables */
M = 7    /* Number of candidate pair differences in DiffTable */
N = 3    /* Number of indices in each candidate pair set (3 pairs of numbers) */
AllSets = [] /* Set of candidate index sets (set of sets) */

call GenIdxSet(1, []) /* Call generator with seed values */

/* AllSets now contains candidate index sets to perform min sum tests on */

end

procedure: GenIdxSet(i, IdxSet)
  /* Generate all the valid index values for current level */
  /* and subsequent levels until a complete index set is generated */
  do while i <= M
     if CountMembers(IdxSet) = N - 1 then  /* Set is complete */
        AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
     else                                  /* Add another index */
       call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
     i = i + 1
     end
return

Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set where the arguments are appended into a single ordered set. For example AppendToSet([a, b, c], d) returns the set: [a, b, c, d].

函数CountMembers返回给定集合中成员的数量,函数AppendToSet返回一个新的集合,其中的参数被追加到一个有序集合中。

For the given parameters, M = 7 and N = 3, AllSets becomes:

对于给定的参数M = 7和N = 3, AllSets为:

[[1 3 5]
 [1 3 6]  <= Diffs = (11 + 37 + 0) = 48
 [1 3 7]
 [1 4 6]
 [1 4 7]
 [1 5 7]
 [2 4 6]
 [2 4 7]
 [2 5 7]
 [3 5 7]]

Calculate the sums using each set of indices, the one that is minimum identifies the required number pairs in DiffTable. Above I show that the second set of indices gives the minimum you are looking for.

用每一组指标计算总和,最小值确定了可扩散的所需的数对。上面我展示了第二组指数给出了你想要的最小值。

This is a simple brute force technique and it does not scale very well. If you had a list of 50 number pairs and wanted to choose the 5 pairs, AllSets would contain 1,221,759 sets of number pairs to test.

这是一种简单的蛮力技术,它的伸缩性不太好。如果您有一个50对的列表,并且想要选择5对,那么AllSets将包含1,221,759组数字对来进行测试。

#6


2  

I know you said you did not need code but it is the best way for me to describe a set based solution. The solution runs under SQL Server 2008. Included in the code is the data for the two examples you give. The sql solution could be done with a single self joining table but I find it easier to explain when there are multiple tables.

我知道您说过您不需要代码,但这是我描述一个基于集合的解决方案的最好方法。该解决方案在SQL Server 2008下运行。代码中包含两个示例的数据。sql解决方案可以通过一个单独的自连接表完成,但我发现在有多个表时更容易解释。

    --table 1 holds the values

declare @Table1 table (T1_Val int)
Insert @Table1 
--this data is test 1
--Select (1515) Union ALL
--Select (1520) Union ALL
--Select (1500) Union ALL
--Select (1535) 

--this data is test 2
Select (1731) Union ALL
Select (1572) Union ALL
Select (2041) Union ALL
Select (1561) Union ALL
Select (1682) Union ALL
Select (1572) Union ALL
Select (1609) Union ALL
Select (1731) 
--Select * from @Table1

--table 2 holds the sorted numbered list
Declare @Table2 table (T2_id int identity(1,1), T1_Val int)
Insert @Table2 Select T1_Val from @Table1 order by T1_Val

--table 3 will hold the sorted pairs
Declare @Table3 table (T3_id int identity(1,1), T21_id int, T21_Val int, T22_id int, T22_val int)
Insert @Table3
Select T2_1.T2_id, T2_1.T1_Val,T2_2.T2_id, T2_2.T1_Val from @Table2 AS T2_1
LEFT Outer join @Table2 AS T2_2 on T2_1.T2_id = T2_2.T2_id +1

--select * from @Table3
--remove odd numbered rows
delete from @Table3 where T3_id % 2 > 0 

--select * from @Table3
--show the diff values
--select *, ABS(T21_Val - T22_val) from @Table3
--show the diff values in order
--select *, ABS(T21_Val - T22_val) from @Table3 order by ABS(T21_Val - T22_val)
--display the two lowest
select TOP 2 CAST(T22_val as varchar(24)) + ' and ' + CAST(T21_val as varchar(24)) as 'The minimum difference pairs are'
, ABS(T21_Val - T22_val) as 'Difference'
from @Table3
ORDER by ABS(T21_Val - T22_val) 

#7


0  

I think @marcog's approach can be simplified further.

我认为@marcog的方法可以进一步简化。

Take the basic approach that @jonas-kolker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.

采用@jonas-kolker所证明的基本方法来发现最小的差异。获取结果列表并对其排序。从这个列表中取R最小的项,并将它们作为您的不同之处。证明这是最小的和是无关紧要的。

@marcog's approach is effectively O(N^2) because R == N is a legit option. This approach should be (2*(N log N))+N aka O(N log N).

@marcog的方法实际上是O(N ^ 2)因为R = = N是一个合法的选择。这个方法应该是(2*(N log N))+N,也就是(N log N)。

This requires a small data structure to hold a difference and the values it was derived from. But, that is constant per entry. Thus, space is O(N).

这需要一个小的数据结构来保持不同和它的值。但这是常数项。因此,空间是O(N)。

#8


0  

I would go with answer of marcog, you can sort using any of the sorting algoriothms. But there is little thing to analyze now.

我想用marcog的答案,你可以用任何排序的海藻。但是现在没有什么可以分析的了。

If you have to choose R numbers out N numbers so that the sum of their differences is minimum then the numbers be chosen in a sequence without missing any numbers in between.

如果你要选择R个数字,那么它们之间的差异是最小的,那么这个数字就会被选在一个序列中而不会漏掉任何数字。

Hence after sorting the array you should run an outer loop from 0 to N-R and an inner loop from 0 to R-1 times to calculate the sum of differnces.

因此,在对数组进行排序后,您应该运行一个从0到N-R的外部循环,以及从0到R-1的内部循环来计算不同的总和。

If needed, you should try with some examples.

如果需要,您应该尝试一些示例。

#9


0  

I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.

我采用了一种使用递归算法的方法,但它确实需要一些其他人的贡献。

First of all we sort the numbers:

首先,我们对数字排序:

[1561,1572,1572,1609,1682,1731,1731,2041]

Then we compute the differences, keeping track of which the indices of the numbers that contributed to each difference:

然后我们计算出这些差异,记录下每个不同的数字的指数:

[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]

So we got 11 by getting the difference between number at index 0 and number at index 1, 37 from the numbers at indices 2 & 3.

所以我们得到了11,得到了指数0和指数1之间的差值,从指数2和3的数字中得到37。

I then sorted this list, so it tells me which pairs give me the smallest difference:

然后我对这个列表进行排序,所以它告诉我哪些对给了我最小的区别:

[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]

What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.

我们可以看到,考虑到我们要选择n个数字,一个朴素的解决方案可能是选择这个列表的前n / 2项。问题是,在这个列表中,第三项与第一项共享一个索引,所以我们只得到5个数字,而不是6。在本例中,您需要选择第4对,并得到一组6个数字。

From here, I came up with this algorithm. Throughout, there is a set of accepted indices which starts empty, and there's a number of numbers left to select n:

从这里,我提出了这个算法。在整个过程中,有一组被接受的索引开始为空,并且有一些数字可以选择n:

  1. If n is 0, we're done.
  2. 如果n = 0,就做完了。
  3. if n is 1, and the first item will provide just 1 index which isn't in our set, we taken the first item, and we're done.
  4. 如果n是1,第一项只提供1个索引,而不是我们的集合,我们取第一项,然后就完成了。
  5. if n is 2 or more, and the first item will provide 2 indices which aren't in our set, we taken the first item, and we recurse (e.g. goto 1). This time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
  6. 如果n是2或更多,第一项将提供2个不在我们集合中的索引,我们取第一个项,然后递归(例如goto 1),这一次寻找n - 2个数字,使列表其余部分的差异最小。

This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.

这是基本的程序,但生活并不是那么简单。有些情况我们还没有讲到,但要确保在你继续前进之前你已经明白了。

Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.

实际上,步骤3是错误的(在我发布这篇文章之前就发现了这一点:-/),因为可能没有必要将早期的差异包含在后面的主要差异中。第一个例子([1515,1520,1500,1535])与此不符。因为这个,我把它扔到下面的部分,并展开了第4步来处理它。

So, now we get to look at the special cases:

现在我们来看看特殊情况:

  1. ** as above **
  2. * * * *之上
  3. ** as above **
  4. * * * *之上
  5. If n is 1, but the first item will provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
  6. 如果n是1,但第一项将提供两个指标,我们不能选择它。我们必须把那个东西扔掉,然后递归。这一次,我们仍然在寻找n个指标,并且我们的被接受的集合没有变化。
  7. If n is 2 or more, we have a choice. Either we can a) choose this item, and recurse looking for n - (1 or 2) indices, or b) skip this item, and recurse looking for n indices.
  8. 如果n等于2或者更多,我们可以选择。我们可以选择这个项,然后递归寻找n -(1或2)个索引,或者b)跳过这个项,递归寻找n个索引。

4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?

4是它变得棘手的地方,并且这个例程变成了一个搜索而不仅仅是一个排序练习。我们如何决定选哪一科(a或b) ?我们是递归的,我们把两者都调用,看看哪个更好。我们将如何评价他们?

  • We'll want to take whichever branch produces the lowest sum.
  • 我们要取最小的那一个。
  • ...but only if it will use up the right number of indices.
  • …但前提是它要用正确的指数。

So step 4 becomes something like this (pseudocode):

所以步骤4变成了这样(伪代码):

x       = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n  , remainingDifferences) // recurse looking for **n** 
sumA    = currentDifference + sumOf(branchA)
sumB    =                     sumOf(branchB) 

validA  = indicesAddedBy(branchA) == n
validB  = indicesAddedBy(branchB) == n

if not validA && not validB then return an empty branch

if validA && not validB then return branchA
if validB && not validA then return branchB

// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB

I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:

我在Haskell中编写了这个代码(因为我正在努力使它变得更好)。我不确定是否把整个事情都发布出去,因为它可能比有用的更让人困惑,但这里有一个主要部分:

findSmallestDifference = findSmallestDifference' Set.empty

findSmallestDifference' _     _ [] = []
findSmallestDifference' taken n (d:ds)
    | n == 0                = []    -- Case 1
    | n == 1 && provides1 d = [d]   -- Case 2
    | n == 1 && provides2 d = findSmallestDifference' taken n ds -- Case 3
    | provides0 d           = findSmallestDifference' taken n ds -- Case 3a (See Edit)
    | validA && not validB             = branchA -- Case 4
    | validB && not validA             = branchB -- Case 4
    | validA && validB && sumA <= sumB = branchA -- Case 4
    | validA && validB && sumB <= sumA = branchB -- Case 4
    | otherwise             = []                 -- Case 4
        where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
              branchB = findSmallestDifference' taken n ds
              sumA    = sumDifferences branchA
              sumB    = sumDifferences branchB
              validA  = n == (indicesTaken branchA)
              validB  = n == (indicesTaken branchA)
              newTaken x = insertIndices x taken 

Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:

希望你们能看到所有的例子。该代码(-ish),加上一些包装器产生的:

*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
      1572 -   1572 =      0
      1731 -   1731 =      0
      1572 -   1561 =     11
      1609 -   1572 =     37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
      1515 -   1500 =     15
      1535 -   1520 =     15

This has become long, but I've tried to be explicit. Hopefully it was worth while.

这已经很长了,但我已经试着明确了。希望这是值得的。


Edit : There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.

编辑:有一个案例3a可以添加,以避免一些不必要的工作。如果当前的差异不提供额外的索引,则可以跳过它。这在上面的第4步中得到了处理,但是没有必要对树的两部分进行评估。我把这个加到Haskell中。

#10


0  

Something like

类似的

  1. Sort List
  2. 排序列表
  3. Find Duplicates
  4. 发现重复的
  5. Make the duplicates a pair
  6. 复制一对。
  7. remove duplicates from list
  8. 从列表中删除重复的
  9. break rest of list into pairs
  10. 把列表的其余部分分成两组。
  11. calculate differences of each pair
  12. 计算每一对的不同。
  13. take lowest amounts
  14. 采取最低数量

In your example you have 8 number and need the best 3 pairs. First sort the list which gives you

在你的例子中,你有8个数字,需要最好的3对。首先排序给你的列表。

1561, 1572, 1572, 1609, 1682, 1731, 1731, 2041

If you have duplicates make them a pair and remove them from the list so you have

如果你有副本,那就把它们配对并从列表中删除。

[1572, 1572] = 0
[1731, 1731] = 0
L = { 1561, 1609, 1682, 2041 }

Break the remaining list into pairs, giving you the 4 following pairs

把剩下的列表分成几组,给你下面的4对。

[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
[1682, 2041] = 359

Then drop the amount of numbers you need to.

然后把你需要的数字放掉。

This gives you the following 3 pairs with the lowest pairs

这给了你下面的3对和最低的对。

[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48

So

所以

0 + 0 + 48 = 48

#1


13  

The solution by marcog is a correct, non-recursive, polynomial-time solution to the problem — it's a pretty standard DP problem — but, just for completeness, here's a proof that it works, and actual code for the problem. [@marcog: Feel free to copy any part of this answer into your own if you wish; I'll then delete this.]

marcog的解决方案是一个正确的、非递归的、多项式时间的解决方案,它是一个非常标准的DP问题——但是,为了完整起见,这里有一个证明它可以工作,以及实际的问题代码。如果你愿意,可以*地将这个答案的任何部分复制到你自己的地方;我将删除这个。)

Proof

Let the list be x1, …, xN. Assume wlog that the list is sorted. We're trying to find K (disjoint) pairs of elements from the list, such that the sum of their differences is minimised.

让列表为x1,…,xN,假设wlog列表已排序。我们试图从列表中找到K (disjoint)对元素,这样它们之间的差异就可以最小化。

Claim: An optimal solution always consists of the differences of consecutive elements.
Proof: Suppose you fix the subset of elements whose differences are taken. Then by the proof given by Jonas Kölker, the optimal solution for just this subset consists of differences of consecutive elements from the list. Now suppose there is a solution corresponding to a subset that does not comprise pairs of consecutive elements, i.e. the solution involves a difference xj-xi where j>i+1. Then, we can replace xj with xi+1 to get a smaller difference, since
xi ≤ xi+1 ≤ xj ⇒ xi+1-xi ≤ xj-xi.
(Needless to say, if xi+1=xj, then taking xi+1 is indistinguishable from taking xj.) This proves the claim.

声明:一个最优解总是由连续元素的不同组成。证明:假设您修复了所使用的元素的子集。然后通过Jonas Kolker给出的证明,这个子集的最优解包含了列表中连续元素的不同。现在假设有一个解对应于一个子集,它不构成连续的元素对,也就是说,解涉及到一个不同的xj-xi,其中j>i+1。然后,我们可以代替xj与*+ 1得到一个较小的差异,因为*≤*+ 1≤xj⇒xi + 1-xi≤xj-xi。(不用说,如果xi+1=xj,那么取xi+1与取xj是没有区别的。)这证明了这一说法。

The rest is just routine dynamic programming stuff: the optimal solution using k pairs from the first n elements either doesn't use the nth element at all (in which case it's just the optimal solution using k pairs from the first n-1), or it uses the nth element in which case it's the difference xn-xn-1 plus the optimal solution using k-1 pairs from the first n-2.

剩下的只是常规动态规划的东西:使用k条最优解决方案从第n个元素要么不使用第n个元素(在这种情况下,它只是使用k条最优解决方案从第n - 1),或者使用第n个元素在这种情况下,它是区别xn-xn-1加上使用从第一个n - k - 1对最优解。

The whole program runs in time O(N log N + NK), as marcog says. (Sorting + DP.)

整个程序运行时间O(N log N + NK),就像marcog说的。(排序+ DP)。

Code

Here's a complete program. I was lazy with initializing arrays and wrote Python code using dicts; this is a small log(N) factor over using actual arrays.

这是一个完整的程序。我很懒,初始化数组并使用dicts编写Python代码;这是一个使用实际数组的小日志(N)因子。

'''
The minimum possible sum|x_i - x_j| using K pairs (2K numbers) from N numbers
'''
import sys
def ints(): return [int(s) for s in sys.stdin.readline().split()]

N, K = ints()
num = sorted(ints())

best = {} #best[(k,n)] = minimum sum using k pairs out of 0 to n
def b(k,n):
    if best.has_key((k,n)): return best[(k,n)]
    if k==0: return 0
    return float('inf')

for n in range(1,N):
    for k in range(1,K+1):
        best[(k,n)] = min([b(k,n-1),                      #Not using num[n]
                           b(k-1,n-2) + num[n]-num[n-1]]) #Using num[n]

print best[(K,N-1)]

Test it:

测试:

Input
4 2
1515 1520 1500 1535
Output
30

Input
8 3
1731 1572 2041 1561 1682 1572 1609 1731
Output
48

#2


13  

Taking the edit into account:

将编辑考虑在内:

Start by sorting the list. Then use a dynamic programming solution, with state i, n representing the minimum sum of n differences when considering only the first i numbers in the sequence. Initial states: dp[*][0] = 0, everything else = infinity. Use two loops: outer loop looping through i from 1 to N, inner loop looping through n from 0 to R (3 in your example case in your edit - this uses 3 pairs of numbers which means 6 individual numbers). Your recurrence relation is dp[i][n] = min(dp[i-1][n], dp[i-2][n-1] + seq[i] - seq[i-1]).

首先对列表进行排序。然后使用动态规划解决方案,在只考虑序列中第一个i号的情况下,用状态i, n表示最小和n的差异。初始状态:dp[*][0] = 0,其他一切=无穷大。使用两个循环:外部循环遍历i从1到N,内部循环从0循环到N,从0到R(在您的编辑示例中是3),它使用3对数字,这意味着6个单独的数字。你的递归关系dp[我][n]= min(dp(张)[n],dp[我2](n - 1)+ seq[我]- seq[张])。

You have to be aware of handling boundary cases which I've ignored, but the general idea should work and will run in O(N log N + NR) and use O(NR) space.

您必须注意处理我忽略的边界情况,但是一般的想法应该工作,并将在O(N log N + NR)中运行,并使用O(NR)空间。

#3


9  

I assume the general problem is this: given a list of 2n integers, output a list of n pairs, such that the sum of |x - y| over all pairs (x, y) is as small as possible.

我假设一般的问题是这样的:给定一个2n个整数的列表,输出一个n对的列表,这样|x - y|对所有对(x, y)的和就越小越好。

In that case, the idea would be:

在这种情况下,想法是:

  • sort the numbers
  • 对数字进行排序
  • emit (numbers[2k], numbers[2k+1]) for k = 0, ..., n - 1.
  • 发射(数字[2k],数字[2k+1]) k = 0,…,n - 1所示。

This works. Proof:

这个作品。证明:

Suppose you have x_1 < x_2 < x_3 < x_4 (possibly with other values between them) and output (x_1, x_3) and (x_2, x_4). Then

假设您有x_1 < x_2 < x_3 < x_4(可能是它们之间的其他值)和输出(x_1、x_3)和(x_2, x_4)。然后

|x_4 - x_2| + |x_3 - x_1| = |x_4 - x_3| + |x_3 - x_2| + |x_3 - x_2| + |x_2 - x_1| >= |x_4 - x_3| + |x_2 - x_1|.

| x_4——x_2 | + | x_3——x_1 | = | x_4——x_3 | + | x_3——x_2 | + | x_3——x_2 | + | x_2——x_1 | > = | x_4——x_3 | + | x_2——x_1 |。

In other words, it's always better to output (x_1, x_2) and (x_3, x_4) because you don't redundantly cover the space between x_2 and x_3 twice. By induction, the smallest number of the 2n must be paired with the second smallest number; by induction on the rest of the list, pairing up smallest neighbours is always optimal, so the algorithm sketch I proposed is correct.

换句话说,输出(x_1、x_2)和(x_3, x_4)总是更好的,因为您没有冗余地覆盖x_2和x_3两次的空间。通过归纳法,2n中最小的数必须与第二个最小的数配对;通过对列表其余部分的归纳,对最小的邻居进行配对总是最优的,所以我提出的算法草图是正确的。

#4


6  

Order the list, then do the difference calculation.

依次排序,然后进行差分计算。

EDIT: hi @hey

编辑:你好@hey

You can solve the problem using dynamic programming.

你可以用动态规划来解决这个问题。

Say you have a list L of N integers, you must form k pairs (with 2*k <= N)

假设你有一个列表L的N个整数,你必须形成k对(2*k <= N)

Build a function that finds the smallest difference within a list (if the list is sorted, it will be faster ;) call it smallest(list l)

构建一个函数,该函数在列表中发现最小的差异(如果列表被排序,它将会更快;)调用它最小(list l)

Build another one that finds the same for two pairs (can be tricky, but doable) and call it smallest2(list l)

构建另一个同样适用于两对(可能比较棘手,但可行)的方法,并将其命名为smallest2(list l)

Let's define best(int i, list l) the function that gives you the best result for i pairs within the list l

让我们定义最佳(int i, list l)函数,它为我在列表l中对i组提供最好的结果。

The algorithm goes as follows:

该算法如下:

  1. best(1, L) = smallest(L)
  2. 最好的最小(L)=(左)
  3. best(2, L) = smallest2(L)
  4. 最好(2 L)= smallest2(左)
  5. for i from 1 to k:
  6. i从1到k:

loop

循环

compute min ( 
    stored_best(i-2) - smallest2( stored_remainder(i-2) ),
    stored_best(i-1) - smallest( stored_remainder(i-1) 
) and store as best(i)
store the remainder as well for the chosen solution

Now, the problem is once you have chosen a pair, the two ints that form the boundaries are reserved and can't be used to form a better solution. But by looking two levels back you can guaranty you have allowed switching candidates.

现在,问题是一旦你选择了一对,形成边界的两个ints是保留的,不能用来形成更好的解决方案。但是,如果你把两层都看回去,你就可以保证你已经允许交换候选人了。

(The switching work is done by smallest2)

(切换工作由smallest2完成)

#5


2  

Step 1: Calculate pair differences

步骤1:计算对差。

I think it is fairly obvious that the right approach is to sort the numbers and then take differences between each adjacent pair of numbers. These differences are the "candidate" differences contributing to the minimal difference sum. Using the numbers from your example would lead to:

我认为很明显,正确的方法是对数字进行排序,然后在相邻的两个数字之间取不同的值。这些差异是“候选”差异导致最小差异和。使用你的例子中的数字会导致:

Number Diff
====== ====
1561
        11
1572
         0
1572
        37
1609
        73
1682
        49
1731
         0
1731
       310
2041

Save the differences into an array or table or some other data structure where you can maintain the differences and the two numbers that contributed to each difference. Call this the DiffTable. It should look something like:

将差异保存到一个数组或表或其他数据结构中,这样您就可以保持差异,并且两个数字对每个差异都有贡献。称之为DiffTable。它应该是这样的:

Index Diff Number1 Number2
===== ==== ======= =======
  1     11    1561    1572
  2      0    1572    1572
  3     37    1572    1609
  4     73    1609    1682
  5     49    1682    1731
  6      0    1731    1731
  7    310    1731    2041

Step 2: Choose minimal Differences

第二步:选择最小的差异。

If all numbers had to be chosen, we could have stopped at step 1 by choosing the number pair for odd numbered indices: 1, 3, 5, 7. This is the correct answer. However, the problem states that a subset of pairs are chosen and this complicates the problem quite a bit. In your example 3 differences (6 numbers = 3 pairs = 3 differences) need to be chosen such that:

如果要选择所有数字,我们可以在第1步中选择奇数项的数字对:1、3、5、7。这是正确答案。然而,问题是选择了一个子集,这使问题变得复杂了一些。在你的例子中,需要选择3个不同(6个数字= 3对= 3个差异):

  • The sum of the differences is minimal
  • 差异的总和是最小的。
  • The numbers participating in any chosen difference are removed from the list.
  • 参与任何选择的差异的数字将从列表中删除。

The second point means that if we chose Diff 11 (Index = 1 above), the numbers 1561 and 1572 are removed from the list, and consequently, the next Diff of 0 at index 2 cannot be used because only 1 instance of 1572 is left. Whenever a Diff is chosen the adjacent Diff values are removed. This is why there is only one way to choose 4 pairs of numbers from a list containing eight numbers.

第二点意味着如果我们选择了Diff 11 (Index = 1),那么1561和1572的数字将从列表中删除,因此,索引2中的下一个Diff不能被使用,因为只剩下1572的一个实例。当选择一个Diff时,将删除相邻的Diff值。这就是为什么只有一种方法可以从包含8个数字的列表中选择4对数字。

About the only method I can think of to minimize the sum of the Diff above is to generate and test.

关于我能想到的唯一方法,我能想到的唯一方法就是生成和测试上面的Diff。

The following pseudo code outlines a process to generate all 'legal' sets of index values for a DiffTable of arbitrary size where an arbitrary number of number pairs are chosen. One (or more) of the generated index sets will contain the indices into the DiffTable yielding a minimum Diff sum.

下面的pseudo代码概括了一个过程,该过程生成一个任意大小的、任意数量的数字对的所有“合法”的索引值集合。生成的索引集的一个(或多个)将包含索引到可产生最小Diff和的扩展中。

/* Global Variables */
M = 7    /* Number of candidate pair differences in DiffTable */
N = 3    /* Number of indices in each candidate pair set (3 pairs of numbers) */
AllSets = [] /* Set of candidate index sets (set of sets) */

call GenIdxSet(1, []) /* Call generator with seed values */

/* AllSets now contains candidate index sets to perform min sum tests on */

end

procedure: GenIdxSet(i, IdxSet)
  /* Generate all the valid index values for current level */
  /* and subsequent levels until a complete index set is generated */
  do while i <= M
     if CountMembers(IdxSet) = N - 1 then  /* Set is complete */
        AllSets = AppendToSet(AllSets, AppendToSet(IdxSet, i))
     else                                  /* Add another index */
       call GenIdxSet(i + 2, AppendToSet(IdxSet, i))
     i = i + 1
     end
return

Function CountMembers returns the number of members in the given set, function AppendToSet returns a new set where the arguments are appended into a single ordered set. For example AppendToSet([a, b, c], d) returns the set: [a, b, c, d].

函数CountMembers返回给定集合中成员的数量,函数AppendToSet返回一个新的集合,其中的参数被追加到一个有序集合中。

For the given parameters, M = 7 and N = 3, AllSets becomes:

对于给定的参数M = 7和N = 3, AllSets为:

[[1 3 5]
 [1 3 6]  <= Diffs = (11 + 37 + 0) = 48
 [1 3 7]
 [1 4 6]
 [1 4 7]
 [1 5 7]
 [2 4 6]
 [2 4 7]
 [2 5 7]
 [3 5 7]]

Calculate the sums using each set of indices, the one that is minimum identifies the required number pairs in DiffTable. Above I show that the second set of indices gives the minimum you are looking for.

用每一组指标计算总和,最小值确定了可扩散的所需的数对。上面我展示了第二组指数给出了你想要的最小值。

This is a simple brute force technique and it does not scale very well. If you had a list of 50 number pairs and wanted to choose the 5 pairs, AllSets would contain 1,221,759 sets of number pairs to test.

这是一种简单的蛮力技术,它的伸缩性不太好。如果您有一个50对的列表,并且想要选择5对,那么AllSets将包含1,221,759组数字对来进行测试。

#6


2  

I know you said you did not need code but it is the best way for me to describe a set based solution. The solution runs under SQL Server 2008. Included in the code is the data for the two examples you give. The sql solution could be done with a single self joining table but I find it easier to explain when there are multiple tables.

我知道您说过您不需要代码,但这是我描述一个基于集合的解决方案的最好方法。该解决方案在SQL Server 2008下运行。代码中包含两个示例的数据。sql解决方案可以通过一个单独的自连接表完成,但我发现在有多个表时更容易解释。

    --table 1 holds the values

declare @Table1 table (T1_Val int)
Insert @Table1 
--this data is test 1
--Select (1515) Union ALL
--Select (1520) Union ALL
--Select (1500) Union ALL
--Select (1535) 

--this data is test 2
Select (1731) Union ALL
Select (1572) Union ALL
Select (2041) Union ALL
Select (1561) Union ALL
Select (1682) Union ALL
Select (1572) Union ALL
Select (1609) Union ALL
Select (1731) 
--Select * from @Table1

--table 2 holds the sorted numbered list
Declare @Table2 table (T2_id int identity(1,1), T1_Val int)
Insert @Table2 Select T1_Val from @Table1 order by T1_Val

--table 3 will hold the sorted pairs
Declare @Table3 table (T3_id int identity(1,1), T21_id int, T21_Val int, T22_id int, T22_val int)
Insert @Table3
Select T2_1.T2_id, T2_1.T1_Val,T2_2.T2_id, T2_2.T1_Val from @Table2 AS T2_1
LEFT Outer join @Table2 AS T2_2 on T2_1.T2_id = T2_2.T2_id +1

--select * from @Table3
--remove odd numbered rows
delete from @Table3 where T3_id % 2 > 0 

--select * from @Table3
--show the diff values
--select *, ABS(T21_Val - T22_val) from @Table3
--show the diff values in order
--select *, ABS(T21_Val - T22_val) from @Table3 order by ABS(T21_Val - T22_val)
--display the two lowest
select TOP 2 CAST(T22_val as varchar(24)) + ' and ' + CAST(T21_val as varchar(24)) as 'The minimum difference pairs are'
, ABS(T21_Val - T22_val) as 'Difference'
from @Table3
ORDER by ABS(T21_Val - T22_val) 

#7


0  

I think @marcog's approach can be simplified further.

我认为@marcog的方法可以进一步简化。

Take the basic approach that @jonas-kolker proved for finding the smallest differences. Take the resulting list and sort it. Take the R smallest entries from this list and use them as your differences. Proving that this is the smallest sum is trivial.

采用@jonas-kolker所证明的基本方法来发现最小的差异。获取结果列表并对其排序。从这个列表中取R最小的项,并将它们作为您的不同之处。证明这是最小的和是无关紧要的。

@marcog's approach is effectively O(N^2) because R == N is a legit option. This approach should be (2*(N log N))+N aka O(N log N).

@marcog的方法实际上是O(N ^ 2)因为R = = N是一个合法的选择。这个方法应该是(2*(N log N))+N,也就是(N log N)。

This requires a small data structure to hold a difference and the values it was derived from. But, that is constant per entry. Thus, space is O(N).

这需要一个小的数据结构来保持不同和它的值。但这是常数项。因此,空间是O(N)。

#8


0  

I would go with answer of marcog, you can sort using any of the sorting algoriothms. But there is little thing to analyze now.

我想用marcog的答案,你可以用任何排序的海藻。但是现在没有什么可以分析的了。

If you have to choose R numbers out N numbers so that the sum of their differences is minimum then the numbers be chosen in a sequence without missing any numbers in between.

如果你要选择R个数字,那么它们之间的差异是最小的,那么这个数字就会被选在一个序列中而不会漏掉任何数字。

Hence after sorting the array you should run an outer loop from 0 to N-R and an inner loop from 0 to R-1 times to calculate the sum of differnces.

因此,在对数组进行排序后,您应该运行一个从0到N-R的外部循环,以及从0到R-1的内部循环来计算不同的总和。

If needed, you should try with some examples.

如果需要,您应该尝试一些示例。

#9


0  

I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.

我采用了一种使用递归算法的方法,但它确实需要一些其他人的贡献。

First of all we sort the numbers:

首先,我们对数字排序:

[1561,1572,1572,1609,1682,1731,1731,2041]

Then we compute the differences, keeping track of which the indices of the numbers that contributed to each difference:

然后我们计算出这些差异,记录下每个不同的数字的指数:

[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]

So we got 11 by getting the difference between number at index 0 and number at index 1, 37 from the numbers at indices 2 & 3.

所以我们得到了11,得到了指数0和指数1之间的差值,从指数2和3的数字中得到37。

I then sorted this list, so it tells me which pairs give me the smallest difference:

然后我对这个列表进行排序,所以它告诉我哪些对给了我最小的区别:

[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]

What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.

我们可以看到,考虑到我们要选择n个数字,一个朴素的解决方案可能是选择这个列表的前n / 2项。问题是,在这个列表中,第三项与第一项共享一个索引,所以我们只得到5个数字,而不是6。在本例中,您需要选择第4对,并得到一组6个数字。

From here, I came up with this algorithm. Throughout, there is a set of accepted indices which starts empty, and there's a number of numbers left to select n:

从这里,我提出了这个算法。在整个过程中,有一组被接受的索引开始为空,并且有一些数字可以选择n:

  1. If n is 0, we're done.
  2. 如果n = 0,就做完了。
  3. if n is 1, and the first item will provide just 1 index which isn't in our set, we taken the first item, and we're done.
  4. 如果n是1,第一项只提供1个索引,而不是我们的集合,我们取第一项,然后就完成了。
  5. if n is 2 or more, and the first item will provide 2 indices which aren't in our set, we taken the first item, and we recurse (e.g. goto 1). This time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
  6. 如果n是2或更多,第一项将提供2个不在我们集合中的索引,我们取第一个项,然后递归(例如goto 1),这一次寻找n - 2个数字,使列表其余部分的差异最小。

This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.

这是基本的程序,但生活并不是那么简单。有些情况我们还没有讲到,但要确保在你继续前进之前你已经明白了。

Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.

实际上,步骤3是错误的(在我发布这篇文章之前就发现了这一点:-/),因为可能没有必要将早期的差异包含在后面的主要差异中。第一个例子([1515,1520,1500,1535])与此不符。因为这个,我把它扔到下面的部分,并展开了第4步来处理它。

So, now we get to look at the special cases:

现在我们来看看特殊情况:

  1. ** as above **
  2. * * * *之上
  3. ** as above **
  4. * * * *之上
  5. If n is 1, but the first item will provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
  6. 如果n是1,但第一项将提供两个指标,我们不能选择它。我们必须把那个东西扔掉,然后递归。这一次,我们仍然在寻找n个指标,并且我们的被接受的集合没有变化。
  7. If n is 2 or more, we have a choice. Either we can a) choose this item, and recurse looking for n - (1 or 2) indices, or b) skip this item, and recurse looking for n indices.
  8. 如果n等于2或者更多,我们可以选择。我们可以选择这个项,然后递归寻找n -(1或2)个索引,或者b)跳过这个项,递归寻找n个索引。

4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?

4是它变得棘手的地方,并且这个例程变成了一个搜索而不仅仅是一个排序练习。我们如何决定选哪一科(a或b) ?我们是递归的,我们把两者都调用,看看哪个更好。我们将如何评价他们?

  • We'll want to take whichever branch produces the lowest sum.
  • 我们要取最小的那一个。
  • ...but only if it will use up the right number of indices.
  • …但前提是它要用正确的指数。

So step 4 becomes something like this (pseudocode):

所以步骤4变成了这样(伪代码):

x       = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n  , remainingDifferences) // recurse looking for **n** 
sumA    = currentDifference + sumOf(branchA)
sumB    =                     sumOf(branchB) 

validA  = indicesAddedBy(branchA) == n
validB  = indicesAddedBy(branchB) == n

if not validA && not validB then return an empty branch

if validA && not validB then return branchA
if validB && not validA then return branchB

// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB

I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:

我在Haskell中编写了这个代码(因为我正在努力使它变得更好)。我不确定是否把整个事情都发布出去,因为它可能比有用的更让人困惑,但这里有一个主要部分:

findSmallestDifference = findSmallestDifference' Set.empty

findSmallestDifference' _     _ [] = []
findSmallestDifference' taken n (d:ds)
    | n == 0                = []    -- Case 1
    | n == 1 && provides1 d = [d]   -- Case 2
    | n == 1 && provides2 d = findSmallestDifference' taken n ds -- Case 3
    | provides0 d           = findSmallestDifference' taken n ds -- Case 3a (See Edit)
    | validA && not validB             = branchA -- Case 4
    | validB && not validA             = branchB -- Case 4
    | validA && validB && sumA <= sumB = branchA -- Case 4
    | validA && validB && sumB <= sumA = branchB -- Case 4
    | otherwise             = []                 -- Case 4
        where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
              branchB = findSmallestDifference' taken n ds
              sumA    = sumDifferences branchA
              sumB    = sumDifferences branchB
              validA  = n == (indicesTaken branchA)
              validB  = n == (indicesTaken branchA)
              newTaken x = insertIndices x taken 

Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:

希望你们能看到所有的例子。该代码(-ish),加上一些包装器产生的:

*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
      1572 -   1572 =      0
      1731 -   1731 =      0
      1572 -   1561 =     11
      1609 -   1572 =     37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
      1515 -   1500 =     15
      1535 -   1520 =     15

This has become long, but I've tried to be explicit. Hopefully it was worth while.

这已经很长了,但我已经试着明确了。希望这是值得的。


Edit : There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.

编辑:有一个案例3a可以添加,以避免一些不必要的工作。如果当前的差异不提供额外的索引,则可以跳过它。这在上面的第4步中得到了处理,但是没有必要对树的两部分进行评估。我把这个加到Haskell中。

#10


0  

Something like

类似的

  1. Sort List
  2. 排序列表
  3. Find Duplicates
  4. 发现重复的
  5. Make the duplicates a pair
  6. 复制一对。
  7. remove duplicates from list
  8. 从列表中删除重复的
  9. break rest of list into pairs
  10. 把列表的其余部分分成两组。
  11. calculate differences of each pair
  12. 计算每一对的不同。
  13. take lowest amounts
  14. 采取最低数量

In your example you have 8 number and need the best 3 pairs. First sort the list which gives you

在你的例子中,你有8个数字,需要最好的3对。首先排序给你的列表。

1561, 1572, 1572, 1609, 1682, 1731, 1731, 2041

If you have duplicates make them a pair and remove them from the list so you have

如果你有副本,那就把它们配对并从列表中删除。

[1572, 1572] = 0
[1731, 1731] = 0
L = { 1561, 1609, 1682, 2041 }

Break the remaining list into pairs, giving you the 4 following pairs

把剩下的列表分成几组,给你下面的4对。

[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48
[1682, 2041] = 359

Then drop the amount of numbers you need to.

然后把你需要的数字放掉。

This gives you the following 3 pairs with the lowest pairs

这给了你下面的3对和最低的对。

[1572, 1572] = 0
[1731, 1731] = 0
[1561, 1609] = 48

So

所以

0 + 0 + 48 = 48

相关文章