找出8位数字的和的有效方法。

时间:2021-10-12 20:13:43

I have to find the sum of the first 4 digits, the sum of the last 4 digits and compare them (of all the numbers betweem m and n). But when I submit my solution online there's a problem with the time limit.

我必须找到前4位数字的和,最后4位数字的和,并比较它们(在m和n之间的所有数字),但是当我在网上提交我的解决方案时,有一个时间限制的问题。

Here's my code:

这是我的代码:

#include <stdio.h>

int main()
{
    int M, N, res = 0, cnt, first4, second4, sum1, sum2;

    scanf("%d", &M);
    scanf("%d", &N);

    for(cnt = M; cnt <= N; cnt++)
    {
        first4 = cnt % 10000;
        sum1 = first4 % 10 + (first4 / 10) % 10 + (first4 / 100) % 10 + (first4 / 1000) % 10;
        second4 = cnt / 10000;
        sum2 = second4 % 10 + (second4 / 10) % 10 + (second4 / 100) % 10 + (second4 / 1000) % 10;

        if(sum1 == sum2)
            res++;

    }

    printf("%d", res);


    return 0;
}

I'm trying to find a more efficient way to do this.

我想找一种更有效的方法。

7 个解决方案

#1


5  

Finally, if you are still interested, there is a much faster way to do this. Your task doesn't specifically require you to calculate the sums for all the numbers, it only asks for the number of some special numbers. In such cases optimization techniques like memoization or dynamic programming come really handy.

最后,如果你仍然感兴趣的话,有一种更快的方法。你的任务并不是特别要求你计算所有数字的总数,它只要求一些特殊数字的数目。在这种情况下,像memoization或dynamic programming这样的优化技术非常有用。

In this case, when you have the first four digits of some number (let them be 1234), you calculate their sum (in this case 10) and you immediately know, what is the sum of the other four digits supposed to be.

在这种情况下,当你有一个数字的前四位数字(让它们是1234)时,你计算它们的和(在这个例子中是10),你马上就会知道,其余四位数的和是多少。

Any 4-digit number, that yields sum 10 can now be the other half to create a valid number. Therefore total number of valid numbers beginning with 1234 is exactly the number of all four digit numbers that give the sum 10.

任何4位数的数字,即10的总和可以是另外一半来创建一个有效的数字。因此,从1234开始的有效数字的总数恰好是给出10的所有四位数字的数目。

Now consider another number, say 3412. This number has also sum equal to 10, therefore any right-side that completes 1234 also completes 3412.

现在考虑另一个数字,比如3412。这个数也等于10,所以任何右边的完成1234也完成3412。

What this means is that the number of valid numbers beginning with 3412 is the same as the number of valid numbers beginning with 1234, which is in turn the same as the total number of valid numbers, where the first half yields the sum 10.

这意味着,从3412开始的有效数字的数量与从1234开始的有效数字的数量是一样的,这与有效数字的总数相同,而前一半的有效数字是10。

Therefore if we precompute for each i the number of four digit numbers that yield the sum i, we would know for each first four digits the exact number of combinations of last four digits that complete a valid number, without having to iterate over all 10000 of them.

因此,如果我们预先计算每个i的4位数字的数目,我们就会知道,对于每一个前四位数字,我们都知道最后四位数字组合的准确数字,而不需要遍历所有10000个数字。

The following implementation of this algorithm

下面是这个算法的实现。

  1. Precomputes number of different ending halves for each sum of the beginning half
  2. 对开始的每一项的前半部分,预计算不同的结束部分的数目。
  3. Splits the [M,N] interval in three subintervals, because in the first and the last beginning not every ending is possible
  4. 将[M,N]间隔分成三个子区间,因为在第一个和最后一个开始并不是每个结束都是可能的。

This algorithm runs quadratically faster than the naive implementation (for sufficiently big N-M).

这个算法的运行速度比单纯的实现快得多(足够大的N-M)。

#include <string.h>

int sum_digits(int number) {
    return number%10 + (number/10)%10 + (number/100)%10 + (number/1000)%10;
}

int count(int M, int N) {
    if (M > N) return 0;

    int ret = 0;
    int tmp = 0;

    // for each i from 0 to 36 precompute number of ways we can get this sum
    // out of a four-digit number
    int A[37];
    memset(A, 0, 37*4);
    for (int i = 0; i <= 9999; ++i) {
        ++A[sum_digits(i)];
    }

    // nearest multiple of 10000 greater than M
    int near_M = ((M+9999)/10000)*10000;

    // nearest multiple of 10000 less than N
    int near_N = (N/10000)*10000;

    // count all numbers up to first multiple of 10000
    tmp = sum_digits(M/10000);
    if (near_M <= N) {
        for (int i = M; i < near_M; ++i) {
            if (tmp == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    // count all numbers between the 10000 multiples, use the precomputed values
    for (int i = near_M / 10000; i < near_N / 10000; ++i) {
        ret += A[sum_digits(i)];
    }

    // count all numbers after the last multiple of 10000
    tmp = sum_digits(N / 10000);
    if (near_N >= M) {
        for (int i = near_N; i <= N; ++i) {
            if (tmp == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    // special case when there are no multiples of 10000 between M and N
    if (near_M > near_N) {
        for (int i = M; i <= N; ++i) {
            if (sum_digits(i / 10000) == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    return ret;
}

EDIT: I fixed the bugs mentioned in the comments.

编辑:我修正了评论中提到的错误。

#2


5  

I don't know if this would be significantly faster or not, but you might try breaking the number into two 4 digit numbers, then use a table lookup to get the sums. That way there's only one division operation instead of eight.

我不知道这是否会大大加快速度,但您可以尝试将数字分解为两个4位数字,然后使用一个表查找来获得总数。这样就只有一个除法运算而不是8。

You can pre-compute the table of 10000 sums so it gets compiled in so there's no runtime cost at all.

你可以预先计算10000个数字的表,这样它就会被编译,所以根本没有运行时成本。

Another slightly more complicated, but probably much faster, approach that can be used is have a table or map of 10000 elements that's the reverse of the sum lookup table where you can map the sum to the set of four digit numbers that would produce that sum. That way, when you have to find the result for a particular range 10000 number range, it's a simple lookup on the sum of the most significant four digits. For example, to find the result for the range 12340000 - 12349999, you could use a binary search on the reverse lookup table to quickly find how many numbers in the range 0 - 9999 have the sum 10 (1 + 2 + 3 + 4).

另一个稍微复杂一些,但可能更快的方法,可以使用的方法是有一个表或10000个元素的映射,这是一个和查找表的反向,你可以将这个和映射到一个四位数的集合中,这个数字会产生这个和。这样,当您需要为一个特定范围的10000个数字范围查找结果时,它是对最重要的4位数字的和的简单查找。例如,为了查找范围12340000 - 12349999的结果,您可以在反向查找表中使用一个二进制搜索,以快速发现在0 - 9999范围内有多少个数字的总和为10(1 + 2 + 3 + 4)。

Again - this reverse sum lookup table can be pre-computed and compiled in as a static array.

同样,这个反向和查找表可以作为一个静态数组预先计算和编译。

In this way, the results for complete 10000 number ranges are performed with a couple binary searches. Any partial ranges can also be handled with the reverse lookup table with slightly more complication due to having to ignore matches that are from out of the range of interest. But that complication only has to happen at most twice for your whole set of subranges.

这样,完成10000个数字范围的结果是用一对二进制搜索执行的。任何部分范围也可以用反向查找表来处理,因为必须忽略来自于兴趣范围之外的匹配。但这种复杂性只会在你的整个子程序集中发生最多两次。

This would reduce the complexity of the algorithm from O(N*N) to O(N log N) (I think).

这将减少算法从O(N*N)到O(N log N)的复杂度(我认为)。


update:

更新:

Here's some timings I got (Win32-x86, using VS 2013 (MSVC 12) with release build default options):

以下是我得到的一些时间点(Win32-x86,使用VS 2013 (MSVC 12)和版本构建默认选项):

       range    range     
       start    end        count    time
================================================

alg1(10000000, 99999999): 4379055, 1.854 seconds
alg2(10000000, 99999999): 4379055, 0.049 seconds
alg3(10000000, 99999999): 4379055, 0.001 seconds

with:

:

  • alg1() is the original code from the question
  • alg1()是问题的原始代码。
  • alg2() is my first cut suggestion (lookup precomputed sums)
  • alg2()是我的第一个削减建议(查找预先计算的和)
  • alg3() is the second suggestion (binary search lookup of sum matches using a table sorted by sums)
  • alg3()是第二个建议(对sum匹配的二进制搜索查找,使用一个按和排序的表)

I'm actually surprised at the difference between alg1() to alg2()

实际上,我对alg1()和alg2()的区别感到惊讶

#3


3  

You are going about this the wrong way. A little bit of cleverness is worth a lot of horsepower. You should not be comparing the first and last four digits of every number.

你走错路了。一点点聪明就值很多马力。你不应该比较每个数字的第一个和最后四个数字。

First - notice that the first four digits will change very slowly - so for sure you can have a loop of 10000 of the last four digits without re-computing the first sum.

首先注意,前四位数字将会非常缓慢地变化——因此,在没有重新计算第一个和的情况下,您肯定可以拥有最后4位数字的10000个循环。

Second - the sum of digits repeats itself every 9th number (until you get overflow). This is the basis of the "number is divisible by 9 if sum of digits is divisible by 9". example:

第二,数字的和会重复自己的第9个数字(直到你得到溢出)。这是“数能被9整除,如果数能被9整除”的基础。例子:

1234  - sum = 10
1234 + 9 = 1243   - sum is still 10

What this means is that the following will work pretty well (pseudo code):

这意味着下面的操作会很好(pseudo代码):

take first 4 digits of M, find sum (call it A)
find sum of last four digits of M (call it B)
subtract: C = (A - B)
If C < 9:
  D = C%9
  first valid number is [A][B+D]. Then step by 9, until...

You need to think a bit about the "until", and also about what to do when C >= 9. This means you need to find a zero in B and replace it with a 9, then repeat the above.

您需要考虑一下“until”,以及当C >= 9时应该怎么做。这意味着你需要在B中找到一个0,然后用9替换它,然后重复上面的步骤。

If you want to do nothing else, then see that you don't need to re-compute the sum of digits that did not change. In general when you add 1 to a number, the sum of digits increases by 1 (unless there is carry - then it decreases by 9; and that happens every 9th, 99th (twice -> sum drops by 18), 999th (drop by 27), etc.

如果您不想做其他事情,那么请注意,您不需要重新计算没有更改的数字的和。一般情况下,当你把1加到一个数字时,数字的总和会增加1(除非有进位),那么它就会减少9;这发生在每9、99次(两次->和18),999(下降27),等等。

I hope this helps you think about the problem differently.

我希望这能帮助你以不同的方式思考这个问题。

#4


2  

I am going to try an approach which doesn't make use of the lookup table (even though I know that the second one should be faster) to investigate how much we can speedup just optimizing calculus. This algorithm can be used where stack is an important resource... Let's work on the idea that divisions and modulus are slow, for example in cortex R4 a 32 bit division requires up to 16 loops while a multiplication can be done in a single loop, with older ARMs things can be even worse. This basic idea will try to get rid of them using digit arrays instead of integers. To keep it simple let's show an implementation using printf before a pseudo optimized version.

我将尝试一种不使用查找表的方法(尽管我知道第二种方法应该更快),以研究我们能加速多少,只需要优化计算。该算法可以在堆栈是重要资源的地方使用。让我们来研究一下,划分和模量是缓慢的,例如在皮质R4中,32位分割需要16个循环,而乘法可以在一个循环中完成,而旧的武器可能会更糟。这个基本思想将试图用数字数组代替整数来摆脱它们。为了保持简单,让我们在pseudo优化版本之前显示一个使用printf的实现。

 void main() {
   int count=0;
   int nmax;       
   char num[9]={0};
   int n;
   printf( "Insert number1 ");
   scanf( "%d", &nm );
   printf( "Insert number2 ");
   scanf( "%d", &nmax );
   while( nm <= nmax ) {
       int sumup=0, sumdown=0;
       sprintf( num, "%d", nm );
       for( n=0; n<4; n++ ) {
         sumup += num[n] -'0'; // subtracting '0' is not necessary (see below)
         sumdown += num[7-n]-'0';  // subtracting '0' is not necessary (see below)
       }
       if( sumup == sumdown ) {
       /* whatever */
          count++;
       }
       nm++;
   }
 }

You may want to check that the string is a valid number using strtol before calling the for loop and the length of the string using strlen. I set here fixed values as you required (I assume length always 8).

在调用for循环和使用strlen的字符串长度之前,您可能需要检查字符串是否是一个有效的数字。我按照您的要求设置了固定值(我假定长度总是8)。

The downside of the shown algorithm is the sprintf for any loop that may do thing worse... So we apply two major changes

显示算法的缺点是,任何可能做得更糟的循环的sprintf。所以我们应用两个主要的变化。

  1. we use [0-9] instead of ['0';'9']
  2. 我们使用[0-9]而不是['0';'9']
  3. we drop the sprintf for a faster solution which takes in account that we need to format a digit string starting from the previous number (n-1)
  4. 我们将sprintf放到一个更快的解决方案中,它考虑到我们需要从以前的数字(n-1)开始格式化一个数字字符串

Finally the pseudo optimized algorithm should look something like the one shown below in which all divisions and modules are removed (apart from the first number) and bytes are used instead of ASCII.

最后,pseudo优化算法应该看起来像下面所示的那样,所有的分区和模块都被删除(除了第一个数字之外),而用字节代替ASCII。

void pseudo_optimized() {
   int count=0;
   int nmax,nm;       
   char num[9]={0};
   int sumup=0, sumdown=0;
   int n,i;
   printf( "Insert number1 ");
   scanf( "%d", &nm );
   printf( "Insert number2 ");
   scanf( "%d", &nmax );
   n = nm;
   for( i=7; i>=0; i-- ) {
      num[i]=n%10;
      n/=10;
   }
   while( nm <= nmax ) {

       sumup = num[0] + num[1] + num[2] + num[3];
       sumdown = num[7] + num[6] + num[5] + num[4];
       if( sumup == sumdown ) {
       /* whatever */
          count++;
       }
       nm++;
       /* Following loop is a faster sprintf replacement and
        * it will exit at the first value 9 times on 10
        */
        for( i=7; i>=0; i-- ) {
         if( num[i] == 9 ) {
            num[i]=0;
         } else {
            num[i] += 1;
            break;
         }
       }
   }
}

Original algo on my vm 5.500000 s, this algo 0.950000 s tested for [00000000=>99999999] The weak point of this algorithm is that it uses sum of digits (which are not necessary and a for...loop that can be unrolled.

这个算法的缺点是它使用了数字的和(这是没有必要的,而a是……)可以展开的循环。

* update * further optimization. The sums of digits are not necessary.... thinking about it I could improve the algorithm in the following way:

*更新*进一步优化。数字的金额是没有必要....考虑到这一点,我可以通过以下方式改进算法:

int optimized() {
   int nmax=99999999,
   int nm=0;
   clock_t time1, time2;
   char num[9]={0};
   int sumup=0, sumdown=0;
   int n,i;
   int count=0;
   n = nm;
   time1 = clock();
   for( i=7; i>=0; i-- ) {
      num[i]=n%10;
      n/=10;
   }
   sumup = num[0] + num[1] + num[2] + num[3];
   sumdown = num[7] + num[6] + num[5] + num[4];
   while( nm <= nmax ) {
      if( sumup == sumdown ) {
         count++;
      }
      nm++;
      for( i=7; i>=0; i-- ) {
         if( num[i] == 9 ) {
            num[i]=0;
            if( i>3 )
               sumdown-=9;
            else
               sumup-=9;
     } else {
            num[i] += 1;
            if( i>3 )
              sumdown++;
            else
              sumup++;
            break;
         }
      }
   }
   time2 = clock();
   printf( "Final-now %d %f\n", count, ((float)time2 - (float)time1) / 1000000);
   return 0;
}

with this we arrive to 0.760000 s which is 3 times slower than the result achieved on the same machine using lookup tables.

这样,我们到达0.76万s,比使用查找表在同一台机器上实现的结果慢3倍。

* update* Optimized and unrolled:

*更新*优化和展开:

int optimized_unrolled(int nm, int nmax) {
   char num[9]={0};
   int sumup=0, sumdown=0;
   int n,i;
   int count=0;
   n = nm;
   for( i=7; i>=0; i-- ) {
      num[i]=n%10;
      n/=10;
   }
   sumup = num[0] + num[1] + num[2] + num[3];
   sumdown = num[7] + num[6] + num[5] + num[4];
   while( nm <= nmax ) {
      if( sumup == sumdown ) {
         count++;
   }
   nm++;
   if( num[7] == 9 ) {
      num[7]=0;
      if( num[6] == 9 ) {
         num[6]=0;
         if( num[5] == 9 ) {
            num[5]=0;
            if( num[4] == 9 ) {
               num[4]=0;
               sumdown=0;
               if( num[3] == 9 ) {
                  num[3]=0;
                  if( num[2] == 9 ) {
                     num[2]=0;
                     if( num[1] == 9 ) {
                        num[1]=0;
                        num[0]++;
                        sumup-=26;
                     } else {
                        num[1]++;
                        sumup-=17;
                     }
                  } else {
                     num[2]++;
                     sumup-=8;
                  }
               } else {
                  num[3]++;
                  sumup++;
               }
            } else {
               num[4]++;
               sumdown-=26;
            }
        } else {
           num[5]++;
           sumdown-=17;
        }
     } else {
        num[6]++;
        sumdown-=8;
     }
  } else {
     num[7]++;
     sumdown++;
  }
}
return count;
}

Unrolling vectors improves the speed of about 50%. The algorithm costs now 0.36000 s, by the way it makes use of the stack a bit more than the previous solution (as some 'if' statements may result in a push, so it cannot be always used). The result is comparable with Alg2@Michael Burr on the same machine, [Alg3-Alg5]@Michael Burr are a lot faster where stack isn't a concern.

未滚动的矢量提高了大约50%的速度。这个算法现在的成本是0.36000 s,顺便说一下,它比以前的解决方案使用了更多的堆栈(因为一些“if”语句可能会导致push,所以不能总是使用它)。其结果与在同一台机器上的Alg2@Michael Burr相当,[Alg3-Alg5]@Michael Burr的速度要快得多,因为堆栈不是问题。

Note all test where performed on a intel VMS. I will try to run all those algos on a ARM device if I will have time.

注意所有在intel VMS上执行的测试。如果我有时间的话,我会试着在手臂上运行所有的阿尔戈斯。

#5


1  

#include <stdio.h>

int main(){
    int M, N;

    scanf("%d", &M);
    scanf("%d", &N);

    static int table[10000] = {0,1,2,3,4,5,6,7,8,9};
    {
        register int i=0,i1,i2,i3,i4;
        for(i1=0;i1<10;++i1)
        for(i2=0;i2<10;++i2)
        for(i3=0;i3<10;++i3)
        for(i4=0;i4<10;++i4)
            table[i++]=table[i1]+table[i2]+table[i3]+table[i4];
    }
    register int cnt = M, second4 = M % 10000;
    int res = 0, first4 = M / 10000, sum1=table[first4];
    for(; cnt <= N; ++cnt){
        if(sum1 == table[second4])
            ++res;
        if(++second4>9999){
            second4 -=10000;
            if(++first4>9999)break;
            sum1 = table[first4];
        }
    }
    printf("%d", res);
    return 0;
}

#6


0  

If you know that the numbers are fixed like that, then you can you substring functions to get the components and compare them. Otherwise, your modulator operations are contributing unnecessary time.

如果你知道这些数字是固定的,那么你就可以用子串的函数来获取和比较它们。否则,您的调制器操作将导致不必要的时间。

#7


0  

i found faster algorithm:

我发现更快的算法:

#include <stdio.h>
#include <ctime>

int main()
{
    clock_t time1, time2;
    int M, N, res = 0, cnt, first4, second4, sum1, sum2,last4_ofM,first4_ofM,last4_ofN,first4_ofN,j;

    scanf("%d", &M);
    scanf("%d", &N);

    time1 = clock();


    for(cnt = M; cnt <= N; cnt++)
    {
        first4 = cnt % 10000;
        sum1 = first4 % 10 + (first4 / 10) % 10 + (first4 / 100) % 10 + (first4 / 1000) % 10;
        second4 = cnt / 10000;
        sum2 = second4 % 10 + (second4 / 10) % 10 + (second4 / 100) % 10 + (second4 / 1000) % 10;

        if(sum1 == sum2)
            res++;

    }
    time2 = clock();
    printf("%d\n", res);
    printf("first algorithm time: %f\n",((float)time2 - (float)time1) / 1000000.0F );
    res=0;

    time1 = clock();
    first4_ofM = M / 10000;
    last4_ofM = M % 10000;
    first4_ofN = N / 10000;
    last4_ofN = N % 10000;

    for(int i = first4_ofM; i <= first4_ofN; i++)
    {
        sum1 = i % 10 + (i / 10) % 10 + (i / 100) % 10 + (i / 1000) % 10;

        if ( i == first4_ofM ) 
            j = last4_ofM;
        else 
            j = 0;
        while ( j <= 9999)
        {
            sum2 = j % 10 + (j / 10) % 10 + (j / 100) % 10 + (j / 1000) % 10;
            if(sum1 == sum2)
                res++;
            if ( i == first4_ofN && j == last4_ofN ) break;
            j++;
        }

    }
    time2 = clock();
        printf("%d\n", res);
        printf("second algorithm time: %f\n",((float)time2 - (float)time1) / 1000000.0F );
    return 0;
}

i just dont need to count sum of the first four digits all the time the number in changed. I need to count it one time per 10000 iterations. In worst case output is:

我只是不需要计算所有的前四位数字之和。我需要每10000次迭代计算一次。在最坏的情况下,输出是:

10000000
99999999
4379055
first algorithm time: 5.160000
4379055
second algorithm time: 2.240000

about half the better result.

大约有一半的效果更好。

#1


5  

Finally, if you are still interested, there is a much faster way to do this. Your task doesn't specifically require you to calculate the sums for all the numbers, it only asks for the number of some special numbers. In such cases optimization techniques like memoization or dynamic programming come really handy.

最后,如果你仍然感兴趣的话,有一种更快的方法。你的任务并不是特别要求你计算所有数字的总数,它只要求一些特殊数字的数目。在这种情况下,像memoization或dynamic programming这样的优化技术非常有用。

In this case, when you have the first four digits of some number (let them be 1234), you calculate their sum (in this case 10) and you immediately know, what is the sum of the other four digits supposed to be.

在这种情况下,当你有一个数字的前四位数字(让它们是1234)时,你计算它们的和(在这个例子中是10),你马上就会知道,其余四位数的和是多少。

Any 4-digit number, that yields sum 10 can now be the other half to create a valid number. Therefore total number of valid numbers beginning with 1234 is exactly the number of all four digit numbers that give the sum 10.

任何4位数的数字,即10的总和可以是另外一半来创建一个有效的数字。因此,从1234开始的有效数字的总数恰好是给出10的所有四位数字的数目。

Now consider another number, say 3412. This number has also sum equal to 10, therefore any right-side that completes 1234 also completes 3412.

现在考虑另一个数字,比如3412。这个数也等于10,所以任何右边的完成1234也完成3412。

What this means is that the number of valid numbers beginning with 3412 is the same as the number of valid numbers beginning with 1234, which is in turn the same as the total number of valid numbers, where the first half yields the sum 10.

这意味着,从3412开始的有效数字的数量与从1234开始的有效数字的数量是一样的,这与有效数字的总数相同,而前一半的有效数字是10。

Therefore if we precompute for each i the number of four digit numbers that yield the sum i, we would know for each first four digits the exact number of combinations of last four digits that complete a valid number, without having to iterate over all 10000 of them.

因此,如果我们预先计算每个i的4位数字的数目,我们就会知道,对于每一个前四位数字,我们都知道最后四位数字组合的准确数字,而不需要遍历所有10000个数字。

The following implementation of this algorithm

下面是这个算法的实现。

  1. Precomputes number of different ending halves for each sum of the beginning half
  2. 对开始的每一项的前半部分,预计算不同的结束部分的数目。
  3. Splits the [M,N] interval in three subintervals, because in the first and the last beginning not every ending is possible
  4. 将[M,N]间隔分成三个子区间,因为在第一个和最后一个开始并不是每个结束都是可能的。

This algorithm runs quadratically faster than the naive implementation (for sufficiently big N-M).

这个算法的运行速度比单纯的实现快得多(足够大的N-M)。

#include <string.h>

int sum_digits(int number) {
    return number%10 + (number/10)%10 + (number/100)%10 + (number/1000)%10;
}

int count(int M, int N) {
    if (M > N) return 0;

    int ret = 0;
    int tmp = 0;

    // for each i from 0 to 36 precompute number of ways we can get this sum
    // out of a four-digit number
    int A[37];
    memset(A, 0, 37*4);
    for (int i = 0; i <= 9999; ++i) {
        ++A[sum_digits(i)];
    }

    // nearest multiple of 10000 greater than M
    int near_M = ((M+9999)/10000)*10000;

    // nearest multiple of 10000 less than N
    int near_N = (N/10000)*10000;

    // count all numbers up to first multiple of 10000
    tmp = sum_digits(M/10000);
    if (near_M <= N) {
        for (int i = M; i < near_M; ++i) {
            if (tmp == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    // count all numbers between the 10000 multiples, use the precomputed values
    for (int i = near_M / 10000; i < near_N / 10000; ++i) {
        ret += A[sum_digits(i)];
    }

    // count all numbers after the last multiple of 10000
    tmp = sum_digits(N / 10000);
    if (near_N >= M) {
        for (int i = near_N; i <= N; ++i) {
            if (tmp == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    // special case when there are no multiples of 10000 between M and N
    if (near_M > near_N) {
        for (int i = M; i <= N; ++i) {
            if (sum_digits(i / 10000) == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    return ret;
}

EDIT: I fixed the bugs mentioned in the comments.

编辑:我修正了评论中提到的错误。

#2


5  

I don't know if this would be significantly faster or not, but you might try breaking the number into two 4 digit numbers, then use a table lookup to get the sums. That way there's only one division operation instead of eight.

我不知道这是否会大大加快速度,但您可以尝试将数字分解为两个4位数字,然后使用一个表查找来获得总数。这样就只有一个除法运算而不是8。

You can pre-compute the table of 10000 sums so it gets compiled in so there's no runtime cost at all.

你可以预先计算10000个数字的表,这样它就会被编译,所以根本没有运行时成本。

Another slightly more complicated, but probably much faster, approach that can be used is have a table or map of 10000 elements that's the reverse of the sum lookup table where you can map the sum to the set of four digit numbers that would produce that sum. That way, when you have to find the result for a particular range 10000 number range, it's a simple lookup on the sum of the most significant four digits. For example, to find the result for the range 12340000 - 12349999, you could use a binary search on the reverse lookup table to quickly find how many numbers in the range 0 - 9999 have the sum 10 (1 + 2 + 3 + 4).

另一个稍微复杂一些,但可能更快的方法,可以使用的方法是有一个表或10000个元素的映射,这是一个和查找表的反向,你可以将这个和映射到一个四位数的集合中,这个数字会产生这个和。这样,当您需要为一个特定范围的10000个数字范围查找结果时,它是对最重要的4位数字的和的简单查找。例如,为了查找范围12340000 - 12349999的结果,您可以在反向查找表中使用一个二进制搜索,以快速发现在0 - 9999范围内有多少个数字的总和为10(1 + 2 + 3 + 4)。

Again - this reverse sum lookup table can be pre-computed and compiled in as a static array.

同样,这个反向和查找表可以作为一个静态数组预先计算和编译。

In this way, the results for complete 10000 number ranges are performed with a couple binary searches. Any partial ranges can also be handled with the reverse lookup table with slightly more complication due to having to ignore matches that are from out of the range of interest. But that complication only has to happen at most twice for your whole set of subranges.

这样,完成10000个数字范围的结果是用一对二进制搜索执行的。任何部分范围也可以用反向查找表来处理,因为必须忽略来自于兴趣范围之外的匹配。但这种复杂性只会在你的整个子程序集中发生最多两次。

This would reduce the complexity of the algorithm from O(N*N) to O(N log N) (I think).

这将减少算法从O(N*N)到O(N log N)的复杂度(我认为)。


update:

更新:

Here's some timings I got (Win32-x86, using VS 2013 (MSVC 12) with release build default options):

以下是我得到的一些时间点(Win32-x86,使用VS 2013 (MSVC 12)和版本构建默认选项):

       range    range     
       start    end        count    time
================================================

alg1(10000000, 99999999): 4379055, 1.854 seconds
alg2(10000000, 99999999): 4379055, 0.049 seconds
alg3(10000000, 99999999): 4379055, 0.001 seconds

with:

:

  • alg1() is the original code from the question
  • alg1()是问题的原始代码。
  • alg2() is my first cut suggestion (lookup precomputed sums)
  • alg2()是我的第一个削减建议(查找预先计算的和)
  • alg3() is the second suggestion (binary search lookup of sum matches using a table sorted by sums)
  • alg3()是第二个建议(对sum匹配的二进制搜索查找,使用一个按和排序的表)

I'm actually surprised at the difference between alg1() to alg2()

实际上,我对alg1()和alg2()的区别感到惊讶

#3


3  

You are going about this the wrong way. A little bit of cleverness is worth a lot of horsepower. You should not be comparing the first and last four digits of every number.

你走错路了。一点点聪明就值很多马力。你不应该比较每个数字的第一个和最后四个数字。

First - notice that the first four digits will change very slowly - so for sure you can have a loop of 10000 of the last four digits without re-computing the first sum.

首先注意,前四位数字将会非常缓慢地变化——因此,在没有重新计算第一个和的情况下,您肯定可以拥有最后4位数字的10000个循环。

Second - the sum of digits repeats itself every 9th number (until you get overflow). This is the basis of the "number is divisible by 9 if sum of digits is divisible by 9". example:

第二,数字的和会重复自己的第9个数字(直到你得到溢出)。这是“数能被9整除,如果数能被9整除”的基础。例子:

1234  - sum = 10
1234 + 9 = 1243   - sum is still 10

What this means is that the following will work pretty well (pseudo code):

这意味着下面的操作会很好(pseudo代码):

take first 4 digits of M, find sum (call it A)
find sum of last four digits of M (call it B)
subtract: C = (A - B)
If C < 9:
  D = C%9
  first valid number is [A][B+D]. Then step by 9, until...

You need to think a bit about the "until", and also about what to do when C >= 9. This means you need to find a zero in B and replace it with a 9, then repeat the above.

您需要考虑一下“until”,以及当C >= 9时应该怎么做。这意味着你需要在B中找到一个0,然后用9替换它,然后重复上面的步骤。

If you want to do nothing else, then see that you don't need to re-compute the sum of digits that did not change. In general when you add 1 to a number, the sum of digits increases by 1 (unless there is carry - then it decreases by 9; and that happens every 9th, 99th (twice -> sum drops by 18), 999th (drop by 27), etc.

如果您不想做其他事情,那么请注意,您不需要重新计算没有更改的数字的和。一般情况下,当你把1加到一个数字时,数字的总和会增加1(除非有进位),那么它就会减少9;这发生在每9、99次(两次->和18),999(下降27),等等。

I hope this helps you think about the problem differently.

我希望这能帮助你以不同的方式思考这个问题。

#4


2  

I am going to try an approach which doesn't make use of the lookup table (even though I know that the second one should be faster) to investigate how much we can speedup just optimizing calculus. This algorithm can be used where stack is an important resource... Let's work on the idea that divisions and modulus are slow, for example in cortex R4 a 32 bit division requires up to 16 loops while a multiplication can be done in a single loop, with older ARMs things can be even worse. This basic idea will try to get rid of them using digit arrays instead of integers. To keep it simple let's show an implementation using printf before a pseudo optimized version.

我将尝试一种不使用查找表的方法(尽管我知道第二种方法应该更快),以研究我们能加速多少,只需要优化计算。该算法可以在堆栈是重要资源的地方使用。让我们来研究一下,划分和模量是缓慢的,例如在皮质R4中,32位分割需要16个循环,而乘法可以在一个循环中完成,而旧的武器可能会更糟。这个基本思想将试图用数字数组代替整数来摆脱它们。为了保持简单,让我们在pseudo优化版本之前显示一个使用printf的实现。

 void main() {
   int count=0;
   int nmax;       
   char num[9]={0};
   int n;
   printf( "Insert number1 ");
   scanf( "%d", &nm );
   printf( "Insert number2 ");
   scanf( "%d", &nmax );
   while( nm <= nmax ) {
       int sumup=0, sumdown=0;
       sprintf( num, "%d", nm );
       for( n=0; n<4; n++ ) {
         sumup += num[n] -'0'; // subtracting '0' is not necessary (see below)
         sumdown += num[7-n]-'0';  // subtracting '0' is not necessary (see below)
       }
       if( sumup == sumdown ) {
       /* whatever */
          count++;
       }
       nm++;
   }
 }

You may want to check that the string is a valid number using strtol before calling the for loop and the length of the string using strlen. I set here fixed values as you required (I assume length always 8).

在调用for循环和使用strlen的字符串长度之前,您可能需要检查字符串是否是一个有效的数字。我按照您的要求设置了固定值(我假定长度总是8)。

The downside of the shown algorithm is the sprintf for any loop that may do thing worse... So we apply two major changes

显示算法的缺点是,任何可能做得更糟的循环的sprintf。所以我们应用两个主要的变化。

  1. we use [0-9] instead of ['0';'9']
  2. 我们使用[0-9]而不是['0';'9']
  3. we drop the sprintf for a faster solution which takes in account that we need to format a digit string starting from the previous number (n-1)
  4. 我们将sprintf放到一个更快的解决方案中,它考虑到我们需要从以前的数字(n-1)开始格式化一个数字字符串

Finally the pseudo optimized algorithm should look something like the one shown below in which all divisions and modules are removed (apart from the first number) and bytes are used instead of ASCII.

最后,pseudo优化算法应该看起来像下面所示的那样,所有的分区和模块都被删除(除了第一个数字之外),而用字节代替ASCII。

void pseudo_optimized() {
   int count=0;
   int nmax,nm;       
   char num[9]={0};
   int sumup=0, sumdown=0;
   int n,i;
   printf( "Insert number1 ");
   scanf( "%d", &nm );
   printf( "Insert number2 ");
   scanf( "%d", &nmax );
   n = nm;
   for( i=7; i>=0; i-- ) {
      num[i]=n%10;
      n/=10;
   }
   while( nm <= nmax ) {

       sumup = num[0] + num[1] + num[2] + num[3];
       sumdown = num[7] + num[6] + num[5] + num[4];
       if( sumup == sumdown ) {
       /* whatever */
          count++;
       }
       nm++;
       /* Following loop is a faster sprintf replacement and
        * it will exit at the first value 9 times on 10
        */
        for( i=7; i>=0; i-- ) {
         if( num[i] == 9 ) {
            num[i]=0;
         } else {
            num[i] += 1;
            break;
         }
       }
   }
}

Original algo on my vm 5.500000 s, this algo 0.950000 s tested for [00000000=>99999999] The weak point of this algorithm is that it uses sum of digits (which are not necessary and a for...loop that can be unrolled.

这个算法的缺点是它使用了数字的和(这是没有必要的,而a是……)可以展开的循环。

* update * further optimization. The sums of digits are not necessary.... thinking about it I could improve the algorithm in the following way:

*更新*进一步优化。数字的金额是没有必要....考虑到这一点,我可以通过以下方式改进算法:

int optimized() {
   int nmax=99999999,
   int nm=0;
   clock_t time1, time2;
   char num[9]={0};
   int sumup=0, sumdown=0;
   int n,i;
   int count=0;
   n = nm;
   time1 = clock();
   for( i=7; i>=0; i-- ) {
      num[i]=n%10;
      n/=10;
   }
   sumup = num[0] + num[1] + num[2] + num[3];
   sumdown = num[7] + num[6] + num[5] + num[4];
   while( nm <= nmax ) {
      if( sumup == sumdown ) {
         count++;
      }
      nm++;
      for( i=7; i>=0; i-- ) {
         if( num[i] == 9 ) {
            num[i]=0;
            if( i>3 )
               sumdown-=9;
            else
               sumup-=9;
     } else {
            num[i] += 1;
            if( i>3 )
              sumdown++;
            else
              sumup++;
            break;
         }
      }
   }
   time2 = clock();
   printf( "Final-now %d %f\n", count, ((float)time2 - (float)time1) / 1000000);
   return 0;
}

with this we arrive to 0.760000 s which is 3 times slower than the result achieved on the same machine using lookup tables.

这样,我们到达0.76万s,比使用查找表在同一台机器上实现的结果慢3倍。

* update* Optimized and unrolled:

*更新*优化和展开:

int optimized_unrolled(int nm, int nmax) {
   char num[9]={0};
   int sumup=0, sumdown=0;
   int n,i;
   int count=0;
   n = nm;
   for( i=7; i>=0; i-- ) {
      num[i]=n%10;
      n/=10;
   }
   sumup = num[0] + num[1] + num[2] + num[3];
   sumdown = num[7] + num[6] + num[5] + num[4];
   while( nm <= nmax ) {
      if( sumup == sumdown ) {
         count++;
   }
   nm++;
   if( num[7] == 9 ) {
      num[7]=0;
      if( num[6] == 9 ) {
         num[6]=0;
         if( num[5] == 9 ) {
            num[5]=0;
            if( num[4] == 9 ) {
               num[4]=0;
               sumdown=0;
               if( num[3] == 9 ) {
                  num[3]=0;
                  if( num[2] == 9 ) {
                     num[2]=0;
                     if( num[1] == 9 ) {
                        num[1]=0;
                        num[0]++;
                        sumup-=26;
                     } else {
                        num[1]++;
                        sumup-=17;
                     }
                  } else {
                     num[2]++;
                     sumup-=8;
                  }
               } else {
                  num[3]++;
                  sumup++;
               }
            } else {
               num[4]++;
               sumdown-=26;
            }
        } else {
           num[5]++;
           sumdown-=17;
        }
     } else {
        num[6]++;
        sumdown-=8;
     }
  } else {
     num[7]++;
     sumdown++;
  }
}
return count;
}

Unrolling vectors improves the speed of about 50%. The algorithm costs now 0.36000 s, by the way it makes use of the stack a bit more than the previous solution (as some 'if' statements may result in a push, so it cannot be always used). The result is comparable with Alg2@Michael Burr on the same machine, [Alg3-Alg5]@Michael Burr are a lot faster where stack isn't a concern.

未滚动的矢量提高了大约50%的速度。这个算法现在的成本是0.36000 s,顺便说一下,它比以前的解决方案使用了更多的堆栈(因为一些“if”语句可能会导致push,所以不能总是使用它)。其结果与在同一台机器上的Alg2@Michael Burr相当,[Alg3-Alg5]@Michael Burr的速度要快得多,因为堆栈不是问题。

Note all test where performed on a intel VMS. I will try to run all those algos on a ARM device if I will have time.

注意所有在intel VMS上执行的测试。如果我有时间的话,我会试着在手臂上运行所有的阿尔戈斯。

#5


1  

#include <stdio.h>

int main(){
    int M, N;

    scanf("%d", &M);
    scanf("%d", &N);

    static int table[10000] = {0,1,2,3,4,5,6,7,8,9};
    {
        register int i=0,i1,i2,i3,i4;
        for(i1=0;i1<10;++i1)
        for(i2=0;i2<10;++i2)
        for(i3=0;i3<10;++i3)
        for(i4=0;i4<10;++i4)
            table[i++]=table[i1]+table[i2]+table[i3]+table[i4];
    }
    register int cnt = M, second4 = M % 10000;
    int res = 0, first4 = M / 10000, sum1=table[first4];
    for(; cnt <= N; ++cnt){
        if(sum1 == table[second4])
            ++res;
        if(++second4>9999){
            second4 -=10000;
            if(++first4>9999)break;
            sum1 = table[first4];
        }
    }
    printf("%d", res);
    return 0;
}

#6


0  

If you know that the numbers are fixed like that, then you can you substring functions to get the components and compare them. Otherwise, your modulator operations are contributing unnecessary time.

如果你知道这些数字是固定的,那么你就可以用子串的函数来获取和比较它们。否则,您的调制器操作将导致不必要的时间。

#7


0  

i found faster algorithm:

我发现更快的算法:

#include <stdio.h>
#include <ctime>

int main()
{
    clock_t time1, time2;
    int M, N, res = 0, cnt, first4, second4, sum1, sum2,last4_ofM,first4_ofM,last4_ofN,first4_ofN,j;

    scanf("%d", &M);
    scanf("%d", &N);

    time1 = clock();


    for(cnt = M; cnt <= N; cnt++)
    {
        first4 = cnt % 10000;
        sum1 = first4 % 10 + (first4 / 10) % 10 + (first4 / 100) % 10 + (first4 / 1000) % 10;
        second4 = cnt / 10000;
        sum2 = second4 % 10 + (second4 / 10) % 10 + (second4 / 100) % 10 + (second4 / 1000) % 10;

        if(sum1 == sum2)
            res++;

    }
    time2 = clock();
    printf("%d\n", res);
    printf("first algorithm time: %f\n",((float)time2 - (float)time1) / 1000000.0F );
    res=0;

    time1 = clock();
    first4_ofM = M / 10000;
    last4_ofM = M % 10000;
    first4_ofN = N / 10000;
    last4_ofN = N % 10000;

    for(int i = first4_ofM; i <= first4_ofN; i++)
    {
        sum1 = i % 10 + (i / 10) % 10 + (i / 100) % 10 + (i / 1000) % 10;

        if ( i == first4_ofM ) 
            j = last4_ofM;
        else 
            j = 0;
        while ( j <= 9999)
        {
            sum2 = j % 10 + (j / 10) % 10 + (j / 100) % 10 + (j / 1000) % 10;
            if(sum1 == sum2)
                res++;
            if ( i == first4_ofN && j == last4_ofN ) break;
            j++;
        }

    }
    time2 = clock();
        printf("%d\n", res);
        printf("second algorithm time: %f\n",((float)time2 - (float)time1) / 1000000.0F );
    return 0;
}

i just dont need to count sum of the first four digits all the time the number in changed. I need to count it one time per 10000 iterations. In worst case output is:

我只是不需要计算所有的前四位数字之和。我需要每10000次迭代计算一次。在最坏的情况下,输出是:

10000000
99999999
4379055
first algorithm time: 5.160000
4379055
second algorithm time: 2.240000

about half the better result.

大约有一半的效果更好。