是否可以在一个循环中计算一个阵列中另一个阵列的元素数量,反之亦然?

时间:2022-11-28 20:22:28

I have a double array x and a double array y. Both can have duplicates elements.

我有一个双数组x和一个双数组y。两者都可以有重复元素。

const double MAX = 10000.0;
const int X_LENGTH = 10000;
const int Y_LENGTH = 10000;
const double TOLERANCE = 0.01;

Random random = new Random();

double[] x = new double[X_LENGTH];
for(int i = 0; i < X_LENGTH; i++)
{
        x[i] = MAX * random.NextDouble();
}

double[] y = new double[Y_LENGTH];
for(int j = 0; j < Y_LENGTH; j++)
{
        y[j] = MAX * random.NextDouble();
}

I am trying to count how many elements in array x are found in array y within a tolerance, and how many elements in array y are found in array x within the same tolerance. Note that these numbers can be different. The simplest way to do this is with two sets of two embedded loops:

我试图计算在公差范围内在数组y中找到数组x中有多少元素,以及在相同容差内的数组x中找到数组y中有多少元素。请注意,这些数字可能不同。最简单的方法是使用两组嵌入式循环:

int x_matches = 0;
for(int i = 0; i < X_LENGTH; i++)
{
        for(int j = 0; j < Y_LENGTH; j++)
        {
                if(Math.Abs(x[i] - y[j]) <= TOLERANCE)
                {
                        x_matches++;
                        break;
                }
        }
}

int y_matches = 0;
for(int j = 0; j < Y_LENGTH; j++)
{
        for(int i = 0; i < X_LENGTH; i++)
        {
                if(Math.Abs(x[i] - y[j]) <= TOLERANCE)
                {
                        y_matches++;
                        break;
                }
        }
}

However, this code is run thousands of times and is the main bottleneck in the software. I am trying to speed it up. I have already optimized by sorting both arrays first and then asynchronously iterating through them.

但是,此代码运行数千次,是软件的主要瓶颈。我正在努力加快速度。我已经通过先对两个数组进行排序然后异步迭代它们进行了优化。

Array.Sort(x);
Array.Sort(y);

int x_matches_2 = 0;
int i2 = 0;
int j2 = 0;
while(i2 < X_LENGTH && j2 < Y_LENGTH)
{
        if(Math.Abs(x[i2] - y[j2]) <= TOLERANCE)
        {
                x_matches_2++;
                i2++;
        }
        else if(x[i2] < y[j2])
        {
                i2++;
        }
        else if(x[i2] > y[j2])
        {
                j2++;
        }
}

int y_matches_2 = 0;
int i3 = 0;
int j3 = 0;
while(i3 < X_LENGTH && j3 < Y_LENGTH)
{
        if(Math.Abs(x[i3] - y[j3]) <= TOLERANCE)
        {
                y_matches_2++;
                j3++;
        }
        else if(x[i3] < y[j3])
        {
                i3++;
        }
        else if(x[i3] > y[j3])
        {
                j3++;
        }
}

I am wondering if anybody knows of a way to merge these two loops into one and still obtain the same answer. I can only come up with this:

我想知道是否有人知道将这两个循环合并为一个并仍然获得相同答案的方法。我只能想出这个:

int x_matches_2 = 0;
int y_matches_2 = 0;
bool[] y_matched = new bool[Y_LENGTH];
for(int i = 0; i < X_LENGTH; i++)
{
        bool x_matched = false;

        for(int j = 0; j < Y_LENGTH; j++)
        {
                if(Math.Abs(x[i] - y[j]) <= TOLERANCE)
                {
                        if(!x_matched)
                        {
                                x_matches_2++;
                                x_matched = true;
                        }
                        if(!y_matched[j])
                        {
                                y_matches_2++;
                                y_matched[j] = true;
                        }
                }
        }
}

It doesn't require sorting; however, it ends up being slower because more comparisons must be done.

它不需要排序;然而,它最终会变慢,因为必须进行更多的比较。

P.S. This is an oversimplification of my actual problem, but I think the solution to this will apply to both.

附:这是对我实际问题的过度简化,但我认为这个解决方案适用于两者。

4 个解决方案

#1


1  

It is possible to have a single loop, but you will process some part of each array more than once.

可以有一个循环,但是您将多次处理每个数组的某些部分。

public static void JustDoIt(double[] x, double[] y)
{
    Array.Sort(x);
    Array.Sort(y);

    bool mustContinue = true;
    bool isXTurns;
    bool withinTolerance;

    int lastBase_x = 0;
    int lastBase_y = 0;

    int lastMatch_x = 0;
    int lastMatch_y = 0;

    int current_x = 0;
    int current_y = 0;

    int matchedCount_x = 0;
    int matchedCount_y = 0;

    double yourTolerance = 0.001;

    while (mustContinue)
    {
        isXTurns = x[current_x] <= y[current_y];

        if (isXTurns)
        {
            withinTolerance = (y[current_y] - x[current_x] <= yourTolerance);
        }
        else
        {
            withinTolerance = (x[current_x] - y[current_y] <= yourTolerance);
        }

        if (withinTolerance)
        {
            if (isXTurns)
            {
                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                }

                if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else
                {
                    mustContinue = false;
                }

            }
            else
            {
                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                } 

                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else
                {
                    mustContinue = false;
                }
            }

        }
        else
        {
            if (isXTurns)
            {
                lastBase_x++;
                mustContinue = lastBase_x < x.Length;
            }
            else
            {
                lastBase_y++;
                mustContinue = lastBase_y < y.Length;
            }

            current_x = lastBase_x;
            current_y = lastBase_y;
        }
    }
}

Some odd results you'll get : if you have 2 arrays of 2 elements each, it's possible that you have more than 2 match from x to y or y to x. It happens cause x[0] can match with y[0] ans y[1], so can x[1]. This way, you'd end up with 4 match in both "direction". For example, when I ran this code with 2 arrays of 1000 items each, I had 1048 matches in one, and 978 in the other. I hope it helps.

你会得到一些奇怪的结果:如果你有2个元素的2个数组,那么从x到y或y到x的匹配可能超过2个。它发生的原因是x [0]可以与y [0]和y [1]匹配,因此x [1]也是如此。这样,你最终会在“方向”上进行4场比赛。例如,当我用2个1000个项目的数组运行此代码时,我在一个中有1048个匹配,在另一个中有978个匹配。我希望它有所帮助。

Edit: Here is a generic version :

编辑:这是一个通用版本:

public static void JustDoIt<T>(IEnumerable<T> items_x, IEnumerable<T> items_y, out int matchedCount_x, out int matchedCount_y, IComparer<T> comparer, Func<T, T, bool> toleranceReferee)
{

    T[] x = items_x.OrderBy(item => item, comparer).ToArray();
    T[] y = items_y.OrderBy(item => item, comparer).ToArray();

    bool mustContinue = true;
    bool isXTurns;
    bool withinTolerance;

    int lastBase_x = 0;
    int lastBase_y = 0;

    int lastMatch_x = 0;
    int lastMatch_y = 0;

    int current_x = 0;
    int current_y = 0;

    matchedCount_x = 0;
    matchedCount_y = 0;

    while (mustContinue)
    {
        isXTurns = comparer.Compare(x[current_x], y[current_y]) <= 0;

        withinTolerance = toleranceReferee(x[current_x], y[current_y]);

        if (withinTolerance)
        {
            if (isXTurns)
            {
                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                }

                if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else
                {
                    mustContinue = false;
                }

            }
            else
            {
                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                }

                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else
                {
                    mustContinue = false;
                }
            }

        }
        else
        {
            if (isXTurns)
            {
                lastBase_x++;
                mustContinue = lastBase_x < x.Length;
            }
            else
            {
                lastBase_y++;
                mustContinue = lastBase_y < y.Length;
            }

            current_x = lastBase_x;
            current_y = lastBase_y;
        }
    }
}

With an example of how you'd call it for int :

以你如何为int调用它为例:

List<int> x2 = new List<int>() { 2, 4, 4, 6, 9, 9 };    // To test an IEnumerable
IEnumerable<int> y2 = new int[] { 1, 3, 3, 4, 6, 9 };   // To test another

int xcount;
int ycount;

SingleLoop.JustDoIt(
    x2,
    y2,
    out xcount,
    out ycount,
    Comparer<int>.Default,
   (currentX, currentY) => { return currentX == currentY; });

#2


2  

Make a HashSet<int> for both arrays and populate them with the elements of the arrays. Traverse both arrays, look up elements in corresponding (opposite) HashSet - HashSet lookup is O(1) so overall effort is O(m+n) with m,n being the array sizes.

为两个数组创建一个HashSet ,并使用数组的元素填充它们。遍历两个数组,在相应的(相反的)HashSet中查找元素 - HashSet查找是O(1)所以整体努力是O(m + n),其中m,n是数组大小。

#3


1  

Looks like it has to be O(n*m) in worst case - all elements of one array are "similar" to all elements in another array - you have to run comparison for each pair.

在最坏的情况下看起来它必须是O(n * m) - 一个数组的所有元素与另一个数组中的所有元素“相似” - 你必须为每对运行比较。

Since you already have sorting, consider dividing each array into ranges (i.e. 20 ranges with about 50 numbers in each [0, 0.05), [0.05, 0.1),..[0.95, 1]) so you can compare ranges first and than compare individual numbers - depending on the data (works good with random or other distribution without huge clumps of values) you may decrease number of comparisons significantly.

由于您已经进行了排序,请考虑将每个数组划分为范围(即20个范围,每个[0,0.05],[0.05,0.1],... [0.95,1]中约有50个数字),因此您可以先比较范围比较单个数字 - 取决于数据(适用于随机或其他分布而没有大量值的数据),您可以显着减少比较次数。

#4


0  

If your data is actually in the range [0..1] like Random.NextDouble(), then you could just make an array with 1000 (=1/TOLERANCE) bins, one for every [x..x+TOLERANCE] range in the total range. Put each value in one array into its bin. Then go over the other array and compare each value to the contents of the closest bin, the bin before and the one after.

如果您的数据实际上在[0..1]范围内,如Random.NextDouble(),那么您可以创建一个包含1000(= 1 / TOLERANCE)个数组的数组,每个[x..x + TOLERANCE]范围一个在总范围内。将每个值放在一个数组中。然后遍历另一个数组并将每个值与最近的bin,bin之前和之后的bin的内容进行比较。

Edit in response to changed question: Instead of using a flat array with Range/Tolerance elements, you could use a hash table: Use HashFunction(x/Tolerance) as hash key to the items x in the first array, then compare every element y in the second array with the elements stored for the hashes HashFunction(y/Tolerance-1), HashFunction(y/Tolerance), HashFunction(y/Tolerance+1). In theory, a hash table access is O(1), so the whole operation would be O(m+n)

编辑以响应更改的问题:您可以使用散列表而不是使用具有Range / Tolerance元素的平面数组:使用HashFunction(x / Tolerance)作为第一个数组中的项x的散列键,然后比较每个元素y在第二个数组中,为哈希HashFunction(y / Tolerance-1),HashFunction(y / Tolerance),HashFunction(y / Tolerance + 1)存储了元素。理论上,哈希表访问是O(1),因此整个操作将是O(m + n)

#1


1  

It is possible to have a single loop, but you will process some part of each array more than once.

可以有一个循环,但是您将多次处理每个数组的某些部分。

public static void JustDoIt(double[] x, double[] y)
{
    Array.Sort(x);
    Array.Sort(y);

    bool mustContinue = true;
    bool isXTurns;
    bool withinTolerance;

    int lastBase_x = 0;
    int lastBase_y = 0;

    int lastMatch_x = 0;
    int lastMatch_y = 0;

    int current_x = 0;
    int current_y = 0;

    int matchedCount_x = 0;
    int matchedCount_y = 0;

    double yourTolerance = 0.001;

    while (mustContinue)
    {
        isXTurns = x[current_x] <= y[current_y];

        if (isXTurns)
        {
            withinTolerance = (y[current_y] - x[current_x] <= yourTolerance);
        }
        else
        {
            withinTolerance = (x[current_x] - y[current_y] <= yourTolerance);
        }

        if (withinTolerance)
        {
            if (isXTurns)
            {
                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                }

                if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else
                {
                    mustContinue = false;
                }

            }
            else
            {
                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                } 

                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else
                {
                    mustContinue = false;
                }
            }

        }
        else
        {
            if (isXTurns)
            {
                lastBase_x++;
                mustContinue = lastBase_x < x.Length;
            }
            else
            {
                lastBase_y++;
                mustContinue = lastBase_y < y.Length;
            }

            current_x = lastBase_x;
            current_y = lastBase_y;
        }
    }
}

Some odd results you'll get : if you have 2 arrays of 2 elements each, it's possible that you have more than 2 match from x to y or y to x. It happens cause x[0] can match with y[0] ans y[1], so can x[1]. This way, you'd end up with 4 match in both "direction". For example, when I ran this code with 2 arrays of 1000 items each, I had 1048 matches in one, and 978 in the other. I hope it helps.

你会得到一些奇怪的结果:如果你有2个元素的2个数组,那么从x到y或y到x的匹配可能超过2个。它发生的原因是x [0]可以与y [0]和y [1]匹配,因此x [1]也是如此。这样,你最终会在“方向”上进行4场比赛。例如,当我用2个1000个项目的数组运行此代码时,我在一个中有1048个匹配,在另一个中有978个匹配。我希望它有所帮助。

Edit: Here is a generic version :

编辑:这是一个通用版本:

public static void JustDoIt<T>(IEnumerable<T> items_x, IEnumerable<T> items_y, out int matchedCount_x, out int matchedCount_y, IComparer<T> comparer, Func<T, T, bool> toleranceReferee)
{

    T[] x = items_x.OrderBy(item => item, comparer).ToArray();
    T[] y = items_y.OrderBy(item => item, comparer).ToArray();

    bool mustContinue = true;
    bool isXTurns;
    bool withinTolerance;

    int lastBase_x = 0;
    int lastBase_y = 0;

    int lastMatch_x = 0;
    int lastMatch_y = 0;

    int current_x = 0;
    int current_y = 0;

    matchedCount_x = 0;
    matchedCount_y = 0;

    while (mustContinue)
    {
        isXTurns = comparer.Compare(x[current_x], y[current_y]) <= 0;

        withinTolerance = toleranceReferee(x[current_x], y[current_y]);

        if (withinTolerance)
        {
            if (isXTurns)
            {
                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                }

                if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else
                {
                    mustContinue = false;
                }

            }
            else
            {
                if (current_y > lastMatch_y)
                {
                    matchedCount_y++;
                    lastMatch_y = current_y;
                }

                if (current_x > lastMatch_x)
                {
                    matchedCount_x++;
                    lastMatch_x = current_x;
                }

                if (current_x + 1 < x.Length)
                {
                    current_x++;
                }
                else if (current_y + 1 < y.Length)
                {
                    current_y++;
                }
                else
                {
                    mustContinue = false;
                }
            }

        }
        else
        {
            if (isXTurns)
            {
                lastBase_x++;
                mustContinue = lastBase_x < x.Length;
            }
            else
            {
                lastBase_y++;
                mustContinue = lastBase_y < y.Length;
            }

            current_x = lastBase_x;
            current_y = lastBase_y;
        }
    }
}

With an example of how you'd call it for int :

以你如何为int调用它为例:

List<int> x2 = new List<int>() { 2, 4, 4, 6, 9, 9 };    // To test an IEnumerable
IEnumerable<int> y2 = new int[] { 1, 3, 3, 4, 6, 9 };   // To test another

int xcount;
int ycount;

SingleLoop.JustDoIt(
    x2,
    y2,
    out xcount,
    out ycount,
    Comparer<int>.Default,
   (currentX, currentY) => { return currentX == currentY; });

#2


2  

Make a HashSet<int> for both arrays and populate them with the elements of the arrays. Traverse both arrays, look up elements in corresponding (opposite) HashSet - HashSet lookup is O(1) so overall effort is O(m+n) with m,n being the array sizes.

为两个数组创建一个HashSet ,并使用数组的元素填充它们。遍历两个数组,在相应的(相反的)HashSet中查找元素 - HashSet查找是O(1)所以整体努力是O(m + n),其中m,n是数组大小。

#3


1  

Looks like it has to be O(n*m) in worst case - all elements of one array are "similar" to all elements in another array - you have to run comparison for each pair.

在最坏的情况下看起来它必须是O(n * m) - 一个数组的所有元素与另一个数组中的所有元素“相似” - 你必须为每对运行比较。

Since you already have sorting, consider dividing each array into ranges (i.e. 20 ranges with about 50 numbers in each [0, 0.05), [0.05, 0.1),..[0.95, 1]) so you can compare ranges first and than compare individual numbers - depending on the data (works good with random or other distribution without huge clumps of values) you may decrease number of comparisons significantly.

由于您已经进行了排序,请考虑将每个数组划分为范围(即20个范围,每个[0,0.05],[0.05,0.1],... [0.95,1]中约有50个数字),因此您可以先比较范围比较单个数字 - 取决于数据(适用于随机或其他分布而没有大量值的数据),您可以显着减少比较次数。

#4


0  

If your data is actually in the range [0..1] like Random.NextDouble(), then you could just make an array with 1000 (=1/TOLERANCE) bins, one for every [x..x+TOLERANCE] range in the total range. Put each value in one array into its bin. Then go over the other array and compare each value to the contents of the closest bin, the bin before and the one after.

如果您的数据实际上在[0..1]范围内,如Random.NextDouble(),那么您可以创建一个包含1000(= 1 / TOLERANCE)个数组的数组,每个[x..x + TOLERANCE]范围一个在总范围内。将每个值放在一个数组中。然后遍历另一个数组并将每个值与最近的bin,bin之前和之后的bin的内容进行比较。

Edit in response to changed question: Instead of using a flat array with Range/Tolerance elements, you could use a hash table: Use HashFunction(x/Tolerance) as hash key to the items x in the first array, then compare every element y in the second array with the elements stored for the hashes HashFunction(y/Tolerance-1), HashFunction(y/Tolerance), HashFunction(y/Tolerance+1). In theory, a hash table access is O(1), so the whole operation would be O(m+n)

编辑以响应更改的问题:您可以使用散列表而不是使用具有Range / Tolerance元素的平面数组:使用HashFunction(x / Tolerance)作为第一个数组中的项x的散列键,然后比较每个元素y在第二个数组中,为哈希HashFunction(y / Tolerance-1),HashFunction(y / Tolerance),HashFunction(y / Tolerance + 1)存储了元素。理论上,哈希表访问是O(1),因此整个操作将是O(m + n)