在有序序列中获得第一个缺失元素的有效方法?

时间:2022-09-01 20:59:57

I have an ordered sequence like {1, 3, 5, 6, 8, 9} I want to get first missing element(2 in the example) or max() if sequence contains no missing elements. Now I'm doing it like this:

我有一个有序的序列,如{1,3,5,6,8,9}我想得到第一个缺少元素(示例中为2)或max()如果序列不包含缺失元素。现在我这样做:

public static int GetRegisterNumber<T>(this IQueryable<T> enumerable, Func<T, bool> whereFunc, Func<T, int?> selectFunc)
{
    var regNums = enumerable.OrderBy(selectFunc).Where(whereFunc).ToArray();

    if (regNums.Count() == 0)
    {
        return 1;
    }

    for (int i = 0; i < regNums.Count(); i++)
    {
        if (i + 1 != regNums[i])
        {
            return regNums[i].Value + 1;
        }
    }

    return regNums.Last().Value + 1;
}

But i think there are much faster methods. Any suggestions?

但我认为有更快的方法。有什么建议?

8 个解决方案

#1


Edit: I just noticed that enumerable is IQueryable<T> but selectFunc and whereFunc are of type Func<T, _>. This will cause the Enumerable versions of OrderBy and Where to be called, rather than using database calls. You probably want to switch them to Expression<Func<T, _>> instead.

编辑:我刚刚注意到可枚举是IQueryable 但selectFunc和whereFunc的类型为Func 。这将导致OrderBy的可枚举版本和Where调用,而不是使用数据库调用。您可能希望将它们切换为Expression >。 ,_>

If you don't want to order regNums first, here's a O(n) golf-style solution:

如果您不想先订购regNums,这里有一个O(n)高尔夫风格的解决方案:

var max = regNums.Max(i => (int?)i) ?? 0;
return Enumerable.Range(1, max + 1)
                 .Except(regNums)
                 .Min();

By line:

  1. By casting to int?, Max will return null if regNums is empty, coalesced to 0.

    通过强制转换为int ?,如果regNums为空,则Max将返回null,并且合并为0。

  2. Build a sequence of all possible registers, including our next value if full.

    构建所有可能寄存器的序列,包括我们的下一个值(如果已满)。

  3. Subtract the current set of registers.

    减去当前的寄存器组。

  4. Pick the lowest.

    挑选最低价。

#2


I'd probably look at something like below; the Where can be done outside (as can the selector to be honest):

我可能会看下面的东西;在外面可以做到的地方(诚实的选择者也是如此):

If you expect to start at 1:

如果您希望从1开始:

public static int GetRegisterNumber<T>(this IEnumerable<T> enumerable,
    Func<T, int> selectFunc)
{
    int expected = 1;
    foreach (T item in enumerable) {
        if (selectFunc(item) != expected) return expected;
        expected++;
    }
    return expected;
}

To start with the first item in the list:

要从列表中的第一项开始:

public static int GetRegisterNumber<T>(this IEnumerable<T> enumerable,
    Func<T, int> selectFunc)
{
    bool first = true;
    int prev = -1;
    foreach (T item in enumerable)
    {
        int val = selectFunc(item);
        if(first) {
            prev = val;
            first = false;
        } else if (val != prev + 1) {
            return prev + 1;
        }
        prev = val;
    }
    return first ? 1 : prev + 1;
}

It isn't clear how you wanted to handle nulls, so I didn't. Note that this only iterates once, and doesn't buffer everything.

目前尚不清楚你想如何处理空值,所以我没有。请注意,这只迭代一次,并不缓冲所有内容。

#3


Assuming that your OrderBy and Where have already been applied:

假设您的OrderBy和Where已经应用:

int firstMissing = collection.TakeWhile((x, i) => x == ++i).LastOrDefault() + 1;

#4


Suggestion: run your code through a profiler. Then you'll know where it is slow. Intuitively, the OrderBy is the slowest thing in this program. But intuitions about the slowest thing are often very, very wrong. Use a profiler.

建议:通过分析器运行代码。然后你会知道它在哪里慢。直观地说,OrderBy是这个程序中最慢的东西。但关于最慢的东西的直觉往往是非常非常错误的。使用分析器。

Of course, you should also eliminate the massive inefficiencies in this program. Remember, Count() counts the sequence by re-enumerating it. Count() does not know that you have not changed the sequence since the last time you counted it! You probably want to store the count rather than recalculating it every time, or use Length, since you have an array.

当然,您还应该消除此计划中的大量低效率。请记住,Count()通过重新枚举它来计算序列。自上次计数以来,Count()不知道您没有更改序列!您可能希望存储计数而不是每次重新计算它,或者使用Length,因为您有一个数组。

#5


Why not do something like a binary search?

为什么不做二分搜索?

Say you have a list 10 elements long. Read the first element. Then read the fifth element. If the fifth element isn't first element + 4 then you know there's a missing number, otherwise you know there isn't. Then just iterate like this until you find the first missing element, or reach the end of the list.

假设你有10个元素的列表。阅读第一个元素。然后阅读第五个元素。如果第五个元素不是第一个元素+4那么你知道有一个缺失的数字,否则你知道没有。然后像这样迭代,直到找到第一个缺失的元素,或者到达列表的末尾。

This of course assumes you know the size (which wasn't explicitly mentioned in the question), but you've converted to an array, so you should know.

这当然假设您知道大小(问题中未明确提及),但您已转换为数组,因此您应该知道。

O(log N) instead of O(n)

O(log N)代替O(n)

#6


Assuming that the incoming sequence of values is already sorted, how about:

假设传入的值序列已经排序,那么:

var upperBoundValue = values.Last() + 1;
var firstMissingItem = Enumerable.Range(1, upperBoundValue).Except(values).First();

If you are performing this operation iteratively, you can optimize the process by storing an index to the last place that you were in the sequence where you found a gap, and start the next iteration from there.

如果您正在迭代地执行此操作,则可以通过将索引存储到您在序列中找到间隙的最后位置来优化过程,并从那里开始下一次迭代。

#7


As mentioned, first use a profiler to find where it is slow. If the sequence is really big, and the ordering is slow, you can use radix sort, that is O(kn), where k is the max number of digits, and n is the number of elements in the sequence. Sorting algorithms based on comparison are usually O(n logn).

如上所述,首先使用分析器来查找它的慢速位置。如果序列非常大,并且排序很慢,则可以使用基数排序,即O(kn),其中k是最大位数,n是序列中元素的数量。基于比较的排序算法通常为O(n logn)。

This way the whole algorithm will be O(kn), which depending on n, is asymptotically faster, and consequently more scalable.

这样,整个算法将是O(kn),其取决于n,渐近地更快,并且因此更具可伸缩性。

#8


You put a LinqToSql tag in your question. I presume you are looking for a "first available" id, so that you can create a new record using this id. Consider turning IDENTITY on in the database instead.

您在问题中添加了LinqToSql标记。我认为您正在寻找“第一个可用”ID,以便您可以使用此ID创建新记录。请考虑改为在数据库中打开IDENTITY。

#1


Edit: I just noticed that enumerable is IQueryable<T> but selectFunc and whereFunc are of type Func<T, _>. This will cause the Enumerable versions of OrderBy and Where to be called, rather than using database calls. You probably want to switch them to Expression<Func<T, _>> instead.

编辑:我刚刚注意到可枚举是IQueryable 但selectFunc和whereFunc的类型为Func 。这将导致OrderBy的可枚举版本和Where调用,而不是使用数据库调用。您可能希望将它们切换为Expression >。 ,_>

If you don't want to order regNums first, here's a O(n) golf-style solution:

如果您不想先订购regNums,这里有一个O(n)高尔夫风格的解决方案:

var max = regNums.Max(i => (int?)i) ?? 0;
return Enumerable.Range(1, max + 1)
                 .Except(regNums)
                 .Min();

By line:

  1. By casting to int?, Max will return null if regNums is empty, coalesced to 0.

    通过强制转换为int ?,如果regNums为空,则Max将返回null,并且合并为0。

  2. Build a sequence of all possible registers, including our next value if full.

    构建所有可能寄存器的序列,包括我们的下一个值(如果已满)。

  3. Subtract the current set of registers.

    减去当前的寄存器组。

  4. Pick the lowest.

    挑选最低价。

#2


I'd probably look at something like below; the Where can be done outside (as can the selector to be honest):

我可能会看下面的东西;在外面可以做到的地方(诚实的选择者也是如此):

If you expect to start at 1:

如果您希望从1开始:

public static int GetRegisterNumber<T>(this IEnumerable<T> enumerable,
    Func<T, int> selectFunc)
{
    int expected = 1;
    foreach (T item in enumerable) {
        if (selectFunc(item) != expected) return expected;
        expected++;
    }
    return expected;
}

To start with the first item in the list:

要从列表中的第一项开始:

public static int GetRegisterNumber<T>(this IEnumerable<T> enumerable,
    Func<T, int> selectFunc)
{
    bool first = true;
    int prev = -1;
    foreach (T item in enumerable)
    {
        int val = selectFunc(item);
        if(first) {
            prev = val;
            first = false;
        } else if (val != prev + 1) {
            return prev + 1;
        }
        prev = val;
    }
    return first ? 1 : prev + 1;
}

It isn't clear how you wanted to handle nulls, so I didn't. Note that this only iterates once, and doesn't buffer everything.

目前尚不清楚你想如何处理空值,所以我没有。请注意,这只迭代一次,并不缓冲所有内容。

#3


Assuming that your OrderBy and Where have already been applied:

假设您的OrderBy和Where已经应用:

int firstMissing = collection.TakeWhile((x, i) => x == ++i).LastOrDefault() + 1;

#4


Suggestion: run your code through a profiler. Then you'll know where it is slow. Intuitively, the OrderBy is the slowest thing in this program. But intuitions about the slowest thing are often very, very wrong. Use a profiler.

建议:通过分析器运行代码。然后你会知道它在哪里慢。直观地说,OrderBy是这个程序中最慢的东西。但关于最慢的东西的直觉往往是非常非常错误的。使用分析器。

Of course, you should also eliminate the massive inefficiencies in this program. Remember, Count() counts the sequence by re-enumerating it. Count() does not know that you have not changed the sequence since the last time you counted it! You probably want to store the count rather than recalculating it every time, or use Length, since you have an array.

当然,您还应该消除此计划中的大量低效率。请记住,Count()通过重新枚举它来计算序列。自上次计数以来,Count()不知道您没有更改序列!您可能希望存储计数而不是每次重新计算它,或者使用Length,因为您有一个数组。

#5


Why not do something like a binary search?

为什么不做二分搜索?

Say you have a list 10 elements long. Read the first element. Then read the fifth element. If the fifth element isn't first element + 4 then you know there's a missing number, otherwise you know there isn't. Then just iterate like this until you find the first missing element, or reach the end of the list.

假设你有10个元素的列表。阅读第一个元素。然后阅读第五个元素。如果第五个元素不是第一个元素+4那么你知道有一个缺失的数字,否则你知道没有。然后像这样迭代,直到找到第一个缺失的元素,或者到达列表的末尾。

This of course assumes you know the size (which wasn't explicitly mentioned in the question), but you've converted to an array, so you should know.

这当然假设您知道大小(问题中未明确提及),但您已转换为数组,因此您应该知道。

O(log N) instead of O(n)

O(log N)代替O(n)

#6


Assuming that the incoming sequence of values is already sorted, how about:

假设传入的值序列已经排序,那么:

var upperBoundValue = values.Last() + 1;
var firstMissingItem = Enumerable.Range(1, upperBoundValue).Except(values).First();

If you are performing this operation iteratively, you can optimize the process by storing an index to the last place that you were in the sequence where you found a gap, and start the next iteration from there.

如果您正在迭代地执行此操作,则可以通过将索引存储到您在序列中找到间隙的最后位置来优化过程,并从那里开始下一次迭代。

#7


As mentioned, first use a profiler to find where it is slow. If the sequence is really big, and the ordering is slow, you can use radix sort, that is O(kn), where k is the max number of digits, and n is the number of elements in the sequence. Sorting algorithms based on comparison are usually O(n logn).

如上所述,首先使用分析器来查找它的慢速位置。如果序列非常大,并且排序很慢,则可以使用基数排序,即O(kn),其中k是最大位数,n是序列中元素的数量。基于比较的排序算法通常为O(n logn)。

This way the whole algorithm will be O(kn), which depending on n, is asymptotically faster, and consequently more scalable.

这样,整个算法将是O(kn),其取决于n,渐近地更快,并且因此更具可伸缩性。

#8


You put a LinqToSql tag in your question. I presume you are looking for a "first available" id, so that you can create a new record using this id. Consider turning IDENTITY on in the database instead.

您在问题中添加了LinqToSql标记。我认为您正在寻找“第一个可用”ID,以便您可以使用此ID创建新记录。请考虑改为在数据库中打开IDENTITY。