当“WHERE IN()”具有数千个值时,如何使用Nhibernate检索数据? (sql中的参数太多)

时间:2022-10-30 23:32:07

The problem: Nhibernate parses each value in the "WHERE IN()" sql as parameters and MS SQL server doesn't support enough parameters (over 2000).

问题:Nhibernate解析“WHERE IN()”sql中的每个值作为参数,MS SQL服务器不支持足够的参数(超过2000)。

I am using Nhibernate with Linq to retrive my data from the SQL server and I need to load alot of entities based on already known ID's.

我正在使用Nhibernate和Linq从SQL服务器中检索我的数据,我需要根据已知的ID加载很多实体。

My code looks something like this:

我的代码看起来像这样:

int[] knownIds = GetIDsFromFile();
var loadedEntities = _Repository.GetAll()
                                .Where(x => knownIds.Contains(x.ID))
                                .ToList();

Which give an sql like this:

哪个给出这样的sql:

SELECT id, name FROM MyTable 
WHERE id IN (1 /* @p0 */,2 /* @p1 */,3 /* @p2 */,4 /* @p3 */, 5 /* @p4 */)

If there is too many values in knownIds, then this code will throw an Exception because of the many parameters that NHibernate uses.

如果knownIds中的值太多,那么由于NHibernate使用的许多参数,此代码将抛出异常。

I think the best solution would be if I could make NHibernate use only 1 parameter for the whole "WHERE IN()", but I don't know how to do this:

我认为最好的解决方案是如果我能让NHibernate只为整个“WHERE IN()”使用1个参数,但我不知道如何做到这一点:

SELECT id, name FROM MyTable WHERE id IN (1, 2, 3, 4, 5 /* @p0 */)

I'll be glad to hear any ideas of how to solve this - either by extending the LINQ provider or by other means. One solution is to simple do the query x times (knownIds.Count / 1000), but I rather want a generic solution which would work for all my entities.

我很高兴听到有关如何解决这个问题的任何想法 - 通过扩展LINQ提供程序或通过其他方式。一个解决方案是简单地执行查询x次(knownIds.Count / 1000),但我更想要一个适用于我所有实体的通用解决方案。

I have tried looking at extending the LINQ provider by searching google and *, however I can't find a solution and I dont have any experience with either HQL or the treebuilder. Here are a few of the sites I have been at:

我试图通过搜索谷歌和*来扩展LINQ提供程序,但我找不到解决方案,我没有任何经验与HQL或树建设者。这里有一些我去过的网站:

UPDATE: I know it ain't good practice by having so many values in the IN clause, but I don't know a better solution for what I want to do.
Consider a company where all the customers pay for the company's services one time each month. The company don't handle the payments itself, but have another company to collect the money. One time each month the company receives a file containing the status of these payments: if they have been paid or not. The file only contains the ID of the specific payment, and not the ID of the customer. A company with 3000 monthly customers, will then make 3000 LogPayments each month, where the status need to be updated. After 1 year there will be around 36.000 LogPayments, so just loading them all doesn't seem like a good solution either.

更新:我知道在IN子句中有这么多的值是不好的做法,但我不知道我想要做的更好的解决方案。考虑一家公司,每个月,所有客户都会为公司的服务付费一次。该公司本身不处理付款,但有另一家公司来收钱。公司每月一次收到一份文件,其中包含这些付款的状态:是否已付款。该文件仅包含特定付款的ID,而不包含客户的ID。一家拥有3000个月客户的公司,每个月将进行3000次LogPayments,其状态需要更新。 1年后将会有大约36,000个LogPayments,所以加载它们似乎也不是一个好的解决方案。

MY SOLUTION: Thanks for all the usefull answers. In the end I choosed to use a combination of the answers. For this specific case I did something like Fourth suggested, as that would increase performance a great deal. However I have allso implemented the generic method Stefan Steinegger suggested, because I like that I can do this, if that is what I really want. Besides, I don't want my program to crash with an exception, so in the future I will allso use this ContainsAlot-method as a safeguard.

我的解决方案:感谢所有有用的答案。最后,我选择使用答案的组合。对于这个特定的情况,我做了像Fourth建议的事情,因为这会大大提高性能。但是我已经实现了Stefan Steinegger建议的通用方法,因为我喜欢我可以做到这一点,如果那是我真正想要的。此外,我不希望我的程序因异常而崩溃,所以将来我也会使用这个ContainsAlot方法作为安全措施。

7 个解决方案

#1


12  

See this similar question: NHibernate Restrictions.In with hundreds of value

看到这个类似的问题:NHibernate Restrictions.In有数百个值

I usually set up several queries, which all get for instance 1000 entries. Just split you array of ids into several pieces.

我通常会设置几个查询,例如1000个条目。只需将你的数组分成几个部分。

Something like this:

像这样的东西:

// only flush the session once. I have a using syntax to disable
// autoflush within a limited scope (without direct access to the
// session from the business logic)
session.Flush();
session.FlushMode = FlushMode.Never;

for (int i = 0; i < knownIds; i += 1000)
{
  var page = knownIds.Skip(i).Take(1000).ToArray();
  loadedEntities.AddRange(
    Repository.GetAll()
      .Where(x => page.Contains(x.ID)));
}

session.FlushMode = FlushMode.Auto;

Generic implementation using criteria (only filtering a single property, which is a common case):

使用标准的通用实现(仅过滤单个属性,这是常见情况):

public IList<T> GetMany<TEntity, TProp>(
  Expression<Func<TEntity, TProp>> property,
  IEnumerable<TProp> values)
{
    string propertyName = ((System.Linq.Expressions.MemberExpression)property.Body).Member.Name;

    List<T> loadedEntities = new List<T>();

    // only flush the session once. 
    session.Flush();
    var previousFlushMode = session.FlushMode;
    session.FlushMode = FlushMode.Never;

    for (int i = 0; i < knownIds; i += 1000)
    {
      var page = knownIds.Skip(i).Take(1000).ToArray();

      loadedEntities.AddRange(session
        .CreateCriteria(typeof(T))
        .Add(Restriction.PropertyIn(propertyName, page)
        .List<TEntity>();
    }

    session.FlushMode = previousFlushMode;
    return loadedEntities;
}

To be used like this:

要像这样使用:

int[] ids = new [] {1, 2, 3, 4, 5 ....};
var entities = GetMany((MyEntity x) => x.Id, ids);

string[] names = new [] {"A", "B", "C", "D" ... };
var users = GetMany((User x) => x.Name, names);

#2


2  

http://ayende.com/blog/2583/nhibernates-xml-in has a possible solution, passing the parameters as XML (unfortunately most of the links in the page are broken..)

http://ayende.com/blog/2583/nhibernates-xml-in有一个可能的解决方案,将参数作为XML传递(遗憾的是,页面中的大多数链接都被破坏了..)

#3


2  

WHERE IN should not be the norm and should be used only in specific, and limited, cases. If you find yourself using it a lot, it probably indicates a problem with your data model. What I would probably do in your case is get ALL the entities from the database in a lazy-load, and then as I iterate through the IDs that I have, pull them out of the entity collection. This way the performance hit is distributed across many queries and you dont hit the WHERE IN threshold.

IN在哪里不应该是常态,只应在特定的和有限的情况下使用。如果您发现自己经常使用它,则可能表明您的数据模型存在问题。在你的情况下我可能会做的是在延迟加载中从数据库中获取所有实体,然后当我遍历我拥有的ID时,将它们从实体集合中拉出来。这样,性能命中分布在许多查询中,并且您不会达到WHERE IN阈值。

Just to note, if the IDs will represent most of the entities rather than a small subset (that is you know you are going to wind up getting them all, or most, anyway) then dont lazy-load.

需要注意的是,如果ID代表大多数实体而不是一个小子集(你知道你最终会得到它们全部或大部分),那么就不要懒惰。

Edit based on your update

根据您的更新进行修改

If you are talking about 36,000 records after 1 year, BUT you are only dealing with loads in the recent time, then eager load the recent records you care about. I would do something like: create a criteria to load the records for the past... month? Then I will have all the records I might need, match them up to the IDs in the file via code, and bingo-bango-bongo.

如果你在1年后谈论36,000条记录,但是你只是在最近的时间处理负载,然后急切地加载你关心的最近记录。我会做一些事情:创建一个标准来加载过去......月的记录?然后我将获得我可能需要的所有记录,通过代码将它们与文件中的ID匹配,以及bingo-bango-bongo。

The table will definitely grow in size over time so it doesn't make sense to always pull back everything, but if you have a way of saying "i only care about these records" then SQL can do that constraint for you.

这个表肯定会随着时间的推移而变大,所以总是把所有内容都拉下来是没有意义的,但如果你有办法说“我只关心这些记录”,那么SQL可以为你做这个约束。

#4


1  

The only place I've seen code like that before where the Ids extend into the thousands, was where that list of IDs had just been loaded from the database as a separate query. It should instead by created as a DetachedCriteria, and then consumed using a Subqueries.PropertyNotIn or PropertyIn criteria query (rather than LINQ).

在Ids扩展到数千个之前,我见过的唯一代码就是那个ID列表刚从数据库中作为单独的查询加载的地方。它应该被创建为DetachedCriteria,然后使用Subqueries.PropertyNotIn或PropertyIn条件查询(而不是LINQ)来使用。


Another way to look at this sort of thing is - 2100 parameters feels like an arbitrary limit. I'm sure SQL Server could be modified to accept more parameters (but I'm sure a Connect request would get closed almost immediately), or you can use workarounds (such as sending XML, or pre-populating a table) to pass this many parameters. But if you're hitting this limit, shouldn't you step back and consider that there's something else broken in what you're doing?

另一种看待这种事情的方法是 - 2100参数感觉就像一个任意限制。我确信可以修改SQL Server以接受更多参数(但我确定Connect请求几乎立即关闭),或者您可以使用变通方法(例如发送XML或预先填充表)来传递很多参数。但是,如果你达到这个限制,你不应该退后一步,并认为你正在做的事情还有其他一些问题吗?

#5


0  

You can't make the IN list only one parameter (e.g. an array), because that is not supported by SQL. The only way I know of to have more than 1000 elements in the IN list is to put a sub query there.
Having said that, one workaround would be to put the known IDs into a temporary table and change your NHibernate statement to use this table so it will result in a subquery in the SQL statement.

您不能使IN列表只有一个参数(例如数组),因为SQL不支持它。我知道在IN列表中有超过1000个元素的唯一方法是在那里放置一个子查询。话虽如此,一种解决方法是将已知的ID放入临时表并更改您的NHibernate语句以使用此表,这样它将在SQL语句中生成子查询。

#6


0  

I would definitely recomend using temporary table for this kind of data.

我肯定会建议使用临时表来获取这种数据。

You will get ability to check if the parameters are correct, by querying this temporary table. And you can have foreign key constraints, so you are baked up before bad IDs. And you can have history of them in your database.

通过查询此临时表,您将能够检查参数是否正确。并且您可以拥有外键约束,因此您可以在错误的ID之前进行烘焙。您可以在数据库中记录它们的历史记录。

#7


0  

I faced the same problem in Oracle which doesn't allow more than 1000 elements inside IN condition as well. The error is: "ORA-01795: maximum number of expressions in a list is 1000". Here is my solution:

我在Oracle中遇到了同样的问题,它在IN条件下也不允许超过1000个元素。错误是:“ORA-01795:列表中的最大表达式数为1000”。这是我的解决方案:

    //partition an IEnumerable into fixed size IEnumerables
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int partitionSize)
    {
        return source
            .Select((value, index) => new { Index = index, Value = value })
            .GroupBy(i => i.Index / partitionSize)
            .Select(i => i.Select(i2 => i2.Value));
    }

    public IEnumerable<T> Get(List<long> listOfIDs)
    {
        var partitionedList = listOfIDs.Partition(1000).ToList();
        List<ICriterion> criterions = new List<ICriterion>();
        foreach (var ids in partitionedList)
        {
            criterions.Add(Restrictions.In("Id", ids.ToArray()));
        }
        var criterion = criterions.Aggregate(Restrictions.Or);
        var criteria = session.CreateCriteria<T>().Add(criterion);
        return criteria.Future<T>();
    }

First part is an extension method to IEnumerable, to partition a big list into fixed size lists. Second part uses NHibernate Criterions to dynamically generate multiple IN conditions to be later joined with OR conditions.

第一部分是IEnumerable的扩展方法,用于将大型列表划分为固定大小的列表。第二部分使用NHibernate Criterions动态生成多个IN条件,以便稍后与OR条件连接。

#1


12  

See this similar question: NHibernate Restrictions.In with hundreds of value

看到这个类似的问题:NHibernate Restrictions.In有数百个值

I usually set up several queries, which all get for instance 1000 entries. Just split you array of ids into several pieces.

我通常会设置几个查询,例如1000个条目。只需将你的数组分成几个部分。

Something like this:

像这样的东西:

// only flush the session once. I have a using syntax to disable
// autoflush within a limited scope (without direct access to the
// session from the business logic)
session.Flush();
session.FlushMode = FlushMode.Never;

for (int i = 0; i < knownIds; i += 1000)
{
  var page = knownIds.Skip(i).Take(1000).ToArray();
  loadedEntities.AddRange(
    Repository.GetAll()
      .Where(x => page.Contains(x.ID)));
}

session.FlushMode = FlushMode.Auto;

Generic implementation using criteria (only filtering a single property, which is a common case):

使用标准的通用实现(仅过滤单个属性,这是常见情况):

public IList<T> GetMany<TEntity, TProp>(
  Expression<Func<TEntity, TProp>> property,
  IEnumerable<TProp> values)
{
    string propertyName = ((System.Linq.Expressions.MemberExpression)property.Body).Member.Name;

    List<T> loadedEntities = new List<T>();

    // only flush the session once. 
    session.Flush();
    var previousFlushMode = session.FlushMode;
    session.FlushMode = FlushMode.Never;

    for (int i = 0; i < knownIds; i += 1000)
    {
      var page = knownIds.Skip(i).Take(1000).ToArray();

      loadedEntities.AddRange(session
        .CreateCriteria(typeof(T))
        .Add(Restriction.PropertyIn(propertyName, page)
        .List<TEntity>();
    }

    session.FlushMode = previousFlushMode;
    return loadedEntities;
}

To be used like this:

要像这样使用:

int[] ids = new [] {1, 2, 3, 4, 5 ....};
var entities = GetMany((MyEntity x) => x.Id, ids);

string[] names = new [] {"A", "B", "C", "D" ... };
var users = GetMany((User x) => x.Name, names);

#2


2  

http://ayende.com/blog/2583/nhibernates-xml-in has a possible solution, passing the parameters as XML (unfortunately most of the links in the page are broken..)

http://ayende.com/blog/2583/nhibernates-xml-in有一个可能的解决方案,将参数作为XML传递(遗憾的是,页面中的大多数链接都被破坏了..)

#3


2  

WHERE IN should not be the norm and should be used only in specific, and limited, cases. If you find yourself using it a lot, it probably indicates a problem with your data model. What I would probably do in your case is get ALL the entities from the database in a lazy-load, and then as I iterate through the IDs that I have, pull them out of the entity collection. This way the performance hit is distributed across many queries and you dont hit the WHERE IN threshold.

IN在哪里不应该是常态,只应在特定的和有限的情况下使用。如果您发现自己经常使用它,则可能表明您的数据模型存在问题。在你的情况下我可能会做的是在延迟加载中从数据库中获取所有实体,然后当我遍历我拥有的ID时,将它们从实体集合中拉出来。这样,性能命中分布在许多查询中,并且您不会达到WHERE IN阈值。

Just to note, if the IDs will represent most of the entities rather than a small subset (that is you know you are going to wind up getting them all, or most, anyway) then dont lazy-load.

需要注意的是,如果ID代表大多数实体而不是一个小子集(你知道你最终会得到它们全部或大部分),那么就不要懒惰。

Edit based on your update

根据您的更新进行修改

If you are talking about 36,000 records after 1 year, BUT you are only dealing with loads in the recent time, then eager load the recent records you care about. I would do something like: create a criteria to load the records for the past... month? Then I will have all the records I might need, match them up to the IDs in the file via code, and bingo-bango-bongo.

如果你在1年后谈论36,000条记录,但是你只是在最近的时间处理负载,然后急切地加载你关心的最近记录。我会做一些事情:创建一个标准来加载过去......月的记录?然后我将获得我可能需要的所有记录,通过代码将它们与文件中的ID匹配,以及bingo-bango-bongo。

The table will definitely grow in size over time so it doesn't make sense to always pull back everything, but if you have a way of saying "i only care about these records" then SQL can do that constraint for you.

这个表肯定会随着时间的推移而变大,所以总是把所有内容都拉下来是没有意义的,但如果你有办法说“我只关心这些记录”,那么SQL可以为你做这个约束。

#4


1  

The only place I've seen code like that before where the Ids extend into the thousands, was where that list of IDs had just been loaded from the database as a separate query. It should instead by created as a DetachedCriteria, and then consumed using a Subqueries.PropertyNotIn or PropertyIn criteria query (rather than LINQ).

在Ids扩展到数千个之前,我见过的唯一代码就是那个ID列表刚从数据库中作为单独的查询加载的地方。它应该被创建为DetachedCriteria,然后使用Subqueries.PropertyNotIn或PropertyIn条件查询(而不是LINQ)来使用。


Another way to look at this sort of thing is - 2100 parameters feels like an arbitrary limit. I'm sure SQL Server could be modified to accept more parameters (but I'm sure a Connect request would get closed almost immediately), or you can use workarounds (such as sending XML, or pre-populating a table) to pass this many parameters. But if you're hitting this limit, shouldn't you step back and consider that there's something else broken in what you're doing?

另一种看待这种事情的方法是 - 2100参数感觉就像一个任意限制。我确信可以修改SQL Server以接受更多参数(但我确定Connect请求几乎立即关闭),或者您可以使用变通方法(例如发送XML或预先填充表)来传递很多参数。但是,如果你达到这个限制,你不应该退后一步,并认为你正在做的事情还有其他一些问题吗?

#5


0  

You can't make the IN list only one parameter (e.g. an array), because that is not supported by SQL. The only way I know of to have more than 1000 elements in the IN list is to put a sub query there.
Having said that, one workaround would be to put the known IDs into a temporary table and change your NHibernate statement to use this table so it will result in a subquery in the SQL statement.

您不能使IN列表只有一个参数(例如数组),因为SQL不支持它。我知道在IN列表中有超过1000个元素的唯一方法是在那里放置一个子查询。话虽如此,一种解决方法是将已知的ID放入临时表并更改您的NHibernate语句以使用此表,这样它将在SQL语句中生成子查询。

#6


0  

I would definitely recomend using temporary table for this kind of data.

我肯定会建议使用临时表来获取这种数据。

You will get ability to check if the parameters are correct, by querying this temporary table. And you can have foreign key constraints, so you are baked up before bad IDs. And you can have history of them in your database.

通过查询此临时表,您将能够检查参数是否正确。并且您可以拥有外键约束,因此您可以在错误的ID之前进行烘焙。您可以在数据库中记录它们的历史记录。

#7


0  

I faced the same problem in Oracle which doesn't allow more than 1000 elements inside IN condition as well. The error is: "ORA-01795: maximum number of expressions in a list is 1000". Here is my solution:

我在Oracle中遇到了同样的问题,它在IN条件下也不允许超过1000个元素。错误是:“ORA-01795:列表中的最大表达式数为1000”。这是我的解决方案:

    //partition an IEnumerable into fixed size IEnumerables
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int partitionSize)
    {
        return source
            .Select((value, index) => new { Index = index, Value = value })
            .GroupBy(i => i.Index / partitionSize)
            .Select(i => i.Select(i2 => i2.Value));
    }

    public IEnumerable<T> Get(List<long> listOfIDs)
    {
        var partitionedList = listOfIDs.Partition(1000).ToList();
        List<ICriterion> criterions = new List<ICriterion>();
        foreach (var ids in partitionedList)
        {
            criterions.Add(Restrictions.In("Id", ids.ToArray()));
        }
        var criterion = criterions.Aggregate(Restrictions.Or);
        var criteria = session.CreateCriteria<T>().Add(criterion);
        return criteria.Future<T>();
    }

First part is an extension method to IEnumerable, to partition a big list into fixed size lists. Second part uses NHibernate Criterions to dynamically generate multiple IN conditions to be later joined with OR conditions.

第一部分是IEnumerable的扩展方法,用于将大型列表划分为固定大小的列表。第二部分使用NHibernate Criterions动态生成多个IN条件,以便稍后与OR条件连接。