为什么DataTable比DataReader更快

时间:2022-06-08 01:12:20

So we have had a heated debate at work as to which DataAccess route to take: DataTable or DataReader.

因此,我们就工作中的DataAccess路径进行了激烈的争论:DataTable或DataReader。

DISCLAIMER I am on the DataReader side and these results have shaken my world.

免责声明我在DataReader方面,这些结果震撼了我的世界。

We ended up writing some benchmarks to test the speed differences. It was generally agreed that a DataReader is faster, but we wanted to see how much faster.

我们最终编写了一些基准来测试速度差异。人们普遍认为DataReader速度更快,但我们想看看速度有多快。

The results surprised us. The DataTable was consistently faster than the DataReader. Approaching twice as fast sometimes.

结果让我们感到惊讶。 DataTable始终比DataReader更快。有时接近两倍的速度。

So I turn to you, members of SO. Why, when most of the documentation and even Microsoft, state that a DataReader is faster are our test showing otherwise.

所以我转向你,SO的成员。为什么,当大多数文档甚至微软声明DataReader更快时,我们的测试显示不然。

And now for the code:

现在代码:

The test harness:

测试工具:

    private void button1_Click(object sender, EventArgs e)
    {
        System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
        sw.Start();

        DateTime date = DateTime.Parse("01/01/1900");

        for (int i = 1; i < 1000; i++)
        {

            using (DataTable aDataTable = ArtifactBusinessModel.BusinessLogic.ArtifactBL.RetrieveDTModified(date))
            {
            }
        }
        sw.Stop();
        long dataTableTotalSeconds = sw.ElapsedMilliseconds;

        sw.Restart();


        for (int i = 1; i < 1000; i++)
        {
            List<ArtifactBusinessModel.Entities.ArtifactString> aList = ArtifactBusinessModel.BusinessLogic.ArtifactBL.RetrieveModified(date);

        }

        sw.Stop();

        long listTotalSeconds = sw.ElapsedMilliseconds;

        MessageBox.Show(String.Format("list:{0}, table:{1}", listTotalSeconds, dataTableTotalSeconds));
    }

This is the DAL for the DataReader:

这是DataReader的DAL:

        internal static List<ArtifactString> RetrieveByModifiedDate(DateTime modifiedLast)
        {
            List<ArtifactString> artifactList = new List<ArtifactString>();

            try
            {
                using (SqlConnection conn = SecuredResource.GetSqlConnection("Artifacts"))
                {
                    using (SqlCommand command = new SqlCommand("[cache].[Artifacts_SEL_ByModifiedDate]", conn))
                    {
                        command.CommandType = CommandType.StoredProcedure;
                        command.Parameters.Add(new SqlParameter("@LastModifiedDate", modifiedLast));
                        using (SqlDataReader reader = command.ExecuteReader())
                        {
                            int formNumberOrdinal = reader.GetOrdinal("FormNumber");
                            int formOwnerOrdinal = reader.GetOrdinal("FormOwner");
                            int descriptionOrdinal = reader.GetOrdinal("Description");
                            int descriptionLongOrdinal = reader.GetOrdinal("DescriptionLong");
                            int thumbnailURLOrdinal = reader.GetOrdinal("ThumbnailURL");
                            int onlineSampleURLOrdinal = reader.GetOrdinal("OnlineSampleURL");
                            int lastModifiedMetaDataOrdinal = reader.GetOrdinal("LastModifiedMetaData");
                            int lastModifiedArtifactFileOrdinal = reader.GetOrdinal("LastModifiedArtifactFile");
                            int lastModifiedThumbnailOrdinal = reader.GetOrdinal("LastModifiedThumbnail");
                            int effectiveDateOrdinal = reader.GetOrdinal("EffectiveDate");
                            int viewabilityOrdinal = reader.GetOrdinal("Viewability");
                            int formTypeOrdinal = reader.GetOrdinal("FormType");
                            int inventoryTypeOrdinal = reader.GetOrdinal("InventoryType");
                            int createDateOrdinal = reader.GetOrdinal("CreateDate");

                            while (reader.Read())
                            {
                                ArtifactString artifact = new ArtifactString();
                                ArtifactDAL.Map(formNumberOrdinal, formOwnerOrdinal, descriptionOrdinal, descriptionLongOrdinal, formTypeOrdinal, inventoryTypeOrdinal, createDateOrdinal, thumbnailURLOrdinal, onlineSampleURLOrdinal, lastModifiedMetaDataOrdinal, lastModifiedArtifactFileOrdinal, lastModifiedThumbnailOrdinal, effectiveDateOrdinal, viewabilityOrdinal, reader, artifact);
                                artifactList.Add(artifact);
                            }
                        }
                    }
                }
            }
            catch (ApplicationException)
            {
                throw;
            }
            catch (Exception e)
            {
                string errMsg = String.Format("Error in ArtifactDAL.RetrieveByModifiedDate. Date: {0}", modifiedLast);
                Logging.Log(Severity.Error, errMsg, e);
                throw new ApplicationException(errMsg, e);
            }

            return artifactList;
        }
    internal static void Map(int? formNumberOrdinal, int? formOwnerOrdinal, int? descriptionOrdinal, int? descriptionLongOrdinal, int? formTypeOrdinal, int? inventoryTypeOrdinal, int? createDateOrdinal,
        int? thumbnailURLOrdinal, int? onlineSampleURLOrdinal, int? lastModifiedMetaDataOrdinal, int? lastModifiedArtifactFileOrdinal, int? lastModifiedThumbnailOrdinal,
        int? effectiveDateOrdinal, int? viewabilityOrdinal, IDataReader dr, ArtifactString entity)
    {

            entity.FormNumber = dr[formNumberOrdinal.Value].ToString();
            entity.FormOwner = dr[formOwnerOrdinal.Value].ToString();
            entity.Description = dr[descriptionOrdinal.Value].ToString();
            entity.DescriptionLong = dr[descriptionLongOrdinal.Value].ToString();
            entity.FormType = dr[formTypeOrdinal.Value].ToString();
            entity.InventoryType = dr[inventoryTypeOrdinal.Value].ToString();
            entity.CreateDate = DateTime.Parse(dr[createDateOrdinal.Value].ToString());
            entity.ThumbnailURL = dr[thumbnailURLOrdinal.Value].ToString();
            entity.OnlineSampleURL = dr[onlineSampleURLOrdinal.Value].ToString();
            entity.LastModifiedMetaData = dr[lastModifiedMetaDataOrdinal.Value].ToString();
            entity.LastModifiedArtifactFile = dr[lastModifiedArtifactFileOrdinal.Value].ToString();
            entity.LastModifiedThumbnail = dr[lastModifiedThumbnailOrdinal.Value].ToString();
            entity.EffectiveDate = dr[effectiveDateOrdinal.Value].ToString();
            entity.Viewability = dr[viewabilityOrdinal.Value].ToString();
    }

This is the DAL for the DataTable:

这是DataTable的DAL:

        internal static DataTable RetrieveDTByModifiedDate(DateTime modifiedLast)
        {
            DataTable dt= new DataTable("Artifacts");

            try
            {
                using (SqlConnection conn = SecuredResource.GetSqlConnection("Artifacts"))
                {
                    using (SqlCommand command = new SqlCommand("[cache].[Artifacts_SEL_ByModifiedDate]", conn))
                    {
                        command.CommandType = CommandType.StoredProcedure;
                        command.Parameters.Add(new SqlParameter("@LastModifiedDate", modifiedLast));

                        using (SqlDataAdapter da = new SqlDataAdapter(command))
                        {
                            da.Fill(dt);
                        }
                    }
                }
            }
            catch (ApplicationException)
            {
                throw;
            }
            catch (Exception e)
            {
                string errMsg = String.Format("Error in ArtifactDAL.RetrieveByModifiedDate. Date: {0}", modifiedLast);
                Logging.Log(Severity.Error, errMsg, e);
                throw new ApplicationException(errMsg, e);
            }

            return dt;
        }

The results:

结果:

For 10 iterations within the Test Harness

在测试工具中进行10次迭代

为什么DataTable比DataReader更快

For 1000 iterations within the Test Harness

在测试工具中进行1000次迭代

为什么DataTable比DataReader更快

These results are the second run, to mitigate the differences due to creating the connection.

这些结果是第二次运行,以减轻由于创建连接而产生的差异。

4 个解决方案

#1


25  

I see three issues:

我看到三个问题:

  1. the way you use a DataReader negates it's big single-item-in-memory advantage by converting it to list,
  2. 你使用DataReader的方式通过将其转换为列表来否定它的大单项内存优势,
  3. you're running the benchmark in an environment that differs significantly from production in a way that favors the DataTable, and
  4. 你在一个与生产有很大不同的环境中运行基准测试,以一种有利于DataTable的方式运行
  5. you're spending time converting DataReader record to Artifact objects that is not duplicated in the DataTable code.
  6. 您将花费时间将DataReader记录转换为DataTable代码中未重复的Artifact对象。

The main advantage of a DataReader is that you don't have to load everything into memory at once. This should be a huge advantage for DataReader in web apps, where memory, rather than cpu, is often the bottleneck, but by adding each row to a generic list you've negated this. That also means that even after you change your code to only use one record at a time, the difference might not show up on your benchmarks because you're running them on a system with lot of free memory, which will favor the DataTable. Also, the DataReader version is spending time parsing the results into Artifact objects that the DataTable has not done yet.

DataReader的主要优点是您不必一次将所有内容加载到内存中。对于Web应用程序中的DataReader而言,这应该是一个巨大的优势,其中内存而不是cpu通常是瓶颈,但是通过将每行添加到通用列表中,您已经否定了这一点。这也意味着,即使您将代码更改为一次只使用一条记录,差异也可能不会显示在您的基准测试上,因为您在具有大量可用内存的系统上运行它们,这将有利于DataTable。此外,DataReader版本花费时间将结果解析为DataTable尚未完成的Artifact对象。

To fix the DataReader usage issue, change List<ArtifactString> to IEnumerable<ArtifactString> everywhere, and in your DataReader DAL change this line:

要修复DataReader使用问题,请将List 更改为IEnumerable ,并在DataReader DAL中更改此行:

artifactList.Add(artifact);

to this:

对此:

yield return artifact;

This means you also need to add code that iterates over the results to your DataReader test harness to keep things fair.

这意味着您还需要将迭代结果的代码添加到DataReader测试工具中以保持公平。

I'm not sure how to adjust the benchmark to create a more typical scenario that is fair to both DataTable and DataReader, except to build two versions of your page, and serve up each version for an hour under a similar production-level load so that we have real memory pressure... do some real A/B testing. Also, make sure you cover converting the DataTable rows to Artifacts... and if the argument is that you need to do this for a DataReader, but not for a DataTable, that is just plain wrong.

我不确定如何调整基准来创建一个对DataTable和DataReader都公平的更典型的场景,除了构建两个版本的页面,并在类似的生产级别负载下提供每个版本一小时我们有真正的记忆压力...做一些真正的A / B测试。另外,请确保覆盖将DataTable行转换为Artifacts ...如果参数是您需要为DataReader执行此操作,而不是DataTable,则这是完全错误的。

#2


2  

SqlDataAdapter.Fill calls SqlCommand.ExecuteReader with CommandBehavior.SequentialAccess set. Maybe that's enough to make the difference.

SqlDataAdapter.Fill使用CommandBehavior.SequentialAccess set调用SqlCommand.ExecuteReader。也许这足以让人与众不同。

As an aside, I see your IDbReader implementation caches the ordinals of each field for performance reasons. An alternative to this approach is to use the DbEnumerator class.

顺便说一下,出于性能原因,我看到你的IDbReader实现缓存了每个字段的序数。此方法的替代方法是使用DbEnumerator类。

DbEnumerator caches a field name -> ordinal dictionary internally, so gives you much of the performance benefit of using ordinals with the simplicity of using field names:

DbEnumerator在内部缓存一个字段名称 - >序数字典,因此使用简单的字段名称可以为您提供使用序数的大部分性能优势:

foreach(IDataRecord record in new DbEnumerator(reader))
{
    artifactList.Add(new ArtifactString() {
        FormNumber = (int) record["FormNumber"],
        FormOwner = (int) record["FormOwner"],
        ...
    });
}

or even:

甚至:

return new DbEnumerator(reader)
    .Select(record => new ArtifactString() {
        FormNumber = (int) record["FormNumber"],
        FormOwner = (int) record["FormOwner"],
        ...
      })
    .ToList();

#3


2  

2 things could be slowing you down.

2件事可能会让你失望。

First, I wouldn't do a "find ordinal by name" for each column, if you're interested in performance. Note, the "layout" class below to take care of this lookup. And the layout providers later readability, instead of using "0", "1", "2", etc. And it allows me to code to an Interface (IDataReader) instead of the Concrete.

首先,如果您对性能感兴趣,我不会为每一列做“按名称排序”。注意,下面的“布局”类来处理这个查找。并且布局提供者以后可读性,而不是使用“0”,“1”,“2”等。它允许我编码到接口(IDataReader)而不是混凝土。

Second. You're using the ".Value" property. (and I would think this does make a difference)

第二。您正在使用“.Value”属性。 (我认为这确实有所作为)

You'll get better results (IMHO) if you use the concrete datatype "getters".

如果你使用具体的数据类型“getters”,你会得到更好的结果(恕我直言)。

GetString, GetDateTime, GetInt32, etc,etc.

GetString,GetDateTime,GetInt32等等。

Here is my typical IDataReader to DTO/POCO code.

这是我对DTO / POCO代码的典型IDataReader。

[Serializable]
public partial class Employee
{
    public int EmployeeKey { get; set; }                   
    public string LastName { get; set; }                   
    public string FirstName { get; set; }   
    public DateTime HireDate  { get; set; }  
}

[Serializable]
public class EmployeeCollection : List<Employee>
{
}   

internal static class EmployeeSearchResultsLayouts
{
    public static readonly int EMPLOYEE_KEY = 0;
    public static readonly int LAST_NAME = 1;
    public static readonly int FIRST_NAME = 2;
    public static readonly int HIRE_DATE = 3;
}


    public EmployeeCollection SerializeEmployeeSearchForCollection(IDataReader dataReader)
    {
        Employee item = new Employee();
        EmployeeCollection returnCollection = new EmployeeCollection();
        try
        {

            int fc = dataReader.FieldCount;//just an FYI value

            int counter = 0;//just an fyi of the number of rows

            while (dataReader.Read())
            {

                if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.EMPLOYEE_KEY)))
                {
                    item = new Employee() { EmployeeKey = dataReader.GetInt32(EmployeeSearchResultsLayouts.EMPLOYEE_KEY) };

                    if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.LAST_NAME)))
                    {
                        item.LastName = dataReader.GetString(EmployeeSearchResultsLayouts.LAST_NAME);
                    }

                    if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.FIRST_NAME)))
                    {
                        item.FirstName = dataReader.GetString(EmployeeSearchResultsLayouts.FIRST_NAME);
                    }

                    if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.HIRE_DATE)))
                    {
                        item.HireDate = dataReader.GetDateTime(EmployeeSearchResultsLayouts.HIRE_DATE);
                    }


                    returnCollection.Add(item);
                }

                counter++;
            }

            return returnCollection;

        }
        //no catch here... see  http://blogs.msdn.com/brada/archive/2004/12/03/274718.aspx
        finally
        {
            if (!((dataReader == null)))
            {
                try
                {
                    dataReader.Close();
                }
                catch
                {
                }
            }
        }
    }

#4


0  

I don't think it will account for all the difference, but try something like this to eliminate some of the extra variables and function calls:

我不认为它会解释所有的差异,但尝试这样的东西来消除一些额外的变量和函数调用:

using (SqlDataReader reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        artifactList.Add(new ArtifactString
        {
            FormNumber = reader["FormNumber"].ToString(),
            //etc
        });
     }
}

#1


25  

I see three issues:

我看到三个问题:

  1. the way you use a DataReader negates it's big single-item-in-memory advantage by converting it to list,
  2. 你使用DataReader的方式通过将其转换为列表来否定它的大单项内存优势,
  3. you're running the benchmark in an environment that differs significantly from production in a way that favors the DataTable, and
  4. 你在一个与生产有很大不同的环境中运行基准测试,以一种有利于DataTable的方式运行
  5. you're spending time converting DataReader record to Artifact objects that is not duplicated in the DataTable code.
  6. 您将花费时间将DataReader记录转换为DataTable代码中未重复的Artifact对象。

The main advantage of a DataReader is that you don't have to load everything into memory at once. This should be a huge advantage for DataReader in web apps, where memory, rather than cpu, is often the bottleneck, but by adding each row to a generic list you've negated this. That also means that even after you change your code to only use one record at a time, the difference might not show up on your benchmarks because you're running them on a system with lot of free memory, which will favor the DataTable. Also, the DataReader version is spending time parsing the results into Artifact objects that the DataTable has not done yet.

DataReader的主要优点是您不必一次将所有内容加载到内存中。对于Web应用程序中的DataReader而言,这应该是一个巨大的优势,其中内存而不是cpu通常是瓶颈,但是通过将每行添加到通用列表中,您已经否定了这一点。这也意味着,即使您将代码更改为一次只使用一条记录,差异也可能不会显示在您的基准测试上,因为您在具有大量可用内存的系统上运行它们,这将有利于DataTable。此外,DataReader版本花费时间将结果解析为DataTable尚未完成的Artifact对象。

To fix the DataReader usage issue, change List<ArtifactString> to IEnumerable<ArtifactString> everywhere, and in your DataReader DAL change this line:

要修复DataReader使用问题,请将List 更改为IEnumerable ,并在DataReader DAL中更改此行:

artifactList.Add(artifact);

to this:

对此:

yield return artifact;

This means you also need to add code that iterates over the results to your DataReader test harness to keep things fair.

这意味着您还需要将迭代结果的代码添加到DataReader测试工具中以保持公平。

I'm not sure how to adjust the benchmark to create a more typical scenario that is fair to both DataTable and DataReader, except to build two versions of your page, and serve up each version for an hour under a similar production-level load so that we have real memory pressure... do some real A/B testing. Also, make sure you cover converting the DataTable rows to Artifacts... and if the argument is that you need to do this for a DataReader, but not for a DataTable, that is just plain wrong.

我不确定如何调整基准来创建一个对DataTable和DataReader都公平的更典型的场景,除了构建两个版本的页面,并在类似的生产级别负载下提供每个版本一小时我们有真正的记忆压力...做一些真正的A / B测试。另外,请确保覆盖将DataTable行转换为Artifacts ...如果参数是您需要为DataReader执行此操作,而不是DataTable,则这是完全错误的。

#2


2  

SqlDataAdapter.Fill calls SqlCommand.ExecuteReader with CommandBehavior.SequentialAccess set. Maybe that's enough to make the difference.

SqlDataAdapter.Fill使用CommandBehavior.SequentialAccess set调用SqlCommand.ExecuteReader。也许这足以让人与众不同。

As an aside, I see your IDbReader implementation caches the ordinals of each field for performance reasons. An alternative to this approach is to use the DbEnumerator class.

顺便说一下,出于性能原因,我看到你的IDbReader实现缓存了每个字段的序数。此方法的替代方法是使用DbEnumerator类。

DbEnumerator caches a field name -> ordinal dictionary internally, so gives you much of the performance benefit of using ordinals with the simplicity of using field names:

DbEnumerator在内部缓存一个字段名称 - >序数字典,因此使用简单的字段名称可以为您提供使用序数的大部分性能优势:

foreach(IDataRecord record in new DbEnumerator(reader))
{
    artifactList.Add(new ArtifactString() {
        FormNumber = (int) record["FormNumber"],
        FormOwner = (int) record["FormOwner"],
        ...
    });
}

or even:

甚至:

return new DbEnumerator(reader)
    .Select(record => new ArtifactString() {
        FormNumber = (int) record["FormNumber"],
        FormOwner = (int) record["FormOwner"],
        ...
      })
    .ToList();

#3


2  

2 things could be slowing you down.

2件事可能会让你失望。

First, I wouldn't do a "find ordinal by name" for each column, if you're interested in performance. Note, the "layout" class below to take care of this lookup. And the layout providers later readability, instead of using "0", "1", "2", etc. And it allows me to code to an Interface (IDataReader) instead of the Concrete.

首先,如果您对性能感兴趣,我不会为每一列做“按名称排序”。注意,下面的“布局”类来处理这个查找。并且布局提供者以后可读性,而不是使用“0”,“1”,“2”等。它允许我编码到接口(IDataReader)而不是混凝土。

Second. You're using the ".Value" property. (and I would think this does make a difference)

第二。您正在使用“.Value”属性。 (我认为这确实有所作为)

You'll get better results (IMHO) if you use the concrete datatype "getters".

如果你使用具体的数据类型“getters”,你会得到更好的结果(恕我直言)。

GetString, GetDateTime, GetInt32, etc,etc.

GetString,GetDateTime,GetInt32等等。

Here is my typical IDataReader to DTO/POCO code.

这是我对DTO / POCO代码的典型IDataReader。

[Serializable]
public partial class Employee
{
    public int EmployeeKey { get; set; }                   
    public string LastName { get; set; }                   
    public string FirstName { get; set; }   
    public DateTime HireDate  { get; set; }  
}

[Serializable]
public class EmployeeCollection : List<Employee>
{
}   

internal static class EmployeeSearchResultsLayouts
{
    public static readonly int EMPLOYEE_KEY = 0;
    public static readonly int LAST_NAME = 1;
    public static readonly int FIRST_NAME = 2;
    public static readonly int HIRE_DATE = 3;
}


    public EmployeeCollection SerializeEmployeeSearchForCollection(IDataReader dataReader)
    {
        Employee item = new Employee();
        EmployeeCollection returnCollection = new EmployeeCollection();
        try
        {

            int fc = dataReader.FieldCount;//just an FYI value

            int counter = 0;//just an fyi of the number of rows

            while (dataReader.Read())
            {

                if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.EMPLOYEE_KEY)))
                {
                    item = new Employee() { EmployeeKey = dataReader.GetInt32(EmployeeSearchResultsLayouts.EMPLOYEE_KEY) };

                    if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.LAST_NAME)))
                    {
                        item.LastName = dataReader.GetString(EmployeeSearchResultsLayouts.LAST_NAME);
                    }

                    if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.FIRST_NAME)))
                    {
                        item.FirstName = dataReader.GetString(EmployeeSearchResultsLayouts.FIRST_NAME);
                    }

                    if (!(dataReader.IsDBNull(EmployeeSearchResultsLayouts.HIRE_DATE)))
                    {
                        item.HireDate = dataReader.GetDateTime(EmployeeSearchResultsLayouts.HIRE_DATE);
                    }


                    returnCollection.Add(item);
                }

                counter++;
            }

            return returnCollection;

        }
        //no catch here... see  http://blogs.msdn.com/brada/archive/2004/12/03/274718.aspx
        finally
        {
            if (!((dataReader == null)))
            {
                try
                {
                    dataReader.Close();
                }
                catch
                {
                }
            }
        }
    }

#4


0  

I don't think it will account for all the difference, but try something like this to eliminate some of the extra variables and function calls:

我不认为它会解释所有的差异,但尝试这样的东西来消除一些额外的变量和函数调用:

using (SqlDataReader reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        artifactList.Add(new ArtifactString
        {
            FormNumber = reader["FormNumber"].ToString(),
            //etc
        });
     }
}