C#, EF and LINQ: slow at inserting a single large (7 MB) record into SQL Server

Date: 2022-01-15 02:14:15

There's a long version of this question, and a short version.

The short version:

Why are both LINQ and EF so slow at inserting a single, large (7 MB) record into a remote SQL Server database?

And here's the long version (with some information about workarounds, which might be useful to other readers):

All of the following example code does run okay, but as my users are in Europe and our Data Centers are based in America, it is damned slow. But if I run the same code on a Virtual PC in America, it runs instantly. (And no, sadly my company wants to keep all data in-house, so I can't use Azure, Amazon Cloud Services, etc)

Quite a few of my corporate apps involve reading/writing data from Excel into SQL Server, and often, we'll want to save a raw-copy of the Excel file in a SQL Server table.

This is very straightforward to do, simply reading in the raw data from a local file, and saving it into a record.

private int SaveFileToSQLServer(string filename)
{
    //  Read in an Excel file, and store it in a SQL Server [External_File] record.
    //
    //  Returns the ID of the [External_File] record which was added.
    //

    DateTime lastModified = System.IO.File.GetLastWriteTime(filename);
    byte[] fileData = File.ReadAllBytes(filename);

    //  Create a new SQL Server database record, containing our file's raw data 
    //  (Note: the table has an IDENTITY Primary-Key, so it will generate an ExtFile_ID for us.)
    External_File newFile = new External_File()
    {
        ExtFile_Filename = System.IO.Path.GetFileName(filename),
        ExtFile_Data = fileData,
        ExtFile_Last_Modified = lastModified,
        Update_By = "mike",
        Update_Time = DateTime.UtcNow
    };
    dc.External_Files.InsertOnSubmit(newFile);      //  "dc" is our LINQ-to-SQL DataContext
    dc.SubmitChanges(); 

    return newFile.ExtFile_ID;
}

Yup, no surprises there, and it works fine.

But what I noticed is that for large Excel files (7-8 MB), this code to insert one (large!) record would take 40-50 seconds to run. I put it in a background thread, and it all worked fine, but, of course, if the user quit my application, this process would get killed off, which would cause problems.

As a test, I tried replacing this function with code that would:

  • copy the file into a shared directory on the SQL Server machine
  • call a stored procedure to read the raw data (blob) into the same table

Using this method, the entire process would take just 3-4 seconds.

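In C#, that workaround can be sketched roughly as follows. Note that the share name `\\SqlServerBox\ImportData` (assumed to be shared from `D:\ImportData` on the server), the connection string parameter, and the `FileUploader` class name are all illustrative assumptions, not part of my actual code; the stored procedure it calls is the one shown below.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

static class FileUploader
{
    //  Maps a file on the share to the path the SQL Server machine sees locally.
    //  ASSUMPTION: \\SqlServerBox\ImportData is shared from D:\ImportData on the server.
    public static string ServerLocalPath(string uncFile)
    {
        return @"D:\ImportData\" + Path.GetFileName(uncFile);
    }

    public static int SaveFileViaServerShare(string filename, string connectionString)
    {
        //  1. Copy the file to the shared folder on the SQL Server machine
        string remotePath = Path.Combine(@"\\SqlServerBox\ImportData", Path.GetFileName(filename));
        File.Copy(filename, remotePath, true);

        //  2. Ask SQL Server to ingest the file locally (fast - the blob never crosses the WAN twice)
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.UploadFileToDatabase", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@LocalFilename", SqlDbType.NVarChar, 400).Value = ServerLocalPath(filename);
            conn.Open();
            //  The procedure SELECTs the new ExtFile_ID (or -1 on failure)
            return Convert.ToInt32(cmd.ExecuteScalar());
        }
    }
}
```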
If you're interested, here's the Stored Procedure I used to upload a file (which MUST be stored in a folder on the SQL Server machine itself) into a database record:

CREATE PROCEDURE [dbo].[UploadFileToDatabase]
    @LocalFilename nvarchar(400)
AS
BEGIN
    --  By far, the quickest way to do this is to copy the file onto the SQL Server machine, then call this stored
    --  procedure to read the raw data into an [External_File] record.
    --
    --      EXEC [dbo].[UploadFileToDatabase] 'D:\ImportData\SomeExcelFile.xlsm'
    -- 
    --  Returns: -1 if something went wrong  (eg file didn't exist) or the ID of our new [External_File] record
    --
    --  Note that the INSERT will go wrong, if the user doesn't have "bulkadmin" rights.
    --      "You do not have permission to use the bulk load statement."
    --  EXEC master..sp_addsrvrolemember @loginame = N'GPP_SRV', @rolename = N'bulkadmin'
    --
    SET NOCOUNT ON;

    DECLARE 
        @filename nvarchar(300),        --  eg "SomeFilename.xlsx"  (without the path)
        @SQL nvarchar(2000),
        @New_ExtFile_ID int

    --  Extract (just) the filename from our Path+Filename parameter
    SET @filename = RIGHT(@LocalFilename,charindex('\',reverse(@LocalFilename))-1)

    SET @SQL = 'INSERT INTO [External_File]  ([ExtFile_Filename], [ExtFile_Data]) '
    SET @SQL = @SQL + 'SELECT ''' + @filename + ''', *'
    SET @SQL = @SQL + ' FROM OPENROWSET(BULK ''' + @LocalFilename + ''', SINGLE_BLOB) rs'

    PRINT convert(nvarchar, GetDate(), 108) + ' Running: ' + @SQL
    BEGIN TRY
        EXEC (@SQL)
        SELECT @New_ExtFile_ID = @@IDENTITY
    END TRY
    BEGIN CATCH
        PRINT convert(nvarchar, GetDate(), 108) + ' An exception occurred.'
        SELECT -1
        RETURN
    END CATCH

    PRINT convert(nvarchar, GetDate(), 108) + ' Finished.'

    --  Return the ID of our new [External_File] record
    SELECT @New_ExtFile_ID
END

The key to this code is that it builds up a SQL command like this:

INSERT INTO [External_File]  ([ExtFile_Filename], [ExtFile_Data])
SELECT 'SomeFilename.xlsm', * FROM OPENROWSET(BULK N'D:\ImportData\SomeExcelFile.xlsm', SINGLE_BLOB) rs

.. and, as both the database and file to be uploaded are both on the same machine, this runs almost instantly.

As I said, overall, it took 3-4 seconds to copy the file to a folder on the SQL Server machine, and run this stored procedure, compared to 40-50 seconds to do the same using C# code with LINQ or EF.

Exporting blob data from SQL Server into an external file

And, of course, the same is true in the opposite direction.

First, I wrote some C#/LINQ code to load the one (7 MB!) database record and write its binary data out to a raw file. This took about 30-40 seconds to run.

But if I exported the SQL Server data to a file (saved on the SQL Server machine) first...

EXEC master..xp_cmdshell 'BCP "select ef.ExtFile_Data FROM [External_File] ef where ExtFile_ID = 585" queryout "D:\ImportData\SomeExcelFile.xlsx" -T -N'

...and then copied the file from the SQL Server folder to the user's folder, it once again ran in a couple of seconds.

And this is my question: Why are both LINQ and EF so bad at inserting a single large record into the database?

I assume the latency (the distance between us, here in Europe, and our Data Centers in the States) is a major cause of the delay, but it's just odd that a bog-standard file-copy can be so much faster.

Am I missing something?

Obviously, I've found workarounds for these problems, but they involve adding some extra permissions to our SQL Server machines and to shared folders on those machines, and our DBAs really don't like granting rights for things like "xp_cmdshell"...

A few months later...

I had the same issue again this week, and tried Kevin H's suggestion to use bulk-insert to insert a large (6 MB) record into SQL Server.

Using bulk-insert, it took around 90 seconds to insert the 6 MB record, even though our data centre is 6,000 miles away.

So, the moral of the story: when inserting very large database records, avoid using a regular SubmitChanges() command, and stick to using bulk-insert.

1 Answer

#1


You could try using SQL Server Profiler to see what Entity Framework is doing with the insert. For example, if it's selecting data out of your table, it could be taking a long time to return the data over the wire, and you may not notice that locally.

I have found that the best way to load a large amount of data (both record count and record size) into SQL Server from C# is to use the SqlBulkCopy class. Even though you are inserting only one record, you may still benefit from this change.

To use bulk copy, just create a DataTable that matches the structure of your table, then call code like this:

using (SqlConnection destinationConnection = new SqlConnection(connectionString))
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnection))
{
    destinationConnection.Open();
    bulkCopy.DestinationTableName = "External_File";
    bulkCopy.WriteToServer(dataTable);
}
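For a single-row insert into [External_File], the DataTable could be built along these lines. The column names are taken from the question's LINQ example; treat this as a sketch rather than the exact schema, and note that the IDENTITY column ExtFile_ID is deliberately omitted so SQL Server can generate it.

```csharp
using System;
using System.Data;
using System.IO;

static DataTable BuildExternalFileRow(string filename)
{
    DataTable table = new DataTable("External_File");

    //  Columns must line up with the destination table (ExtFile_ID is IDENTITY,
    //  so we leave it out and let SQL Server generate it)
    table.Columns.Add("ExtFile_Filename", typeof(string));
    table.Columns.Add("ExtFile_Data", typeof(byte[]));
    table.Columns.Add("ExtFile_Last_Modified", typeof(DateTime));
    table.Columns.Add("Update_By", typeof(string));
    table.Columns.Add("Update_Time", typeof(DateTime));

    table.Rows.Add(
        Path.GetFileName(filename),
        File.ReadAllBytes(filename),
        File.GetLastWriteTime(filename),
        "mike",
        DateTime.UtcNow);

    return table;
}
```

One caveat: since the DataTable omits the IDENTITY column, you would also want to add explicit `bulkCopy.ColumnMappings.Add(name, name)` entries for each column, because SqlBulkCopy maps by ordinal position by default and would otherwise try to write the filename into ExtFile_ID.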
