SQL Server bulk insert: ignoring malformed rows

Date: 2021-02-02 09:37:36

I have to import SAP unconverted lists. These reports look quite ugly and are not well suited for automated processing, but there is no other option. The data is bordered by minus and pipe symbols, similar to the following example:

02.07.2012
--------------------
Report name
--------------------
|Header1 |Header2  |
|Value 11|Value1 2 |
|Value 21|Value2 2 | 
--------------------

I use a format file and a statement like the following:

SELECT Header1, Header2
FROM OPENROWSET(BULK 'report.txt',
                FORMATFILE = 'formatfile_report.xml',
                ERRORFILE = 'rejects.txt',
                FIRSTROW = 2,
                MAXERRORS = 100) AS report

Unfortunately I receive the following error messages:

Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

The rejects.txt file contains the last row of the file, which consists only of minus signs. The rejects.txt.Error.Txt file documents:

Row 21550 File Offset 3383848 ErrorFile Offset 0 - HRESULT 0x80004005

The culprit that raises the error is obviously the very last row, which does not conform to the format declared in the format file. The ugly header, however, does not cause many problems (at least the one at the very top).

Although I defined the MAXERRORS attribute, that single deformed line kills the whole operation. If I manually delete the last line containing all those minus signs (-), everything works fine. Since the import shall run frequently and, in particular, unattended, that extra post-treatment is no serious solution.

Can anyone help me make SQL Server less picky and less susceptible? It is good that it documents the lines that couldn't be loaded, but why does it abort the whole operation? And further, after one execution of a statement that caused the creation of rejects.txt, no other (or the same) statement can be executed before the txt files get deleted manually:

Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file "rejects.txt" could not be opened. Operating system error code 80(The file exists.).
Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file "rejects.txt.Error.Txt" could not be opened. Operating system error code 80(The file exists.).

I think that is weird behavior. Please help me to suppress it.
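
For now, the only automated workaround I can think of is deleting both files before each run, for example via xp_cmdshell. A sketch of that cleanup step (it assumes xp_cmdshell is enabled and uses an illustrative path):

-- Hypothetical cleanup before re-running the import;
-- requires xp_cmdshell to be enabled, the path is illustrative
EXEC master..xp_cmdshell 'del /q "C:\import\rejects.txt" "C:\import\rejects.txt.Error.Txt"', no_output;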

EDIT - FOLLOWUP: Here is the format file I use:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
   <FIELD ID="EMPTY" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="100"/>
   <FIELD ID="HEADER1" xsi:type="CharTerm" TERMINATOR="|" MAX_LENGTH="100"/>
   <FIELD ID="HEADER2" xsi:type="CharTerm" TERMINATOR="|\r\n" MAX_LENGTH="100"/>
 </RECORD>
 <ROW>
   <COLUMN SOURCE="HEADER1" NAME="HEADER2" xsi:type="SQLNVARCHAR"/>
   <COLUMN SOURCE="HEADER2" NAME="HEADER2" xsi:type="SQLNVARCHAR"/>
 </ROW>
 </BCPFORMAT>

3 solutions

#1


5  

BULK INSERT is notoriously fiddly and unhelpful when it comes to handling data that doesn't meet the specifications provided.

I haven't done a lot of work with format files, but one thing you might want to consider as a replacement is using BULK INSERT to drop each line of the file into a temporary staging table with a single nvarchar(max) column.

This lets you get your data into SQL for further examination, and then you can use the various string manipulation functions to break it down into the data you want to finally insert.
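
A minimal sketch of that idea (file path, table and column names are illustrative; the filters are based on the sample data in the question):

-- Stage every raw line of the report in a single-column table
CREATE TABLE #staging (raw_line nvarchar(max));

BULK INSERT #staging
FROM 'C:\import\report.txt'
WITH (ROWTERMINATOR = '\r\n');   -- assumes Windows line endings; use '\n' otherwise

-- Keep only the pipe-delimited data rows and split them at the pipes.
-- The LIKE filter drops the date line, the dashed lines and the report name;
-- the NOT LIKE filter drops the column-header row.
SELECT LTRIM(RTRIM(SUBSTRING(raw_line, 2,
                             CHARINDEX('|', raw_line, 2) - 2)))    AS Header1,
       LTRIM(RTRIM(SUBSTRING(raw_line,
                             CHARINDEX('|', raw_line, 2) + 1,
                             LEN(raw_line) - CHARINDEX('|', raw_line, 2) - 1))) AS Header2
FROM #staging
WHERE raw_line LIKE '|%|%|%'
  AND raw_line NOT LIKE '|Header1%';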

#2


0  

I was in the same trouble, but using the bcp command line solved the problem; it simply doesn't take the last row.

#3


0  

I had the same problem. I had a file with 115 billion rows, so manually deleting the last row was not an option; I couldn't even open the file manually, as it was too big.

Instead of using the BULK INSERT command, I used the bcp command, which looks like this (open a command prompt as administrator, then run):

bcp DatabaseName.dbo.TableNameToInsertIn in C:\Documents\FileNameToImport.dat -S ServerName -U UserName -P PassWord
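
Note that as written, the command prompts interactively for each column's storage type; adding -c switches bcp to plain character mode. bcp also has counterparts to the OPENROWSET options used in the question: -F (first row), -L (last row), -m (max errors) and -e (error file). A variant with illustrative values:

bcp DatabaseName.dbo.TableNameToInsertIn in C:\Documents\FileNameToImport.dat -S ServerName -U UserName -P PassWord -c -F 2 -m 100 -e rejects.txt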

It's about the same speed as BULK INSERT as far as I can tell (it took me only 12 minutes to import my data). Looking at Activity Monitor, I can see a bulk insert, so I guess it logs the same way when the database is in the bulk-logged recovery model.
