My SSIS package reads its input from a .csv file.
The file has about 60,000 rows, and my SSIS package fails during the read, saying it cannot convert a certain column due to potential loss of data.
I am certain that the majority of the rows are correct, because I have tried pasting random subsets of the file and SSIS reads them fine.
But I can't figure out a way to determine exactly which line my package failed on.
I have spent 2 months on this problem. Any advice?
4 solutions
#1
You could find the first culprit in at most 16 iterations. Here is a brain + brawn method:
First: Back everything up. Make copies of backups in safe places. Sorry to state the obvious, but I've recently been bitten, and I know better.
The file with 60K records - let's call this your base file.
- Split the base file into two halves (FileA, FileB).
- Use FileA as the input.
- Run the SSIS package. If it fails, use FileA as your new base file; otherwise use FileB.
- Go back to the first step and repeat until the base file holds a single record.
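The steps above are a binary search over the rows. A minimal Python sketch, assuming the package fails on a chunk exactly when that chunk contains at least one bad row. `find_first_bad_row` and the `fails` hook are hypothetical names: in practice `fails` would write the CSV header row plus the chunk to the file your flat file connection reads, run the package, and return True on failure; that wiring is not shown. Because FileA is always the earlier half, this finds the first offending row.

```python
from typing import Callable, Sequence


def find_first_bad_row(rows: Sequence[str], fails: Callable[[Sequence[str]], bool]) -> int:
    """Bisect `rows` (data rows only, no header) and return the 0-based index of
    the first row that makes `fails` return True.

    `fails(chunk)` is a hypothetical hook standing in for "write header + chunk
    to the input file, run the SSIS package, report whether it failed".
    """
    if not fails(rows):
        raise ValueError("the package does not fail on the full file")
    lo, hi = 0, len(rows)  # the current base file is rows[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2  # FileA = rows[lo:mid], FileB = rows[mid:hi]
        if fails(rows[lo:mid]):
            hi = mid  # FileA fails, so it becomes the new base file
        else:
            lo = mid  # FileA is clean, so the culprit is in FileB
    return lo


# Usage with a simulated package that "fails" if the second field is not a whole number.
def simulated_package_fails(chunk: Sequence[str]) -> bool:
    return any(not row.split(",")[1].isdigit() for row in chunk)


rows = [f"{i},{i * 10}" for i in range(60_000)]
rows[41_234] = "41234,26.5"  # one row that would lose data converting to an integer
print(find_first_bad_row(rows, simulated_package_fails))  # 41234
```

Each call to `fails` is one package run, so the cost is one run on the full file plus one run per halving.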
You will have the offending record within 16 iterations (60k, 30k, 15k, 7500, 3750, 1875, 937, 468, 234, 117, 58, 29, 14, 7, 3, 1): each run halves the base file, and 2^16 = 65,536 is the first power of two above 60,000.
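The sizes in parentheses come from halving and rounding down. A short Python check of that arithmetic, which also shows that in the worst case (the culprit always lands in the larger half) 16 splits are needed, matching ceil(log2(60,000)) = 16. `halving_sizes` is an illustrative helper, not part of SSIS.

```python
import math


def halving_sizes(n: int, round_up: bool = False) -> list[int]:
    """Base-file sizes after each split, starting from the full file of n records."""
    sizes = [n]
    while sizes[-1] > 1:
        s = sizes[-1]
        # round_up=True follows the larger half at every split (the worst case)
        sizes.append((s + 1) // 2 if round_up else s // 2)
    return sizes


print(halving_sizes(60_000))
# [60000, 30000, 15000, 7500, 3750, 1875, 937, 468, 234, 117, 58, 29, 14, 7, 3, 1]
print(len(halving_sizes(60_000, round_up=True)) - 1)  # 16 splits in the worst case
print(math.ceil(math.log2(60_000)))  # 16, the same bound computed directly
```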
Turn logging on for everything and rerun the SSIS package. You should have the offending record in the base file and the exact data point in the log.
#2
First, simplify the problem. Create a data flow task that uses only this flat file source and a dummy destination. Watch that fail.
Turn on all logging and page through the logs. Turn off any logging areas that are obviously worthless, and run it again.
Also, configure the error output of whichever component is raising the error (the source and/or the destination): set it to redirect the row instead of failing the component, and send the erroneous rows to a separate destination that you can inspect after the run.
#3
Most of the time when I have run across this, it was the result of either data that was longer than expected (e.g. trying to fit a 60-character string into a varchar(50) field) or a number where precision might be lost (e.g. fitting 26.5 into an integer field, or 26.55 into a numeric field that allows only one decimal place).
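If you know the destination column types, you can also look for these two failure patterns outside SSIS and get line numbers directly. A minimal Python sketch, assuming a comma-separated file with a header row. `SCHEMA`, its column names and limits, and `lossy_cells` are hypothetical, and the check only approximates SSIS's own conversion rules (for example, it does not test integer range overflow).

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical destination schema: column -> (kind, limit)
#   ("str", n): varchar(n), so the value may be at most n characters long
#   ("dec", s): numeric with s decimal places; s = 0 behaves like an integer column
SCHEMA = {"name": ("str", 50), "qty": ("dec", 0), "price": ("dec", 1)}


def lossy_cells(csv_text, schema=SCHEMA):
    """Yield (line_number, column, value) for every cell that would be truncated
    or lose precision. Line 1 is the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for col, (kind, limit) in schema.items():
            value = row.get(col) or ""
            if value == "":
                continue  # empty or missing cells are treated as NULL (an assumption)
            if kind == "str":
                lossy = len(value) > limit
            else:
                try:
                    d = Decimal(value)
                    # Rounding to `limit` decimal places changes the value -> data loss
                    lossy = d != d.quantize(Decimal(1).scaleb(-limit))
                except InvalidOperation:
                    lossy = True  # not a finite number at all
            if lossy:
                yield reader.line_num, col, value


sample = (
    "name,qty,price\n"
    "widget,3,4.5\n"
    "gadget,26.5,1.0\n"
    + "x" * 60 + ",1,26.55\n"
)
print([(line, col) for line, col, _ in lossy_cells(sample)])
# [(3, 'qty'), (4, 'name'), (4, 'price')]
```

For a real file you would pass the file's contents (or adapt the reader to an open file handle) and compare the reported lines against the rows SSIS rejects.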
#4
Set the data flow task's DefaultBufferMaxRows property to 1.
This will read and process the rows one by one, so the package will fail on the exact row that has the conversion issue.
There's no need to split the file manually.