I am trying to bulk-insert a .csv file that has Unix line breaks into a database. The command I am running is:
BULK INSERT table_name
FROM 'C:\file.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
If I convert the file into Windows format the load works, but I don't want to do this extra step if it can be avoided. Any ideas?
8 Solutions
#1
92
I felt compelled to contribute because I had the same issue: I need to read two UNIX files from SAP at least a couple of times a day. So instead of using unix2dos, I needed something with less manual intervention that could be automated in code.
As noted, Char(10) works within the SQL string. I didn't want to use an SQL string, so I tried ''''+Char(10)+'''', but for some reason this didn't compile.
What did work, very slickly, was: WITH (ROWTERMINATOR = '0x0a')
Problem solved with Hex!
Hope this helps someone.
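If the statement is generated by a program rather than typed by hand, the hex terminator can simply be written into the command text. Below is a minimal Python sketch, not part of the original answer: `build_bulk_insert` is a hypothetical helper, and it assumes the table name comes from trusted code because it is inserted without escaping.

```python
def build_bulk_insert(table: str, path: str,
                      field_terminator: str = ",",
                      row_terminator: str = "0x0a") -> str:
    """Compose a BULK INSERT statement as text (hypothetical helper)."""

    def quote(value: str) -> str:
        # T-SQL string literals escape a single quote by doubling it.
        return "'" + value.replace("'", "''") + "'"

    # NOTE: `table` is assumed to be a trusted identifier; it is not escaped.
    return (
        f"BULK INSERT {table}\n"
        f"FROM {quote(path)}\n"
        f"WITH (FIELDTERMINATOR = {quote(field_terminator)}, "
        f"ROWTERMINATOR = {quote(row_terminator)})"
    )


# 0x0a is hexadecimal for 10, the code of the LF character.
assert int("0x0a", 16) == ord("\n") == 10

print(build_bulk_insert("table_name", r"C:\file.csv"))
```

The printed text matches the statement from the question, with '0x0a' in place of '\n'. How the text is then sent to the server depends on whichever database driver the program already uses.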
#2
13
Thanks to all who have answered but I found my preferred solution.
When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown below.
DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT table_name
FROM ''C:\file.csv''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC (@bulk_cmd)
Why you can't say BULK INSERT ...(ROWTERMINATOR = CHAR(10)) is beyond me. It doesn't look like you can evaluate any expressions in the WITH section of the command.
The above builds the command as a string and then executes it, neatly sidestepping the need to create an additional file or go through extra steps.
#3
3
I confirm that the syntax
ROWTERMINATOR = '''+CHAR(10)+'''
works when used with an EXEC command.
If you have multiple ROWTERMINATOR characters (e.g. a pipe and a unix linefeed) then the syntax for this is:
ROWTERMINATOR = '''+CHAR(124)+''+CHAR(10)+'''
#4
2
It's a bit more complicated than that! When you tell SQL Server ROWTERMINATOR='\n' it interprets this as meaning the default row terminator under Windows which is actually "\r\n" (using C/C++ notation). If your row terminator is really just "\n" you will have to use the dynamic SQL shown above. I have just spent the best part of an hour figuring out why \n doesn't really mean \n when used with BULK INSERT!
#5
1
One option would be to use bcp, and set up a control file with '\n' as the line break character.
Although you've indicated that you would prefer not to, another option would be to use unix2dos to pre-process the file into one with '\r\n' line breaks.
Finally, you can use the FORMATFILE option on BULK INSERT. This will use a bcp control file to specify the import format.
#6
0
It looks to me like there are two general avenues: find some alternate way to read the CSV in the SQL script, or convert the CSV beforehand in any of the numerous ways available (bcp, unix2dos, or, if it is a one-time kind of thing, you can probably even use your code editor to fix the file for you).
But you will have to have an extra step!
If this SQL is launched from a program, you might want to convert the line endings in that program. If you decide to code the conversion yourself, here is what you need to watch out for:
1. The line ending might be \n
2. or \r\n
3. or even \r (Mac!)
4. Good grief, it could be that some lines have \r\n and others \n; any combination is possible unless you control where the CSV came from.
OK, OK. Possibility 4 is farfetched. It happens in email, but that is another story.
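For a launching program written in Python, the four cases above can be handled in one pass. This is only a sketch under stated assumptions: `normalize_line_endings` is a hypothetical helper, the file is assumed to be in an ASCII-compatible encoding such as UTF-8 (not UTF-16), and quoted fields are assumed not to contain embedded line breaks, which would be rewritten as well.

```python
import re

# CRLF is listed first so a Windows ending is matched as one unit
# instead of being split into a lone CR followed by a lone LF.
_LINE_ENDING = re.compile(rb"\r\n|\r|\n")


def normalize_line_endings(data: bytes, target: bytes = b"\r\n") -> bytes:
    """Rewrite every \\n, \\r\\n or \\r in `data` as `target`."""
    # A function replacement avoids any escape processing of `target`.
    return _LINE_ENDING.sub(lambda _match: target, data)


mixed = b"a,1\nb,2\r\nc,3\rd,4"
print(normalize_line_endings(mixed))
```

Working on bytes rather than text keeps Python from applying its own newline translation when the file is read or written; open the file in binary mode ("rb" / "wb") for the same reason.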
#7
0
I would think "ROWTERMINATOR = '\n'" would work. I would suggest opening the file in a tool that shows "hidden characters" to make sure the line is being terminated like you think. I use notepad++ for things like this.
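If no editor that shows hidden characters is at hand, the same check can be done by counting terminator bytes. The following Python sketch is illustrative only; `count_line_endings` is a hypothetical helper and the sample bytes are made up.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF and lone CR terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CRLF pair also contains one LF and one CR, so subtract the pairs.
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }


sample = b"a,1\r\nb,2\nc,3\rd,4\n"
print(count_line_endings(sample))
```

For a real file, read it in binary mode first, for example `count_line_endings(open(path, "rb").read())`, where `path` is wherever your CSV lives. A file that reports only LF is the Unix-format case discussed in this thread.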
#8
0
It comes down to this. Unix uses LF (ctrl-J), MS-DOS/Windows uses CR/LF (ctrl-M/Ctrl-J).
When you use '\n' on Unix, it gets translated to an LF character. On MS-DOS/Windows it gets translated to CR/LF. When your import runs on the Unix-formatted file, it sees only an LF. Hence, it's often easier to run the file through unix2dos first. But as you said in your original question, you don't want to do this (I'll assume there is a good reason why you can't).
Why can't you do:
(ROWTERMINATOR = CHAR(10))
Probably because when the SQL code is being parsed, it is not replacing the char(10) with the LF character (because it's already encased in single quotes). Or perhaps it's being interpreted as:
(ROWTERMINATOR =
)
What happens when you echo out the contents of @bulk_cmd?