Amazon Redshift - COPY from CSV - single double quote in row - "Invalid quote formatting for CSV" error

Time: 2022-08-10 23:04:37

I'm loading a CSV file from S3 into Redshift. This CSV file is analytics data which contains the PageUrl (which may contain user search info inside a query string for example).

It chokes on rows that contain a single double-quote character. For example, if there is a page for a 14" toy, the PageUrl would contain:

http://www.mywebsite.com/a-14"-toy/1234.html

Redshift understandably can't handle this as it is expecting a closing double quote character.

The way I see it my options are:

  1. Pre-process the input and remove these characters
  2. Configure the COPY command in Redshift to ignore these characters but still load the row
  3. Set MAXERROR to a high value and sweep up the errors using a separate process

Option 2 would be ideal, but I can't find it!

Any other suggestions if I'm just not looking hard enough?

Thanks

Duncan

2 Answers

#1


5  

Unfortunately, there is no way to fix this. You will need to pre-process the file before loading it into Amazon Redshift.

The closest options you have are CSV [ QUOTE [AS] 'quote_character' ], which lets fields be wrapped in an alternative quote character, and ESCAPE, which applies when the quote character is preceded by a backslash. Alas, both require the file to be in a particular format before loading.
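As a rough illustration of the pre-processing route, here is a minimal Python sketch. It assumes the source file never quotes its fields and never has a comma inside a field value, so every double quote it sees is literal data. The function name `requote_csv` is hypothetical. It rewrites each row as standard CSV, where a field containing a quote is wrapped in quotes and the inner quote is doubled, which is the form COPY with the CSV parameter should accept.

```python
import csv
import io


def requote_csv(src, dst, delimiter=","):
    """Rewrite rows so stray double quotes become valid CSV.

    Assumption: the input has no quoted fields, so quotes are read
    as ordinary characters (QUOTE_NONE) and re-emitted escaped.
    """
    reader = csv.reader(src, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    writer = csv.writer(
        dst,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)


# Example row: the URL from the question plus a made-up counter column.
raw = 'http://www.mywebsite.com/a-14"-toy/1234.html,42\n'
cleaned = io.StringIO()
requote_csv(io.StringIO(raw), cleaned)
print(cleaned.getvalue(), end="")
# prints: "http://www.mywebsite.com/a-14""-toy/1234.html",42
```

For real files, open both with `newline=""` as the `csv` module documentation recommends. If the data can already contain properly quoted fields, this sketch would mangle them and a more careful parser is needed.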

See:

#2


3  

It's 2017 and I ran into the same problem. Happy to report there is now a way to get Redshift to load CSV files with the odd " in the data.

The trick is to use the ESCAPE keyword, and also to NOT use the CSV keyword. I don't know why, but having the CSV and ESCAPE keywords together in a COPY command resulted in failure with the error message "CSV is not compatible with ESCAPE". However, with no change to the data being loaded, I was able to load successfully once I removed the CSV keyword from the COPY command.

You can also refer to this documentation for help: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html#copy-escape
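For reference, a COPY command along the lines this answer describes might look like the sketch below, assembled here as a Python string. The table name, S3 path, and IAM role are hypothetical placeholders. The explicit `DELIMITER ','` is an assumption: without the CSV keyword, COPY defaults to pipe-delimited text, so a comma-separated file presumably needs it spelled out.

```python
def build_copy_statement(table, s3_uri, iam_role, delimiter=","):
    """Build a COPY statement that uses ESCAPE and omits the CSV keyword."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"DELIMITER '{delimiter}'\n"
        "ESCAPE;"
    )


# All three arguments are made-up placeholders, not values from the thread.
statement = build_copy_statement(
    table="analytics.page_views",
    s3_uri="s3://example-bucket/pageviews/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Note that in this mode the file is treated as plain delimited text, so double quotes are loaded as ordinary characters, and any delimiter that appears inside a field value would need a backslash in front of it.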
