I'm loading a CSV file from S3 into Redshift. The file contains analytics data, including a PageUrl column (which may contain user search information, for example inside a query string).
The load chokes on rows that contain a single, unpaired double-quote character. For example, if there is a page for a 14" toy, the PageUrl would contain:
http://www.mywebsite.com/a-14"-toy/1234.html
Understandably, Redshift can't handle this, because it expects a closing double-quote character.
As I see it, my options are:
- Pre-process the input and remove these characters
- Configure the COPY command in Redshift to ignore these characters but still load the row
- Set MAXERROR to a high value and sweep up the errors using a separate process
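For option 3, here is a minimal sketch of what the load and the error sweep could look like, written in Python as plain SQL strings. Note that the actual COPY parameter is spelled MAXERROR. The table name, S3 path and IAM role ARN are hypothetical placeholders, and the statements are assumed to be run through whatever Redshift driver you already use. Rows that COPY rejects are recorded in the STL_LOAD_ERRORS system table, and PG_LAST_COPY_ID() returns the ID of the most recent COPY in the current session.

```python
# Option 3 sketch: let COPY skip bad rows, then inspect what was skipped.
# Table name, S3 path and IAM role ARN are hypothetical placeholders.


def build_copy_with_maxerror(table: str, s3_path: str, iam_role: str, max_errors: int) -> str:
    """Return a CSV-mode COPY statement that tolerates up to max_errors bad rows."""
    if max_errors < 0:
        raise ValueError("max_errors must be non-negative")
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "CSV "
        f"MAXERROR {max_errors};"
    )


# Rejected rows are logged in STL_LOAD_ERRORS. pg_last_copy_id() refers to the
# most recent COPY in the current session, so run this on the same connection
# right after the load.
SWEEP_ERRORS_SQL = (
    "SELECT line_number, raw_line, err_reason "
    "FROM stl_load_errors "
    "WHERE query = pg_last_copy_id() "
    "ORDER BY line_number;"
)


if __name__ == "__main__":
    print(build_copy_with_maxerror(
        "analytics_page_views",                           # hypothetical table
        "s3://my-bucket/analytics/pages.csv",             # hypothetical path
        "arn:aws:iam::123456789012:role/MyRedshiftRole",  # hypothetical role
        1000,
    ))
    print(SWEEP_ERRORS_SQL)
```

The f-strings are only for readability; pass nothing but trusted, hard-coded identifiers into them.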
Option 2 would be ideal, but I can't find a COPY option that does it!
Any other suggestions, or am I just not looking hard enough?
Thanks
Duncan
2 answers
#1
5
Unfortunately, there is no way to fix this. You will need to pre-process the file before loading it into Amazon Redshift.
The closest options you have are CSV [ QUOTE [AS] 'quote_character' ], which lets you wrap fields in an alternative quote character, and ESCAPE, which handles a quote character that is preceded by a backslash. Alas, both require the file to be in a particular format before loading.
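Here is a minimal pre-processing sketch in Python. It assumes the stray double quotes only ever appear in the middle of fields that are not themselves quoted, as in the PageUrl example. Python's csv reader treats such a quote as an ordinary character. Re-writing every field with QUOTE_ALL then doubles it (" becomes ""), which is how Redshift's CSV mode expects a quote inside a quoted field to be escaped. The function name requote_csv is hypothetical.

```python
import csv
import io


def requote_csv(raw: str) -> str:
    """Rewrite CSV text so every field is quoted and embedded quotes are doubled.

    Assumes stray double quotes appear only inside unquoted fields, where
    Python's csv reader keeps them as ordinary characters.
    """
    reader = csv.reader(io.StringIO(raw))
    out = io.StringIO()
    # QUOTE_ALL wraps every field in quotes; doublequote=True (the default)
    # turns an embedded " into "" so the field stays well-formed CSV.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = '1,http://www.mywebsite.com/a-14"-toy/1234.html,toy\n'
    print(requote_csv(sample), end="")
```

For a large file you would stream it rather than hold it in memory. If some fields were already quoted and also contain stray quotes, this approach will not repair them.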
See:
- Redshift COPY Data Conversion Parameters
- Redshift COPY Data Format Parameters
#2
3
It's 2017 and I ran into the same problem. Happy to report there is now a way to get Redshift to load CSV files with the odd " in the data.
The trick is to use the ESCAPE keyword and NOT use the CSV keyword. I don't know why, but having the CSV and ESCAPE keywords together in a COPY command failed with the error message "CSV is not compatible with ESCAPE;". However, without any change to the data, I was able to load it successfully once I removed the CSV keyword from the COPY command.
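Here is a sketch of the statement this answer describes, built as a Python string. Table, path and role values are hypothetical. As for the error, a likely explanation is that ESCAPE belongs to Redshift's plain delimited-text format, while CSV does its own quote handling, so the two cannot be combined. In text mode double quotes have no special meaning, which would explain why the stray " loads. Also note that without CSV the default delimiter is a pipe (|), so a comma-separated file needs DELIMITER ','.

```python
# Sketch of the COPY from this answer: plain delimited text (no CSV keyword)
# plus ESCAPE. Table, path and role values are hypothetical placeholders.


def build_copy_escape(table: str, s3_path: str, iam_role: str, delimiter: str = ",") -> str:
    """Return a non-CSV COPY statement that treats backslash as an escape character."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    # Without CSV the default delimiter is '|', so set it explicitly for a
    # comma-separated file. Double quotes are ordinary data in this mode.
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"DELIMITER '{delimiter}' "
        "ESCAPE;"
    )


if __name__ == "__main__":
    print(build_copy_escape(
        "analytics_page_views",                           # hypothetical table
        "s3://my-bucket/analytics/pages.csv",             # hypothetical path
        "arn:aws:iam::123456789012:role/MyRedshiftRole",  # hypothetical role
    ))
```

There are two trade-offs. Without CSV, quotes no longer protect delimiters, so a quoted field that contains a comma will be split into two columns. With ESCAPE, a backslash in the data is treated as an escape character, so a literal backslash has to be written as \\.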
You can also refer to this documentation for help: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-conversion.html#copy-escape