Consider the following comma-separated file. For simplicity, let it contain a single line:
'I am quoted','so, can use comma inside - it is not separator here','but can\'t use escaped quote :=('
If you try to read it with the command
table <- read.csv(filename, header=FALSE)
the line is split into 4 parts, because it contains 3 commas. In fact, I want to read only 3 parts, one of which itself contains a comma. This is where the quote argument comes to the rescue. I tried:
table <- read.csv(filename, header=FALSE, quote="'")
but that fails with the error "incomplete final line found by readTableHeader on table". That happens because the line contains an odd number (seven) of quotes.
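The quote count can be checked directly. This is only a small sketch: the sample line is inlined as an R string (so the single backslash in the file is written as \\ here), and the variable names are made up for illustration.

```r
# Sample line from the question; "\\'" in R source is the two characters \' in the data.
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"

# Count the single quotes by comparing lengths before and after deleting them.
n_quotes <- nchar(line) - nchar(gsub("'", "", line, fixed = TRUE))

n_quotes           # 7
n_quotes %% 2 == 1 # TRUE: one quote is left without a partner
```

Since the backslash is not treated as an escape for the quote character, the seventh quote opens a field that is never closed.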
read.table() as well as scan() have the parameter allowEscapes, but setting it to TRUE doesn't help. That is expected, because help(scan) says:
The escapes which are interpreted are the control characters ‘\a, \b, \f, \n, \r, \t, \v’, ... ... Any other escaped character is treated as itself, including backslash
Please suggest how you would read such quoted CSV files containing escaped \' quotes.
1 solution
#1
One possibility is to use readLines() to read everything in as is, and then replace the quote character with something else, e.g.:
tt <- readLines("F:/temp/test.txt") # raw lines; the \\ R prints is a single backslash in the data
tt <- gsub("([^\\]|^)'","\\1\"",tt) # replace each unescaped ' with "
tt <- gsub("\\\\'","'",tt) # turn the escaped \' into a plain '
This lets you read the vector tt through a textConnection:
zz <- textConnection(tt) # treat the character vector as a file-like input
read.csv(zz,header=FALSE,quote="\"") # parse it with " as the quote character
close(zz)
Not the most beautiful solution, but it works (provided you don't have a " character somewhere in the file, of course...).
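For reference, here is a self-contained sketch of the same idea that can be run without a file. It rests on a few assumptions that are not part of the answer above: the sample line is inlined as an R string, a Perl-style lookbehind is swapped in for the answer's regular expression, and the text argument of read.csv() replaces the explicit textConnection.

```r
# Sample line from the question; "\\'" in R source is the two characters \' in the data.
line <- "'I am quoted','so, can use comma inside - it is not separator here','but can\\'t use escaped quote :=('"

# Step 1: replace every ' that is NOT preceded by a backslash with ".
# The negative lookbehind needs perl = TRUE.
step1 <- gsub("(?<!\\\\)'", "\"", line, perl = TRUE)

# Step 2: unescape the remaining \' sequences to a plain '.
step2 <- gsub("\\'", "'", step1, fixed = TRUE)

# Step 3: parse the cleaned text, using " as the quote character.
parsed <- read.csv(text = step2, header = FALSE, quote = "\"",
                   stringsAsFactors = FALSE)

ncol(parsed) # 3
parsed$V3    # "but can't use escaped quote :=("
```

The lookbehind does not consume the preceding character, so two adjacent quotes (an empty field written as '') are both converted. The same caveat applies as before: the data must not already contain a " character.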