I'm having a problem reading a text file into R. The text file has 8 columns and a header, and looks exactly like this:
ID 1990 1991 1992 1993 1994 1995 1996
A 36.88 45.48 52.46 111.31 138.45 121.09 122.62
B 19.11 27.97 37.14 47.68 60.78 35.84 38.64
C 56.21 74.94 92.3 118.62 138.13 104.65 113.98
D 30.48 51.54 61.57 99.87 80.9 84.97 99.34
When I run the following, I get an error:
> extra<- read.table("extrab.txt", header=T, sep="\t")
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>I'
So I tried adding fileEncoding:
> extra<- read.table("extrab.txt", header=T, sep="\t", fileEncoding="UCS-2LE")
This worked, but I ended up with a data frame containing a single variable: everything from ID to 1996 was treated as one column. Is there a way to solve this?
I'm adding a few more lines on this problem, because I found a different error when I tried to import the file through R
2 solutions
#1
2
As per this SO question, the error you're getting seems to be related to file encoding.
Option 1:
You likely just need to figure out the right file encoding to use.
Example:
extra<- read.table("extrab.txt", header=T, sep="\t", fileEncoding="latin1")
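One way to narrow down the encoding is to look at the first bytes of the file. The `<ff><fe>` in the error message is the byte order mark (BOM) of a UTF-16 little-endian file, which is presumably why fileEncoding="UCS-2LE" got past the error in the question. The sketch below is only an illustration: guess_bom_encoding is a hypothetical helper, not a base R function, and the temporary file stands in for extrab.txt.

```r
# Hypothetical helper (not part of base R): guess an encoding from the BOM.
guess_bom_encoding <- function(path) {
  bytes <- readBin(path, what = "raw", n = 4L)
  starts_with <- function(prefix) {
    length(bytes) >= length(prefix) &&
      all(bytes[seq_along(prefix)] == prefix)
  }
  if (starts_with(as.raw(c(0xEF, 0xBB, 0xBF)))) {
    "UTF-8-BOM"
  } else if (starts_with(as.raw(c(0xFF, 0xFE)))) {
    "UTF-16LE"   # note: a UTF-32LE BOM also starts with FF FE
  } else if (starts_with(as.raw(c(0xFE, 0xFF)))) {
    "UTF-16BE"
  } else {
    NA_character_  # no BOM: the encoding must be found some other way
  }
}

# Demo file (an assumption standing in for extrab.txt): BOM FF FE + "ID"
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xFF, 0xFE, 0x49, 0x00, 0x44, 0x00)), tmp)
enc <- guess_bom_encoding(tmp)
print(enc)
```

The returned string can then be tried as the fileEncoding argument of read.table. A file without a BOM returns NA, in which case this check says nothing about the encoding.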
Option 2:
You can try opening the file in Notepad or another text editor and then using "Save As" to save it in a common encoding such as ANSI, Unicode or UTF-8.
In Windows Notepad, note that the Save As dialog has an "Encoding" dropdown. ANSI should work fine.
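The same re-encoding can also be done from R instead of a text editor. This is a minimal sketch, assuming the source file is UTF-16LE (which the `<ff><fe>` bytes in the error message suggest). The sample content and the names src and dst are made up for the demonstration.

```r
# Create a small UTF-16LE sample file (stand-in for the real file).
src <- tempfile(fileext = ".txt")
con <- file(src, open = "w", encoding = "UTF-16LE")
writeLines(c("ID\t1990", "A\t36.88"), con)
close(con)

# Read it with the source encoding declared on the connection ...
con_in <- file(src, open = "r", encoding = "UTF-16LE")
txt <- readLines(con_in)
close(con_in)

# ... and write it back out as UTF-8.
dst <- tempfile(fileext = ".txt")
con_out <- file(dst, open = "w", encoding = "UTF-8")
writeLines(txt, con_out)
close(con_out)

# The converted copy can now be read without fileEncoding.
converted <- read.table(dst, header = TRUE, sep = "\t")
print(converted)
```

Writing to a new file rather than overwriting the original keeps the source intact in case the assumed encoding turns out to be wrong.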
#2
1
Now that you aren't getting the file encoding problem, it might just be that your separator is actually not a tab. Try:
extra<- read.table("extrab.txt", header=T, fileEncoding="UCS-2LE")
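This will split on any whitespace: with the default sep="", read.table treats one or more spaces, tabs, newlines or carriage returns as the separator.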
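The effect of the separator can be reproduced without the file at all. The sketch below assumes the columns are separated by single spaces, as they appear in the question (the real file may differ), and feeds the first rows to read.table through its text argument.

```r
# First rows from the question, assumed here to be space-separated.
lines <- c(
  "ID 1990 1991 1992 1993 1994 1995 1996",
  "A 36.88 45.48 52.46 111.31 138.45 121.09 122.62",
  "B 19.11 27.97 37.14 47.68 60.78 35.84 38.64"
)

# sep = "\t": no tabs are found, so each line becomes one column.
with_tab <- read.table(text = lines, header = TRUE, sep = "\t")

# Default sep = "": split on any run of whitespace.
with_ws <- read.table(text = lines, header = TRUE)

print(ncol(with_tab))  # 1
print(ncol(with_ws))   # 8
print(names(with_ws))  # year columns are prefixed with "X" by check.names
```

Passing check.names = FALSE keeps the year headers as "1990", "1991" and so on, at the cost of column names that need backticks when used with `$`.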