I'm trying to read a .csv file from the IRS into R. It doesn't appear to be formatted in any unusual way, and I'm using read.table just as I have a million times before. I can't find an existing solution to this problem that helps. Here's the code I'm using:
data_0910<-read.table("/Users/blahblahblah/countyinflow0910.csv",header=T,stringsAsFactors=FALSE,colClasses="character")
Error in read.table("/Users/blahblahblah/countyinflow0910.csv", :
more columns than column names
Why is it doing this? If it helps, the .csv files can be found at:
http://www.irs.gov/uac/SOI-Tax-Stats-County-to-County-Migration-Data-Files
(The ones I need are under the county to county migration .csv section - either inflow or outflow)
2 Solutions
#1
16
It uses commas as separators, so you can either set sep="," or just use read.csv:
x <- read.csv(file="http://www.irs.gov/file_source/pub/irs-soi/countyinflow1011.csv")
dim(x)
## [1] 113593 9
The error is caused by spaces in some of the values, and by unmatched quotes. read.table splits on whitespace by default; there are no spaces in the header, so it thinks that there is one column. Then it thinks it sees multiple columns in some of the rows. For example, the first two lines (header and first row):
State_Code_Dest,County_Code_Dest,State_Code_Origin,County_Code_Origin,State_Abbrv,County_Name,Return_Num,Exmpt_Num,Aggr_AGI
00,000,96,000,US,Total Mig - US & For,6973489,12948316,303495582
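To see the mismatch directly, here is a minimal sketch that counts the fields R finds on those two lines under each separator. It uses count.fields from base R; the variable names are just for illustration, and the two lines are pasted in as strings rather than read from the IRS file.

```r
# First two lines of the file, copied from the answer above.
sample_lines <- c(
  "State_Code_Dest,County_Code_Dest,State_Code_Origin,County_Code_Origin,State_Abbrv,County_Name,Return_Num,Exmpt_Num,Aggr_AGI",
  "00,000,96,000,US,Total Mig - US & For,6973489,12948316,303495582"
)

# Fields per line when splitting on whitespace (read.table's default, sep = "").
con <- textConnection(sample_lines)
ws_fields <- count.fields(con, sep = "")
close(con)

# Fields per line when splitting on commas.
con <- textConnection(sample_lines)
comma_fields <- count.fields(con, sep = ",")
close(con)

print(ws_fields)
## [1] 1 6
print(comma_fields)
## [1] 9 9
```

With the whitespace split, the header is one field and the first row is six, which is the "more columns than column names" situation. With the comma split, both lines have nine fields.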
Unmatched quotes, for example the apostrophe on line 1336 (row 1335), will also confuse read.table with its default quote argument, which treats both " and ' as quote characters (but not read.csv, whose default is " only):
01,089,24,033,MD,Prince George's County,13,30,1040
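If you would rather stay with read.table, a sketch of the equivalent call is to set both sep and quote by hand so that the apostrophe is treated as an ordinary character. The example below feeds the header and the line above in through the text argument instead of the real file, and keeps colClasses = "character" from the question so the leading zeros in the codes survive.

```r
# Header plus the row containing the apostrophe, as one CSV string.
csv_text <- paste(
  "State_Code_Dest,County_Code_Dest,State_Code_Origin,County_Code_Origin,State_Abbrv,County_Name,Return_Num,Exmpt_Num,Aggr_AGI",
  "01,089,24,033,MD,Prince George's County,13,30,1040",
  sep = "\n"
)

# read.csv: comma separator, only the double quote acts as a quote character.
via_csv <- read.csv(text = csv_text, colClasses = "character")

# read.table with sep and quote set explicitly to match read.csv.
via_table <- read.table(text = csv_text, header = TRUE, sep = ",",
                        quote = "\"", colClasses = "character")

print(via_csv$County_Name)
print(via_table$County_Name)
print(via_table$State_Code_Dest)  # stays "01" because the column is character
```

For the real file, the same arguments would go on the original call, with the file path in place of text.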
#2
3
For the Germans:
You have to change the decimal commas in your csv file into full stops (in Excel: File -> Options -> Advanced -> "Decimal separator"); then the error is solved.
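A possible alternative that leaves the file untouched, assuming it was exported with semicolons as field separators and commas as decimal marks (as German-locale Excel typically does), is read.csv2 from base R, which defaults to sep = ";" and dec = ",". The column names and values below are made up for illustration.

```r
# Made-up semicolon-separated data with decimal commas.
csv2_text <- "region;share\nBayern;12,5\nHessen;7,25"

# read.csv2 uses sep = ";" and dec = "," by default.
german <- read.csv2(text = csv2_text)

print(german$share)
print(is.numeric(german$share))
```

If the file uses a different separator, read.table with explicit sep and dec arguments covers the same ground.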