read.table with "right-aligned" data

Time: 2021-10-11 12:37:03

I have a text file to read in R, but the file does not seem to be tab-delimited. The only structure in the file is that each column always ends at the same character position (i.e. columns are right-aligned).

So, first, is there a name for this type of data structure? And how can I read it in R?

    2.37      2.03                          2.38
   5,397     5,082                         5,609
    13.0      21.6          15.2            15.2
   128.0     103.1         134.2           133.4

Just using read.table() doesn't work: the columns that follow a missing value are shifted left, so the NA ends up in the wrong place.

# download data:
tmp <- tempfile()
download.file("http://usda.mannlib.cornell.edu/usda/waob/wasde//1990s/1995/wasde-01-12-1995.txt", tmp)
data_enc <- readLines(tmp, warn = FALSE)
# keep the second ":"-separated piece of lines 232 to 236 (the five data rows)
dat <- sapply(strsplit(data_enc[232:236], ":"), function(x) x[2])
# overwrite the temporary file with just those rows
writeLines(dat, tmp)

## try to read data:
read.table(tmp, fill = TRUE, sep = "", header = FALSE)

Gives:

      V1    V2    V3    V4
 1  2.37  2.03  2.38    NA
 2 5,397 5,082 5,609    NA
 3  13.0  21.6  15.2  15.2
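
In case the download URL is no longer reachable, the shift can be reproduced with a minimal sketch that builds two right-aligned rows in memory. The helper `pad()` and the column widths 8, 10, 14 and 16 are assumptions made for this illustration, not part of the original file.

```r
# Hypothetical helper: left-pad each field with spaces up to width w
pad <- function(x, w) paste0(strrep(" ", w - nchar(x)), x)

# Assumed column widths for this illustration
widths <- c(8, 10, 14, 16)

# Two right-aligned rows; the first one has an empty third column
lines <- c(
  paste(pad(c("2.37", "2.03", "", "2.38"), widths), collapse = ""),
  paste(pad(c("13.0", "21.6", "15.2", "15.2"), widths), collapse = "")
)

# Whitespace-delimited parsing collapses the empty field,
# so 2.38 lands in V3 and the NA is appended at the end of the row
res <- read.table(text = lines, fill = TRUE, sep = "", header = FALSE)
print(res)
```

With `sep = ""` any run of whitespace is a single delimiter, which is why the position of the gap is lost.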

1 solution

#1


2  

Maybe try using read.fwf to read a table of fixed width formatted data:

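# line 5 looks complete and has one decimal digit per field, so the
# character after each decimal point is the last character of its column
widths <- gregexpr("\\.\\d", readLines(tmp)[5])[[1]] + 1L
# differences between successive column end positions are the column widths
widths <- c(widths[1], diff(widths))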
read.fwf(tmp, widths = widths)
#         V1         V2    V3               V4
# 1     2.37       2.03    NA             2.38
# 2    5,397      5,082    NA            5,609
# 3     13.0       21.6  15.2             15.2
# 4    128.0      103.1 134.2            133.4
# 5    146.4      130.9 156.5            155.7
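
A variant of the same idea, as a rough sketch: take the column ends from the end of each run of non-blank characters on a complete line (so it does not depend on decimal points), then read everything as text and strip the thousands separators before converting to numbers. The sample rows, the helper `pad()` and the layout in `true_widths` are assumptions for this illustration; the real file would be read from `tmp` as above.

```r
# Hypothetical helper: left-pad each field with spaces up to width w
pad <- function(x, w) paste0(strrep(" ", w - nchar(x)), x)

# Assumed layout and sample rows (the first two rows lack the third column)
true_widths <- c(8, 10, 14, 16)
rows <- list(
  c("2.37",  "2.03",  "",      "2.38"),
  c("5,397", "5,082", "",      "5,609"),
  c("13.0",  "21.6",  "15.2",  "15.2"),
  c("128.0", "103.1", "134.2", "133.4")
)
lines <- vapply(rows, function(r) paste(pad(r, true_widths), collapse = ""),
                character(1))
tmp <- tempfile()
writeLines(lines, tmp)

# Column ends = last character of each non-blank run on a complete line
m <- gregexpr("\\S+", lines[4])[[1]]
ends <- as.integer(m) + attr(m, "match.length") - 1L
widths <- diff(c(0L, ends))

# Read every column as text, then drop "," and convert to numeric;
# blank fields become NA
dat <- read.fwf(tmp, widths = widths, colClasses = "character")
dat[] <- lapply(dat, function(x) as.numeric(gsub(",", "", x, fixed = TRUE)))
print(dat)
unlink(tmp)
```

Reading as character first avoids a column such as V1 being kept as text just because one value contains a comma.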
