1. read.table() returns a data frame
For example, suppose test.txt contains the following data:
name age
A 12
B 15
Ben 18
Peter 20
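For example, after dt <- read.table("test.txt", header = TRUE), the column named name can be retrieved with dt$name (which returns a vector) or dt["name"] (which returns a one-column data frame). header = TRUE is required for this file: every line, including the first, has two fields, so with the default settings the line "name age" would be read as data and the columns would be named V1 and V2.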
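A minimal sketch of reading the sample table and accessing its columns. The temporary file stands in for test.txt, and the variable names (path, name_df, raw) are chosen only for illustration.

```r
# Write the sample data to a temporary file (a stand-in for test.txt).
path <- tempfile(fileext = ".txt")
writeLines(c("name age", "A 12", "B 15", "Ben 18", "Peter 20"), path)

# header = TRUE makes read.table() use the first line as column names.
dt <- read.table(path, header = TRUE, stringsAsFactors = FALSE)

ages     <- dt$age         # a plain vector
name_df  <- dt["name"]     # a one-column data frame
name_vec <- dt[["name"]]   # the column itself, same as dt$name

# Without header = TRUE the first line is treated as data and the
# columns receive the default names V1 and V2.
raw <- read.table(path, stringsAsFactors = FALSE)
print(names(raw))
```

The difference between dt["name"] and dt[["name"]] matters when the result is passed to functions that expect a vector rather than a data frame.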
read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".", row.names, col.names, as.is = !stringsAsFactors, na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, stringsAsFactors = default.stringsAsFactors(), fileEncoding = "", encoding = "unknown", text)
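read.table has several variants. Like read.table, each of the following returns a data frame (which R stores internally as a list of columns):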
read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE, comment.char="", ...)
read.csv2(file, header = TRUE, sep = ";", quote="\"", dec=",", fill = TRUE, comment.char="", ...)
read.delim(file, header = TRUE, sep = "\t", quote="\"", dec=".", fill = TRUE, comment.char="", ...)
read.delim2(file, header = TRUE, sep = "\t", quote="\"", dec=",", fill = TRUE, comment.char="", ...)
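The first two are generally used for comma-separated data (read.csv2 covers the variant that uses a semicolon as the separator and a comma as the decimal point); the last two are for data delimited by another character, a tab by default (read.delim2 again uses a comma as the decimal point). (They do not use row numbers.)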
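The differences between the variants are easiest to see on the same small table written in three layouts. This is a sketch with made-up sample data, read through the text argument instead of a file.

```r
# The same table in three delimited layouts (hypothetical sample data).
csv_text   <- "name,height\nA,1.5\nB,1.8"      # comma separator, "." decimal
csv2_text  <- "name;height\nA;1,5\nB;1,8"      # ";" separator, "," decimal
delim_text <- "name\theight\nA\t1.5\nB\t1.8"   # tab separator, "." decimal

a  <- read.csv(text = csv_text)
b  <- read.csv2(text = csv2_text)
c3 <- read.delim(text = delim_text)

# Each call returns a data frame, and the height column is numeric in all three.
print(class(a))
print(a$height)
print(b$height)
print(c3$height)
```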
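Arguments
file
    the name of the file which the data are to be read from. Each row of the table appears as one line of the file. If it does not contain an absolute path, the file name is relative to the current working directory.
header
    a logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
sep
    the field separator character. Values on each line of the file are separated by this character. If sep = "" (the default for read.table) the separator is white space, that is one or more spaces, tabs, newlines or carriage returns.
quote
    the set of quoting characters. To disable quoting altogether, use quote = "".
dec
    the character used in the file for decimal points.
row.names
    a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or a character string giving the name of the table column containing the row names. If there is a header and the first row contains one fewer field than the number of columns, the first column in the input is used for the row names. Otherwise, if row.names is missing, the rows are numbered. Using row.names = NULL forces row numbering.
col.names
    a vector of optional names for the variables. The default is to use "V" followed by the column number.
as.is
    controls the conversion of character columns that are not otherwise specified by colClasses. With the signature shown above, the default behavior of read.table is to convert character variables to factors. Note: to suppress all conversions, including those of numeric columns, set colClasses = "character". Note that as.is is specified per column (not per variable) and so includes the column of row names (if any).
na.strings
    a character vector of strings which are to be interpreted as NA values.
colClasses
    character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA. Possible values are NA (the default, when the class is determined automatically), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Note that colClasses is specified per column (not per variable) and so includes the column of row names (if any).
nrows
    integer: the maximum number of rows to read in. Negative and other invalid values are ignored.
skip
    integer: the number of lines of the data file to skip before beginning to read data.
check.names
    logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.
fill
    logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added.
strip.white
    logical. Used only when sep has been specified, and allows the stripping of leading and trailing white space from unquoted character fields (numeric fields are always stripped).
blank.lines.skip
    logical: if TRUE blank lines in the input are ignored.
comment.char
    character: a character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
allowEscapes
    logical. Should C-style escapes such as \n be processed or read verbatim (the default)? Note that if not within quotes these could be interpreted as a delimiter (but not as a comment character). For more details see scan.
flush
    logical: if TRUE, scan will flush to the end of the line after reading the last of the fields requested. This allows putting comments after the last field.
stringsAsFactors
    logical: should character vectors be converted to factors? Note that this is overridden by as.is and colClasses, both of which allow finer control.
fileEncoding
    character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the 'Encoding' section of the help for file.
encoding
    encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding).
text
    character string: if file is not supplied and this is, then data are read from the value of text via a text connection.
...
    Further arguments to be passed to read.table.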
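Several of these arguments are usually combined in one call. The sketch below uses an invented semicolon-separated export (the content of lines is hypothetical) to show sep, dec, na.strings, colClasses and nrows working together, with the default comment.char = "#" skipping the first line.

```r
# Hypothetical input: a comment line, a header, then three records.
lines <- c(
  "# exported by a hypothetical tool",
  "id;score;grade",
  "001;3,5;A",
  "002;-;B",
  "003;4,0;C"
)

dat <- read.table(
  text = lines,
  header = TRUE,
  sep = ";",                 # fields are separated by semicolons
  dec = ",",                 # decimal comma: "3,5" is read as 3.5
  na.strings = "-",          # "-" marks a missing value
  colClasses = c("character", "numeric", "character"),
  nrows = 2                  # read only the first two data rows
)

print(dat)
```

Declaring the id column as "character" keeps the leading zeros, which would be lost if the column were converted to a number.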
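The function scan() is more flexible than read.table(). One difference is that scan() lets you specify the type of each variable through its what argument.
scan() can be used to create different kinds of objects: vectors, matrices, data frames, lists and so on. By default (that is, when what is omitted), scan() creates a numeric vector. If the data being read do not match the default or the specified type, an error is raised.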
scan(file = "", what = double(), nmax = -1, n = -1, sep = "", quote = if(identical(sep, "\n")) "" else "'\"", dec = ".", skip = 0, nlines = 0, na.strings = "NA", flush = FALSE, fill = FALSE, strip.white = FALSE, quiet = FALSE, blank.lines.skip = TRUE, multi.line = TRUE, comment.char = "", allowEscapes = FALSE, fileEncoding = "", encoding = "unknown", text)
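A short sketch of the default behavior of scan(), using the text argument so that no file is needed. The input strings are made up for illustration.

```r
# Default: what = double(), so the result is a numeric vector.
x <- scan(text = "1 2 3 4 5 6", quiet = TRUE)

# The vector can then be reshaped, here into a 2 x 3 matrix filled by row.
m <- matrix(x, nrow = 2, byrow = TRUE)

# Reading character data requires saying so through what.
w <- scan(text = "a b c", what = character(), quiet = TRUE)

# Non-numeric input under the default type raises an error,
# which is caught here so the script can continue.
res <- tryCatch(
  scan(text = "1 two 3", quiet = TRUE),
  error = function(e) "error"
)

print(m)
print(w)
print(res)
```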
mydata<-scan("scan.txt",what=list("",0,0));mydata
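# Reads three variables from the file scan.txt: the first is character, the other two are numeric
Here the what argument is a list that determines the modes of the three vectors to be read. If the list is named, its names become the names of the resulting components, for example: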
mydata<-scan("data.txt",what=list(Sex="",Weight=0,Height=0))
> mydata
$Sex
[1] "M" "M" "F" "F"
$Weight
[1] 65 70 50 58
$Height
[1] 168 172 156 163
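The named-list form can be tried without a file. In this sketch the vector txt holds the same values as the output above (it is a stand-in for data.txt), and the result is converted to a data frame as one possible next step.

```r
# Stand-in for the contents of data.txt.
txt <- c("M 65 168", "M 70 172", "F 50 156", "F 58 163")

# One list component per column; the names carry over to the result.
mydata <- scan(text = txt,
               what = list(Sex = "", Weight = 0, Height = 0),
               quiet = TRUE)

# scan() returns a plain list; convert it when a data frame is wanted.
df <- as.data.frame(mydata, stringsAsFactors = FALSE)

print(names(mydata))
print(df)
```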
11. Reading data from an Excel file
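library(RODBC)
# odbcConnectExcel() is only available on Windows
z <- odbcConnectExcel("rexceltest.xls")
# read the worksheet "Sheet1" into a data frame
dd <- sqlFetch(z, "Sheet1")
# close the connection
close(z)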
12. Reading data from a database
> library(RODBC)
> ch <- odbcConnect("StocksDSN", uid = "myuse", pwd = "mypassword")
> stocks <- sqlQuery(ch, "select * from quotes")
> odbcClose(ch)