I would like to read online data into R using download.file(), as shown below.
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")
Someone suggested that I add the line setInternet2(TRUE), but it still doesn't work.
The error I get is:
Warning messages:
1: running command 'curl "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv" -o "./data/data.csv"' had status 127
2: In download.file(URL, destfile = "./data/data.csv", method = "curl", :
download had nonzero exit status
Appreciate your help.
9 Answers
#1
34
It might be easiest to try the RCurl package. Install the package and try the following:
# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
# RT SERIALNO DIVISION PUMA REGION ST
# 1 H 186 8 700 4 16
# 2 H 306 8 700 4 16
# 3 H 395 8 100 4 16
# 4 H 506 8 700 4 16
# 5 H 835 8 800 4 16
# 6 H 989 8 700 4 16
dim(out)
# [1] 6496 188
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",destfile="reviews.csv",method="libcurl")
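The parsing step above can be tried without any network access. A minimal sketch, assuming x holds CSV text like what getURL() returns (the string below is made up for illustration and is not the real data set):

```r
# A made-up CSV string standing in for the text returned by getURL(URL).
x <- "RT,SERIALNO,ST\nH,186,16\nH,306,16\n"

# read.csv() can parse a character string directly through its `text`
# argument, so there is no need to build a textConnection by hand.
out <- read.csv(text = x)

dim(out)
str(out)
```

For this purpose read.csv(text = x) should behave the same as read.csv(textConnection(x)); it just leaves opening and closing the connection to R.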
#2
10
Here's an update as of Nov 2014. I find that setting method='curl' did the trick for me (while method='auto' does not).
For example:
# does not work
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
destfile='localfile.zip')
# does not work. this appears to be the default anyway
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
destfile='localfile.zip', method='auto')
# works!
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
destfile='localfile.zip', method='curl')
#3
4
I've succeeded with the following code:
url = "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x = read.csv(file=url)
Note that I've changed the protocol from https to http, since https doesn't seem to be supported in R.
#4
3
If you get an SSL error from getURL() when using RCurl, set these options before calling getURL(). This sets the CurlSSL settings globally.
The extended code:
install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
Worked for me on Windows 7 64-bit using R 3.1.0!
#5
2
I had exactly the same problem as UseR (original question); I'm also using Windows 7. I tried all the proposed solutions and they didn't work.
I resolved the problem as follows:
- Using RStudio instead of the R console.
- Updating R (from 3.1.0 to 3.1.1) so that the RCurl library runs properly on it. (I'm now using R 3.1.1 32-bit, although my system is 64-bit.)
- Typing the URL address as https (secure connection) and with "/" instead of backslashes "\".
- Setting method = "auto".
It works for me now. You should see the message:
Content type 'text/csv; charset=utf-8' length 9294 bytes opened URL downloaded 9294 by
#6
1
Exit status 127 means "command not found".
In your case, the curl command was not found, which means curl is either not installed or not on your PATH.
You need to install or reinstall curl. That's all. Get the latest version for your OS from http://curl.haxx.se/download.html
Close RStudio before installation.
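Before reinstalling, it may be worth checking from inside R whether a curl executable is visible at all. A rough sketch using only base R; pick_method() is a hypothetical helper written for this illustration, not part of any package:

```r
# Sys.which() returns the full path of an executable, or "" if it is
# not found on the PATH that R sees.
curl_path <- Sys.which("curl")

# Hypothetical helper: choose a download.file() method from what is available.
pick_method <- function() {
  if (nzchar(Sys.which("curl"))) {
    "curl"      # external curl binary is on the PATH
  } else if (isTRUE(unname(capabilities("libcurl")))) {
    "libcurl"   # R itself was built with libcurl support
  } else {
    "auto"      # let R decide
  }
}

method <- pick_method()
message("curl found at: ", if (nzchar(curl_path)) curl_path else "<not found>")
message("using method: ", method)
```

If curl_path comes back empty, method = "curl" will most likely fail with the same status 127 until curl is installed or added to the PATH.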
#7
1
I'm offering the curl package as an alternative that I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found that it halved the transfer times and was much more reliable than download.file.
#install.packages("curl")
library(curl)
#install.packages("RCurl")
library(RCurl)
ptm <- proc.time()
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
proc.time() - ptm
ptm
ptm1 <- proc.time()
curl_download(url =URL ,destfile="TEST.CSV",quiet=FALSE, mode="wb")
proc.time() - ptm1
ptm1
ptm2 <- proc.time()
y = download.file(URL, destfile = "./data/data.csv", method="curl")
proc.time() - ptm2
ptm2
In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website decreased my transfer times from 2000 seconds per file to 1000 seconds and improved the reliability from 50% to only 2 failures in 120 files. The script is posted in my answer to a question I asked earlier.
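When downloading many files, the occasional failure can also be handled by retrying. The following is only a sketch and not the script mentioned above: download_with_retry(), its arguments, and the flaky_fetch() stand-in are all hypothetical names made up for this example. It treats both errors and warnings from the download function as a failed attempt, since download.file() tends to report a nonzero exit status as a warning.

```r
# Hypothetical helper: call fetch(url, destfile) until it succeeds or
# max_tries attempts have been used. fetch defaults to download.file(),
# but another function with the same first two arguments can be passed.
download_with_retry <- function(url, destfile, fetch = utils::download.file,
                                max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    ok <- tryCatch({
      fetch(url, destfile)
      TRUE
    }, warning = function(w) {
      message("attempt ", attempt, " warned: ", conditionMessage(w))
      FALSE
    }, error = function(e) {
      message("attempt ", attempt, " failed: ", conditionMessage(e))
      FALSE
    })
    if (ok) return(invisible(attempt))
    Sys.sleep(wait)  # pause before the next attempt
  }
  stop("download failed after ", max_tries, " attempts: ", url)
}

# Offline demonstration: a stand-in fetch function that fails twice, then works.
calls <- 0
flaky_fetch <- function(url, destfile) {
  calls <<- calls + 1
  if (calls < 3) stop("simulated network error")
  writeLines("ok", destfile)
}

tmp <- tempfile(fileext = ".csv")
attempts <- download_with_retry("http://example.invalid/file.csv", tmp,
                                fetch = flaky_fetch)
```

In real use one would drop flaky_fetch and pass something like fetch = curl::curl_download, assuming the curl package is installed.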
#8
0
You can set a global option and try:
options('download.file.method'='curl')
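# Leave out the method argument so that the option set above is used;
# an explicit method = "auto" would take precedence over the option.
download.file(URL, destfile = "./data/data.csv")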
For details on the issue, see https://stat.ethz.ch/pipermail/bioconductor/2011-February/037723.html
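If you would rather not change the option for the whole session, it can be set temporarily and restored afterwards. A small sketch; with_download_method() is a hypothetical wrapper written for this example. As far as I know, download.file() only consults the download.file.method option when no method argument is passed, so the call inside the wrapper should leave method out.

```r
# Hypothetical wrapper: set the download.file.method option only while
# `code` is evaluated, then restore whatever was set before.
with_download_method <- function(method, code) {
  old <- options(download.file.method = method)
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, i.e. after the option is set
}

before <- getOption("download.file.method")
inside <- with_download_method("curl", getOption("download.file.method"))
after  <- getOption("download.file.method")

print(inside)
```

A real call would look like with_download_method("curl", download.file(URL, destfile = "./data/data.csv")), with URL defined as in the question.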
#9
0
Try the following for large files:
library(data.table)
URL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- fread(URL)