How to download a gzip-compressed file in R without saving it to the computer

Date: 2021-01-12 14:57:57

I am a new user of R.

I have some .txt.gz files on the web, each approximately 9 x 500,000 in size.
I'm trying to decompress a file and read it straight into R with read.table().
I have used this code (URL censored):

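LoadData <- function() {
  # gzcon() wraps the url() connection so the gzip stream is decompressed as it is read
  con <- gzcon(url("http://"))   # URL censored
  # Read the first 25000 lines into memory and wrap them in a text connection
  raw <- textConnection(readLines(con, n = 25000))
  close(con)
  # Skip the first 2 lines and treat the string "99.9" as a missing value
  dat <- read.table(raw, skip = 2, na.strings = "99.9")
  close(raw)
  return(dat)
}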

The problem is that the more lines I read with readLines(), the
longer the program takes to run.

How can I do this in a reasonable amount of time?
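
Independently of where the file comes from, the "Memory usage" notes in ?read.table say that declaring colClasses, passing nrows (a mild over-estimate is fine), and setting comment.char = "" let read.table() do less work on large files. A minimal sketch, assuming (the question does not say so) that every column is numeric; read_numeric_table is a hypothetical helper name:

```r
# Hypothetical helper. Assumes every column is numeric; n_cols = 9 is a guess
# based on the "9 x 500,000" size in the question.
read_numeric_table <- function(path, n_cols = 9, nrows = -1) {
  read.table(path,
             skip = 2, na.strings = "99.9",        # settings from the question
             colClasses = rep("numeric", n_cols),  # no per-column type guessing
             nrows = nrows,                        # maximum rows to read; -1 reads all
             comment.char = "")                    # turn off comment processing
}
```

Because nrows is a maximum, an under-estimate silently drops the remaining rows, so only pass a value known to be at least the real row count.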

2 answers

#1

Don't do this.

Each time you want to access the file, you'll have to re-download it, which is both time-consuming for you and costly for the file host.

It is better practice to download the file once (see download.file) and then read the local copy in a separate step.
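
A minimal sketch of that download-once pattern, assuming the remote file does not change between sessions; fetch_once and the example file names are hypothetical:

```r
# Hypothetical helper: download url to dest only if no local copy exists yet,
# so later sessions read the cached file instead of hitting the server again.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed bytes intact on Windows
    download.file(url, dest, mode = "wb", quiet = TRUE)
  }
  dest
}

# Usage (placeholder URL; the real one was censored in the question):
# f <- fetch_once("http://example.com/data.txt.gz", "data.txt.gz")
# dat <- read.table(f, skip = 2, na.strings = "99.9")
```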

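Note that untar(..., compressed = "gzip") is meant for tar archives (.tar.gz). A plain .txt.gz file can be read directly: read.table() opens a file path with file(), which handles gzip compression transparently (gzfile() does the same explicitly).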
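
A quick check of that, using a small gzip file written locally as a stand-in for the downloaded data (the file contents are made up):

```r
# Stand-in for a downloaded file: write a small gzip-compressed text file
path <- tempfile(fileext = ".txt.gz")
con <- gzfile(path, "w")
writeLines(c("header line 1", "header line 2",
             "1.0 2.0 99.9",
             "4.0 5.0 6.0"), con)
close(con)

# read.table() opens the path with file(), which detects gzip compression
# itself, so no separate decompression step is needed
dat <- read.table(path, skip = 2, na.strings = "99.9")
print(dat)
unlink(path)  # clean up the stand-in file
```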

#2

You can make a temporary file like this:

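tmpfile <- tempfile(tmpdir = getwd(), fileext = ".gz")  # unique file name in the working directory
download.file(url, tmpfile, mode = "wb")  # url holds the file's address; "wb" keeps binary data intact on Windows
# do your stuff, e.g. dat <- read.table(tmpfile, skip = 2, na.strings = "99.9")
file.remove(tmpfile)  # delete the tmpfile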
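
The same steps can be wrapped in a function so the temporary file is removed even if reading fails. A sketch; read_gz_url is a hypothetical name, and any extra arguments are passed through to read.table():

```r
# Hypothetical wrapper: download to a temporary file, read it, always clean up
read_gz_url <- function(url, ...) {
  tmpfile <- tempfile(fileext = ".gz")     # lives in tempdir(), not getwd()
  on.exit(unlink(tmpfile), add = TRUE)     # runs on normal exit and on error
  download.file(url, tmpfile, mode = "wb", quiet = TRUE)
  read.table(tmpfile, ...)                 # read.table() decompresses .gz itself
}

# Usage (url is the censored address from the question):
# dat <- read_gz_url(url, skip = 2, na.strings = "99.9")
```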
