Reading information from a password-protected site

Time: 2022-11-11 22:49:09

I have been using readLines() to scrape information from a website, following an R tutorial. I now wish to extract data from my own website (specifically the awstats data); however, the domain is password protected.

Is there a way to pass a username and password along with the URL for the specific awstats data I require?

The format of the URL is:

http://domain.name:port/awstats.pl?month=02&year=2011&config=domain.name&lang=en&framename=mainright&output=alldomains

Thanks.

4 answers

#1


6  

If it is indeed HTTP basic access authentication, the documentation on connections provides some help:

URLs

Note that https:// connections are only supported if --internet2 or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid. With that option only, the http://user:pass@site notation for sites requiring authentication is also accepted.

So your URL string should look like this:

http://username:password@domain.name:port/awstats.pl?month=02&year=2011&config=domain.name&lang=en&framename=mainright&output=alldomains

This might be Windows-only though.

Hope this helps!

#2


5  

You can embed the username and password in the URL like this:

http://userid:passw@domain.name:port/...

You can try passing this URL straight to readLines(). If that doesn't work, a workaround is to open the connection explicitly with url():

zz <- url("http://userid:passw@domain.name:port/...")  # create a connection for the URL
lines <- readLines(zz)                                  # read the page line by line
close(zz)                                               # release the connection

You can also download the file and save it somewhere using download.file():

download.file("theurl","/path/to/file/filename",method="wget")

This saves the file on the local path that is specified.

EDIT :

As csgillespie said, you shouldn't include your username and password in the script. If you run the script with source() or interactively, you could add, e.g.:

user <- readline("Give the username : ")
passw <- readline("Give the password : ")

# paste0() joins the pieces with no separator; paste() would insert spaces into the URL
Url <- paste0("http://", user, ":", passw, "@domain.name...")
readLines(Url,...)
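
If the username or password contains characters such as @ or :, the assembled URL becomes ambiguous. A minimal sketch of one way around that, assuming percent-encoding with utils::URLencode() is acceptable to the server; build_auth_url is a hypothetical helper name, and the host string in the usage line is a placeholder:

```r
# Hypothetical helper (not part of any package): build a URL with embedded
# credentials, percent-encoding reserved characters in the user and password.
build_auth_url <- function(user, passw, host_and_path) {
  # reserved = TRUE encodes characters such as "@" and ":";
  # repeated = TRUE encodes even when the input already contains "%xx" sequences
  enc <- function(x) utils::URLencode(x, reserved = TRUE, repeated = TRUE)
  paste0("http://", enc(user), ":", enc(passw), "@", host_and_path)
}

# Usage with made-up credentials
build_auth_url("userid", "p@ss:word", "domain.name:port/...")
```

The encoded form keeps the single @ that separates the credentials from the host unambiguous.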

When running from the command line, you could pass the arguments after --args and access them using commandArgs() (see ?commandArgs).
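
The command-line variant might look like the sketch below. It assumes the script is started as something like `Rscript script.R <username> <password>`, in which case commandArgs(trailingOnly = TRUE) returns only the arguments that follow --args; parse_credentials is a hypothetical helper name.

```r
# Hypothetical helper: pick the username and password out of the trailing arguments.
parse_credentials <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) < 2) {
    stop("expected two arguments: <username> <password>")
  }
  list(user = args[1], passw = args[2])
}

# Simulated arguments so the example also runs interactively
creds <- parse_credentials(c("myusername", "mypassword"))
creds$user
```

Passing the argument vector explicitly, as in the last lines, makes the helper easy to try out without leaving the R session.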

#3


3  

If you have access to the box, you could always just read the awstats log files. If you can ssh into the box, then you could easily sync the latest file using rsync.

The slight snag with using

http://username:password@domain...

is that you are putting your password in an R script, which is best avoided. Of course you can secure the script, but it only takes one slip. For example,

  • Someone asks you a similar question and you publish your script
  • The URL http://username:password@domain... will(?) now show up in your server logs
  • ...
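
One way to keep the credentials out of the URL itself is to send them in an Authorization header instead. The following is only a sketch under stated assumptions: the site uses HTTP basic authentication, and R is version 3.6.0 or later, where download.file() accepts a headers argument. base64_encode and basic_auth_header are hypothetical helpers written here for illustration (base R has no built-in base64 encoder), and the environment variable names in the commented usage are made up.

```r
# Hypothetical helper: base64-encode an ASCII string using base R only.
base64_encode <- function(txt) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  bytes <- as.integer(charToRaw(txt))
  pad <- (3 - length(bytes) %% 3) %% 3            # zero bytes needed to fill the last group
  groups <- matrix(c(bytes, rep(0L, pad)), nrow = 3)
  n <- groups[1, ] * 65536 + groups[2, ] * 256 + groups[3, ]   # 24-bit value per group
  idx <- rbind(n %/% 262144, (n %/% 4096) %% 64, (n %/% 64) %% 64, n %% 64)
  chars <- alphabet[as.vector(idx) + 1]           # four 6-bit symbols per group
  if (pad > 0) chars[seq(length(chars) - pad + 1, length(chars))] <- "="
  paste(chars, collapse = "")
}

# Hypothetical helper: named character vector suitable for a `headers` argument.
basic_auth_header <- function(user, passw) {
  c(Authorization = paste("Basic", base64_encode(paste0(user, ":", passw))))
}

# Usage (not run; assumes R >= 3.6.0 and hypothetical environment variables):
# download.file("http://domain.name:port/awstats.pl?...", "awstats.html",
#               headers = basic_auth_header(Sys.getenv("AWSTATS_USER"),
#                                           Sys.getenv("AWSTATS_PASS")))
```

With this approach the script contains neither the password nor a URL that embeds it, although the credentials still have to be supplied from somewhere at run time.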

#4


2  

Formatting the url as http://username:password@domain... for use with download.file didn't work for me, but R.utils provides the function downloadFile that works perfectly:

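library(R.utils)  # provides downloadFile()
# myurl and myfile are the remote URL and the local destination file name
downloadFile(myurl, myfile, username = "myusername", password = "mypassword")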

See @joris-meys's answer for a way to avoid including your username and password in plain text in your script.

EDIT: Except it looks like downloadFile just reformats the URL to http://username:password@domain...? Hmm...
