I am trying to use Python to download the results from the following website:
http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY
I tried mechanize first, then realized that the Download File link is produced by JavaScript, which mechanize does not support. My code so far opens the web page, as shown below. I am stuck on how to reach the Download File link on that page so that I can save the data to my machine.
import urllib2  # Python 2

def downloadFile():
    url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
    t = urllib2.urlopen(url)
    s = t.read()  # raw HTML of the page; its JavaScript is not executed
    print(s)

downloadFile()
The printed result is:
<html>
<head></head>
<body>
<form name="apiForm" method="POST">
<input type="hidden" name="rowids">
<input type="hidden" name="annot">
<script type="text/javascript">
document.apiForm.rowids.value="4791928,3403495,...."; //There are really about 500 values
document.apiForm.annot.value="48";
document.apiForm.action = "chartReport.jsp";
document.apiForm.submit();
</script>
</form>
</body>
</html>
Does anybody know how I can get to the Download File page and save that file to my computer?
1 Solution
#1
2
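After some more research on that link, I came up with this. You can definitely use mechanize to do it: the page's JavaScript only fills in two hidden form fields and the form action, so you can read those values out of the page source, set them on the form yourself, and submit it.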
import mechanize

def getJSVariableValue(content, variable):
    # Return the first double-quoted string that follows `variable` in the page source.
    value_start_index = content.find(variable)
    value_start_index = content.find('"', value_start_index) + 1
    value_end_index = content.find('"', value_start_index)
    return content[value_start_index:value_end_index]

def getChartReport(url):
    br = mechanize.Browser()
    resp = br.open(url)
    content = resp.read()

    # mechanize does not run JavaScript, so fill in the form by hand
    # with the values the page's script would have assigned.
    br.select_form(name='apiForm')
    br.form.set_all_readonly(False)  # allow writing to the hidden fields
    br.form['rowids'] = getJSVariableValue(content, 'document.apiForm.rowids.value')
    br.form['annot'] = getJSVariableValue(content, 'document.apiForm.annot.value')
    br.form.action = 'http://david.abcc.ncifcrf.gov/' + getJSVariableValue(content, 'document.apiForm.action')
    print(br.form['rowids'])
    print(br.form['annot'])
    br.submit()

    # Follow the "Download File" link on the resulting page and save the response.
    resp = br.follow_link(text_regex=r'Download File')
    content = resp.read()
    with open('output.txt', 'w') as f:
        f.write(content)
    return content

url = 'http://david.abcc.ncifcrf.gov/api.jsp?type=GENBANK_ACCESSION&ids=CP000010,CP000125,CP000124,CP000124,CP000124,CP000124&tool=chartReport&annot=KEGG_PATHWAY'
chart_output = getChartReport(url)
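
For anyone who wants to see the extraction step on its own, here is a rough sketch that pulls all of the document.apiForm assignments out of the page source with one regular expression and builds the POST target and body that the script would submit. It is not part of the code above: it assumes Python 3 and only the standard library, the names ASSIGNMENT, extract_form_values and build_post are made up for illustration, and it assumes the page keeps the same assignment format as the output shown in the question.

```python
import re
from urllib.parse import urlencode, urljoin

# Matches assignments such as
#   document.apiForm.rowids.value="1,2,3";
#   document.apiForm.action = "chartReport.jsp";
# (pattern assumed from the page source shown in the question)
ASSIGNMENT = re.compile(r'document\.apiForm\.(\w+)(?:\.value)?\s*=\s*"([^"]*)"')


def extract_form_values(html):
    """Return {name: value} for every document.apiForm assignment in html."""
    return dict(ASSIGNMENT.findall(html))


def build_post(page_url, html):
    """Return (target_url, encoded_body) for the form the script would submit."""
    values = extract_form_values(html)
    # "action" is the form target rather than a field, so take it out
    # and resolve it against the URL of the page it came from.
    target = urljoin(page_url, values.pop("action"))
    return target, urlencode(values).encode("ascii")


if __name__ == "__main__":
    sample = (
        'document.apiForm.rowids.value="4791928,3403495";\n'
        'document.apiForm.annot.value="48";\n'
        'document.apiForm.action = "chartReport.jsp";\n'
        'document.apiForm.submit();\n'
    )
    target, body = build_post("http://david.abcc.ncifcrf.gov/api.jsp?type=X", sample)
    print(target)
    print(body)
```

Passing the pair to urllib.request.urlopen(target, data=body) would send it as a POST. Finding and following the Download File link on the resulting page is still needed afterwards, as in the mechanize version.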