How to submit a form with mechanize to get the URL of the next web page?

Time: 2022-11-23 17:42:34

I am trying to scrape some data and decided to use mechanize together with BeautifulSoup. I have to enter the term I want to search for in a form on this webpage, then click the search button to get to the next page, whose URL I need in order to scrape data from it.

The browser's developer tools show the following markup for the form:

<form name="topsearch" id="topsearch" method="get" onsubmit="javascript:return search_post();" action="">
    <input type="hidden" name="search_data" id="search_data" value="">
    <input type="hidden" name="cid" id="cid" value="">
    <input type="hidden" name="mbsearch_str" id="mbsearch_str">
    <input type="hidden" name="topsearch_type" id="topsearch_type" value="1">
    <input name="search_str" id="search_str" autocomplete="off" onkeyup="getAutosuggesion();" type="text" value="Search Quotes, News, NAVs..." onblur="if(this.value=='')this.value='Search Quotes, News, NAVs...';" onfocus="if(this.value=='Search Quotes, News, NAVs...')this.value='';if(this.value=='Search Quotes, News, NAVs...')this.value='';" class="txtsrchbox">
    <div id="autosugg_mc" class="sugbx"></div>
    <div class="PR srch_qote">
        <div class="srchdrp" id="srchR">Quotes</div>
        <div id="srch" class="qubx">
            <ul class="qlist">
                <li><a onclick="tab_topser('1');getAutosuggesion();" id="tab1" href="javascript:void(0)" class="">Quotes</a></li>
                <li><a onclick="tab_topser('2');getAutosuggesion();" id="tab2" href="javascript:void(0)" class="">NAVs</a></li>
                <li><a onclick="tab_topser('5');" id="tab5" href="javascript:void(0)" class="">Commodities</a></li>
                <li><a onclick="tab_topser('9');" id="tab9" href="javascript:void(0)" class="active">Futures</a></li>
                <li><a onclick="tab_topser('3');getAutosuggesion();" id="tab3" href="javascript:void(0)" class="">News</a></li>
                <li><a onclick="tab_topser('4');" id="tab4" href="javascript:void(0)" class="">Messages</a></li>
                <li><a onclick="tab_topser('6');getAutosuggesion();" id="tab6" href="javascript:void(0)" class="">Notices</a></li>
                <li><a onclick="tab_topser('7');" id="tab7" href="javascript:void(0)" class="">Videos</a></li>
                <li><a class="" onclick="tab_topser('8');" id="tab8" href="javascript:void(0)">All</a></li>
            </ul>
        </div>
    </div>
    <a href="javascript:;" onclick="$('#topsearch').submit()" style="float:left;" class="btn_search"></a>
    <div class="CL"></div>
</form>
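To see what a client that does not run JavaScript (such as mechanize) actually gets from this markup, here is a minimal sketch that parses an abbreviated copy of the form above with Python's standard-library `html.parser`. It shows that `action` is empty and that the real submission logic lives in `onsubmit="javascript:return search_post();"`, whose body is not shown in the question. The class name `FormInspector` is hypothetical, and the markup is trimmed to the parts that matter for submission.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Hypothetical helper: record each <form>'s attributes and its named <input>s."""

    def __init__(self):
        super().__init__()
        self.forms = []        # [{"attrs": {...}, "inputs": [(name, value), ...]}, ...]
        self._in_form = False  # True while between <form> and </form>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"attrs": attrs, "inputs": []})
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            # An input without a value attribute is submitted as an empty string.
            self.forms[-1]["inputs"].append((attrs["name"], attrs.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# Abbreviated copy of the "topsearch" form from the question.
FORM_HTML = """
<form name="topsearch" id="topsearch" method="get"
      onsubmit="javascript:return search_post();" action="">
  <input type="hidden" name="search_data" id="search_data" value="">
  <input type="hidden" name="cid" id="cid" value="">
  <input type="hidden" name="mbsearch_str" id="mbsearch_str">
  <input type="hidden" name="topsearch_type" id="topsearch_type" value="1">
  <input name="search_str" id="search_str" type="text" value="Search Quotes, News, NAVs...">
  <a href="javascript:;" onclick="$('#topsearch').submit()" class="btn_search"></a>
</form>
"""

parser = FormInspector()
parser.feed(FORM_HTML)
form = parser.forms[0]

print(repr(form["attrs"]["action"]))   # ''
print(form["attrs"]["onsubmit"])       # javascript:return search_post();
print([name for name, _ in form["inputs"]])
# ['search_data', 'cid', 'mbsearch_str', 'topsearch_type', 'search_str']
```

An empty `action` resolves to the page's own URL, so without `search_post()` running, a submission can only append these field values as a query string to the current page.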

I fill in the form with my search term using:

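import pandas as pd  # for tabulating the scraped data later
from bs4 import BeautifulSoup  # BeautifulSoup 4; the old "BeautifulSoup" module is Python 2 only
import mechanize

baseURL = "someBaseURL"
br = mechanize.Browser()
# Follow <meta http-equiv="refresh"> redirects, ignoring any with a delay over 1 second
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Open the website
r = br.open(baseURL)

# Select the first form on the page (assumed to be the "topsearch" form)
br.select_form(nr=0)
print(br.geturl())

# Fill in the visible text field and submit the form
br.form['search_str'] = "Some Search"
br.submit()

print(br.geturl())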

After submitting the form, the URL does NOT change to the one I reach when I search for the same string on the website manually.

After submitting, the URL I get is:

'baseURL?search_data=&cid=&mbsearch_str=&topsearch_type=1&search_str=Kiri+Industries'

whereas if I submit manually, I land on the next page with the URL:

'baseURL/stockpricequote/dyes-pigments/kiriindustries/KDC01'

This is the URL I need in order to scrape the data.
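The URL mechanize ends up on can be reproduced by hand: it is simply the form's fields URL-encoded onto the current page, because `action=""` points back at the same page and mechanize does not execute `search_post()`. A minimal standard-library sketch, using `https://example.com/` as a stand-in for `baseURL` (an assumption, since the real site is not named):

```python
from urllib.parse import urlencode, urljoin

# Field values as mechanize submits them: the hidden inputs keep their
# default values, because nothing that search_post() would do in a real
# browser happens here.
fields = [
    ("search_data", ""),
    ("cid", ""),
    ("mbsearch_str", ""),
    ("topsearch_type", "1"),
    ("search_str", "Kiri Industries"),
]

query = urlencode(fields)  # spaces become '+', as in a browser GET submission

# action="" means "submit to the current page", so only the query string changes.
submitted_url = urljoin("https://example.com/", "?" + query)
print(submitted_url)
# https://example.com/?search_data=&cid=&mbsearch_str=&topsearch_type=1&search_str=Kiri+Industries
```

This matches the query string in the URL mechanize returned, which suggests the server never saw whatever request `search_post()` makes in a real browser.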

Does the submit button rely on JavaScript that mechanize cannot execute? If that is the issue, how can I make it work?

Any help is appreciated, thank you.

1 answer

#1

It seems, at least judging from my own similar problem, that mechanize does not handle JavaScript at all. Try Selenium instead: it drives a real browser, so it handles JavaScript well. I am building my script on it and will update this answer if it solves my issue.
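A minimal Selenium sketch of this approach, assuming Selenium 4 with Firefox available (via Selenium Manager or a geckodriver on PATH). It types into the `search_str` box, presses Enter so the browser fires the form's `onsubmit` handler, and waits until the URL contains `/stockpricequote/`, the path segment seen in the manually obtained URL. The function names `get_quote_url` and `looks_like_quote_page` are hypothetical, and the wait condition is a guess based on that single example URL.

```python
from urllib.parse import urlparse


def looks_like_quote_page(url: str) -> bool:
    """Hypothetical check: True once the browser has reached a quote page
    such as .../stockpricequote/<sector>/<name>/<code>."""
    return "/stockpricequote/" in urlparse(url).path


def get_quote_url(base_url: str, query: str, timeout: float = 10.0) -> str:
    """Search for `query` on the site and return the URL the page's
    JavaScript navigates to (hypothetical helper)."""
    # Imported lazily so looks_like_quote_page() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox is installed
    try:
        driver.get(base_url)
        box = driver.find_element(By.ID, "search_str")
        box.clear()  # remove the "Search Quotes, News, NAVs..." placeholder value
        box.send_keys(query)
        # Enter triggers implicit form submission, which runs the form's
        # onsubmit handler (search_post()). Clicking the "a.btn_search" link
        # would be an alternative.
        box.send_keys(Keys.RETURN)
        # Wait until the JavaScript-driven navigation reaches a quote page.
        WebDriverWait(driver, timeout).until(
            lambda d: looks_like_quote_page(d.current_url)
        )
        return driver.current_url
    finally:
        driver.quit()
```

If you need the page content rather than only its URL, `driver.page_source` holds the rendered HTML of the current page and can be handed to BeautifulSoup before calling `driver.quit()`.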
