I want to extract the string inside the value attribute of each input in the td class="regu" cells of the HTML below, for example value="THE TEXT IWANNA EXTRACT ;0". I have managed to extract the names of the people, but I can't find a way to get the string inside the value attribute. I've been stuck on this for about 24 hours. I'm open to using other libraries as long as I can extract it. Any help is much appreciated. Thank you.
<table class="dbtable" border="0" width="100%">
<tbody><tr>
<td class="tableheader" align="center" width="1%"><b>#</b></td>
<td class="tableheader" align="center" width="60%"><b>User Name</b></td>
<td class="tableheader" align="center"><b>User Type</b></td>
</tr><tr bgcolor="#ffffff">
<td class="regu"><input name="chkStud" value="THE TEXT IWANNA EXTRACT ;0" type="checkbox"></td>
<td class="regu">NAME OF STUDENT HERE </td>
<td class="regu"> Student</td>
</tr><tr bgcolor="#ffffff">
<td class="regu"><input name="chkStud" value="PLEASE EXTRACT ME HERE, IM DYING TO GET OUT;0" type="checkbox"></td>
<td class="regu">FOO BAR FOO BAR</td>
<td class="regu"> Student</td>
</tbody></table>
Here is the Python code:
#!/usr/bin/python
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import logging
driver = webdriver.Firefox()
driver.get("http://somewebsite/iwannascrape/login.php") #page requires a login T_T
assert "Student" in driver.title
elem = driver.find_element_by_name("txtUser")
elem.clear()
elem.send_keys("YOU_SIR_NAME") #login creds. please dont mind :D
elem2 = driver.find_element_by_name("txtPwd")
elem2.clear()
elem2.send_keys("PASSSWORDHERE")
elem2.send_keys(Keys.RETURN)
driver.get("http://somewebsite/iwannascrape/afterlogin/illhere")
# using this to extract only the table with class='dbtable' so its easier to manipulate :)
table_clas = driver.find_element_by_xpath("//*[@class='dbtable']")
source_code = table_clas.get_attribute("outerHTML") #this prints out the table and its children.
print(source_code)
for i in range(10):  # spacing for readability
    print("\n")
print(table_clas.text)  # this prints out the names.
2 Answers
#1
2
Once you locate the desired element, use the get_attribute() method:
# dbtable is a class in the question's HTML, so select it with ".dbtable" rather than "#dbtable"
elm = driver.find_element_by_css_selector(".dbtable input[name=chkStud]")
print(elm.get_attribute("value"))
#2
-1
table_clas = driver.find_element_by_xpath("//*[@class='dbtable']")
# select the desired elements to thin down the HTML;
# the leading "." keeps the search inside table_clas
td = table_clas.find_elements_by_xpath(".//*[@name='chkStud']")
# finally, hunt down the element you want specifically.
# find_elements returns a list you can iterate over; find_element returns only the first match
for things in td:
    print(things.get_attribute("value"))
This prints:
THE TEXT IWANNA EXTRACT ;0
PLEASE EXTRACT ME HERE, IM DYING TO GET OUT;0
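Since the question already captures the table's outerHTML in source_code and is open to other libraries, the same values can also be pulled out of that string without further browser calls. The following is only a minimal sketch using Python 3's standard html.parser module instead of Selenium or BeautifulSoup. The names ValueCollector and extract_values are made up for illustration, and the source_code string is a stand-in copied from the question's HTML, not real page output.

```python
from html.parser import HTMLParser


class ValueCollector(HTMLParser):
    """Collect the value of every <input name="chkStud"> found inside a <td class="regu">."""

    def __init__(self):
        super().__init__()
        self.in_regu_cell = False  # True while we are inside a td with class "regu"
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            self.in_regu_cell = attrs.get("class") == "regu"
        elif tag == "input" and self.in_regu_cell and attrs.get("name") == "chkStud":
            self.values.append(attrs.get("value"))

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_regu_cell = False


def extract_values(html):
    """Return the list of chkStud value strings found in the given HTML."""
    parser = ValueCollector()
    parser.feed(html)
    return parser.values


# Stand-in for table_clas.get_attribute("outerHTML") from the question (assumption).
source_code = """
<table class="dbtable" border="0" width="100%">
<tbody><tr bgcolor="#ffffff">
<td class="regu"><input name="chkStud" value="THE TEXT IWANNA EXTRACT ;0" type="checkbox"></td>
<td class="regu">NAME OF STUDENT HERE </td>
<td class="regu"> Student</td>
</tr><tr bgcolor="#ffffff">
<td class="regu"><input name="chkStud" value="PLEASE EXTRACT ME HERE, IM DYING TO GET OUT;0" type="checkbox"></td>
<td class="regu">FOO BAR FOO BAR</td>
<td class="regu"> Student</td>
</tr></tbody></table>
"""

values = extract_values(source_code)
for value in values:
    print(value)
```

In the real script, you would pass the string returned by table_clas.get_attribute("outerHTML") to extract_values instead of the hard-coded sample.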