How to iterate through a list of items and extract a specific part using Selenium and Python

Time: 2021-04-10 19:28:02

From the web page https://meshb.nlm.nih.gov/treeView, I want to iterate through each node of the tree and, wherever an item contains the word "Cardiovascular...", build a dictionary that lists the top-level node together with all of its cardiovascular-related items. For example, if you expand "Anatomy [A]" on that page, you will see a cardiovascular entry. I want that entry along with everything it contains when it is expanded. Part of the HTML that I want to iterate through is as follows:

<a class="ng-scope">
    <span class="ng-binding ng-scope">Anatomy [A]</span>
</a>
<ul class="treeItem ng-scope">
    <li class="ng-scope">
        <a class="ng-scope" href="/record/ui?ui=D001829">
            <span class="ng-binding ng-scope">Body Regions [A01]</span>
        </a>
    </li>
    <li class="ng-scope">
        <a class="ng-scope" href="/record/ui?ui=D001829">
            <span class="ng-binding ng-scope">Cardio Vascular</span>
        </a>
        <ul class="treeItem ng-scope">
            <li class="ng-scope">
                <a class="ng-scope" href="/record/ui?ui=D015824">
                    <span class="ng-binding ng-scope">Blood-Air Barrier [A07.025]</span>
                </a>
                <ul class="treeItem ng-scope">
                    <li class="ng-scope">
                        <a class="ng-scope" href="/record/ui?ui=D018916">
                            <span class="ng-binding ng-scope">Blood-Aqueous Barrier [A07.030]</span>
                        </a>
                    </li>
                </ul>
            </li>
        </ul>
    </li>
</ul>

Here is what I have managed so far in Python. As a first step, I wanted to iterate through the top-level nodes and find the word "cardiovascular...", but I keep getting the error "no such element: Unable to locate element". Can someone tell me what I am missing here?

from selenium import webdriver
chrome_path=r"G:\My Drive\A\chrome_driver\chromedriver_win32\chromedriver.exe"
driver=webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')
for links in driver.find_elements_by_css_selector('a.ng-scope'):
    cardio = links.find_element_by_css_selector('li>a>span.ng-binding.ng-scope')        
    print(cardio.text)

1 Answer

There are some issues in your code. You cannot iterate through the list unless you click on the "+" icon on the parent node.

In your code, you build a list of the parent nodes (Anatomy, Organisms, etc.), but you have not written any code to expand them.

The steps you need to follow are:

  1. Store the parent nodes in a list => already covered in your code.
  2. Iterate through each parent node, clicking its expand icon ("+" icon) => needs to be covered.
  3. Store the child nodes in a list and iterate through them as well => needs to be covered.
  4. Keep iterating until you find the child node "cardiovascular" => needs to be covered.
  5. Click the "+" icon in front of the child node "cardiovascular" and store the elements under that node in the dictionary => needs to be covered.
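
Steps 4 and 5 come down to a search over the tree. As a minimal sketch of that logic, assuming the expanded tree has already been read into nested dictionaries, one could write something like the following. The `label`/`children` layout, the `collect_matches` name, and the sample data are illustrative assumptions, not something taken from the page.

```python
def collect_matches(tree, keyword="cardiovascular"):
    """Map each top-level label to the matching subtrees found beneath it."""

    def find(node):
        # Return every subtree whose label contains the keyword (case-insensitive).
        if keyword.lower() in node["label"].lower():
            return [node]  # keep the whole subtree and stop descending
        hits = []
        for child in node.get("children", []):
            hits.extend(find(child))
        return hits

    result = {}
    for top in tree:
        hits = [h for child in top.get("children", []) for h in find(child)]
        if hits:  # only list top-level nodes that actually contain a match
            result[top["label"]] = hits
    return result


# Made-up sample data, shaped like the markup in the question.
tree = [
    {"label": "Anatomy [A]", "children": [
        {"label": "Body Regions [A01]"},
        {"label": "Cardiovascular System [A07]", "children": [
            {"label": "Blood-Air Barrier [A07.025]"},
        ]},
    ]},
    {"label": "Organisms [B]", "children": []},
]

print(collect_matches(tree))
```

The result keeps each matching node together with everything under it, keyed by the top-level node it was found in. Top-level nodes without a match are left out.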

I have written code that covers steps 1, 2 and 3 for you. Please continue in the same way for the remaining steps.

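from selenium import webdriver

# Written against the Selenium 3 API (find_element_by_* was removed in Selenium 4).
chrome_path = r"G:\MyDrive\A\chrome_driver\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get('https://meshb.nlm.nih.gov/treeView')

# Step 1: collect the parent nodes.
for links in driver.find_elements_by_css_selector('a.ng-scope'):
    # Step 2: click the "+" icon next to the parent node to expand it.
    links.find_element_by_xpath("./following-sibling::span/i[1]").click()
    # Step 3: iterate over the child nodes revealed by the expansion.
    for sublinks in links.find_elements_by_xpath('./following-sibling::ul/li//a'):
        print(sublinks.text)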

I come from a Java background, so please forgive any Python syntax mistakes.
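
For the remaining steps, one possible way to read a node that has already been expanded is to walk the nested ul/li markup recursively. The sketch below assumes the markup matches the snippet in the question: each `li` holds an `a > span` label and, once expanded, a `ul` of child `li` elements. `read_subtree` is a hypothetical helper name, and the XPath expressions have not been checked against the live page.

```python
def read_subtree(li):
    """Return {"label": ..., "children": [...]} for one expanded <li> node.

    `li` is any object offering the WebElement-style methods
    find_element(by, value) and find_elements(by, value).
    """
    # The visible label sits in the <span> inside the node's own <a>.
    label = li.find_element("xpath", "./a/span").text.strip()
    # Direct child <li> elements of the nested <ul>; empty if there are none.
    children = li.find_elements("xpath", "./ul/li")
    return {"label": label, "children": [read_subtree(c) for c in children]}
```

Here `li` would be a Selenium WebElement for an expanded list item. The string "xpath" is the value of `By.XPATH`, so the function needs no Selenium import of its own. Nodes that have not been expanded yet would simply come back with an empty `children` list.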
