CINERGIWebCrawler: EarthCube crawler for the CINERGI project, for crawling URLs

[File properties]:

File name: CINERGIWebCrawler: EarthCube crawler for the CINERGI project, for crawling URLs

File size: 190KB

File format: ZIP

Updated: 2024-07-20 20:49:23

Python

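CINERGI Web Crawler

This crawler/scraper is designed to collect metadata from geoscience resources.

Dependencies: Python 3.4. The code uses the following libraries: xml.etree.ElementTree, urllib.request, urllib.parse, re and bs4 (all but bs4 are part of the Python standard library). Install urllib3, beautifulsoup4 and tldextract by running the following commands:

$ python -m pip install urllib3 (from within c:\python34)
$ python -m pip install beautifulsoup4
$ python -m pip install tldextract

To run the crawler:

git clone
cd CINERGIWebCrawler
python3.4 crawler_base.py

A prompt will appear:

$ Enter a URL to crawl:

Enter a valid URL to crawl a group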
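The crawl step described above (enter a URL, fetch the page, collect links) can be sketched as follows. This is not the code from crawler_base.py; it is a minimal illustration assuming the crawler fetches one page, collects its <a href> links and keeps only those on the same host. To stay dependency-free it uses the standard-library html.parser in place of bs4, and the names LinkCollector, extract_same_site_links and crawl_once are hypothetical.

```python
# Hypothetical sketch of a single crawl step; not the project's crawler_base.py.
# Uses the standard-library html.parser instead of bs4 so it runs without extra installs.
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags and the text of the page <title>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_same_site_links(base_url, html):
    """Return (title, sorted absolute http(s) links on the same host as base_url)."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = set()
    for href in parser.links:
        # Resolve relative links against the page URL and drop any #fragment
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            found.add(absolute)
    return parser.title.strip(), sorted(found)


def crawl_once(url):
    """Fetch one page (requires network access) and extract its same-host links."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_same_site_links(url, html)
```

For example, crawl_once("https://example.org/") returns the page title and a sorted list of same-host links. Matching on the exact host is a simplification; tldextract, which the project lists as a dependency, could be used to compare registered domains instead.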


[File preview]:
CINERGIWebCrawler-master
----.gitignore(7B)
----resourceTypes.py(5KB)
----harvestGCMD.py(825B)
----check_link.py(705B)
----Excel output()
--------Antarctica_9_15.xlsx(15KB)
--------CrawlGreenSeas.xlsx(29KB)
--------Crawl_9_10.xlsx(21KB)
--------Crawl.xlsx(35KB)
--------Crawl_9_15.xlsx(21KB)
--------CrawlCopy2.0.xlsx(22KB)
--------Crawl_with_Class.xlsx(16KB)
--------CrawlCopyAntarctica.xlsx(17KB)
--------Cinergi_test_bed.xlsx(17KB)
----write.py(1KB)
----visible.py(248B)
----README.md(876B)
----check_type.py(239B)
----Resource.py(7KB)
----crawler_base.py(2KB)
----Organization.py(3KB)
----term_links.py(9KB)
----Tests()
--------Greensease_testbed(95B)
--------ListOfLinks.txt(27B)
--------CINERGI_testbed(43B)
--------Antarctica_testbed(43B)
----disciplines_known.py(8KB)
