File name: crawler: a Go web crawler
File size: 38KB
File format: ZIP
Updated: 2024-07-20 16:08:42
Go
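crawler aims to be a Chinese-friendly web crawling system. The project is based on gocrawl, PuerkitoBio's basic, concurrent, lightweight crawling library.

Project goals:
- Support distributed crawling
- Add machine learning algorithms
- Improve handling of Chinese character encodings
- Complete the documentation

Features:
- Full control over the URLs to visit, inspect and query (using a pre-initialized goquery document)
- Crawl delays applied per host
- Obedience to robots.txt rules (using the robotstxt.go library)
- Concurrent execution using goroutines
- Configurable logging
- Open, customizable d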
[File preview]:
crawler-master
----testdata()
--------robota()
--------hostc()
--------hostb()
--------robotb()
--------robotc()
--------hosta()
----options.go(1KB)
----README.rst(3KB)
----worker.go(11KB)
----tblrun_test.go(3KB)
----assert_test.go(1018B)
----examples_test.go(1KB)
----LICENSE(1KB)
----complex_test.go(7KB)
----spyext_test.go(7KB)
----popchannel.go(1KB)
----ext.go(7KB)
----urlcontext.go(4KB)
----crawler.go(11KB)
----tbldef_test.go(28KB)
----fileext_test.go(1KB)
----cmd()
--------example()
----logger.go(669B)
----errors.go(2KB)