File name: Heritrix web crawler
File size: 21.72 MB
File format: ZIP
Last updated: 2014-10-04 04:59:11
Web crawler, Java
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or mispronounced as heratrix, heritix, heretix, or heratix) is an archaic word for heiress (a woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.