What is the best open source web crawler tool written in Java?

Date: 2021-05-26 22:47:58

What is the best open source web crawler tool written in Java?


2 answers

#1


9  

Try crawler4j. You just need to implement a simple interface which controls which URLs to visit and what to do with each crawled page.
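
To make the idea concrete, here is a minimal, self-contained sketch of that callback pattern: one method decides which URLs to follow, another decides what to do with each page. It is not crawler4j's actual API. The names CrawlSketch, PageHandler, and crawl are hypothetical, and an in-memory map of URL to outgoing links stands in for real HTTP fetching and HTML parsing. Check crawler4j's own documentation for its real class and method signatures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CrawlSketch {

    /** Hypothetical callback interface; NOT crawler4j's actual API. */
    public interface PageHandler {
        /** Decides whether a discovered URL should be queued for crawling. */
        boolean shouldVisit(String url);

        /** Called once for every crawled page with the links found on it. */
        void visit(String url, List<String> outgoingLinks);
    }

    /**
     * Breadth-first crawl over an in-memory "web" (URL -> outgoing links).
     * The seed is always visited; every other URL must pass shouldVisit.
     * Returns the URLs in the order they were visited.
     */
    public static List<String> crawl(String seed,
                                     Map<String, List<String>> web,
                                     PageHandler handler) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);

        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            // Unknown URLs are treated as pages with no outgoing links.
            List<String> links = web.getOrDefault(url, List.of());
            handler.visit(url, links);
            visited.add(url);
            for (String link : links) {
                // Queue each accepted link at most once.
                if (handler.shouldVisit(link) && seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up pages used only for this illustration.
        Map<String, List<String>> web = Map.of(
            "http://example.com/", List.of("http://example.com/a", "http://other.test/x"),
            "http://example.com/a", List.of("http://example.com/", "http://example.com/b"),
            "http://example.com/b", List.of());

        PageHandler handler = new PageHandler() {
            @Override
            public boolean shouldVisit(String url) {
                // Stay on a single site.
                return url.startsWith("http://example.com/");
            }

            @Override
            public void visit(String url, List<String> outgoingLinks) {
                System.out.println(url + " has " + outgoingLinks.size() + " link(s)");
            }
        };

        crawl("http://example.com/", web, handler);
    }
}
```

With this split, the crawl loop knows nothing about site-specific rules: restricting the crawl to one domain or storing page content means changing only the handler.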


#2


5  

In Java, I think it boils down to Nutch vs. Heritrix. You should specify what your needs are to get a better answer.

