How do I get information from a web page in Java?

Does anyone know of a quick way to get information from a web page in Java? For instance, if I'm looking at a page like http://www.ncbi.nlm.nih.gov/pubmed/?term=10952317 and I want to extract the list of words beneath the heading "MeSH Terms", how would I go about doing so?

I have some code that can read the page source, but the result is full of HTML tags and such...

Any help is much appreciated!

2 Answers

#1

As has been mentioned on here countless times before, have a look at jsoup, which is an HTML parsing library for Java. Or write your own parser (not recommended).
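
To make that suggestion concrete, here is a rough sketch of how jsoup could pull the list items under a heading. It assumes the jsoup jar is on the classpath, and it assumes the terms sit in a `<ul>` immediately after an `<h4>` heading; I have not checked the real PubMed markup, so the selector, the class name `MeshTermsExample`, the helper `termsUnderMeshHeading`, and the sample HTML with its placeholder terms are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MeshTermsExample {

    // Hypothetical fragment standing in for the real page; the actual
    // PubMed markup may be structured differently.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<h4>Publication Types</h4><ul><li>Other Item</li></ul>"
        + "<h4>MeSH Terms</h4><ul><li>Term A</li><li>Term B</li></ul>"
        + "</body></html>";

    // Returns the text of each <li> in the element that directly follows
    // the first <h4> whose own text contains "MeSH Terms".
    static List<String> termsUnderMeshHeading(Document doc) {
        List<String> terms = new ArrayList<>();
        Element heading = doc.selectFirst("h4:containsOwn(MeSH Terms)");
        if (heading == null) {
            return terms; // heading not found on this page
        }
        Element list = heading.nextElementSibling();
        if (list == null) {
            return terms; // nothing follows the heading
        }
        for (Element item : list.select("li")) {
            terms.add(item.text()); // text() drops the HTML tags
        }
        return terms;
    }

    public static void main(String[] args) {
        // Parse an in-memory string so the sketch runs without network access.
        Document doc = Jsoup.parse(SAMPLE_HTML);
        for (String term : termsUnderMeshHeading(doc)) {
            System.out.println(term);
        }
    }
}
```

To run it against the live page instead of the sample string, `Jsoup.connect(url).get()` returns a `Document` in the same way; you would then need to adjust the selector to whatever the page source actually contains.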

#2

TagSoup is probably what you want.
