I am trying to extract some data from a table by parsing the HTML using jsoup.
Here is an example:
String tableHtml =
      "<table>"
    + "<thead>"
    + "<tr><th>"
    + "<table>"
    + "<tr><td>asdf</td></tr>"
    + "</table>"
    + "<table>"
    + "<tr><td>asdf</td></tr>"
    + "</table>"
    + "</th></tr>"
    + "</thead>"
    + "<tfoot>"
    + "<tr><td>"
    + "THE TEXT I WANT TO GET"
    + "</td></tr>"
    + "</tfoot>"
    + "</table>";
Document doc = Jsoup.parseBodyFragment(tableHtml);
Element table = doc.select("table").first();
Element r = table.select("tfoot").first(); // r is null here. Why?
System.out.println("-----------" + r.text());
I get a NullPointerException!
However, if I remove one of the inner tables, I don't get an exception and it works. It also works if I change the <th> tag to <td>. Strange behavior. This is just an example of the real HTML that I am trying to parse. I would appreciate it if anyone could point out why I am getting this exception. Thank you.
NOTE: Please assume that I cannot modify the HTML. I just want to parse it as it is.
1 solution
#1
Maybe, instead of using the HTML parser (which apparently doesn't fully support this kind of table nesting), use the XML parser. Try:
Document doc = Jsoup.parse(tableHtml, "", Parser.xmlParser());
Element table = doc.select("table").first();
Element r = table.select("tfoot").first();
System.out.println("->" + r.text());
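To illustrate why parsing as XML helps, here is a minimal sketch that uses the JDK's built-in XML parser instead of jsoup (a swapped-in technique, used only so the example runs without extra libraries). An XML parser keeps the nesting exactly as written rather than applying HTML table-repair rules, so the <tfoot> stays where the markup puts it. The class name TfootXmlSketch and the helper firstTfootText are hypothetical, and the approach assumes the markup is well-formed XML, which is true for the sample above but may not hold for arbitrary real-world HTML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TfootXmlSketch {

    // Hypothetical helper: parses the markup strictly as XML and returns the
    // trimmed text of the first <tfoot>, or null if no <tfoot> is present.
    static String firstTfootText(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            NodeList tfoots = doc.getElementsByTagName("tfoot");
            if (tfoots.getLength() == 0) {
                return null; // guard against the NullPointerException from the question
            }
            return tfoots.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Markup that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String tableHtml =
              "<table><thead><tr><th>"
            + "<table><tr><td>asdf</td></tr></table>"
            + "<table><tr><td>asdf</td></tr></table>"
            + "</th></tr></thead>"
            + "<tfoot><tr><td>THE TEXT I WANT TO GET</td></tr></tfoot>"
            + "</table>";
        System.out.println("->" + firstTfootText(tableHtml));
    }
}
```

Whichever parser is used, checking the lookup result for null before calling a method on it turns a missing <tfoot> into a case you can handle instead of an exception.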