Android: parsing HTML entities in an RSS feed with the DOM parser

Date: 2021-10-12 03:58:19

I am using the Google Books API for an Android app that I am building. This is a sample of the XML file:

<dc:description>This trilogy includes &amp;quot; The Hitchhiker&amp;#39;s Guide to the Galaxy&amp;quot; , &amp;quot; TheRestaurant at the End of the Universe&amp;quot; , &amp;quot; Life, Universe and Everything&amp;quot; and &amp;quot; So Long ...</dc:description>
<dc:format>590 pages</dc:format>
<dc:format>book</dc:format>

And this is part of the code I'm using to extract the description:

if ( entry.getElementsByTagName( "dc:description" ).item( 0 ) != null ) {
  Element d = ( Element ) entry.getElementsByTagName( "dc:description" )
      .item( 0 );
  b.setDescription( d.getFirstChild( ).getNodeValue( ) );

}

The problem is that when I pass the result to Html.fromHtml(str), the text is cut off at the first HTML entity, so in this example it shows only:

This trilogy includes

When I run the same code outside of Android it works fine and at least returns the full string with the escaped entities, i.e.:

This trilogy includes &quot; The Hitchhiker&#39;s Guide to the Galaxy&quot; , &quot; TheRestaurant at the End of the Universe&quot; , &quot; Life, Universe and Everything&quot; and &quot; So Long ...

If I then manually add this string to the app, Html.fromHtml() works fine, so I guess the problem is Android's implementation of the parser.

A similar problem is described in "Android decoding html in xml file". I have tried setting validation on the factory to false, and because this is an RSS feed I cannot declare an HTML root element (as suggested in that post).

1 solution

#1


0  

I ended up not getting the description data from Google, but I think the problem might be solved by calling normalize() on the document element. I had a similar problem with another API and that fixed it.
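
For illustration, here is a minimal sketch of that suggestion using the standard `org.w3c.dom` API. It is untested against the actual Google Books feed; the class name `DescriptionExtractor`, the method name `extractDescription`, the sample XML, and the `xmlns:dc` URI are all hypothetical placeholders. The idea is that some parsers split element text into several adjacent text nodes around entity references, so `getFirstChild().getNodeValue()` returns only the first piece. Calling `normalize()` merges adjacent text nodes, and `getTextContent()` (DOM Level 3) concatenates all of an element's text regardless of how it was split.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DescriptionExtractor {

    // Hypothetical helper: returns the full text of the first
    // dc:description element, or null if the element is absent.
    public static String extractDescription(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Merge adjacent text nodes so text split around entity
        // references becomes a single node again.
        doc.getDocumentElement().normalize();

        NodeList nodes = doc.getElementsByTagName("dc:description");
        if (nodes.getLength() == 0) {
            return null;
        }
        Element d = (Element) nodes.item(0);

        // getTextContent() concatenates every text node under the element,
        // so it does not depend on the first child holding the whole string.
        return d.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, shortened from the feed excerpt in the question.
        String xml = "<entry xmlns:dc=\"urn:example:dc\">"
                + "<dc:description>This trilogy includes &amp;quot; The "
                + "Hitchhiker&amp;#39;s Guide</dc:description>"
                + "</entry>";
        System.out.println(extractDescription(xml));
    }
}
```

In the question's code this would amount to calling `normalize()` once after parsing and passing `d.getTextContent()` to `b.setDescription(...)` instead of the first child's node value. The returned string still contains the escaped entities (`&quot;`, `&#39;`), which is the form the question says Html.fromHtml() handles correctly.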
