How to parse this XML using DOM

Time: 2022-11-27 18:36:45

It looks like this in XML. I want to get the image src value:

<description><![CDATA[<div class="images"><img src="http://www.voicetv.co.th/cache/images/8a1a6f2aeb7b0e9c1d6bb3eae314165f.jpg" /></div>]]></description>

What I am doing is

if ((theElement.getElementsByTagName("description")).getLength() > 0) {

            allChildern = theElement.getElementsByTagName("description").item(0).getChildNodes();

            for (int index = 0; index < allChildern.getLength(); index++) {
                description += allChildern.item(index).getNodeValue();

                NodeList chNodes = allChildern.item(index).getChildNodes();
                for (int i = 0; i < chNodes.getLength(); i++) {

                    String name = chNodes.item(i).getNodeName();
                    if(name.equals("div")) {
                        String clas = allChildern.item(index).getAttributes().getNamedItem("class").getNodeValue();
                        if(clas.equals("images")){
                            String nName = allChildern.item(index).getChildNodes().item(0).getNodeName();
                            if(nName.equals("img")) {
                                String nValue = allChildern.item(index).getChildNodes().item(0).getAttributes().getNamedItem("src").getNodeValue();
                            }
                        }
                    }
                }


            }
            currentStory.setDescription(description);
        }

But it is not working.

2 Solutions

#1


5  

The description element contains a CDATA node. This means that the <img> "element" you are trying to access is really just a piece of text (and not an element at all).

You'll need to parse the text as a new XML document in order to access it via DOM methods.
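
A minimal sketch of that two-pass approach, assuming the CDATA text is itself well-formed XML with a single root element (true for the sample above, but not guaranteed for arbitrary HTML). The class name CdataImageSrc and the method extractImgSrc are hypothetical names chosen for illustration; the parser calls are the standard JAXP/DOM ones.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CdataImageSrc {

    // Returns the src of the first <img> found in the CDATA text of the first
    // <description> element, or null if either is missing.
    // Assumption: the CDATA text is well-formed XML with one root element.
    static String extractImgSrc(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // First pass: in the outer document the CDATA content is plain text.
            Document outer = builder.parse(new InputSource(new StringReader(xml)));
            NodeList descriptions = outer.getElementsByTagName("description");
            if (descriptions.getLength() == 0) {
                return null;
            }
            String innerMarkup = descriptions.item(0).getTextContent();

            // Second pass: parse that text as a document of its own.
            Document inner = builder.parse(new InputSource(new StringReader(innerMarkup)));
            NodeList images = inner.getElementsByTagName("img");
            if (images.getLength() == 0) {
                return null;
            }
            return ((Element) images.item(0)).getAttribute("src");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<description><![CDATA[<div class=\"images\">"
                + "<img src=\"http://www.voicetv.co.th/cache/images/8a1a6f2aeb7b0e9c1d6bb3eae314165f.jpg\" />"
                + "</div>]]></description>";
        System.out.println(extractImgSrc(xml));
    }
}
```

If the embedded markup is not well-formed XML (unclosed tags, several top-level elements), the second parse fails, and an HTML parser or a text-based approach would be needed instead.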

#2


0  

Warning: this is a bit of a hack, and it is fragile if the XML can contain comments that include something that looks like an image tag.

For a short XML snippet with a CDATA section like this one, an alternative to XML parsing is to extract the image URL with a regular expression. Here's an example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String xml = "<description><![CDATA[<div class=\"images\"><img src=\"http://www.voicetv.co.th/cache/images/8a1a6f2aeb7b0e9c1d6bb3eae314165f.jpg\"/></div>]]></description>";
// Capture everything between the quotes that follow <img src=
Matcher matcher = Pattern.compile("<img src=\"([^\"]+)").matcher(xml);
while (matcher.find()) {
    System.out.println("img url: " + matcher.group(1));
}
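
The pattern above only matches when src is the first attribute and is double-quoted. As a hedged variation, the sketch below tolerates other attributes before src and either quote style; it is still a text match, so it stays fragile in the ways the warning describes (for example, image-like text inside comments). The class name ImgSrcRegex and the method findImgSrcs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcRegex {

    // <img, then any attributes, then whitespace + src = "value" or 'value'.
    // Group 1 is the opening quote, group 2 is the attribute value.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the src values of all <img> tags found in the text, in order.
    static List<String> findImgSrcs(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(text);
        while (matcher.find()) {
            urls.add(matcher.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        String xml = "<description><![CDATA[<div class=\"images\">"
                + "<img src=\"http://www.voicetv.co.th/cache/images/8a1a6f2aeb7b0e9c1d6bb3eae314165f.jpg\"/>"
                + "</div>]]></description>";
        for (String url : findImgSrcs(xml)) {
            System.out.println("img url: " + url);
        }
    }
}
```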
