Removing comments from an XML file using vtd-xml

Date: 2021-09-08 23:31:25

Is there a way to remove the comments from a huge XML file (>200 MB) that is parsed by vtd-xml?

This applies both to comments before the root element

<!-- comment -->
<rootElement>
.
.
.
 </rootElement>

and to comments within it

<rootElement>
<book>
<!-- comment -->
</book>
</rootElement>

The best solution would use XPath. I tried

//comment()

which works with DOM but not with vtd-xml
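
For comparison, the DOM route mentioned here can be sketched with the JDK's built-in JAXP classes. This is only a minimal sketch, not the code from the question: the class name DomCommentStripper and the method stripComments are hypothetical, and it loads the whole document into memory, which is likely impractical for a file over 200 MB.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for illustration.
public class DomCommentStripper {

    /** Parses the XML text, removes every comment node, and returns the serialized result. */
    public static String stripComments(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // "//comment()" starts at the document node, so it also selects
        // comments that appear before the root element.
        NodeList comments = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//comment()", doc, XPathConstants.NODESET);

        // Detach each selected comment from its parent node.
        for (int i = 0; i < comments.getLength(); i++) {
            Node comment = comments.item(i);
            comment.getParentNode().removeChild(comment);
        }

        // Serialize the modified tree back to text without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!-- comment --><rootElement><book><!-- comment --></book></rootElement>";
        System.out.println(stripComments(xml));
    }
}
```

The sketch works on a string to stay self-contained; for a real file, the parse call would take a File or InputStream instead.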

Here is my code for selecting comments:

String xPath = "//comment()";
XMLModifier xm = new XMLModifier();
VTDGen vg = new VTDGen();
if (vg.parseFile(fnIn,true)){
       VTDNav vn = vg.getNav();
       xm.bind(vn);
       nodeXpath(xPath,vn);
}

private void nodeXpath(String xPath, VTDNav vn) throws Exception{
    int result;

    AutoPilot ap = new AutoPilot();
    ap.selectXPath(xPath);
    ap.bind(vn);
    while((result = ap.evalXPath())!=-1){
        int p = vn.getText();

        if (p!=-1) {                
            System.out.println(vn.getText() + ", " + vn.toString(p));               
        }
    }
}

But nothing is printed to the screen.

Is there a way to do this with vtd-xml?

Thanks for your help.

1 answer

#1

You mentioned that your code prints nothing to the screen. Not even commas? I wouldn't necessarily expect getText() to give you anything to print here, since its documentation seems to indicate that it returns "the type character data or CDATA", which I don't think includes the content of a comment. (Thank you, @vtd-xml-author, for confirming that.)

A good test would be to print something in every iteration of your while loop, before the line p = vn.getText(), so you'll know whether it is finding the comments at all.

If it is finding the comments, I think you'll want to call xm.removeToken(result) on each one.
