How to filter an XML string by element "paths" in Groovy or Java

Time: 2021-12-06 12:16:53

I have an object that's currently mapped from a Java POJO to XML using JAXB. Once I have that XML, I occasionally need to whittle it down to only a select set of elements based on input by a user. The result should be XML with ONLY the specified "fields".

I've come across a number of similar use cases which use SAX filters, but they seem very complicated and the answers don't quite get me where I need to be. The closest example is this one, which excludes a single path from the result. I want the opposite: whitelist a select list of elements.

Example object: School.xml

<SchoolInfo RefId="34060F68BE3942F1B1264E6D2CC3C353">
        <LocalId>57</LocalId>
        <SchoolName>Foobar School of Technology</SchoolName>
        <Principal>
           <FirstName>Bob</FirstName>
           <LastName>Smith</LastName>
        </Principal>
        <StateProvinceId>34573</StateProvinceId>
        <LEAInfoRefId>340666687E3942F1B1264E1223453C353</LEAInfoRefId>
        <PhoneNumberList>
           <PhoneNumber Type="0096">
              <Number>555-832-5555</Number>
           </PhoneNumber>
           <PhoneNumber Type="0096">
              <Number>555-999-5555</Number>
           </PhoneNumber>
        </PhoneNumberList>
     </SchoolInfo>

Given the following input as a "filter":

List<String> filter = [ 
    "LocalId",
    "SchoolName",
    "Principal/FirstName",
    "PhoneNumberList/PhoneNumber/Number",
 ]

I need the output to be:

<SchoolInfo RefId="34060F68BE3942F1B1264E6D2CC3C353">
    <LocalId>57</LocalId>
    <SchoolName>Foobar School of Technology</SchoolName>
    <Principal>
       <FirstName>Bob</FirstName>
    </Principal>
    <PhoneNumberList>
        <PhoneNumber Type="0096">
            <Number>555-832-5555</Number>
        </PhoneNumber>
        <PhoneNumber Type="0096">
            <Number>555-999-5555</Number>
        </PhoneNumber>
    </PhoneNumberList>
</SchoolInfo>

What is the best library to achieve this? SAX filtering feels too complicated, and XSLT doesn't seem like a good fit given the dynamic filtering.

Examples to help me get closer would be highly appreciated.

2 answers

#1


This code does the whitelisting. It is based on XPath and VTD-XML. The output has indentation issues; this is a first pass that emphasizes correctness.

import com.ximpleware.*;
import java.io.*;
import java.util.*;

public class whiteList {

    public static void main(String[] s) throws VTDException, IOException{
        VTDGen vg = new VTDGen();
        List <String> filter = Arrays.asList("LocalId",
                "SchoolName",
                "Principal/FirstName",
                "PhoneNumberList/PhoneNumber/Number");
        if (!vg.parseFile("d:\\xml\\schoolInfo.xml", false)){
            return;
        }
        VTDNav vn = vg.getNav();
        FastIntBuffer fib = new FastIntBuffer();
        // build a bitmap for the entire token pool consisting of elements
        int i,k;
        for (i=0;i<vn.getTokenCount();i++){
            if (vn.getTokenType(i)==VTDNav.TOKEN_STARTING_TAG){
                fib.append(0x1);// 0x1 = element not (yet) whitelisted; 0x3 below marks "keep"
            }else{
                fib.append(0);
            }
        }
        AutoPilot ap = new AutoPilot(vn);
        AutoPilot ap1= new AutoPilot(vn);
        ap1.selectXPath("descendant::*");// mark descendant as keep
        for (int j=0;j<filter.size();j++){
            ap.selectXPath(filter.get(j));
            while((i=ap.evalXPath())!=-1){
                fib.modifyEntry(i, 0x3);
                vn.push();
                do{
                    if( vn.getTokenDepth(vn.getCurrentIndex())>=0)
                       fib.modifyEntry(vn.getCurrentIndex(), 0x3);
                    else
                        break;
                }while(vn.toElement(VTDNav.P));
                vn.pop();
                vn.push();
                while((k=ap1.evalXPath())!=-1){
                    fib.modifyEntry(k, 0x3);
                }
                ap1.resetXPath();
                vn.pop();
            }
            ap.resetXPath();
        }

        //remove those not on the whitelist
        XMLModifier xm = new XMLModifier(vn);
        for (int j=0;j<fib.size();j++){
            if (fib.intAt(j)==0x1){
                vn.recoverNode(j);
                xm.remove();
            }
        }
        xm.output("d:\\xml\\newSchoolInfo.xml");                    
    }
}
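
For comparison, the same mark-and-remove idea can be sketched with only the JDK's built-in DOM and XPath APIs. This is not the VTD-XML code above: the class name DomWhitelist and its filter method are made up for illustration, the sketch assumes each whitelist entry is a valid XPath relative to the root element, and it makes no attempt to tidy the whitespace left behind by removed elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of any library): whitelist filtering with JDK DOM + XPath.
final class DomWhitelist {

    static String filter(String xml, List<String> paths) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Identity-based set of nodes that must survive.
        Set<Node> keep = Collections.newSetFromMap(new IdentityHashMap<>());
        for (String path : paths) {
            // Assumption: every path is evaluated relative to the root element.
            NodeList hits = (NodeList) xpath.evaluate(path, root, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                Node hit = hits.item(i);
                if (!(hit instanceof Element)) {
                    continue; // only element matches are handled in this sketch
                }
                // Keep the match and every ancestor up to the document node.
                for (Node n = hit; n != null; n = n.getParentNode()) {
                    keep.add(n);
                }
                // Keep everything below the match as well.
                NodeList below = ((Element) hit).getElementsByTagName("*");
                for (int j = 0; j < below.getLength(); j++) {
                    keep.add(below.item(j));
                }
            }
        }

        // Collect first, remove afterwards: getElementsByTagName returns a live list.
        NodeList all = root.getElementsByTagName("*");
        List<Node> doomed = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            if (!keep.contains(all.item(i))) {
                doomed.add(all.item(i));
            }
        }
        for (Node n : doomed) {
            if (n.getParentNode() != null) {
                n.getParentNode().removeChild(n);
            }
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SchoolInfo RefId=\"1\"><LocalId>57</LocalId>"
                + "<Principal><FirstName>Bob</FirstName><LastName>Smith</LastName></Principal>"
                + "<StateProvinceId>34573</StateProvinceId></SchoolInfo>";
        System.out.println(filter(xml, Arrays.asList("LocalId", "Principal/FirstName")));
    }
}
```

Because the whole document is loaded into a DOM tree, this is only a reasonable trade-off for documents that fit comfortably in memory.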

#2


All Groovy:

import groovy.xml.XmlUtil

def xml = '''<SchoolInfo RefId="34060F68BE3942F1B1264E6D2CC3C353">
    <LocalId>57</LocalId>
    <SchoolName>Foobar School of Technology</SchoolName>
    <Principal>
       <FirstName>Bob</FirstName>
       <LastName>Smith</LastName>
    </Principal>
    <StateProvinceId>34573</StateProvinceId>
    <LEAInfoRefId>340666687E3942F1B1264E1223453C353</LEAInfoRefId>
    <PhoneNumberList>
       <PhoneNumber Type="0096">
          <Number>555-832-5555</Number>
       </PhoneNumber>
       <PhoneNumber Type="0096">
          <Number>555-999-5555</Number>
       </PhoneNumber>
    </PhoneNumberList>
 </SchoolInfo>'''

def node = new XmlParser().parseText(xml)

def whitelist = [ 'LocalId', 'SchoolName', 'Principal/FirstName', "PhoneNumberList/PhoneNumber/Number" ]*.split('/')

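// Recursively drops child elements that are not the next step of any whitelisted path.
// When no longer path continues through a kept child, its whole subtree is kept.
void loveRemovalMachine(node, whitelist) {
    def elementNamesToKeep = whitelist*.head()
    // children() can also contain text (String) entries, so look at element nodes only
    def elements = node.children().findAll { it instanceof Node }
    elements.findAll { !elementNamesToKeep.contains(it.name()) }.each { node.remove it }
    elements.findAll { elementNamesToKeep.contains(it.name()) }.each { child ->
        // carry on only with the remainders of the paths that pass through this child
        def nextWhitelist = whitelist.findAll { it.head() == child.name() }*.tail().findAll { it }
        if (nextWhitelist) {
            loveRemovalMachine child, nextWhitelist
        }
    }
}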

loveRemovalMachine node, whitelist

println XmlUtil.serialize(node)
