Generating an XML document from the DOM while skipping certain elements

Time: 2022-11-29 10:32:50

I have HTML that I'm trying to generate an XML document from. I want to skip certain elements (basically all but my divs) and for this purpose, I've written a simple DOM traversal function, but I seem to be getting stuck in an infinite loop. (More details below.)

So I have HTML that looks something like this:

<div id="browserDiv">
    <h3>Library</h3>
    <ul>
        <li>
            <div id="t-0" class="section topic" data-content="2b-2t-38-w-2c-2w-2t-33-36-3d">
                <p>Set Theory</p>
                <img class="toggle"><img class="edit">
                <img class="add-entry"><img class="delete">
                <ul>
                    <li>
                        <div id="t-0-0" class="section topic" data-content="1t-3c-2x-33-31-37">
                            <p>Axioms</p>
                            <img class="toggle"><img class="edit">
                            <img class="add-entry"><img class="delete">
                            <ul>
                                <li>
                                    <div id="t-0-0-0" class="section topic" data-content="1t-3c-2x-33-31-w-33-2u-w-2b-2t-34-2p-36-2p-38-2x-33-32">
                                        <p>Axiom of Separation</p>
                                        <img class="toggle"><img class="edit">
                                        <img class="add-entry"><img class="delete">
                                        <ul>
                                            <li>
                                                <img class="add-new">
                                            </li>
                                        </ul>
                                    </div>
                                </li>
                                <li>
                                    <img class="add-new">
                                </li>
                            </ul>
                        </div>
                    </li>
                    <li>
                        <img class="add-new">
                    </li>
                </ul>
            </div>
        </li>
        <li>
            <div id="t-1" class="section topic" data-content="1t-32-2p-30-3d-37-2x-37">
                <p>Analysis</p>
                <img class="toggle"><img class="edit">
                <img class="add-entry"><img class="delete">
                <ul>
                    <li>
                        <img class="add-new">
                    </li>
                </ul>
            </div>
        </li>
        <li>
            <img class="add-new">
        </li>
    </ul>
</div>

Here's a screenshot:

[Image: Generating an XML document from the DOM while skipping certain elements]

I'm trying to convert this HTML into an XML file. The XML only stores the information contained in the div elements, so I'm trying to skip over all the other elements when I iterate through the DOM tree.

The sort of XML I'm aiming to produce (eventually):

<?xml version="1.0" encoding="UTF-8"?>
<library userid="095209376">
    <title>UserID095209376's Library</title>
    <topic children="yes" loadable="no">
        <id>0</id>
        <encoding>2b-2t-38-w-2c-2w-2t-33-36-3d</encoding>
        <topic children="yes" loadable="no">
            <id>0-0</id>
            <encoding>1t-3c-2x-33-31-37</encoding>
            <topic children="no" loadable="yes">
                <id>0-0-0</id>
                <encoding>1t-3c-2x-33-31-w-33-2u-w-2b-2t-34-2p-36-2p-38-2x-33-32</encoding>
            </topic>
        </topic>
    </topic>
    <topic children="yes" loadable="no">
        <id>1</id>
        <encoding>1t-32-2p-30-3d-37-2x-37</encoding>
    </topic>
</library>

Here's how I'm iterating through it currently:

(Note that the script tags are only there to get SO to do syntax highlighting.)

<script>
function saveLibrary(){

    var xmlDoc = document.implementation.createDocument('http://www.tuningcode.com', 'library');
    var rootNode = document.getElementById('browserDiv');
    console.log("rootNode here: " + rootNode);
    var libraryTree = walkLibraryTree2(rootNode, xmlDoc);
    xmlDoc.documentElement.appendChild(libraryTree);
    var oSerializer = new XMLSerializer();
    var sXML = oSerializer.serializeToString(xmlDoc);
    console.log("xmlDoc: " + xmlDoc);
    console.log(sXML);

}

function walkLibraryTree2(nodeToWalk, doc){

    var elem = doc.createElement(nodeToWalk.tagName);
    console.log(elem);
    if(nodeToWalk.hasChildNodes()){
        var ch = nodeToWalk.children;
        for(var i = 0; i < ch.length; i++){
            var theWalk = walkLibraryTree2(ch[i], doc);
            if(theWalk != null){
                if(ch[i].tagName == 'DIV'){
                    elem.appendChild(theWalk);
                } else{
                    elem = theWalk;
                }
            }
        }
        return elem;
    } else {
        return null;
    }
}

saveLibrary();
</script>

The problem is that when I run it, (edit) it takes much longer than it should and produces something like this:

<library xmlns="http://www.tuningcode.com"><LI xmlns=""/></library>.

In other words, it doesn't print any of the divs, and only one li element. I have it logging to the console quite a bit, and even with only the number of nodes shown above, it prints thousands of statements to the console.

The question:

How can I traverse the tree skipping all but the div elements? Or why is the code above not working correctly?

Here's a JSFiddle:

http://jsfiddle.net/4bGjH/

1 answer

#1


I think you're seeing that very long running time because you call walkLibraryTree2 twice on every iteration of your for loop, so the number of calls roughly doubles with each level of nesting (your HTML is 13 levels deep, which means walkLibraryTree2 is called over 8,000 times, on the order of 2^13).
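
To see where a figure like 8,000 comes from, here is a minimal sketch. It assumes a straight chain of 13 nested nodes (plain objects standing in for DOM elements) and a hypothetical walker that recurses into each child twice, once to test the result and once to use it. The names `makeChain`, `walkTwice`, `walkOnce`, and `countCalls` are made up for illustration and are not from the question.

```javascript
// Build a chain of `depth` nested plain objects standing in for DOM nodes.
function makeChain(depth) {
    var node = { children: [] },
        i;

    for (i = 1; i < depth; i += 1) {
        node = { children: [node] };
    }
    return node;
}

// Hypothetical walker that recurses into every child twice.
function walkTwice(node, counter) {
    var i;

    counter.calls += 1;
    for (i = 0; i < node.children.length; i += 1) {
        if (walkTwice(node.children[i], counter) !== null) { // first call: test
            walkTwice(node.children[i], counter);            // second call: use
        }
    }
    return node;
}

// Same walk, but the result of the single recursive call is stored and reused.
function walkOnce(node, counter) {
    var i, result;

    counter.calls += 1;
    for (i = 0; i < node.children.length; i += 1) {
        result = walkOnce(node.children[i], counter);
        if (result !== null) {
            // use `result` here instead of calling again
        }
    }
    return node;
}

// Run a walker over a chain of the given depth and report how often it ran.
function countCalls(walker, depth) {
    var counter = { calls: 0 };

    walker(makeChain(depth), counter);
    return counter.calls;
}

console.log(countCalls(walkTwice, 13)); // 8191, i.e. 2^13 - 1
console.log(countCalls(walkOnce, 13));  // 13
```

Storing the result of the recursive call in a variable, as `walkOnce` does, keeps the number of calls equal to the number of nodes.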

When working with a complicated problem, it's a good idea to break it down into smaller parts. The following seems to work:

<script>
function saveLibrary() {
    var xmlDoc = document.implementation.createDocument(null, 'library');
    var rootNode = document.getElementById('browserDiv');
    console.log("rootNode here: " + rootNode);

    appendNodes(xmlDoc.documentElement, processChildren(rootNode, xmlDoc));

    var oSerializer = new XMLSerializer();
    var sXML = oSerializer.serializeToString(xmlDoc);
    console.log("xmlDoc: " + xmlDoc);
    console.log(sXML);
}

// DomNode, Document -> Array[DomNode]
function processChildren(node, doc) {
    var nodes = [],
        i;

    for (i = 0; i < node.childNodes.length; i += 1) {
        nodes = nodes.concat(processNode(node.childNodes[i], doc));
    }

    return nodes;
}

// DomNode, Array[DomNode] -> void
function appendNodes(destNode, nodes) {
    var i;

    for (i = 0; i < nodes.length; i += 1) {
        destNode.appendChild(nodes[i]);
    }
}

// DomNode, Document -> Array[DomNode]
function processNode(node, doc) {
    var children = processChildren(node, doc);

    if (node.tagName == "DIV") {
        return [createTopicElement(node, doc, children)];
    } else {
        return children;
    }
}

// DomNode, Document, Array[DomNode] -> DomNode
function createTopicElement(baseNode, doc, children) {
    var el = doc.createElement("topic"),
        hasChildren = !! children.length,
        id = baseNode.id.substring(2),  // strip the "t-" prefix from the div's id
        encoding = baseNode.getAttribute("data-content");

    el.setAttribute("children", hasChildren ? "yes" : "no");
    el.appendChild(createElementWithValue(doc, "id", id));
    el.appendChild(createElementWithValue(doc, "encoding", encoding));
    appendNodes(el, children);

    return el;
}

// Document, String, String -> DomNode
function createElementWithValue(doc, name, value) {
    var el = doc.createElement(name);
    el.textContent = value;
    return el;
}

saveLibrary();    
</script>

This produces the XML:

<library>
    <topic children="yes">
        <id>0</id>
        <encoding>2b-2t-38-w-2c-2w-2t-33-36-3d</encoding>
        <topic children="yes">
            <id>0-0</id>
            <encoding>1t-3c-2x-33-31-37</encoding>
            <topic children="no">
                <id>0-0-0</id>
                <encoding>1t-3c-2x-33-31-w-33-2u-w-2b-2t-34-2p-36-2p-38-2x-33-32</encoding>
            </topic>
        </topic>
    </topic>
    <topic children="no">
        <id>1</id>
        <encoding>1t-32-2p-30-3d-37-2x-37</encoding>
    </topic>
</library>

I don't know how your loadable attribute is determined, or where the title comes from, but this should get you most of the way there.
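
If you want to check the traversal logic outside a browser, here is a minimal sketch of the same idea: a div becomes a topic, and any other element is skipped while the topics found beneath it are passed up to the nearest enclosing topic. It assumes plain objects of the form `{ tag, id, content, children }` in place of DOM nodes. The names `collectTopics` and `toTopics` and the encoding values `"aa"`, `"bb"`, `"cc"` are placeholders, not part of the code above.

```javascript
// Node -> Array[Topic]: gather the topics found among a node's children.
function collectTopics(node) {
    var topics = [],
        i;

    for (i = 0; i < node.children.length; i += 1) {
        topics = topics.concat(toTopics(node.children[i]));
    }
    return topics;
}

// Node -> Array[Topic]: a div becomes one topic wrapping its nested topics;
// any other node is skipped and its nested topics are passed up unchanged.
function toTopics(node) {
    var nested = collectTopics(node);

    if (node.tag === "DIV") {
        return [{
            id: node.id.substring(2), // "t-0-0" -> "0-0"
            encoding: node.content,
            topics: nested
        }];
    }
    return nested;
}

// A cut-down stand-in for the question's markup: div > ul > li > div ...
var root = {
    tag: "DIV", id: "browserDiv", content: null, children: [
        { tag: "UL", children: [
            { tag: "LI", children: [
                { tag: "DIV", id: "t-0", content: "aa", children: [
                    { tag: "P", children: [] },
                    { tag: "UL", children: [
                        { tag: "LI", children: [
                            { tag: "DIV", id: "t-0-0", content: "bb", children: [] }
                        ] }
                    ] }
                ] }
            ] },
            { tag: "LI", children: [
                { tag: "DIV", id: "t-1", content: "cc", children: [] }
            ] }
        ] }
    ]
};

// Start from the root's children so the wrapper div itself is not a topic.
var library = collectTopics(root);
console.log(JSON.stringify(library, null, 2));
```

Here `library` holds two top-level topics, `0` and `1`, with `0-0` nested inside `0`; the ul, li, and p nodes leave no trace in the result.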

http://jsfiddle.net/Weu4A/4/
