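There are many open-source C++ libraries for parsing XML, and I won't list them all here. Today I want to talk about RapidXml. I haven't used this library much, so if you spot any mistakes, please point them out. Thanks.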
Links:
Official site: http://rapidxml.sourceforge.net/
Official manual: http://rapidxml.sourceforge.net/manual.html
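I ran into a pitfall the last time I used it, but time was tight and I didn't investigate. Now that I'm using the library again, I don't want to fall into the same pit twice, so I decided to get to the bottom of it.
Two examples first:
Creating the XML: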
#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"

void CreateXml()
{
    rapidxml::xml_document<> doc;

    // Add the XML declaration
    auto nodeDecl = doc.allocate_node(rapidxml::node_declaration);
    nodeDecl->append_attribute(doc.allocate_attribute("version", "1.0"));
    nodeDecl->append_attribute(doc.allocate_attribute("encoding", "UTF-8"));
    doc.append_node(nodeDecl);

    // Create the Root node
    auto nodeRoot = doc.allocate_node(rapidxml::node_element, "Root");
    // Add a comment ("编程语言", i.e. "programming languages") to Root; a comment has no name, so the second argument is NULL
    nodeRoot->append_node(doc.allocate_node(rapidxml::node_comment, NULL, "编程语言"));

    // Create a language node, give it a name attribute, and add it to Root
    auto nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C language");
    nodeLanguage->append_attribute(doc.allocate_attribute("name", "C"));
    nodeRoot->append_node(nodeLanguage);

    // Create a second language node the same way
    nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C++ language");
    nodeLanguage->append_attribute(doc.allocate_attribute("name", "C++"));
    nodeRoot->append_node(nodeLanguage);

    doc.append_node(nodeRoot); // add the Root node to the document

    // Print the document into a string and write it to a file
    std::string buffer;
    rapidxml::print(std::back_inserter(buffer), doc, 0);
    std::ofstream outFile("language.xml");
    outFile << buffer;
    outFile.close();
}
Result:
<?xml version="1.0" encoding="UTF-8"?>
<Root>
	<!--编程语言-->
	<language name="C">This is C language</language>
	<language name="C++">This is C++ language</language>
</Root>
Modifying the XML:
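#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"
#include "rapidxml_utils.hpp" // rapidxml::file

void MotifyXml()
{
    rapidxml::file<> requestFile("language.xml"); // load the XML from a file
    rapidxml::xml_document<> doc;
    doc.parse<0>(requestFile.data()); // parse the XML
    auto nodeRoot = doc.first_node(); // get the first node, i.e. the Root node
    auto nodeLanguage = nodeRoot->first_node("language"); // get the first language node under Root
    nodeLanguage->first_attribute("name")->value("Motify C"); // change the name attribute of that language node to "Motify C"

    // Print the document into a string and write it to a new file
    std::string buffer;
    rapidxml::print(std::back_inserter(buffer), doc, 0);
    std::ofstream outFile("MotifyLanguage.xml");
    outFile << buffer;
    outFile.close();
}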
Result:
<Root>
	<language name="Motify C">This is C language</language>
	<language name="C++">This is C++ language</language>
</Root>
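The second result shows the following:
The name attribute of the first language node was indeed changed to the value we wanted, but the XML declaration and the comment have both disappeared. Why? This puzzled me for a while. Since this is an open-source library, let's step through it and see what it actually does. The code has two suspicious spots: print and parse. Both take a flag argument. What does that flag do? The official tutorial passes 0 in both places. Since print is what runs last, let's start debugging from print.
Here is the print function that gets called: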
template<class OutIt, class Ch>
inline OutIt print(OutIt out, const xml_node<Ch> &node, int flags = 0)
{
    return internal::print_node(out, &node, flags, 0);
}
Stepping further in:
// Print node
template<class OutIt, class Ch>
inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
    // Print proper node type
    switch (node->type())
    {
    // Document
    case node_document:
        out = print_children(out, node, flags, indent);
        break;
    // Element
    case node_element:
        out = print_element_node(out, node, flags, indent);
        break;
    // Data
    case node_data:
        out = print_data_node(out, node, flags, indent);
        break;
    // CDATA
    case node_cdata:
        out = print_cdata_node(out, node, flags, indent);
        break;
    // Declaration
    case node_declaration:
        out = print_declaration_node(out, node, flags, indent);
        break;
    // Comment
    case node_comment:
        out = print_comment_node(out, node, flags, indent);
        break;
    // Doctype
    case node_doctype:
        out = print_doctype_node(out, node, flags, indent);
        break;
    // Pi
    case node_pi:
        out = print_pi_node(out, node, flags, indent);
        break;
    // Unknown
    default:
        assert(0);
        break;
    }
    // If indenting not disabled, add line break after node
    if (!(flags & print_no_indenting))
        *out = Ch('\n'), ++out;
    // Return modified iterator
    return out;
}
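Stepping into print_children shows that this is actually a recursion (it calls print_node for every child node). Let's keep following: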
// Print element node
template<class OutIt, class Ch>
inline OutIt print_element_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
    assert(node->type() == node_element);
    // Print element name and attributes, if any
    if (!(flags & print_no_indenting))
        ... // part of the code omitted
    return out;
}
Notice the & test on flags in the if statement. Here is the definition of print_no_indenting:
// Printing flags
const int print_no_indenting = 0x1; //!< Printer flag instructing the printer to suppress indenting of XML. See print() function.
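To make the role of that & test concrete, here is a minimal self-contained sketch. It does not use RapidXml: ToyNode, print_toy and demo_print_flag are hypothetical names made up for illustration, and only the value 0x1 is taken from the definition quoted above. The point is just that one bit in flags switches the tabs and line breaks on or off.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same value as the print_no_indenting constant quoted above, redefined here
// so that the sketch compiles without the library.
const int print_no_indenting = 0x1;

// Hypothetical toy node (not a RapidXml type): a name plus child nodes.
struct ToyNode {
    std::string name;
    std::vector<ToyNode> children;
};

// Appends the node to `out`. Tabs and line breaks are emitted only when the
// print_no_indenting bit is NOT set in `flags`.
void print_toy(std::string &out, const ToyNode &node, int flags, int indent = 0)
{
    assert(indent >= 0);
    const bool pretty = !(flags & print_no_indenting); // same kind of test as in print_node
    if (pretty) out.append(static_cast<std::size_t>(indent), '\t');
    if (node.children.empty()) {
        out += "<" + node.name + "/>";
    } else {
        out += "<" + node.name + ">";
        if (pretty) out += '\n';
        for (const ToyNode &child : node.children)
            print_toy(out, child, flags, indent + 1); // recursion, one level deeper
        if (pretty) out.append(static_cast<std::size_t>(indent), '\t');
        out += "</" + node.name + ">";
    }
    if (pretty) out += '\n';
}

// Usage demo: the same tree printed with and without the flag.
void demo_print_flag()
{
    const ToyNode root{"Root", {ToyNode{"a", {}}, ToyNode{"b", {}}}};
    std::string pretty, compact;
    print_toy(pretty, root, 0);
    print_toy(compact, root, print_no_indenting);
    assert(pretty == "<Root>\n\t<a/>\n\t<b/>\n</Root>\n");
    assert(compact == "<Root><a/><b/></Root>");
}
```

Calling demo_print_flag() from main runs both variants. A flag value of 0 means "default behaviour", and every bit that is set turns one default off or one extra feature on.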
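From this we can reason as follows: for the sake of a consistent coding style, parse should have flag definitions of the same kind.
(The walk through the parse code is omitted here.)
I also checked the official documentation, and it was indeed as I expected. Below are the descriptions of these flags from the header file; see the official documentation for details.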
// Parsing flags

//! Parse flag instructing the parser to not create data nodes.
//! Text of first data node will still be placed in value of parent element, unless rapidxml::parse_no_element_values flag is also specified.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_data_nodes = 0x1;

//! Parse flag instructing the parser to not use text of first data node as a value of parent element.
//! Can be combined with other flags by use of | operator.
//! Note that child data nodes of element node take precendence over its value when printing.
//! That is, if element has one or more child data nodes <em>and</em> a value, the value will be ignored.
//! Use rapidxml::parse_no_data_nodes flag to prevent creation of data nodes if you want to manipulate data using values of elements.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_element_values = 0x2;

//! Parse flag instructing the parser to not place zero terminators after strings in the source text.
//! By default zero terminators are placed, modifying source text.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_string_terminators = 0x4;

//! Parse flag instructing the parser to not translate entities in the source text.
//! By default entities are translated, modifying source text.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_entity_translation = 0x8;

//! Parse flag instructing the parser to disable UTF-8 handling and assume plain 8 bit characters.
//! By default, UTF-8 handling is enabled.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_utf8 = 0x10;

//! Parse flag instructing the parser to create XML declaration node.
//! By default, declaration node is not created.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_declaration_node = 0x20;

//! Parse flag instructing the parser to create comments nodes.
//! By default, comment nodes are not created.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_comment_nodes = 0x40;

//! Parse flag instructing the parser to create DOCTYPE node.
//! By default, doctype node is not created.
//! Although W3C specification allows at most one DOCTYPE node, RapidXml will silently accept documents with more than one.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_doctype_node = 0x80;

//! Parse flag instructing the parser to create PI nodes.
//! By default, PI nodes are not created.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_pi_nodes = 0x100;

//! Parse flag instructing the parser to validate closing tag names.
//! If not set, name inside closing tag is irrelevant to the parser.
//! By default, closing tags are not validated.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_validate_closing_tags = 0x200;

//! Parse flag instructing the parser to trim all leading and trailing whitespace of data nodes.
//! By default, whitespace is not trimmed.
//! This flag does not cause the parser to modify source text.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_trim_whitespace = 0x400;

//! Parse flag instructing the parser to condense all whitespace runs of data nodes to a single space character.
//! Trimming of leading and trailing whitespace of data is controlled by rapidxml::parse_trim_whitespace flag.
//! By default, whitespace is not normalized.
//! If this flag is specified, source text will be modified.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_normalize_whitespace = 0x800;

// Compound flags

//! Parse flags which represent default behaviour of the parser.
//! This is always equal to 0, so that all other flags can be simply ored together.
//! Normally there is no need to inconveniently disable flags by anding with their negated (~) values.
//! This also means that meaning of each flag is a <i>negation</i> of the default setting.
//! For example, if flag name is rapidxml::parse_no_utf8, it means that utf-8 is <i>enabled</i> by default,
//! and using the flag will disable it.
//! <br><br>
//! See xml_document::parse() function.
const int parse_default = 0;

//! A combination of parse flags that forbids any modifications of the source text.
//! This also results in faster parsing. However, note that the following will occur:
//! <ul>
//! <li>names and values of nodes will not be zero terminated, you have to use xml_base::name_size() and xml_base::value_size() functions to determine where name and value ends</li>
//! <li>entities will not be translated</li>
//! <li>whitespace will not be normalized</li>
//! </ul>
//! See xml_document::parse() function.
const int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;

//! A combination of parse flags resulting in fastest possible parsing, without sacrificing important data.
//! <br><br>
//! See xml_document::parse() function.
const int parse_fastest = parse_non_destructive | parse_no_data_nodes;

//! A combination of parse flags resulting in largest amount of data being extracted.
//! This usually results in slowest parsing.
//! <br><br>
//! See xml_document::parse() function.
const int parse_full = parse_declaration_node | parse_comment_nodes | parse_doctype_node | parse_pi_nodes | parse_validate_closing_tags;
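The flag descriptions above can be turned into a small experiment. The sketch below copies a few of the values from the header excerpt into a hypothetical namespace flags_sketch (so it compiles without RapidXml) and reports which optional node kinds a given flag set asks the parser to create. optional_nodes and demo_parse_flags are illustrative helpers of my own, not part of the library.

```cpp
#include <cassert>
#include <string>

// Values copied from the header excerpt above; the namespace name is made up.
namespace flags_sketch {

const int parse_no_string_terminators = 0x4;
const int parse_no_entity_translation = 0x8;
const int parse_declaration_node = 0x20;
const int parse_comment_nodes = 0x40;
const int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;

// Lists the optional node kinds (of the two that matter here) that the
// given flag set requests; returns "none" if neither bit is set.
std::string optional_nodes(int flags)
{
    assert(flags >= 0);
    std::string result;
    if (flags & parse_declaration_node)
        result += "declaration";
    if (flags & parse_comment_nodes) {
        if (!result.empty()) result += ", ";
        result += "comment";
    }
    if (result.empty()) result = "none";
    return result;
}

} // namespace flags_sketch

// Usage demo: flags 0 versus a combination that keeps declaration and comments.
void demo_parse_flags()
{
    using namespace flags_sketch;
    assert(optional_nodes(0) == "none");
    const int fixed = parse_declaration_node | parse_comment_nodes | parse_non_destructive;
    assert(fixed == 0x6C); // 0x20 | 0x40 | 0x4 | 0x8
    assert(optional_nodes(fixed) == "declaration, comment");
}
```

With flags equal to 0, neither the declaration bit nor the comment bit is set, which matches what we saw: parse<0> never created those nodes, so print had nothing to write for them.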
Based on the information above, let's change the earlier code.
Change
doc.parse<0>(requestFile.data()); // parse the XML
auto nodeRoot = doc.first_node(); // get the first node, i.e. the Root node
to
doc.parse<rapidxml::parse_declaration_node | rapidxml::parse_comment_nodes | rapidxml::parse_non_destructive>(requestFile.data()); // parse the XML, keeping the declaration and the comments
auto nodeRoot = doc.first_node("Root"); // look up the Root node by name
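To explain: parse now receives three flags, which tell the parser to create the declaration node, to create comment nodes, and not to modify the data passed in. The second statement changes because once the XML declaration is parsed, the default first_node() no longer returns the Root node we expect, so we pass the node name to find the node we need. Keep in mind that, as the header comments above state, parse_non_destructive leaves names and values without zero terminators (use name_size() and value_size() to find where they end) and does not translate entities.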
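The lookup issue can be illustrated without the library. In the sketch below, Kind, ToyNode, first_node and first_element are hypothetical stand-ins written for this post, not RapidXml types. The toy first_node treats an empty name as "any node", which only imitates calling RapidXml's first_node() without arguments. The second helper shows an alternative to looking up by name: skip everything that is not an element. With RapidXml itself, the analogous loop would presumably walk next_sibling() until type() equals node_element.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for node types and nodes (not RapidXml types).
enum class Kind { declaration, comment, element };

struct ToyNode {
    Kind kind;
    std::string name;
};

// Returns the first child, or the first child with the given name when
// `name` is non-empty; nullptr when nothing matches.
const ToyNode *first_node(const std::vector<ToyNode> &children, const std::string &name = "")
{
    for (const ToyNode &child : children)
        if (name.empty() || child.name == name)
            return &child;
    return nullptr;
}

// Returns the first child that is an element, skipping declaration and
// comment nodes; nullptr when there is none.
const ToyNode *first_element(const std::vector<ToyNode> &children)
{
    for (const ToyNode &child : children)
        if (child.kind == Kind::element)
            return &child;
    return nullptr;
}

// Usage demo: a document whose first child is the declaration.
void demo_lookup()
{
    const std::vector<ToyNode> children{{Kind::declaration, ""}, {Kind::element, "Root"}};
    assert(first_node(children)->kind == Kind::declaration); // not Root any more
    assert(first_node(children, "Root")->kind == Kind::element);
    assert(first_element(children)->name == "Root");
    assert(first_node(children, "Missing") == nullptr);
}
```

Looking up by name is the shortest fix when the root name is known. Filtering by node type is an option when the code has to work with documents whose root element may be named differently.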
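Notes:
1. When appending, the library does not check whether the item (node, attribute, etc.) already exists.
2. Modifying items (nodes, attributes, etc.) while looping over them invalidates the iteration.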
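Both notes come down to common defensive idioms, sketched here with standard containers instead of RapidXml so that the code is self-contained. Attributes, set_attribute, remove_named and demo_notes are hypothetical names. With RapidXml, I would expect the matching steps to be a first_attribute(name) check before append_attribute, and fetching next_sibling() before calling remove_node on the current node, but treat that mapping as an assumption rather than tested code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy attribute list: (name, value) pairs in insertion order.
using Attributes = std::vector<std::pair<std::string, std::string>>;

// Note 1: check for an existing item first; update it if present,
// append only if it is missing.
void set_attribute(Attributes &attrs, const std::string &name, const std::string &value)
{
    for (auto &attr : attrs) {
        if (attr.first == name) {
            attr.second = value;
            return;
        }
    }
    attrs.emplace_back(name, value);
}

// Note 2: when removing while iterating, obtain the next position before the
// current one becomes invalid. std::list::erase returns that next position.
std::size_t remove_named(std::list<std::string> &siblings, const std::string &name)
{
    std::size_t removed = 0;
    for (auto it = siblings.begin(); it != siblings.end();) {
        if (*it == name) {
            it = siblings.erase(it); // continue from the element after the erased one
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}

// Usage demo for both helpers.
void demo_notes()
{
    Attributes attrs;
    set_attribute(attrs, "name", "C");
    set_attribute(attrs, "name", "Motify C"); // updates, does not duplicate
    assert(attrs.size() == 1 && attrs[0].second == "Motify C");

    std::list<std::string> siblings{"language", "comment", "language"};
    assert(remove_named(siblings, "language") == 2);
    assert(siblings.size() == 1 && siblings.front() == "comment");
}
```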
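Summary: when you use a library written by someone else, there are always some surprises. These are the only problems I have hit so far; if you know of others, feel free to add them. By the way, a "pitfall" does not necessarily mean the open-source library is at fault. More often, we simply have not yet learned to use the tool well.
Thanks to the author of RapidXml for providing such an efficient and convenient tool.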