Problems I Ran Into with the RapidXml Library, and How I Tracked Them Down

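There are many open-source XML parsing libraries for C++, and I won't list them all here. Today I want to talk about RapidXml. I haven't used this library extensively, so if you spot any mistakes, please point them out. Thanks.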

References:

Official site: http://rapidxml.sourceforge.net/

Official manual: http://rapidxml.sourceforge.net/manual.html

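The first time I used this library I hit a pitfall, but I was short on time and never investigated it. Now that I need the library again, I don't want to fall into the same pit twice, so I decided to get to the bottom of it.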

Here are two examples to start with.

Creating the XML:

    // Headers used by this example and by the later snippets
    #include <fstream>
    #include <iterator>
    #include <string>
    #include "rapidxml.hpp"
    #include "rapidxml_print.hpp"
    #include "rapidxml_utils.hpp"

    void CreateXml()
    {
      rapidxml::xml_document<> doc;
      auto nodeDecl = doc.allocate_node(rapidxml::node_declaration);
      nodeDecl->append_attribute(doc.allocate_attribute("version", "1.0"));
      nodeDecl->append_attribute(doc.allocate_attribute("encoding", "UTF-8"));
      doc.append_node(nodeDecl); // add the XML declaration
      auto nodeRoot = doc.allocate_node(rapidxml::node_element, "Root"); // create the Root node
      // Add a comment to Root; a comment has no name, so the second argument is NULL
      nodeRoot->append_node(doc.allocate_node(rapidxml::node_comment, NULL, "编程语言"));
      auto nodeLangrage = doc.allocate_node(rapidxml::node_element, "language", "This is C language"); // create a language node
      nodeLangrage->append_attribute(doc.allocate_attribute("name", "C")); // add a name attribute to language
      nodeRoot->append_node(nodeLangrage); // add the language node to Root
      nodeLangrage = doc.allocate_node(rapidxml::node_element, "language", "This is C++ language"); // create another language node
      nodeLangrage->append_attribute(doc.allocate_attribute("name", "C++")); // add a name attribute to language
      nodeRoot->append_node(nodeLangrage); // add the language node to Root
      doc.append_node(nodeRoot); // add Root to the document
      std::string buffer;
      rapidxml::print(std::back_inserter(buffer), doc, 0);
      std::ofstream outFile("language.xml");
      outFile << buffer;
      outFile.close();
    }

Result:

    <?xml version="1.0" encoding="UTF-8"?>
    <Root>
      <!--编程语言-->
      <language name="C">This is C language</language>
      <language name="C++">This is C++ language</language>
    </Root>

Modifying the XML:

    void MotifyXml()
    {
      rapidxml::file<> requestFile("language.xml"); // load the XML from a file
      rapidxml::xml_document<> doc;
      doc.parse<0>(requestFile.data()); // parse the XML
      auto nodeRoot = doc.first_node(); // get the first node, which is the Root node
      auto nodeLanguage = nodeRoot->first_node("language"); // get the first language node under Root
      nodeLanguage->first_attribute("name")->value("Motify C"); // change the name attribute of language to "Motify C"
      std::string buffer;
      rapidxml::print(std::back_inserter(buffer), doc, 0);
      std::ofstream outFile("MotifyLanguage.xml");
      outFile << buffer;
      outFile.close();
    }

Result:

    <Root>
      <language name="Motify C">This is C language</language>
      <language name="C++">This is C++ language</language>
    </Root>

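Looking at the second result:

The name attribute of the first language element was changed to the value we wanted, but the XML declaration and the comment have both disappeared. What happened? This puzzled me for a while. Since the library is open source, we can step through it and see what it does. Two places in the code look suspicious: print and parse. Both take a flags argument. What does that argument actually control? The official tutorial passes 0 in both cases. Since print is the last thing executed, let's start tracing there.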

First, the definition of print:

    template<class OutIt, class Ch>
    inline OutIt print(OutIt out, const xml_node<Ch> &node, int flags = 0)
    {
      return internal::print_node(out, &node, flags, 0);
    }

Stepping further in:

    // Print node
    template<class OutIt, class Ch>
    inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
    {
      // Print proper node type
      switch (node->type())
      {
      // Document
      case node_document:
        out = print_children(out, node, flags, indent);
        break;
      // Element
      case node_element:
        out = print_element_node(out, node, flags, indent);
        break;
      // Data
      case node_data:
        out = print_data_node(out, node, flags, indent);
        break;
      // CDATA
      case node_cdata:
        out = print_cdata_node(out, node, flags, indent);
        break;
      // Declaration
      case node_declaration:
        out = print_declaration_node(out, node, flags, indent);
        break;
      // Comment
      case node_comment:
        out = print_comment_node(out, node, flags, indent);
        break;
      // Doctype
      case node_doctype:
        out = print_doctype_node(out, node, flags, indent);
        break;
      // Pi
      case node_pi:
        out = print_pi_node(out, node, flags, indent);
        break;
      // Unknown
      default:
        assert(0);
        break;
      }
      // If indenting not disabled, add line break after node
      if (!(flags & print_no_indenting))
        *out = Ch('\n'), ++out;
      // Return modified iterator
      return out;
    }

Stepping into print_children shows that this is actually a recursion: it calls print_node for each child. Let's keep going:

    // Print element node
    template<class OutIt, class Ch>
    inline OutIt print_element_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
    {
      assert(node->type() == node_element);
      // Print element name and attributes, if any
      if (!(flags & print_no_indenting))
      ... // part of the code omitted
      return out;
    }

Notice the bitwise & test on flags in the if statement above. Here is the definition of print_no_indenting:

    // Printing flags
    const int print_no_indenting = 0x1;  //!< Printer flag instructing the printer to suppress indenting of XML. See print() function.
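To make the bit test concrete, here is a minimal sketch that builds without the library. The constant repeats the value quoted above; indenting_enabled and check_print_flags are hypothetical helpers written for this illustration, not RapidXml functions.

```cpp
#include <cassert>

// Same value as the constant quoted above; redefined here only so that the
// sketch compiles without the RapidXml headers.
constexpr int print_no_indenting = 0x1;

// Hypothetical helper (not a RapidXml function) that repeats the printer's
// test: indenting is on unless the print_no_indenting bit is set.
constexpr bool indenting_enabled(int flags)
{
    return !(flags & print_no_indenting);
}

// Runtime check of the two cases discussed in the text.
inline void check_print_flags()
{
    assert(indenting_enabled(0));                    // print(..., 0): bit clear
    assert(!indenting_enabled(print_no_indenting));  // bit set: no indenting
}
```

So passing 0 to print only means "indent the output". It has no bit that could drop the declaration or the comment, which suggests the nodes were already missing before printing.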

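That gives us something to work with. If the library's style is consistent, parse should have a matching set of flag definitions.

(The trace through parse is omitted here.)

I also checked the official documentation, and it is indeed as I expected. Below are the descriptions of these flags from the header file; see the official documentation for details.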

    // Parsing flags

    //! Parse flag instructing the parser to not create data nodes.
    //! Text of first data node will still be placed in value of parent element, unless rapidxml::parse_no_element_values flag is also specified.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_no_data_nodes = 0x1;

    //! Parse flag instructing the parser to not use text of first data node as a value of parent element.
    //! Can be combined with other flags by use of | operator.
    //! Note that child data nodes of element node take precendence over its value when printing.
    //! That is, if element has one or more child data nodes <em>and</em> a value, the value will be ignored.
    //! Use rapidxml::parse_no_data_nodes flag to prevent creation of data nodes if you want to manipulate data using values of elements.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_no_element_values = 0x2;

    //! Parse flag instructing the parser to not place zero terminators after strings in the source text.
    //! By default zero terminators are placed, modifying source text.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_no_string_terminators = 0x4;

    //! Parse flag instructing the parser to not translate entities in the source text.
    //! By default entities are translated, modifying source text.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_no_entity_translation = 0x8;

    //! Parse flag instructing the parser to disable UTF-8 handling and assume plain 8 bit characters.
    //! By default, UTF-8 handling is enabled.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_no_utf8 = 0x10;

    //! Parse flag instructing the parser to create XML declaration node.
    //! By default, declaration node is not created.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_declaration_node = 0x20;

    //! Parse flag instructing the parser to create comments nodes.
    //! By default, comment nodes are not created.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_comment_nodes = 0x40;

    //! Parse flag instructing the parser to create DOCTYPE node.
    //! By default, doctype node is not created.
    //! Although W3C specification allows at most one DOCTYPE node, RapidXml will silently accept documents with more than one.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_doctype_node = 0x80;

    //! Parse flag instructing the parser to create PI nodes.
    //! By default, PI nodes are not created.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_pi_nodes = 0x100;

    //! Parse flag instructing the parser to validate closing tag names.
    //! If not set, name inside closing tag is irrelevant to the parser.
    //! By default, closing tags are not validated.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_validate_closing_tags = 0x200;

    //! Parse flag instructing the parser to trim all leading and trailing whitespace of data nodes.
    //! By default, whitespace is not trimmed.
    //! This flag does not cause the parser to modify source text.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_trim_whitespace = 0x400;

    //! Parse flag instructing the parser to condense all whitespace runs of data nodes to a single space character.
    //! Trimming of leading and trailing whitespace of data is controlled by rapidxml::parse_trim_whitespace flag.
    //! By default, whitespace is not normalized.
    //! If this flag is specified, source text will be modified.
    //! Can be combined with other flags by use of | operator.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_normalize_whitespace = 0x800;

    // Compound flags

    //! Parse flags which represent default behaviour of the parser.
    //! This is always equal to 0, so that all other flags can be simply ored together.
    //! Normally there is no need to inconveniently disable flags by anding with their negated (~) values.
    //! This also means that meaning of each flag is a <i>negation</i> of the default setting.
    //! For example, if flag name is rapidxml::parse_no_utf8, it means that utf-8 is <i>enabled</i> by default,
    //! and using the flag will disable it.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_default = 0;

    //! A combination of parse flags that forbids any modifications of the source text.
    //! This also results in faster parsing. However, note that the following will occur:
    //! <ul>
    //! <li>names and values of nodes will not be zero terminated, you have to use xml_base::name_size() and xml_base::value_size() functions to determine where name and value ends</li>
    //! <li>entities will not be translated</li>
    //! <li>whitespace will not be normalized</li>
    //! </ul>
    //! See xml_document::parse() function.
    const int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;

    //! A combination of parse flags resulting in fastest possible parsing, without sacrificing important data.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_fastest = parse_non_destructive | parse_no_data_nodes;

    //! A combination of parse flags resulting in largest amount of data being extracted.
    //! This usually results in slowest parsing.
    //! <br><br>
    //! See xml_document::parse() function.
    const int parse_full = parse_declaration_node | parse_comment_nodes | parse_doctype_node | parse_pi_nodes | parse_validate_closing_tags;
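The header excerpt explains the missing nodes: parse<0> is parse_default, which requests neither the declaration node nor comment nodes. The following sketch shows how the bits combine. It copies a few values from the excerpt into a local namespace so that it builds without RapidXml; flags_sketch, has_flag, keep_decl_and_comments, and check_parse_flags are names made up for this illustration.

```cpp
#include <cassert>

// Values copied from the header excerpt above; placed in a local namespace
// (an assumption of this sketch) so it builds without the RapidXml headers.
namespace flags_sketch
{
    constexpr int parse_no_string_terminators = 0x4;
    constexpr int parse_no_entity_translation = 0x8;
    constexpr int parse_declaration_node      = 0x20;
    constexpr int parse_comment_nodes         = 0x40;
    constexpr int parse_default               = 0;
    constexpr int parse_non_destructive =
        parse_no_string_terminators | parse_no_entity_translation;

    // Hypothetical helper: true when every bit of `flag` is set in `flags`.
    constexpr bool has_flag(int flags, int flag)
    {
        return (flags & flag) == flag;
    }

    // The bits needed to keep the declaration and the comments.
    constexpr int keep_decl_and_comments =
        parse_declaration_node | parse_comment_nodes;

    inline void check_parse_flags()
    {
        // parse<0>: neither bit is set, so neither node type is created.
        assert(!has_flag(parse_default, parse_declaration_node));
        assert(!has_flag(parse_default, parse_comment_nodes));
        // OR-ing the flags together sets both bits.
        assert(has_flag(keep_decl_and_comments, parse_declaration_node));
        assert(has_flag(keep_decl_and_comments, parse_comment_nodes));
    }
}
```

Because parse_default is 0, flags only ever add behaviour through |, and there is never a need to mask anything out.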

With this information, we can fix the earlier code.

Change

    doc.parse<0>(requestFile.data()); // parse the XML
    auto nodeRoot = doc.first_node(); // get the first node, which is the Root node

to

    doc.parse<rapidxml::parse_declaration_node | rapidxml::parse_comment_nodes | rapidxml::parse_non_destructive>(requestFile.data()); // parse the XML, keeping the declaration and comments
    auto nodeRoot = doc.first_node("Root"); // look up the Root node by name

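To explain: parse now takes three flags. They tell the parser to create the declaration node, to create comment nodes, and not to modify the buffer passed in. As for the second statement: once the XML declaration is kept as a node, the default first_node() no longer returns the Root node we expect (it returns the declaration node, which is now the document's first child), so we look up the node by name instead.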

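Notes:

1. When appending, the library does not check whether the item (node, attribute, etc.) already exists, so appending the same name twice produces duplicates.

2. Modifying items (nodes, attributes, etc.) while looping over them can invalidate the iteration.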
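The pattern behind note 2 can be sketched without the library. The Node type below is a hypothetical stand-in for a sibling-linked node, not a RapidXml class, and remove_named is a helper invented for this illustration. The only point is the order of operations: read the next sibling before unlinking the current one.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a sibling-linked node; not a RapidXml type.
struct Node
{
    std::string name;
    Node *next;
};

// Unlinks every node called `name` from the list starting at `head` and
// returns the new head. Nodes are not freed; ownership stays with the caller.
inline Node *remove_named(Node *head, const std::string &name)
{
    Node *prev = nullptr;
    for (Node *cur = head; cur != nullptr; )
    {
        Node *next = cur->next;   // read the link BEFORE modifying cur
        if (cur->name == name)
        {
            if (prev)
                prev->next = next;
            else
                head = next;
            cur->next = nullptr;  // detached node no longer knows its sibling
        }
        else
        {
            prev = cur;
        }
        cur = next;               // safe: does not touch the detached node
    }
    return head;
}
```

Had the loop advanced with cur = cur->next after unlinking, it would have stopped at the first removed node, which is the kind of broken iteration the note warns about.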

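Summary: when you use a library someone else wrote, you will always run into surprises. These are the only issues I have hit so far; if you know of others, please add them. I should also say that a "pitfall" does not necessarily mean the open-source library is at fault. More often, we simply haven't learned to use the tool well yet.

Thanks to the author of RapidXml for giving us such an efficient and convenient tool.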
