[Java Gleanings 1] XML Writing Rules and Parsing

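Preface
The weather is great today, so I got up early to start summarizing some commonly used fundamentals.
XML has always felt unfamiliar to me. I mostly use it for configuration files and had never looked closely at its constraint rules, so today, with some free time, I am studying and summarizing them.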

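1, XML basics
    XML stands for Extensible Markup Language. It is a markup language much like HTML, but it is designed to carry data rather than display it. Its tags are not predefined; you define your own.
    What XML is used for:
        XML is one of the most common tools for transferring data between applications, and it is increasingly popular for storing and describing information. Put simply, in development we mainly use XML in two ways:
        a. As a data exchange carrier, for storing and transferring data
        b. As a configuration file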
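2, Writing rules

Notes:
An XML document must have a root element (exactly one)
Every XML tag must have a closing tag
XML tags are case sensitive
XML attribute values must be quoted
Special characters must be escaped (for example, < as &lt; and & as &amp;)
XML tag names cannot contain spaces
Whitespace between tags (spaces, line breaks, tabs) is kept as text nodes
XML elements must be properly nested

XML that follows the rules above is called a well-formed XML document.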
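The rules above can be checked mechanically, because any XML parser rejects a document that is not well-formed. The following is a minimal sketch, assuming the JDK's built-in parser rather than any particular library; the class name `WellFormedCheck` and the sample strings are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original article.
class WellFormedCheck {

    // Returns true when the JDK parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a well-formedness violation is reported as a fatal error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One root, quoted attribute, matching tags
        System.out.println(isWellFormed("<book category=\"WEB\"><title>Learning XML</title></book>")); // true
        // Missing closing tag for <title>
        System.out.println(isWellFormed("<book><title>Learning XML</book>")); // false
        // Attribute value without quotes
        System.out.println(isWellFormed("<book category=WEB/>")); // false
        // Opening and closing tags differ in case
        System.out.println(isWellFormed("<Book></book>")); // false
        // Two root elements
        System.out.println(isWellFormed("<a/><b/>")); // false
    }
}
```

Well-formedness only covers syntax. Whether the right elements appear in the right order is a separate question, which is what DTD and Schema constraints address.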

Before describing the parts of an XML document, we need to understand its tree structure. Here is a simple XML document:

<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>

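Every XML document must first have a root element, which is the parent of all other elements. All the elements in a document form a tree. The terms parent, child, and sibling describe the relationships between elements. Any element can have child elements, and child elements on the same level are siblings.
Any element can have text content and attributes.
    Root: the root element
    Element: an element
    Attribute: an attribute
    Text: text
In development, all of the above are collectively called nodes (Node).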
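To make the node tree concrete, here is a small sketch that walks a parsed document and counts nodes by type. It assumes the JDK's built-in DOM API (`org.w3c.dom`), and the class name `NodeTreeDemo` is hypothetical. Note that in this API attributes hang off their element and are not child nodes, so the walk sees only elements and text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original article.
class NodeTreeDemo {

    // Counts the nodes of the given type in the subtree rooted at `node` (the node itself included).
    static int count(Node node, short nodeType) {
        int total = node.getNodeType() == nodeType ? 1 : 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            total += count(children.item(i), nodeType);
        }
        return total;
    }

    // Parses an XML string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookstore>\n"
                + "  <book category=\"WEB\">\n"
                + "    <title lang=\"en\">Learning XML</title>\n"
                + "  </book>\n"
                + "</bookstore>";
        Document doc = parse(xml);
        // bookstore, book, title
        System.out.println("elements: " + count(doc, Node.ELEMENT_NODE)); // elements: 3
        // "Learning XML" plus four whitespace-only text nodes from the indentation
        System.out.println("text nodes: " + count(doc, Node.TEXT_NODE)); // text nodes: 5
    }
}
```

The four extra text nodes illustrate the rule that spaces, line breaks, and tabs between tags are text nodes too.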

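3, XML uses in detail
1. Exchanging data between different languages -- for storing data, a database is usually used instead
2. Configuration files -- ☆

XML constraints:
        Purpose: state explicitly which elements and attributes may be written, and in what order.
        Kinds: DTD constraints and Schema constraints
        Goal: given an XML constraint, be able to write a matching XML document.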
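1, DTD constraints: used in Struts and Hibernate
            Associating a DTD with an XML document:
                Option 1: internal
                    Format: <!DOCTYPE root-element-name [DTD declarations]>
                Option 2: external, system
                    Format: <!DOCTYPE root-element-name SYSTEM "path to DTD">
                    DTD files use the .dtd extension
                Option 3: external, public
                    Format: <!DOCTYPE root-element-name PUBLIC "DTD name" "path to DTD">

            Elements:
                Format 1: <!ELEMENT element-name (content)>
                Format 2: <!ELEMENT element-name category> (for example EMPTY or ANY)
            Attributes:
                Format: <!ATTLIST element-name attribute-name type default>
                Attribute types:
                    ID: unique within the document
                    CDATA: text
                Defaults:
                    #REQUIRED: the attribute must appear
                    #IMPLIED: the attribute is optional
            Categories:
                #PCDATA: the content is a text string and cannot contain child elements; written as (#PCDATA)
            Symbols:
                +     one or more
                ?     zero or one
                *     zero or more
                |     choice
                ()    grouping
                ,     sequence
DTD constraint example: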

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookstore [
    <!ELEMENT bookstore (book+)>
    <!ELEMENT book (title,author,year,price)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT year (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ATTLIST book category CDATA #REQUIRED>
    <!ATTLIST title lang CDATA #IMPLIED>
]>
<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>
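A DTD only has an effect when the parser is asked to validate. As a rough sketch, the JDK's built-in parser can be switched to validating mode and its validity errors collected; the class name `DtdValidationDemo` and the shortened DTD (only title and price) are assumptions made to keep the example small.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
class DtdValidationDemo {

    // A shortened internal DTD in the style of the bookstore example.
    static final String DTD = "<!DOCTYPE bookstore [\n"
            + "  <!ELEMENT bookstore (book+)>\n"
            + "  <!ELEMENT book (title,price)>\n"
            + "  <!ELEMENT title (#PCDATA)>\n"
            + "  <!ELEMENT price (#PCDATA)>\n"
            + "  <!ATTLIST book category CDATA #REQUIRED>\n"
            + "]>\n";

    // Parses with DTD validation switched on and returns the error messages found.
    static List<String> validate(String xml) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turns on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity errors are recoverable, so collect them
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add(e.getMessage()); // fatal error: the document is not even well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String valid = DTD + "<bookstore><book category=\"WEB\">"
                + "<title>Learning XML</title><price>39.95</price></book></bookstore>";
        String missingCategory = DTD + "<bookstore><book>"
                + "<title>Learning XML</title><price>39.95</price></book></bookstore>";
        System.out.println(validate(valid));           // no errors expected
        System.out.println(validate(missingCategory)); // expected to report the missing #REQUIRED attribute
    }
}
```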

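2, Schema constraints: Spring uses Schema constraints
            Purpose: a replacement for DTD; several schemas can apply to one XML document
            Scenario:
                an XML document contains <table>
                in constraint a, table is a piece of furniture with attributes height and width
                in constraint b, table is a grid with attributes rows and cols
            Namespaces:
                Purpose: identify which constraint document a tag's definition comes from
                Format:
                    Option 1: xmlns="name"
                    Option 2: xmlns:prefix="name"
                Example:
                    table is the piece of furniture
                    b:table is the grid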
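The table scenario can be tried out with a namespace-aware parser, which tells the two kinds of `table` apart by namespace URI rather than by tag name. This is only a sketch using the JDK's DOM API; the class name `NamespaceDemo` and both namespace URIs are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original article.
class NamespaceDemo {

    // Made-up namespace names standing in for constraint a and constraint b.
    static final String FURNITURE_NS = "http://www.example.org/furniture";
    static final String GRID_NS = "http://www.example.org/grid";

    // Counts the <table> elements that belong to the given namespace.
    static int countTables(String xml, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, prefixes are not resolved
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(namespaceUri, "table").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Default namespace = furniture, prefix b = grid
        String xml = "<room xmlns=\"" + FURNITURE_NS + "\" xmlns:b=\"" + GRID_NS + "\">"
                + "<table height=\"75\" width=\"120\"/>"
                + "<b:table rows=\"2\" cols=\"3\"/>"
                + "<b:table rows=\"1\" cols=\"1\"/>"
                + "</room>";
        System.out.println(countTables(xml, FURNITURE_NS)); // 1
        System.out.println(countTables(xml, GRID_NS));      // 2
    }
}
```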
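            Schema syntax:
                File extension: .xsd
                Association
                    1. In the constraint file, bookstore.xsd
                        xmlns="http://www.w3.org/2001/XMLSchema" -- fixed value; it determines which tags may appear in your own constraint file
                        targetNamespace="http://www.example.org/bookstore"
                        Gives this xsd a namespace so that target XML files can reference it. The name is arbitrary; a domain name plus a custom name is the usual choice

                        For example: targetNamespace="bookstore"
                            targetNamespace="http://www.augmentum.com/bookstore"

                        Declare the root element of the target XML
                            <element name="bookstore"></element>

                    2. In the XML file
                        Write the root tag
                        Add the schema constraint
                            1. xmlns="namespace of the constraint" -- the value of targetNamespace in the xsd file
                        For example:
                            xmlns="http://www.augmentum.com/bookstore"

                            2. xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance": fixed form, declaring that this document is constrained by a schema
                            3. Give the location of the schema
                                xsi:schemaLocation="{targetNamespace of the xsd file} {path to the xsd file}"
                    Association summary:
                        First the constraint file, .xsd
                            targetNamespace names the constraint file so that XML files can refer to it
                            the root element must be declared
                        Then the XML file
                            write the root element
                            add the constraint
                                xmlns="name", whose value is the name given in targetNamespace
                                xsi:schemaLocation="name location"
                    Syntax:
                        1. Declare the root element
                            <element name >
                            name: the element's name
                            type: the element's data type
                        2. Declare the element type
                            complex element
                                <complexType>
                            simple element -- rarely seen
                                <simpleType>
                        3. Declare the order:
                            <sequence maxOccurs="3"> in order, like , in DTD
                            <all> any order
                            <choice> alternative, like | in DTD

                            maxOccurs: maximum number of occurrences; unbounded means no upper limit
                            minOccurs: minimum number of occurrences
                        4. Declare attributes
                            <attribute name="category" type="string" use="required" />
                            name: the attribute's name
                            type: the attribute's data type
                            use: like the default value in DTD
                                required: the attribute must appear
                                optional: the attribute is optional

                        5. An element that has attributes but only text content
                            <complexType> --- marks the element as a complex type
                                <simpleContent> --- marks the content as simple, text only
                                    <extension base="string">    -- extends the text content
                                        <attribute name="lang" type="string" /> -- adds the attribute
                                    </extension>
                                </simpleContent>
                            </complexType>
Schema constraint example: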

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="aaa"
    xmlns:tns="aaa"
    elementFormDefault="qualified">
    <element name="bookstore">
        <!--
            1. Declare the root element
                <element name >
                name: the element's name
                type: the element's data type
            2. Declare the element type
                complex element
                    <complexType>
                simple element (rarely seen)
                    <simpleType>
            3. Declare the order:
                <sequence maxOccurs="3">  in order, like , in DTD
                <all> any order
                <choice> alternative, like | in DTD
                maxOccurs: maximum number of occurrences; unbounded means no upper limit
                minOccurs: minimum number of occurrences
            4. Declare attributes
                <attribute name="category" type="string" use="required" />
                name: the attribute's name
                type: the attribute's data type
                use: like the default value in DTD
                    required: the attribute must appear
                    optional: the attribute is optional
         -->
        <complexType>
            <sequence maxOccurs="unbounded" minOccurs="1">
                <element name="book">
                    <complexType>
                        <sequence>
                            <element name="title">
                            </element>
                            <element name="author" type="string" />
                            <element name="year" type="gYear" />
                            <element name="price" type="double" />
                        </sequence>
                        <attribute name="category" type="string" use="optional" />
                    </complexType>
                </element>
            </sequence>
        </complexType>
    </element>
</schema>

The schema above is bookstore.xsd. An XML document that references it:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore xmlns="aaa"
    xsi:schemaLocation="aaa bookstore.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <!-- book elements go here; the schema requires at least one -->
</bookstore>
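As with a DTD, a schema is only enforced when something validates against it. Here is a minimal sketch using the JDK's `javax.xml.validation` package; the class name `SchemaValidationDemo`, the namespace `http://www.example.org/bookstore`, and the cut-down schema (title and price only) are assumptions for the sake of a short example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original article.
class SchemaValidationDemo {

    static final String NS = "http://www.example.org/bookstore"; // assumed target namespace

    // A cut-down schema in the style of bookstore.xsd.
    static final String XSD =
            "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\""
            + " targetNamespace=\"" + NS + "\" elementFormDefault=\"qualified\">"
            + "<element name=\"bookstore\"><complexType><sequence>"
            + "<element name=\"book\" maxOccurs=\"unbounded\"><complexType>"
            + "<sequence>"
            + "<element name=\"title\" type=\"string\"/>"
            + "<element name=\"price\" type=\"double\"/>"
            + "</sequence>"
            + "<attribute name=\"category\" type=\"string\" use=\"required\"/>"
            + "</complexType></element>"
            + "</sequence></complexType></element>"
            + "</schema>";

    // Compiles the schema text; a failure here means the schema itself is broken.
    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is invalid", e);
        }
    }

    // Returns true when the document satisfies the schema.
    static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default validator throws on the first violation
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<bookstore xmlns=\"" + NS + "\"><book category=\"WEB\">"
                + "<title>Learning XML</title><price>39.95</price></book></bookstore>";
        String badPrice = good.replace("39.95", "cheap");
        System.out.println(isValid(good));     // true
        System.out.println(isValid(badPrice)); // false, "cheap" is not a double
    }
}
```

Unlike a DTD, the schema can check data types, which is why the non-numeric price is rejected.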

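4, XML parsing

    Reading the content of an XML document
        Parsing approaches: SAX and DOM
        Differences:
        SAX: reads the document sequentially and reports events as it goes; it is read-only, so it cannot add, delete, or modify nodes
        DOM: loads the whole document into memory as a tree, so full CRUD operations are possible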
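To show what SAX's event style looks like in practice, the sketch below streams through a document and collects each book's category as the opening tags go by, without ever building a tree. It assumes the JDK's built-in SAX parser; the class name `SaxCategoryDemo` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original article.
class SaxCategoryDemo {

    // Streams through the document and collects the category attribute of every <book>.
    static List<String> categories(String xml) {
        List<String> found = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        // Called once for every opening tag, in document order.
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            if ("book".equals(qName)) {
                                found.add(attributes.getValue("category"));
                            }
                        }
                    });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<bookstore>"
                + "<book category=\"COOKING\"><title>Everyday Italian</title></book>"
                + "<book category=\"WEB\"><title>Learning XML</title></book>"
                + "</bookstore>";
        System.out.println(categories(xml)); // [COOKING, WEB]
    }
}
```

Because nothing is kept in memory except what the handler chooses to store, this style suits large files, at the cost of not being able to navigate or modify the document.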
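    Goals:
        Be able to query (read)
        Parse with DOM4J (query operations are enough)
    1. Add the library (the dom4j jar)
    2. Get the Document
    3. Get the root element
    4. Get other nodes
    Common methods: ☆
        SAXReader reader = new SAXReader();
        Document doc = reader.read(file path);
        Get the root element:
            Element root = doc.getRootElement();
        Get other nodes:
        Get the child elements
            List<Element> bookList = root.elements();
        Get a book's attribute
            String value = bookElement.attributeValue("category");
            bookElement.elementText("title"); -- gets the text content of title
        Alternative methods:
            Get the books
                Iterator<Element> it = root.elementIterator();
            Get an attribute:
                bookElement.attribute("category").getValue();
            Get text
                bookElement.element("title").getText();
Parsing example 1: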
XML code:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>

Parsing code:

import java.util.Iterator;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class BookstoreParser {
    public static void main(String[] args) throws DocumentException {
        // Get the Document
        Document document = new SAXReader().read("D:/Users/WangMeng/workspace/day08_XML/dtd/bookstore.xml");
        // Get the root element
        Element root = document.getRootElement();

        /* Approach 1: get the child elements as a list
        List<Element> bookList = root.elements();
        for (Element bookElement : bookList) {
            // Get an attribute
            //String value = bookElement.attributeValue("category");
            //System.out.println(value);
            // Get the text content of a child tag
            String textValue = bookElement.elementText("title");
            System.out.println(textValue);
        }
        */

        // Approach 2: iterate over the child elements
        Iterator<Element> it = root.elementIterator();
        while (it.hasNext()) {
            Element bookElement = it.next();
            // Get an attribute
            //String value = bookElement.attribute("category").getValue();
            // Get the child element, then its text content
            String value = bookElement.element("title").getText();
            System.out.println(value);
        }
    }
}

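XPath parsing (querying)
selectNodes(String): returns a list of nodes, for example with //book
selectSingleNode(String): returns a single node; if the expression matches several, only the first is returned
Add jaxen-1.1-beta-6.jar to the classpath before use
For more detail on XPath, see the XPath part of the W3C XML documentation.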

Parsing example 2:
XML code:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
    </book>
    <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title lang="en">Learning XML</title>
        <author>Erik T. Ray</author>
        <year>2003</year>
        <price>39.95</price>
    </book>
</bookstore>

Parsing code:

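import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class BookstoreXPath {
    public static void main(String[] args) throws DocumentException {
        // Get the Document
        Document document = new SAXReader().read("D:/Users/WangMeng/workspace/day08_XML/dtd/bookstore.xml");

        /*
         * Path match:        /a/b/c
         * Element match:     //c
         * Attribute match:   //c[@attribute='value']
         * Has child element: //c[d]
         */

        // Get the book element with category="WEB"
        //Element bookElement = (Element) document.selectSingleNode("//book[@category='WEB']");
        // Get the first book whose price is greater than 35
        Element bookElement = (Element) document.selectSingleNode("//book[price > 35]");
        System.out.println(bookElement.attributeValue("category"));
    }
}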
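The same XPath patterns can be tried without dom4j and jaxen, since the JDK ships its own XPath engine in `javax.xml.xpath`. The following is a sketch under that assumption, not the dom4j API used above; the class name `XPathDemo` is hypothetical and the XML is a trimmed copy of the bookstore sample.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original article.
class XPathDemo {

    // Trimmed copy of the bookstore sample (title and price only).
    static final String XML = "<bookstore>"
            + "<book category=\"COOKING\"><title>Everyday Italian</title><price>30.00</price></book>"
            + "<book category=\"CHILDREN\"><title>Harry Potter</title><price>29.99</price></book>"
            + "<book category=\"WEB\"><title>Learning XML</title><price>39.95</price></book>"
            + "</bookstore>";

    // Returns the category attribute of every element matched by the XPath expression.
    static List<String> categories(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList matches = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                result.add(((Element) matches.item(i)).getAttribute("category"));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(categories("/bookstore/book"));         // [COOKING, CHILDREN, WEB]
        System.out.println(categories("//book[@category='WEB']")); // [WEB]
        System.out.println(categories("//book[price > 35]"));      // [WEB]
        System.out.println(categories("//book[price < 30]"));      // [CHILDREN]
    }
}
```

Here the NODESET result plays the role of dom4j's selectNodes; taking only the first match would correspond to selectSingleNode.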

That is all for XML. After reading this, XML constraints and parsing should no longer be a worry.
