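IV. Java Core Technology (Advanced): XML Parsing

Date: 2021-10-25 01:13:28

I. Concepts

XML (eXtensible Markup Language) is an extensible markup language. Its tags are user-defined and self-describing, it is stored as plain text, it is portable across platforms, systems and programming languages, and it is a W3C standard.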

Form of representation: language (markup) + meaning

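II. Basic Syntax

  • Every start tag must have a matching end tag;
  • Shorthand for an element with no content: <name></name> is equivalent to <name/>;
  • Tag names are case-sensitive: <name> and <Name> are different tags;
  • Every document needs exactly one root element;
  • Tags must be nested in order and must not overlap;
  • Attributes must have a value, and the value must be quoted;
  • Characters such as "<" must be escaped;
  • Comment format: <!-- comment -->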
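Symbol   In XML       Meaning
<        &lt;         less than
>        &gt;         greater than
>=       &gt;=        greater than or equal to
<=       &lt;=        less than or equal to
<>       &lt;&gt;     not equal to
&        &amp;        ampersand
'        &apos;       single quote
"        &quot;       double quote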
<bookStore>
    <book category="COOKING">
        <title>J k</title>
        <year>2005</year>
        <price>29.5</price>
    </book>
    <book category="WEB">
        <title>A D</title>
        <year>2007</year>
        <price>19.5</price>
    </book>
</bookStore>
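
The syntax rules in this section can be tried out with the JDK's own parser. The following is a minimal sketch, assuming a hypothetical helper isWellFormed that reports whether a string parses; it installs a DefaultHandler as error handler only so that parse errors are thrown rather than printed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Hypothetical helper: true if the string parses as well-formed XML
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // a syntax rule was violated
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<name/>"));                    // empty-element shorthand
        System.out.println(isWellFormed("<name></Name>"));              // case mismatch
        System.out.println(isWellFormed("<a><b></a></b>"));             // overlapping tags
        System.out.println(isWellFormed("<book category=COOKING/>"));   // unquoted attribute value
        System.out.println(isWellFormed("<a/><b/>"));                   // two root elements
    }
}
```

Only the first call returns true; each of the others breaks one of the rules listed above.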

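III. XML Parsing Methods

- Tree-based

  • DOM: Document Object Model; well suited to small-scale reading and writing

- Stream-based

  • SAX: Simple API for XML, a stream-based parser (push model); well suited to reading large documents
  • StAX: Streaming API for XML, a stream-based parser (pull model); available since JDK 6

- Built-in libraries: all of the above ship with the JDK

- Third-party libraries:

  • JDOM: www.jdom.org
  • DOM4J: dom4j.github.io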

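1. DOM API

DOM is the W3C standard API for processing XML. It is suited to small-scale XML reading and writing.

  • How it works: the whole XML document is read into memory as a tree, which is then traversed and modified
  • Advantage: intuitive and easy to use
  • Disadvantage: because the whole document is held in memory, parsing very large files risks running out of memory and crashing the program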
<bookStore>
    <book category="COOKING">
        <title>J k</title>
        <year>2005</year>
        <price>29.5</price>
    </book>
    <book category="WEB">
        <title>A D</title>
        <year>2007</year>
        <price>19.5</price>
    </book>
</bookStore>
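
Since DOM keeps the whole tree in memory, the tree can be modified as well as read. The sketch below is one possible illustration, not the author's code: it parses a small bookStore string, appends a book, and serializes the result with the JDK's Transformer. The class DomEditDemo and its helpers parse, addBook and toXml are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomEditDemo {
    // Parse an XML string into an in-memory DOM tree
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Append <book category="..."><title>...</title></book> under the root element
    static void addBook(Document doc, String category, String title) {
        Element book = doc.createElement("book");
        book.setAttribute("category", category);
        Element titleElement = doc.createElement("title");
        titleElement.setTextContent(title);
        book.appendChild(titleElement);
        doc.getDocumentElement().appendChild(book);
    }

    // Serialize the tree back to text, without the XML declaration
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<bookStore><book category=\"WEB\"><title>A D</title></book></bookStore>");
        addBook(doc, "COOKING", "J k");
        System.out.println(doc.getElementsByTagName("book").getLength()); // 2 books after the edit
        System.out.println(toXml(doc));
    }
}
```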

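Notes:

1. Because the XML is read as a text stream, the whitespace between tags also counts as a node, of type #text. For example, <bookStore> above has five child nodes, and the 1st, 3rd and 5th are whitespace #text nodes.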

2. An element is itself a node, and its value is null. To get the text of the title node, go one level further to its child text node and read that node's value:

title.getFirstChild().getNodeValue()
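
Both notes can be observed on a small in-memory document. The sketch below, with the hypothetical class DomTextNodeDemo and helpers parseRoot and childNames, lists the child node names of a book element and then reads the title text in the two ways the DOM API offers: through the child text node, and through getTextContent().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomTextNodeDemo {
    // Parse a string and return its root element
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Names of all direct children, including whitespace text nodes
    static List<String> childNames(Node parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            names.add(children.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element book = parseRoot("<book>\n    <title>J k</title>\n    <year>2005</year>\n</book>");
        System.out.println(childNames(book));                      // [#text, title, #text, year, #text]

        Node title = book.getElementsByTagName("title").item(0);
        System.out.println(title.getNodeValue());                  // null: the element has no value of its own
        System.out.println(title.getFirstChild().getNodeValue());  // J k
        System.out.println(title.getTextContent());                // J k
    }
}
```

Checking getNodeType() == Node.ELEMENT_NODE is an alternative to comparing node names against "#text" when skipping the whitespace nodes.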

import org.w3c.dom.*;
import javax.xml.parsers.*;

/**
 * @author: Shism
 * @Date: Created in 16:25 2023/3/21
 * @Description:
 **/
public class DomReader {
    public static void main(String[] args) {
        recurseXml();
    }
    public static void recurseXml(){
        try
        {
            // Parse the XML with DOM
            DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
            // Parse the file to obtain its Document object, i.e. the tree model
            Document document = documentBuilder.parse("./src/main/resources/book.xml");

            // Read the first-level nodes, i.e. the children of #document
            NodeList docList = document.getChildNodes();
            System.out.println("Node Name1:" + document.getNodeName());

            // Iterate over the children of the Document
            for(int docItem = 0; docItem < docList.getLength(); docItem++){
                // Get all second-level nodes, i.e. the children of bookStore
                NodeList bookStoreList = docList.item(docItem).getChildNodes();
                System.out.println("Node Name2:" + docList.item(docItem).getNodeName());
                for (int bookStoreItem = 0; bookStoreItem < bookStoreList.getLength(); bookStoreItem++){
                    // Take each third-level node and keep only the book elements
                    Node book = bookStoreList.item(bookStoreItem);
                    if(book.getNodeName().equals("book")){
                        System.out.println("Node Name3:" + book.getNodeName());
                        NodeList node = book.getChildNodes();
                        for(int nodeItem = 0; nodeItem < node.getLength(); nodeItem++){
                            if(!node.item(nodeItem).getNodeName().equals("#text")){
                                // The element itself is a node whose value is null; read the text from its first child
                                System.out.println(node.item(nodeItem).getNodeName()+":"+node.item(nodeItem).getNodeValue());
                                System.out.println(node.item(nodeItem).getNodeName()+":"+node.item(nodeItem).getFirstChild().getNodeValue());
                            }
                        }
                    }

                }
            }
            System.out.println("---------------------------------------------");
            // Read the matching nodes directly from the document
            NodeList bookList = document.getElementsByTagName("book");
            for(int bookItem = 0; bookItem < bookList.getLength(); bookItem++){
                NodeList nodeList = bookList.item(bookItem).getChildNodes();
                for(int nodeItem = 0; nodeItem < nodeList.getLength(); nodeItem++){
                    if(!nodeList.item(nodeItem).getNodeName().equals("#text")){
                        System.out.println(nodeList.item(nodeItem).getNodeName()+":"+nodeList.item(nodeItem).getNodeValue());
                        System.out.println(nodeList.item(nodeItem).getNodeName()+":"+nodeList.item(nodeItem).getFirstChild().getNodeValue());
                    }
                }
            }
        }catch (Exception e){
            e.printStackTrace();
        }
    }
}

Output:

Node Name1:#document
Node Name2:bookStore
Node Name3:book
title:null
title:J k
year:null
year:2005
price:null
price:29.5
Node Name3:book
title:null
title:A D
year:null
year:2007
price:null
price:19.5
---------------------------------------------
title:null
title:J k
year:null
year:2005
price:null
price:29.5
title:null
title:A D
year:null
year:2007
price:null
price:19.5

Process finished with exit code 0

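2. SAX: Simple API for XML

SAX parses XML documents with an event/stream model. It is faster and more lightweight than DOM and is suited to reading large XML documents.

Advantages:

  • Selective access: the whole document does not need to be loaded, so memory requirements are low
  • Push model: every node triggers an event, and you write a handler for the events you care about; the parser reports every event

Disadvantages:

  • Because data is read as a stream, it is hard to access several parts of the document at the same time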
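
To make the push model concrete, the sketch below records the order in which the parser calls back into a handler for a tiny document. It is an illustration under assumptions: the class SaxEventOrderDemo and the helper events are hypothetical names, and the parser is obtained through SAXParserFactory rather than the XMLReaderFactory used in the listing that follows.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventOrderDemo {
    // Hypothetical helper: parse the string and return a log of the callbacks received
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() { log.add("startDocument"); }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.add("chars:" + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }

            @Override
            public void endDocument() { log.add("endDocument"); }
        };
        try {
            // The parser drives the loop and pushes each event into the handler
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : events("<book><title>J k</title></book>")) {
            System.out.println(event);
        }
    }
}
```

The handler never asks for the next event; it only reacts, which is why state such as "am I inside a title element" has to be tracked in fields.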
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * @author: Shism
 * @Date: Created in 10:35 2023/3/24
 * @Description:
 **/
public class SAXReader {
    public static void main(String[] args) throws SAXException, IOException {
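        // Note: XMLReaderFactory is deprecated since Java 9; SAXParserFactory is the suggested replacement
        XMLReader parser = XMLReaderFactory.createXMLReader();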
        BookHandler bookHandler = new BookHandler();
        parser.setContentHandler(bookHandler);
        parser.parse("./src/main/resources/book.xml");
        System.out.println(bookHandler.getBookList());
    }
}
class BookHandler extends DefaultHandler{
    private List<String> bookList = new ArrayList<>();
    private boolean isBook = false;

    // Return the list of book titles
    public List<String> getBookList(){
        return bookList;
    }

    // Callback at the start of parsing
    @Override
    public void startDocument ()
            throws SAXException
    {
        System.out.println("start parse XML");
    }

    // Callback at the end of parsing
    @Override
    public void endDocument ()
            throws SAXException
    {
        System.out.println("end parse XML");
    }


    // Called when an element start tag is reached
    @Override
    public void startElement (String uri, String localName,
                              String qName, Attributes attributes)
            throws SAXException
    {
        if(qName.equals("title"))
            isBook = true;
    }

    // Called when character data (text content) is reached
    @Override
    public void characters (char ch[], int start, int length)
            throws SAXException
    {
        String str = new String(ch, start, length);

        if(isBook){
            System.out.println("book name:"+str);
            bookList.add(str);
        }
    }

    // Called when an element end tag is reached
    @Override
    public void endElement (String uri, String localName, String qName)
            throws SAXException
    {
        isBook = false;
    }


}

Output (the titles in book.xml had been changed for this run):

start parse XML
book name:saddfljk
book name:wwwswwwwww
end parse XML
[saddfljk, wwwswwwwww]

Process finished with exit code 0
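
One caveat about the handler above: a SAX parser is allowed to deliver the text of one element in several characters() calls, so adding each call to the list can split a title into pieces. A common remedy, sketched here with the hypothetical class SaxTitleCollector, is to buffer the text between startElement and endElement. SAXParserFactory is used here instead of XMLReaderFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private StringBuilder buffer; // non-null only while inside a <title> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equals(qName)) {
            buffer = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffer != null) {
            buffer.append(ch, start, length); // may be called more than once per title
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(buffer.toString()); // the title is complete only at the end tag
            buffer = null;
        }
    }

    // Hypothetical helper: parse the string and return all title texts
    static List<String> collectTitles(String xml) {
        SaxTitleCollector handler = new SaxTitleCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.titles;
    }

    public static void main(String[] args) {
        String xml = "<bookStore><book><title>A &amp; D</title></book>"
                + "<book><title>J k</title></book></bookStore>";
        System.out.println(collectTitles(xml)); // [A & D, J k]
    }
}
```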

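3. StAX: Streaming API for XML

- The pull mode of the stream model

- The caller walks through the document and pulls the parts of interest from the reader

- Two sets of APIs

  • Cursor-based API: XMLStreamReader
  • Iterator-based API: XMLEventReader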
import javax.xml.stream.*;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.FileNotFoundException;
import java.io.FileReader;

/**
 * @author: Shism
 * @Date: Created in 16:00 2023/3/24
 * @Description:
 **/
public class StaxReader {
    public static final String xml = "./src/main/resources/book.xml";
    public static void main(String[] args) throws XMLStreamException, FileNotFoundException {
        System.out.println("-----------pointer Type------------");
        StaxReader.readByStream();
        System.out.println("-----------Iterator Type-----------");
        StaxReader.readByEvent();

    }

    // Cursor mode (XMLStreamReader)
    public static void readByStream() throws FileNotFoundException, XMLStreamException {
        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(new FileReader(xml));
        // Cursor-based traversal
        while (xmlStreamReader.hasNext()){
            // next() advances the cursor and returns the type of the event it now points at;
            // act when that event is an element start tag, <element>
            if(xmlStreamReader.next() == XMLStreamConstants.START_ELEMENT){
                if("title".equalsIgnoreCase(xmlStreamReader.getLocalName()))
                    System.out.println("title:"+xmlStreamReader.getElementText());
            }
        }
        xmlStreamReader.close();
    }

    // Event mode (XMLEventReader)
    public static void readByEvent() throws FileNotFoundException, XMLStreamException {
        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        // Create the event reader
        XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileReader(xml));
        boolean titleFlag = false;
        while(xmlEventReader.hasNext()){
            // Pull the next event from the input stream
            XMLEvent event = xmlEventReader.nextEvent();
            // If it is a start-element event
            if(event.isStartElement()){
                StartElement element = event.asStartElement();
                if(element.getName().getLocalPart().equals("title")){
                    titleFlag = true;
                    System.out.print("title:");
                }
            }
            // If it is a character-data event
            if(event.isCharacters() && titleFlag){
                titleFlag = false;
                System.out.println(event.asCharacters().getData());
            }
        }
    }
}

Output (the book titles in the XML were changed):

-----------pointer Type------------
title:book one
title:book two
-----------Iterator Type-----------
title:book one
title:book two

Process finished with exit code 0
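
Because the caller drives the loop in pull mode, it can stop as soon as it has what it needs. The following sketch, with the hypothetical class StaxFirstTitleDemo and helper firstTitle, returns the first title and never asks the reader for the rest of the document.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxFirstTitleDemo {
    // Hypothetical helper: text of the first <title> element, or null if there is none
    static String firstTitle(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The caller pulls one event at a time and decides whether to continue
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        return reader.getElementText(); // stop early: later events are never requested
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bookStore><book><title>book one</title></book>"
                + "<book><title>book two</title></book></bookStore>";
        System.out.println(firstTitle(xml)); // book one
    }
}
```

With SAX the same early exit would require throwing an exception from the handler, since the parser, not the caller, controls the loop.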