Converting an XML file to a CSV file using Java

Time: 2021-02-13 21:50:58

I need help understanding the steps involved in converting an XML file into a CSV file using Java. Here is an example of an XML file:

<?xml version="1.0"?>
<Sites>
<Site id="101" name="NY-01" location="New York">
    <Hosts>
        <Host id="1001">
           <Host_Name>srv001001</Host_Name>
           <IP_address>10.1.2.3</IP_address>
           <OS>Windows</OS>
           <Load_avg_1min>1.3</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1002">
           <Host_Name>srv001002</Host_Name>
           <IP_address>10.1.2.4</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>1.4</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1003">
           <Host_Name>srv001003</Host_Name>
           <IP_address>10.1.2.5</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>3.3</Load_avg_1min>
           <Load_avg_5min>1.6</Load_avg_5min>
           <Load_avg_15min>1.8</Load_avg_15min>
        </Host>
        <Host id="1004">
           <Host_Name>srv001004</Host_Name>
           <IP_address>10.1.2.6</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>2.3</Load_avg_1min>
           <Load_avg_5min>4.5</Load_avg_5min>
           <Load_avg_15min>4.2</Load_avg_15min>
        </Host>     
    </Hosts>
</Site>
</Sites>

And here is the resulting CSV file:

site_id, site_name, site_location, host_id, host_name, ip_address, operative_system, load_avg_1min, load_avg_5min, load_avg_15min
101, NY-01, New York, 1001, srv001001, 10.1.2.3, Windows, 1.3, 2.5, 1.2
101, NY-01, New York, 1002, srv001002, 10.1.2.4, Linux, 1.4, 2.5, 1.2
101, NY-01, New York, 1003, srv001003, 10.1.2.5, Linux, 3.3, 1.6, 1.8
101, NY-01, New York, 1004, srv001004, 10.1.2.6, Linux, 2.3, 4.5, 4.2

I was thinking of using a DOM parser to read the XML file. The problem with that is that I would need to specify particular elements by name in the code, and I want to be able to parse the file without doing that.

Are there any tools or libraries in Java that would help me achieve this?

What if I have an XML file in the format below and want to put the value of InitgPty in the same row as MsgId? (Please note: InitgPty is at the next tag level, so its value is printed in the next row.)

<?xml version="1.0"?>
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>XYZ07/ABC</MsgId>
<NbOfTxs>100000</NbOfTxs>
<InitgPty>
<Nm>XYZ</Nm>
</InitgPty>

5 answers

#1


23  

Here's a working example; data.xml contains your data:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

class Xml2Csv {

    public static void main(String[] args) throws Exception {
        // The stylesheet that describes the CSV layout, and the XML input.
        File stylesheet = new File("src/main/resources/style.xsl");
        File xmlSource = new File("src/main/resources/data.xml");

        // Parse the XML input into a DOM tree.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(xmlSource);

        // Compile the stylesheet into a Transformer.
        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(stylesource);

        // Apply the stylesheet to the DOM tree and write the result as CSV.
        Source source = new DOMSource(document);
        Result outputTarget = new StreamResult(new File("/tmp/x.csv"));
        transformer.transform(source, outputTarget);
    }
}

style.xsl

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:text>Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min&#xA;</xsl:text>
<xsl:for-each select="//Host">
<xsl:value-of select="concat(Host_Name,',',IP_address,',',OS,',',Load_avg_1min,',',Load_avg_5min,',',Load_avg_15min,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Output:

Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min
srv001001,10.1.2.3,Windows,1.3,2.5,1.2
srv001002,10.1.2.4,Linux,1.4,2.5,1.2
srv001003,10.1.2.5,Linux,3.3,1.6,1.8
srv001004,10.1.2.6,Linux,2.3,4.5,4.2

#2


2  

Three steps:

  1. Parse the XML file into a Java XML library object.
  2. Retrieve the relevant data from the object for each row.
  3. Write the results to a text file using native Java functions, saving it with a .csv extension.
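The three steps above could look roughly like the sketch below, which uses the JDK's built-in DOM parser. It is only one possible reading of the steps, not a definitive implementation. The class name DomXmlToCsv and the "record" rule are assumptions made for illustration: any element whose child elements are all leaves (such as Host) is treated as one row, and the attributes of its ancestors are prepended to it, so no element names appear in the code. Attributes are sorted by name because DOM does not keep them in document order, and values are not quoted or escaped.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomXmlToCsv {

    /** Steps 1 and 2: parse the XML and build the CSV text. Writing it to a file is step 3. */
    public static String convert(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        List<String> lines = new ArrayList<>();
        collect(doc.getDocumentElement(), new ArrayList<>(), new ArrayList<>(), lines);
        return String.join("\n", lines) + "\n";
    }

    /** Walks the tree, carrying the attribute columns inherited from ancestors. */
    private static void collect(Element element, List<String> inheritedNames,
                                List<String> inheritedValues, List<String> lines) {
        List<String> names = new ArrayList<>(inheritedNames);
        List<String> values = new ArrayList<>(inheritedValues);

        // DOM does not preserve attribute order, so sort by name for stable columns.
        TreeMap<String, String> attributes = new TreeMap<>();
        NamedNodeMap map = element.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            attributes.put(map.item(i).getNodeName(), map.item(i).getNodeValue());
        }
        attributes.forEach((name, value) -> {
            names.add(element.getTagName() + "_" + name);
            values.add(value);
        });

        List<Element> children = childElements(element);
        // Assumption: a record is an element whose child elements are all leaves.
        boolean isRecord = !children.isEmpty()
                && children.stream().allMatch(c -> childElements(c).isEmpty());
        if (isRecord) {
            for (Element child : children) {
                names.add(child.getTagName());
                values.add(child.getTextContent().trim());
            }
            if (lines.isEmpty()) {
                lines.add(String.join(",", names)); // header comes from the first record
            }
            lines.add(String.join(",", values));
        } else {
            for (Element child : children) {
                collect(child, names, values, lines);
            }
        }
    }

    /** Returns only the element children, skipping whitespace text nodes. */
    private static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }
}
```

For step 3, the returned string can be written with something like Files.writeString(Path.of("hosts.csv"), DomXmlToCsv.convert(xml)) on Java 11 or later. Because the whole document is held in memory, this approach suits small files best.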

#3


2  

Your best bet is to use XSLT to "transform" the XML to CSV. There are some Q/As on SO (like here) that cover how to do this. The key is to provide a schema for your source data so the XSLT transform process knows how to read it and can format the results properly.

Then you can use Xalan to read in the XML, apply the XSLT and output your results.
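As a rough illustration of the XSLT route, the sketch below produces the site columns asked for in the question as well as the host columns. It relies on the standard javax.xml.transform API, so TransformerFactory.newInstance() picks up whichever XSLT processor is available (the JDK's built-in one, or Xalan if it is on the classpath). The class name XsltSiteCsv and the inline stylesheet are assumptions made for illustration. No schema is supplied here; the stylesheet simply addresses nodes by XPath, and it assumes the Site/Hosts/Host nesting of the sample file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSiteCsv {

    // XSLT 1.0 stylesheet kept inline so the sketch is self-contained.
    // ../../@id climbs from Host to Hosts to Site; the inner for-each
    // appends the text of every child element of Host, whatever its name.
    private static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:text>site_id,site_name,site_location,host_id,host_name,ip_address,"
        + "operative_system,load_avg_1min,load_avg_5min,load_avg_15min&#xA;</xsl:text>"
        + "<xsl:for-each select='//Host'>"
        + "<xsl:value-of select='concat(../../@id, \",\", ../../@name, \",\", "
        + "../../@location, \",\", @id)'/>"
        + "<xsl:for-each select='*'>"
        + "<xsl:value-of select='concat(\",\", .)'/>"
        + "</xsl:for-each>"
        + "<xsl:text>&#xA;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the CSV text. */
    public static String convert(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

The header row is still written by hand, so this version is tied to the sample layout. Some processors write the platform line separator for each line feed in text output, so the line endings may differ between systems.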

#4


1  

The answer has already been provided by Pedantic (using the DOM-like approach {Document Object Model}) and Jono (with the SAX-like approach this time) in January.

My opinion is that both methods work well for small files, but the latter works better with big XML files. You didn't mention the actual size of your XML files, but you should take this into account.
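To make the SAX-like option concrete, here is a minimal sketch using the JDK's SAX parser. It is not the code from either of the answers mentioned above. The class name SaxXmlToCsv and the row rule are assumptions for illustration: an element that contains leaf elements is emitted as one row, preceded by the attribute values of its open ancestors. Only the stack of currently open elements is kept in memory, which is why this style copes better with big files. SAX does not guarantee the order in which attributes are reported, there is no header row, and values are not quoted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxXmlToCsv extends DefaultHandler {

    /** State kept for an element only while it is open. */
    private static final class Frame {
        final List<String> attributes = new ArrayList<>();
        final List<String> leafValues = new ArrayList<>();
        boolean hasChildElements;
    }

    private final Deque<Frame> open = new ArrayDeque<>(); // root first, current element last
    private final StringBuilder text = new StringBuilder();
    private final StringBuilder csv = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!open.isEmpty()) {
            open.peekLast().hasChildElements = true;
        }
        Frame frame = new Frame();
        for (int i = 0; i < atts.getLength(); i++) {
            frame.attributes.add(atts.getValue(i));
        }
        open.addLast(frame);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Frame frame = open.removeLast();
        if (!frame.hasChildElements) {
            // A leaf: hand its text to the parent, which may turn out to be a row.
            if (!open.isEmpty()) {
                open.peekLast().leafValues.add(text.toString().trim());
            }
        } else if (!frame.leafValues.isEmpty()) {
            // An element that collected leaf values: emit it as one CSV row.
            List<String> row = new ArrayList<>();
            for (Frame ancestor : open) {
                row.addAll(ancestor.attributes);
            }
            row.addAll(frame.attributes);
            row.addAll(frame.leafValues);
            csv.append(String.join(",", row)).append('\n');
        }
        text.setLength(0);
    }

    /** Parses the XML text and returns the CSV rows. */
    public static String convert(String xml) {
        SaxXmlToCsv handler = new SaxXmlToCsv();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.csv.toString();
    }
}
```

For a really large input, the rows would be written to a Writer as they are produced instead of being collected in a StringBuilder; the string is used here only to keep the sketch short.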

Whatever method is used, a specific program (one that detects special tags tailored to your local XML) will be easier to write but won't work for another XML flavor without code changes, while a more generic program will be harder to devise but will work for all XML files. You said you wanted to be able to parse a file without specifying particular element names, so I guess the generic approach is what you prefer. I agree with that, but please note that it's easier said than done. Indeed, I had the same problem in January too, this time involving a big XML file (>>100 MB), and I was surprised that nothing was available on the Internet so far. Turning frustration into something better is always a good thing, so I decided to deal with that specific problem myself in the most generic way, with special attention to the big-XML-file issue.

You might be interested to know that the generic Java library I wrote, which is now published as free software, converted your XML file into CSV the way you expected (in -x -u mode {please refer to the documentation for further information}).

So the answer to the last part of your question is: yes, there is at least one library that will help you achieve your goal: mine, which is named "XML2CSV-Generic-Converter". There might be other ones of course, and certainly better ones, but I couldn't find any decent (free) one myself.

I won't provide any link here, to comply with Peter Foti's judicious remark, but if you enter "XML2CSV-Generic-Converter" in your favorite search engine you should find it easily.

#5


0  

Your file looks really flat and simple. You don't necessarily need an XML parser to convert it. Just read it with LineNumberReader.readLine() and use a regexp to extract specific fields.
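A minimal sketch of that line-by-line idea follows. The class name RegexXmlToCsv, the recordTag parameter and the pattern are all assumptions for illustration. It only works when every leaf element sits on its own line, as in the sample file, and it breaks on attributes on leaf elements, entities, CDATA sections or values that span several lines, so treat it as a shortcut for well-behaved input rather than a general solution.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexXmlToCsv {

    // A complete leaf element on one line, e.g. <OS>Linux</OS>; group 2 is its text.
    private static final Pattern LEAF = Pattern.compile("<(\\w+)>([^<]*)</\\1>");

    /** Emits one CSV row for each closing tag of recordTag (for example "Host"). */
    public static String convert(String xml, String recordTag) {
        String closingTag = "</" + recordTag + ">";
        StringBuilder csv = new StringBuilder();
        List<String> row = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(xml))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher leaf = LEAF.matcher(line);
                if (leaf.find()) {
                    row.add(leaf.group(2).trim());       // collect the field value
                } else if (line.trim().equals(closingTag)) {
                    csv.append(String.join(",", row)).append('\n');
                    row.clear();                          // start the next record
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return csv.toString();
    }
}
```

Calling RegexXmlToCsv.convert(xml, "Host") on the sample data yields the host fields only; the site attributes would need a second pattern.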

Another option is to use StAX, a streaming API for XML processing. It's pretty simple and you don't need to load the whole document in RAM.

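The StAX option might be sketched as follows, using the javax.xml.stream API that ships with the JDK. The class name StaxXmlToCsv and the recordTag parameter are assumptions for illustration. The reader pulls one event at a time, so the document is never loaded as a whole. The sketch assumes every child of the record element contains text only, and it does not quote values or write a header row.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxXmlToCsv {

    /** Emits one CSV row per recordTag element: its attributes, then its children's text. */
    public static String convert(String xml, String recordTag) {
        StringBuilder csv = new StringBuilder();
        List<String> row = null; // non-null only while inside a record element
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (reader.getLocalName().equals(recordTag)) {
                        row = new ArrayList<>();
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            row.add(reader.getAttributeValue(i));
                        }
                    } else if (row != null) {
                        // Reads the text and moves the reader to this child's end tag.
                        row.add(reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && row != null && reader.getLocalName().equals(recordTag)) {
                    csv.append(String.join(",", row)).append('\n');
                    row = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return csv.toString();
    }
}
```

With the sample file, StaxXmlToCsv.convert(xml, "Host") gives one row per host starting with the host id. Reading from a FileReader or InputStream instead of a StringReader keeps memory use flat for large files.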
