Converting XML data to a flat file in R

Time: 2020-12-26 16:33:08

I am working with some XML data that I need to convert to a flat file so I can do statistical analysis. I am analyzing the data using R. Here is what a sample of the data looks like:

<production xmlns="" diffgr:id="production1130" msdata:rowOrder="1129">
  <ENTITY_ID>116484210</ENTITY_ID>
  <LIQ>0</LIQ>
  <GAS>163</GAS>
  <WTR>0</WTR>
  <WCNT>1</WCNT>
  <DAYS>0</DAYS>
</production>
<production xmlns="" diffgr:id="production1131" msdata:rowOrder="1130">
  <ENTITY_ID>116484210</ENTITY_ID>
  <LIQ>12</LIQ>
  <GAS>130</GAS>
  <WTR>0</WTR>
  <WCNT>1</WCNT>
  <DAYS>0</DAYS>
</production>

I would like this to translate to a flat file that looks like this:

PRODUCTION_ID, ENTITY_ID, LIQ, GAS, WTR, WCNT, DAYS

Any suggestions?

Thanks, Z

1 solution

#1


Simple example:

install.packages("XML")  # one-time install
library("XML")
doc = xmlInternalTreeParse("/Users/ras/test.xml") # your path goes here
myframe = xmlToDataFrame(doc) # one row per <production>, one column per child element
myframe

Yields:

  ENTITY_ID LIQ GAS WTR WCNT DAYS
1 116484210   0 163   0    1    0
2 116484210  12 130   0    1    0

test.xml being:

<stuff>
    <production xmlns="" diffgr:id="production1130" msdata:rowOrder="1129">
      <ENTITY_ID>116484210</ENTITY_ID>
      <LIQ>0</LIQ>
      <GAS>163</GAS>
      <WTR>0</WTR>
      <WCNT>1</WCNT>
      <DAYS>0</DAYS>
    </production>
    <production xmlns="" diffgr:id="production1131" msdata:rowOrder="1130">
      <ENTITY_ID>116484210</ENTITY_ID>
      <LIQ>12</LIQ>
      <GAS>130</GAS>
      <WTR>0</WTR>
      <WCNT>1</WCNT>
      <DAYS>0</DAYS>
    </production>
</stuff>
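
The output above has no PRODUCTION_ID column, because xmlToDataFrame only reads child elements, not attributes. A minimal sketch of one way to add it and write the result to disk follows, assuming PRODUCTION_ID is meant to be the diffgr:id attribute of each <production> node. The xmlns:diffgr and xmlns:msdata declarations on <stuff>, the helper get_production_id, and the output path are all assumptions made for illustration, not part of the original answer.

```r
library(XML)

# Inline copy of the sample. The xmlns:diffgr / xmlns:msdata declarations on
# <stuff> are an assumption, added so the prefixed attributes parse cleanly.
xml_text <- '<stuff xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1"
       xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <production xmlns="" diffgr:id="production1130" msdata:rowOrder="1129">
    <ENTITY_ID>116484210</ENTITY_ID>
    <LIQ>0</LIQ>
    <GAS>163</GAS>
    <WTR>0</WTR>
    <WCNT>1</WCNT>
    <DAYS>0</DAYS>
  </production>
  <production xmlns="" diffgr:id="production1131" msdata:rowOrder="1130">
    <ENTITY_ID>116484210</ENTITY_ID>
    <LIQ>12</LIQ>
    <GAS>130</GAS>
    <WTR>0</WTR>
    <WCNT>1</WCNT>
    <DAYS>0</DAYS>
  </production>
</stuff>'

doc <- xmlParse(xml_text, asText = TRUE)
nodes <- getNodeSet(doc, "//production")

# Child elements become columns, as in the answer above.
flat <- xmlToDataFrame(nodes = nodes, stringsAsFactors = FALSE)

# Hypothetical helper: return the "id" attribute of a node, whether or not
# the parser reports its name with the "diffgr:" prefix.
get_production_id <- function(node) {
  attrs <- xmlAttrs(node)
  local_names <- sub("^.*:", "", names(attrs))
  unname(attrs[local_names == "id"][1])
}

# Put the attribute in front of the element columns.
flat <- data.frame(
  PRODUCTION_ID = vapply(nodes, get_production_id, character(1)),
  flat,
  stringsAsFactors = FALSE
)

# Hypothetical output location; point this wherever the flat file should go.
out_path <- file.path(tempdir(), "production.csv")
write.csv(flat, out_path, row.names = FALSE, quote = FALSE)
```

All columns come back as character strings, so something like type.convert(flat, as.is = TRUE) may be worth running before any statistical analysis.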
