I am working with some XML data that I need to convert to a flat file so I can do statistical analysis. I am analyzing the data using R. Here is what a sample of the data looks like:
<production xmlns="" diffgr:id="production1130" msdata:rowOrder="1129">
<ENTITY_ID>116484210</ENTITY_ID>
<LIQ>0</LIQ>
<GAS>163</GAS>
<WTR>0</WTR>
<WCNT>1</WCNT>
<DAYS>0</DAYS>
</production>
<production xmlns="" diffgr:id="production1131" msdata:rowOrder="1130">
<ENTITY_ID>116484210</ENTITY_ID>
<LIQ>12</LIQ>
<GAS>130</GAS>
<WTR>0</WTR>
<WCNT>1</WCNT>
<DAYS>0</DAYS>
</production>
I would like this to translate to a flat file that looks like this:
PRODUCTION_ID, ENTITY_ID, LIQ, GAS, WTR, WCNT, DAYS
Any suggestions?
Thanks, Z
1 Solution
#1
Simple example:
install.packages("XML")  # one-time install
library("XML")
doc <- xmlInternalTreeParse("/Users/ras/test.xml")  # your path goes here
myframe <- xmlToDataFrame(doc)  # one row per <production>, one column per child element
myframe
Yields:
  ENTITY_ID LIQ GAS WTR WCNT DAYS
1 116484210   0 163   0    1    0
2 116484210  12 130   0    1    0
where test.xml is your sample wrapped in a single root element:
<stuff>
<production xmlns="" diffgr:id="production1130" msdata:rowOrder="1129">
<ENTITY_ID>116484210</ENTITY_ID>
<LIQ>0</LIQ>
<GAS>163</GAS>
<WTR>0</WTR>
<WCNT>1</WCNT>
<DAYS>0</DAYS>
</production>
<production xmlns="" diffgr:id="production1131" msdata:rowOrder="1130">
<ENTITY_ID>116484210</ENTITY_ID>
<LIQ>12</LIQ>
<GAS>130</GAS>
<WTR>0</WTR>
<WCNT>1</WCNT>
<DAYS>0</DAYS>
</production>
</stuff>
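The data frame above has no PRODUCTION_ID column, because that value lives in the diffgr:id attribute rather than in a child element. Here is a minimal sketch of one way to add it and write the flat file, assuming the XML package is already installed. The namespace declarations on <stuff> and the output file name production.csv are my own assumptions for illustration; your real file may declare the diffgr and msdata prefixes elsewhere.

```r
library(XML)

# Inline copy of the sample so the sketch is self-contained.
# Assumption: the xmlns:diffgr / xmlns:msdata declarations are added here so
# the attribute prefixes resolve; they are not part of the posted sample.
xml_text <- '<stuff xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<production xmlns="" diffgr:id="production1130" msdata:rowOrder="1129">
<ENTITY_ID>116484210</ENTITY_ID>
<LIQ>0</LIQ>
<GAS>163</GAS>
<WTR>0</WTR>
<WCNT>1</WCNT>
<DAYS>0</DAYS>
</production>
<production xmlns="" diffgr:id="production1131" msdata:rowOrder="1130">
<ENTITY_ID>116484210</ENTITY_ID>
<LIQ>12</LIQ>
<GAS>130</GAS>
<WTR>0</WTR>
<WCNT>1</WCNT>
<DAYS>0</DAYS>
</production>
</stuff>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select every <production> node, wherever it sits in the document.
nodes <- getNodeSet(doc, "//production")

# One row per <production>, one column per child element.
myframe <- xmlToDataFrame(nodes = nodes, stringsAsFactors = FALSE)

# Read the diffgr:id attribute from each node.
production_id <- sapply(nodes, xmlGetAttr, "diffgr:id")

# Put the id first so the columns match the layout asked for in the question.
flat <- data.frame(PRODUCTION_ID = production_id, myframe,
                   stringsAsFactors = FALSE)

# "production.csv" is a hypothetical output name.
write.csv(flat, "production.csv", row.names = FALSE)
```

The columns come back as character strings, so convert the ones you need (for example with as.numeric) before running any statistics on them.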