How do I grep my XML file and save the output?

Here is part of a huge XML file:

   <caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">4.00000000e+04</c1>
      <c2 unit="V/(nT*Hz)">8.35950000e-06</c2>
      <c3 unit="deg">-1.17930000e+02</c3>
    </caldata>
    <caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">5.55810000e+04</c1>
      <c2 unit="V/(nT*Hz)">4.43400000e-06</c2>
      <c3 unit="deg">-1.58280000e+02</c3>
    </caldata>
    <caldata chopper="on" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">6.00000000e+04</c1>
      <c2 unit="V/(nT*Hz)">3.63180000e-06</c2>
      <c3 unit="deg">-1.67340000e+02</c3>
    </caldata>
    <caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">4.00000000e-01</c1>
      <c2 unit="V/(nT*Hz)">1.07140000e-02</c2>
      <c3 unit="deg">1.48080000e+02</c3>
    </caldata>
    <caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">5.55800000e-01</c1>
      <c2 unit="V/(nT*Hz)">1.33250000e-02</c2>
      <c3 unit="deg">1.39110000e+02</c3>
    </caldata>
    <caldata chopper="off" gain_1="0" gain_2="0" gain_3="0" impedance="(0,0)">
      <c0 unit="V">0.00000000e+00</c0>
      <c1 unit="Hz">7.72300000e-01</c1>
      <c2 unit="V/(nT*Hz)">1.57750000e-02</c2>
      <c3 unit="deg">1.29560000e+02</c3>

I have tried this:

grep '<c1 unit="Hz"' *.xml | cut -f2 -d">"|cut -f1 -d"<"

This works fine, but what I really want is to output values only when caldata chopper="off", and to save the output to a file. How can I do this?

3 solutions

#1

A solution would be to use an XML grep, such as xgrep. I tried it myself on my machine and got this:

$ xgrep -t -x '//caldata[@chopper="off"]/c1[@unit="Hz"]/text()' test.xml 
4.00000000e-01
5.55800000e-01
7.72300000e-01

The secret is the XPath expression:

//caldata[@chopper="off"] - select every caldata element whose chopper attribute equals off;

c1[@unit="Hz"] - from those caldata elements, select the c1 elements whose unit attribute equals Hz;

text() - from those c1 elements, get only the text content.

To save it to an output file, just use the > redirector from the shell. We just need to add it after the command, and then add the name of the file to get the output:

$ xgrep -t -x '//caldata[@chopper="off"]/c1[@unit="Hz"]/text()' test.xml  > output.xml
$ cat output.xml 
4.00000000e-01
5.55800000e-01
7.72300000e-01

I don't know whether you can install an extra tool like this, but if you can, it may be your best solution.

#2

This will do:

awk '/chopper="off"/,/<\/caldata>/{print}' file.xml | grep 'unit="Hz"' | sed 's/^.*">//;s/<.*$//'

The first command (awk) keeps only the blocks that start at a line containing chopper="off" and end at the closing </caldata> tag. The second command (grep) keeps only the lines with the numbers you want. The third command (sed) extracts the number from each line.
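
The same filtering can also be sketched as a single awk program that writes its result to a file, which is the second half of the question. This is only a sketch under assumptions: the file names sample.xml and freqs.txt and the <root> wrapper element are made up for illustration, and it relies on every tag sitting on its own line, as in the excerpt. It is not a real XML parser.

```shell
# Hypothetical sample file for illustration; values are taken from the question.
cat > sample.xml <<'EOF'
<root>
  <caldata chopper="on" gain_1="0">
    <c1 unit="Hz">4.00000000e+04</c1>
  </caldata>
  <caldata chopper="off" gain_1="0">
    <c1 unit="Hz">4.00000000e-01</c1>
  </caldata>
  <caldata chopper="on" gain_1="0">
    <c1 unit="Hz">6.00000000e+04</c1>
  </caldata>
  <caldata chopper="off" gain_1="0">
    <c1 unit="Hz">5.55800000e-01</c1>
  </caldata>
</root>
EOF

# Keep a flag that says whether the current block has chopper="off",
# and print the text of <c1 unit="Hz"> lines only while the flag is set.
awk '
  /<caldata /   { inside = ($0 ~ /chopper="off"/) }  # opening tag: set or clear the flag
  /<\/caldata>/ { inside = 0 }                        # closing tag: leave the block
  inside && /<c1 unit="Hz">/ {
      sub(/^[^>]*>/, "")   # drop everything up to and including the first ">"
      sub(/<.*$/, "")      # drop the closing tag
      print
  }
' sample.xml > freqs.txt   # ">" saves the output to a file (overwriting it)

cat freqs.txt
```

Because the flag is reset at every opening tag, this also handles files where "on" and "off" blocks are interleaved. Use >> instead of > if the output should be appended to an existing file.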

#3

Since you're using grep, I'm going to assume some flavor of *nix and a command-line solution.

In that case, you probably want to look at something like Zorba, which will run an XQuery against your input document and output the parts you want.

If the container element in your data were foo, the XQuery would contain:

for $c in /foo/caldata
return if ($c/@chopper="off")
then $c else ""
