Shell script to extract certain fields from an XML file

Time: 2021-10-09 00:26:03

I am new to the Linux shell and I can't understand regexes.

Here is my question: I have a directory called /var/visitors and under this directory, I have directories like a, b, c, d. In each of these directories, there is a file called list.xml and here, for example, is the content of list.xml from /var/visitors/a:

<key>Name</key>
<string>Mr Jones</string>
<key>ID</key>
<string>51</string>
<key>Len</key>
<string>53151334</string>

What I want to do is to merge the Name field with its corresponding string and merge the ID field with its corresponding string. I don't need any other fields.

Name: Mr Jones
ID: 51
---
Name: Ms Maggie
ID: 502

Here is how far I got:

cd /var/visitors
find -name "list.xml" | xargs grep ?????

Please help.

5 answers

#1


Not elegant, but this will work:

find . -name "list.xml" | xargs cat | tr -d "\n" | sed 's/<\/string>/\n/g' | sed 's/<\/key>/: /g' | sed 's/<[^>]*>//g' | grep -E "Name:|ID:" | sed 's/Name: /---\nName: /g'

Basically it does this:

  • remove all newlines
  • put each key/value pair on its own line
  • add the ": " separator
  • remove all tags (everything between < and >)
  • keep only the Name and ID fields (drop all others)
  • add the --- separator

Sample Output:

---
Name: Greg
ID: 52
---
Name: Amy
ID: 53
---
Name: Mr Jones
ID: 51
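
For readability, the same steps can be wrapped in a small shell function with one stage per line. This is only a sketch: the function name extract_name_id is made up for illustration, it uses find's -exec with cat instead of xargs, and it assumes GNU sed, since "\n" in the replacement text is not understood by every sed.

```shell
# Hypothetical helper: print the Name/ID pairs of every list.xml under directory $1.
# Assumes GNU sed ("\n" in the replacement text is a GNU extension).
extract_name_id() {
  find "$1" -name "list.xml" -exec cat {} + |
    tr -d '\n' |                      # 1. remove all newlines
    sed 's/<\/string>/\n/g' |         # 2. one key/value pair per line
    sed 's/<\/key>/: /g' |            # 3. add the ": " separator
    sed 's/<[^>]*>//g' |              # 4. strip the remaining tags
    grep -E 'Name:|ID:' |             # 5. keep only the Name and ID fields
    sed 's/Name: /---\nName: /g'      # 6. put --- in front of each record
}
```

It would be called as extract_name_id /var/visitors.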

#2


Grep is not going to help you here; you will need to use something like sed or awk.
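
To illustrate with sed: grep looks at one line at a time, while here each value sits on the line after its key. A minimal sketch, assuming every <key> line is immediately followed by its <string> line as in the sample; the function name name_id_sed is hypothetical:

```shell
# Hypothetical helper: print "Name: ..." and "ID: ..." for the files given as arguments.
# Assumes each <key> line is directly followed by the matching <string> line.
name_id_sed() {
  sed -n '
    /<key>Name<\/key>/ {
      n
      s/.*<string>\(.*\)<\/string>.*/Name: \1/p
    }
    /<key>ID<\/key>/ {
      n
      s/.*<string>\(.*\)<\/string>.*/ID: \1/p
    }
  ' "$@"
}
```

With the directory layout from the question it could be run as name_id_sed /var/visitors/*/list.xml. In each block, n loads the line after the key, and the substitution prints it only if it really is a <string> element.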

#3


This is really dirty, but if you're sure the files always have this format, you could throw some Perl together to parse it, something like:

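# Read the XML from standard input, one line at a time.
while (<STDIN>) {
  if (/<key>([^<]*)</) { print $1 . ": "; }      # key: print it without a newline
  if (/<string>([^<]*)</) { print $1 . "\n"; }   # value: print it and end the line
}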

That may not be perfect, but it comes close to what you're looking for. There is probably a Perl module that will parse XML for you, too, but for such a simple schema I think you'll be OK without it.

#4


Assuming you have the file foo.bar containing the following text:

<key>Name</key>
<string>Mr Jones</string>
<key>ID</key>
<string>51</string>
<key>Len</key>
<string>53151334</string>

Something like this will work:

$ awk -F '[<>]' '{if (FNR%2==1) {printf "%s: ",$3} else {print $3}}' foo.bar
Name: Mr Jones
ID: 51
Len: 53151334

If it's not entirely what you're wanting, shoe-horn it further to meet your specific requirements.

#5


I didn't include the separator line because I wasn't sure if you wanted it or it was just an artifact of using grep. It's easy enough to add it in:

find . -name "list.xml" | xargs awk -F '[<>]' -f xml.awk

And the contents of xml.awk:

$2 != "string" { K=$3 }
$2 == "string" { if ((K == "Name") || (K == "ID")) print K ": " $3 }
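
Since the separator is described as easy to add but not shown, here is one way it might look. This is a sketch rather than the answerer's own script: the function name print_visitors is hypothetical, the awk program is inlined instead of living in xml.awk, and --- is printed whenever awk starts reading a new file after the first one.

```shell
# Hypothetical helper: print Name/ID for every list.xml under directory $1,
# with a --- line between one file's record and the next.
print_visitors() {
  find "$1" -name "list.xml" -exec awk -F '[<>]' '
    FNR == 1 && NR > 1 { print "---" }   # first line of any file except the first
    $2 == "key"        { key = $3 }      # remember the most recent key
    $2 == "string" && (key == "Name" || key == "ID") { print key ": " $3 }
  ' {} +
}
```

Called as print_visitors /var/visitors. One caveat: if there are so many files that find has to start awk more than once, no separator is printed between the batches.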
