Extracting data from two different tags in multi-line XML

I have a sample XML like:

<soap:Body
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<GetRooms_V2Response
    xmlns="http://tempuri.org/">
    <GetRooms>
    <Allocations>
        <AllocationID>426231</AllocationID>
        <AllocationName>Edinburgh Carlton Hotel</AllocationName>
        <ValidFrom>2014-11-01T00:00:00</ValidFrom>
        <ValidTo>2020-12-31T00:00:00</ValidTo>
        <RoomTypes>Double Room</RoomTypes>
        <BookingType>1</BookingType>
        <PriceType>523</PriceType>
        <IsBar>true</IsBar>
        <Days> … (details omitted due to size)
    </Allocations>
    <Allocations>

I want to extract the values enclosed by <AllocationID>...</AllocationID> and <RoomTypes>...</RoomTypes>. I would prefer a one-liner over a multi-line script, because I will be grepping for a few more things before this data is used as input.

I tried something like this, but it reads only a single tag at a time:

sed -n 's:.*AllocationID\(.*\)/AllocationID.*:\1:p' test.xml

and this doesn't work:

sed -n 's:.*AllocationID\(.*\)/AllocationID.*\RoomTypes\(.*\)</RoomTypes).*:\1,\2:p' test.xml

Can anyone please explain what's the best way to do this?

1 Solution

#1

This extracts what you need:

sed -nE 's/(<AllocationID>(.*)<.*|<RoomTypes>(.*)<.*)/\2\3/gp' test.xml

I duplicated the contents of your file for testing. Output:

426231
Double Room
426231
Double Room
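
If the two values are wanted on one line as ID,RoomType (which is what the `\1,\2` in the question's second attempt seems to be aiming for), one possible approach is sed's hold space, since sed only sees one line at a time and a single `s` command cannot span both tags. The following is a minimal sketch, not a definitive solution: it assumes each tag sits on its own line and that `<AllocationID>` always appears before `<RoomTypes>` within an allocation. The temporary file and the trimmed-down sample record are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file: a shortened version of the question's record, repeated twice.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<GetRooms>
    <Allocations>
        <AllocationID>426231</AllocationID>
        <AllocationName>Edinburgh Carlton Hotel</AllocationName>
        <RoomTypes>Double Room</RoomTypes>
    </Allocations>
    <Allocations>
        <AllocationID>426231</AllocationID>
        <AllocationName>Edinburgh Carlton Hotel</AllocationName>
        <RoomTypes>Double Room</RoomTypes>
    </Allocations>
</GetRooms>
EOF

# On an AllocationID line: reduce the line to the value and store it in the hold space (h).
# On a RoomTypes line: reduce the line to the value, append it to the hold space (H),
# swap hold and pattern space (x), turn the joining newline into a comma, and print.
result=$(sed -nE '
  /<AllocationID>/ { s/.*<AllocationID>(.*)<\/AllocationID>.*/\1/; h; }
  /<RoomTypes>/ { s/.*<RoomTypes>(.*)<\/RoomTypes>.*/\1/; H; x; s/\n/,/; p; }
' "$xml")

printf '%s\n' "$result"
rm -f "$xml"
```

The leading `.*` in each pattern also consumes the indentation, so the printed values carry no leading whitespace. Because the output is one record per line, it can be piped straight into further grep filters.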
