使用Sql Server查询XML数据

时间:2020-12-10 15:25:25

I have some xml files that I need to parse and convert to relational data. The data is to be stored in SQL Server. I get the files from different sources and I am running into trouble with one of them because the segment for the purchase order number is repeated. The trouble that I am having is that it creates a duplicate record or row. Here is what the xml looks like.

我有一些xml文件,我需要解析并转换为关系数据。数据将存储在SQL Server中。我从不同的来源获取文件,我遇到了其中一个的问题,因为重复了采购订单编号的段。我遇到的麻烦是它创建了一个重复的记录或行。这是xml的样子。

<?xml version="1.0" encoding="UTF-8"?>
<pt:PTShipmentNotification  xmlns:cmn="urn:tracelink:mapper:sl:commontypes"     xmlns:pt="urn:tracelink:mapper:sl:product_track">
<pt:MessageBody>
<pt:ShipmentTransaction>
<cmn:ShipmentIdentifiers>
<cmn:ShipmentId type="ShipmentNumber">22584</cmn:ShipmentId>
<cmn:ShipmentId type="BillOfLading">2584226516</cmn:ShipmentId>
</cmn:ShipmentIdentifiers>
</pt:ShipmentTransaction>
<pt:ShipmentItemDetails>
<cmn:LineItemNumber>1</cmn:LineItemNumber>
<cmn:LotNumber>FT0109</cmn:LotNumber>
<cmn:OrderNDC type="NDC542">49884066009</cmn:OrderNDC>
<cmn:ExpirationDate>2018-09-30</cmn:ExpirationDate>
<cmn:SalesQuantity quantityUnitOfMeasure="EA">12</cmn:SalesQuantity>
<pt:ReferenceDocuments>
<cmn:BusinessDocument type="PurchaseOrder">024136</cmn:BusinessDocument>
<cmn:DocumentDate>2016-02-03</cmn:DocumentDate>
</pt:ReferenceDocuments>
<pt:ReferenceDocuments>
<cmn:BusinessDocument type="PurchaseOrder">024136</cmn:BusinessDocument>
<cmn:DocumentDate>2016-02-03</cmn:DocumentDate>
</pt:ReferenceDocuments>
</pt:ShipmentItemDetails>
<pt:ShipmentItemDetails>
<cmn:LineItemNumber>2</cmn:LineItemNumber>
<cmn:LotNumber>FN0043</cmn:LotNumber>
<cmn:OrderNDC type="NDC542">49884082710</cmn:OrderNDC>
<cmn:ExpirationDate>2019-01-31</cmn:ExpirationDate>
<cmn:SalesQuantity quantityUnitOfMeasure="EA">36</cmn:SalesQuantity>
<pt:ReferenceDocuments>
<cmn:BusinessDocument type="PurchaseOrder">024136</cmn:BusinessDocument>
<cmn:DocumentDate>2016-02-03</cmn:DocumentDate>
</pt:ReferenceDocuments>
<pt:ReferenceDocuments>
<cmn:BusinessDocument type="Invoice">024136</cmn:BusinessDocument>
<cmn:DocumentDate>2016-02-03</cmn:DocumentDate>
</pt:ReferenceDocuments>
</pt:ShipmentItemDetails>
</pt:MessageBody>
</pt:PTShipmentNotification>

Here is the query that I use to load the file into an xml variable and then query through SQL Server Management Studio. In production I do it using SSIS.

这是我用来将文件加载到xml变量然后通过SQL Server Management Studio查询的查询。在生产中,我使用SSIS。

DECLARE @XmlFile XML

SELECT @XmlFile = BulkColumn
FROM  OPENROWSET(BULK 'C:\Files\in\badxml.xml', SINGLE_BLOB) x

;WITH XMLNAMESPACES (
    'urn:tracelink:mapper:sl:commontypes' AS cmn,
    'urn:tracelink:mapper:sl:product_track' AS pt 
)



select 

    T.data.value('@type', 'varchar(20)') AS ShipType,
    T.data.value('.', 'varchar(35)') AS ShipNum,
    n.value('../cmn:LineItemNumber[1]','VARCHAR(30)') AS LineItem,
    n.value('../cmn:OrderNDC[1]/@type', 'varchar(20)' ) AS OrderNDC,


    n.value('../cmn:OrderNDC[1]','VARCHAR(30)') AS NDC,
    n.value('../cmn:LotNumber[1]','VARCHAR(30)') AS Lot,
    n.value('../cmn:SalesQuantity[1]/@quantityUnitOfMeasure','VARCHAR(30)') AS UOM,
    n.value('../cmn:SalesQuantity[1]','VARCHAR(30)') AS QTY,
    n.value('cmn:BusinessDocument[0]','VARCHAR(50)') AS PO,


    GETDATE() ProcessedDt,
    0 ProcessedFlag,
    @XmlFile

from
    @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentTransaction/cmn:ShipmentIdentifiers/cmn:ShipmentId') T(data)
CROSS APPLY @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentItemDetails/pt:ReferenceDocuments')  x1(n)

WHERE 
    T.data.value('@type', 'varchar(20)') = 'ShipmentNumber' 

    AND n.value('cmn:BusinessDocument[1]/@type','VARCHAR(50)') = 'PurchaseOrder'

Referring to the xml above, If the second occurrence of the purchase order number is called something else such as invoice number (as it usually is), then there is no issue. I get only one record or row per line item number. I am trying to get the sender to fix the file but I am also interested in being able to revise the code to handle the issue. So for line item one in the above xml I get two records. For line item two I only get one record. Does anyone have any suggestions for modifying the query to correct the issue?

参考上面的xml,如果第二次出现的采购订单号被称为其他内容,例如发票号(通常是),则没有问题。每个订单项号只能获得一条记录或行。我试图让发件人修复文件,但我也有兴趣能够修改代码来处理问题。因此,对于上面xml中的第一行,我得到两条记录。对于第二行,我只获得一条记录。有没有人有任何修改查询以纠正问题的建议?

1 个解决方案

#1


0  

Try cross applying to only a single row, instead of trying to filter in the where.

尝试交叉应用到只有一行,而不是尝试在哪里过滤。

CROSS APPLY @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentItemDetails/pt:ReferenceDocuments[@type=''PurchaseOrder''][1]')  x1(n)

Remove the where filter on PurchaseOrder.

删除PurchaseOrder上的where过滤器。

DECLARE @XmlFile XML

SELECT @XmlFile = BulkColumn
FROM  OPENROWSET(BULK 'C:\Files\in\badxml.xml', SINGLE_BLOB) x

;WITH XMLNAMESPACES (
    'urn:tracelink:mapper:sl:commontypes' AS cmn,
    'urn:tracelink:mapper:sl:product_track' AS pt 
)



select 

    T.data.value('@type', 'varchar(20)') AS ShipType,
    T.data.value('.', 'varchar(35)') AS ShipNum,
    n.value('../cmn:LineItemNumber[1]','VARCHAR(30)') AS LineItem,
    n.value('../cmn:OrderNDC[1]/@type', 'varchar(20)' ) AS OrderNDC,


    n.value('../cmn:OrderNDC[1]','VARCHAR(30)') AS NDC,
    n.value('../cmn:LotNumber[1]','VARCHAR(30)') AS Lot,
    n.value('../cmn:SalesQuantity[1]/@quantityUnitOfMeasure','VARCHAR(30)') AS UOM,
    n.value('../cmn:SalesQuantity[1]','VARCHAR(30)') AS QTY,
    n.value('cmn:BusinessDocument[0]','VARCHAR(50)') AS PO,


    GETDATE() ProcessedDt,
    0 ProcessedFlag,
    @XmlFile

from
    @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentTransaction/cmn:ShipmentIdentifiers/cmn:ShipmentId') T(data)
CROSS APPLY @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentItemDetails/pt:ReferenceDocuments[@type=''PurchaseOrder''][1]')  x1(n)

WHERE 
    T.data.value('@type', 'varchar(20)') = 'ShipmentNumber' 

#1


0  

Try cross applying to only a single row, instead of trying to filter in the where.

尝试交叉应用到只有一行,而不是尝试在哪里过滤。

CROSS APPLY @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentItemDetails/pt:ReferenceDocuments[@type=''PurchaseOrder''][1]')  x1(n)

Remove the where filter on PurchaseOrder.

删除PurchaseOrder上的where过滤器。

DECLARE @XmlFile XML

SELECT @XmlFile = BulkColumn
FROM  OPENROWSET(BULK 'C:\Files\in\badxml.xml', SINGLE_BLOB) x

;WITH XMLNAMESPACES (
    'urn:tracelink:mapper:sl:commontypes' AS cmn,
    'urn:tracelink:mapper:sl:product_track' AS pt 
)



select 

    T.data.value('@type', 'varchar(20)') AS ShipType,
    T.data.value('.', 'varchar(35)') AS ShipNum,
    n.value('../cmn:LineItemNumber[1]','VARCHAR(30)') AS LineItem,
    n.value('../cmn:OrderNDC[1]/@type', 'varchar(20)' ) AS OrderNDC,


    n.value('../cmn:OrderNDC[1]','VARCHAR(30)') AS NDC,
    n.value('../cmn:LotNumber[1]','VARCHAR(30)') AS Lot,
    n.value('../cmn:SalesQuantity[1]/@quantityUnitOfMeasure','VARCHAR(30)') AS UOM,
    n.value('../cmn:SalesQuantity[1]','VARCHAR(30)') AS QTY,
    n.value('cmn:BusinessDocument[0]','VARCHAR(50)') AS PO,


    GETDATE() ProcessedDt,
    0 ProcessedFlag,
    @XmlFile

from
    @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentTransaction/cmn:ShipmentIdentifiers/cmn:ShipmentId') T(data)
CROSS APPLY @XmlFile.nodes('pt:PTShipmentNotification/pt:MessageBody/pt:ShipmentItemDetails/pt:ReferenceDocuments[@type=''PurchaseOrder''][1]')  x1(n)

WHERE 
    T.data.value('@type', 'varchar(20)') = 'ShipmentNumber'