I am looking for options to efficiently populate a relational SQL Server database from XML files. I picture a three-step process:
- Read the XML from a public URL.
- Populate an intermediate SQL database whose schema mirrors the XML schema.
- Populate the target relational SQL database.
I am not sure whether mapping the XML directly to the target database (i.e. skipping step 2) is easy to achieve; my feeling is that it would make the process somewhat more complicated.
The XML would be read from a public URL, something like www.abc.com/xmlfeed.xml, which would require a nightly routine to make the file available for processing. Something like Windows Task Scheduler, or is there a better way?
I have only two days to make this work, so I would prefer something quick to implement with little coding effort. However, the method needs to be maintainable, since I will receive new XML data every day with the same schema. If the schema changes slightly, I would like adjusting the routine to be hassle-free.
I thought that migrating legacy data to SQL Server would be a task of a few minutes, given how common this requirement is, but to my surprise there is very little discussion or comparison on the internet of different XML migration techniques. I am really unsure which route to take: a pure SQL Server solution like SSIS, or something like XML parsers.
1 solution
#1
1
As I read through your post, my very first idea was SSIS, and at the end you mentioned it yourself. I recommend it, especially if you are familiar with it. You can implement such a solution in two days.
After you have implemented the ETL process, you can create a SQL Server Agent job that schedules your SSIS package to run at the time you want. It supports running packages from SQL Server or from the file system.
EDIT
Regarding your example: it is fully possible to implement such a solution in SSIS. Here are some screenshots of a sample project that processes your XML structure.
The first image shows that the SSIS package consists of three control flow steps, each of which is a Data Flow Task. It processes the manufacturers first, then the models, then the cars.
I implemented only the manufacturers part, shown in images #2 and #3 (they overlap a little). In #2, I read the XML content (XML Source), aggregate it by manufacturer (Aggregate transformation), and then sort the result by manufacturer name (Sort transformation). On the other side, I read the manufacturers that already exist in the SQL database (OLE DB Source) and sort them as well.
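If it helps to see the XML side of this data flow outside the SSIS designer, here is a minimal Python sketch of the same idea: read the feed, collapse it to distinct manufacturers, and sort them. The element and attribute names (`cars`, `car`, `manufacturer`) and the sample document are my assumptions standing in for your real feed, and this is only an illustration of the logic, not what SSIS does internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real structure of your XML may differ.
SAMPLE_XML = """<cars>
  <car manufacturer="Toyota" model="Corolla" year="2010"/>
  <car manufacturer="Audi" model="A4" year="2012"/>
  <car manufacturer="Toyota" model="Yaris" year="2011"/>
</cars>"""


def distinct_sorted_manufacturers(xml_text):
    """Return the distinct manufacturer names in the feed, sorted by name.

    Building a set plays the role of the aggregate step (one row per
    manufacturer); sorted() plays the role of the sort step.
    """
    root = ET.fromstring(xml_text)
    names = {car.get("manufacturer") for car in root.iter("car")}
    names.discard(None)  # ignore <car> elements without the attribute
    return sorted(names)


if __name__ == "__main__":
    print(distinct_sorted_manufacturers(SAMPLE_XML))
```

The sorted output matters because the next step joins this list against the manufacturers already stored in the database.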
After that, the two sorted inputs are combined by a Merge Join transformation, which performs a join much like one in SQL. Here it is a FULL OUTER JOIN, so you can tell which manufacturers are new and which should be deleted. I then split the records into two outputs according to those two conditions (new, deleted).
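To make the join-and-split logic concrete, here is a small Python sketch of a full outer merge over two sorted lists of unique names. The function name `full_outer_merge` and the sample data are hypothetical; the sketch only mirrors the idea. It also shows why both inputs are sorted first: a merge-style join walks the two lists in step, which only works on sorted input.

```python
def full_outer_merge(feed_names, db_names):
    """Merge two sorted lists of unique names and classify each name.

    Returns (new, deleted, matched):
      new     - in the feed but not in the database (rows to insert)
      deleted - in the database but not in the feed (rows to delete)
      matched - present on both sides (candidates for an update)
    """
    new, deleted, matched = [], [], []
    i = j = 0
    while i < len(feed_names) and j < len(db_names):
        if feed_names[i] == db_names[j]:
            matched.append(feed_names[i])
            i += 1
            j += 1
        elif feed_names[i] < db_names[j]:
            # The feed has a name the database has not reached yet: new.
            new.append(feed_names[i])
            i += 1
        else:
            # The database has a name the feed skipped: deleted.
            deleted.append(db_names[j])
            j += 1
    # Whatever remains on either side has no partner on the other side.
    new.extend(feed_names[i:])
    deleted.extend(db_names[j:])
    return new, deleted, matched


if __name__ == "__main__":
    feed = sorted(["Toyota", "Audi", "Kia"])
    db = sorted(["Audi", "Saab"])
    print(full_outer_merge(feed, db))
```

The three returned lists correspond to the outputs of the split: new rows, deleted rows, and (if you need updates) matching rows.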
Finally, I add the new manufacturers through an OLE DB Destination and delete the missing manufacturers with an OLE DB Command. For the latter, I assume there is a stored procedure in SQL Server, called DeleteManufacturer(@ManufacturerName), that deletes the manufacturer and all attached models and cars (a cascade delete).
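I have not shown the stored procedure itself, so here is a rough sketch of the cascade-delete behaviour it is assumed to have. To keep it runnable anywhere, it uses SQLite through Python's built-in `sqlite3` module rather than SQL Server, and the table and column names (`Manufacturer`, `Model`, `Car`, and so on) are invented for illustration. In SQL Server you would put the equivalent logic in the procedure, or declare the foreign keys with ON DELETE CASCADE.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-level schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Manufacturer (
            Id INTEGER PRIMARY KEY,
            Name TEXT UNIQUE NOT NULL);
        CREATE TABLE Model (
            Id INTEGER PRIMARY KEY,
            ManufacturerId INTEGER NOT NULL
                REFERENCES Manufacturer(Id) ON DELETE CASCADE,
            Name TEXT NOT NULL);
        CREATE TABLE Car (
            Id INTEGER PRIMARY KEY,
            ModelId INTEGER NOT NULL
                REFERENCES Model(Id) ON DELETE CASCADE,
            Year INTEGER);
        INSERT INTO Manufacturer VALUES (1, 'Saab'), (2, 'Audi');
        INSERT INTO Model VALUES (1, 1, '9-3'), (2, 2, 'A4');
        INSERT INTO Car VALUES (1, 1, 2008), (2, 2, 2012);
    """)
    return conn


def delete_manufacturer(conn, manufacturer_name):
    """Delete one manufacturer; its models and cars go with it via the cascade."""
    conn.execute("DELETE FROM Manufacturer WHERE Name = ?", (manufacturer_name,))
    conn.commit()


if __name__ == "__main__":
    conn = build_demo_db()
    delete_manufacturer(conn, "Saab")
    print(conn.execute("SELECT Name FROM Manufacturer").fetchall())
    print(conn.execute("SELECT Name FROM Model").fetchall())
    print(conn.execute("SELECT Id, Year FROM Car").fetchall())
```

The parameterised `?` placeholder plays the same role as the `@ManufacturerName` parameter that the OLE DB Command passes for each row.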
The other two data flow tasks should be implemented in the same way. If you also need to update the matching records, the Conditional Split must have three conditions, with a new branch attached to the third one. There, another OLE DB Command can be used with an UPDATE statement.
As I wrote previously, once the package is ready, create a SQL Server Agent job that runs it at night (or at whatever time you wish).