I want to store XML that I receive in my Java web service. Reports will run every 5 minutes to pull some data out of the XML elements.
I thought of two approaches to solve this problem.
- Create multiple tables in the database to capture the XML data. Basically, each element will have its own column in the database.
- Dump the whole XML into a column that can store XML data. For reporting purposes, parse the value in the query itself.
Which of the above approaches is better, particularly in terms of performance? This is critical because reports will be generated at a very high frequency (every 5 minutes).
The XML schema is quite complex, not a simple one.
6 Answers
#1
8
If data is going to be written once and queried many times, it will almost certainly be more efficient to parse the XML document once, store the data in a proper relational schema, and query the relational schema. Parsing XML is not cheap so the overhead of parsing potentially multiple XML documents every 5 minutes could be substantial.
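A minimal sketch of the parse-once idea using only the JDK (Java 16+ for records). The <order> element and its id/customer/amount attributes are hypothetical stand-ins for the real schema, and the in-memory List only stands in for a relational table that you would really populate through JDBC or an ORM. The point is where the parsing cost lands: once, when the document arrives, rather than on every 5-minute report run.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ParseOnce {
    // One row of a hypothetical relational table; element/attribute names are made up.
    record OrderRow(String id, String customer, double amount) {}

    // Parse the XML exactly once, at write time, and flatten it into rows.
    static List<OrderRow> shred(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList orders = doc.getElementsByTagName("order");
        List<OrderRow> rows = new ArrayList<>();
        for (int i = 0; i < orders.getLength(); i++) {
            Element e = (Element) orders.item(i);
            rows.add(new OrderRow(e.getAttribute("id"), e.getAttribute("customer"),
                    Double.parseDouble(e.getAttribute("amount"))));
        }
        return rows;
    }

    // A report run works on already-extracted rows; no XML parsing happens here.
    static double totalFor(List<OrderRow> rows, String customer) {
        return rows.stream()
                .filter(r -> r.customer().equals(customer))
                .mapToDouble(OrderRow::amount)
                .sum();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders>"
                + "<order id='1' customer='acme' amount='10.5'/>"
                + "<order id='2' customer='acme' amount='4.5'/>"
                + "<order id='3' customer='globex' amount='7'/>"
                + "</orders>";
        List<OrderRow> rows = shred(xml);            // parsing cost paid once
        System.out.println(totalFor(rows, "acme"));  // prints 15.0; repeatable cheaply
    }
}
```

In a real system the rows would go into an indexed table and the report would be a plain SQL aggregate, which is exactly what makes this approach friendlier to SQL-oriented reporting tools.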
Of course, as with all performance questions, your mileage may vary, so it may be worth testing. If you are using Oracle 11.2, store the data as binary XML (in which case it is stored after being parsed), and create appropriate XMLIndexes on the XMLTypes you are storing, the performance penalty for leaving the data in the XML document may be quite small. It should still be slower than a proper relational structure, but the difference may not be meaningful to you.
Personally, I'd prefer the relational storage approach in general even ignoring the performance issues because it makes it easier for others to interact with the data. There are far more developers that can write decent SQL than can write decent XPath expressions and there are far more query tools that can generate reports off of relational tables than can generate reports off of XML stored in a database.
#2
5
Maximus, it really depends on what you want to do with the XML data.
When I use XML for control purposes, such as configuring how a page displays, I will store the whole XML in a single BLOB field. It's fast and extremely simple. It's a simple save and load routine. You can easily view the XML in the BLOB field, and edit it.
If you need to search for or report on values inside the XML, such as how many customers have a specific attribute, you probably want to parse it into individual attributes. This generally means you will have to do some pre- and post-processing, but it lets you get to individual attributes quickly.
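To make the trade-off concrete, here is a hedged JDK-only sketch of the "how many customers have a specific attribute" example, done both ways. The <customer tier='...'> structure is a hypothetical schema. countFromBlob re-parses the stored document and runs an XPath count on every call, while preprocess does the extraction once on save; its Map merely stands in for an indexed column.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CustomerTiers {
    // Option 1: XML kept whole, so each report re-parses it and evaluates XPath.
    // (The tier value is concatenated for brevity; real code should not build XPath from input.)
    static int countFromBlob(String xml, String tier) throws Exception {
        Double n = (Double) XPathFactory.newInstance().newXPath().evaluate(
                "count(/customers/customer[@tier='" + tier + "'])",
                new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return n.intValue();
    }

    // Option 2: pre-process once on save, pulling the attribute out into its own "column"
    // (here a tier -> count map standing in for an indexed table).
    static Map<String, Integer> preprocess(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList customers = doc.getElementsByTagName("customer");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < customers.getLength(); i++) {
            String tier = ((Element) customers.item(i)).getAttribute("tier");
            counts.merge(tier, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers><customer tier='gold'/><customer tier='silver'/>"
                + "<customer tier='gold'/></customers>";
        System.out.println(countFromBlob(xml, "gold"));               // prints 2
        System.out.println(preprocess(xml).getOrDefault("gold", 0));  // prints 2
    }
}
```

Both give the same answer; the difference is that option 1 pays the parsing cost on every report run, which is what the answer's "pre-processing" avoids.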
#3
4
Ad Hoc Access
If you need to run efficient queries on the data contained in the XML in an ad hoc or arbitrary manner, you should parse it out into Tables and Columns that can logically be indexed and joined on.
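Since the question mentions a complex schema, nested elements usually have to become parent and child tables linked by a key. The following is a rough JDK-only sketch of that shredding, assuming a hypothetical <order id customer> element containing <item sku qty> children. The Lists play the role of tables and the HashMap plays the role of an index on the order key, so an ad hoc "items JOIN orders" query can be answered without touching the XML again.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Shredder {
    // Two hypothetical "tables": parent rows and child rows pointing back via orderId.
    record Order(String orderId, String customer) {}
    record Item(String orderId, String sku, int qty) {}

    final List<Order> orders = new ArrayList<>();
    final List<Item> items = new ArrayList<>();
    // Stand-in for a database index on Order.orderId, used to resolve joins.
    final Map<String, Order> orderIndex = new HashMap<>();

    void load(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList orderNodes = doc.getElementsByTagName("order");
        for (int i = 0; i < orderNodes.getLength(); i++) {
            Element o = (Element) orderNodes.item(i);
            Order order = new Order(o.getAttribute("id"), o.getAttribute("customer"));
            orders.add(order);
            orderIndex.put(order.orderId(), order);
            // Each nested <item> becomes a child row carrying the parent's key.
            NodeList itemNodes = o.getElementsByTagName("item");
            for (int j = 0; j < itemNodes.getLength(); j++) {
                Element it = (Element) itemNodes.item(j);
                items.add(new Item(order.orderId(), it.getAttribute("sku"),
                        Integer.parseInt(it.getAttribute("qty"))));
            }
        }
    }

    // Ad hoc query: total quantity per customer, i.e. items JOIN orders ON orderId.
    Map<String, Integer> qtyByCustomer() {
        Map<String, Integer> result = new HashMap<>();
        for (Item item : items) {
            Order order = orderIndex.get(item.orderId());
            result.merge(order.customer(), item.qty(), Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders>"
                + "<order id='1' customer='acme'><item sku='A' qty='2'/><item sku='B' qty='3'/></order>"
                + "<order id='2' customer='globex'><item sku='A' qty='1'/></order>"
                + "</orders>";
        Shredder db = new Shredder();
        db.load(xml);
        System.out.println(db.qtyByCustomer().get("acme"));  // prints 5
    }
}
```

In a database the same shape would be two tables with a foreign key and an index on it; the sketch only shows how the nesting maps onto that structure.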
Limited Access
If you are just storing the data and retrieving it by some other criterion, such as a unique ID or other key, and the XML is essentially an opaque BLOB, then just store it in a BLOB column and be done with it.
Hybrid Model
What you will probably need is something in between, where the XML is stored in a BLOB and only the relevant bits are stored in Tables and Columns, so you can search for the XML payload effectively.
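A minimal sketch of the hybrid model, assuming a hypothetical <order customer='...' status='...'> root element. On save, only the searchable fields are pulled out with XPath; the full document is kept untouched as the payload. The Row record and its List are stand-ins for a table with a few indexed columns plus one BLOB/CLOB column.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class HybridStore {
    // One stored row: a few extracted, searchable "columns" next to the untouched payload.
    record Row(long id, String customer, String status, String payload) {}

    private final List<Row> rows = new ArrayList<>();
    private long nextId = 1;

    // On save, parse once and copy out only the fields you search on.
    long save(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        String customer = xp.evaluate("/order/@customer", doc);
        String status = xp.evaluate("/order/@status", doc);
        long id = nextId++;
        rows.add(new Row(id, customer, status, xml));
        return id;
    }

    // Search touches only the extracted columns; the XML payload is returned as-is.
    List<String> find(String customer, String status) {
        return rows.stream()
                .filter(r -> r.customer().equals(customer) && r.status().equals(status))
                .map(Row::payload)
                .toList();
    }

    public static void main(String[] args) throws Exception {
        HybridStore store = new HybridStore();
        store.save("<order customer='acme' status='open'><note>rush</note></order>");
        store.save("<order customer='acme' status='closed'><note>done</note></order>");
        System.out.println(store.find("acme", "open").size());  // prints 1
    }
}
```

Which fields deserve their own column is a judgment call; a reasonable starting point is whatever the 5-minute reports filter or group on.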
#4
1
Without knowing a bit more it is hard to say for sure, but most likely you are missing one important piece that can simplify life a lot:
- Bind from XML to POJOs (JAXB, MOXy or JibX)
- Store as normalized columns from POJO (use jDBI, Hibernate, or even simple JDBC templates)
Also, depending on exactly what kind of reports you produce, consider just keeping the data in memory. Every 5 minutes does not sound performance-critical, and persistence is not always needed (or is needed only for historical data or backups).
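A rough sketch of the keep-it-in-memory option, assuming each incoming XML message has already been bound to a small POJO (the Reading record with sensor/value fields is entirely hypothetical). The web service appends to a thread-safe queue, and a ScheduledExecutorService runs the report every 5 minutes over whatever is in memory. Nothing here persists data; you would add that separately if you need history or backups.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class InMemoryReports {
    // Hypothetical POJO produced by binding one incoming XML message.
    record Reading(String sensor, double value) {}

    // Thread-safe so web-service threads can add while the report thread reads.
    private final ConcurrentLinkedQueue<Reading> readings = new ConcurrentLinkedQueue<>();

    // Called by the web service after binding each incoming XML message.
    void ingest(Reading r) {
        readings.add(r);
    }

    // Report over everything currently held in memory; NaN if there is no data.
    double averageFor(String sensor) {
        return readings.stream()
                .filter(r -> r.sensor().equals(sensor))
                .mapToDouble(Reading::value)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) throws InterruptedException {
        InMemoryReports store = new InMemoryReports();
        store.ingest(new Reading("s1", 2.0));
        store.ingest(new Reading("s1", 4.0));
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Run the report every 5 minutes, starting immediately.
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("s1 avg = " + store.averageFor("s1")),
                0, 5, TimeUnit.MINUTES);
        Thread.sleep(1000);   // demo only: give the first run time to print
        scheduler.shutdown(); // cancels the periodic task so the JVM can exit
    }
}
```

Whether this is viable depends on data volume and on whether losing in-memory data on a restart is acceptable.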
#5
1
If you need to keep and query more than a couple of XML documents, you should use an XML database.
eXist is nice. Keeping those XML documents in a column, or disaggregating them into many tables, is a bad option, I think.
#6
0
You could also check out the xml data type in SQL Server, or XMLType in Oracle: http://msdn.microsoft.com/en-us/library/hh403385.aspx
You could create computed columns on your XML column for the XML fields that are queried most often, which would help speed up retrieval. To retrieve the value at a given XPath, you just pass the XPath expression to SQL Server and it returns the value at that path.