
Time: 2022-05-17 09:14:25

I need to extract data from an incoming message that could be in any format. The data to extract and store also depends on the format, e.g. format A could yield fields X, Y, Z, but format B could yield fields A, B, C. I also need to be able to find a message of format B by searching on field C within the message.


Right now I'm configuring and storing the extraction strategy (XSLT) and executing it at runtime when its related format is encountered, but I'm storing the extracted data in an Oracle database as an XmlType column. Oracle seems to have pretty lax development/support for XmlType: it requires an old jar that forces you to use a pretty old DOM DocumentBuilderFactory impl (looks like Java 1.4 code), which collides with Spring 3 and doesn't play very nicely with Hibernate. The XML queries are slow and non-intuitive as well.


I'm concluding that Oracle with XmlType isn't a very good way to store the extracted data, so my question is, what is the best way to store the serialized/queryable data?


  • NoSQL (Cassandra, CouchDB, MongoDB, etc.)?
  • A JCR like JackRabbit?
  • A blob with manual de/serialization?
  • Another Oracle solution?
  • Something else??

2 answers



One alternative that you haven't listed is using an XML database. (Note that Oracle is one of the ten or so XML database products.)


(Obviously, a blob type won't allow querying "inside" the persisted XML objects unless you read each blob instance into memory and do the querying there; e.g. using XSLT.)
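To make the cost of the blob approach concrete, here is a minimal sketch. It assumes a hypothetical `messages` table, uses an in-memory SQLite database as a stand-in for the real store, and uses Python's ElementTree instead of XSLT for the in-memory check. The table, column, and function names are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table and sample documents, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, payload BLOB)")
docs = [
    b"<test><name>Peter</name><info>good cook</info></test>",
    b"<test><name>Paul</name><info>good baker</info></test>",
]
conn.executemany("INSERT INTO messages (payload) VALUES (?)", [(d,) for d in docs])


def find_ids_by_name(conn, wanted):
    """Full scan: load every stored blob and test it in application code."""
    matches = []
    query = "SELECT id, payload FROM messages ORDER BY id"
    for row_id, payload in conn.execute(query):
        root = ET.fromstring(payload)  # every row has to be deserialized
        if root.findtext("name") == wanted:
            matches.append(row_id)
    return matches


print(find_ids_by_name(conn, "Peter"))
```

The database cannot help with the filter here, so the work grows with the number of stored messages no matter how selective the search is.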




I have had great success storing complex XML objects in PostgreSQL. With its functional index feature, you can even create indexes on node values of the stored XML documents, and use those indexes to do very fast lookups via index scans without having to reparse the XML.


This, however, only works if you know your query patterns in advance. Arbitrary XPath queries will still be slow, because no index matches them and every document has to be parsed.


Example (untested, may contain syntax errors):

Create a simple table:

create table test123 (
    id serial primary key,
    myxml xml  -- xml type, so xpath() can be applied without a cast
);

Now let's assume you have XML documents like:

<test>
    <name>Peter</name>
    <info>Peter is a <i>very</i> good cook</info>
</test>

Now create a functional index:

create index idx_test123_name on test123 (((xpath('/test/name/text()', myxml))[1]::text));

Now do your fast XML lookups:

SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text = 'Peter';

You should also consider creating an index using text_pattern_ops, so you can have fast prefix lookups like:

SELECT myxml FROM test123 WHERE (xpath('/test/name/text()', myxml))[1]::text like 'Pe%';


