What is the best way to search data in xml files?

In our new project we have to provide search functionality to retrieve data from hundreds of xml files. A brief outline of our current plan is below; I would like to hear your suggestions and improvements.

These xml files contain personal information, and the search is based on 10 elements in them, for example last name, first name, email, etc. Our current plan is to create a master XmlDocument with all the searchable data and a key to the actual file, so that when the user searches the data we first look at the master file and get the results. We will also cache the actual xml files from recent searches so similar searches later can be handled quickly.
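To make the plan above concrete, here is a minimal sketch of a master index plus a small cache of recently loaded files. It is written in Python purely for illustration (the actual application is .NET 2.0), and the element names `LastName`, `FirstName` and `Email` are assumptions, since the real schema is not shown here.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical element names; the real files' schema is not given.
SEARCHABLE = ("LastName", "FirstName", "Email")


def build_master_index(paths):
    """Return a list of (file_key, {field: lowercased value}) entries."""
    index = []
    for path in paths:
        root = ET.parse(path).getroot()
        fields = {}
        for name in SEARCHABLE:
            node = root.find(".//" + name)
            text = node.text if node is not None and node.text else ""
            fields[name] = text.strip().lower()
        index.append((path, fields))
    return index


def search(index, **criteria):
    """Return file keys whose fields equal every criterion (case-insensitive)."""
    wanted = {k: v.strip().lower() for k, v in criteria.items()}
    return [key for key, fields in index
            if all(fields.get(k) == v for k, v in wanted.items())]


@lru_cache(maxsize=64)
def load_document(path):
    """Parse a full XML file; recently used documents stay cached in memory."""
    return ET.parse(path).getroot()
```

A search would first call `search(index, LastName="smith")` to get the file keys and only then call `load_document` on the hits. Note that this is a linear scan over the master index, which is the part the answers below suggest handing to a database or a search library.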

Our application is a .NET 2.0 web application.

5 Solutions

#1

First: how big are the xml files? XmlDocument doesn't scale to "huge"... but can handle "large" OK.

Second: can you perhaps put the data into a regular database structure (perhaps SQL Server Express Edition), index it, and access it via regular T-SQL? That will usually out-perform an XPath search. Equally, if it is structured, SQL Server 2005 and above supports the xml data-type, which shreds data. This allows you to index and query xml data in the database without having the entire DOM in memory (it translates XPath into relational queries).
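As a rough sketch of the "regular database structure" option, the snippet below loads the searchable fields into an indexed table and queries it. It swaps in SQLite (via Python's standard `sqlite3` module) for SQL Server Express so the example is self-contained; the table name, column names and helper functions are all hypothetical.

```python
import sqlite3


def create_index_db(conn):
    """Create a table of searchable fields, keyed by the source file."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS person (
               file_key   TEXT PRIMARY KEY,
               last_name  TEXT,
               first_name TEXT,
               email      TEXT)"""
    )
    # Indexes on the columns users search by most often (assumed here).
    conn.execute("CREATE INDEX IF NOT EXISTS ix_person_last ON person (last_name)")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_person_email ON person (email)")


def add_person(conn, file_key, last_name, first_name, email):
    """Insert or refresh the searchable fields extracted from one XML file."""
    conn.execute(
        "INSERT OR REPLACE INTO person VALUES (?, ?, ?, ?)",
        (file_key, last_name, first_name, email),
    )


def find_by_last_name(conn, last_name):
    """Return the keys of files whose last_name matches exactly."""
    rows = conn.execute(
        "SELECT file_key FROM person WHERE last_name = ? ORDER BY file_key",
        (last_name,),
    )
    return [row[0] for row in rows]
```

The query returns only file keys, so the full xml file is opened just for the rows that matched.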

#2

If you can store the data in a SQL Server database then you could make use of SQL Server's built-in XPath query functionality.

#3

Hmm, sounds like you're building a database over the top of XML. For performance I'd be reading those files into the DB of your choice and letting it handle indexing and searching for you. If that's not an option, get really good with XPath, or roll your own exhaustive search using XmlReader.

XML is not the answer to every problem; however clean it appears to be, performance will suck.
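For the "roll your own exhaustive search" fallback mentioned above, the idea is to stream each file rather than load a full DOM. The sketch below uses Python's `iterparse` as a stand-in for a forward-only reader like XmlReader; the function names and the exact-match rule are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def file_matches(path, element_name, wanted):
    """Stream one file and stop at the first element whose text matches."""
    wanted = wanted.strip().lower()
    for _event, elem in ET.iterparse(path, events=("end",)):
        text = (elem.text or "").strip().lower()
        if elem.tag == element_name and text == wanted:
            return True
        elem.clear()  # release elements that have already been checked
    return False


def exhaustive_search(paths, element_name, wanted):
    """Scan every file; cost grows with the total size of all files."""
    return [p for p in paths if file_matches(p, element_name, wanted)]
```

Every query re-reads every file, which is why this only makes sense when a database or index is not an option.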

#4

Index your XML files. Look into http://incubator.apache.org/lucene.net/

I recently used it at my previous job to cache our SQL database for fast searching and very little overhead.

It provides fast searching of content inside xml files (all depending on how you organize your cache).

Very easy and straightforward to use.

Much easier than trying to loop through a bunch of files.
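To give a feel for why an index beats looping over files, here is a toy inverted index in Python. This is not Lucene.Net's API and leaves out everything a real search library adds (on-disk storage, ranking, analyzers); the class and method names are made up for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each token to the set of file keys that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, file_key, text):
        """Index the searchable text extracted from one file."""
        for token in tokenize(text):
            self._postings[token].add(file_key)

    def search(self, query):
        """Return the file keys that contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

Each lookup touches only the posting sets for the query's tokens, so the cost does not depend on how many files were indexed overall.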

#5

Why don't you store the searchable data in a database table with a key to the actual file? Your search would then run against the database table rather than the xml files. I suppose this would be faster because you can index the table for faster searching.
