How to improve XML read/write performance

Date: 2022-07-07 04:17:19

I have two .NET applications that run independently (they can start in any order, or only one may be running) and use an XML file as their data store, so both applications read and write the same file. To keep the data current, I load the XML file from disk before every read and write operation, and I use an XPath query to find a particular node. This approach has a performance problem: one application issues read and write requests against the XML every second (it uses polling, and that cannot be changed). I'm not sure what exactly is causing the performance hit, but I believe it is the continuous reading and writing.

I tried using memory-mapped files from .NET 4.0, but I'm restricted to .NET 3.5 and cannot use any higher version.

Can anyone help me out with this?

Note: the XML nodes have some common attributes, a varying number of other attributes, and one ID, which I use for XPath querying.

5 answers

#1


3  

If you're sure that the performance hit comes from I/O and you can't change either application, there is really only a little you can do.

First solution, with zero changes to the existing application code: use a RAM disk. If the applications use that file as shared memory, you can do this without any other change. If the data must be persistent, you may need to copy it in the background to another medium after each write. Performance won't be as good as true shared memory, but at least you won't have to wait for slow I/O operations.

Second solution, with changes only in the application that must read the data: parsing an XML file is often pretty slow (especially if you're using XmlDocument and the file isn't very small). In this case, use XmlReader. You have to make your reading code more complicated and forget about XPath queries, but it is usually much faster than XmlDocument, and because it streams the file instead of building a complete in-memory tree, its memory use does not grow as the file gets larger.
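As a rough illustration of the XmlReader approach, here is a minimal sketch that replaces an XPath lookup by ID with a forward-only scan. The class name `NodeLookup`, the method `FindById`, and the attribute name `id` are assumptions made for the example; the question does not show the real schema.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NodeLookup
{
    // Streams through the XML and returns the attributes of the first element
    // whose "id" attribute equals targetId, or null if no element matches.
    // The attribute name "id" is an assumption about the file's schema.
    public static Dictionary<string, string> FindById(TextReader input, string targetId)
    {
        if (targetId == null) throw new ArgumentNullException("targetId");

        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;
                if (reader.GetAttribute("id") != targetId) continue;

                // On an element node, MoveToNextAttribute starts at the first attribute.
                var attributes = new Dictionary<string, string>();
                while (reader.MoveToNextAttribute())
                {
                    attributes[reader.Name] = reader.Value;
                }
                return attributes;
            }
        }
        return null;
    }
}
```

For a file on disk, pass a `StreamReader` opened on that file. The scan stops at the first match, so nodes near the start of the file are found without reading the rest.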

Small (or not so small) updates: if the code of the second application (I guess the one that reads the file) can be changed, you can do a little to improve its performance. First of all, do not read the file each time. Check its timestamp, register a FileSystemWatcher for that file, or use whatever else works, but do not read and parse the file on every request. Once you have done this, you can go one step further: read and parse the file only when it changes, prepare your XmlDocument in the background (on another thread), and make it available to polling requests. If the requests are spaced out, they may even see a very quick response time (but profile the performance of XmlDocument XPath queries on your typical file).

EDIT: here you can find a RAM disk provided by Microsoft. It's pretty simple and naive, but usually we don't need much more than that. Moreover, it's a sample from the DDK, so you'll get the source code too (in this case, just for fun).

#2


2  

XML is not designed for heavy querying. If you need to do this, consider using a database; SQL Server Compact could be a good alternative. If you need to stick with XML, though, and you need to work with large files and need performance, consider using XmlReader/XmlWriter, which do not load the entire file into memory and are pretty fast.

#3


2  

Instead of reading the XML file every time, just read it the first time and also record the file's last-modified time.

Whenever you need to know whether the data is up to date, just check the modified time of the file, and only read the file again if it really has changed.
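The timestamp check described above could look roughly like the following sketch. The class `CachedXmlFile` and its members are hypothetical names introduced for illustration; `LoadCount` exists only to make the caching visible.

```csharp
using System;
using System.IO;
using System.Xml;

public class CachedXmlFile
{
    private readonly string path;
    private readonly object gate = new object();
    private XmlDocument document;
    private DateTime loadedStamp;

    public CachedXmlFile(string path)
    {
        this.path = path;
    }

    // Number of times the file was actually parsed (for illustration only).
    public int LoadCount { get; private set; }

    // Returns the cached document, re-parsing only when the file's
    // last-write time differs from the one seen at the previous load.
    public XmlDocument GetDocument()
    {
        lock (gate)
        {
            DateTime stamp = File.GetLastWriteTimeUtc(path);
            if (document == null || stamp != loadedStamp)
            {
                var fresh = new XmlDocument();
                fresh.Load(path);
                document = fresh;
                loadedStamp = stamp;
                LoadCount++;
            }
            return document;
        }
    }
}
```

Two caveats apply to this sketch. File timestamps have limited resolution, so two writes in very quick succession may not be distinguishable. The returned XmlDocument is shared between callers, so it should be treated as read-only.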

#4


2  

Don't poll the file. Read it once and keep it in memory, then use FileSystemWatcher to reload it only when it changes.
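One possible shape for the FileSystemWatcher approach is sketched below. The class name `WatchedXmlFile` is hypothetical. The watcher only marks the cached copy as stale, and the actual reload happens on the next request. That choice is an assumption of this sketch rather than something the answer prescribes.

```csharp
using System;
using System.IO;
using System.Xml;

public class WatchedXmlFile : IDisposable
{
    private readonly string path;
    private readonly FileSystemWatcher watcher;
    private readonly object gate = new object();
    private XmlDocument document;
    private bool stale = true; // forces the initial load

    public WatchedXmlFile(string path)
    {
        this.path = Path.GetFullPath(path);
        watcher = new FileSystemWatcher(
            Path.GetDirectoryName(this.path), Path.GetFileName(this.path));
        watcher.NotifyFilter =
            NotifyFilters.LastWrite | NotifyFilters.Size | NotifyFilters.FileName;
        watcher.Changed += OnChanged;
        watcher.Created += OnChanged;
        watcher.EnableRaisingEvents = true;
    }

    // Runs on a thread-pool thread; it only flags the cache as out of date.
    private void OnChanged(object sender, FileSystemEventArgs e)
    {
        lock (gate) { stale = true; }
    }

    // Returns the in-memory document, re-parsing the file only after a change event.
    public XmlDocument GetDocument()
    {
        lock (gate)
        {
            if (stale)
            {
                var fresh = new XmlDocument();
                fresh.Load(path); // if this throws, the cache stays stale
                document = fresh;
                stale = false;
            }
            return document;
        }
    }

    public void Dispose()
    {
        watcher.Dispose();
    }
}
```

FileSystemWatcher may raise several events for a single write and can drop events when its internal buffer overflows, so an occasional timestamp check is a reasonable safety net alongside it.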

Alternatively, read the modification timestamp and reload the file only if the timestamp has changed.


Also, when reading the file, make sure you open it non-exclusively so that other readers are not blocked.
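A minimal sketch of a non-exclusive read, assuming the reader loads the file into an XmlDocument, is shown below. The helper name `SharedXmlLoader` is hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class SharedXmlLoader
{
    // Opens the file read-only with FileShare.Read, so other processes can
    // still open it for reading while this stream is open, then parses it.
    public static XmlDocument Load(string path)
    {
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            var document = new XmlDocument();
            document.Load(stream);
            return document;
        }
    }
}
```

With FileShare.Read, a process that tries to open the file for writing while this stream is open will fail, and this open will fail if a writer already holds the file. FileShare.ReadWrite avoids both failures, at the cost that a read may then observe a half-written file.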

#5


1  

Try opening the file exclusively. The other application may crash, but if it does not, you will know one thing for sure: it cannot generate much I/O load on the shared file in one cycle, because all of its access attempts will fail immediately.

Hopefully it will just wait a second and retry, and that should work well for you.

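// Requires: using System.IO;
// FileShare.None requests exclusive access: while this stream is open,
// any other attempt to open the file fails with an IOException.
using (Stream iStream = File.Open("myfile.xml",
            FileMode.Open, FileAccess.ReadWrite, FileShare.None))
{
    ...
}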
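If either application can be changed, the "wait and retry" behaviour hoped for above could be sketched as follows. `RetryingOpen`, `OpenExclusive`, and both parameters are hypothetical names for this example, and nothing in the question confirms that the existing applications behave this way.

```csharp
using System;
using System.IO;
using System.Threading;

public static class RetryingOpen
{
    // Tries to open the file exclusively; if another process holds it,
    // waits delayMilliseconds and tries again, up to maxAttempts times.
    public static FileStream OpenExclusive(string path, int maxAttempts, int delayMilliseconds)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return new FileStream(
                    path, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
            }
            catch (FileNotFoundException)
            {
                throw; // a missing file will not appear by waiting
            }
            catch (IOException)
            {
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(delayMilliseconds);
            }
        }
    }
}
```

The caller is responsible for disposing the returned stream, ideally in a `using` block, so that the exclusive hold on the file is released as soon as the work is done.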
