Best practices for storing large amounts of XML-type data in SQL Server

Time: 2022-07-04 16:56:44

Does anyone have any best practices they can share with regard to storing XML field type data in SQL Server 2008? We have lots of small XML structures, but some are larger (>50MB). We're finding that things get a little slow on DELETE. Any advice/war stories would be appreciated.

5 solutions

#1


3  

Another vote for outside the database.

In the past, I've used an approach similar to what James recommends, but SQL Server 2008 supports a new FILESTREAM storage option, which can store varbinary(max) columns outside the database on NTFS, and might be worth looking into.

SQL Books Online has a lot of good information, starting with "FILESTREAM Overview".

#2


5  

I see that most of the answers so far are for outside the database.

We have done this once, adding the file to the file system and the name of the file to a table in the database. The main problems with this were:

  • the file system is not transactional, so it could get out of sync if something went wrong
  • you had to take backups separately, and a restore would by definition be out of sync

For all new projects we have stored files in varbinary(max) fields. This has worked well for us, even under loads of tens of thousands of users.

#3


2  

I agree with storing the large files outside of the database.

You can either store the path to the file

In one project I worked on, I had another table that kept track of all the users' uploaded data in a webapp ... whenever a user uploaded a file, I would create a new row in this table and use the fileID primary key as a foreign key in various other tables.

It greatly reduced the number of changes needed later, like when I had to change the root path of the upload directory, etc.
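
A minimal Python sketch of that idea, under some assumptions the answer does not spell out: the tracking table is stood in for by a dictionary, each row stores a path relative to the upload root, and the root itself lives in a single setting. The names `UPLOAD_ROOT`, `register_upload` and `resolve` are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical setting: the only place the upload root is written down.
UPLOAD_ROOT = PurePosixPath("/srv/uploads")

# Stand-in for the tracking table: fileID -> path relative to the root.
uploaded_files = {}
_next_id = 1


def register_upload(relative_path: str) -> int:
    """Record an uploaded file and return its fileID (the 'primary key')."""
    global _next_id
    file_id = _next_id
    _next_id += 1
    uploaded_files[file_id] = PurePosixPath(relative_path)
    return file_id


def resolve(file_id: int, root: PurePosixPath = UPLOAD_ROOT) -> PurePosixPath:
    """Build the full path at read time from the root and the stored relative path."""
    return root / uploaded_files[file_id]
```

Other tables would only hold the fileID, so moving the upload directory means changing the root in one place rather than rewriting stored paths.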

#4


1  

You may want to store the large files as a file, and store the path in the database, unless you are somehow planning on doing a search on the xml files as part of your select.

I tend to prefer storing large files outside the database, as it really isn't designed, IMO, for storing these. If you are going to be searching then you could use DLINQ and XLINQ to facilitate the searching of the various xml files.

#5


1  

Store meta data!

Outside the database is the way we store large datasets as well, except I strongly recommend adding some meta-information to the file so that in case the files get out of sync with the DB, you would be able to semi-automatically resync them. This way, you can first create or update the file and update the database afterwards, without worrying that the database update might fail.
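
One way this could look, as a rough Python sketch rather than the answerer's actual setup: here the meta-information goes into a JSON sidecar file next to each XML file (an assumption; it could equally be embedded in the XML itself), and a scan of the directory rebuilds the records the database should hold. The function names and the `.meta.json` suffix are made up for illustration.

```python
import json
from pathlib import Path


def write_with_metadata(directory: Path, name: str, xml_text: str, meta: dict) -> None:
    """Write the XML file, then a sidecar JSON file describing it."""
    (directory / name).write_text(xml_text, encoding="utf-8")
    (directory / (name + ".meta.json")).write_text(json.dumps(meta), encoding="utf-8")


def rebuild_index(directory: Path) -> dict:
    """Scan the directory and rebuild {file name: metadata} from the sidecars."""
    index = {}
    for sidecar in sorted(directory.glob("*.meta.json")):
        data_name = sidecar.name[: -len(".meta.json")]
        if (directory / data_name).exists():  # skip sidecars whose data file is gone
            index[data_name] = json.loads(sidecar.read_text(encoding="utf-8"))
    return index
```

The result of `rebuild_index` can then be compared against the database rows to find and repair whatever is missing on either side.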

Managing large numbers of files

Most file systems will be OK storing a large number of files together, but they do start to slow down over time. I highly recommend using subfolders based on some hash value. For example, if all filenames are integers, store 10000 files per dir, and calculate the dir name as (filename / 10000) * 10000 using integer division, so that file 123456 lands in dir 120000 -- you will be able to find the file more easily this way when debugging.
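
A minimal Python sketch of the bucketing, assuming integer file names. It uses integer division (`//`) so that each directory holds a run of 10000 consecutive IDs; the `.xml` extension, the `/data` root in the usage note and the function names are made up for illustration.

```python
from pathlib import PurePosixPath

FILES_PER_DIR = 10000


def bucket_dir(file_id: int, per_dir: int = FILES_PER_DIR) -> int:
    """Return the directory name (the first ID of the bucket) for a file ID."""
    return (file_id // per_dir) * per_dir


def bucket_path(root: str, file_id: int) -> PurePosixPath:
    """Build root/<bucket>/<file_id>.xml for a given integer file ID."""
    return PurePosixPath(root) / str(bucket_dir(file_id)) / f"{file_id}.xml"
```

For example, `bucket_path("/data", 123456)` gives `/data/120000/123456.xml`, so the directory can be worked out by eye from the file name.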
