Possible Duplicate:
Storing Images in DB - Yea or Nay?
For ages I've been told not to store images in the database, or any big BLOB for that matter. While I can understand why databases aren't (or weren't) efficient at that, I never understood why they couldn't be. If I can put a file somewhere and reference it, why couldn't the database engine do the same? I'm glad Damien Katz mentioned it on a recent Stack Overflow podcast, and that Joel Spolsky and Jeff Atwood, at least silently, agreed.
I've been reading hints that Microsoft SQL Server 2008 should be able to handle BLOBs efficiently. Is that true? If so, what is stopping us from just storing images there and getting rid of one problem? One thing I can think of: if the image is a file somewhere, a static web server can serve it very quickly, but when it's in the database it has to travel from the database to the web server application (which might be slower than the static web server) before it's served. Shouldn't caching help with, or solve, that last issue?
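For the caching question, here is a minimal in-process sketch. It uses Python's built-in `sqlite3` purely as a stand-in for SQL Server so that it runs anywhere; the `images` table and the `get_image` function are hypothetical. Only the first request for an image reaches the database; repeats are served from memory.

```python
import sqlite3
from functools import lru_cache

# sqlite3 stands in for SQL Server; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")
conn.execute("INSERT INTO images (id, data) VALUES (?, ?)", (1, b"\x89PNG fake bytes"))
conn.commit()

db_reads = 0  # how many times the database was actually queried


@lru_cache(maxsize=256)
def get_image(image_id: int) -> bytes:
    """Return image bytes, querying the database only on a cache miss."""
    global db_reads
    db_reads += 1
    row = conn.execute("SELECT data FROM images WHERE id = ?", (image_id,)).fetchone()
    if row is None:
        raise KeyError(image_id)
    return row[0]


for _ in range(3):
    get_image(1)
print(db_reads)  # 1: the second and third calls were cache hits
```

Cached entries go stale when an image is updated, so a real app would need to call `get_image.cache_clear()` on writes (or use a cache that can evict per key). In a web setting, an HTTP cache or reverse proxy in front of the application plays the same role.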
5 answers
#1 (score: 11)
Yes, it's true: SQL Server 2008 implements a feature like the one you mention, called FILESTREAM. And it's indeed a good argument for storing blobs in a DB, if you are certain you will only want to use SQL Server for your app (or are willing to pay the price, either in performance or in developing a similar layer on top of a different DB server). That said, I expect similar layers to start appearing for other DB servers, if they don't already exist.
As always, the real benefits depend on the particular scenario. If you will serve lots of relatively static, big files, then this approach plus caching will probably be the best option for combined performance and manageability.
This white paper describes the FILESTREAM feature of SQL Server 2008, which allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system. It covers choices for BLOB storage, configuring Windows and SQL Server for using FILESTREAM data, considerations for combining FILESTREAM with other features, and implementation details such as partitioning and performance.
#2 (score: 4)
Just because you can do something doesn't mean you should.
If you care about efficiency, you'll most likely still not want to do this for any sufficiently large-scale file serving.
Also, it looks like this topic has been discussed heavily:
- Exact Duplicate: User Images: Database or filesystem storage?
- Exact Duplicate: Storing images in database: Yea or nay?
- Exact Duplicate: Should I store my images in the database or folders?
- Exact Duplicate: Would you store binary data in database or folders?
- Exact Duplicate: Store pictures as files or in the database for a web app?
- Exact Duplicate: Storing a small number of images: blob or fs?
- Exact Duplicate: store image in filesystem or database?
#3 (score: 2)
I'll try to decompose your question and address its various parts as best I can.
- SQL Server 2008 and the FILESTREAM Type: Vinko's answer above is the best one I've seen so far. FILESTREAM in SQL Server 2008 is what you were looking for. However, FILESTREAM is in its first version, so there are still some reasons why I wouldn't recommend using it for an enterprise application. As an example, my recollection is that you can't split the storage of the underlying physical files across multiple Windows UNC paths. Sooner or later that will become a pretty serious constraint for an enterprise app.
- Storing Files in the Database: In the grander scheme of things, Damien Katz's original direction was correct. Most of the big enterprise content management (ECM) players store files on the file system and metadata in the RDBMS. If you go even bigger and look at Amazon's S3 service, you're looking at physical files with a non-relational database back end. Unless the files you store number in the billions, I wouldn't recommend going that route and rolling your own.
- A Bit More Detail on Files in the Database: At first glance, a lot speaks for keeping files in the database, chiefly simplicity and transactional integrity. Since the Windows file system cannot be enlisted in a transaction, writes that must span both the database and the file system need transaction compensation logic built in. I didn't really see the other side of the story until I talked to DBAs. They generally don't like commingling business data and blobs (backups become painful), so unless you have a separate database dedicated to file storage, this option is generally not as appealing to DBAs. You're right that serving straight from the file system will be faster, all other things being equal. Not knowing the use case for your application, I can't say much about the caching option. Suffice it to say that in many enterprise applications, the cache hit rate on documents is just too darn low to justify caching them.
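The compensation logic mentioned above can be sketched as follows, assuming a layout where file bytes go to a directory and only metadata goes to the database (the files-on-disk, metadata-in-RDBMS pattern from the ECM point). `sqlite3` again stands in for the RDBMS; `STORE_DIR`, the `documents` table, and `save_document` are hypothetical names. The file is written first, and if the metadata insert fails, the file is deleted by hand, because the file system did not take part in the database transaction.

```python
import os
import sqlite3
import tempfile
import uuid

# Hypothetical layout: bytes on disk under STORE_DIR, metadata in the database.
STORE_DIR = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE,"
    " path TEXT NOT NULL)"
)


def save_document(name: str, data: bytes) -> int:
    """Write the file, then its metadata row; delete the file if the row fails."""
    path = os.path.join(STORE_DIR, uuid.uuid4().hex)
    with open(path, "wb") as f:
        f.write(data)
    try:
        with conn:  # commits on success, rolls back on an exception
            cur = conn.execute(
                "INSERT INTO documents (name, path) VALUES (?, ?)", (name, path)
            )
        return cur.lastrowid
    except sqlite3.Error:
        # Compensation: the file write was outside the transaction,
        # so undo it manually before re-raising.
        os.remove(path)
        raise


doc_id = save_document("report.pdf", b"%PDF-1.4 ...")
try:
    save_document("report.pdf", b"other bytes")  # duplicate name violates UNIQUE
except sqlite3.IntegrityError:
    pass
print(len(os.listdir(STORE_DIR)))  # 1: the second file was cleaned up
```

This only compensates for errors the code actually sees. A process crash between the file write and the commit still leaves an orphaned file, so systems built this way usually also run a periodic sweep for files that have no metadata row.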
Hope this helps.
#4 (score: 1)
One of the classic reasons for caution about storing blobs in databases is that the data will be stored and edited (changed) under transaction control, which means the DBMS needs to ensure that it can roll back changes and recover them after a crash. This is normally done with some variation on the theme of a transaction log. If the DBMS is to record a change to a 2 GB blob, it has to have a way of identifying what changed. This might be simple-minded (the before image and the after image) or more sophisticated but more computationally expensive (some sort of binary delta). Even so, the net result will sometimes be gigabytes of data written through the logs, which hurts system performance. There are various ways of limiting the impact of changes by reducing the amount of data flowing through the logs, but they involve trade-offs.
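A toy illustration of the before-image versus delta trade-off, assuming a made-up log format rather than that of any real DBMS: a 64 KiB blob (standing in for the 2 GB one) has 10 bytes overwritten. Logging full images costs both complete copies, while a delta record only needs the changed byte range; finding that range, however, means scanning the blob, which is the extra computation. `changed_range` is a hypothetical helper.

```python
def changed_range(before: bytes, after: bytes) -> tuple[int, int, int]:
    """Return (start, end_before, end_after): the differing region in each version.

    Trims the common prefix and the common suffix; everything in between changed.
    """
    n = min(len(before), len(after))
    start = 0
    while start < n and before[start] == after[start]:
        start += 1
    end_b, end_a = len(before), len(after)
    # Walk back over the common suffix without crossing the prefix.
    while end_b > start and end_a > start and before[end_b - 1] == after[end_a - 1]:
        end_b -= 1
        end_a -= 1
    return start, end_b, end_a


before = bytes(64 * 1024)            # 64 KiB of zero bytes, standing in for a 2 GB blob
edited = bytearray(before)
edited[1000:1010] = b"0123456789"    # overwrite 10 bytes in place
after = bytes(edited)

full_images = len(before) + len(after)        # log both complete copies
start, end_b, end_a = changed_range(before, after)
delta = (end_b - start) + (end_a - start)     # log only the old and new changed bytes
print(full_images, delta)  # 131072 20
```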
The penalty for storing filenames in the database is that the DBMS has no control (in general) over when the files change - and hence again, the reproducibility of the data is compromised; you cannot guarantee that something outside the DBMS has not changed the data. (There's a very general version of that argument - you can't be sure that someone hasn't tampered with the database storage files in general. But I'm referring to storing a file name in the database referencing a file not controlled by the DBMS. Files controlled by the DBMS are protected against casual change by the unprivileged.)
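One common mitigation, sketched below under the assumption of a hypothetical `files` table (with `sqlite3` standing in for the DBMS), is to store a SHA-256 checksum next to the path and verify it on every read. Note that this only detects a change made outside the DBMS after the fact; it does not prevent it, which is the point about DBMS-controlled storage.

```python
import hashlib
import os
import sqlite3
import tempfile

# Hypothetical table: the DB stores the path plus a checksum of the contents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT NOT NULL, sha256 TEXT NOT NULL)")


def register(path: str) -> int:
    """Record a file's path together with the SHA-256 of its current contents."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with conn:
        cur = conn.execute("INSERT INTO files (path, sha256) VALUES (?, ?)", (path, digest))
    return cur.lastrowid


def read_verified(file_id: int) -> bytes:
    """Read a registered file, raising if it changed since registration."""
    row = conn.execute("SELECT path, sha256 FROM files WHERE id = ?", (file_id,)).fetchone()
    if row is None:
        raise KeyError(file_id)
    path, expected = row
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"{path} was modified outside the database")
    return data


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"original")
fid = register(path)
print(read_verified(fid))  # b'original'

with open(path, "wb") as f:  # simulate a change made behind the DBMS's back
    f.write(b"tampered")
detected = False
try:
    read_verified(fid)
except ValueError:
    detected = True
print(detected)  # True
os.remove(path)
```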
The new SQL Server functionality sounds interesting. I've not explored what it does, so I can't comment on the extent to which it avoids or limits the problems alluded to above.
#5 (score: 0)
There are options within SQL Server to manage where it stores large blobs of data, and these have been there since at least SQL Server 2005, so I don't see why you couldn't store large BLOBs. MOSS (Microsoft Office SharePoint Server), for instance, stores all of the documents you upload to it in a SQL Server database.
There are of course some performance implications, as with just about anything, so take care not to retrieve the blob when you don't need it, and don't include it in indexes, etc.
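Both precautions can be sketched with `sqlite3` as a stand-in for SQL Server (the `documents` table and the helper functions are hypothetical): list queries name the small columns explicitly instead of using `SELECT *`, the index covers only the column you search on, and the blob is fetched by primary key only when the caller actually needs the bytes.

```python
import sqlite3

# sqlite3 stands in for SQL Server; the table and helpers are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " size_bytes INTEGER NOT NULL,"
    " content BLOB NOT NULL)"
)
# Index the small column you search on, not the blob.
conn.execute("CREATE INDEX ix_documents_title ON documents (title)")

with conn:
    for i in range(3):
        body = bytes(1024 * (i + 1))
        conn.execute(
            "INSERT INTO documents (title, size_bytes, content) VALUES (?, ?, ?)",
            (f"doc{i}", len(body), body),
        )


def list_documents() -> list:
    """Metadata only: the blob column is never sent back to the caller."""
    return conn.execute(
        "SELECT id, title, size_bytes FROM documents ORDER BY id"
    ).fetchall()


def get_content(doc_id: int):
    """Fetch the blob by primary key only when the bytes are actually needed."""
    row = conn.execute("SELECT content FROM documents WHERE id = ?", (doc_id,)).fetchone()
    return None if row is None else row[0]


print(list_documents())     # [(1, 'doc0', 1024), (2, 'doc1', 2048), (3, 'doc2', 3072)]
print(len(get_content(2)))  # 2048
```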