Rails: Storing binary files in the database

Date: 2022-10-17 23:30:32

Using Rails, is there a reason why I should store attachments (which could be files of any type) in the filesystem instead of in the database? The database seems simpler to me: no need to worry about filesystem paths, structure, etc.; you just look in your blob field. But most people seem to use the filesystem, which leaves me guessing that there must be some benefits to doing so that I'm not getting, or some disadvantages to using the database for such storage. (In this case, I'm using PostgreSQL.)

5 solutions

#1


28  

This is a pretty standard design question, and there isn't really a "one true answer".

The rule of thumb I typically follow is "data goes in databases, files go in files".

Some of the considerations to keep in mind:

  1. If a file is stored in the database, how are you going to serve it out via HTTP? Remember, you need to set the content type, filename, etc. If it's a file on the filesystem, the web server takes care of all that for you, very quickly and efficiently (perhaps even in kernel space), with no interpreted code needed.

  2. Files are typically big. Big databases are certainly viable, but they are slow and inconvenient to back up etc. Why make your database huge when you don't have to?

  3. Much like point 2, it's really easy to copy files to multiple machines. Say you're running a cluster: you can just periodically rsync the filesystem from your master machine to your slaves and use standard static HTTP serving. Obviously databases can be clustered as well; it's just not necessarily as intuitive.

  4. On the flip side of 3, if you're already clustering your database, then also having to deal with clustered files adds administrative complexity. I'd say this would be a reason to consider storing files in the DB.

  5. Blob data in databases is typically opaque. You can't filter it, sort by it, or group by it. That lessens the value of storing it in the database.

  6. On the flip side, databases understand concurrency. You can use your standard model of transaction isolation to ensure that two clients don't try to edit the same file at the same time. This might be nice. Not to say you couldn't use lockfiles, but now you've got two things to understand instead of one.

  7. Accessibility. Files in a filesystem can be opened with regular tools: Vi, Photoshop, Word, whatever you need. This can be convenient. How are you going to open that Word document straight out of a blob field?

  8. Permissions. Filesystems have permissions, and they can be a pain in the rear. Conversely, they might be useful to your application. Permissions will really bite you if you're taking advantage of 7, because it's almost guaranteed that your web server runs with different permissions than your applications.

  9. Caching (from Sarah Mei below). On the client side, this plays into the HTTP question above (are you going to remember to set cache lifetimes correctly?). On the server side, files on a filesystem are a very well-understood and optimized access pattern. Large blob fields may or may not be optimized well by your database, and you're almost guaranteed to have an additional network trip from the database to the web server as well.

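To make point 1 concrete: when the bytes come out of a blob column, the application has to produce the headers a web server would otherwise derive from the file itself. A minimal plain-Ruby sketch, assuming a hypothetical `StoredFile` struct standing in for a row with a `bytea` column; `blob_response_headers` is an illustrative helper, not a Rails API. In an actual Rails controller, `send_data` with its `filename:`, `type:` and `disposition:` options does most of this for you, and `expires_in` sets `Cache-Control`.

```ruby
# Hypothetical stand-in for a row from a table with a binary (bytea) column.
StoredFile = Struct.new(:filename, :content_type, :data, keyword_init: true)

# Builds the response headers a web server would normally derive for a
# static file. Illustrative helper only, not part of Rails.
def blob_response_headers(file, max_age: 3600, disposition: "attachment")
  # Backslash-escape quotes and backslashes so the quoted filename stays valid.
  safe_name = file.filename.gsub(/["\\]/) { |c| "\\#{c}" }
  {
    "Content-Type" => file.content_type,
    "Content-Length" => file.data.bytesize.to_s,
    "Content-Disposition" => %(#{disposition}; filename="#{safe_name}"),
    # Explicit cache lifetime (the concern raised in point 9).
    "Cache-Control" => "private, max-age=#{max_age}"
  }
end

file = StoredFile.new(filename: "report.pdf", content_type: "application/pdf",
                      data: "%PDF-1.4 ...".b)
blob_response_headers(file).each { |name, value| puts "#{name}: #{value}" }
```

Non-ASCII filenames need the RFC 6266 `filename*` form, which this sketch does not attempt.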
In short, people tend to use filesystems for files because filesystems support file-like idioms best. There's no reason you have to do it, though, and filesystems are becoming more and more like databases, so it wouldn't surprise me at all to see a complete convergence eventually.

#2


6  

There's some good advice here about using the filesystem for files, but here's something else to think about. If you are storing sensitive or secure files/attachments, using the DB really is the only way to go. I have built apps where the data can't be written out to a file; it has to be put into the DB for security reasons. You can't leave it in a filesystem where a user on the server/machine could look at it or take it with them without proper security. Using a high-end DB like Oracle, you can lock that data down very tightly and ensure that only appropriate users have access to it.

But the other points made are very valid. If you're simply doing things like avatar images or non-sensitive info, the filesystem is generally faster and more convenient for most plugin systems.

The DB is pretty easy to set up for sending files back; it's a little bit more work, but just a few minutes if you know what you're doing. So yes, IMO the filesystem is the better way to go overall, but the DB is the only viable choice when security or sensitive data is a major concern.

#3


2  

Erik's answer is great. I will also add that if you want to do any caching, it's much easier and more straightforward to cache static files than to cache database contents.

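On the caching point: web servers typically attach a validator such as an `ETag` or `Last-Modified` header to static files and answer repeat requests with `304 Not Modified`. For blob-backed files you have to do that yourself. A minimal sketch, assuming the whole blob is already loaded into memory; `etag_for` and `conditional_response` are hypothetical helpers that return a Rack-style `[status, headers, body]` triple. Rack's own `Rack::ETag` and `Rack::ConditionalGet` middleware handle this more thoroughly.

```ruby
require "digest"

# Strong ETag derived from the content: same bytes, same tag.
def etag_for(data)
  %("#{Digest::SHA256.hexdigest(data)}")
end

# Returns a Rack-style [status, headers, body] triple, answering 304 when the
# client's If-None-Match already names the current version.
def conditional_response(data, if_none_match: nil)
  tag = etag_for(data)
  headers = { "ETag" => tag, "Cache-Control" => "public, max-age=86400" }
  client_tags = if_none_match.to_s.split(",").map(&:strip)

  if client_tags.include?(tag) || client_tags.include?("*")
    [304, headers, []] # client's cached copy is still valid; send no body
  else
    [200, headers.merge("Content-Length" => data.bytesize.to_s), [data]]
  end
end

status, headers, _body = conditional_response("avatar bytes".b)
again = conditional_response("avatar bytes".b, if_none_match: headers["ETag"]).first
puts status, again # prints 200, then 304
```

This still reads the full blob from the database just to compute the tag; storing a digest column next to the blob would avoid that, which is part of why caching plain files is simpler.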
#4


2  

I don't see what the problem with blobstores is. You can always reconstruct a filesystem store from one, e.g. by caching files on the local web server while the system is being used. But the authoritative store should always be the database. That means you can deploy your application by tossing in the database and exporting the code from source control. Done. And adding a web server is no issue at all.

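The approach described here amounts to a read-through cache: the database stays authoritative, and each web server lazily writes files to its own disk the first time they are requested. A minimal sketch, assuming integer ids and immutable blobs; `LocalFileCache` and its `fetch_from_db` callable are hypothetical, with a Hash standing in for the database.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical read-through cache: the database stays authoritative and each
# web server materializes files on its own disk the first time they are asked for.
class LocalFileCache
  attr_reader :db_reads

  # fetch_from_db: any callable mapping an integer id to a binary String (or nil).
  def initialize(cache_dir, fetch_from_db)
    @cache_dir = cache_dir
    @fetch_from_db = fetch_from_db
    @db_reads = 0
  end

  # Returns a local path for the blob, hitting the database only on a cache miss.
  def path_for(id)
    # Integer() rejects non-numeric ids, so they can't smuggle in path separators.
    path = File.join(@cache_dir, Integer(id).to_s)
    return path if File.exist?(path)

    data = @fetch_from_db.call(id)
    raise KeyError, "no blob with id #{id}" if data.nil?
    @db_reads += 1

    FileUtils.mkdir_p(@cache_dir)
    # Write under a temporary name, then rename, so a concurrent reader never
    # sees a half-written file (rename is atomic within one filesystem).
    tmp = "#{path}.tmp-#{Process.pid}-#{Thread.current.object_id}"
    File.binwrite(tmp, data)
    File.rename(tmp, path)
    path
  end
end

Dir.mktmpdir do |dir|
  db = { 1 => "binary\x00data".b } # Hash standing in for the blobs table
  cache = LocalFileCache.new(File.join(dir, "blobs"), ->(id) { db[id] })
  cache.path_for(1) # miss: reads the "database" and writes the file
  cache.path_for(1) # hit: served from local disk
  puts cache.db_reads # prints 1
end
```

If blobs can change after they are written, a version number or content digest belongs in the cached filename; otherwise each server keeps serving its stale copy.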
#5


0  

If you use a plugin such as Paperclip, you don't have to worry about any of that either. There's this thing called the filesystem, which is where files should go. Just because it is a bit harder doesn't mean you should put your files in the wrong place. And with Paperclip (or other similar plugins) it isn't hard. So, gogo filesystem!

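As for the "filesystem paths, structure" worry from the question: plugins like Paperclip generate those paths for you. One common scheme, roughly what Paperclip's `:id_partition` interpolation does, splits a zero-padded record id into three-digit directories so no single directory grows too large. A sketch with a hypothetical `partitioned_path` helper (the root, model and attachment names are illustrative):

```ruby
# Hypothetical helper: builds a storage path such as
# /srv/uploads/users/avatars/000/001/234/me.png for record id 1234.
def partitioned_path(root, model, attachment, id, filename)
  unless id.is_a?(Integer) && id.between?(0, 999_999_999)
    raise ArgumentError, "id must fit in 9 digits for a 3-level partition"
  end

  # Zero-pad to 9 digits, then split into 3-digit directory levels.
  partition = format("%09d", id).scan(/\d{3}/).join("/")

  # Keep only the last path component so names like "../../x" can't escape root.
  name = File.basename(filename)
  raise ArgumentError, "invalid filename" if name.empty? || %w[. .. /].include?(name)

  File.join(root, model, attachment, partition, name)
end

puts partitioned_path("/srv/uploads", "users", "avatars", 1234, "me.png")
# prints /srv/uploads/users/avatars/000/001/234/me.png
```

The range check matters: with ten or more digits, the three-digit split would silently drop trailing digits and two records could collide.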