I'm working on a website which allows users to upload files (pictures and otherwise). I don't have any prior experience in this area and was hoping to get some input on the right way to store and index these files.
While I would like to have an architecture that scales well to high volume data, I am not currently worrying about extremely high (facebook-, google-scale) volumes.
I was thinking of storing the files on the filesystem at
/files/{username}/
and then having a database, uploads, where each user has his own table with the filenames (and thus URLs) of each file he has uploaded (and any other extra information I might want to store). The database end of this (giving each user his own table) seems very inefficient to me, yet maintaining records of all files in a single table doesn't seem right either, as it would require searching through the entire table each time a single file is accessed.
My reasoning behind considering giving each user his own table was that it is a neat and distinct way to shard the data across tables and reduce search times when looking for a file given the user.
2 Solutions
#1
3
What Matt H suggested is a good idea if what you are trying to achieve is per-user image access. But given that your database storage space is limited, storing the images as binary data is inefficient, as you stated.
Using a table per user is bad design. The user who uploaded the file should simply be a field/column in the table that stores all file uploads, along with any file metadata. I suggest generating a GUID for the file name: it is unique for all practical purposes, and, if you are trying to prevent users from simply accessing all the images, a randomly generated GUID is much harder to guess than an autoincrement field.
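As a rough illustration of the single-table idea, here is a minimal sketch using Python's built-in sqlite3 and uuid modules. The table name uploads, the column names, and the helper functions are assumptions made for the example, not something from the question; a numeric user_id stands in for the username.

```python
import sqlite3
import uuid
from datetime import datetime, timezone


def create_schema(conn):
    # One table for every user's uploads; the owner is just a column.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS uploads (
            id            INTEGER PRIMARY KEY,
            user_id       INTEGER NOT NULL,
            stored_name   TEXT NOT NULL UNIQUE,
            original_name TEXT NOT NULL,
            uploaded_at   TEXT NOT NULL
        )
        """
    )


def record_upload(conn, user_id, original_name):
    # uuid4() is random, so the stored name is hard to guess,
    # unlike a sequential autoincrement id.
    stored_name = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO uploads (user_id, stored_name, original_name, uploaded_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, stored_name, original_name,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return stored_name


def files_for_user(conn, user_id):
    # All files of one user, in upload order.
    rows = conn.execute(
        "SELECT stored_name, original_name FROM uploads "
        "WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()
```

Two users can upload files with the same original name without clashing, because only stored_name has to be unique.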
You are concerned about performance, but until you are dealing with millions upon millions of records, queries that select the images belonging to a user, uploaded within a specific time frame (say you are storing a timestamp or similar), are minuscule in cost. If speed becomes an issue, you can add a B-tree index on the username, which would speed up your user-specific image queries significantly.
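To make the indexing point concrete, here is a small sketch, again with sqlite3 (whose indexes are B-trees). The table layout, the index name idx_uploads_user_time, and the choice of a composite index on (user_id, uploaded_at) instead of the username alone are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE uploads (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        stored_name TEXT NOT NULL,
        uploaded_at TEXT NOT NULL   -- ISO 8601 text, so it sorts by time
    )
    """
)
# Composite index: equality on user_id, then a range on uploaded_at.
conn.execute(
    "CREATE INDEX idx_uploads_user_time ON uploads (user_id, uploaded_at)"
)

QUERY = (
    "SELECT stored_name FROM uploads "
    "WHERE user_id = ? AND uploaded_at >= ? AND uploaded_at < ? "
    "ORDER BY uploaded_at"
)


def uploads_between(conn, user_id, start, end):
    # Files of one user uploaded in the half-open interval [start, end).
    return conn.execute(QUERY, (user_id, start, end)).fetchall()


def query_plan(conn, user_id, start, end):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY, (user_id, start, end)
    ).fetchall()
    return " ".join(row[-1] for row in rows)
```

Printing query_plan(...) is a quick way to check that the database searches the index instead of scanning the whole table.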
Back to the topic of security, access and organization: store the images with a folder per user (although, depending on the number of users, the number of folders may grow to an unmanageable level). If you don't want the images to be publicly available, store them in a non-web folder and have your application read the data and stream it to render the image for the user. This is more complex, but it hides the actual file from the internet. In addition, you would be able to check that every request for an image comes from an authenticated user.
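The "non-web folder plus streaming" approach could look roughly like the sketch below. It is framework-agnostic and makes several assumptions: storage_root is a directory outside the web root with one subfolder per user, owner_of is a plain dict standing in for the database lookup of who owns a stored file, and the names stream_file and AccessDenied are made up for the example.

```python
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # read the file in 64 KiB pieces


class AccessDenied(Exception):
    """Raised when the requester may not read the file."""


def _read_chunks(path):
    # Yield the file piece by piece so large files are never fully in memory.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(CHUNK_SIZE)
            if not chunk:
                break
            yield chunk


def stream_file(storage_root, owner_of, requesting_user, stored_name):
    # 1. Authorization: only the owner may read the file.
    if owner_of.get(stored_name) != requesting_user:
        raise AccessDenied(stored_name)

    # 2. Build the path under the per-user folder and make sure it
    #    cannot escape the storage root (e.g. via "..").
    root = Path(storage_root).resolve()
    path = (root / requesting_user / stored_name).resolve()
    if root not in path.parents:
        raise AccessDenied(stored_name)

    # 3. Hand back a generator the web framework can send as the response body.
    return _read_chunks(path)
```

The checks run before any chunk is produced, so a denied request fails immediately instead of halfway through the response.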
#2
3
It depends on the nature and structure of your app and database. I've used many techniques, including folder-based, pictures stored in a database blob, off-web file folders accessed through an authentication gateway...
For external images that aren't directly related to the app or database, like temp photos or something, I tend to put those in a folder. Since it seems your structure is pictures from a user, I would expect there might be metadata associated with the image, such as tags. In that case, I would probably store the picture in a database table, assuming I had the capacity for that. If the photos needed to be secured, inaccessible to other users without authentication, then a database would have its own security, whereas file-based storage would need some sort of trick to prevent unauthorized access.
I wouldn't use a table per user, just a single Pictures table with columns ID, userid, and picture blob.
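For what it's worth, a Pictures table like that might be sketched as follows with Python's sqlite3. The column types and the helper names save_picture and load_picture are assumptions for illustration; a real database server would enforce its own access control on top of the userid check shown here.

```python
import sqlite3


def create_pictures_table(conn):
    # A single table for all users: ID, userid, picture blob.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS Pictures (
            ID      INTEGER PRIMARY KEY,
            userid  INTEGER NOT NULL,
            picture BLOB NOT NULL
        )
        """
    )


def save_picture(conn, userid, data):
    # Store the raw image bytes and return the new row's ID.
    cur = conn.execute(
        "INSERT INTO Pictures (userid, picture) VALUES (?, ?)",
        (userid, data),
    )
    conn.commit()
    return cur.lastrowid


def load_picture(conn, userid, picture_id):
    # Filtering on userid means one user cannot fetch another user's picture.
    row = conn.execute(
        "SELECT picture FROM Pictures WHERE ID = ? AND userid = ?",
        (picture_id, userid),
    ).fetchone()
    return None if row is None else bytes(row[0])
```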
Does that help?