I am rebuilding the background system of a site with a lot of traffic.
This is the core of the application and the way I build this part of the database is critical for a big chunk of code and upcoming work. The system described below will have to run millions of times each day. I would appreciate any input on the issue.
The background is that a user can add what he or she has been eating during the day.
Simplified, the process is more or less this:
- The user arrives at the site and the site lists his/her choices for the day (if entered before, as the steps below describe).
- The user can add a meal (consisting of 1 to unlimited different items of food and their quantity). The meal is added through a search field and is organized in different types (like 'Breakfast', 'Lunch').
- During the meal building process a list of the most commonly used food items (primarily by this user, but secondly also by all users) will be shown for quick selection.
- The meals will be stored in a FoodLog table that consists of something like this: id, user_id, date, type, food_data.
What I currently have is a huge database with food items from which the search will be performed. The food items are stored with information on both the common name (like "pork cutlets") and on producer (like "coca cola"), along with other detailed information needed.
Question summary:
My problem is that I do not know the best way to store the data so that it is easily accessible in the way I need it, without the database getting out of hand.
Consider 1 million users adding 1 to 7 meals each day. To store each food item for each meal, each day and each user would potentially create (1*avg_num_meals*avg_num_food_items) million rows each day.
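To put rough numbers on that estimate, here is a small sketch. The averages used (4 meals per user per day, 5 items per meal) are assumptions picked purely for illustration, not measured figures, and `rows_per_day` is a hypothetical helper name.

```python
# Back-of-the-envelope row estimate for one-row-per-food-item storage.
# The averages below are illustrative assumptions, not measurements.
USERS = 1_000_000          # from the scenario described above
AVG_MEALS_PER_DAY = 4      # assumed; the stated range is 1 to 7
AVG_ITEMS_PER_MEAL = 5     # assumed

def rows_per_day(users: int, meals: float, items: float) -> int:
    """Rows added daily if every food item in every meal gets its own row."""
    return int(users * meals * items)

daily = rows_per_day(USERS, AVG_MEALS_PER_DAY, AVG_ITEMS_PER_MEAL)
yearly = daily * 365
print(f"{daily:,} rows/day, {yearly:,} rows/year")
```

With these assumed averages the table grows by tens of millions of rows per day, which is the scale the storage decision has to cope with.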
Storing the data in some compressed way (like the food_data is a json_encoded string) would reduce the number of rows significantly, but at the same time it would make it hard to create the 'most used food items' list and other statistics on the fly.
Should the table be split into several tables? If this is the case, how would they interact?
The site is currently hosted on a mid-range CDN and is using a LAMP (Linux, Apache, MySQL, PHP) backbone.
4 answers
#1
10
Roughly, you want a fully normalized data structure for this. You want to have one table for Users, one table for Meals (one entry per meal, with a reference to User; you probably also want to have a time / date of the meal in this table), and a table for MealItems, which is simply an association table between Meal and the Food Items table.
So when a User comes in and creates an account, you make an entry in the Users table. When a user reports a Meal they've eaten, you create a record in the Meals table, and a record in the MealItems table for every item they reported.
This structure makes it straightforward to have a variable number of items with every meal, without wasting a lot of space. You can determine how often each item appears in meals with a relatively simple query, and likewise determine the full set of items any one user has consumed in any given timespan.
This normalized table structure will support a VERY large number of records and support a large number of queries against the database.
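As a minimal sketch of the structure described above, the following uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table and column names (`meals.eaten_on`, `meal_items.quantity`, and so on) and the helpers `log_meal` and `items_eaten` are assumptions made for illustration, not part of the original design.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch is runnable as-is.
# All table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE users      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE food_items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE meals (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES users(id),
    eaten_on TEXT    NOT NULL,   -- ISO date, e.g. '2024-01-15'
    type     TEXT    NOT NULL    -- 'Breakfast', 'Lunch', ...
);
CREATE TABLE meal_items (
    meal_id      INTEGER NOT NULL REFERENCES meals(id),
    food_item_id INTEGER NOT NULL REFERENCES food_items(id),
    quantity     REAL    NOT NULL,
    PRIMARY KEY (meal_id, food_item_id)
);
"""

def log_meal(conn, user_id, eaten_on, meal_type, items):
    """Insert one meals row plus one meal_items row per (food_item_id, quantity)."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO meals (user_id, eaten_on, type) VALUES (?, ?, ?)",
            (user_id, eaten_on, meal_type),
        )
        meal_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO meal_items (meal_id, food_item_id, quantity) VALUES (?, ?, ?)",
            [(meal_id, fid, qty) for fid, qty in items],
        )
    return meal_id

def items_eaten(conn, user_id, start, end):
    """Name and total quantity of each item a user ate between two dates (inclusive)."""
    return conn.execute(
        """SELECT f.name, SUM(mi.quantity)
           FROM meals m
           JOIN meal_items mi ON mi.meal_id = m.id
           JOIN food_items f  ON f.id = mi.food_item_id
           WHERE m.user_id = ? AND m.eaten_on BETWEEN ? AND ?
           GROUP BY f.id
           ORDER BY f.name""",
        (user_id, start, end),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany("INSERT INTO food_items VALUES (?, ?)",
                 [(1, "pork cutlets"), (2, "coca cola")])
log_meal(conn, 1, "2024-01-15", "Lunch", [(1, 2.0), (2, 1.0)])
log_meal(conn, 1, "2024-01-16", "Lunch", [(2, 1.0)])
print(items_eaten(conn, 1, "2024-01-15", "2024-01-16"))
```

Each meal costs one row in `meals` and one row per item in `meal_items`, so a meal with any number of items needs no schema change and no unused columns.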
#2
3
First, "Storing the data in some compressed way (like the food_data is a json_encoded string)" is not a recommended idea. This will cause you countless headaches in the future as new requirements are added.
You should definitely have a few tables here.
Users: id, etc
Food Items: id, name, description, etc
Meals: id, user_id, category, etc
Meal Items: id, food_item_id, meal_id
The Meal Items would tie the Meals to the Food Items using ids. The Meals would be tied to Users using ids. This makes it simple to use joins to get detailed lists of data: totals, averages, etc. If the fields are properly indexed, this should be a great model to support a large number of records.
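One way the joins could produce the "most commonly used food items" list from the question is sketched below, again using sqlite3 as a stand-in for MySQL. The reduced schema, the index names, and the `quick_picks` helper are all assumptions for illustration; the ranking rule (the user's own items first, then globally popular ones as filler) is one interpretation of the requirement.

```python
import sqlite3

# SQLite stands in for MySQL; table, column, and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meals      (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE meal_items (meal_id INTEGER NOT NULL, food_item_id INTEGER NOT NULL);
CREATE INDEX idx_meals_user      ON meals (user_id);
CREATE INDEX idx_meal_items_meal ON meal_items (meal_id, food_item_id);
""")

def quick_picks(conn, user_id, limit=5):
    """Food item ids: the user's own most-logged items first, then
    globally popular items to fill any remaining slots."""
    own = [r[0] for r in conn.execute(
        """SELECT mi.food_item_id
           FROM meal_items mi JOIN meals m ON m.id = mi.meal_id
           WHERE m.user_id = ?
           GROUP BY mi.food_item_id
           ORDER BY COUNT(*) DESC, mi.food_item_id
           LIMIT ?""", (user_id, limit))]
    if len(own) >= limit:
        return own
    everyone = [r[0] for r in conn.execute(
        """SELECT food_item_id FROM meal_items
           GROUP BY food_item_id
           ORDER BY COUNT(*) DESC, food_item_id""")]
    fill = [fid for fid in everyone if fid not in own]
    return own + fill[:limit - len(own)]

# Sample data: user 1 logs item 10 twice and item 11 once;
# user 2 logs item 12 three times and item 10 once.
conn.executemany("INSERT INTO meals VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 2), (5, 2)])
conn.executemany("INSERT INTO meal_items VALUES (?, ?)",
                 [(1, 10), (1, 11), (2, 10), (3, 12), (4, 12), (5, 12), (5, 10)])
print(quick_picks(conn, 1, limit=3))
```

The global ranking scans every row of `meal_items`, so at the volumes discussed in the question it would likely need to be precomputed or cached rather than run on each request.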
#3
2
In addition to what's been said:
- Be judicious in your use of indexes. Properly applying these to your database could significantly speed up read access to your tables.
- Consider using database-specific features to minimize space. You mention that you're using MySQL; consider using ENUM when appropriate (food types, meal types) to minimize database size and to simplify management.
#4
1
I would split your meal table into two tables: one stores a single row for each meal, and the second stores one row for each food item used in a meal, with a foreign key reference to the meal it was used in.
After that, just make sure you have indices on any table columns used in joins or WHERE clauses.
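To see what an index on WHERE columns changes, the sketch below asks the query planner how it would run the same query before and after adding one. It uses sqlite3 and its `EXPLAIN QUERY PLAN` statement as a stand-in; on MySQL the analogous check is `EXPLAIN`. The table layout, the index name `idx_meals_user_date`, and the `plan` helper are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL (where EXPLAIN plays the same role).
# The table, column, and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meals (id INTEGER PRIMARY KEY, user_id INTEGER, eaten_on TEXT, type TEXT)"
)

QUERY = "SELECT id, type FROM meals WHERE user_id = ? AND eaten_on = ?"

def plan(conn, sql, params):
    """Return the query planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

before = plan(conn, QUERY, (1, "2024-01-15"))
conn.execute("CREATE INDEX idx_meals_user_date ON meals (user_id, eaten_on)")
after = plan(conn, QUERY, (1, "2024-01-15"))

print("before:", before)  # a full table scan
print("after: ", after)   # a search that names the new index
```

A composite index on (user_id, eaten_on) matches the "list this user's meals for the day" lookup from the question; the exact wording of the plan text varies between SQLite versions.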