Most efficient method of storing data for a graph

Date: 2022-02-22 23:49:13

I have come up with a total of three different, equally viable methods for saving data for a graph.


The graph in question is "player's score in various categories over time". Categories include "buildings", "items", "quest completion", "achievements" and so on.


Method 1:

CREATE TABLE `graphdata` (
    `userid` INT UNSIGNED NOT NULL,
    `date` DATE NOT NULL,
    `category` ENUM('buildings','items',...) NOT NULL,
    `score` FLOAT UNSIGNED NOT NULL,
    PRIMARY KEY (`userid`, `date`, `category`),
    INDEX `userid` (`userid`),
    INDEX `date` (`date`)
) ENGINE=InnoDB

This table contains one row for each user/date/category combination. To show a user's data, select by userid. Old entries are cleared out by:


DELETE FROM `graphdata` WHERE `date` < DATE_ADD(NOW(),INTERVAL -1 WEEK)
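A minimal sketch of Method 1, using Python's `sqlite3` as a stand-in for MySQL (SQLite has no ENUM, so the category becomes a TEXT column with a CHECK constraint; the sample categories and scores are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE graphdata (
        userid   INTEGER NOT NULL,
        date     TEXT    NOT NULL,  -- ISO date string
        category TEXT    NOT NULL CHECK (category IN ('buildings', 'items')),
        score    REAL    NOT NULL,
        PRIMARY KEY (userid, date, category)
    )
""")

# One row per user/date/category combination.
conn.execute("INSERT INTO graphdata VALUES (7, '2022-02-21', 'buildings', 120.5)")
conn.execute("INSERT INTO graphdata VALUES (7, '2022-02-21', 'items', 42.0)")

# Showing a user's data is a select by userid.
rows = conn.execute(
    "SELECT date, category, score FROM graphdata "
    "WHERE userid = ? ORDER BY date, category",
    (7,),
).fetchall()
print(rows)

# SQLite's equivalent of the DATE_ADD(...) cleanup query.
conn.execute("DELETE FROM graphdata WHERE date < date('now', '-7 days')")
```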

Method 2:

CREATE TABLE `graphdata` (
    `userid` INT UNSIGNED NOT NULL,
    `buildings-1day` FLOAT UNSIGNED NOT NULL,
    `buildings-2day` FLOAT UNSIGNED NOT NULL,
    ... (and so on for each category, up to `-7day`)
    PRIMARY KEY (`userid`)
)

Selecting by user id is faster due to being a primary key. Every day scores are shifted down the fields, as in:


... SET `buildings-3day`=`buildings-2day`, `buildings-2day`=`buildings-1day`...

Entries are not deleted (unless a user deletes their account). Rows can be added/updated with an INSERT...ON DUPLICATE KEY UPDATE query.
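Method 2's daily shift can be sketched the same way (again with `sqlite3`; only one category and three days shown, and the hyphenated column names from the question replaced with underscores, since bare hyphens would need quoting everywhere):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Shortened three-day version of the Method 2 table.
conn.execute("""
    CREATE TABLE graphdata (
        userid         INTEGER PRIMARY KEY,
        buildings_1day REAL NOT NULL DEFAULT 0,
        buildings_2day REAL NOT NULL DEFAULT 0,
        buildings_3day REAL NOT NULL DEFAULT 0
    )
""")
conn.execute("INSERT INTO graphdata VALUES (7, 10.0, 20.0, 30.0)")

# The daily shift: every value moves down one slot and the oldest falls off.
# In SQL, all right-hand sides of a single UPDATE see the pre-update values,
# so the assignments do not clobber each other.
conn.execute("""
    UPDATE graphdata SET
        buildings_3day = buildings_2day,
        buildings_2day = buildings_1day,
        buildings_1day = 0
""")

row = conn.execute("SELECT * FROM graphdata WHERE userid = 7").fetchone()
print(row)  # (7, 0.0, 10.0, 20.0)
```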


Method 3:

Use one file for each user, containing a JSON-encoded array of their score data. Since the data is being fetched by an AJAX JSON call anyway, this means the file can be fetched statically (and even cached until the following midnight) without any stress on the server. Every day the server runs through each file, shift()s the oldest score off each array and push()es the new one on the end.
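Method 3's daily rollover can be sketched like this (the per-user file layout is an assumption; the question only says "a JSON-encoded array of their score data"):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical per-user score file: category -> list of daily scores.
tmpdir = Path(tempfile.mkdtemp())
user_file = tmpdir / "user_7.json"
user_file.write_text(json.dumps({"buildings": [1, 2, 3, 4, 5, 6, 7]}))

def daily_rollover(path: Path, new_scores: dict) -> None:
    """Shift the oldest score off each array and push today's on the end."""
    data = json.loads(path.read_text())
    for category, scores in data.items():
        scores.pop(0)                               # shift() the oldest score
        scores.append(new_scores.get(category, 0))  # push() the new one
    path.write_text(json.dumps(data))

daily_rollover(user_file, {"buildings": 8})
print(json.loads(user_file.read_text()))  # {'buildings': [2, 3, 4, 5, 6, 7, 8]}
```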



Personally I think Method 3 is by far the best; however, I've heard bad things about using files instead of databases - for instance, if I wanted to be able to rank users by their scores in different categories, this solution would be very bad.


Out of the two database solutions, I've implemented Method 2 on one of my older projects, and that seems to work quite well. Method 1 seems "better" in that it makes better use of relational databases and all that stuff, but I'm a little concerned in that it will contain (number of users) * (number of categories) * 7 rows, which could turn out to be a big number.


Is there anything I'm missing that could help me make a final decision on which method to use? 1, 2, 3 or none of the above?


2 Answers

#1


3  

If you're going to use a relational db, method 1 is much better than method 2. It's normalized, so it's easy to maintain and search. I'd change the date field to a timestamp and call it added_on (or something that's not a reserved word like 'date' is). And I'd add an auto_increment primary key score_id so that user_id/date/category doesn't have to be unique. That way, if a user managed to increment his building score twice in the same second, both would still be recorded.


The second method requires you to update all the records every day. The first method only does inserts, no updates, so each record is only written to once.


... SET buildings-3day=buildings-2day, buildings-2day=buildings-1day...


You really want to update every single record in the table every day until the end of time?!


Selecting by user id is faster due to being a primary key


Since user_id is the first field in your Method 1 primary key, it will be similarly fast for lookups. As first field in a regular index (which is what I've suggested above), it will still be very fast.


The idea with a relational db is that each row represents a single instance/action/occurrence. So when a user does something to affect his score, do an INSERT that records what he did. You can always create a summary from data like this. But you can't get this kind of data from a summary.


Secondly, you seem unwontedly concerned about getting rid of old data. Why? Your select queries would have a date range on them that would exclude old data automatically. And if you're concerned about performance, you can partition your tables based on row age or set up a cronjob to delete old records periodically.


ETA: Regarding JSON stored in files

This seems to me to combine the drawbacks of Method 2 (difficult to search, every file must be updated every day) with the additional drawbacks of file access. File accesses are expensive. File writes are even more so. If you really want to store summary data, I'd run a query only when the data is requested and I'd store the results in a summary table by user_id. The table could hold a JSON string:


CREATE TABLE score_summaries (
    user_id   INT UNSIGNED NOT NULL PRIMARY KEY,
    gen_date  TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    json_data TEXT NOT NULL -- MySQL does not allow a literal DEFAULT on TEXT columns
);

For example:


Bob (user_id=7) logs into the game for the first time. He's on his profile page which displays his weekly stats. These queries ran:


SELECT json_data FROM score_summaries 
  WHERE user_id=7 
    AND gen_date > DATE_SUB(CURDATE(), INTERVAL 1 DAY); 
//returns nothing so generate summary record

SELECT DATE(added_on), category, SUM(score) 
  FROM scores 
  WHERE user_id=7 AND added_on < CURDATE() AND added_on > DATE_SUB(CURDATE(), INTERVAL 1 WEEK)
  GROUP BY DATE(added_on), category; //never include today's data, encode as json with php

INSERT INTO score_summaries(user_id, json_data)
  VALUES(7, '$json') //from PHP, in this case $json == NULL
  ON DUPLICATE KEY UPDATE json_data=VALUES(json_data)

//use $json for presentation too

Today's scores are generated as needed and not stored in the summary. If Bob views his scores again today, the historical ones can come from the summary table or could be stored in a session after the first request. If Bob doesn't visit for a week, no summary needs to be generated.
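The cache-or-regenerate flow above can be sketched end to end with `sqlite3` (table and column names are taken from the answer; the `scores` table contents and the use of SQLite's `ON CONFLICT` upsert in place of MySQL's `ON DUPLICATE KEY UPDATE` are assumptions):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE score_summaries (
        user_id   INTEGER PRIMARY KEY,
        gen_date  TEXT NOT NULL DEFAULT (datetime('now')),
        json_data TEXT NOT NULL DEFAULT '{}'
    )
""")
conn.execute("""
    CREATE TABLE scores (
        user_id INTEGER, added_on TEXT, category TEXT, score REAL
    )
""")
conn.execute("INSERT INTO scores VALUES (7, date('now', '-1 day'), 'buildings', 5.0)")

def weekly_stats(user_id):
    # 1. Try the cached summary, if generated within the last day.
    row = conn.execute(
        "SELECT json_data FROM score_summaries "
        "WHERE user_id = ? AND gen_date > datetime('now', '-1 day')",
        (user_id,),
    ).fetchone()
    if row:
        return json.loads(row[0])
    # 2. Nothing cached: aggregate last week's scores, never including today.
    rows = conn.execute(
        "SELECT date(added_on), category, SUM(score) FROM scores "
        "WHERE user_id = ? AND added_on < date('now') "
        "AND added_on >= date('now', '-7 days') "
        "GROUP BY date(added_on), category",
        (user_id,),
    ).fetchall()
    summary = [list(r) for r in rows]
    # 3. Store it for next time (upsert, like ON DUPLICATE KEY UPDATE).
    conn.execute(
        "INSERT INTO score_summaries (user_id, json_data) VALUES (?, ?) "
        "ON CONFLICT (user_id) DO UPDATE SET json_data = excluded.json_data",
        (user_id, json.dumps(summary)),
    )
    return summary

first = weekly_stats(7)   # generated from the scores table
second = weekly_stats(7)  # served from the cached summary
```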


#2


1  

Method 1 seems like a clear winner to me. If you are concerned about the single table (graphdata) getting too big, you could reduce it by creating


CREATE TABLE `graphdata` (
    `graphDataId` INT UNSIGNED NOT NULL,
    `categoryId` INT NOT NULL,
    `score` FLOAT UNSIGNED NOT NULL,
    PRIMARY KEY (`graphDataId`)
) ENGINE=InnoDB

then create two tables, because you obviously need information connecting graphDataId with userId:


CREATE TABLE `graphDataUser` (
    `graphDataId` INT UNSIGNED NOT NULL,
    `userId` INT NOT NULL
) ENGINE=InnoDB

and a graphDataId-to-date connection:


CREATE TABLE `graphDataDate` (
    `graphDataId` INT UNSIGNED NOT NULL,
    `graphDataDate` DATE NOT NULL
) ENGINE=InnoDB
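One cost of this split worth noting: reassembling a user's rows now takes two joins. A sketch with `sqlite3` (table names from the answer; the sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The three tables from the answer, with quoting adapted for SQLite.
conn.executescript("""
    CREATE TABLE graphdata     (graphDataId INTEGER PRIMARY KEY,
                                categoryId INTEGER, score REAL);
    CREATE TABLE graphDataUser (graphDataId INTEGER, userId INTEGER);
    CREATE TABLE graphDataDate (graphDataId INTEGER, graphDataDate TEXT);
    INSERT INTO graphdata     VALUES (1, 10, 99.5);
    INSERT INTO graphDataUser VALUES (1, 7);
    INSERT INTO graphDataDate VALUES (1, '2022-02-21');
""")

# Reconstructing one user's data requires joining all three tables.
rows = conn.execute("""
    SELECT d.graphDataDate, g.categoryId, g.score
    FROM graphdata g
    JOIN graphDataUser u ON u.graphDataId = g.graphDataId
    JOIN graphDataDate d ON d.graphDataId = g.graphDataId
    WHERE u.userId = 7
""").fetchall()
print(rows)  # [('2022-02-21', 10, 99.5)]
```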

I think you don't really need to worry about the number of rows a table contains, because most database systems handle large row counts well. Your job is only to format the data so that it is easily retrieved, no matter what task the data is retrieved for. Following that advice should pay off in the long run.
