I'm working on a project for a school in which one module is an attendance system. I'm using a LAMP stack (PHP 5.2+, MySQL 5+) for development. The school has around 1,500 students, and there are around 250 working days per year. I also have to keep records for 5 years before they can be erased.
The table structure is:
studentId varchar(12)
date date
fn varchar(1) *forenoon*
af varchar(1) *afternoon*
If I simply use a single table, that means 1,875,000 records (1,500 students × 250 days × 5 years) over a 5-year period. Instead of such a humongous database, I considered making a table for each class (not section). With 12 classes, I'd have 12 tables, which means an average of roughly 156,000 records per table, which is manageable.
Is this the right way to do it? Or are there any better ways?
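As a quick sanity check on the figures in the question, the arithmetic can be reproduced in a few lines. This is only a sketch using the numbers quoted above; the variable names are made up for illustration.

```python
# Figures taken from the question; the 12-way split is the asker's proposal.
students = 1500
working_days_per_year = 250
retention_years = 5
classes = 12

# One attendance row per student per working day.
total_rows = students * working_days_per_year * retention_years
rows_per_class_table = total_rows / classes

print(f"single table: {total_rows:,} rows")
print(f"per-class table: {rows_per_class_table:,.0f} rows on average")
```

Either way the total stays just under 2 million rows; splitting only changes how they are distributed.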
5 Solutions
#1
14
What you are doing is called premature optimization. This is a common mistake.
You are better off keeping your database structure as close to reality as possible. If a need for optimization or a speed improvement arises in the future, you can always deal with it then.
From experience, and looking at your example, the single-table solution looks fine.
#2
3
A couple of points.
- 2 million records is not a big table.
- Having a separate table per class is definitely not normalized.
You haven't really provided enough information about links to other tables and what else, if anything, this table will store. But you should start with 3NF for all tables and only depart from it if you find performance problems.
#3
2
As long as you index your table columns properly, there shouldn't be a big problem with the first (single-table) approach.
I would disagree with the idea of splitting it up into the 12 classes, because you have no guarantee that the classes will stay that way (classes added, classes merged, etc.).
Mucking up your database normalization for a perceived efficiency benefit is something you should consider only in extreme circumstances (if ever).
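To make the indexing advice above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 purely as a stand-in for MySQL so that it runs anywhere; the composite primary key, the index name `idx_attendance_date`, and the 'P'/'A' codes are assumptions of mine, not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns follow the question; the composite primary key is an assumption
# (one row per student per day).
conn.execute(
    """
    CREATE TABLE attendance (
        studentId VARCHAR(12) NOT NULL,
        date      DATE        NOT NULL,
        fn        VARCHAR(1),  -- forenoon
        af        VARCHAR(1),  -- afternoon
        PRIMARY KEY (studentId, date)
    )
    """
)
# Secondary index for "who was absent on day X" style queries.
conn.execute("CREATE INDEX idx_attendance_date ON attendance (date)")

# 'P' / 'A' (present / absent) are hypothetical codes.
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?, ?)",
    [
        ("S0001", "2010-06-01", "P", "P"),
        ("S0001", "2010-06-02", "P", "A"),
        ("S0002", "2010-06-01", "A", "A"),
    ],
)


def plan(sql, params=()):
    """Return the query planner's description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


student_plan = plan("SELECT * FROM attendance WHERE studentId = ?", ("S0001",))
day_plan = plan("SELECT * FROM attendance WHERE date = ?", ("2010-06-01",))
print(student_plan)
print(day_plan)
```

Both lookups should be reported as index searches rather than full scans. In MySQL the equivalent check would be `EXPLAIN SELECT ...` against the real table.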
#4
2
I would suggest that there is no need to split this table up. If you create appropriate indexes for any selective queries you may need to perform, the system should be able to find the required rows very quickly. Even for analytic queries that involve all rows, 2 million such records should only require a second or two to scan, which I imagine would not present a great problem.
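If you want to see for yourself how an analytic query over every row behaves, a rough harness along these lines may help. It is a sketch only: it uses Python's sqlite3 as a stand-in for MySQL, the student IDs and the 'P'/'A' codes are invented, and the helper names are hypothetical. It runs on a small sample here; raising the arguments to 1500 students and 1250 days gives the full 5-year volume to time on your own hardware.

```python
import sqlite3
from datetime import date, timedelta


def build_attendance(conn, n_students, n_days):
    """Fill a single attendance table with synthetic rows."""
    conn.execute(
        "CREATE TABLE attendance ("
        "studentId TEXT, date TEXT, fn TEXT, af TEXT, "
        "PRIMARY KEY (studentId, date))"
    )
    start = date(2010, 1, 1)
    rows = (
        (
            f"S{s:04d}",
            (start + timedelta(days=d)).isoformat(),
            "A" if (s + d) % 20 == 0 else "P",  # synthetic forenoon absences
            "P",
        )
        for s in range(n_students)
        for d in range(n_days)
    )
    conn.executemany("INSERT INTO attendance VALUES (?, ?, ?, ?)", rows)


def absences_per_student(conn):
    """Aggregate over all rows: forenoon absences per student."""
    cursor = conn.execute(
        "SELECT studentId, COUNT(*) FROM attendance "
        "WHERE fn = 'A' GROUP BY studentId"
    )
    return dict(cursor)


conn = sqlite3.connect(":memory:")
build_attendance(conn, n_students=30, n_days=40)
total = conn.execute("SELECT COUNT(*) FROM attendance").fetchone()[0]
result = absences_per_student(conn)
print(total, "rows scanned;", sum(result.values()), "forenoon absences")
```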
MySQL now also supports partitioning of data as an optional feature. Partitioning is similar to your proposal to split the table up, but it is done at the physical level, so it isn't visible to users or developers using your schema. This may be a useful approach if you find that a single-table implementation is still too slow. This document provides an overview of partitioning in MySQL 5.4.
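For what the partitioned variant might look like, here is a small sketch that only builds the MySQL DDL as a string (nothing is executed against a server). Partitioning by calendar year, the partition names, and the helper `partition_ddl` are all assumptions for illustration. Note that MySQL requires the partitioning column to be part of every unique key, which is why `date` is included in the assumed primary key.

```python
def partition_ddl(first_year, n_years):
    """Return MySQL DDL for an attendance table range-partitioned by year."""
    parts = [
        f"    PARTITION p{year} VALUES LESS THAN ({year + 1})"
        for year in range(first_year, first_year + n_years)
    ]
    # Catch-all partition for dates beyond the last listed year.
    parts.append("    PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (
        "CREATE TABLE attendance (\n"
        "    studentId VARCHAR(12) NOT NULL,\n"
        "    `date` DATE NOT NULL,\n"
        "    fn VARCHAR(1),\n"
        "    af VARCHAR(1),\n"
        "    PRIMARY KEY (studentId, `date`)\n"
        ")\n"
        "PARTITION BY RANGE (YEAR(`date`)) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


ddl = partition_ddl(2010, 5)
print(ddl)
```

Queries and application code would still address the single `attendance` table. A yearly layout like this could also suit the 5-year retention rule, since an expired year can be removed by dropping its partition instead of deleting rows one by one.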
#5
0
Checksum,
I echo Michiel's opinion that this is premature optimization.
What you can do later on to improve performance is use the database's archiving and partitioning features so that your reads stay efficient. I would also suggest creating an index on this table. In any case, I do not believe 1 to 2 million records is huge; databases today are capable of handling such numbers. Also, you would only start to encounter performance problems about 3 years from now, if at all.
So go ahead and write code rather than thinking about what could go wrong!