
Time: 2021-11-12 12:45:11

I'm designing a messaging system that will have to support a huge number of messages and users.


I was thinking about two solutions.

Approach 1:

Usertable -> id, username ....
Messagetable -> id, from_id, to_id, message ...

Approach 2:

Usertable -> id, username ....
Messagetable -> id, message ...
HasMessagetable -> id, from_id, to_id...

I'm wondering which approach is better, and why.

Also, are there good publications (free or not) about large database design and best practices?


Thank you

3 answers



I did the same not too long ago and started out with approach 1. But then users were supposed to be able to send messages to multiple users. Suddenly approach 1 saved each message n times if n recipients were addressed. So if this is ever a possibility, I think 2 is better.
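
The duplication described above can be sketched with SQLite from Python. This is only an illustration under assumptions: the table names (`messages_v1`, `messages_v2`, `has_message`) and the sample users are made up, and the `message_id` column is an assumption too, since the question's HasMessagetable does not list one but some link to the stored message is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);

    -- Approach 1: the body is stored on every sender/recipient row.
    CREATE TABLE messages_v1 (
        id INTEGER PRIMARY KEY,
        from_id INTEGER NOT NULL REFERENCES users(id),
        to_id INTEGER NOT NULL REFERENCES users(id),
        message TEXT NOT NULL
    );

    -- Approach 2: the body is stored once, recipients live in a link table.
    CREATE TABLE messages_v2 (id INTEGER PRIMARY KEY, message TEXT NOT NULL);
    CREATE TABLE has_message (
        id INTEGER PRIMARY KEY,
        message_id INTEGER NOT NULL REFERENCES messages_v2(id),  -- assumed column
        from_id INTEGER NOT NULL REFERENCES users(id),
        to_id INTEGER NOT NULL REFERENCES users(id)
    );
""")
conn.executemany("INSERT INTO users (id, username) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])


def send_v1(sender, recipients, body):
    # One full row, including the body, per recipient.
    conn.executemany(
        "INSERT INTO messages_v1 (from_id, to_id, message) VALUES (?, ?, ?)",
        [(sender, r, body) for r in recipients])


def send_v2(sender, recipients, body):
    # Store the body once, then one small link row per recipient.
    cur = conn.execute("INSERT INTO messages_v2 (message) VALUES (?)", (body,))
    message_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO has_message (message_id, from_id, to_id) VALUES (?, ?, ?)",
        [(message_id, sender, r) for r in recipients])


send_v1(1, [2, 3, 4], "Lunch at noon?")
send_v2(1, [2, 3, 4], "Lunch at noon?")

bodies_v1 = conn.execute("SELECT COUNT(*) FROM messages_v1").fetchone()[0]
bodies_v2 = conn.execute("SELECT COUNT(*) FROM messages_v2").fetchone()[0]
links_v2 = conn.execute("SELECT COUNT(*) FROM has_message").fetchone()[0]
print(bodies_v1, bodies_v2, links_v2)  # 3 1 3
```

With three recipients, approach 1 holds three copies of the body while approach 2 holds one body plus three link rows.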




Your second schema is more normalized. Both are acceptable. A properly normalized design is cleaner, but many DBAs resort to denormalisation for performance reasons. In my very humble opinion, the second schema is the better approach, and I would use it until you hit performance issues.


Do note that many consider normalising to that extent overkill, as others have posted. I do it that way out of habit and from old (now outdated) DB theory courses I took 12 years ago.





In general, the fewer joins you have to do, the better your queries will perform. Therefore, since you are going to have a very large database, the first option will probably be the better choice.
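
To make the join difference concrete, here is a rough sketch of an "inbox" query against each layout, using SQLite from Python. The table names, the `message_id` link column, and the sample rows are assumptions for illustration only; they are not part of the original schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: everything in one table.
    CREATE TABLE messages_v1 (
        id INTEGER PRIMARY KEY, from_id INTEGER, to_id INTEGER, message TEXT);

    -- Option 2: body table plus link table (message_id is an assumed column).
    CREATE TABLE messages_v2 (id INTEGER PRIMARY KEY, message TEXT);
    CREATE TABLE has_message (
        id INTEGER PRIMARY KEY, message_id INTEGER, from_id INTEGER, to_id INTEGER);

    INSERT INTO messages_v1 (from_id, to_id, message) VALUES
        (1, 2, 'hello'), (3, 2, 'ping'), (2, 1, 'hi back');

    INSERT INTO messages_v2 (id, message) VALUES
        (1, 'hello'), (2, 'ping'), (3, 'hi back');
    INSERT INTO has_message (message_id, from_id, to_id) VALUES
        (1, 1, 2), (2, 3, 2), (3, 2, 1);
""")

# Option 1: a single-table lookup, no join needed.
INBOX_V1 = """
    SELECT from_id, message
    FROM messages_v1
    WHERE to_id = ?
    ORDER BY id
"""

# Option 2: the same result requires one join to fetch the body.
INBOX_V2 = """
    SELECT h.from_id, m.message
    FROM has_message AS h
    JOIN messages_v2 AS m ON m.id = h.message_id
    WHERE h.to_id = ?
    ORDER BY h.id
"""

inbox_v1 = conn.execute(INBOX_V1, (2,)).fetchall()
inbox_v2 = conn.execute(INBOX_V2, (2,)).fetchall()
print(inbox_v1)
print(inbox_v2)
```

Both queries return the same rows for user 2. How much the extra join costs in practice depends on the engine, the indexes, and the data volume, so this only shows the shape of the queries, not their relative speed.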


Basically, you are going to need to ignore some database normalization techniques in order to gain the performance you need. However, try not to limit yourself either. For example, if you have messages that go to multiple people, you are going to need to either choose option two or figure out a different way to handle this.


As for resources for large database design, here is one for Microsoft SQL Server but a lot of the things it discusses will apply:
