Background
I am developing a social web app for poets and writers, allowing them to share their poetry, gather feedback, and communicate with other poets. I have very little formal training in database design, but I have been reading books, SO, and online DB design resources in an attempt to ensure performance and scalability without over-engineering.
The database is MySQL, and the application is written in PHP. I'm not sure yet whether we will be using an ORM library or writing SQL queries from scratch in the app. Other than the web application, the Solr search server and maybe some messaging client will interact with the database.
Current Needs
The schema I have thrown together below represents the primary components of the first version of the website. Initially, users can register for the site and do any of the following:
- Create and modify profile details and account settings
- Post, tag and categorize their writing
- Read, comment on and "favorite" other users' posts
- "Follow" other users to get notifications of their activity
- Search and browse content and get suggested posts/users (though we will be using the Solr search server to index DB data and run these types of queries)
Schema
Here is what I came up with on MySQL Workbench for the initial site. I'm still a little fuzzy on some relational databasey things, so go easy.
Questions
- In general, is there anything I'm doing wrong or can improve upon?
- Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?
- Is there any reason why I shouldn't combine the PostStats table into the Posts table?
- Should I expand the design to include the features we are doing in the second version just to ensure that the initial schema can support it?
- Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?
- Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?
Thanks for the help!
2 Answers
#1
In general, is there anything I'm doing wrong or can improve upon?
Overall, I don't see any big flaws in your current setup or schema.
What I'm wondering about is your split into three User* tables. I get what your intention was (keeping different user-related things separate), but I don't know if I would go with exactly the same thing. If you plan on displaying only data from the User table on the site, this is fine, since the other info is not needed multiple times on the same page. But if users need to use and display their real name (like John Doe instead of doe55), then this will slow things down as the data grows, since you may require joins. Keeping the Preferences separate seems like a personal choice; I have no argument for or against it.
Your many-to-many tables would not need an additional PK (e.g. PostFavoriteID). A composite primary key over both PostID and UserID would be enough, since PostFavoriteID is never used anywhere else. This goes for all join tables.
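A minimal sketch of the composite-key idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. PostFavorites, PostID and UserID come from the question's schema; the add_favorite helper and the sample rows are hypothetical. The pair (PostID, UserID) is the primary key, so a user can favorite a given post only once and no PostFavoriteID column is needed:

```python
import sqlite3

# SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Username TEXT NOT NULL);
CREATE TABLE Posts (
    PostID INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL REFERENCES Users(UserID)
);

-- The pair itself is the key: no surrogate PostFavoriteID column.
CREATE TABLE PostFavorites (
    PostID INTEGER NOT NULL REFERENCES Posts(PostID),
    UserID INTEGER NOT NULL REFERENCES Users(UserID),
    PRIMARY KEY (PostID, UserID)
);
""")
conn.execute("INSERT INTO Users VALUES (1, 'doe55'), (2, 'poet42')")
conn.execute("INSERT INTO Posts VALUES (10, 1)")


def add_favorite(conn, post_id, user_id):
    """Hypothetical helper: insert a favorite; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO PostFavorites (PostID, UserID) VALUES (?, ?)",
            (post_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        # e.g. the composite primary key rejecting a duplicate (PostID, UserID) pair
        return False


print(add_favorite(conn, 10, 2))  # True
print(add_favorite(conn, 10, 2))  # False
```

In MySQL the same table would declare PRIMARY KEY (PostID, UserID), which InnoDB also uses as the clustered index. Queries that filter only by UserID (such as "all favorites of this user") cannot use that key's leftmost column, so a secondary index on UserID may still be worth adding.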
Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?
As with the previous answer, I don't see an advantage or disadvantage. I might put both in the same table, since the NULL (or maybe better, -1) values would not bother me.
Is there any reason why I shouldn't combine the PostStats table into the Posts table?
I would put them into the same table, using a trigger to handle incrementing the ViewCount column.
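One way the merged table and the trigger could fit together, sketched in SQLite (via Python's sqlite3) rather than MySQL. A trigger only fires on INSERT, UPDATE or DELETE, never on a read, so this sketch assumes each view is recorded as a row in a hypothetical PostViews log table; the trigger then keeps Posts.ViewCount in step. Table, column and trigger names other than Posts and ViewCount are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- PostStats folded into Posts: the counter is just another column.
CREATE TABLE Posts (
    PostID    INTEGER PRIMARY KEY,
    Title     TEXT NOT NULL,
    ViewCount INTEGER NOT NULL DEFAULT 0
);

-- Hypothetical log table: one row per recorded view.
CREATE TABLE PostViews (
    PostID   INTEGER NOT NULL REFERENCES Posts(PostID),
    ViewedAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Keep the denormalized counter in sync whenever a view is logged.
CREATE TRIGGER trg_post_views_count
AFTER INSERT ON PostViews
BEGIN
    UPDATE Posts SET ViewCount = ViewCount + 1 WHERE PostID = NEW.PostID;
END;
""")

conn.execute("INSERT INTO Posts (PostID, Title) VALUES (1, 'Ode to a Schema')")
for _ in range(3):
    conn.execute("INSERT INTO PostViews (PostID) VALUES (1)")

print(conn.execute("SELECT ViewCount FROM Posts WHERE PostID = 1").fetchone()[0])  # 3
```

In MySQL the trigger would read roughly CREATE TRIGGER trg_post_views_count AFTER INSERT ON PostViews FOR EACH ROW UPDATE Posts SET ViewCount = ViewCount + 1 WHERE PostID = NEW.PostID; if you don't need a per-view log at all, the application can simply issue that UPDATE itself.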
Should I expand the design to include the features we are doing in the second version just to ensure that the initial schema can support it?
You are using a normalized schema, so additions can be made at any time.
Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?
I can't tell you, since I haven't done it yet, but I know that Solr is very powerful and flexible, so I think you should be fine.
Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?
There are many threads here on SO discussing this. Personally, I prefer a surrogate key (or another unique numeric key, if available), since it makes queries easier and faster: an int is cheaper to look up than a string. If you allow users to change their username/email/whatever your PK is, then massive updates are required. With a surrogate key, you don't need to bother.
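To illustrate the point about renames, here is a small SQLite sketch (Python's sqlite3 standing in for MySQL) in which Posts reference the surrogate UserID while Username is only a UNIQUE column. The table layout and sample data are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Surrogate key: UserID never changes; Username is merely UNIQUE.
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);
CREATE TABLE Posts (
    PostID INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL REFERENCES Users(UserID),
    Title  TEXT NOT NULL
);
""")
conn.execute("INSERT INTO Users VALUES (1, 'doe55')")
conn.executemany(
    "INSERT INTO Posts (UserID, Title) VALUES (1, ?)",
    [("Sonnet 1",), ("Sonnet 2",), ("Haiku",)],
)

# Renaming the user touches exactly one row; Posts still point at UserID 1.
cur = conn.execute("UPDATE Users SET Username = 'john_doe' WHERE UserID = 1")
print(cur.rowcount)  # 1

rows = conn.execute("""
    SELECT u.Username, p.Title
    FROM Posts p JOIN Users u ON u.UserID = p.UserID
    ORDER BY p.PostID
""").fetchall()
print(rows)
```

Had Username been the primary key, every Posts row (and every other referencing table) would have to change too, or the foreign keys would need ON UPDATE CASCADE.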
I would also add columns like created_at and last_accessed_at (best done via triggers or procedures, IMO) so that some stats are already available. This can really give you valuable stats.
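A sketch of the timestamp idea in SQLite (via Python's sqlite3). created_at gets its value from a column default; because a trigger cannot fire on a plain SELECT, last_accessed_at is refreshed here by a trigger on a hypothetical Logins table that the application would write to on each login. Table and trigger names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    UserID           INTEGER PRIMARY KEY,
    Username         TEXT NOT NULL UNIQUE,
    created_at       TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_accessed_at TEXT            -- NULL until the first recorded login
);

-- Hypothetical login log; each insert refreshes the user's last_accessed_at.
CREATE TABLE Logins (
    UserID     INTEGER NOT NULL REFERENCES Users(UserID),
    LoggedInAt TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER trg_logins_touch_user
AFTER INSERT ON Logins
BEGIN
    UPDATE Users SET last_accessed_at = NEW.LoggedInAt WHERE UserID = NEW.UserID;
END;
""")

conn.execute("INSERT INTO Users (UserID, Username) VALUES (1, 'doe55')")
# created_at was filled in by the default; no login has been recorded yet.
print(conn.execute("SELECT created_at IS NOT NULL, last_accessed_at FROM Users").fetchone())  # (1, None)

conn.execute("INSERT INTO Logins (UserID, LoggedInAt) VALUES (1, '2024-01-15 09:30:00')")
print(conn.execute("SELECT last_accessed_at FROM Users WHERE UserID = 1").fetchone()[0])  # 2024-01-15 09:30:00
```

In MySQL the created_at column would typically be declared as created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP.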
Further strategies to increase performance would be things like memcache, counter caches, partitioned tables, and so on. Such things can be discussed when you are really overrun by users, because there may be technologies or techniques that are very specific to your problem.
#2
I'm not clear on what's going on with your User* tables: they're set up as if they're 1:1, but the diagram shows 1-to-many (the crow's foot symbol).
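If the User* tables really are meant to be 1:1, one common way to enforce that is to make the child table's UserID both its primary key and its foreign key. A minimal SQLite sketch via Python's sqlite3 (SQLite only enforces foreign keys once PRAGMA foreign_keys is on); the RealName column, the add_profile helper and the sample data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);

-- UserID is both PK and FK: each user has at most one profile row (1:0..1).
CREATE TABLE UserProfiles (
    UserID   INTEGER PRIMARY KEY REFERENCES Users(UserID),
    RealName TEXT
);
""")
conn.execute("INSERT INTO Users VALUES (1, 'doe55')")


def add_profile(user_id, real_name):
    """Hypothetical helper: return False if a key constraint rejects the row."""
    try:
        conn.execute("INSERT INTO UserProfiles VALUES (?, ?)", (user_id, real_name))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_profile(1, "John Doe"))      # True
print(add_profile(1, "Someone Else"))  # False: user 1 already has a profile
print(add_profile(2, "Nobody"))        # False: there is no user 2
```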
The ExternalAccounts and UserSettings tables could be normalised further (in which case they would then be 1-to-many!), which would give you a more maintainable design: you wouldn't need to add further columns to your schema for additional external account or notification types (although this may be less scalable in terms of performance).
For example:
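CREATE TABLE ExternalAccounts (
  UserId int NOT NULL,
  AccountType varchar(45) NOT NULL,
  AccountIdentifier varchar(45) NOT NULL,
  -- assumes at most one account of each type per user
  PRIMARY KEY (UserId, AccountType)
);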
will allow you to store LinkedIn, Google, etc. accounts in the same structure. Similarly, further Notification Types can be readily added using a structure like:
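CREATE TABLE UserSettings (
  UserId int NOT NULL,
  NotificationType varchar(45) NOT NULL,
  NotificationFlag ENUM('on','off') NOT NULL,
  -- one flag per notification type per user
  PRIMARY KEY (UserId, NotificationType)
);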
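The two normalised tables can be exercised together in a short SQLite sketch (Python's sqlite3 standing in for MySQL). SQLite has no ENUM, so a CHECK constraint stands in for ENUM('on','off'); the primary keys, the notification type names, the wants helper and the rule that a missing row means "off" are all assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per (user, account type): a new provider is just a new AccountType value.
CREATE TABLE ExternalAccounts (
    UserId            INTEGER NOT NULL,
    AccountType       VARCHAR(45) NOT NULL,
    AccountIdentifier VARCHAR(45) NOT NULL,
    PRIMARY KEY (UserId, AccountType)
);

-- SQLite has no ENUM type, so a CHECK constraint plays that role here.
CREATE TABLE UserSettings (
    UserId           INTEGER NOT NULL,
    NotificationType VARCHAR(45) NOT NULL,
    NotificationFlag TEXT NOT NULL CHECK (NotificationFlag IN ('on', 'off')),
    PRIMARY KEY (UserId, NotificationType)
);
""")

conn.executemany(
    "INSERT INTO ExternalAccounts VALUES (?, ?, ?)",
    [(1, "LinkedIn", "john-doe"), (1, "Google", "doe55@example.com")],
)
conn.executemany(
    "INSERT INTO UserSettings VALUES (?, ?, ?)",
    [(1, "new_follower", "on"), (1, "new_comment", "off")],
)
# A notification type introduced later needs a new row, not an ALTER TABLE.
conn.execute("INSERT INTO UserSettings VALUES (1, 'weekly_digest', 'on')")


def wants(user_id, notification_type):
    """Hypothetical helper: True if the flag is 'on'; a missing row counts as 'off'."""
    row = conn.execute(
        "SELECT NotificationFlag FROM UserSettings"
        " WHERE UserId = ? AND NotificationType = ?",
        (user_id, notification_type),
    ).fetchone()
    return row is not None and row[0] == "on"


print(wants(1, "weekly_digest"))  # True
print(wants(1, "new_comment"))    # False
```

Adding a new notification type here is an INSERT rather than a schema change; the price is one row per user per type instead of one wide row.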
hth