My Django project is going to be backed by a large database with several hundred thousand entries, and will need to support searching (I'll probably end up using djangosearch or a similar project.)
我的Django项目将由一个拥有几十万条目的大型数据库支持,并且需要支持搜索(我可能最终会使用djangosearch或类似的项目)。
Which database backend is best suited to my project and why? Can you recommend any good resources for further reading?
哪个数据库后端最适合我的项目,为什么?你能推荐一些好的参考资料来进一步阅读吗?
10 个解决方案
#1
40
As someone who recently switched a project from MySQL to Postgresql I don't regret the switch.
作为最近将一个项目从MySQL切换到Postgresql的人,我并不后悔这样做。
The main difference, from a Django point of view, is more rigorous constraint checking in Postgresql, which is a good thing, and also it's a bit more tedious to do manual schema changes (aka migrations).
从Django的观点来看,主要的区别是Postgresql中更严格的约束检查,这是一件好事,而且手工修改模式(也就是迁移)也有点麻烦。
There are probably 6 or so Django database migration applications out there and at least one doesn't support Postgresql. I don't consider this a disadvantage though because you can use one of the others or do them manually (which is what I prefer atm).
大约有6个Django数据库迁移应用程序,其中至少有一个不支持Postgresql。我不认为这是一个缺点,因为你可以使用其中的一个或手动操作它们(我更喜欢atm)。
Full text search might be better supported for MySQL. MySQL has built-in full text search supported from within Django but it's pretty useless (no word stemming, phrase searching, etc.). I've used django-sphinx as a better option for full text searching in MySQL.
对于MySQL,可能更支持全文搜索。MySQL在Django中内置了完整的文本搜索,但是它非常无用(没有词干、短语搜索等)。我使用django-sphinx作为在MySQL中进行全文搜索的更好选择。
Full text searching is built-in with Postgresql 8.3 (earlier versions need TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2
全文搜索内置Postgresql 8.3(早期版本需要TSearch模块)。这里有一篇很好的教学博客:用PostgreSQL和tsearch2在Django中进行全文搜索
#2
51
For whatever it's worth the the creators of Django recommend PostgreSQL.
无论如何,Django的创建者推荐PostgreSQL。
If you're not tied to any legacy system and have the freedom to choose a database back-end, we recommend PostgreSQL, which achives a fine balance between cost, features, speed and stability. (The Definitive Guide to Django, p. 15)
如果您不依赖于任何遗留系统,并且可以*选择数据库后端,我们建议您使用PostgreSQL,它可以在成本、特性、速度和稳定性之间实现良好的平衡。(《Django的最终指南》,第15页)
#3
39
large database with several hundred thousand entries,
拥有数十万条目的大型数据库,
This is not large database, it's very small one.
这不是一个大的数据库,而是一个很小的数据库。
I'd choose PostgreSQL, because it has a lot more features. Most significant it this case: in PostgreSQL you can use Python as procedural language.
我选择PostgreSQL,因为它有更多的特性。最重要的一点是:在PostgreSQL中,您可以使用Python作为过程语言。
#4
7
Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.
和你更熟悉的人一起去。MySQL vs PostgreSQL是一场无休止的战争。它们都是优秀的数据库引擎,都被主要站点使用。这在实践中真的不重要。
#5
7
Even if Postgresql looks better, I find it has some performances issues with Django:
即使Postgresql看起来更好,我发现它在Django上有一些性能问题:
Postgresql is made to handle "long connections" (connection pooling, persistant connections, etc.)
Postgresql用于处理“长连接”(连接池、持久连接等)。
MySQL is made to handle "short connections" (connect, do your queries, disconnect, has some performances issues with a lot of open connections)
MySQL是用来处理“短连接”的(连接、查询、断开连接等)
The problem is that Django does not support connection pooling or persistant connection, it has to connect/disconnect to the database at each view call.
问题是Django不支持连接池或持久连接,它必须在每次视图调用时连接/断开数据库。
It will works with Postgresql, but connecting to a Postgresql cost a LOT more than connecting to a MySQL database (On Postgresql, each connection has it own process, it's a lot slower than just popping a new thread in MySQL).
它将与Postgresql一起工作,但是连接到Postgresql要比连接到MySQL数据库花费多得多(在Postgresql上,每个连接都有自己的进程,这比在MySQL中弹出一个新线程要慢得多)。
Then you get some features like the Query Cache that can be really useful on some cases. (But you lost the superb text search of PostgreSQL)
然后您会得到一些特性,比如查询缓存,在某些情况下非常有用。(但是您丢失了PostgreSQL出色的文本搜索)
#6
5
All the answers bring interesting information to the table, but some are a little outdated, so here's my grain of salt.
所有的答案都给我们带来了有趣的信息,但也有一些是过时的,所以我对此持保留态度。
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
从1.7开始,迁移现在是Django的一个完整特性。因此他们记录了Django开发人员可能希望事先知道的主要差异。
Backend Support
Migrations are supported on all backends that Django ships with, as well as any third-party backends if they have programmed in support for schema alteration (done via the SchemaEditor class).
在Django附带的所有后台都支持迁移,以及任何第三方支持,如果它们支持模式更改(通过SchemaEditor类完成)。
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
但是,在模式迁移方面,有些数据库比其他数据库更有能力;下面将介绍一些注意事项。
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support; the only caveat is that adding columns with default values will cause a full rewrite of the table, for a time proportional to its size.
PostgreSQL在模式支持方面是这里所有数据库中能力最强的;唯一要注意的是,添加具有默认值的列将导致对表的完全重写,时间与表的大小成正比。
For this reason, it’s recommended you always create new columns with null=True, as this way they will be added immediately.
出于这个原因,建议您始终使用null=True创建新的列,因为这样可以立即添加它们。
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
MySQL缺乏对模式更改操作的事务的支持,这意味着如果迁移失败,您将不得不手动取消更改以再次尝试(不可能回滚到更早的点)。
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
此外,MySQL将完全重写几乎所有模式操作的表,并且通常需要与表中添加或删除列的行数成比例的时间。在速度较慢的硬件上,这可能比每百万行一分钟还要糟糕——向只有几百万行的表中添加几列可以将站点锁定在10分钟以上。
Finally, MySQL has reasonably small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
最后,MySQL对列、表和索引的名称长度有相当小的限制,对索引覆盖的所有列的组合大小也有限制。这意味着在MySQL下,其他后台可能无法创建索引。
SQLite
SQLite has very little built-in schema alteration support, and so Django attempts to emulate it by:
SQLite几乎没有内置的模式更改支持,因此Django试图通过以下方式来模拟它:
- Creating a new table with the new schema
- 使用新模式创建新表
- Copying the data across
- 复制数据
- Dropping the old table
- 删除旧表
- Renaming the new table to match the original name
- 重命名新表以匹配原始名称
This process generally works well, but it can be slow and occasionally buggy. It is not recommended that you run and migrate SQLite in a production environment unless you are very aware of the risks and its limitations; the support Django ships with is designed to allow developers to use SQLite on their local machines to develop less complex Django projects without the need for a full database.
这个过程通常很好,但是它可能很慢而且偶尔会有bug。不建议您在生产环境中运行和迁移SQLite,除非您非常了解风险及其局限性;Django附带的支持允许开发人员在本地机器上使用SQLite来开发不太复杂的Django项目,而不需要完整的数据库。
#7
3
When a migration fails in django-south, the developers encourage you not to use MySQL:
当django-south的迁移失败时,开发人员鼓励您不要使用MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
#8
2
To add to previous answers :
补充之前的答案:
- "Full text search might be better supported for MySQL"
- " MySQL可能更支持全文搜索"
The FULLTEXT index in MySQL is a joke.
MySQL中的全文索引是一个笑话。
- It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
- 它只适用于MyISAM表,因此会丢失ACID、事务、约束、关系、持久性、并发性等。
- INSERT/UPDATE/DELETE to a largish TEXT column (like a forum post) will a rebuild a large part of the index. If it does not fit in myisam_key_buffer, then large IO will occur. I've seen a single forum post insertion trigger 100MB or more of IO ... meanwhile the posts table is exclusiely locked !
- 插入/更新/删除到较大的文本列(如论坛帖子)将重新构建索引的大部分。如果它不适合myisam_key_buffer,那么将会发生大型IO。我已经看到一个论坛后插入触发器100MB或更多的IO…与此同时,邮政台被独占地锁住了!
- I did some benchmarking (3 years ago, may be stale...) which showed that on large datasets, basically postgres fulltext is 10-100x faster than mysql, and Xapian 10-100x faster than postgres (but not integrated).
- 我做了一些基准测试(3年前,可能已经过时了…),结果显示,在大型数据集中,基本上postgres fulltext比mysql快10-100x, Xapian比postgres快10-100x(但没有集成)。
Other reasons not mentioned are the extremely smart query optimizer, large choice of join types (merge, hash, etc), hash aggregation, gist indexes on arrays, spatial search, etc which can result in extremely fast plans on very complicated queries.
没有提到的其他原因是非常智能的查询优化器、大量选择的连接类型(合并、散列等)、散列聚合、数组上的主要索引、空间搜索等等,这些都可能导致对非常复杂的查询的快速计划。
#9
1
Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.
这个应用程序是托管在您自己的服务器上还是托管公司?请确保,如果您使用的是托管公司,它们支持选择的数据库。
#10
0
There is a major licensing difference between the two db that will affect you if you ever intend to distribute code using the db. MySQL's client libraries are GPL and PostegreSQL's is under a BSD like license which might be easier to work with.
如果您打算使用db分发代码,那么这两个db之间的许可差异将会影响您。MySQL的客户端库是GPL,而PostegreSQL的则是类似BSD的许可证,使用起来可能更容易。
#1
40
As someone who recently switched a project from MySQL to Postgresql I don't regret the switch.
作为最近将一个项目从MySQL切换到Postgresql的人,我并不后悔这样做。
The main difference, from a Django point of view, is more rigorous constraint checking in Postgresql, which is a good thing, and also it's a bit more tedious to do manual schema changes (aka migrations).
从Django的观点来看,主要的区别是Postgresql中更严格的约束检查,这是一件好事,而且手工修改模式(也就是迁移)也有点麻烦。
There are probably 6 or so Django database migration applications out there and at least one doesn't support Postgresql. I don't consider this a disadvantage though because you can use one of the others or do them manually (which is what I prefer atm).
大约有6个Django数据库迁移应用程序,其中至少有一个不支持Postgresql。我不认为这是一个缺点,因为你可以使用其中的一个或手动操作它们(我更喜欢atm)。
Full text search might be better supported for MySQL. MySQL has built-in full text search supported from within Django but it's pretty useless (no word stemming, phrase searching, etc.). I've used django-sphinx as a better option for full text searching in MySQL.
对于MySQL,可能更支持全文搜索。MySQL在Django中内置了完整的文本搜索,但是它非常无用(没有词干、短语搜索等)。我使用django-sphinx作为在MySQL中进行全文搜索的更好选择。
Full text searching is built-in with Postgresql 8.3 (earlier versions need TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2
全文搜索内置Postgresql 8.3(早期版本需要TSearch模块)。这里有一篇很好的教学博客:用PostgreSQL和tsearch2在Django中进行全文搜索
#2
51
For whatever it's worth the the creators of Django recommend PostgreSQL.
无论如何,Django的创建者推荐PostgreSQL。
If you're not tied to any legacy system and have the freedom to choose a database back-end, we recommend PostgreSQL, which achives a fine balance between cost, features, speed and stability. (The Definitive Guide to Django, p. 15)
如果您不依赖于任何遗留系统,并且可以*选择数据库后端,我们建议您使用PostgreSQL,它可以在成本、特性、速度和稳定性之间实现良好的平衡。(《Django的最终指南》,第15页)
#3
39
large database with several hundred thousand entries,
拥有数十万条目的大型数据库,
This is not large database, it's very small one.
这不是一个大的数据库,而是一个很小的数据库。
I'd choose PostgreSQL, because it has a lot more features. Most significant it this case: in PostgreSQL you can use Python as procedural language.
我选择PostgreSQL,因为它有更多的特性。最重要的一点是:在PostgreSQL中,您可以使用Python作为过程语言。
#4
7
Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.
和你更熟悉的人一起去。MySQL vs PostgreSQL是一场无休止的战争。它们都是优秀的数据库引擎,都被主要站点使用。这在实践中真的不重要。
#5
7
Even if Postgresql looks better, I find it has some performances issues with Django:
即使Postgresql看起来更好,我发现它在Django上有一些性能问题:
Postgresql is made to handle "long connections" (connection pooling, persistant connections, etc.)
Postgresql用于处理“长连接”(连接池、持久连接等)。
MySQL is made to handle "short connections" (connect, do your queries, disconnect, has some performances issues with a lot of open connections)
MySQL是用来处理“短连接”的(连接、查询、断开连接等)
The problem is that Django does not support connection pooling or persistant connection, it has to connect/disconnect to the database at each view call.
问题是Django不支持连接池或持久连接,它必须在每次视图调用时连接/断开数据库。
It will works with Postgresql, but connecting to a Postgresql cost a LOT more than connecting to a MySQL database (On Postgresql, each connection has it own process, it's a lot slower than just popping a new thread in MySQL).
它将与Postgresql一起工作,但是连接到Postgresql要比连接到MySQL数据库花费多得多(在Postgresql上,每个连接都有自己的进程,这比在MySQL中弹出一个新线程要慢得多)。
Then you get some features like the Query Cache that can be really useful on some cases. (But you lost the superb text search of PostgreSQL)
然后您会得到一些特性,比如查询缓存,在某些情况下非常有用。(但是您丢失了PostgreSQL出色的文本搜索)
#6
5
All the answers bring interesting information to the table, but some are a little outdated, so here's my grain of salt.
所有的答案都给我们带来了有趣的信息,但也有一些是过时的,所以我对此持保留态度。
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
从1.7开始,迁移现在是Django的一个完整特性。因此他们记录了Django开发人员可能希望事先知道的主要差异。
Backend Support
Migrations are supported on all backends that Django ships with, as well as any third-party backends if they have programmed in support for schema alteration (done via the SchemaEditor class).
在Django附带的所有后台都支持迁移,以及任何第三方支持,如果它们支持模式更改(通过SchemaEditor类完成)。
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
但是,在模式迁移方面,有些数据库比其他数据库更有能力;下面将介绍一些注意事项。
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support; the only caveat is that adding columns with default values will cause a full rewrite of the table, for a time proportional to its size.
PostgreSQL在模式支持方面是这里所有数据库中能力最强的;唯一要注意的是,添加具有默认值的列将导致对表的完全重写,时间与表的大小成正比。
For this reason, it’s recommended you always create new columns with null=True, as this way they will be added immediately.
出于这个原因,建议您始终使用null=True创建新的列,因为这样可以立即添加它们。
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
MySQL缺乏对模式更改操作的事务的支持,这意味着如果迁移失败,您将不得不手动取消更改以再次尝试(不可能回滚到更早的点)。
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
此外,MySQL将完全重写几乎所有模式操作的表,并且通常需要与表中添加或删除列的行数成比例的时间。在速度较慢的硬件上,这可能比每百万行一分钟还要糟糕——向只有几百万行的表中添加几列可以将站点锁定在10分钟以上。
Finally, MySQL has reasonably small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
最后,MySQL对列、表和索引的名称长度有相当小的限制,对索引覆盖的所有列的组合大小也有限制。这意味着在MySQL下,其他后台可能无法创建索引。
SQLite
SQLite has very little built-in schema alteration support, and so Django attempts to emulate it by:
SQLite几乎没有内置的模式更改支持,因此Django试图通过以下方式来模拟它:
- Creating a new table with the new schema
- 使用新模式创建新表
- Copying the data across
- 复制数据
- Dropping the old table
- 删除旧表
- Renaming the new table to match the original name
- 重命名新表以匹配原始名称
This process generally works well, but it can be slow and occasionally buggy. It is not recommended that you run and migrate SQLite in a production environment unless you are very aware of the risks and its limitations; the support Django ships with is designed to allow developers to use SQLite on their local machines to develop less complex Django projects without the need for a full database.
这个过程通常很好,但是它可能很慢而且偶尔会有bug。不建议您在生产环境中运行和迁移SQLite,除非您非常了解风险及其局限性;Django附带的支持允许开发人员在本地机器上使用SQLite来开发不太复杂的Django项目,而不需要完整的数据库。
#7
3
When a migration fails in django-south, the developers encourage you not to use MySQL:
当django-south的迁移失败时,开发人员鼓励您不要使用MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
#8
2
To add to previous answers :
补充之前的答案:
- "Full text search might be better supported for MySQL"
- " MySQL可能更支持全文搜索"
The FULLTEXT index in MySQL is a joke.
MySQL中的全文索引是一个笑话。
- It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
- 它只适用于MyISAM表,因此会丢失ACID、事务、约束、关系、持久性、并发性等。
- INSERT/UPDATE/DELETE to a largish TEXT column (like a forum post) will a rebuild a large part of the index. If it does not fit in myisam_key_buffer, then large IO will occur. I've seen a single forum post insertion trigger 100MB or more of IO ... meanwhile the posts table is exclusiely locked !
- 插入/更新/删除到较大的文本列(如论坛帖子)将重新构建索引的大部分。如果它不适合myisam_key_buffer,那么将会发生大型IO。我已经看到一个论坛后插入触发器100MB或更多的IO…与此同时,邮政台被独占地锁住了!
- I did some benchmarking (3 years ago, may be stale...) which showed that on large datasets, basically postgres fulltext is 10-100x faster than mysql, and Xapian 10-100x faster than postgres (but not integrated).
- 我做了一些基准测试(3年前,可能已经过时了…),结果显示,在大型数据集中,基本上postgres fulltext比mysql快10-100x, Xapian比postgres快10-100x(但没有集成)。
Other reasons not mentioned are the extremely smart query optimizer, large choice of join types (merge, hash, etc), hash aggregation, gist indexes on arrays, spatial search, etc which can result in extremely fast plans on very complicated queries.
没有提到的其他原因是非常智能的查询优化器、大量选择的连接类型(合并、散列等)、散列聚合、数组上的主要索引、空间搜索等等,这些都可能导致对非常复杂的查询的快速计划。
#9
1
Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.
这个应用程序是托管在您自己的服务器上还是托管公司?请确保,如果您使用的是托管公司,它们支持选择的数据库。
#10
0
There is a major licensing difference between the two db that will affect you if you ever intend to distribute code using the db. MySQL's client libraries are GPL and PostegreSQL's is under a BSD like license which might be easier to work with.
如果您打算使用db分发代码,那么这两个db之间的许可差异将会影响您。MySQL的客户端库是GPL,而PostegreSQL的则是类似BSD的许可证,使用起来可能更容易。