PostgreSQL: one database with multiple schemas vs. multiple databases with one schema

Time: 2022-09-27 12:52:28

I've been reading this question, but it doesn't help me. From a database administration standpoint, I think one database with multiple schemas is easier to maintain than the alternative. But in terms of performance, which one is better? Is there any advantage of one over the other?

TIA.

4 Answers

#1


12  

If performance is important, there is no substitute for benchmarking your specific use case. If it isn't that important, then go with whichever option is easier to administer! Hardware is cheap compared to programmer/DBA time, and compared to the higher accident rate you should expect from a more complex setup. Not to mention that computers follow Moore's law, but the day stubbornly refuses to get longer.

If I had to guess, not having benchmarked your specific use case (since I can't), I'd guess one database with multiple schemas will perform better because:

  1. Each connection is to one database, AFAIK. Connection setup/teardown is expensive in PostgreSQL.

  2. Many schemas are closer to many tables than many databases are, and I'd expect PostgreSQL to be better optimized for many tables than for many databases.

However, I can see a counterexample which may apply. Each database is stored in a single directory. This makes splitting databases across filesystems (and thus disk arrays) really easy using normal filesystem tools (e.g., mount points and/or symbolic links). Splitting databases across multiple arrays is very unlikely to outperform one larger RAID10 array with the same number of disks, but it will provide better isolation (database A running a huge query will not affect database B as much). Do, however, check your OS's I/O scheduler documentation; remember that each PostgreSQL connection gets its own backend process, so per-process fairness queuing may accomplish this better.

Note that you can also segment the data across filesystems using PostgreSQL's CREATE TABLESPACE and friends, so the above can actually be done with schemas as well.
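
As a rough illustration of the tablespace approach, here is a minimal Python sketch that only builds the SQL statements as strings; it does not connect to a server. The tablespace name `fast_disk`, the directory `/mnt/array2/pgdata`, and the schema/table names are hypothetical, and the function names are made up for this example. The generated statements use the standard `CREATE TABLESPACE ... LOCATION` and `CREATE TABLE ... TABLESPACE` forms.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Single-quote an SQL string literal, doubling any embedded single quotes.

    Assumes standard_conforming_strings is on, so backslashes are not special.
    """
    return "'" + value.replace("'", "''") + "'"


def tablespace_statements(tablespace: str, location: str,
                          schema: str, table: str) -> list[str]:
    """Build SQL that places one schema's table on a separate filesystem.

    All names passed in are illustrative; the single-column table
    definition is a placeholder, not a recommended layout.
    """
    ts = quote_ident(tablespace)
    return [
        # The directory must already exist and be owned by the server's OS user.
        f"CREATE TABLESPACE {ts} LOCATION {quote_literal(location)};",
        f"CREATE SCHEMA {quote_ident(schema)};",
        # The table lives in the schema logically and in the tablespace physically.
        f"CREATE TABLE {quote_ident(schema)}.{quote_ident(table)} "
        f"(id bigint) TABLESPACE {ts};",
    ]


if __name__ == "__main__":
    for stmt in tablespace_statements(
        "fast_disk", "/mnt/array2/pgdata", "tenant_a", "orders"
    ):
        print(stmt)
```

The point of the sketch is that the schema (a logical namespace) and the tablespace (a physical location) are chosen independently, so a single database can still spread its schemas across several disk arrays.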

#2


3  

Performance-wise, it's going to depend completely on your application.

For example, multiple databases require a connection pool per database. If you have hundreds or thousands of databases, that pretty much means you can't do connection pooling. That'll cost you performance for anything except, say, a client application with a single persistent connection to the database.
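
The arithmetic behind that claim can be sketched in a few lines of Python. This is a simplified model, not a description of any particular pooler: it assumes every database gets an equally sized pool and that all pools share one server-wide connection limit. The function name is hypothetical, and the limit of 100 used in the example is only an assumed value.

```python
def per_database_pool_size(max_connections: int, databases: int) -> int:
    """Connections each per-database pool can hold under a shared server limit.

    Simplified model: the limit is divided evenly and no slots are
    reserved for superusers or maintenance.
    """
    if databases <= 0:
        raise ValueError("databases must be positive")
    if max_connections < 0:
        raise ValueError("max_connections must not be negative")
    return max_connections // databases


if __name__ == "__main__":
    limit = 100  # assumed server-wide connection limit
    for n in (1, 4, 1000):
        print(n, "databases ->", per_database_pool_size(limit, n), "per pool")
```

With one database, the whole limit is available to a single shared pool. With a thousand databases, the even share rounds down to zero, which is the "you can't do connection pooling" situation: connections have to be opened and closed on demand instead of being reused.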

However, if you only access "one database at a time" (and not within seconds of each other either), keeping things in separate databases means that only the system table cache for the databases actively in use needs to be loaded, leaving more memory to cache user tables (since the system tables in each database will be significantly smaller).

In most cases, schemas in one database will win out. A fairly common solution is a hybrid of X databases and Y schemas.

#3


0  

I don't think it matters either way. Each table will be stored in a separate file; the only question is which directory the file lives in. Since performance is impacted primarily by read and write operations on individual files, the organization of files in directories should have little effect.

#4


0  

As far as I can see, multiple databases have no performance advantage over multiple schemas (namespaces), except perhaps if you have an incredibly large number of tables. CREATE TABLESPACE lets you put whatever you want wherever you want on the filesystem, so you can control the physical storage in either case.

The main difference is that querying across multiple schemas works much better than querying across multiple databases. Also, you may be able to share more connections via a connection pooler if it's all in the same database.
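
To make the cross-schema case concrete, here is a small Python sketch that builds a single join statement over two schema-qualified tables. It only produces the SQL text; the schema, table, and column names (`sales.orders`, `crm.customers`, `customer_id`) and the function names are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified(schema: str, table: str) -> str:
    """Return a schema-qualified table name such as "sales"."orders"."""
    return f"{quote_ident(schema)}.{quote_ident(table)}"


def cross_schema_join(left: tuple[str, str], right: tuple[str, str],
                      key: str) -> str:
    """Build one SELECT that joins tables from two schemas of the same database.

    left and right are (schema, table) pairs; key is a column assumed
    to exist in both tables.
    """
    k = quote_ident(key)
    return (
        f"SELECT * FROM {qualified(*left)} AS l "
        f"JOIN {qualified(*right)} AS r ON l.{k} = r.{k};"
    )


if __name__ == "__main__":
    print(cross_schema_join(("sales", "orders"), ("crm", "customers"),
                            "customer_id"))
```

Within one database this is an ordinary query that the planner handles as a whole. A plain statement cannot reference tables in a different database on the same connection; that requires something like the `dblink` or `postgres_fdw` extensions.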
