I am interested in your thoughts about the pitfalls of joining two or more tables from different databases. I'll try to give an example.
Suppose table Table1 is located in database DatabaseA and Table2 is located in DatabaseB. Let's say I have a view in DatabaseA that pulls out some data from Table1 and from some other tables in DatabaseA.
This view is used to push data to another database; let's call this one, unimaginatively, DatabaseC.
If I need some data from Table2, my instinct is to join Table2 directly in this view, sort of like this: table1 inner join DatabaseB..table2 on [some columns]
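For illustration, here is a minimal sketch of what such a view might look like. Only the database and table names come from the question; the view name, the column names (Id, SomeColumn, OtherColumn, Table1Id) and the dbo schema are assumptions made up for the example.

```sql
USE DatabaseA;
GO
-- Hypothetical view and column names; only Table1, Table2,
-- DatabaseA and DatabaseB are taken from the question.
CREATE VIEW dbo.vw_ExportToDatabaseC
AS
SELECT t1.Id,
       t1.SomeColumn,
       t2.OtherColumn
FROM dbo.Table1 AS t1
INNER JOIN DatabaseB.dbo.Table2 AS t2   -- three-part name: database.schema.object
    ON t2.Table1Id = t1.Id;
GO
```

The sketch spells out the schema (assumed to be dbo) instead of using DatabaseB..table2, so the reference does not rely on default schema resolution.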
Doing this is pretty simple and quick, but I have a nagging voice in my head that keeps telling me not to do it. My worry is about not being able to track down all the objects that depend on Table2, so if I change something there, I have to be very careful and remember everywhere I use this table. So it is sort of like breaking SRP for this view (and the two databases), because the view can change as a result of two different actions performed on two different databases: changing Table1 or changing Table2.
I am interested in your opinions. Is this a good or bad idea? What would be the problems with this approach (performance-wise, maintenance-wise and so on)? And do you have real-world experience where this approach was either a big mistake or a life saver for you?
P.S: I've searched for this topic on Google and SO, but could not find anything related to it. I will gladly take the downvotes, duplicate-question flags and other 'reprimands' from SO users just to get a different view on this problem.
P.P.S: I am using SQL Server 2005.
Thank you, and I hope I made myself clear :)
5 answers
#1
26
If they are on the same server, there is no real problem pulling from separate databases. In fact, you may want to separate them for good reasons, for instance if you have a combination of transactional tables and lookup tables that are imported from files. The transactional data needs the full recovery model and frequent transaction log backups to be properly restorable; the lookup data does not, and can benefit from being in a database that uses the simple recovery model.
We have many different databases that our applications use, and we cross databases in queries all the time. As long as the indexing is done properly, there has been no noticeable performance difference. The biggest potential issue is data integrity, as you can't set up foreign keys across databases. This can be handled in triggers if need be, though.
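As a rough sketch of the trigger approach, assume a hypothetical column Table1.Table2Id in DatabaseA that must match a hypothetical key Table2.Id in DatabaseB (neither column is named in the question, and the dbo schema is also an assumption). Something along these lines should reject rows that point at a missing parent:

```sql
USE DatabaseA;
GO
-- Hypothetical columns: Table1.Table2Id must exist as Table2.Id in DatabaseB.
CREATE TRIGGER dbo.trg_Table1_CheckTable2
ON dbo.Table1
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Look for new or changed rows whose parent row is missing in the other database.
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        WHERE i.Table2Id IS NOT NULL
          AND NOT EXISTS (
              SELECT 1
              FROM DatabaseB.dbo.Table2 AS t2
              WHERE t2.Id = i.Table2Id
          )
    )
    BEGIN
        RAISERROR('Table1.Table2Id has no matching row in DatabaseB.dbo.Table2.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
GO
```

This only guards one direction. Deletes or key changes on Table2 would need a matching trigger in DatabaseB, and the account performing the write needs permission to read the table in the other database.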
Now when the databases are on different servers, there can be a performance problem and getting the data is more complicated.
#2
11
Like everything else in SQL, it depends.
At my job, we do this a LOT. We have very large data sets, with separate DBs for header and detail level records, then additional DBs for reports or tables that we build off of other data, etc.
There's not really a performance issue from joining across DBs, and in some cases depending on your hardware setup it may be FASTER. If DatabaseA and DatabaseB are on separate physical drives with different controllers, it will likely be faster to run a query joining those than if they were in the same DB on the same volume.
Maintenance can be an issue, but no more than for any other database or tables. It's not like you have different versions of the same tables; you just have those tables in different DBs.
The only major drawback is that SQL Server does a poor job of showing cross-database dependencies, so you will need to keep track of these yourself. There are some scripts for this and also third-party utilities, and I have heard that SQL Server Denali will add additional support for this, but I'm not sure if that's accurate.
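One simple kind of script for this is a text search over module definitions. The sketch below is only an illustration, not one of the scripts mentioned above: it lists the views, procedures, functions and triggers in the current database whose source text mentions DatabaseB (the database name from the question).

```sql
USE DatabaseA;
GO
-- List modules in the current database whose definition mentions DatabaseB.
-- This is a plain text match, so comments and similarly named objects
-- can produce false positives.
SELECT s.name AS schema_name,
       o.name AS object_name,
       o.type_desc
FROM sys.sql_modules AS m
INNER JOIN sys.objects AS o ON o.object_id = m.object_id
INNER JOIN sys.schemas AS s ON s.schema_id = o.schema_id
WHERE m.definition LIKE '%DatabaseB%'
ORDER BY s.name, o.name;
```

The catalog views are per database, so the query has to be run in every database that might reference the table. It will not find references built with dynamic SQL from concatenated strings, or references held in application code.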
#3
5
Your nagging voice is probably right.
Not the least of the problems will be how to enforce declarative referential integrity, since you cannot create foreign keys between databases. Sooner or later, therefore, you will have to cope with inconsistent, mismatched or incomplete data.
But if you don't care about that, I don't see a problem :-)
#4
2
The answer to your questions is...it depends.
I have noticed that there is no serious degradation in performance when you keep the queries nice and simple (fewer joins, etc.).
The more complex the queries, the more chance that the optimizer will produce a suboptimal execution plan.
The optimizer ultimately gets to decide how to execute the query. The more complex the query, the more opportunity for the optimizer to get the order of operations "wrong".
I recently experimented with this problem...
I ran a query with roughly 8 joins on a single database. I then put up a copy of that database on the same server under a different name, and modified the query so that it would join to a couple of tables in the second copy of the database.
As a single-database query, it ran in under 3 seconds, which was expected given the volume of data.
The cross-database query ran in just under 3 minutes.
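The original query is not shown here, so the following is only a sketch of how such a comparison could be set up, not the query that was actually run. Table2Copy and the columns Id and Table1Id are hypothetical names, as is the dbo schema.

```sql
-- Report CPU/elapsed time and page reads for each statement.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO
-- Variant 1: both tables live in the same database (Table2Copy is hypothetical).
SELECT COUNT(*)
FROM DatabaseA.dbo.Table1 AS t1
INNER JOIN DatabaseA.dbo.Table2Copy AS t2
    ON t2.Table1Id = t1.Id;
GO
-- Variant 2: the same join, but against the table in the other database.
SELECT COUNT(*)
FROM DatabaseA.dbo.Table1 AS t1
INNER JOIN DatabaseB.dbo.Table2 AS t2
    ON t2.Table1Id = t1.Id;
GO
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
```

Comparing the actual execution plans of the two variants is probably the quickest way to see whether a different join order or join type explains a gap of this size, and whether both copies have the same indexes and up-to-date statistics.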
#5
1
Some general themes re cross-database joins:
Foreign keys
As others have pointed out, in the absence of foreign keys, you'll need to roll your own referential integrity. Not a problem in itself, but issues can surface when you're not in control of the data in one or more of the databases.
A related issue is the use of CASE tools. When reverse-engineering a schema, they will overlook links between tables where a FK->PK relationship doesn't exist.
Performance
If the databases are on different servers, then you're exposed to the vagaries of whatever else is running on those servers, as well as the cost of running the join operation itself. Again, if the servers are all within your control, this is something you can monitor, but this may not be the case.
Coupling
If your solution relies on other databases you have multiple points of failure. If a database goes down, this could cascade to one or more systems.
Data modification
Your solution may be coupled to what you believe to be static data in tables on another database. However, what if this data were accidentally (or purposefully) amended, duplicated or deleted? Again, if the databases in question are outside your remit, other teams/departments may not be aware of how your system operates.
All this being true, there are many cases where cross-database joins are the norm. A few examples I've seen:
Mart-Repository
Performant operations take place on the mart whilst the master data stash is kept on the repository. CRUD operations take place between the two on a frequent or infrequent basis (nightly update, real-time etc).
Legacy DB
You might expose a legacy database for data migration and/or reporting/auditing purposes.
Lookup
One or more of your databases may contain static lookup information which can be re-used.
So to answer your question: it depends on what exactly you're doing and whether the risk is acceptable. Other solutions exist, such as replication, but again, how feasible this is will depend on the structure of your department/company.