Oracle multi-schema aggregated real-time view

All,

Looking for some guidance on an Oracle design decision I am currently trying to evaluate:

The problem

I have data in three separate schemas on the same Oracle DB server. I am looking to build an application that will show data from all three schemas; however, the data shown will be based on real-time sorting and prioritisation rules that are applied to the data globally (i.e. based on the priority weightings applied, I may pull back data from any one of the three schemas).

Tentative Solution

Create a VIEW in the DB which maintains logical links to the relevant columns in the three schemas, and write a stored procedure which accepts parameterised priority weightings. The application then calls the stored procedure to select the ‘prioritised’ row from the view, and queries the associated schema directly for additional data based on the row returned.
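A minimal sketch of what this could look like. All names here (SCHEMA1 to SCHEMA3, WORK_ITEMS, ITEM_ID, URGENCY, AGE_DAYS, and the two weights) are hypothetical placeholders rather than the real model, and the weighted-sum score is only one possible prioritisation rule:

```sql
-- Hypothetical names throughout: SCHEMA1..3, WORK_ITEMS, ITEM_ID, URGENCY, AGE_DAYS.
-- The view owner needs SELECT granted directly (not via a role) on each table.
CREATE OR REPLACE VIEW all_work_items AS
SELECT 'SCHEMA1' AS source_schema, item_id, urgency, age_days FROM schema1.work_items
UNION ALL
SELECT 'SCHEMA2' AS source_schema, item_id, urgency, age_days FROM schema2.work_items
UNION ALL
SELECT 'SCHEMA3' AS source_schema, item_id, urgency, age_days FROM schema3.work_items;

-- Opens a cursor over the single highest-scoring row for the given weights.
CREATE OR REPLACE PROCEDURE get_next_work_item (
  p_urgency_weight IN  NUMBER,
  p_age_weight     IN  NUMBER,
  p_result         OUT SYS_REFCURSOR
)
IS
BEGIN
  OPEN p_result FOR
    SELECT source_schema, item_id
    FROM (
      SELECT source_schema, item_id
      FROM all_work_items
      ORDER BY p_urgency_weight * urgency + p_age_weight * age_days DESC
    )
    WHERE ROWNUM = 1;
END get_next_work_item;
/
```

The returned SOURCE_SCHEMA and ITEM_ID tell the application which schema to query for the additional data. Note that the score is recomputed for every row on each call, which is exactly where the performance concern comes from.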

I have concerns over performance, since the data is sorted/prioritised on every query, but I cannot see a way around this because the prioritisation rules will change often. We are talking about data sets in the region of 2-3 million rows per schema.

Does anyone have alternative suggestions on how to provide an aggregated and sorted view over the data?

3 Answers

#1

Querying from multiple schemas (or even multiple databases) is not really a big deal, even inside the same query. Just prefix the table name with the schema you are interested in, as in:

SELECT ST1.SOMETHING
FROM SCHEMA1.SOME_TABLE ST1
JOIN SCHEMA2.SOME_TABLE ST2
  ON ST1.PK_FIELD = ST2.PK_FIELD

If performance becomes a problem, that is a big topic: optimal query plans, indexes, and your method of database connection can all come into play. One thing that comes to mind is that if it does not have to be real time, you could use materialized views (aka "snapshots") to cache the data in a single place. Then you could query that with reasonable performance.

Just set the snapshots to refresh at an interval appropriate to your needs.
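As a rough sketch, a materialized view that pulls the three schemas together and refreshes itself every 15 minutes could look like the following. The schema, table, and column names and the interval are assumptions for illustration only:

```sql
-- Hypothetical names: SCHEMA1..3, WORK_ITEMS and its columns are assumptions.
-- REFRESH COMPLETE re-runs the whole query; NEXT sets the interval (15 minutes here).
CREATE MATERIALIZED VIEW all_work_items_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT SYSDATE + 15 / (24 * 60)
AS
SELECT 'SCHEMA1' AS source_schema, item_id, urgency, age_days FROM schema1.work_items
UNION ALL
SELECT 'SCHEMA2' AS source_schema, item_id, urgency, age_days FROM schema2.work_items
UNION ALL
SELECT 'SCHEMA3' AS source_schema, item_id, urgency, age_days FROM schema3.work_items;

-- A complete refresh can also be triggered on demand:
BEGIN
  DBMS_MVIEW.REFRESH('ALL_WORK_ITEMS_MV', 'C');
END;
/
```

Between refreshes the snapshot can be up to one interval out of date, so this only fits if that much staleness is acceptable.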

#2

It doesn't matter that the data is from 3 schemas, really. What's important to know is how frequently the data will change, how often the criteria will change, and how frequently it will be queried.

If there is a finite set of criteria (that is, the data will be viewed in a limited number of ways) that changes only every few days, and the data will be queried like crazy, you should probably look at materialized views.

If the set of criteria is nearly infinite, then there's no point making materialized views, since they are unlikely to be reused. The same holds true if the criteria themselves change extremely frequently: the data in a materialized view wouldn't help in that case either.

The other question that's unanswered is how often the source data is updated, and how important it is to have the newest information. Frequently updated source data can mean either that a materialized view will be "stale" for some duration, or that you spend a lot of time refreshing the materialized views unnecessarily to keep the data "fresh".

Honestly, 2-3 million records isn't a lot for Oracle anymore, given sufficient hardware. I would probably benchmark simple dynamic queries first before attempting fancy (materialized) views.
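Something along these lines, run from SQL*Plus or SQLcl, would give a first timing and an execution plan for the plain dynamic query. The table and column names, the weighted-sum score, and the weight values are all made up for the sake of the example:

```sql
-- SQL*Plus / SQLcl sketch. Hypothetical names: SCHEMA1..3, WORK_ITEMS,
-- ITEM_ID, URGENCY, AGE_DAYS; the weights are arbitrary sample values.
SET TIMING ON
VARIABLE urgency_weight NUMBER
VARIABLE age_weight NUMBER
EXEC :urgency_weight := 0.7
EXEC :age_weight := 0.3

-- Timed run: score every row with the current weights and keep the best one.
SELECT source_schema, item_id
FROM (
  SELECT source_schema, item_id
  FROM (
    SELECT 'SCHEMA1' AS source_schema, item_id, urgency, age_days FROM schema1.work_items
    UNION ALL
    SELECT 'SCHEMA2' AS source_schema, item_id, urgency, age_days FROM schema2.work_items
    UNION ALL
    SELECT 'SCHEMA3' AS source_schema, item_id, urgency, age_days FROM schema3.work_items
  )
  ORDER BY :urgency_weight * urgency + :age_weight * age_days DESC
)
WHERE ROWNUM = 1;

-- Same statement again, this time only to capture the optimizer's plan.
EXPLAIN PLAN FOR
SELECT source_schema, item_id
FROM (
  SELECT source_schema, item_id
  FROM (
    SELECT 'SCHEMA1' AS source_schema, item_id, urgency, age_days FROM schema1.work_items
    UNION ALL
    SELECT 'SCHEMA2' AS source_schema, item_id, urgency, age_days FROM schema2.work_items
    UNION ALL
    SELECT 'SCHEMA3' AS source_schema, item_id, urgency, age_days FROM schema3.work_items
  )
  ORDER BY :urgency_weight * urgency + :age_weight * age_days DESC
)
WHERE ROWNUM = 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Run it a few times with different weights and at realistic concurrency before deciding whether anything more elaborate is needed.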

#3

As others have said, querying a couple of million rows in Oracle is not really a problem, but then that depends on how often you are doing it - every tenth of a second may cause some load on the db server!

Without more details of your business requirements and a good model of your data, it's always difficult to provide good performance ideas. It usually comes down to coming up with a theory, then trying it against your database and assessing whether it is "fast enough".

It may also be worth taking a step back and asking yourself how accurate the results need to be. Does the business really need exact values for this query, or are good estimates acceptable?

Tom Kyte (of Ask Tom fame) always has some interesting ideas (and actual facts) in these areas. This article describes generating a proper dynamic search query, but Tom points out that when you query Google it never tries to get the exact number of hits for a query; it gives you a guess. If you can apply a good estimate, then you can really improve query performance.
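To illustrate the general idea (this is not taken from Tom's article), here are two ways Oracle can hand back an approximate count instead of an exact one. The table name, the STATUS column, and the filter value are hypothetical:

```sql
-- Hypothetical table and filter. SAMPLE (1) picks roughly 1% of the rows,
-- so the count is scaled by 100 to estimate the total.
SELECT COUNT(*) * 100 AS estimated_open_items
FROM schema1.work_items SAMPLE (1)
WHERE status = 'OPEN';

-- Row counts as of the last statistics gathering, read from the data
-- dictionary without touching the tables themselves.
SELECT owner, table_name, num_rows, last_analyzed
FROM all_tables
WHERE owner IN ('SCHEMA1', 'SCHEMA2', 'SCHEMA3')
  AND table_name = 'WORK_ITEMS';
```

The first figure varies from run to run and the second is only as fresh as LAST_ANALYZED, so both are suitable only where "about N" is a good enough answer. SAMPLE BLOCK (1) is a variant that samples whole blocks rather than individual rows.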
