How to verify whether two tables have exactly the same data?

Time: 2022-09-29 14:03:10

We have one table (the original table) that was backed up into another table (the backup table), so the two tables have exactly the same schema.

At the beginning, both tables (original and backup) contain exactly the same set of data. After some time, I need to verify whether the dataset in the original table has changed.

In order to do this I have to compare the dataset in the original table against the backup table.

Let's say the original table has the following schema:

create table LemmasMapping (
   lemma1 int,
   lemma2 int,
   index ix_lemma1 using btree (lemma1),
   index ix_lemma2 using btree (lemma2)
)

How could I achieve the dataset comparison?

Update: the table does not have a primary key. It simply stores mappings between two ids.

7 solutions

#1


14  

I would write three queries.

  1. An inner join to pick up the rows where the primary key exists in both tables but one or more of the other columns differ. This picks up rows changed in original.

  2. A left outer join to pick up the rows that are in the original table but not in the backup table (i.e. a row in original has a primary key that does not exist in backup). This returns rows inserted into original.

  3. A right outer join to pick up the rows in backup that no longer exist in original. This returns rows deleted from original.

You could union the three queries together to return a single result set. If you did this you would need to add a column to indicate what type of row it is (updated, inserted or deleted).

With a bit of effort you might be able to do this in one query using a full outer join. Be careful with outer joins, as they behave differently in different SQL engines. Predicates placed in the WHERE clause instead of the join clause can sometimes turn your outer join into an inner join.
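The three queries described in this answer can be sketched as follows. This is only an illustration under stated assumptions: the tables `original` and `backup`, their `id` primary key, and the `val` column are hypothetical (the table in the question has no primary key), and SQLite is used through Python's `sqlite3` module purely so the sketch runs anywhere. The "right outer join" is written as a left join with the tables swapped, which is equivalent and avoids depending on RIGHT JOIN support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: 'id' is an assumed primary key, 'val' an assumed data column.
    CREATE TABLE original (id INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE backup   (id INTEGER PRIMARY KEY, val INTEGER);
    INSERT INTO backup   VALUES (1, 10), (2, 20), (3, 30);
    -- Compared with backup: row 2 updated, row 3 deleted, row 4 inserted.
    INSERT INTO original VALUES (1, 10), (2, 99), (4, 40);
""")

DIFF_SQL = """
    -- 1. Key in both tables, other column differs (NULL values would need IS NOT).
    SELECT 'updated' AS change_type, o.id
      FROM original o JOIN backup b ON b.id = o.id
     WHERE o.val <> b.val
    UNION ALL
    -- 2. Key only in original.
    SELECT 'inserted', o.id
      FROM original o LEFT JOIN backup b ON b.id = o.id
     WHERE b.id IS NULL
    UNION ALL
    -- 3. Key only in backup (left join with the tables swapped).
    SELECT 'deleted', b.id
      FROM backup b LEFT JOIN original o ON o.id = b.id
     WHERE o.id IS NULL
"""

changes = sorted(conn.execute(DIFF_SQL).fetchall())
print(changes)  # [('deleted', 3), ('inserted', 4), ('updated', 2)]
```

An empty result means no differences were found. The `IS NULL` tests sit in the WHERE clause on purpose: they filter after the outer join has produced its unmatched rows.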

#2


30  

You can just use CHECKSUM TABLE and compare the results. You can even alter the table to enable live checksums so that they are continuously available.

CHECKSUM TABLE original_table, backup_table;

It doesn't require the tables to have a primary key.

#3


15  

SELECT * FROM Table1
UNION
SELECT * FROM Table2

If the result has more rows than either table, the tables do not have the same data. Because UNION removes duplicate rows, compare against each table's distinct row count.
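A minimal sketch of that count comparison, assuming SQLite via Python's `sqlite3` module and hypothetical tables `t1`, `t2`, `t3` shaped like `LemmasMapping`. The helper `same_rows_by_union` is an illustrative name, not part of any library. It treats the tables as sets, so it does not notice a row that appears twice in one table and once in the other.

```python
import sqlite3

def same_rows_by_union(conn, table_a, table_b):
    """True if both tables hold the same set of distinct rows.

    Table names are interpolated into the SQL, so pass trusted identifiers only.
    """
    def count(sql):
        return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

    distinct_a = count(f"SELECT DISTINCT * FROM {table_a}")
    distinct_b = count(f"SELECT DISTINCT * FROM {table_b}")
    union_rows = count(f"SELECT * FROM {table_a} UNION SELECT * FROM {table_b}")
    # The union is at least as large as each input; equality with both
    # distinct counts means neither table has a row the other lacks.
    return union_rows == distinct_a == distinct_b

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (lemma1 INTEGER, lemma2 INTEGER);
    CREATE TABLE t2 (lemma1 INTEGER, lemma2 INTEGER);
    CREATE TABLE t3 (lemma1 INTEGER, lemma2 INTEGER);
    INSERT INTO t1 VALUES (1, 2), (3, 4);
    INSERT INTO t2 VALUES (3, 4), (1, 2);  -- same rows, different order
    INSERT INTO t3 VALUES (1, 2), (3, 5);  -- one row differs
""")
print(same_rows_by_union(conn, "t1", "t2"))  # True
print(same_rows_by_union(conn, "t1", "t3"))  # False
```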

#4


1  

select count(*) 
from lemmas as original_table 
      full join backup_table using (lemma_id)
where backup_table.lemma_id is null
      or original_table.lemma_id is null
      or original_table.lemma != backup_table.lemma

The full join / check for null should cover additions or deletions as well as changes.

  • backup.id is null = addition
  • original.id is null = deletion
  • neither null = change

#5


1  

Try the following to compare two tables:

SELECT 'different' FROM DUAL WHERE EXISTS(
    SELECT * FROM (
        SELECT /*DISTINCT*/ +1 AS chk,a.c1,a.c2,a.c3 FROM a
        UNION ALL
        SELECT /*DISTINCT*/ +1 AS chk,b.c1,b.c2,b.c3 FROM b
    ) c
    GROUP BY c1,c2,c3
    HAVING SUM(chk)<>2
)
UNION SELECT 'equal' FROM DUAL
LIMIT 1;
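The query above counts every row as +1 and expects each group to sum to 2, which assumes that a row appears exactly once in each table. The following sketch is a variation on that idea, not the answer's own query: it weights rows from one table as +1 and from the other as -1, so any group whose sum is not 0 is a mismatch, and duplicate rows are compared by multiplicity. The tables `a` and `b`, the columns `c1` and `c2`, and the helper `mismatches` are assumptions for illustration, run on SQLite through Python's `sqlite3` module.

```python
import sqlite3

MISMATCH_SQL = """
    SELECT c1, c2, SUM(chk) AS surplus
      FROM (SELECT +1 AS chk, c1, c2 FROM a
            UNION ALL
            SELECT -1 AS chk, c1, c2 FROM b) AS c
     GROUP BY c1, c2
    HAVING SUM(chk) <> 0
"""

def mismatches(conn):
    """Rows whose number of occurrences differs between a and b.

    surplus > 0: extra copies in a; surplus < 0: extra copies in b.
    """
    return conn.execute(MISMATCH_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER, c2 INTEGER);
    CREATE TABLE b (c1 INTEGER, c2 INTEGER);
    INSERT INTO a VALUES (1, 2), (1, 2), (3, 4);  -- (1, 2) is duplicated
    INSERT INTO b VALUES (3, 4), (1, 2), (1, 2);
""")
print(mismatches(conn))  # []

conn.execute("INSERT INTO a VALUES (5, 6)")
print(mismatches(conn))  # [(5, 6, 1)]
```

Because GROUP BY places NULL values in the same group, this form also handles nullable columns, which a join on equality would not.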

#6


0  

For the lazier or more SQL-averse developer working with MS SQL Server, I would recommend SQL Delta (www.sqldelta.com) for this and any other database-diff type work. It has a great GUI, is quick and accurate, and can diff all database objects, generate and run the necessary change scripts, and synchronise entire databases. It's the next best thing to a DBA ;-)

I think there is a similar tool available from RedGate called SQL Compare. I believe some editions of the latest version of Visual Studio (2010) also include a very similar tool.

#7


0  

Please try the following method for determining whether two tables are exactly the same when there is no primary key of any kind and there are no duplicate rows within a table:

Step 1 - Test for Duplicate Rows on TABLEA

If SELECT DISTINCT * FROM TABLEA

has the same row count as

SELECT * FROM TABLEA

then go to the next step; otherwise you can't use this method.

Step 2 - Test for Duplicate Rows on TABLEB

If SELECT DISTINCT * FROM TABLEB

has the same row count as

SELECT * FROM TABLEB

then go to the next step; otherwise you can't use this method.

Step 3 - INNER JOIN TABLEA to TABLEB on every column

If the query below returns the same number of rows as the counts from Steps 1 and 2, then the tables are the same:

SELECT *
FROM TABLEA
INNER JOIN TABLEB ON
    TABLEA.column1 = TABLEB.column1
    AND TABLEA.column2 = TABLEB.column2
    AND TABLEA.column3 = TABLEB.column3
-- etc... for every column

Note that this method doesn't necessarily test for different data types, and probably won't work on non-joinable data types (like VARBINARY).
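The three steps can be sketched end to end as follows. This is a rough illustration, assuming SQLite through Python's `sqlite3` module and hypothetical tables `tablea`, `tableb`, `tablec` shaped like `LemmasMapping`. The helpers `count_rows` and `tables_identical` are illustrative names rather than library functions.

```python
import sqlite3

def count_rows(conn, source):
    """COUNT(*) over a table name, a parenthesised subquery, or a join expression."""
    return conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]

def tables_identical(conn, a, b, columns):
    """Apply Steps 1-3. Identifiers are interpolated, so pass trusted names only."""
    # Steps 1 and 2: the method only applies when neither table has duplicate rows.
    for table in (a, b):
        if count_rows(conn, f"(SELECT DISTINCT * FROM {table})") != count_rows(conn, table):
            raise ValueError(f"{table} contains duplicate rows")
    # Step 3: inner join on every column. A row containing NULL never matches
    # under '=', so tables with NULL values are reported as different.
    on = " AND ".join(f"{a}.{col} = {b}.{col}" for col in columns)
    joined = count_rows(conn, f"{a} INNER JOIN {b} ON {on}")
    return joined == count_rows(conn, a) == count_rows(conn, b)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (lemma1 INTEGER, lemma2 INTEGER);
    CREATE TABLE tableb (lemma1 INTEGER, lemma2 INTEGER);
    CREATE TABLE tablec (lemma1 INTEGER, lemma2 INTEGER);
    INSERT INTO tablea VALUES (1, 2), (3, 4);
    INSERT INTO tableb VALUES (3, 4), (1, 2);  -- same rows, different order
    INSERT INTO tablec VALUES (1, 2), (3, 5);  -- one row differs
""")
cols = ["lemma1", "lemma2"]
print(tables_identical(conn, "tablea", "tableb", cols))  # True
print(tables_identical(conn, "tablea", "tablec", cols))  # False
```

Raising an error when duplicates are found mirrors the "you can't use this method" branch of Steps 1 and 2 instead of returning a misleading answer.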

Feedback welcome!

