How to append new data to an existing Hive table

Time: 2021-01-09 14:01:24

How can I append records to an existing partitioned Hive table? For example, I have an existing external table called "ip_country" whose dataset is testdata1. If the dataset grows, so that the next day it consists of testdata1 and testdata2, how do I append the new data, i.e. "testdata2", to the "ip_country" Hive table?

1 solution

#1


This can be achieved in a few ways; which one fits depends entirely on your requirements.

  1. If you don't mind overwriting the existing records in the partition (that is, you don't have a large history, say 10 years of data), then INSERT OVERWRITE might fit.

INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
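
As a rough illustration of the syntax above applied to the question's table, here is a minimal sketch. It assumes that "ip_country" is partitioned by a column named load_date, has the columns ip and country, and that the new rows sit in a staging table called ip_country_staging; none of these names come from the original question, so adjust them to your schema.

```sql
-- Hypothetical schema: ip_country(ip, country) partitioned by load_date.
-- Hypothetical source: ip_country_staging(ip, country, load_date).
-- Everything already stored in the 2021-01-09 partition is replaced
-- by the result of the SELECT; other partitions are left untouched.
INSERT OVERWRITE TABLE ip_country PARTITION (load_date = '2021-01-09')
SELECT ip, country
FROM ip_country_staging
WHERE load_date = '2021-01-09';
```

Because only the named partition is rewritten, rerunning the same statement for the same day gives the same result instead of piling up copies.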

  2. If you don't mind duplicates in the partition, then INSERT INTO might fit (honestly, I would rather not have duplicate records).

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
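
A minimal sketch of the append case follows, again under assumed names: a partition column load_date, the columns ip and country, and a staging table ip_country_staging are all hypothetical. The second query is an optional check that lists rows which ended up in the partition more than once.

```sql
-- Hypothetical schema: ip_country(ip, country) partitioned by load_date.
-- Hypothetical source: ip_country_staging(ip, country, load_date).
-- Rows are added to whatever the partition already contains.
INSERT INTO TABLE ip_country PARTITION (load_date = '2021-01-10')
SELECT ip, country
FROM ip_country_staging
WHERE load_date = '2021-01-10';

-- Optional check: rows that now appear more than once in the partition.
SELECT ip, country, COUNT(*) AS copies
FROM ip_country
WHERE load_date = '2021-01-10'
GROUP BY ip, country
HAVING COUNT(*) > 1;
```

Running the INSERT INTO twice for the same day appends the same rows twice, which is the duplicate risk mentioned above.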

  3. If you have history data plus incremental data, the history data can be inserted once, and each incremental batch (daily, weekly, or fortnightly, depending on the frequency you choose) can be inserted using INSERT OVERWRITE.
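
One way the history-plus-incremental approach might look is sketched below. The table names ip_country_history and ip_country_staging, the partition column load_date, and the columns ip and country are assumptions for illustration only. Using dynamic partitioning for the one-time history load is also an assumption, not something the answer prescribes.

```sql
-- Allow Hive to derive partition values from the data (dynamic partitioning).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- One-time history load (hypothetical source table ip_country_history).
-- The partition column must come last in the SELECT list; each distinct
-- load_date value becomes its own partition.
INSERT OVERWRITE TABLE ip_country PARTITION (load_date)
SELECT ip, country, load_date
FROM ip_country_history;

-- Recurring incremental load (hypothetical staging table).
-- Only the partition for the batch being loaded is replaced.
INSERT OVERWRITE TABLE ip_country PARTITION (load_date = '2021-01-10')
SELECT ip, country
FROM ip_country_staging
WHERE load_date = '2021-01-10';
```

With this layout the history partitions are written once, and each scheduled run touches only the partition for its own batch.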
