How do I append records to an existing partitioned Hive table? For example, I have an existing external table called "ip_country" whose dataset is testdata1. If the dataset grows, say the next day it is testdata1 and testdata2, how do I append the new data, i.e. "testdata2", to the "ip_country" Hive table?
#1
It can be achieved in a couple of ways (which one to use depends purely on your requirements):
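- If you don't mind overwriting the existing records in the partition (i.e. you don't have a large amount of history data, say 10 years' worth), then INSERT OVERWRITE might fit. It replaces the partition's existing contents with the result of the SELECT, so each load must include the full dataset (testdata1 and testdata2 in your example).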
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
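- If you don't mind duplicates in the partition, then INSERT INTO might fit. It appends to the existing data, so any rows that are loaded again (e.g. testdata1 arriving a second time alongside testdata2) end up duplicated. (Honestly, I wouldn't want duplicate records.)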
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
- If you have history data plus incremental data, then the history data can be inserted once, and the incremental data (at whatever frequency you choose: daily, weekly, or fortnightly) can be inserted into its own partition using INSERT OVERWRITE, which leaves the history partitions untouched.
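To make the difference between the three options concrete, here is a toy in-memory model in Python. It is not Hive and uses no Hive API: a "table" is just a dict from a partition value to the rows in that partition. The helper names `insert_into` and `insert_overwrite`, the partition values `all`, `day1` and `day2`, and the sample rows standing in for testdata1 and testdata2 are all hypothetical. The model only assumes Hive's documented behavior that INSERT INTO appends, while INSERT OVERWRITE replaces the partition(s) it writes to and leaves other partitions alone.

```python
from typing import Dict, List, Tuple

# Toy model only: mimics the insert semantics described above, it does not talk to Hive.
Row = Tuple[str, str]            # hypothetical (ip_prefix, country) row for ip_country
Table = Dict[str, List[Row]]     # partition value -> rows stored in that partition


def insert_into(table: Table, partition: str, rows: List[Row]) -> None:
    """Model of INSERT INTO ... PARTITION: append to whatever is already there."""
    table.setdefault(partition, []).extend(rows)


def insert_overwrite(table: Table, rows_by_partition: Dict[str, List[Row]]) -> None:
    """Model of INSERT OVERWRITE: every partition written is replaced wholesale;
    partitions absent from the input are left untouched."""
    for partition, rows in rows_by_partition.items():
        table[partition] = list(rows)


# Hypothetical sample rows standing in for testdata1 and testdata2.
testdata1 = [("1.0.0.0", "AU"), ("2.0.0.0", "FR")]
testdata2 = [("3.0.0.0", "US")]

# INSERT INTO: loading the grown dataset (testdata1 + testdata2) appends
# testdata1 a second time, so its rows are duplicated.
appended: Table = {}
insert_into(appended, "all", testdata1)
insert_into(appended, "all", testdata1 + testdata2)
print(len(appended["all"]))  # 5

# INSERT OVERWRITE of the full dataset: no duplicates, but the whole
# partition is rewritten on every load.
overwritten: Table = {}
insert_overwrite(overwritten, {"all": testdata1})
insert_overwrite(overwritten, {"all": testdata1 + testdata2})
print(len(overwritten["all"]))  # 3

# History once, then overwrite only the increment's own partition.
incremental: Table = {}
insert_overwrite(incremental, {"day1": testdata1})  # history, loaded once
insert_overwrite(incremental, {"day2": testdata2})  # daily increment
insert_overwrite(incremental, {"day2": testdata2})  # re-run: still no duplicates
print(sorted((p, len(r)) for p, r in incremental.items()))  # [('day1', 2), ('day2', 1)]
```

The third pattern is the one that scales: each load only touches its own partition, and because INSERT OVERWRITE replaces rather than appends, re-running a repeated or failed load does not create duplicates.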