MYSQL之 1）临时表的应用场景 2）分表后的merge存储引擎查询应用场景

实际工作用到的，放这里总结一下，容易健忘，mark一下：

1）临时表：在网上搜索了一下，对比一下临时表和内存表：

临时表的表结构和数据都是储存在内存中的，session开始到结束是它的生命周期。
内存表的表结构是存储到数据库上的，数据放在内存中，网上推荐使用内存表，但我觉得我的应用场景更合适临时表。

我的需求：
一个表中存储了孩子、双亲的vcf数据（测序数据，每一个样本的单体数据百万行），我需要将孩子的primary key 和双亲的primary key 做对比（注意，primary key 是 PRIMARY KEY(col1,col2,col3,col4)形式，需要对比col1,2,3相等，col4不相等），将孩子与父亲对比匹配的数据和与母亲对比匹配数据分别拎出来，然后再inner join。这个其实一句sql是可以解决的，但是，由于都在一个表中，并且数据是千万行级别的，耗时可想而知。
经过种种尝试，最终选择了临时表：
先把孩子/父亲/母亲三者的vcf数据从同一张table中拎出来，放到临时表中

querySample ="create temporary table if not exists temp_table" + child +"(INDEX my_index_name(CHROM,POS,REF,ALT)) charset=latin1 as (select CHROM, POS, REF, ALT, round((AO/(AO+RO)),3) as AF from sampleinfor where SAMPLE=:sample and length(REF)=1 and length(ALT)=1)";

querySample ="create temporary table if not exists temp_table" + father +"(INDEX my_index_name(CHROM,POS,REF,ALT)) charset=latin1 as (select CHROM, POS, REF, ALT, round((AO/(AO+RO)),3) as AF from sampleinfor where SAMPLE=:sample and length(REF)=1 and length(ALT)=1)";

querySample ="create temporary table if not exists temp_table" + mother +"(INDEX my_index_name(CHROM,POS,REF,ALT)) charset=latin1 as (select CHROM, POS, REF, ALT, round((AO/(AO+RO)),3) as AF from sampleinfor where SAMPLE=:sample and length(REF)=1 and length(ALT)=1)";

注意这里要把后续需要比对的列建立索引，方便查询。

ok，下一步，将孩子分别与父亲，母亲匹配对比：

String childFather = "create temporary table if not exists temp_fatherDiff" + father + "(PRIMARY KEY(CHROM,POS,REF(32),ALT(32))) charset=latin1 as (select child.CHROM,child.POS,child.REF,child.ALT,child.AF as ChildAF,father.ALT as FatherALT,father.AF as FatherAF, null as MotherALT, null as MotherAF,null as SampleId,null as FatherId,null as MotherId from temp_table" + child + " as child,temp_table" + father + " father where(child.CHROM=father.CHROM and child.POS=father.POS and child.REF=father.REF and child.ALT!=father.ALT))";

母亲的类似，这里重点是建立primary key 方便后续的inner join 以及 where语句里index的使用

最后将孩子与双亲的数据进行inner join

String finalQuery = "select father.CHROM,father.POS,father.REF,father.ALT,father.ChildAF,father.FatherALT,father.FatherAF,mother.MotherALT,mother.MotherAF, :child as SampleId, :father as FatherId, :mother as MotherId" + " from temp_fatherDiff" + father + " father inner join temp_motherDiff" + mother + " mother on(father.CHROM=mother.CHROM and father.POS=mother.POS and father.REF=mother.REF)";

这样建立的临时表存在于内存中，分表查询提高了查询效率，内存操作也更加快速，并且临时表的也会在随后session关闭后消失，不需要人为删除。

2）merge存储引擎针对分表后的数据进行查询

这里也是我的一个场景，我的gnomad数据是根据染色体create的table：tableChr1,tableChr2……..，每一个table为千万行级别，一个多G大小。
这里的前边的操作其实就是上边提到的，将孩子与双亲对比后的inner join 数据，找出了孩子与双亲兼不同的数据，这些数据呢，需要 left join gnomad的数据，ok，问题来了，gnomad是分表的。当然可以，将不同的table union一下，但是我这里一共有24张分表，而且常染色体和外显子都是根据染色体分表创建的，一共48张表，sql语句得写多长啊，所以这里就用到的merge 储存引擎以总表的形式去查询

这里的分表的结构是一致的，因此后边建立总表的结构也要和分表结构一致，并且引擎需要是MRG_MYISAM(mysql)
建立总表的形式是

create table gnochrall (CHROM varchar(32) not null, START int not null, END int not null, REF varchar(10000) not null, ALT varchar(10000) not null, AF_GNOMAD varchar(255) default null, AF_AFR varchar(255) default null, AF_AMR varchar(255) default null, AF_ASJ varchar(255) default null, AF_EAS varchar(255) default null, AF_FIN varchar(255) default null, AF_NFE varchar(255) default null, AF_OTH varchar(255) default null, primary key(CHROM(32),START,END,REF(255),ALT(255))) engine=MRG_MYISAM default charset=latin1 union=(gnochr1,gnochr2,gnochr3,gnochr4,gnochr5,gnochr6,gnochr7,gnochr8,gnochr9,gnochr10,gnochr11,gnochr12,gnochr13,gnochr14,gnochr15,gnochr16,gnochr17,gnochr18,gnochr19,gnochr20,gnochr21,gnochr22,gnochrX,gnochrY);

重点是engine选取MRG_MYISAM, 需要对哪些分表建立merge总表：union=（table1,tabl2,…….）

ok,这样就可以在sql中写一个 left join gnochrall on() 了。
…………………………………………….
end

秒客网

MYSQL之 1）临时表的应用场景 2）分表后的merge存储引擎查询应用场景

相关文章