I have several m x n matrices of gene expression data that I want to store in MySQL.
m is approx 30,000 genes (uniquely identifiable)
n is approx 3,000 samples (mostly uniquely identifiable)
I'm not sure what the best way is to store these data. I initially read the matrices directly into MySQL tables, but I have since been told that this is not a great way to do things, since the number of columns (samples) is a variable quantity. I cannot transpose the matrices and store them that way because there are more genes than MySQL allows for when creating columns.
I've since been told that 'junction tables' might represent a better way to do this. After watching several YouTube videos on these, however, I'm none the wiser. I've also searched Google and there doesn't seem to be a tutorial on storing gene expression data in MySQL using junction tables. So, does anyone have any advice on how best to store these data? I honestly expected that there would be a huge literature on this, so if you have useful links that would also be much appreciated.
1 solution
You need just a few tables for this. I am using MySQL syntax:
-- One row per gene.
CREATE TABLE genes (
    `gene_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `gene_name` VARCHAR(99) NOT NULL
) ENGINE=InnoDB;

-- One row per sample.
CREATE TABLE samples (
    `sample_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `sample_name` VARCHAR(99) NOT NULL
) ENGINE=InnoDB;

-- Junction table: one row per (gene, sample) pair.
CREATE TABLE gene_sample (
    `gene_id` INT NOT NULL,
    `sample_id` INT NOT NULL,
    FOREIGN KEY (`gene_id`) REFERENCES genes (`gene_id`),
    FOREIGN KEY (`sample_id`) REFERENCES samples (`sample_id`)
) ENGINE=InnoDB;
For every gene that occurs in a sample, insert the pair of gene_id and sample_id into the gene_sample table.
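As a rough sketch of how the rows might be loaded, the Python snippet below turns a small wide matrix into one row per (gene, sample) pair. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs anywhere. The `expression` column, the composite primary key, the toy gene and sample names, and the `load_matrix` helper are all assumptions added for illustration; they are not part of the schema above.

```python
import sqlite3

# SQLite stand-in for the MySQL schema; `expression` is an assumed extra column.
SCHEMA = """
CREATE TABLE genes (
    gene_id INTEGER PRIMARY KEY,
    gene_name VARCHAR(99) NOT NULL
);
CREATE TABLE samples (
    sample_id INTEGER PRIMARY KEY,
    sample_name VARCHAR(99) NOT NULL
);
CREATE TABLE gene_sample (
    gene_id INTEGER NOT NULL REFERENCES genes (gene_id),
    sample_id INTEGER NOT NULL REFERENCES samples (sample_id),
    expression REAL,
    PRIMARY KEY (gene_id, sample_id)
);
"""


def load_matrix(conn, gene_names, sample_names, matrix):
    """Store a genes x samples matrix as one row per (gene, sample) pair."""
    conn.executescript(SCHEMA)
    # Assign ids 1..m and 1..n in matrix order (hypothetical convention).
    conn.executemany(
        "INSERT INTO genes (gene_id, gene_name) VALUES (?, ?)",
        list(enumerate(gene_names, start=1)),
    )
    conn.executemany(
        "INSERT INTO samples (sample_id, sample_name) VALUES (?, ?)",
        list(enumerate(sample_names, start=1)),
    )
    # Flatten the matrix: row index -> gene_id, column index -> sample_id.
    rows = [
        (i, j, matrix[i - 1][j - 1])
        for i in range(1, len(gene_names) + 1)
        for j in range(1, len(sample_names) + 1)
    ]
    conn.executemany(
        "INSERT INTO gene_sample (gene_id, sample_id, expression) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
n_rows = load_matrix(
    conn,
    ["geneA", "geneB"],
    ["s1", "s2", "s3"],
    [[1.5, 2.0, 0.0], [3.2, 0.1, 7.4]],
)
```

With this layout, adding a sample means inserting new rows rather than altering the table. A full 30,000 x 3,000 matrix would come to roughly 90 million rows in gene_sample, so bulk loading and indexing would likely need some thought.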
Use two JOIN expressions in a SELECT to reconstruct the full data:
SELECT genes.*, samples.*
FROM gene_sample
LEFT JOIN genes USING (gene_id)
LEFT JOIN samples USING (sample_id);
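To illustrate how the joined rows could be turned back into a matrix, here is a minimal Python sketch that runs the same kind of query and pivots the result into a nested dictionary. It again uses the standard-library sqlite3 module in place of MySQL. The `expression` column, the sample data, and the `fetch_matrix` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, gene_name TEXT NOT NULL);
    CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, sample_name TEXT NOT NULL);
    CREATE TABLE gene_sample (
        gene_id INTEGER NOT NULL REFERENCES genes (gene_id),
        sample_id INTEGER NOT NULL REFERENCES samples (sample_id),
        expression REAL  -- assumed value column, not in the original schema
    );
    INSERT INTO genes VALUES (1, 'geneA'), (2, 'geneB');
    INSERT INTO samples VALUES (1, 's1'), (2, 's2');
    INSERT INTO gene_sample VALUES
        (1, 1, 0.5), (1, 2, 1.5), (2, 1, 2.5), (2, 2, 3.5);
""")


def fetch_matrix(conn):
    """Join the three tables and pivot the rows into {gene: {sample: value}}."""
    query = """
        SELECT genes.gene_name, samples.sample_name, gene_sample.expression
        FROM gene_sample
        LEFT JOIN genes USING (gene_id)
        LEFT JOIN samples USING (sample_id)
    """
    matrix = {}
    for gene, sample, value in conn.execute(query):
        matrix.setdefault(gene, {})[sample] = value
    return matrix


matrix = fetch_matrix(conn)
```

In practice one would probably add a WHERE clause on a subset of genes or samples rather than pull the whole table at once.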