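My ASE work has reached another key step: I need to generate a table that can be used to decide whether a gene is differentially expressed.
I plan to borrow from the output of cuffdiff and edgeR.
How cuffdiff describes differentially expressed genes:
Fourteen columns in total: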
Column 1, test_id
a unique identifier describing the transcript, gene, primary transcript, or CDS being tested.
eg XLOC_000003
Column 2, gene_id
eg XLOC_000003
Column 3, gene
Column 4, locus
genomic coordinates for easy browsing to the genes or transcripts being tested.
eg contig_23646:3511-3922
Column 5, sample1
label (or number if no labels provided) of the first sample being tested
eg Sample_E
Column 6, sample2
label (or number if no labels provided) of the second sample being tested
eg Sample_FHM
Column 7, status
can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing
eg OK
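Column 8, value_1
FPKM of the gene in sample 1
eg 339.567
Column 9, value_2
FPKM of the gene in sample 2
eg 465.939
Column 10, log2(fold change)
the base-2 log of the fold change, value_2/value_1 (sample 2 over sample 1); for the example values, log2(465.939/339.567) is about 0.456
eg 0.456447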
Column 11, test stat
the value of the test statistic used to compute significance of the observed change in FPKM
I don't understand what this means; looks like I'll have to go dig through a statistics textbook
eg 0.361712
Column 12, p_value
the uncorrected p-value of the test statistic
eg 0.4849
Column 13, q_value
the FDR-adjusted p-value of the test statistic
eg 0.756741
Column 14, significant
can be either 'yes' or 'no', depending on whether the q_value (the p-value after Benjamini-Hochberg correction for multiple testing) falls below the FDR threshold
eg no
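Since the goal is to build a similar table for ASE, here is a minimal Python sketch of reading one row in this fourteen-column layout. The column names in COLUMNS are my own labels for the columns listed above (the exact header spelling in a real cuffdiff file is an assumption), the row is assembled from the "eg" values, and the gene name "-" is a placeholder because no example was given for that column. It also rechecks that the reported log2 fold change matches log2(value_2 / value_1).

```python
import csv
import io
import math

# Labels for the fourteen columns described above (assumed spelling).
COLUMNS = [
    "test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
    "status", "value_1", "value_2", "log2_fold_change", "test_stat",
    "p_value", "q_value", "significant",
]
NUMERIC = ["value_1", "value_2", "log2_fold_change",
           "test_stat", "p_value", "q_value"]

# One tab-delimited row built from the "eg" values; "-" is a placeholder gene name.
EXAMPLE = "\t".join([
    "XLOC_000003", "XLOC_000003", "-", "contig_23646:3511-3922",
    "Sample_E", "Sample_FHM", "OK", "339.567", "465.939",
    "0.456447", "0.361712", "0.4849", "0.756741", "no",
])


def parse_rows(handle):
    """Yield one dict per row, with the numeric columns converted to float."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        for key in NUMERIC:
            row[key] = float(row[key])
        yield row


rows = list(parse_rows(io.StringIO(EXAMPLE)))
row = rows[0]

# Recompute the fold change as sample 2 over sample 1.
recomputed = math.log2(row["value_2"] / row["value_1"])
print(row["test_id"], row["status"], round(recomputed, 4), row["significant"])
```

For a real file, the same parse_rows function could be given an open file handle instead of the in-memory string, after skipping the header line.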
The FPKM value represents the concentration of a transcript in your samples, normalized for the total number of mapped fragments and for gene length. Thus columns 8 and 9 (value_1, value_2) represent measurements for your samples, and column 10 is simply the log2 of their ratio. You might look up FPKM or RPKM values if you're unsure what they represent. Columns 12 and 13 are the p-value and the q-value. These are values associated with the measured variation or uncertainty when you make repeated measurements of something. You should look up what a p-value and an "adjusted p-value" are (the adjusted one is important for you to understand if you're going to do any genomic data analysis). Column 14 is simply a flag based on whether the adjusted value in column 13 is below the FDR threshold (0.05 by default; you can confirm this by exploring your data).
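To make the "adjusted p-value" concrete, here is a small sketch of the Benjamini-Hochberg adjustment in plain Python. It is only an illustration of the general procedure, not cuffdiff's own code; the function names, the input p-values, and the choice of "q <= fdr" as the comparison are all assumptions made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q_values[idx] = running_min
    return q_values


def flag_significant(p_values, fdr=0.05):
    """Return 'yes'/'no' per test; the <= comparison is an assumption."""
    return ["yes" if q <= fdr else "no" for q in benjamini_hochberg(p_values)]


# Made-up p-values, purely for illustration.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values = benjamini_hochberg(p_values)
flags = flag_significant(p_values)

for p, q, flag in zip(p_values, q_values, flags):
    print(p, round(q, 5), flag)
```

Note that 0.039 and 0.041 are below 0.05 as raw p-values but are no longer flagged after adjustment, which is why the flag has to be read against the q-value and not the p-value.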
How edgeR describes differentially expressed genes:
Differential expression analysis of RNA-seq and digital gene expression profiles with biological replication. Uses empirical Bayes estimation and exact tests based on the negative binomial distribution. Also useful for differential signal analysis with other types of genome-scale count data. (It seems the two tools use different distribution models.)
by freemao
FAFU
free_mao@qq.com