CNV

时间:2023-11-28 10:36:08

CNV:

人类主要是二倍体。如果有些区域出现3个、4个拷贝,那就是扩增了,如果只出现1个拷贝,就是缺失。
所以CNV分析是依靠特定位置的测序深度来估算的,先在染色体上划窗,然后看每个窗口的平均测序深度,如果连续多个窗口的测序深度在样品/对照中都有差异,那么就判断为CNV,标准是拷贝数相除,然后取log2,log2Ratio小于-1或大于0.6即视为出现拷贝数变异,对应的ratio就是小于二分之一或者三分之二,也就是至少增加或减少一个拷贝

CNV:注释

library(biomaRt)
mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
results <- getBM(attributes = c("hgnc_symbol", "chromosome_name",
"start_position", "end_position"),
filters = c("chromosome_name", "start", "end"),
values=list(1, 94312388, 96000000),
mart=mart)
dim(results) # 34 hits, only 12 with gene symbols

library(GenomicRanges)
filename <- "test.txt"

#test.txt

Sample Chromosome Start End Num_Probes Segment_Mean
TCGA-BR-A4J9-10A-01D-A255-01 1 3218610 247813706 127587 -8e-04
TCGA-BR-A4J9-10A-01D-A255-01 2 484222 16358510 9812 4e-04
TCGA-BR-A4J9-10A-01D-A255-01 2 16358715 16359561 3 -2.0811
TCGA-BR-A4J9-10A-01D-A255-01 2 16360852 149639289 67009 0.0085
TCGA-BR-A4J9-10A-01D-A255-01 2 149641890 149644977 2 -2.552

tbl <- read.table(filename, sep="\t", as.is=TRUE, header=TRUE);
gr <- makeGRangesFromDataFrame(tbl)
gr.short <- subset(gr, width < 100)
length(gr) # 117 regions
length(gr.short) # just 2 regions
gr.short
regions <- paste(seqnames(gr.short), start(gr.short), end(gr.short), sep=":")
regions
results <- getBM(attributes = c("hgnc_symbol", "chromosome_name",
"start_position","end_position"),
filters = c("chromosomal_region"),
values=regions,
mart=mart)