ensembl/release91:
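# Relies on Ensembl attribute order: on exon-level lines the ";"-fields 1,3,6,8 are gene_id, transcript_id, gene_name, gene_biotype; gene and transcript lines have no gene_biotype in those fields and are dropped by the grep.
grep -v '^#' Homo_sapiens.GRCh38.91.gtf | cut -f9 | cut -d';' -f1,3,6,8 | grep gene_biotype | tr -d '";' | cut -d' ' -f2,6,8 | sort -u > GRCh38.feature.info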
Sample output (gene_id gene_name gene_biotype):
ENSG00000000003 TSPAN6 protein_coding
ENSG00000000005 TNMD protein_coding
ENSG00000000419 DPM1 protein_coding
ENSG00000000457 SCYL3 protein_coding
ENSG00000000460 C1orf112 protein_coding
ENSG00000000938 FGR protein_coding
ENSG00000000971 CFH protein_coding
ENSG00000001036 FUCA2 protein_coding
ENSG00000001084 GCLC protein_coding
ENSG00000001167 NFYA protein_coding
ENSG00000001460 STPG1 protein_coding
ENSG00000001461 NIPAL3 protein_coding
ENSG00000001497 LAS1L protein_coding
ENSG00000001561 ENPP4 protein_coding
ENSG00000001617 SEMA3F protein_coding
ENSG00000001626 CFTR protein_coding
ENSG00000001629 ANKIB1 protein_coding
ENSG00000001630 CYP51A1 protein_coding
ENSG00000001631 KRIT1 protein_coding
ENSG00000002016 RAD52 protein_coding
ENSG00000002079 MYH16 transcribed_unitary_pseudogene
ENSG00000002330 BAD protein_coding
ENSG00000002549 LAP3 protein_coding
ENSG00000002586 CD99 protein_coding
ENSG00000002587 HS3ST1 protein_coding
ENSG00000002726 AOC1 protein_coding
ENSG00000002745 WNT16 protein_coding
ENSG00000002746 HECW1 protein_coding
ENSG00000002822 MAD1L1 protein_coding
ENSG00000002834 LASP1 protein_coding
ENSG00000002919 SNX11 protein_coding
ENSG00000002933 TMEM176A protein_coding
ENSG00000003056 M6PR protein_coding
ENSG00000003096 KLHL13 protein_coding
ENSG00000003137 CYP26B1 protein_coding
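The positional `cut` in the pipeline above only works while Ensembl keeps its attribute order. A minimal sketch of a key-based alternative, assuming the standard 9-column, tab-separated GTF layout: it reads only `gene` records and looks up `gene_id`, `gene_name` and `gene_biotype` by name. The function name `gtf_gene_info` and the `NA` placeholder for a missing attribute are my own choices, not part of any tool.

```shell
#!/usr/bin/env bash
# gtf_gene_info (hypothetical name): print "gene_id gene_name gene_biotype" for each
# "gene" record of a GTF file (or stdin), looking attributes up by key, not position.
gtf_gene_info() {
  awk -F'\t' '
    /^#/ || $3 != "gene" { next }        # skip header lines and non-gene features
    {
      id = name = biotype = "NA"         # NA marks a missing attribute
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        sub(/^ +/, "", a)                # strip the space that follows each ;
        if (a == "") continue            # a trailing ; leaves an empty field
        key = a; sub(/ .*/, "", key)     # attribute name, e.g. gene_id
        val = a; sub(/^[^ ]+ /, "", val) # attribute value, still quoted
        gsub(/"/, "", val)
        if (key == "gene_id") id = val
        else if (key == "gene_name") name = val
        else if (key == "gene_biotype") biotype = val
      }
      print id, name, biotype
    }' "$@" | sort -u
}
```

Usage: `gtf_gene_info Homo_sapiens.GRCh38.91.gtf > GRCh38.feature.info`. Restricting to `gene` records yields one line per gene without relying on `sort -u` to collapse exon lines, and a missing attribute shows up as `NA` instead of silently shifting columns.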
58302 distinct ENSG gene IDs
56655 distinct gene names (1647 fewer than the IDs: why are some names shared by more than one ID?)
46 biotypes:
3prime_overlapping_ncRNA antisense_RNA bidirectional_promoter_lncRNA IG_C_gene IG_C_pseudogene IG_D_gene IG_J_gene IG_J_pseudogene IG_pseudogene IG_V_gene IG_V_pseudogene lincRNA macro_lncRNA miRNA misc_RNA Mt_rRNA Mt_tRNA non_coding polymorphic_pseudogene processed_pseudogene processed_transcript protein_coding pseudogene ribozyme rRNA scaRNA scRNA sense_intronic sense_overlapping snoRNA snRNA sRNA TEC transcribed_processed_pseudogene transcribed_unitary_pseudogene transcribed_unprocessed_pseudogene translated_processed_pseudogene TR_C_gene TR_D_gene TR_J_gene TR_J_pseudogene TR_V_gene TR_V_pseudogene unitary_pseudogene unprocessed_pseudogene vaultRNA
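To recheck these counts and see which names account for the gap between IDs and names, a sketch over the three-column table produced above; `feature_counts` and `shared_gene_names` are hypothetical helper names. Since each ID carries a single name, the 1647 gap should equal the sum of (count - 1) over the names that `shared_gene_names` lists.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a "gene_id gene_name gene_biotype" table
# such as GRCh38.feature.info (file name or stdin).

# Print the number of distinct gene IDs, gene names and biotypes.
feature_counts() {
  awk '{ id[$1] = 1; name[$2] = 1; bt[$3] = 1 }
       END {
         ni = nn = nb = 0
         for (k in id) ni++
         for (k in name) nn++
         for (k in bt) nb++
         print "ids", ni
         print "names", nn
         print "biotypes", nb
       }' "$@"
}

# List gene names carried by more than one gene ID, most-shared first.
shared_gene_names() {
  awk '!seen[$1]++ { n[$2]++ }
       END { for (k in n) if (n[k] > 1) print n[k], k }' "$@" |
    sort -k1,1nr -k2,2
}
```

Usage: `feature_counts GRCh38.feature.info` and `shared_gene_names GRCh38.feature.info | head`.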
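Questions:
1. Why does expression quantification run into problems when the GENCODE annotation file is used?
2. What are the differences between releases?
3. How do annotations from different sources differ?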
4.
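One empirical starting point for question 2 is to build the same gene table from two releases and compare their gene ID sets. A minimal sketch; `compare_gene_ids` and the file names in the usage line are hypothetical, and it only counts ID presence, so an ID kept across releases may still have changed name, biotype or structure.

```shell
#!/usr/bin/env bash
# compare_gene_ids (hypothetical name): count gene IDs unique to each of two
# "gene_id gene_name gene_biotype" tables and the IDs they share.
compare_gene_ids() {
  local old=$1 new=$2 tmp
  tmp=$(mktemp -d) || return 1
  # Use one collation for sort and comm, since comm requires sorted input
  cut -d' ' -f1 "$old" | LC_ALL=C sort -u > "$tmp/old"
  cut -d' ' -f1 "$new" | LC_ALL=C sort -u > "$tmp/new"
  # $(( ... )) strips the padding some wc implementations add
  echo "only_in_old $(( $(LC_ALL=C comm -23 "$tmp/old" "$tmp/new" | wc -l) ))"
  echo "only_in_new $(( $(LC_ALL=C comm -13 "$tmp/old" "$tmp/new" | wc -l) ))"
  echo "shared $(( $(LC_ALL=C comm -12 "$tmp/old" "$tmp/new" | wc -l) ))"
  rm -r "$tmp"
}
```

Usage: `compare_gene_ids GRCh38.90.feature.info GRCh38.91.feature.info`, where both tables were built with the same extraction.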