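Contents
Seurat processing steps from the literature
① HDS analysis steps
② Data processing in the original paper
Loading the single-cell Seurat object
Computing the proportion of each cell type in each sample
Comparing proportions between the two groups
① Computing the proportions
② Plotting
Differences in single cell-type proportions between subgroups
① Adding group information
② Plotting in a loop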
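Seurat processing steps from the literature
Histone deacetylase-mediated tumor microenvironment characteristics and synergistic immunotherapy in gastric cancer (multi-omics literature study)_ACRG cohort-**** blog
① HDS analysis steps
Single-cell RNA-seq analysis steps: after CellRanger identified the cells, a Seurat matrix was built, low-quality cells were filtered out following previous studies, and the remaining data were used for clustering. First, the 2,000 genes with the highest variance were selected for data normalization, principal component analysis (PCA) reduced the data to 50 principal components, and harmony was used to remove batch effects between samples. tSNE-based clustering identified 9 clusters in total (B cells, CD4+ T cells, CD8+ T cells, NK cells, mast cells, endothelial cells, fibroblasts, myeloid cells, and plasma cells). The gene expression matrix of each cell population was then extracted to define subpopulations.
Within each population, the top 2,000 variable genes were used for PCA, and the first 25-30 principal components were used for Harmony batch correction. The Wilcoxon rank-sum test was used to identify differentially expressed genes between subpopulations. Finally, cell-cell communication between subpopulations was analyzed using the ligand and receptor information in the CellChat library.
③ Single-cell study notes: the Seurat workflow on pbmc_removing outlier cells in Seurat-**** blog
② Data processing in the original paper
Parallel single-cell and bulk transcriptome analyses reveal key features of the gastric tumor microenvironment | Genome Biology | Full Text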
Single-cell sequencing data processing
The 10X droplet-based single-cell RNA sequencing data were processed using CellRanger toolkit (version 3.0.0) provided by 10X Genomics. Gene expression levels are quantified using GRCh38 reference genome (Ensembl 93 annotation). For each cell identified by CellRanger, we calculated the total number of detected genes, total number of UMI counts, and proportion of mitochondrial reads. A set of quality thresholds was applied to filter out low-quality cells, including detection of 200–7500 genes, 500–75,000 UMI counts, and less than 10% mitochondrial reads, resulting in a total cell number of 117,506 post-filter cells that were used for clustering analysis.
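The quality thresholds quoted above can be illustrated on a toy table. This is only a minimal sketch in base R: the data frame, its column names (`n_genes`, `n_umi`, `pct_mito`) and the values are made up for illustration, and treating the gene and UMI ranges as inclusive is an assumption, since the paper does not say how the boundaries were handled.

```r
# Toy per-cell QC table; all names and values here are hypothetical.
qc <- data.frame(
  cell     = paste0("cell", 1:6),
  n_genes  = c(150, 800, 7600, 3000, 1200, 2500),
  n_umi    = c(400, 2000, 60000, 80000, 5000, 9000),
  pct_mito = c(2, 5, 3, 4, 12, 9.9),
  stringsAsFactors = FALSE
)

# Thresholds from the methods text: 200-7500 detected genes,
# 500-75,000 UMI counts, and less than 10% mitochondrial reads.
keep <- qc$n_genes >= 200 & qc$n_genes <= 7500 &
  qc$n_umi >= 500 & qc$n_umi <= 75000 &
  qc$pct_mito < 10

qc_pass <- qc[keep, ]
print(qc_pass$cell)
```

Each of the four rejected toy cells fails exactly one criterion, which makes it easy to see which threshold removes which cell.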
Normalization and batch effect correction
The approach used here is covered in: Seurat-SCTransform and harmony integration study notes-**** blog
Cells passing quality filter were normalized with SCTransform [84] using the default parameters. Independent component analysis (ICA) was applied on the normalized gene-cell matrix to identify potential batch effects. Out of 128 independent components, an independent component (IC_15) was found to have a highly sample-specific distribution (Additional file 1: Fig. S1b). We further inspected the top weighted genes in this independent component and found this IC populated by a heat-shock protein-related program (Additional file 1: Fig. S1c), potentially derived from enzymatic stimulation during tissue dissociation [85]. The gene expression program driven by IC_15 was then subtracted from the normalized gene-cell matrix to remove this dissociation-derived batch effect.
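Loading the single-cell Seurat object
# Load the single-cell Seurat data for analysis
rm(list = ls())
library(dplyr)
library(readr)
library(BiocParallel)
library(Seurat)
library(sctransform)  # load packages
library()

load("")  # read in the data
gcmeta <- read_csv("cell_metadata.csv")  # cell-level metadata of the tumor data, including the annotated cell types
HDS <- read.csv("2024年7月19日ICI计算法.csv", sep = ",", header = TRUE)  # HDS scores from the bulk transcriptome
table(gcmeta$Type)  # inspect the annotated cell types
table(gcmeta$Patient, gcmeta$Tissue)  # inspect samples and the number of tumor vs. adjacent-normal cells

# Subset the single-cell object gcdata to the tumors that also have an HDS score
n_last <- 7
HDS$sample <- substr(HDS$X, nchar(HDS$X) - n_last + 1,
                     nchar(HDS$X))  # the last 7 characters of X hold the sample ID
samp <- intersect(unique(gcmeta$Sample), HDS$sample)  # 21 tumors have matched single-cell data
gcdata2 <- subset(gcdata, subset = Sample %in% samp)  # single-cell data of those 21 tumors
table(gcdata2$Sample)  # the 21 tumor samples used for downstream analysis
table(gcdata2$Sample, gcdata2$Type)  # cell counts per sample and cell type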
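The sample matching above relies on the last 7 characters of each bulk identifier being the sample ID. A minimal base R sketch of that step on made-up identifiers follows; the `GSM...` prefixes, the scores, and most of the sample IDs are invented for illustration and do not come from the real data.

```r
# Hypothetical bulk score table: X holds a long identifier ending in the sample ID.
hds <- data.frame(
  X     = c("GSM000001_171012T", "GSM000002_171019T", "GSM000003_171101T"),
  score = c(0.2, 0.8, 0.5),
  stringsAsFactors = FALSE
)
# Hypothetical sample IDs present in the single-cell metadata.
sc_samples <- c("171012T", "171101T", "180102T")

# Keep the last n_last characters of every identifier.
n_last <- 7
hds$sample <- substr(hds$X, nchar(hds$X) - n_last + 1, nchar(hds$X))

# Samples present in both data sets; intersect() keeps the order of its first argument.
shared <- intersect(sc_samples, hds$sample)
print(shared)
```

If the identifiers did not all end in a 7-character sample ID, this fixed-width extraction would silently return wrong IDs, so checking `length(shared)` against the expected number of matched samples is a cheap sanity check.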
Computing the proportion of each cell type in each sample
# Convert counts to the proportion of each cell type within each sample and draw an overall stacked bar chart
Cellratio <- prop.table(table(gcdata2$Type, gcdata2$Sample),
                        margin = 2)  # margin = 2: proportions are computed per column, i.e. per sample
Cellratio <- as.data.frame(Cellratio)  # long data frame of proportions for the stacked bar chart
library(ggplot2)  # draw the stacked bar chart of cell proportions
colourCount = length(unique(Cellratio$Var1))
p1 <- ggplot(Cellratio) +
  geom_bar(aes(x = Var2, y = Freq, fill = Var1),
           stat = "identity", width = 0.7, size = 0.5, colour = '#222222') +
  theme_classic() +
  labs(x = 'Sample', y = 'Ratio') +
  #coord_flip() +  # flip the axes
  theme( = element_rect(fill = NA, color = "black",
                        size = 0.5, linetype = "solid"))
p1
dev.off()
head(Cellratio)
         Var1    Var2         Freq
1           B 171012T 0.3945686901
2      CD4+ T 171012T 0.1560170394
3      CD8+ T 171012T 0.1256656017
4 Endothelial 171012T 0.0766773163
5  Epithelial 171012T 0.0005324814
6  Fibroblast 171012T 0.0431309904
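To see what `margin = 2` does in `prop.table()`, here is a small base R sketch on made-up labels. The vectors `type` and `sample` are hypothetical stand-ins for the `Type` and `Sample` metadata columns.

```r
# Hypothetical per-cell annotations: 4 cells in S1 and 4 cells in S2.
type   <- c("B", "B", "T", "T", "T", "B", "T", "T")
sample <- c("S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2")

counts <- table(type, sample)            # rows = cell types, columns = samples
ratio  <- prop.table(counts, margin = 2) # divide every column by its own total

print(ratio)
print(colSums(ratio))                    # every sample's proportions sum to 1

# Long format (one row per cell type and sample), as used for the stacked bar chart.
ratio_df <- as.data.frame(ratio)
```

With `margin = 1` the rows would sum to 1 instead, which would answer a different question: how each cell type is distributed across samples.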
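Comparing proportions between the two groups
① Computing the proportions
② Single-cell study notes: cell proportion analysis between groups and samples_differences in cell proportions between single-cell groups-**** blog
②-Ⅱ Single-cell study notes: cell proportion analysis between groups and samples (supplement)_counting a given cell type in single-cell data-**** blog
library(tidyverse)
library(reshape)
clusdata <- as.data.frame(table(gcdata2$Type, gcdata2$Sample))
# Convert from long to wide format: one row per cell type, one column per sample
clusdata1 <- clusdata %>% pivot_wider(names_from = Var2,
                                      values_from = Freq)
clusdata1 <- as.data.frame(clusdata1)
rownames(clusdata1) <- clusdata1$Var1
clusdata2 <- clusdata1[, -1]

# Sum the counts of each cell type within each group
HDS1 <- HDS[order(HDS$),]
HDS2 <- HDS1[HDS1$sample %in% samp, ]
rownames(HDS2) <- HDS2$sample

low <- c(rownames(HDS2)[1:10])  # samples in the low-score group
clusdata2$lowsum <- rowSums(clusdata2[, low])
high <- c(rownames(HDS2)[11:21])  # samples in the high-score group
clusdata2$highsum <- rowSums(clusdata2[, high])  # used for the stacked bar chart below

clus2 <- clusdata2[, c("lowsum", "highsum")]  # cell proportion data (the two group sums, columns 22 and 23)
clus2$ID <- rownames(clus2)
clus3 <- melt(clus2, = c("ID"))  # reshape to long format by group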
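The grouping logic above (rank samples by score, split the ranking into a low and a high half, then sum cell counts per group) can be sketched in base R on toy data. Everything in this sketch is an assumption made for illustration: the sample names, the scores, and the choice of `floor(n / 2)` as the split point, which reproduces the 10 vs. 11 split used for 21 samples.

```r
# Hypothetical cell-type by sample count table (rows = cell types, columns = samples).
counts <- as.data.frame.matrix(table(
  type   = c("B", "B", "T", "T", "T", "B", "T", "T", "B", "T"),
  sample = c("S1", "S1", "S1", "S2", "S2", "S3", "S3", "S4", "S4", "S4")
))

# Hypothetical per-sample scores; samples are ranked from lowest to highest.
scores <- c(S1 = 0.9, S2 = 0.1, S3 = 0.4, S4 = 0.7)
ranked <- names(sort(scores))

# Lower half of the ranking = low group, the rest = high group.
half <- floor(length(ranked) / 2)
low  <- ranked[seq_len(half)]
high <- ranked[-seq_len(half)]

# Sum counts per cell type within each group.
group_counts <- data.frame(
  ID      = rownames(counts),
  lowsum  = rowSums(counts[, low, drop = FALSE]),
  highsum = rowSums(counts[, high, drop = FALSE]),
  stringsAsFactors = FALSE
)
print(group_counts)
```

Selecting columns by sample name rather than by position keeps the sums correct even if the column order of the count table changes.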
② Plotting
p <- ggplot(data = clus3,
            aes(x = ID, y = value, fill = variable)) +
  #geom_bar(stat = "identity", position = "stack") +  # show the raw counts
  geom_bar(stat = "identity", position = "fill") +  # show proportions: every bar is scaled to 1
  scale_y_continuous(expand = expansion(mult = c(0.01, 0.1)),  # label the y axis as percentages
                     labels = scales::percent_format()) +
  scale_fill_manual(values = c("lowsum" = "#a56cc1", "highsum" = "#769fcd"),  # alternative palette: "lowsum"="#98d09d","highsum"="#e77381"
                    limits = c("lowsum", "highsum")) +  # limits sets the legend order
  theme( = element_blank(),  # theme settings
        axis.line = element_line(),
         = "top") +  # or "bottom"
  labs(title = "single cell", x = NULL, y = "percent") +  # title and axis labels
  guides(fill = guide_legend(title = NULL, nrow = 1, byrow = FALSE))
p
dev.off()
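Differences in single cell-type proportions between subgroups
① Adding group information
# Add the HDS group labels
HDS2$group <- c(1:nrow(HDS2))  # rank of each sample after sorting by score
HDS2$group1 <- ifelse(HDS2$group > 10, "high", "low")
HDS3 <- HDS2[rownames(cellper), ]  # reorder to match the rows of cellper
identical(HDS3$sample, rownames(cellper))  # sanity check, returns [1] TRUE
cellper$sample <- HDS3$sample
cellper$group <- HDS3$group1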
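The code above uses a table `cellper` that is not created in this post. Judging from how it is indexed (row names are sample IDs, the first columns are cell types), it is presumably a sample by cell-type table of percentages. The following base R sketch shows one way such a table could be built and annotated; the construction, the percentage scale, and all toy values are assumptions, not the original code.

```r
# Hypothetical per-cell annotations for two samples.
sample <- c("S1", "S1", "S1", "S1", "S2", "S2")
type   <- c("B", "Myeloid", "Myeloid", "Myeloid", "B", "Myeloid")

# Assumed layout of cellper: rows = samples, columns = cell types, values in percent.
# margin = 1 normalizes within each row, i.e. within each sample.
cellper <- as.data.frame.matrix(prop.table(table(sample, type), margin = 1) * 100)

# Hypothetical group table, deliberately in a different row order than cellper.
group_tbl <- data.frame(
  sample = c("S2", "S1"),
  group1 = c("low", "high"),
  row.names = c("S2", "S1"),
  stringsAsFactors = FALSE
)

# Reorder by row name so that labels line up with cellper, then verify before attaching.
group_tbl <- group_tbl[rownames(cellper), ]
stopifnot(identical(group_tbl$sample, rownames(cellper)))
cellper$sample <- group_tbl$sample
cellper$group  <- group_tbl$group1
print(cellper)
```

Reordering by row name and checking with `identical()` before attaching the labels guards against assigning a group to the wrong sample.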
② Plotting in a loop
### Plots
pplist = list()  # empty list that collects one plot per cell type
library(ggplot2)
library(dplyr)
library(ggpubr)
library(cowplot)
sce_groups = c(colnames(cellper)[1:12])  # cell types
for(group_ in sce_groups){
  cellper_ = cellper %>% select(all_of(c('sample', 'group', group_)))  # keep one cell type
  colnames(cellper_) = c('sample', 'group', 'percent')  # rename the selected columns
  cellper_$percent = as.numeric(cellper_$percent)  # make sure the values are numeric
  cellper_ <- cellper_ %>% group_by(group) %>% mutate(upper = quantile(percent, 0.75),
                                                      lower = quantile(percent, 0.25),
                                                      mean = mean(percent),
                                                      median = median(percent))  # upper and lower quartiles, mean and median per group
  print(group_)
  print(cellper_$median)

  pp1 = ggplot(cellper_, aes(x = group, y = percent)) +  # build the plot
    geom_jitter(shape = 21, aes(fill = group), width = 0.25) +
    stat_summary(fun = mean, geom = "point", color = "grey60") +  # add the group mean
    theme_cowplot() +
    theme( = element_text(size = 10), = element_text(size = 10), = element_text(size = 10),
           = element_text(size = 10), = element_text(size = 10,face = 'plain'), = 'none') +
    labs(title = group_, y = 'Percentage') +
    geom_errorbar(aes(ymin = lower, ymax = upper), col = "grey60", width = 1)

  ### Between-group t-test
  labely = max(cellper_$percent)
  compare_means(percent ~ group, data = cellper_)
  my_comparisons <- list( c("low", "high") )
  pp1 = pp1 + stat_compare_means(comparisons = my_comparisons, size = 3, method = "")
  pplist[[group_]] = pp1
}
# Arrange the plots of all cell types in colnames(cellper)[1:12] in one grid
plot_grid(pplist[['B']],
          pplist[['CD4+ T']],
          pplist[['CD8+ T']],
          pplist[['Endothelial']],
          pplist[['Epithelial']],
          pplist[['Fibroblast']],
          pplist[['Glial']],
          pplist[['Innate lymphoid']],
          pplist[['Mast']],
          pplist[['Mural']],
          pplist[['Myeloid']],
          pplist[['Plasma']],
          #nrow = 5,  # number of rows
          ncol = 4)  # number of columns
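The loop above delegates the low vs. high comparison to ggpubr. Note that `compare_means()` defaults to the Wilcoxon test when no `method` is given. As a minimal sketch of the same per-cell-type comparison without any plotting, the following uses base R `wilcox.test()` on made-up percentages; the toy table, its values, and the choice of the Wilcoxon test are assumptions for illustration.

```r
# Hypothetical percentages for two cell types in 5 low-score and 5 high-score samples.
toy <- data.frame(
  group   = rep(c("low", "high"), each = 5),
  B       = c(10, 12, 11, 13, 9, 20, 22, 25, 21, 23),
  Myeloid = c(30, 28, 35, 31, 29, 30.5, 33, 27, 32, 28.5),
  stringsAsFactors = FALSE
)
cell_types <- c("B", "Myeloid")

# One two-sided Wilcoxon rank-sum test per cell type, low vs. high.
p_values <- sapply(cell_types, function(ct) {
  wilcox.test(toy[[ct]][toy$group == "low"],
              toy[[ct]][toy$group == "high"])$p.value
})
print(p_values)
```

With 12 cell types tested in the real data, a multiple-testing adjustment such as `p.adjust(p_values, method = "BH")` may be worth considering before interpreting individual p-values.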
Original paper (the score calculation here differs somewhat from the paper, so the results shown here also differ somewhat)
References:
Histone deacetylase-mediated tumor microenvironment characteristics and synergistic immunotherapy in gastric cancer
Parallel single-cell and bulk transcriptome analyses reveal key features of the gastric tumor microenvironment | Genome Biology | Full Text