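Column 4: Dissecting CellPhoneDB (mainly V3)

Time: 2024-12-11 07:29:33

New version: website and GitHub (new version)

Old version: website and GitHub (old version)

This column tries to cover both versions (the new one adds features, but the framework is unchanged).

Official tutorial

Updated 2023-11-15

Note: V3 is no longer officially maintained, and V5 has already been released.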

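Data preparation (input):

  1. Expression matrix

The counts file can be supplied in any of the following forms:

  • a text file: a matrix in txt format.

With Seurat, the raw counts are easy to extract with GetAssayData:

count = GetAssayData(seuratobj, slot = "counts")

Then write the matrix out as a tab-delimited txt file.

Added 2023-11-15: the fwrite function from data.table is a faster way to write the file.

library(data.table)
# Use fwrite instead of write.table. The sparse count matrix is converted to a
# data frame first so that the gene names are kept as row names.
# file = "" is left blank as in the original; fill in the output path.
fwrite(as.data.frame(as.matrix(count)), file = "", sep = "\t", quote = FALSE, row.names = TRUE, col.names = TRUE)
# The slower base-R call it replaces:
# write.table(count, file = "", quote = F, sep = "\t", row.names = T, col.names = T)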
  • h5ad (recommended).

With h5ad input, only the file extension in the command changes:

cellphonedb method analysis test_meta.txt test_counts.h5ad
  • h5

  • a path to a folder containing mtx/barcode/features. The cellranger output can be used directly.

  • NOTE: Your gene/protein ids must be HUMAN. If you are working with another species such as mouse, we recommend converting the gene ids to their corresponding orthologues. In short, non-human data must be converted first.
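The conversion to human gene ids and the text layout of the counts file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the orthologue table `MOUSE_TO_HUMAN` is a toy dictionary (a real table would come from a curated resource), the helper names are hypothetical, and the `Gene` header of the first column is an assumption. The layout is genes in rows and cell barcodes in columns.

```python
import csv
import io

# Hypothetical mouse -> human orthologue table, for illustration only.
MOUSE_TO_HUMAN = {"Cd44": "CD44", "Spp1": "SPP1", "Ptprc": "PTPRC"}


def convert_counts_to_human(rows, mapping):
    """Rename gene ids with `mapping` and drop genes without an orthologue.

    `rows` is a list of (gene, [count per cell]) tuples. Counts of genes
    that map to the same human gene are summed.
    """
    merged = {}
    for gene, counts in rows:
        human = mapping.get(gene)
        if human is None:
            continue  # no known orthologue, so the gene is left out
        if human in merged:
            merged[human] = [a + b for a, b in zip(merged[human], counts)]
        else:
            merged[human] = list(counts)
    return merged


def write_counts_text(cells, gene_counts):
    """Return a tab-delimited matrix: genes in rows, barcodes in columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Gene"] + cells)  # header name is an assumption
    for gene, counts in gene_counts.items():
        writer.writerow([gene] + counts)
    return buffer.getvalue()


cells = ["AAAC-1", "AAAG-1"]
mouse_rows = [("Cd44", [3, 0]), ("Spp1", [0, 5]), ("Gm1234", [1, 1])]
human_counts = convert_counts_to_human(mouse_rows, MOUSE_TO_HUMAN)
counts_text = write_counts_text(cells, human_counts)
print(counts_text)
```

Dropping unmapped genes is one possible policy; whether to drop them or handle one-to-many orthologues differently depends on the dataset.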

  2. meta file

A txt file with two columns: one with the cell barcodes and one with the cluster/cell-type label of each cell.
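A minimal sketch of how such a two-column file could be produced and checked against the counts matrix. The header names `Cell` and `cell_type` and both helper functions are assumptions made for illustration, not part of CellPhoneDB.

```python
import csv
import io


def write_meta(barcode_to_celltype):
    """Return the text of a two-column, tab-delimited meta file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Cell", "cell_type"])  # header names are an assumption
    for barcode, cell_type in barcode_to_celltype.items():
        writer.writerow([barcode, cell_type])
    return buffer.getvalue()


def missing_barcodes(count_barcodes, barcode_to_celltype):
    """Barcodes present in the counts matrix but absent from the meta file."""
    return [b for b in count_barcodes if b not in barcode_to_celltype]


annotation = {"AAAC-1": "Tcell", "AAAG-1": "Macrophage"}
meta_text = write_meta(annotation)
unannotated = missing_barcodes(["AAAC-1", "AAAG-1", "AAAT-1"], annotation)
print(meta_text)
print(unannotated)
```

Checking for unannotated barcodes before running is cheap and catches mismatches between the matrix and the annotation early.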

  3. DEGs file (if method degs_analysis)

Required only when method degs_analysis is chosen. The DEGs can come from a Seurat marker search.

This is a two-column file indicating which gene is specific or upregulated in a cell type (see example). The first column should be the cell type/cluster name (matching those in the meta file) and the second column the associated gene id. The remaining columns are ignored. We provide notebooks for both Seurat and Scanpy users. It is up to you to design a DEG analysis appropriate for your research question.

In short: the first column is the cell type, matching the meta information; the second column is the gene; everything else is ignored.

The official notebooks provide code for both Seurat and Scanpy users, including how to find the DEGs.
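As a rough illustration of the two-column layout, the sketch below filters a toy marker table down to upregulated genes and writes cluster and gene as the first two columns. The marker values, the thresholds and the header names `cluster` and `gene` are all assumptions; the actual DEG criteria are for you to choose.

```python
import csv
import io

# Toy marker table: (cluster, gene, log fold change, adjusted p-value).
markers = [
    ("Tcell", "CD3E", 2.1, 1e-30),
    ("Tcell", "ACTB", 0.1, 0.40),
    ("Macrophage", "SPP1", 1.8, 1e-12),
    ("Macrophage", "CD3E", -1.5, 1e-8),
]


def select_degs(rows, min_logfc=0.25, max_padj=0.05):
    """Keep upregulated, significant genes; both thresholds are illustrative."""
    return [
        (cluster, gene)
        for cluster, gene, logfc, padj in rows
        if logfc >= min_logfc and padj <= max_padj
    ]


def write_degs(pairs):
    """Return tab-delimited text: cell type first, gene id second."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["cluster", "gene"])  # header names are an assumption
    writer.writerows(pairs)
    return buffer.getvalue()


degs = select_degs(markers)
degs_text = write_degs(degs)
print(degs_text)
```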

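4. microenvironments file (if microenvs_file_path)

Microenvironment information can optionally be supplied to narrow down the ligand-receptor results. My understanding is that a microenvironment is an additional, user-defined (spatial) scope for the analysis; the cell types in it must match those in the metadata.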
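One way to picture this scoping is sketched below, assuming the file pairs each cell type with a microenvironment name and that only cell types sharing a microenvironment are compared. The microenvironment names and both helpers are hypothetical; this illustrates the idea, not CellPhoneDB's own code.

```python
from itertools import product

meta_cell_types = {"Tcell", "Macrophage", "Fibroblast"}

# (cell type, microenvironment) rows; the names are made up for illustration.
microenv_rows = [
    ("Tcell", "tumour_core"),
    ("Macrophage", "tumour_core"),
    ("Fibroblast", "stroma"),
]


def unknown_cell_types(rows, known_cell_types):
    """Cell types in the microenvironment file that the meta file lacks."""
    return sorted({cell_type for cell_type, _ in rows} - known_cell_types)


def pairs_within_microenvs(rows):
    """Ordered cell-type pairs that share at least one microenvironment."""
    by_env = {}
    for cell_type, env in rows:
        by_env.setdefault(env, []).append(cell_type)
    pairs = set()
    for cell_types in by_env.values():
        pairs.update(product(cell_types, repeat=2))
    return sorted(pairs)


unknown = unknown_cell_types(microenv_rows, meta_cell_types)
allowed_pairs = pairs_within_microenvs(microenv_rows)
print(unknown)
print(allowed_pairs)
```

Here Tcell and Fibroblast never share a microenvironment, so that pair is not in `allowed_pairs`.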

Running the code:

  1. Choosing a method

There are three options: running the statistical method, running without the statistical method, and the newest degs_analysis method.

  • method statistical_analysis

Example:

cellphonedb method statistical_analysis test_meta.txt test_counts.txt
  • method analysis

Example:

cellphonedb method analysis test_meta.txt test_counts.txt
  • method degs_analysis

  2. Runtime parameters

~ Optional Method parameters:

  • --counts-data: [ensembl | gene_name | hgnc_symbol] Type of gene identifiers in the counts data. This states the format of the gene names in the expression matrix; hgnc_symbol is the usual choice.

  • --project-name: Name of the project. A subfolder with this name is created in the output folder. Useful for keeping samples apart when several are analysed.

  • --iterations: Number of iterations for the statistical analysis [1000]. The default is fine.

  • --threshold: % of cells expressing the specific ligand/receptor. Genes expressed in a smaller fraction of cells are not analysed; 0.1 is a common choice.

  • --result-precision: Number of decimal digits in results [3]

  • --output-path: Directory where the results will be allocated (the directory must exist) [out]. The output folder is not created automatically (unlike cnmf, which creates its own), so it must be created beforehand.

  • --output-format: Output format of the results files (extension will be added to filename if not present) [txt]. If not specified, the output is txt.

  • --means-result-name: Means result filename [means]. The default is fine.

  • --significant-means-result-name: Significant mean result filename [significant_means]. The default is fine.

  • --deconvoluted-result-name: Deconvoluted result filename [deconvoluted]. The default is fine.

  • --verbose/--quiet: Print or hide CellPhoneDB logs [verbose]. Controls whether intermediate logs are printed.

  • --subsampling: Enable subsampling. The subsampling options can be used when the dataset is very large.

  • --subsampling-log: Enable subsampling log1p for non log-transformed data inputs !!mandatory!!

  • --subsampling-num-pc: Subsampling NumPC argument (number of PCs to use) [100]

  • --subsampling-num-cells: Number of cells to subsample to [1/3 of cells]
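To make the --threshold option concrete, the sketch below computes the fraction of cells in one cluster that express a gene and compares it with 0.1. The helper names are hypothetical, and treating a fraction exactly equal to the threshold as passing is an assumption.

```python
def fraction_expressing(counts):
    """Fraction of cells with a non-zero count for one gene in one cluster."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c > 0) / len(counts)


def passes_threshold(counts, threshold=0.1):
    """True if enough cells express the gene (>= is an assumption here)."""
    return fraction_expressing(counts) >= threshold


rare_gene = [0] * 19 + [4]           # expressed in 1 of 20 cells
common_gene = [0] * 17 + [2, 1, 3]   # expressed in 3 of 20 cells

print(fraction_expressing(rare_gene), passes_threshold(rare_gene))
print(fraction_expressing(common_gene), passes_threshold(common_gene))
```

With a threshold of 0.1, the gene seen in 1 of 20 cells is excluded and the one seen in 3 of 20 cells is kept.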

~ Optional Method Statistical parameters

  • --pvalues-result-name: P-values result filename [pvalues]. The default is fine.

  • --pvalue: P-value threshold [0.05]. The significance cutoff; it can usually be left alone.

  • --debug-seed: Debug random seed -1. To disable it please use a value >=0 [-1]. Used for debugging.

  • --threads: Number of threads to use. >=1 [4]. Sets how many threads run in parallel.
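Since the output directory must exist before the run, a small wrapper can create it and assemble the command from the options listed above. This is a sketch: `build_statistical_command` is a hypothetical helper, the project name `sample1` is made up, and the command is only printed, not executed.

```python
import os
import shlex
import tempfile


def build_statistical_command(meta, counts, output_path, project_name,
                              iterations=1000, threshold=0.1, threads=4):
    """Create the output folder, then return the command as an argument list."""
    # --output-path must already exist, so create it before anything runs.
    os.makedirs(output_path, exist_ok=True)
    return [
        "cellphonedb", "method", "statistical_analysis", meta, counts,
        "--counts-data=hgnc_symbol",
        f"--project-name={project_name}",
        f"--iterations={iterations}",
        f"--threshold={threshold}",
        f"--output-path={output_path}",
        f"--threads={threads}",
    ]


# A temporary folder stands in for the real output location.
out_dir = os.path.join(tempfile.mkdtemp(), "out")
command = build_statistical_command(
    "test_meta.txt", "test_counts.txt", out_dir, "sample1",
    iterations=10, threads=2)
print(shlex.join(command))  # pass `command` to subprocess.run to execute it
```

Building the argument list per sample, each with its own --project-name, keeps the results of several samples in separate subfolders.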

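3. Running

The old version's documentation gives four examples. They differ only slightly, and their options can be combined in one command.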

cellphonedb method statistical_analysis   --iterations=10 --threads=2

  4. Running the methods in the new version

Concrete examples are in the new tutorial. The methods now run inside Python rather than in the terminal.

Example with running the DEG-based method
from cellphonedb.src.core.methods import cpdb_degs_analysis_method
# cpdb_file_path (the database file), degs_file_path and out_path must be defined beforehand.
deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
    cpdb_file_path = cpdb_file_path,
    meta_file_path = 'test_meta.txt',
    counts_file_path = 'test_counts.h5ad',
    degs_file_path = degs_file_path,
    counts_data = 'hgnc_symbol',
    threshold = 0.1,
    output_path = out_path)
Example with running the statistical method
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
# cpdb_file_path (the database file) and out_path must be defined beforehand.
deconvoluted, means, pvalues, significant_means = cpdb_statistical_analysis_method.call(
    cpdb_file_path = cpdb_file_path,
    meta_file_path = 'test_meta.txt',
    counts_file_path = 'test_counts.h5ad',
    counts_data = 'hgnc_symbol',
    output_path = out_path)
Example without using the statistical method

With text input

from cellphonedb.src.core.methods import cpdb_analysis_method
# cpdb_file_path (the database file) and out_path must be defined beforehand.
means, deconvoluted = cpdb_analysis_method.call(
    cpdb_file_path = cpdb_file_path,
    meta_file_path = 'test_meta.txt',
    counts_file_path = 'test_counts.txt',
    counts_data = 'hgnc_symbol',
    output_path = out_path)

With h5ad input

from cellphonedb.src.core.methods import cpdb_analysis_method
# cpdb_file_path (the database file) and out_path must be defined beforehand.
means, deconvoluted = cpdb_analysis_method.call(
    cpdb_file_path = cpdb_file_path,
    meta_file_path = 'test_meta.txt',
    counts_file_path = 'test_counts.h5ad',
    counts_data = 'hgnc_symbol',
    output_path = out_path)
Example running a microenvironments file
from cellphonedb.src.core.methods import cpdb_degs_analysis_method
# cpdb_file_path (the database file), microenvs_file_path and out_path must be defined beforehand.
deconvoluted, means, relevant_interactions, significant_means = cpdb_degs_analysis_method.call(
    cpdb_file_path = cpdb_file_path,
    meta_file_path = 'test_meta.txt',
    counts_file_path = 'test_counts.h5ad',
    counts_data = 'hgnc_symbol',
    microenvs_file_path = microenvs_file_path,
    output_path = out_path)

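Description of the results:

Old version: official website, cellphoneDB_github results interpretation

The new version's documentation of the results is more detailed and covers more of the new methods; it is not covered here for now.

The results folder mainly contains the following files, each with several fields:

  • P-value (pvalues.txt),

  • Mean (means.txt),

  • Significant mean (significant_means.txt)

  • Deconvoluted (deconvoluted.txt)

  • the anno annotation file, which is part of the input data

Fields shared by all files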

  • id_cp_interaction: Unique CellPhoneDB identifier for each interaction stored in the database.

  • interacting_pair: Name of the interacting pairs separated by “|”.

  • partner A or B: Identifier for the first interacting partner (A) or the second (B). It could be: UniProt (prefix simple:) or complex (prefix complex:). A complex covers cases such as a multi-subunit ligand or receptor.

  • gene A or B: Gene identifier for the first interacting partner (A) or the second (B). The identifier will depend on the input user list.

  • secreted: True if one of the partners is secreted.

  • Receptor A or B: True if the first interacting partner (A) or the second (B) is annotated as a receptor in our database.

  • annotation_strategy: Curated if the interaction was annotated by the CellPhoneDB developers. Otherwise, the name of the database where the interaction has been downloaded from.

  • is_integrin: True if one of the partners is integrin.
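The simple:/complex: prefixes on the partner fields are easy to split off when post-processing results. A minimal sketch follows; `parse_partner` is a hypothetical helper and the complex name in the example is made up.

```python
def parse_partner(identifier):
    """Split 'simple:<uniprot>' or 'complex:<name>' into (kind, name)."""
    kind, _, name = identifier.partition(":")
    if kind not in ("simple", "complex") or not name:
        raise ValueError(f"unexpected partner identifier: {identifier!r}")
    return kind, name


print(parse_partner("simple:P16070"))            # a single UniProt accession
print(parse_partner("complex:EXAMPLE_complex"))  # made-up complex name
```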


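1.pvalues.txt

  • pvalue: p-values for all the interacting partners: refers to the enrichment of the interacting ligand-receptor pair in each of the interacting pairs of cell types. (Only in pvalues.txt)

The numeric columns that follow should be the p-values of that ligand-receptor pair for each pair of clusters.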
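The general idea behind such an enrichment p-value can be sketched as a label-shuffling test: compute the ligand-receptor mean for a cluster pair, shuffle the cluster labels many times, and count how often the shuffled mean is at least as high as the observed one. The code below is a simplified illustration with toy data and hypothetical names, not CellPhoneDB's actual implementation.

```python
import random


def pair_mean(expr, labels, gene_a, cluster_a, gene_b, cluster_b):
    """Mean of gene_a's average in cluster_a and gene_b's average in cluster_b."""
    def cluster_average(gene, cluster):
        values = [expr[gene][i] for i, lab in enumerate(labels) if lab == cluster]
        return sum(values) / len(values)
    return (cluster_average(gene_a, cluster_a) + cluster_average(gene_b, cluster_b)) / 2


def permutation_pvalue(expr, labels, gene_a, cluster_a, gene_b, cluster_b,
                       iterations=1000, seed=0):
    """Fraction of label shuffles whose mean reaches the observed mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = pair_mean(expr, labels, gene_a, cluster_a, gene_b, cluster_b)
    shuffled = list(labels)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # cluster sizes stay the same, membership changes
        if pair_mean(expr, shuffled, gene_a, cluster_a, gene_b, cluster_b) >= observed:
            hits += 1
    return hits / iterations


# Toy data: ligand L is high in cluster A, receptor R is high in cluster B.
expr = {"L": [5, 6, 7, 0, 0, 0], "R": [0, 0, 0, 4, 5, 6]}
labels = ["A", "A", "A", "B", "B", "B"]
observed_mean = pair_mean(expr, labels, "L", "A", "R", "B")
p_value = permutation_pvalue(expr, labels, "L", "A", "R", "B")
print(observed_mean, p_value)
```

The `iterations` argument plays the role of the --iterations option: more shuffles give a finer-grained p-value at the cost of run time.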

2.means.txt

The layout is similar to that of the p-value file.

  • means: Mean values for all the interacting partners: mean value refers to the total mean of the individual partner average expression values in the corresponding interacting pairs of cell types. If one of the mean values is 0, then the total mean is set to 0. (Only in means.txt)
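The rule for the total mean can be written out directly. In this sketch the function names are hypothetical and the expression values are toy numbers.

```python
def cluster_average(values):
    """Average expression of one partner across the cells of one cluster."""
    return sum(values) / len(values)


def interaction_mean(mean_a, mean_b):
    """Total mean of the two partner averages; 0 if either partner is 0."""
    if mean_a == 0 or mean_b == 0:
        return 0.0
    return (mean_a + mean_b) / 2


ligand_in_cluster_1 = cluster_average([1.0, 3.0])    # 2.0
receptor_in_cluster_2 = cluster_average([4.0, 4.0])  # 4.0
silent_receptor = cluster_average([0.0, 0.0])        # 0.0

print(interaction_mean(ligand_in_cluster_1, receptor_in_cluster_2))
print(interaction_mean(ligand_in_cluster_1, silent_receptor))
```

The zeroing rule means an interaction is only scored when both partners are expressed in their respective cell types.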

3.significant_means.txt

The main fields are the same as above, with the addition of rank.

  • rank: Total number of significant p-values for each interaction divided by the number of cell type-cell type comparisons. (Only in significant_means.txt)

  • significant_mean: Significant mean calculation for all the interacting partners. If the p-value < 0.05, the value will be the mean. Otherwise, the value is set to 0. (Only in significant_means.txt)
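Both fields follow directly from the means and p-values of one interaction. The sketch below assumes dictionaries keyed by cell-type pair; the keys, the values and the function names are illustrative.

```python
def significant_means(means, pvalues, alpha=0.05):
    """Keep the mean where the p-value is below alpha, otherwise 0."""
    return {pair: (means[pair] if pvalues[pair] < alpha else 0.0) for pair in means}


def interaction_rank(pvalues, alpha=0.05):
    """Significant p-values divided by the number of cell-type comparisons."""
    return sum(1 for p in pvalues.values() if p < alpha) / len(pvalues)


# One interaction across four cell type-cell type comparisons (toy values).
means = {"A|A": 0.4, "A|B": 1.2, "B|A": 0.9, "B|B": 2.0}
pvalues = {"A|A": 1.0, "A|B": 0.01, "B|A": 0.2, "B|B": 0.03}

sig = significant_means(means, pvalues)
rank = interaction_rank(pvalues)
print(sig)
print(rank)
```

Two of the four comparisons are significant here, so the rank is 0.5.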

4.deconvoluted.txt

This file differs somewhat from the others.

  • gene_name: Gene identifier for one of the subunits that are participating in the interaction defined in the means file. The identifier will depend on the input of the user list.

  • uniprot: UniProt identifier for one of the subunits that are participating in the interaction defined in the means file.

  • is_complex: True if the subunit is part of a complex. Single if it is not, complex if it is.

  • protein_name: Protein name for one of the subunits that are participating in the interaction defined in the means file.

  • complex_name: Complex name if the subunit is part of a complex. Empty if not.

  • id_cp_interaction: Unique CellPhoneDB identifier for each of the interactions stored in the database.

  • mean: Mean expression of the corresponding gene in each cluster.

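Plotting:

CellPhoneDB has built-in plotting, but exporting the data and plotting in R is usually the better option.

The R package ktplots can draw CellPhoneDB plots; the official plotting functions are another option.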
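Before handing results to a plotting tool, it can help to reshape the wide result table into long format, with one row per interaction and cell-type pair. The sketch below does this for a tiny in-memory table. The interaction ids and pair names are made up, and the exact column names (such as `partner_a` and the `celltypeA|celltypeB` value columns) are assumptions about the file layout.

```python
import csv
import io

# Columns treated as annotation rather than per-cell-type-pair values (assumed names).
ANNOTATION_COLUMNS = {
    "id_cp_interaction", "interacting_pair", "partner_a", "partner_b",
    "gene_a", "gene_b", "secreted", "receptor_a", "receptor_b",
    "annotation_strategy", "is_integrin", "rank",
}


def to_long(text):
    """Return (interacting_pair, cell_type_pair, mean) for non-zero entries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        for column, value in record.items():
            if column in ANNOTATION_COLUMNS or value in ("", None):
                continue  # skip annotation fields and empty cells
            mean = float(value)
            if mean > 0:
                rows.append((record["interacting_pair"], column, mean))
    return rows


# Toy significant-means table built in memory.
table = [
    ["id_cp_interaction", "interacting_pair", "rank", "Tcell|Macrophage", "Macrophage|Tcell"],
    ["CPI-X1", "LIG1_REC1", "0.5", "1.25", "0"],
    ["CPI-X2", "LIG2_REC2", "0.5", "", "0.8"],
]
sample_text = "\n".join("\t".join(row) for row in table) + "\n"
long_rows = to_long(sample_text)
print(long_rows)
```

A long table like this can be written back to disk and read straight into R for a dot plot.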

Checking versions:

List the latest available database versions (run in a terminal)

cellphonedb database list_remote

List the local database versions

cellphonedb database list_local