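ChIP-seq Study Notes

ChIP-seq
Flowchart
Books
Tools
UCSC
Installation
Usage
Principles
Manual
Swiss online analysis tool
Short-read aligners
BWA
Workflow
Format processing
Sequence alignment
Peak calling
Motif
Visualization
Output documents
Upstream and downstream analysis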

ChIP-seq

Flowchart

[Flowchart compiled by 怪毛匠子]

Books

生物信息学 (Bioinformatics), by 许忠能

生物信息学：计算的视角 (Bioinformatics: A Computational Perspective), translated by 李岭

Tools

UCSC

http://genome.ucsc.edu

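Installation (MACS)

Get the source from http://liulab.dfci.harvard.edu/MACS/Download.html. The archive is MACS-1.4.2-1.tar.gz:

http://github.com/downloads/taoliu/MACS/MACS-1.4.2-1.tar.gz

Extract the archive, which creates the MACS-1.4.2 directory, then install:

tar xvzf MACS-1.4.2-1.tar.gz

cd MACS-1.4.2

python setup.py install --prefix /your_directory/

--prefix sets the installation directory. Then update the environment variables (if you install with sudo, this step is not needed). Note that the shell does not allow spaces around "=":

export PATH=/your_directory/bin:$PATH

export PYTHONPATH=/your_directory/lib/python2.X/site-packages/:$PYTHONPATH

Run macs14 -h to verify the installation and view the MACS usage information.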

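Usage (MACS)

Suppose we have a set of mouse CTCF ChIP-seq reads in CTCF.fastq. First, map the reads to the mouse genome (mm10 here). Assume the genome index has already been built and is stored under /path_to/.

bowtie -m 1 -S -q /path_to/mm10 CTCF.fastq CTCF.sam

-m 1 keeps only reads that map to exactly one location

-S writes the output in SAM format

-q reads the input as FASTQ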
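
As a quick sanity check on the alignment output, the mapped and unmapped reads in a SAM file can be counted from the FLAG column. The sketch below is only an illustration and is not part of bowtie or MACS. The function name count_mapped and the sample records are made up for this example. It relies on two SAM conventions: header lines start with "@", and bit 0x4 of the FLAG field marks an unmapped read.

```python
def count_mapped(sam_lines):
    """Return (mapped, unmapped) read counts from an iterable of SAM lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])  # column 2 of a SAM record is FLAG
        if flag & 0x4:  # bit 0x4 set: the read is unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Hypothetical records for illustration only (not real bowtie output).
example_sam = [
    "@SQ\tSN:chr1\tLN:195471971",
    "read1\t0\tchr1\t3000100\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t16\tchr1\t3000500\t255\t4M\t*\t0\t0\tTTGA\tIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
]

mapped, unmapped = count_mapped(example_sam)
print(mapped, unmapped)
```

For a real file, an open file handle can be passed instead of the list, for example count_mapped(open("CTCF.sam")).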

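Peak calling:

macs14 -t CTCF.sam -n CTCF -g mm

-t the treatment (ChIP) data file, as opposed to the control (more on this later)

-n prefix for the output file names

-g the approximate genome size, given as -g number. MACS has several genome sizes built in: "mm" for mouse, "hs" for human, "ce" for C. elegans, and "dm" for Drosophila.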

After a successful run, you get the following files:

CTCF_model.r, CTCF_peaks.bed, CTCF_peaks.xls, CTCF_summits.bed

CTCF_model.r stores the "bimodal model" as R code. To plot the model, run in a terminal:

Rscript CTCF_model.r
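
The peak list in CTCF_peaks.bed can be summarized with a few lines of code. The sketch below is a minimal example under stated assumptions: the file is tab-separated BED with chromosome, start, and end in the first three columns, and the function names read_bed and summarize, as well as the sample records, are invented for illustration (they are not MACS output).

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping non-data lines."""
    peaks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        peaks.append((fields[0], int(fields[1]), int(fields[2])))
    return peaks


def summarize(peaks):
    """Return the number of peaks and their mean width in bp."""
    widths = [end - start for _, start, end in peaks]  # BED end is exclusive
    mean_width = sum(widths) / len(widths) if widths else 0.0
    return len(peaks), mean_width


# Hypothetical records for illustration only.
example_bed = [
    "track name=example",
    "chr1\t1000\t1300\tpeak_1\t85.2",
    "chr1\t5000\t5500\tpeak_2\t120.0",
    "chr2\t200\t400\tpeak_3\t60.7",
]

n_peaks, mean_width = summarize(read_bed(example_bed))
print(n_peaks, round(mean_width, 1))
```

To use it on the real output, pass an open file handle such as open("CTCF_peaks.bed") to read_bed.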

Principles

Manual

Swiss online analysis tool

http://ccg.vital-it.ch/chipseq/

Short-read aligners

SOAP: for single-end reads

MAQ

BWA

Bowtie: very fast, well suited to ChIP-seq

BWA

Download

http://bio-bwa.sourceforge.net/bwa.shtml

Steps

Step 1: Build the index

Build the index files from the reference genome (e.g. reference.fa). The example below indexes the human hg18 reference:

bwa index -a bwtsw human_hg18_ref.fa

Step 2: Find the SA (suffix array) coordinates

For paired-end data (leftRead.fastq and rightRead.fastq), process the two files separately:

bwa aln reference.fa leftRead.fastq > leftRead.sai

bwa aln reference.fa rightRead.fastq > rightRead.sai

For single-end data:

bwa aln reference.fa singleRead.fastq > singleRead.sai

To run on multiple threads, add the -t option. The -f option names the output file. For example:

bwa aln -c -t 3 -f leftreads.sai reference.fa leftreads.fastq

As far as the BWA manual describes it, -c is intended for color-space reads in older BWA releases, so it can likely be dropped for ordinary base-space data.

Step 3: Convert the SA coordinates to SAM

For paired-end data:

bwa sampe -f pair-end.sam reference.fa leftRead.sai rightRead.sai leftRead.fastq rightRead.fastq

For single-end data:

bwa samse -f single.sam reference.fa singleRead.sai singleRead.fastq
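
The three BWA steps can be strung together programmatically. The sketch below only assembles the command lines and does not run them. The helper name bwa_commands and its parameters are hypothetical, and it uses only the subcommands and options that appear in the steps above (index -a bwtsw, aln -t/-f, sampe/samse -f).

```python
def bwa_commands(reference, reads, out_sam, threads=1):
    """Return the bwa commands (as argument lists) for one or two FASTQ files."""
    if len(reads) not in (1, 2):
        raise ValueError("expected one (single-end) or two (paired-end) FASTQ files")

    # Step 1: build the index for the reference genome.
    commands = [["bwa", "index", "-a", "bwtsw", reference]]

    # Step 2: one "bwa aln" call per FASTQ file, each writing a .sai file.
    sai_files = []
    for fastq in reads:
        sai = fastq.rsplit(".", 1)[0] + ".sai"
        sai_files.append(sai)
        commands.append(["bwa", "aln", "-t", str(threads), "-f", sai, reference, fastq])

    # Step 3: sampe for paired-end data, samse for single-end data.
    mode = "sampe" if len(reads) == 2 else "samse"
    commands.append(["bwa", mode, "-f", out_sam, reference] + sai_files + list(reads))
    return commands


paired = bwa_commands("reference.fa", ["leftRead.fastq", "rightRead.fastq"],
                      "pair-end.sam", threads=3)
for cmd in paired:
    print(" ".join(cmd))
```

Each list can then be handed to subprocess.run(cmd, check=True), assuming bwa is installed and on the PATH.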

Workflow

Format processing

Format: fastq

Tools: FASTQ Groomer, samtools

Sequence alignment

Tool: bowtie

Input: fastq

Output: SAM/BAM

Peak calling

Tool: MACS (peak calling)

Input: mapped reads

Output: peaks (BED), report (html)

Parameters:

Link:

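Motif

http://blog.163.com/zju_whw/blog/static/225753129201532104815301/

Motifs are represented in two ways:

1. Consensus (consensus sequence): the motif is written as a string of letters. If both "A" and "G" occur at a position, it is written as "R", following the IUPAC code (International Union of Pure and Applied Chemistry), http://www.bioinformatics.org/sms2/iupac.html

2. Matrix-based: a matrix records the amount of A, G, C, and T at every position. There are three variants: the count matrix, the PFM (position frequency matrix), and the PWM (position weight matrix). The count matrix holds the raw counts at each position, the PFM holds the proportion of each base at each position, and the PWM is obtained by taking the logarithm of each frequency relative to a background frequency.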
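
The two representations can be derived from the same set of aligned binding sites. The sketch below is a minimal illustration, not the method of any particular tool. It assumes equal-length sites containing only uppercase A, C, G, and T, a uniform background of 0.25, and a pseudocount of 1 to avoid taking the log of zero. The consensus here encodes every base observed at a position, whereas real tools usually apply frequency thresholds. All function names and the example sites are hypothetical.

```python
import math

BASES = "ACGT"

# IUPAC nucleotide codes keyed by the set of bases seen at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def count_matrix(sites):
    """Count each base at each position of equal-length, aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [{b: sum(s[i] == b for s in sites) for b in BASES} for i in range(length)]


def frequency_matrix(counts):
    """Turn counts into per-position proportions (PFM)."""
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]


def weight_matrix(counts, pseudocount=1.0, background=0.25):
    """Log2 ratio of each smoothed frequency to the background (PWM)."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudocount * len(BASES)
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def consensus(counts):
    """IUPAC consensus: encode the set of bases observed at each position."""
    return "".join(IUPAC[frozenset(b for b in BASES if col[b] > 0)] for col in counts)


# Hypothetical aligned sites for illustration only.
sites = ["ACGT", "GCGT", "ACGA", "GCGT"]
counts = count_matrix(sites)
pfm = frequency_matrix(counts)
pwm = weight_matrix(counts)
print(consensus(counts))
print(pfm[0])
print(pwm[1])
```

In this example the first position holds A and G in equal numbers, so the consensus starts with "R", matching the IUPAC rule described above.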

1. Tool: HOMER (Hypergeometric Optimization of Motif EnRichment)

Input:

Output:

Parameters:

Link: http://homer.salk.edu/homer/

Download: http://homer.salk.edu/homer/configureHomer.pl
