DNA sequence open reading frames (ORFs) | ORF prediction for DNA sequences

Common ORF prediction tools

Basic concepts

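An open reading frame (ORF) is a stretch of sequence, read in a given reading frame, that contains no stop codon. In an organism's genome, such a stretch may be part of a protein-coding sequence. In a gene, the ORF begins at the start codon and extends to the stop codon. Because a DNA or RNA sequence can be read in more than one way (three frames per strand, so six for double-stranded DNA), many different ORFs can exist in the same sequence at once. Several computer programs analyze which of these are most likely to be protein-coding.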
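The "six reading frames" idea can be made concrete with a short Python sketch. It assumes plain DNA input (A/C/G/T, with T standing in for U), and the function names are hypothetical; it simply lists the codons of the three frames on the given strand and the three on its reverse complement.

```python
# Sketch: enumerate the six reading frames of a double-stranded DNA sequence.
# Assumes DNA letters only (A/C/G/T); function names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int) -> list[str]:
    """Split seq into complete codons starting at offset (0, 1 or 2).
    Trailing bases that do not fill a whole codon are dropped."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Map frame labels +1..+3 and -1..-3 to their codon lists."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{o + 1}": codons(seq, o) for o in range(3)}
    frames.update({f"-{o + 1}": codons(rc, o) for o in range(3)})
    return frames


if __name__ == "__main__":
    for label, frame in six_frames("TCTAAAGGTCCA").items():
        print(label, " ".join(frame))
```

The input here is the DNA form of the RNA example later in this post, so frames +1 to +3 match the three frames listed there (with T in place of U).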

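Key points:

1. A stretch of sequence containing no stop codon;

2. Possibly part of a protein-coding sequence;

3. A sequence can be read in several ways, so many different ORFs can exist at the same time;

4. Some tools use BLAST alignments to increase confidence in their predictions.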
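To illustrate the first three points, here is a naive ORF scanner in Python. It is a teaching sketch, not the algorithm used by ORFfinder or any other tool. It assumes the standard genetic code (stops TAA, TAG, TGA), treats ATG as the only start codon, and defines an ORF as running from an ATG to the first in-frame stop codon. The default minimum length of 75 nt only mirrors ORFfinder's `-ml` default. `Orf`, `scan_strand` and `find_orfs` are hypothetical names.

```python
# Naive ORF scan: a teaching sketch, NOT ORFfinder's algorithm.
# Assumptions: standard genetic code (stops TAA, TAG, TGA), ATG is the only
# start codon, and an ORF runs from an ATG to the first in-frame stop codon
# (stop included). ORFs that reach the sequence end without a stop are skipped.
# Coordinates are 0-based, half-open, on the + strand.
from typing import NamedTuple

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # 0-based start on the + strand
    end: int     # exclusive end on the + strand
    length: int  # length in nt, stop codon included


def scan_strand(seq: str, min_len: int) -> list[tuple[int, int]]:
    """(start, end) of ATG...stop ORFs in the three frames of one strand."""
    hits = []
    for offset in range(3):
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in frame (or after the last stop) opens an ORF
            elif start is not None and codon in STOPS:
                end = i + 3  # include the stop codon
                if end - start >= min_len:
                    hits.append((start, end))
                start = None  # the next ATG in this frame opens a new ORF
    return hits


def find_orfs(seq: str, min_len: int = 75) -> list[Orf]:
    """Scan both strands and return ORFs sorted by + strand start."""
    seq = seq.upper()
    n = len(seq)
    orfs = [Orf("+", s, e, e - s) for s, e in scan_strand(seq, min_len)]
    rc = seq.translate(COMPLEMENT)[::-1]
    # A hit at [s, e) on the reverse complement covers [n - e, n - s) on +.
    orfs += [Orf("-", n - e, n - s, e - s) for s, e in scan_strand(rc, min_len)]
    return sorted(orfs, key=lambda o: o.start)
```

For example, `find_orfs("CCATGAAATAGCC", min_len=9)` returns a single + strand ORF covering positions 2 to 11 (ATG AAA TAG). Because internal ATGs are ignored, ORFs nested in the same frame are not reported; real tools make different choices here.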

Example

Consider the sequence 5'-UCUAAAGGUCCA-3'. Read along this strand, it has three reading frames:

UCU AAA GGU CCA
CUA AAG GUC
UAA AGG UCC

Because UAA is a stop codon, the third frame cannot be translated into a protein, so only the first two are open reading frames.
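The worked example can be checked mechanically. This Python sketch splits the RNA sequence into its three forward frames and flags any frame containing a stop codon; it assumes the standard genetic code (stops UAA, UAG, UGA), and `reading_frames` is a hypothetical helper name.

```python
# Reproduce the worked example: split 5'-UCUAAAGGUCCA-3' into its three
# forward reading frames and flag frames that contain a stop codon.
# Assumes the standard genetic code; reading_frames is a hypothetical helper.
RNA_STOPS = {"UAA", "UAG", "UGA"}


def reading_frames(rna: str) -> list[list[str]]:
    """Codons of frames 1-3; incomplete trailing codons are dropped."""
    return [[rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
            for offset in range(3)]


seq = "UCUAAAGGUCCA"
for n, frame in enumerate(reading_frames(seq), start=1):
    has_stop = any(c in RNA_STOPS for c in frame)
    print(f"frame {n}: {' '.join(frame):<16} open={not has_stop}")
```

Only frame 3 (UAA AGG UCC) is flagged, which matches the conclusion above.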

Personally, I recommend the tool developed by NCBI (ORFfinder): its results carry more credibility when you publish.

Below is the help text for the Linux version of the tool:

USAGE
  ORFfinder [-h] [-help] [-xmlhelp] [-in Input_File] [-id Accession_GI]
    [-b begin] [-e end] [-c circular] [-g Genetic_code] [-s Start_codon]
    [-ml minimal_length] [-n nested_ORFs] [-strand Strand] [-out Output_File]
    [-outfmt output_format] [-logfile File_Name] [-conffile File_Name]
    [-version] [-version-full] [-dryrun]

DESCRIPTION
   Searching open reading frames in a sequence

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -xmlhelp
   Print USAGE, DESCRIPTION and ARGUMENTS in XML format; ignore all other
   parameters
 -logfile <File_Out>
   File to which the program log should be redirected
 -conffile <File_In>
   Program's configuration (registry) data file
 -version
   Print version number;  ignore other arguments
 -version-full
   Print extended version data;  ignore other arguments
 -dryrun
   Dry run the application: do nothing, only test all preconditions

 *** Input query options (one of them has to be provided):
 -in <File_In>
   name of file with the nucleotide sequence in FASTA format
   (more than one sequence is allowed)
   Default = `'
 -id <String>
   Accession or gi number of the nucleotide sequence
   (ignored, if the file name is provided)
   Default = `'

 *** Query sequence details:
 -b <Integer>
   Start address of sequence fragment to be processed
   Default = `1'
 -e <Integer>
   Stop address of sequence fragment to be processed (0 - to the end of the
   sequence)
   Default = `0'
 -c <Boolean>
   Is the sequence circular? (t/f) *** Under development
   Default = `false'

 *** Search parameters:
 -g <Integer>
   Genetic code to use (1-31)
   see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details
   Default = `1'
 -s <Integer>
   ORF start codon to use:
       0 = "ATG" only
       1 = "ATG" and alternative initiation codons
       2 = any sense codon
   Default = `1'
 -ml <Integer>
   Minimal length of the ORF (nt)
   Value less than 30 is automatically changed by 30.
   Default = `75'
 -n <Boolean>
   Ignore nested ORFs (completely placed within another)
   Default = `false'
 -strand <String>
   Output ORFs on specified strand only (both|plus|minus)
   Default = `both'

 *** Output options:
 -out <File_Out>
   Output file name
 -outfmt <Integer>
   Output options:
       0 = list of ORFs in FASTA format
       1 = CDS in FASTA format
       2 = Text ASN.1
       3 = Feature table
   Default = `0'
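The `-ml` and `-n` options are easy to misread, so here is a Python sketch of the behaviour the help text describes. ORFs are given as hypothetical (start, end) half-open intervals in nucleotides. This is an assumption about how the filters behave, not ORFfinder's source: it treats "at least `-ml` nt" as inclusive, applies the length filter before the nesting filter, and judges nesting by coordinates only, ignoring strand and frame.

```python
# Sketch of the -ml and -n filters as described in ORFfinder's help text.
# Assumptions: ORFs are hypothetical (start, end) half-open intervals in nt;
# nesting is judged by coordinates only (strand and frame are ignored).


def effective_min_length(ml: int) -> int:
    """-ml values below 30 are raised to 30, per the help text."""
    return max(ml, 30)


def filter_orfs(orfs: list[tuple[int, int]], ml: int = 75,
                ignore_nested: bool = False) -> list[tuple[int, int]]:
    min_len = effective_min_length(ml)
    # Keep ORFs at least min_len nt long (inclusive bound is an assumption).
    kept = [(s, e) for s, e in orfs if e - s >= min_len]
    if ignore_nested:
        # Drop any ORF lying completely within a different kept ORF.
        kept = [(s, e) for s, e in kept
                if not any(s2 <= s and e <= e2 and (s2, e2) != (s, e)
                           for s2, e2 in kept)]
    return kept


print(filter_orfs([(0, 90), (10, 40), (100, 120)], ml=20))  # [(0, 90), (10, 40)]
print(filter_orfs([(0, 90), (10, 40)], ml=30, ignore_nested=True))  # [(0, 90)]
```

Note that `ml=20` behaves like `ml=30` here, so the 20 nt interval is dropped while the 30 nt one survives.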

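Example command: accept any sense codon as a start (-s 2), keep ORFs of at least 100 nt (-ml 100), and write the result as a feature table (-outfmt 3) to test.out:

ORFfinder -in in.fasta -s 2 -ml 100 -out test.out -outfmt 3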
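If you drive ORFfinder from a Python pipeline, a thin wrapper like the one below is one option. It is a sketch that assumes the `ORFfinder` binary is on your PATH; `build_orffinder_cmd` and `run_orffinder` are hypothetical helper names, and only flags listed in the help text are used. Building the argument list separately from running it makes the command easy to check before anything is executed.

```python
# Sketch: call ORFfinder from Python with the options shown in the help text.
# Assumes the ORFfinder binary is on PATH; helper names are hypothetical.
import shutil
import subprocess


def build_orffinder_cmd(fasta: str, out: str, start_codon: int = 1,
                        min_len: int = 75, outfmt: int = 0) -> list[str]:
    """Build the argument list; defaults follow the help text (-s 1, -ml 75, -outfmt 0)."""
    if start_codon not in (0, 1, 2):
        raise ValueError("-s must be 0, 1 or 2")
    if outfmt not in (0, 1, 2, 3):
        raise ValueError("-outfmt must be 0, 1, 2 or 3")
    return ["ORFfinder", "-in", fasta, "-s", str(start_codon),
            "-ml", str(min_len), "-out", out, "-outfmt", str(outfmt)]


def run_orffinder(**kwargs) -> None:
    """Run ORFfinder; raises CalledProcessError if it exits non-zero."""
    if shutil.which("ORFfinder") is None:
        raise FileNotFoundError("ORFfinder not found on PATH")
    subprocess.run(build_orffinder_cmd(**kwargs), check=True)


if __name__ == "__main__":
    # Same settings as the example command: only prints, does not run ORFfinder.
    cmd = build_orffinder_cmd("in.fasta", "test.out", start_codon=2,
                              min_len=100, outfmt=3)
    print(" ".join(cmd))
```

Calling `run_orffinder(fasta="in.fasta", out="test.out", start_codon=2, min_len=100, outfmt=3)` would then execute the same command.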

秒客网
