I have a list of taxids that looks like this:
1204725
2162
1300163
420247
For each of the taxids above, I want a file that lists the taxonomic IDs at each rank, in this order:
kingdom_id phylum_id class_id order_id family_id genus_id species_id
I am using the ete3 package. Its ete-ncbiquery tool reports the lineage for the IDs above. I run it on my Linux laptop with the command below:
ete3 ncbiquery --search 1204725 2162 13000163 420247 --info
The result looks like this:
# Taxid Sci.Name Rank Named Lineage Taxid Lineage
2162 Methanobacterium formicicum species root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum 1,131567,2157,28890,183925,2158,2159,2160,2162
1204725 Methanobacterium formicicum DSM 3637 no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobacterium,Methanobacterium formicicum,Methanobacterium formicicum DSM 3637 1,131567,2157,28890,183925,2158,2159,2160,2162,1204725
420247 Methanobrevibacter smithii ATCC 35061 no rank root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,Methanobacteriales,Methanobacteriaceae,Methanobrevibacter,Methanobrevibacter smithii,Methanobrevibacter smithii ATCC 35061 1,131567,2157,28890,183925,2158,2159,2172,2173,420247
I cannot tell which of the IDs in the Taxid Lineage column correspond to the ranks I am looking for, if any do.
3 solutions
#1
6
The following code:
import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)
    lineage2ranks = ncbi.get_rank(lineage)
    ranks2lineage = dict((rank, taxid) for (taxid, rank) in lineage2ranks.items())
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

def main(taxids, desired_ranks, path):
    with open(path, 'w') as csvfile:
        fieldnames = ['{}_id'.format(rank) for rank in desired_ranks]
        writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=fieldnames)
        writer.writeheader()
        for taxid in taxids:
            writer.writerow(get_desired_ranks(taxid, desired_ranks))

if __name__ == '__main__':
    taxids = [1204725, 2162, 1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    path = 'taxids.csv'
    main(taxids, desired_ranks, path)
Produces a file that looks like this:
kingdom_id phylum_id class_id order_id family_id genus_id species_id
<not present> 28890 183925 2158 2159 2160 2162
<not present> 28890 183925 2158 2159 2160 2162
<not present> 28890 183925 2158 2159 2160 2162
<not present> 28890 183925 2158 2159 2172 2173
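The kingdom_id column is empty in every row. A plausible reason, offered here as an assumption rather than something the output confirms, is that NCBI ranked Archaea (taxid 2157) as 'superkingdom' rather than 'kingdom', so no node in the lineage carries the rank 'kingdom'. The sketch below repeats the rank lookup on hard-coded data so that it runs without the ete3 database. The ranks for 1, 131567 and 2157 are assumed, the others are read off the output file, and pick_ranks is a hypothetical helper name.

```python
# Lineage of taxid 2162, copied from the ncbiquery output in the question.
LINEAGE = [1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162]

# Stand-in for ncbi.get_rank(LINEAGE). Phylum through species match the
# output file above; the ranks of 1, 131567 and 2157 are ASSUMED.
RANKS = {
    1: 'no rank', 131567: 'no rank', 2157: 'superkingdom',
    28890: 'phylum', 183925: 'class', 2158: 'order',
    2159: 'family', 2160: 'genus', 2162: 'species',
}

def pick_ranks(lineage, ranks, desired_ranks, missing='<not present>'):
    """Return one taxid (or `missing`) per desired rank (hypothetical helper)."""
    # Invert taxid -> rank, skipping unranked nodes so they cannot collide.
    rank_to_taxid = {ranks[t]: t for t in lineage
                     if ranks.get(t, 'no rank') != 'no rank'}
    return [rank_to_taxid.get(rank, missing) for rank in desired_ranks]

wanted = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
print(pick_ranks(LINEAGE, RANKS, wanted))

# Asking for the assumed 'superkingdom' rank fills the first column instead.
print(pick_ranks(LINEAGE, RANKS, ['superkingdom'] + wanted[1:]))
```

With a real database, printing ncbi.get_rank(ncbi.get_lineage(2162)) shows which rank names are actually present, which is worth checking before settling on the column names.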
#2
0
Take the Taxid Lineage numbers from your results and pass them to ete3's get_rank method. For example:
from ete3 import NCBITaxa

ncbi = NCBITaxa()
print(ncbi.get_rank([9606, 9443]))
# {9443: 'order', 9606: 'species'}
The resulting dictionary should contain the rank of every ID you pass in, including any intermediate "no rank" IDs, which you will probably want to filter out.
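If you start from the text output of ete3 ncbiquery rather than from the Python API, the lineage columns have to be split first. Here is a minimal sketch, assuming the Named Lineage and Taxid Lineage fields have already been separated from the rest of the row (the pasted output above lost its column separators, so that step is left out). pair_lineage is a hypothetical helper name, and splitting on commas assumes that no scientific name in the lineage contains a comma.

```python
def pair_lineage(named_lineage, taxid_lineage):
    """Pair each taxid in a lineage with its name (hypothetical helper)."""
    names = named_lineage.split(',')
    taxids = [int(t) for t in taxid_lineage.split(',')]
    # Both columns describe the same path from root to taxon.
    if len(names) != len(taxids):
        raise ValueError('lineage columns differ in length')
    return list(zip(taxids, names))

# The two lineage fields for taxid 2162, copied from the question.
named = ('root,cellular organisms,Archaea,Euryarchaeota,Methanobacteria,'
         'Methanobacteriales,Methanobacteriaceae,Methanobacterium,'
         'Methanobacterium formicicum')
ids = '1,131567,2157,28890,183925,2158,2159,2160,2162'

pairs = pair_lineage(named, ids)
for taxid, name in pairs:
    print(taxid, name)
```

The integer IDs, [taxid for taxid, _ in pairs], are the list to hand to get_rank.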
#3
0
You can also use the R package taxonomizr. The package takes a bit of time to download the necessary files, but after that it's quite fast and easy.
library("taxonomizr")
getNamesAndNodes()
taxaNodes <- read.nodes('nodes.dmp')
taxaNames <- read.names('names.dmp')
taxaID <- c("1204725", "2162", "1300163", "420247")
getNamesAndNodes downloads the names.dmp and nodes.dmp files from NCBI.