Making a table from FASTA sequences, Python

Time: 2022-10-14 00:13:47

I have around 500 protein sequences in FASTA format that I got from a blastp search. From those sequences, I need the protein name, organism, UniProt ID and, if possible, the protein family, so that I can build a table with that information.

Is there any way I can do this using Python? Is there some function that communicates with UniProt? How can I parse the information from the FASTA header?

1 solution

#1


4  

You should take a look at Biopython, which has a FASTA parser. After parsing, you can use a pandas DataFrame to build a table. Without a snippet of example data it is difficult to provide a more thorough answer, but it should be doable with about 5 lines of code :)

from Bio import SeqIO

# Parse every record in the FASTA file and print the resulting list
with open("example.fasta") as handle:
    print(list(SeqIO.parse(handle, "fasta")))
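
Since the question gives no sample data, here is a rough sketch of the header-parsing step only. It assumes the headers follow the UniProt FASTA layout (`>sp|ACCESSION|ENTRY_NAME Protein name OS=Organism OX=... GN=...`); headers from an NCBI blastp run may look different and would need another pattern. The names `HEADER_RE`, `parse_uniprot_header` and `headers_to_rows` are made up for illustration and are not part of Biopython, and the example uses only the standard library.

```python
import re

# Assumption: UniProt-style header, e.g.
# >sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2
HEADER_RE = re.compile(
    r"^>?(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)(?:\s+[A-Z]{2}=|$)"
)


def parse_uniprot_header(header):
    """Return a dict of header fields, or None if the header does not match."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return match.groupdict()


def headers_to_rows(lines):
    """Collect one dict per recognised header line; sequence lines are skipped."""
    rows = []
    for line in lines:
        if line.startswith(">"):
            row = parse_uniprot_header(line)
            if row is not None:
                rows.append(row)
    return rows


# Hypothetical two-line FASTA excerpt used only to exercise the functions
example = [
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2",
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF",
]
rows = headers_to_rows(example)
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` to get the table, and the same function can be applied to `record.description` from `SeqIO.parse` (the optional leading `>` in the pattern allows for that). The protein family is not part of the FASTA header, so it would have to be looked up separately in UniProt by accession.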
