Making a table from FASTA sequences, Python

I have around 500 protein sequences in FASTA format that I got from a blastp search. From those sequences, I need the protein name, organism, UniProt ID and, if possible, the protein family, so that I can build a table with that information.

Is there any way I can do this using Python? Is there some function that communicates with UniProt? How can I parse the information from the FASTA header?

1 solution

#1

You should take a look at Biopython, which has a FASTA parser. After parsing, you can use a pandas DataFrame to build a table. Without a snippet of example data it is difficult to provide a more thorough answer, but it should be doable with about 5 lines of code :)

from Bio import SeqIO

# Parse every record in the FASTA file and print the resulting list
with open("example.fasta") as handle:
    print(list(SeqIO.parse(handle, "fasta")))
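Since no example header was posted, here is a rough sketch of the header-parsing step, assuming the headers follow the UniProt style (`sp|ACCESSION|ENTRY_NAME Protein name OS=Organism OX=...`). The function name `parse_uniprot_header`, the regular expression and the sample header are illustrative assumptions, not part of Biopython; headers from other databases will simply not match.

```python
import re

# Assumption: UniProt-style header, e.g.
#   sp|P12345|AATM_RABIT Protein name OS=Organism name OX=9986 GN=GOT2 PE=1 SV=2
HEADER_RE = re.compile(
    r"^(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)(?=\s+[A-Z]{2}=|$)"
)


def parse_uniprot_header(description):
    """Return a dict of header fields, or None if the header does not match.

    Hypothetical helper for illustration; it only handles the assumed format.
    """
    match = HEADER_RE.match(description.lstrip(">"))
    if match is None:
        return None
    return match.groupdict()


# Made-up usage example with a single header line
example = (
    "sp|P12345|AATM_RABIT Aspartate aminotransferase, mitochondrial "
    "OS=Oryctolagus cuniculus OX=9986 GN=GOT2 PE=1 SV=2"
)
fields = parse_uniprot_header(example)
print(fields["accession"], fields["protein_name"], fields["organism"], sep=" | ")
```

Each record's `description` from `SeqIO.parse` could be passed to this function, and the resulting list of dicts handed to `pandas.DataFrame` to get the table. The protein family is not part of the FASTA header, so it would need a separate lookup that this sketch does not attempt.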
