I have the following extract from an input file:
Query_7736 1624 SDLA**VY*EMQALRIKPSNVTFSILIKLYGRNKQVSKAIEVLEEMKR*GVQPGMIVYTC 1803
XP_002972017 833 MAEACELMRSLRSLRVSPDTVTFSTLIDGLCKCGQTDEACNVFDDMIAGGYVPNVVTYNV 894
XP_002972017 583 FEQASALFEEMVAKNLQPDVMTFGALIDGLCKAGQVEAARDILDLMGNLGVPPNVVTYNA 642
XP_002972017 653 IEEACQFLEEMVSSGCVPDSITYGSLVYALCRASRTDDALQLVSELKSFGWDPDTVTYNI 712
XP_002972017 905 MERAHAMIESMVDKGVTPDVITYSVLVDAFCKASHVDEALELLHGMASRGCTPNVVTFNS 964
XP_002972017 940 VDEALELLHGMASRGCTPNVVTFNSIIDGLCKSDQSGEAFQMFDDMTKHGLAPDKITYCT 1000
XP_002970953 380 RELASSVYKTMTSHGCVPDVVTLSTMIDGLSKAGRIGAAVRIFKSMEARGLAPNEVVYSA 447
XP_002970953 458 MDCALEMLAQMKKAFCTPDTITYNILIDGLCKSGDVEAARAFFDEMLEAGCKPDVYTYNI 517
XP_002971975 632 LEEARKILERLERENCKADMFAYRVMMDGLCRTGRMSAALELLEAIKQSGTPPRHDIYVA 692
XP_002971975 527 VDDAERLLEEMVASDCSPDVYTYTSLVDGFCKVGRMVEARRVLKRMAKRGCQPNVVTYTA 586
XP_002971975 387 VRDAQEVFKRMIVRGIEPNVVTYNSLIHGFCMTNGVDSALLLMEEMTATGCLPDIITYNT 446
XP_002971975 317 LDEACKLFEKMRENSCEPDVVTFTALMDGLCKGDRLQEAQQVLETMEDRNCTPNVITYSS 376
XP_002961692 489 VRDALGLLEFMIESGLSPDVITFNSVLDGLCKEQRILDAHNVFKRALERGCRPNVVTYST 548
XP_002961692 873 SEQALELLRAMVADGGSPDACNYMTVMDGLFKAGSPEVAAKLLQEMRSRGHSPDLRTYTI 932
I have the following script. It finds each line that starts with 'Query_', locates every '*' in that line's sequence, and prints the characters in those columns for each of the lines below it.
lines = [line.rstrip() for line in open('infile.txt')]
for line in lines:
    data = line.split()
    sequence = data[2]
    if data[0].startswith("Query_"):
        # Remember the column positions of every '*' in the query sequence
        star_indicies = [i for i, c in enumerate(sequence) if c == '*']
    else:
        # Print the characters found under each '*' on this line
        print([sequence[star_index] for star_index in star_indicies])
The output is generated as follows:
['C', 'E', 'R', 'G']
['S', 'A', 'E', 'L']
['C', 'Q', 'E', 'F']
['H', 'A', 'E', 'R']
['L', 'E', 'H', 'H']
['S', 'S', 'K', 'R']
['L', 'E', 'A', 'A']
['R', 'K', 'E', 'S']
['E', 'R', 'E', 'R']
['Q', 'E', 'K', 'T']
['C', 'K', 'E', 'R']
['L', 'G', 'E', 'R']
['L', 'E', 'R', 'R']
['V', 'E', 'D', 'D']
['L', 'G', 'E', 'A']
['C', 'Q', 'S', 'N']
['L', 'E', 'Q', 'A']
How do I print each column on its own line, like this:
C, S, C, H, L, S, L, R, E, Q, C, L, L, V, L, C, L
E, A, Q, A, E, S, E, K, R, E, K, G, E, E, G, Q, E
R, E, E, E, H, K, A, E, E, K, E, E, R, D, E, S, Q
G, L, F, R, H, R, A, S, R, T, R, R, R, D, A, N, A
I can convert each row to a string easily, but I have not managed to print it in the desired layout. This is what I was using:
print ("\n".join(map(str,list(sequence[star_index] for star_index in star_indicies))),
3 Solutions
#1
1
If the file can contain more than one "Query_" line, in any order:
lines = [line.rstrip().split() for line in open('infile.txt')]
# Load the indexes in one list, the sequences in another
# As shown in http://*.com/a/21023591/1688590
indexes, sequences = [], []
for line in lines:
    target = indexes if line[0].startswith("Query_") else sequences
    target.append(line[2])
for pos, char in enumerate(zip(*indexes)):
    # Check if any of the "Query_" sequences has a * in that position
    if "*" in char:
        # Output every char in that position in the other sequences
        print(", ".join([_[pos] for _ in sequences]))
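To try this approach without an input file, here is a minimal sketch that wraps the same logic in a function and runs it on a tiny made-up sample. The function name columns_under_stars and the sample rows are illustrative assumptions, not part of the original answer, and the sketch assumes every sequence is at least as long as the query.

```python
def columns_under_stars(lines):
    """Return one comma-joined string per '*' column found in any Query_ row."""
    queries, hits = [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        target = queries if fields[0].startswith("Query_") else hits
        target.append(fields[2])

    columns = []
    # zip(*queries) walks the query sequences position by position
    for pos, chars in enumerate(zip(*queries)):
        if "*" in chars:
            columns.append(", ".join(seq[pos] for seq in hits))
    return columns


# Hypothetical sample: '*' sits at positions 2 and 4 of the query
sample = [
    "Query_1 1 AB*D*F 6",
    "hit_a 10 GHIJKL 15",
    "hit_b 20 MNOPQR 25",
]

for column in columns_under_stars(sample):
    print(column)
```

Returning the strings instead of printing them directly makes the result easy to check or write to a file later.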
#2
0
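Collect the rows in a list instead of printing them, then join each column of that list with a comma:
# result holds one row per non-query line, e.g. built in the else branch with
# result.append([sequence[star_index] for star_index in star_indicies])
for i in range(len(result[0])):
    print(", ".join([l[i] for l in result]))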
Here's what you get
In [2]: result = [['C', 'E', 'R', 'G'],
['S', 'A', 'E', 'L'],
['C', 'Q', 'E', 'F'],
['H', 'A', 'E', 'R'],
['L', 'E', 'H', 'H'],
['S', 'S', 'K', 'R'],
['L', 'E', 'A', 'A'],
['R', 'K', 'E', 'S'],
['E', 'R', 'E', 'R'],
['Q', 'E', 'K', 'T'],
['C', 'K', 'E', 'R'],
['L', 'G', 'E', 'R'],
['L', 'E', 'R', 'R'],
['V', 'E', 'D', 'D'],
['L', 'G', 'E', 'A'],
['C', 'Q', 'S', 'N'],
['L', 'E', 'Q', 'A']]
In [3]: for i in range(len(result[0])):
   ...:     print(", ".join([l[i] for l in result]))
   ...:
C, S, C, H, L, S, L, R, E, Q, C, L, L, V, L, C, L
E, A, Q, A, E, S, E, K, R, E, K, G, E, E, G, Q, E
R, E, E, E, H, K, A, E, E, K, E, E, R, D, E, S, Q
G, L, F, R, H, R, A, S, R, T, R, R, R, D, A, N, A
Or, as others have suggested:
In [9]: for l in zip(*result):
   ...:     print(", ".join(l))
   ...:
C, S, C, H, L, S, L, R, E, Q, C, L, L, V, L, C, L
E, A, Q, A, E, S, E, K, R, E, K, G, E, E, G, Q, E
R, E, E, E, H, K, A, E, E, K, E, E, R, D, E, S, Q
G, L, F, R, H, R, A, S, R, T, R, R, R, D, A, N, A
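For completeness, this is roughly how the list of rows could be built from the question's loop before transposing it with zip(). It is only a sketch: the helper name star_rows and the short sample rows are made up for illustration, and it keeps the question's assumption that each "Query_" line comes before the lines it applies to.

```python
def star_rows(lines):
    """Collect, for each non-query line, the characters under the latest query's '*' columns."""
    rows = []
    star_positions = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        sequence = fields[2]
        if fields[0].startswith("Query_"):
            star_positions = [i for i, c in enumerate(sequence) if c == "*"]
        else:
            rows.append([sequence[i] for i in star_positions])
    return rows


# Hypothetical sample: '*' sits at positions 2 and 4 of the query
sample = [
    "Query_1 1 AB*D*F 6",
    "hit_a 10 GHIJKL 15",
    "hit_b 20 MNOPQR 25",
]

rows = star_rows(sample)
# zip(*rows) turns the list of rows into a sequence of columns
for column in zip(*rows):
    print(", ".join(column))
```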
#3
0
Given the following
In [7]: list(d2)
Out[7]:
[['C', 'E', 'R', 'G'],
['S', 'A', 'E', 'L'],
['C', 'Q', 'E', 'F'],
['H', 'A', 'E', 'R'],
['L', 'E', 'H', 'H'],
['S', 'S', 'K', 'R'],
['L', 'E', 'A', 'A'],
['R', 'K', 'E', 'S'],
['E', 'R', 'E', 'R'],
['Q', 'E', 'K', 'T'],
['C', 'K', 'E', 'R'],
['L', 'G', 'E', 'R'],
['L', 'E', 'R', 'R'],
['V', 'E', 'D', 'D'],
['L', 'G', 'E', 'A'],
['C', 'Q', 'S', 'N'],
['L', 'E', 'Q', 'A']]
You can use zip() with the unpacking operator (*) like so:
In [8]: '\n'.join([' '.join(l) for l in zip(*d2)])
Out[8]: 'C S C H L S L R E Q C L L V L C L\nE A Q A E S E K R E K G E E G Q E\nR E E E H K A E E K E E R D E S Q\nG L F R H R A S R T R R R D A N A'
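One thing to keep in mind: zip() stops at the shortest row, so if some rows had fewer characters (for example because a sequence was shorter than the query), the trailing columns would be dropped without any error. If that can happen in your data, itertools.zip_longest with a fill value may be a safer choice. A small sketch with made-up ragged rows:

```python
from itertools import zip_longest

# Hypothetical rows; the second one is deliberately one element short
rows = [
    ["C", "E", "R", "G"],
    ["S", "A", "E"],
    ["C", "Q", "E", "F"],
]

# zip() truncates to the shortest row, so the fourth column disappears
truncated = [" ".join(col) for col in zip(*rows)]

# zip_longest() keeps every column and pads the gaps with the fill value
padded = [" ".join(col) for col in zip_longest(*rows, fillvalue="-")]

print("\n".join(truncated))
print()
print("\n".join(padded))
```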