Iterate over two files, compare matching strings, and merge the matching lines.

Date: 2022-03-29 20:20:27

I have two files, each listing organisms in two space-separated columns. The first file contains 'Family Genus' and the second contains 'Genus species'. Both files include the Genus of every listed species. I want to merge the two lists on the Genus column so that the Family name is added to each 'Genus species' entry; the output should therefore contain 'Family Genus species'. Since the names are separated by a space, I split each line on that space to get the columns. This is the code I have so far:

with open('FAMILY_GENUS.TXT') as f1, open('GENUS_SPECIES.TXT') as f2:
    for line1 in f1:
        line1 = line1.strip()
        c1 = line1.split(' ')
        print(line1, end=' ')
        for line2 in f2:
            line2 = line2.strip()
            c2 = line2.split(' ')
            if line1[1] == line2[0]:
                print(line2[1], end=' ')
        print()

The resulting output is composed of only two lines, and not the entire record. What am I missing?

And also, how can I save it to a file instead of just printing on the screen?

2 Answers

#1 (score: 3)

This is an alternative solution.

genera = {}

# Build a genus -> family lookup from the 'Family Genus' file.
with open('fg') as f1:
    for line in f1:
        family, genus = line.split()
        genera[genus] = family

# Prefix every 'Genus species' line with the family of its genus.
with open('gs') as f2:
    for line in f2:
        genus, species = line.split()
        print(genera[genus], genus, species)
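
As for why the nested loops in the question fall short, two things look like the likely causes. First, a file object is an iterator, so the inner `for line2 in f2` consumes the whole file during the first pass of the outer loop and yields nothing afterwards. Second, `line1[1]` and `line2[0]` index single characters of the line strings; the split columns are in `c1` and `c2`. A minimal sketch of both effects, using an in-memory `io.StringIO` object and made-up sample names as stand-ins for the real files:

```python
import io

# Hypothetical stand-in for GENUS_SPECIES.TXT (sample data, not from the question).
f2 = io.StringIO("Homo sapiens\nPan troglodytes\n")

# The first loop over the file reads every line.
first_pass = [line.strip() for line in f2]

# A second loop over the same file object finds it already at end-of-file,
# so second_pass is an empty list.
second_pass = [line.strip() for line in f2]

# Indexing the string gives a character; indexing the split list gives a column.
line = "Hominidae Homo"
columns = line.split()
second_character = line[1]   # 'o', the second character of the line
second_column = columns[1]   # 'Homo', the genus column
```

Calling `f2.seek(0)` before each inner loop would make the nested version work, but reading one file into a dictionary first, as above, avoids rereading the second file for every line of the first.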

#2 (score: 0)

I would process the files first to build a mapping from each genus to its family and to the species it contains. Then I would use that mapping to match them up and print them out.

genuses = {}

# Map all genuses to a family
with open('FAMILY_GENUS.TXT') as f1:
    for line in f1:
        family, genus = line.strip().split()
        genuses.setdefault(genus, {})['family'] = family

# Map all species to a genus
with open('GENUS_SPECIES.TXT') as f2:
    for line in f2:
        genus, species = line.strip().split()
        genuses.setdefault(genus, {}).setdefault('species', []).append(species)

# Go through each genus and build one output string for
# each species it contains.
species_strings = []
for genus, d in genuses.items():
    family = d.get('family')
    species = d.get('species')
    if family and species:
        for specie in species:
            s = '{0} {1} {2}'.format(family, genus, specie)
            species_strings.append(s)

# Sort the strings to make the output pretty and print them out.
species_strings.sort()
for s in species_strings:
    print(s)
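
To save the result to a file instead of printing it, the output file can be opened in write mode and each merged line written to it. The sketch below is one way to do that, not the only one; the names `merge_family_species` and `merge_files` and the output path are hypothetical, and it assumes each line has exactly two whitespace-separated columns (other lines, and species whose genus has no known family, are skipped).

```python
def merge_family_species(family_genus_lines, genus_species_lines):
    """Yield 'Family Genus species' for each species whose genus has a family."""
    family_of = {}
    for line in family_genus_lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            family, genus = parts
            family_of[genus] = family

    for line in genus_species_lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] in family_of:
            genus, species = parts
            yield "{0} {1} {2}".format(family_of[genus], genus, species)


def merge_files(family_genus_path, genus_species_path, output_path):
    """Write the merged lines to output_path, one record per line."""
    with open(family_genus_path) as f1, \
         open(genus_species_path) as f2, \
         open(output_path, "w") as out:
        for merged in merge_family_species(f1, f2):
            out.write(merged + "\n")
```

With the file names from the question, a call would look like `merge_files('FAMILY_GENUS.TXT', 'GENUS_SPECIES.TXT', 'MERGED.TXT')`, where `MERGED.TXT` is an arbitrary output name. Passing `file=out` to `print` inside the `with` block is an equivalent alternative to `out.write`.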
