Python - Print strings that exist in both of 2 files

Date: 2021-10-26 11:05:30

I have 2 files containing multiple strings, fileA.txt and fileB.txt.

fileA.txt:

hello hi 
how

fileB.txt:

hello how are you

I am trying to write a program that will see if a string exists in both files. If it does, print the string or multiple strings.

The program should print "hello" and "how", as they exist in both files.

I am having trouble executing this as I have only been able to work with strings that I define, rather than unknown strings in the file:

with open("fileA.txt", 'r') as fileA, open ("fileB.txt") as fileB:
    for stringsA in fileA:

        for stringsB in fileB:

            if stringsA in stringsB:
                print("true")

Any assistance would be appreciated.

3 Solutions

#1


5  

Files iterate by lines, not words. You'll have to split the words:

>>> with open('fileA.txt') as a, open('fileB.txt') as b:
...     a_words = set(a.read().split())
...     b_words = set(b.read().split())
...     print('\n'.join(a_words & b_words))
...     
hello
how
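
To see why the nested loops in the question never find a match, it may help to look at what iterating a file actually yields. The sketch below is only an illustration: `io.StringIO` stands in for an opened `fileA.txt` (an assumption, so the example runs without any files on disk). It shows that iteration produces whole lines, that a file is exhausted after one pass (so an inner loop over `fileB` would only run during the first outer iteration), and that `read().split()` gives the individual words.

```python
import io

# io.StringIO stands in for an opened text file (assumption for this sketch).
file_a = io.StringIO("hello hi \nhow\n")

lines = list(file_a)            # iterating a file yields whole lines
print(lines)                    # → ['hello hi \n', 'how\n']

leftover = list(file_a)         # a second pass finds the file exhausted
print(leftover)                 # → []

file_a.seek(0)                  # rewind before reading again
words = file_a.read().split()   # split() breaks on any run of whitespace
print(words)                    # → ['hello', 'hi', 'how']
```

Note also that `stringsA in stringsB` in the question is a substring test on whole lines, which is a different check from "this word appears in the other file".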

#2


1  

A simple solution would be to construct a list of distinct words for each file and check for common words.

Python's set data type would be very helpful in this case: https://docs.python.org/3.6/library/stdtypes.html#set
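
As a rough illustration of that idea, here is a minimal sketch that works on in-memory strings instead of files. The variable names and the repeated "hello" in `text_a` are assumptions added for the example; the rest mirrors the sample data from the question.

```python
# Sample contents from the question; "hello" is repeated on purpose
# to show that a set keeps each word only once.
text_a = "hello hi \nhow\nhello"
text_b = "hello how are you"

words_a = set(text_a.split())   # three distinct words: hello, hi, how
words_b = set(text_b.split())

common = words_a & words_b                       # operator form
assert common == words_a.intersection(words_b)   # method form, same result

print(sorted(common))  # → ['hello', 'how']
```

`sorted()` is used for printing only because a set has no defined order.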

#3


1  

You first want to get a collection of all unique strings in fileA, then a similar collection for fileB, and then compare the two. Using sets makes the comparison easier.

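def get_strings_from_file(f):
    # split() with no arguments splits on any whitespace and never returns
    # empty strings, so no extra strip() or filtering is needed.
    return set(f.read().split())

def main():
    with open("fileA.txt") as fileA, open("fileB.txt") as fileB:
        stringsA = get_strings_from_file(fileA)
        stringsB = get_strings_from_file(fileB)
    # Words that appear in both files.
    return stringsA.intersection(stringsB)

print("\n".join(main()))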
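
One caveat with the set-based answers: sets have no guaranteed order, so the common words may print in any order. If the order of appearance in fileA matters, something like the following sketch could be used. The helper name `common_words` is hypothetical, and the demo writes the question's sample data to a temporary directory purely so the example is self-contained.

```python
import tempfile
from pathlib import Path

def common_words(path_a, path_b):
    """Return words found in both files, in order of first appearance in path_a."""
    words_b = set(Path(path_b).read_text().split())
    seen = set()
    result = []
    for word in Path(path_a).read_text().split():
        if word in words_b and word not in seen:
            seen.add(word)          # report each common word only once
            result.append(word)
    return result

# Demo with the sample data from the question, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp) / "fileA.txt"
    file_b = Path(tmp) / "fileB.txt"
    file_a.write_text("hello hi \nhow\n")
    file_b.write_text("hello how are you\n")
    demo_result = common_words(file_a, file_b)

print("\n".join(demo_result))
# prints:
# hello
# how
```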
