I have 2 files containing multiple strings, fileA.txt and fileB.txt.
fileA.txt:
hello hi
how
fileB.txt:
hello how are you
I am trying to write a program that will see if a string exists in both files. If it does, print the string or multiple strings.
The result should print "hello" and "how", as they exist in both files.
I am having trouble executing this as I have only been able to work with strings that I define, rather than unknown strings in the file:
with open("fileA.txt", 'r') as fileA, open("fileB.txt") as fileB:
    for stringsA in fileA:
        for stringsB in fileB:
            if stringsA in stringsB:
                print("true")
Any assistance would be appreciated.
3 Solutions
#1
5
Files iterate by lines, not words. You'll have to split the words:
>>> with open('fileA.txt') as a, open('fileB.txt') as b:
...     a_words = set(a.read().split())
...     b_words = set(b.read().split())
...     print('\n'.join(a_words & b_words))
...
hello
how
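To see why the loop in the question never prints anything useful, here is a small sketch. It assumes io.StringIO objects as stand-ins for the two open files (they iterate the same way a text file does), filled with the sample contents from the question.

```python
import io

# io.StringIO stands in for the open files, using the sample contents
# from the question (an assumption made so the example is self-contained).
file_a = io.StringIO("hello hi\nhow\n")
file_b = io.StringIO("hello how are you\n")

# Iterating a file yields whole lines, trailing newline included.
lines_a = list(file_a)
print(lines_a)  # ['hello hi\n', 'how\n']

# A file object is a one-shot iterator: after one full pass it yields
# nothing more, so a nested loop over it only does work the first time.
first_pass = list(file_b)
second_pass = list(file_b)
print(first_pass)   # ['hello how are you\n']
print(second_pass)  # []

# `in` between two strings is a substring test, so a whole line
# (newline and all) rarely matches another line.
line_match = "how\n" in "hello how are you\n"
word_match = "how" in "hello how are you\n".split()
print(line_match)  # False
print(word_match)  # True
```

Splitting each file into words once and comparing the resulting sets avoids both problems.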
#2
1
A simple solution would be to construct a list of distinct words for each file and check for common words.
Python's built-in set type is very helpful here: https://docs.python.org/3.6/library/stdtypes.html#set
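A minimal sketch of that idea follows. The helper name common_words is made up for illustration, and the demo writes the question's sample contents to a temporary directory so that it runs on its own; with real files you would pass their paths directly.

```python
import tempfile
from pathlib import Path


def common_words(path_a, path_b):
    """Return the words that appear in both files, sorted alphabetically."""
    # split() with no arguments splits on any whitespace, including
    # newlines, so each file becomes a flat collection of words.
    words_a = set(Path(path_a).read_text().split())
    words_b = set(Path(path_b).read_text().split())
    # & is set intersection: keep only the words present in both sets.
    return sorted(words_a & words_b)


# Demo using the sample contents from the question.
with tempfile.TemporaryDirectory() as tmp:
    file_a = Path(tmp, "fileA.txt")
    file_b = Path(tmp, "fileB.txt")
    file_a.write_text("hello hi\nhow\n")
    file_b.write_text("hello how are you\n")
    result = common_words(file_a, file_b)

print(result)  # ['hello', 'how']
```

Sets are unordered, so the result is sorted here only to make the output predictable. The comparison is case-sensitive and treats punctuation as part of a word.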
#3
1
You first want to get a list of all unique strings in fileA. Then get a similar unique list for fileB. Then compare the two. Using sets makes the comparison easier.
def get_strings_from_file(f):
    # split() with no arguments already drops surrounding whitespace and empty strings
    return set(f.read().split())

def main():
    with open("fileA.txt", 'r') as fileA, open("fileB.txt") as fileB:
        stringsA = get_strings_from_file(fileA)
        stringsB = get_strings_from_file(fileB)
    # strings present in both files
    return stringsA.intersection(stringsB)

if __name__ == "__main__":
    print(main())