I am trying to figure out how to identify duplicate elements in a list of lines that has been split into four elements per line. I then need to keep the line with the original element and remove all lines containing the duplicate element.
For example:
123, jon, doe, $50
123, bob, smith, $25
456, jane, jones, $60
The desired output should be:
warning! duplicate: 123
and the list should then be read like this:
123, jon, doe, $50
456, jane, jones, $60
The list is very long, and so far I have tried looping through, but all I can seem to do is print out the zeroth element. I don't know how to identify and remove the lines containing a duplicate element from the list.
My guess is that the code should come before the last line, so that after the original list has been cleared of duplicates, what remains will be appended. If someone can help me with this I would appreciate it. This is my first question and I have tried my best to abide by all stated policies. I'm using Python 3. Thank you.
class BankAccount:
    def __init__(self, account_num, first_name, last_name, decimal_val):
        self.account_num = account_num
        self.first_name = first_name
        self.last_name = last_name
        self.decimal_val = float(decimal_val)

    def __str__(self):
        return (self.account_num+", "+ self.last_name+", "+self.first_name+", "+str(self.decimal_val))

    def __eq__(self, other):
        if self.account_num == other.account_num:
            print("Warning! Account number already exists:"+self.account_num)

from BankAccount import *

total, count, average = 0, 0, 0
customer_money = [] # for a different part that is working
with open("accounts.csv", "r") as file: #original file
    contents = file.readlines()
customers = []
for i in range(1,len(contents)):
    line = contents[i].split(",") #splits each line into four elements
    customers.append(BankAccount(line[0], line[1], line[2], line[3]))
2 Answers
#1
I'd use a dictionary here, since you want the ID number to be unique and need to detect when it is repeated.
Also, you don't need to build a list with readlines() or use range to iterate over that list; you can loop over the file object directly. So something like:
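customers = {}
with open("accounts.csv", "r") as file:  # original file
    next(file, None)  # skip the first line, as the range(1, ...) loop in the question does
    for row in file:
        line = row.strip().split(",")  # four elements per line
        if line[0] not in customers:
            # first time this account number is seen: keep it
            customers[line[0]] = BankAccount(line[0], line[1], line[2], line[3])
        else:
            print("Duplicate!", line)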
You can then use customers.values() if you just need the BankAccount objects, or list(customers.values()) if you need an actual list.
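To try the idea without the file or the class, here is a minimal sketch that runs on the sample rows from the question. The header row, the dedupe_rows name, and the use of io.StringIO in place of accounts.csv are assumptions for illustration only, not part of your code.

```python
import io

# Stand-in for accounts.csv; the header row is an assumption.
SAMPLE = """account_num, first_name, last_name, balance
123, jon, doe, $50
123, bob, smith, $25
456, jane, jones, $60
"""


def dedupe_rows(lines):
    """Keep the first row seen for each account number (hypothetical helper)."""
    kept = {}        # account number -> fields; dicts preserve insertion order
    duplicates = []  # account numbers that showed up again
    for raw in lines:
        fields = [part.strip() for part in raw.split(",")]
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        key = fields[0]
        if key in kept:
            print("warning! duplicate:", key)
            duplicates.append(key)
        else:
            kept[key] = fields
    return list(kept.values()), duplicates


source = io.StringIO(SAMPLE)
next(source, None)  # skip the assumed header row
rows, duplicates = dedupe_rows(source)
for row in rows:
    print(", ".join(row))
```

The first row for each account number wins because a key is only written when it is not already in the dictionary, which matches the "keep the original, drop the later ones" behaviour asked for.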
#2
Your code is almost there. First, your __eq__ method needs to be a bit different: don't try to print anything there, just indicate whether or not the two objects should be considered duplicates. That looks like this:
def __eq__(self, other):
    # Two accounts are duplicates when their account numbers match.
    return self.account_num == other.account_num
Then you can take advantage of the in operator to filter your duplicates. Here's an example:
one = BankAccount(123, 'John', 'Doe', 39.5)
customers = [one]
two = BankAccount(123, 'Fred', 'Smith', 96.2)
assert two in customers  # passes: same account number, so they compare equal
The last step is to add a check to your for loop before adding a new customer to your list:
customers = []
for i in range(1, len(contents)):
    line = contents[i].split(",")  # splits each line into four elements
    account = BankAccount(line[0], line[1], line[2], line[3])
    if account in customers:  # uses BankAccount.__eq__
        print("Duplicate account: {}".format(account.account_num))
    else:
        customers.append(account)
Note that there are plenty of other ways to achieve your goal, and some of them are probably more efficient, but I wanted to show you a solution that was very close to what you already had.
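On the efficiency point: the in test on a list compares the new account against every stored one, so for a very long file the work grows roughly with the square of the number of lines. One possible alternative is to keep a set of the account numbers seen so far next to the list. This is only a sketch: Account is a hypothetical namedtuple stand-in for your BankAccount class, unique_accounts is an invented name, and stripping the "$" before calling float is an assumption based on the sample data.

```python
from collections import namedtuple

# Hypothetical stand-in for the BankAccount class in the question.
Account = namedtuple("Account", "account_num first_name last_name decimal_val")


def unique_accounts(lines):
    """Return accounts in file order, keeping only the first per account number."""
    seen = set()    # account numbers already stored; membership test is O(1) on average
    customers = []
    for line in lines:
        fields = [part.strip() for part in line.split(",")]
        account = Account(fields[0], fields[1], fields[2],
                          float(fields[3].lstrip("$")))  # assumes a "$50"-style amount
        if account.account_num in seen:
            print("Duplicate account: {}".format(account.account_num))
        else:
            seen.add(account.account_num)
            customers.append(account)
    return customers


contents = ["123, jon, doe, $50", "123, bob, smith, $25", "456, jane, jones, $60"]
customers = unique_accounts(contents)
```

The list still holds the accounts in their original order; the set is only there to answer "have I seen this number before?" quickly.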
One more note: your __str__ method doesn't quite work either. It will raise a TypeError if account_num is an integer, as in the example above, so you need to change self.account_num to str(self.account_num). Once you do that you can change the "Duplicate account" message above to print("Duplicate account: {}".format(account)) so you get more information.
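A related detail, in case you later want to put accounts in a set or use them as dictionary keys: in Python 3, a class that defines __eq__ without also defining __hash__ becomes unhashable. Below is a sketch of a version that handles both and also copes with being compared to something that is not a BankAccount. It follows the fields from the question, but the __hash__ method and the isinstance check are additions for illustration, not something your code needs in order to work with a list.

```python
class BankAccount:
    def __init__(self, account_num, first_name, last_name, decimal_val):
        self.account_num = account_num
        self.first_name = first_name
        self.last_name = last_name
        self.decimal_val = float(decimal_val)

    def __str__(self):
        # format() converts every field to text, so an integer account_num is fine
        return "{}, {}, {}, {}".format(
            self.account_num, self.last_name, self.first_name, self.decimal_val)

    def __eq__(self, other):
        if not isinstance(other, BankAccount):
            return NotImplemented  # let Python fall back to its default comparison
        return self.account_num == other.account_num

    def __hash__(self):
        # Must agree with __eq__: equal accounts need equal hashes.
        return hash(self.account_num)


one = BankAccount(123, "John", "Doe", 39.5)
two = BankAccount(123, "Fred", "Smith", 96.2)
unique = {one, two}  # the second account is treated as a duplicate of the first
print(len(unique))
print(one)
```

Hashing on account_num is only safe if the account number is not changed after the object has been placed in a set or dictionary.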