How to get unique values from a CSV file

I have this CSV file:

Cat, and, dog, bites
Yahoo, news, claims, a, cat, mated, with, a, dog, and, produced, viable, offspring
Cat, killer, likely, is, a, big, dog
Professional, free, advice, on, dog, training, puppy, training
Cat, and, kitten, training, and, behavior
Dog, &, Cat, provides, dog, training, in Eugene, Oregon
Dog, and, cat, is, a, slang, term, used, by, police, officers, for, a, male-female, relationship
Shop, for, your, show, dog, grooming, and, pet, supplies

I want to convert all the words to lowercase and create a list that contains all the unique items from the CSV file above. Do you have any idea how? Thanks in advance! So far, I have managed to convert all the words to lowercase:

unique_row_items = set([field.strip().lower() for field in row])

But I can't manage the other part (building the list of unique items):

import csv

def unique():
    # Read every row of the file into a list of lists of fields
    rows = list(csv.reader(open('example_1.csv', 'r'), delimiter=','))

    result = []

    for r in rows:
        key = r
        if key not in result:
            result.append(r)
    return result

But this does not give the results I want.
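
A minimal sketch of why: the same loop run on a hypothetical in-memory file (io.StringIO standing in for example_1.csv) in which the first row is repeated. Each r is a whole row, i.e. a list of fields, so only rows that are identical in every field are dropped, and the fields keep their leading spaces and original case.

```python
import csv
import io

# Hypothetical stand-in for example_1.csv, with the first row repeated
SAMPLE = "Cat, and, dog\nCat, and, dog\nDog, &, Cat\n"

rows = list(csv.reader(io.StringIO(SAMPLE), delimiter=','))
result = []
for r in rows:
    # Compares whole rows (lists of fields), not individual words
    if r not in result:
        result.append(r)

print(result)  # [['Cat', ' and', ' dog'], ['Dog', ' &', ' Cat']]
```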

2 answers

#1

If you can't figure out how to do everything at once, do it step by step.

So, let's write an explicit for statement over the rows:

import csv

result = []
# use `with` so the file gets closed
with open('example_1.csv', 'r') as f:
    # no need for `list` here
    rows = csv.reader(f, delimiter=',')
    for row in rows:
        # no need for `set([...])`, just `set(...)`
        unique_row_items = set(field.strip().lower() for field in row)
        for item in unique_row_items:
            if item not in result:
                result.append(item)
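
To try this loop without creating example_1.csv, here is a minimal sketch that feeds three of the question's rows through io.StringIO (an assumption standing in for the real file; csv.reader accepts any iterable of lines). Because each row is first turned into a set, the order of the resulting list is not meaningful, so it is printed sorted.

```python
import csv
import io

# Hypothetical stand-in for example_1.csv: three rows from the question
SAMPLE = """Cat, and, dog, bites
Cat, killer, likely, is, a, big, dog
Dog, &, Cat, provides, dog, training, in Eugene, Oregon
"""

result = []
rows = csv.reader(io.StringIO(SAMPLE), delimiter=',')
for row in rows:
    # Normalise every field: drop surrounding spaces, then lowercase
    unique_row_items = set(field.strip().lower() for field in row)
    for item in unique_row_items:
        # Linear scan of the whole list on every check
        if item not in result:
            result.append(item)

print(sorted(result))
```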

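But if you look at this, you're trying to use a list as a set. It'll be easier (and more efficient, since `item not in result` scans the whole list, while a set lookup takes constant time on average) if you just use a set as a set; then you don't need the if … in check: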

result = set()
with open('example_1.csv', 'r') as f:
    # no need for `list` here
    rows = csv.reader(f, delimiter=',')
    for row in rows:
        unique_row_items = set(field.strip().lower() for field in row)
        for item in unique_row_items:
            result.add(item)
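
To put a rough number on "more efficient", here is a hypothetical micro-benchmark on generated words (not the question's data) comparing the list-membership loop with the set version. Absolute timings depend on the machine, so no output is shown; the list version grows roughly quadratically with the number of words, the set version roughly linearly.

```python
import timeit

# Hypothetical workload: 2,000 distinct words, each seen twice
words = [f"word{i}" for i in range(2000)] * 2

def unique_with_list(items):
    result = []
    for item in items:
        if item not in result:  # scans the list on every check
            result.append(item)
    return result

def unique_with_set(items):
    result = set()
    for item in items:
        result.add(item)  # hash lookup, no scan
    return result

print("list:", timeit.timeit(lambda: unique_with_list(words), number=3))
print("set: ", timeit.timeit(lambda: unique_with_set(words), number=3))
```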

And now, adding each element from one set to another is just unioning the sets, so you can replace those last two lines with, e.g.:

result |= unique_row_items
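
A tiny sketch with two hand-written sets (hypothetical values) showing that the element-by-element add loop and `|=` leave the same result; `|=` is the in-place form of set union, equivalent to `result.update(...)`.

```python
# Words from one row, already stripped and lowercased
row_words = {"cat", "and", "dog", "bites"}

# 1. Element by element, as in the loop above
result_loop = {"dog", "big"}
for item in row_words:
    result_loop.add(item)

# 2. In-place union with |=
result_union = {"dog", "big"}
result_union |= row_words

print(result_loop == result_union)  # True
print(sorted(result_union))  # ['and', 'big', 'bites', 'cat', 'dog']
```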

And now, if you want to turn it all back into one big expression, you can:

with open('example_1.csv', 'r') as f:
    result = set.union(*(set(field.strip().lower() for field in row)
                         for row in csv.reader(f, delimiter=',')))
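
One caveat with set.union(*...): if the file has no rows, the call becomes set.union() with no arguments, which raises a TypeError. A minimal sketch that starts from an empty set instead, wrapped in a hypothetical helper unique_words and fed from io.StringIO rather than example_1.csv:

```python
import csv
import io

def unique_words(f):
    """Hypothetical helper: the lowercased, stripped unique fields of a CSV file."""
    # set().union(...) already has an empty set to start from,
    # so a file with zero rows simply yields set()
    return set().union(*(set(field.strip().lower() for field in row)
                         for row in csv.reader(f, delimiter=',')))

print(sorted(unique_words(io.StringIO("Cat, and, dog\nDog, &, Cat\n"))))
# ['&', 'and', 'cat', 'dog']
print(unique_words(io.StringIO("")))  # set()

# The unbound form needs at least one set to call union on
try:
    set.union(*())
except TypeError as exc:
    print("TypeError:", exc)
```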

Also, in Python 2.7+, you can just use a set comprehension, instead of calling set on a list comprehension or generator expression:

with open('example_1.csv', 'r') as f:
    result = set.union(*({field.strip().lower() for field in row}
                         for row in csv.reader(f, delimiter=',')))

In fact, you can even turn the whole thing into one big comprehension with a nested loop:

with open('example_1.csv', 'r') as f:
    result = {field.strip().lower() 
              for row in csv.reader(f, delimiter=',')
              for field in row}
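
The for clauses in the comprehension read left to right, exactly like nested loops. A minimal sketch checking that on two of the question's rows (io.StringIO is an assumption standing in for the file):

```python
import csv
import io

SAMPLE = "Cat, and, dog, bites\nDog, &, Cat, provides, dog, training\n"

with io.StringIO(SAMPLE) as f:
    # Outer loop over rows first, inner loop over the fields of each row
    result = {field.strip().lower()
              for row in csv.reader(f, delimiter=',')
              for field in row}

# The same thing spelled out as explicit nested loops
expected = set()
for row in csv.reader(io.StringIO(SAMPLE), delimiter=','):
    for field in row:
        expected.add(field.strip().lower())

print(result == expected)  # True
```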

Or, alternatively, you don't have to make it one big expression:

with open('example_1.csv', 'r') as f:
    rows = csv.reader(f, delimiter=',')
    rowsets = ({field.strip().lower() for field in row} for row in rows)
    result = set.union(*rowsets)
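
Because rowsets is a generator, no row is read until set.union consumes it, which is why that line has to stay inside the with block. A minimal sketch with io.StringIO standing in for the file (an assumption), showing both the working layout and what happens when the generator is consumed after the file is closed:

```python
import csv
import io

# Hypothetical in-memory stand-in for example_1.csv
f = io.StringIO("Cat, and, dog\nDog, &, Cat\n")
with f:
    rows = csv.reader(f, delimiter=',')
    # Nothing has been read yet: rowsets is a lazy generator
    rowsets = ({field.strip().lower() for field in row} for row in rows)
    # set.union(*rowsets) pulls every row while f is still open
    result = set.union(*rowsets)
print(sorted(result))  # ['&', 'and', 'cat', 'dog']

# Consuming the generator only after `with` has closed the file fails
g = io.StringIO("Cat, and, dog\n")
with g:
    late = ({field.strip().lower() for field in row}
            for row in csv.reader(g, delimiter=','))
late_error = None
try:
    set.union(*late)
except ValueError as exc:  # I/O operation on closed file
    late_error = exc
print(late_error)
```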

Also, as Padraic Cunningham pointed out, one of the dialect options the csv module offers is skipinitialspace, which does just what it sounds like: it skips whitespace immediately following the delimiter, so you don't need the strip anymore. For example, using the big set comprehension:

with open('example_1.csv', 'r') as f:
    result = {field.lower() 
              for row in csv.reader(f, delimiter=',', skipinitialspace=True)
              for field in row}
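
To see exactly what skipinitialspace changes, here is a small sketch that parses one row from the question both ways (via io.StringIO instead of the real file). It only removes whitespace right after a delimiter; a space before a comma, or inside a field such as "in Eugene", is left alone.

```python
import csv
import io

line = "Dog, &, Cat, provides, dog, training, in Eugene, Oregon\n"

plain = next(csv.reader(io.StringIO(line), delimiter=','))
skipped = next(csv.reader(io.StringIO(line), delimiter=',',
                          skipinitialspace=True))

print(plain[:3])    # ['Dog', ' &', ' Cat']
print(skipped[:3])  # ['Dog', '&', 'Cat']
```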

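Or, alternatively, it looks like your format really uses comma-space rather than comma as a delimiter. The csv module only accepts a one-character delimiter (passing delimiter=', ' raises a TypeError), so split each line yourself:

with open('example_1.csv', 'r') as f:
    # skip blank lines, then split on the two-character ', ' separator
    result = {field.lower()
              for line in f if line.strip()
              for field in line.rstrip('\n').split(', ')}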
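
Any approach that treats the exact string ", " as the separator misses fields when the spacing around the commas is inconsistent. A minimal sketch using re.split with a pattern that tolerates any whitespace around each comma; this is a swapped-in technique, not from either answer, and the second input line is a hypothetical irregular row.

```python
import re

# Hypothetical input: the second line has irregular spacing around commas
lines = ["Cat, and, dog, bites\n", "Dog,&, Cat ,  provides\n"]

result = {field.lower()
          for line in lines if line.strip()           # skip blank lines
          for field in re.split(r'\s*,\s*', line.strip())}

print(sorted(result))  # ['&', 'and', 'bites', 'cat', 'dog', 'provides']
```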

#2

To store all the words in lowercase, you can use the .lower() method on strings; after collecting all the words in a list, create a set from it, which keeps only the unique values. Call .strip() on each word as well; otherwise the space after each comma and the newline at the end of each line make 'dog', ' dog' and ' dog\n' count as different words:

with open("data_file.csv", "r") as data_file:
    all_words = []
    for line in data_file:
        for word in line.split(","):
            # remove surrounding spaces and the trailing newline, then lowercase
            all_words.append(word.strip().lower())

unique_words = set(all_words)
print(unique_words)
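
The question asks for a list, while both answers end with a set. A minimal sketch of two ways to turn the words into a list of unique items, using io.StringIO with two of the question's rows in place of the file (an assumption): sorted() gives an alphabetical list, and dict.fromkeys keeps the order in which each word first appears (dicts preserve insertion order from Python 3.7).

```python
import csv
import io

SAMPLE = "Cat, and, dog, bites\nDog, &, Cat, provides, dog, training\n"

words = [field.strip().lower()
         for row in csv.reader(io.StringIO(SAMPLE))
         for field in row]

# Alphabetical list of the unique words
alphabetical = sorted(set(words))

# Unique words in first-seen order
first_seen = list(dict.fromkeys(words))

print(alphabetical)  # ['&', 'and', 'bites', 'cat', 'dog', 'provides', 'training']
print(first_seen)    # ['cat', 'and', 'dog', 'bites', '&', 'provides', 'training']
```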
