Python: combining the unique values in 3 columns (Excel spreadsheet)

Time: 2021-04-21 04:25:17

Good day. I have some data in an Excel spreadsheet, structured like this:

[Image: screenshot of the spreadsheet data]

I want to combine the unique values from the 3 columns into a format like:

Mike to America for Hotel; Meal

and

Kate to Europe; America for Transport

etc

I have only managed to get it working for 2 columns:

import xlrd
from collections import defaultdict

the_file = xlrd.open_workbook("testing.xlsx")
the_sheet = the_file.sheet_by_name("Sheet1")

products = defaultdict(list)

# Column 0 is the key; collect every column 1 value under it
for row_index in range(1, the_sheet.nrows):
    products[str(the_sheet.cell(row_index, 0).value)].append(the_sheet.cell(row_index, 1).value)

for product, v in products.items():
    print(product + " to " + ";".join(set(v)))

The output is:

Mike to America
Hulk to America;Asia
Kate to Europe;America
Dave to Europe
Jack to Europe;America;Asia
Luci to Asia

How can I extend this so that it works for all 3 columns together?
Thank you.

3 solutions

#1



There is probably a more Pythonic way to do it, but this is what I came up with:

from collections import defaultdict


# Sample rows: [name, place, activity]
l = [
    ['mike', 'america', 'hotel'],
    ['mike', 'america', 'meal'],
    ['jack', 'america', 'meal'],
    ['jack', 'europe', 'hotel'],
    ['jack', 'america', 'bonus'],
    ['jack', 'asia', 'hotel'],
    ['dave', 'europe', 'meal'],
]

people = defaultdict(list)         # name -> places visited
people_places = defaultdict(list)  # "name|place" -> activities at that place

for name, place, activity in l:
    people[name].append(place)
    people_places[name + '|' + place].append(activity)

for p, k in people.items():
    activity = []
    for place in k:
        activity += people_places[p + '|' + place]
    # set() removes the duplicates before joining
    print('{} to {} for {}'.format(
        p,
        ';'.join(set(k)),
        ';'.join(set(activity))
    ))

You can adapt the code to work directly on the spreadsheet rows and cells, or extract the list l first with something like:

import xlrd

l = []
with xlrd.open_workbook("testing.xlsx") as the_file:
    the_sheet = the_file.sheet_by_name("Sheet1")

    # Start at row 1, as in the question's loop (row 0 is presumably a header)
    for row_index in range(1, the_sheet.nrows):
        l.append([
            the_sheet.cell(row_index, 0).value,
            the_sheet.cell(row_index, 1).value,
            the_sheet.cell(row_index, 2).value])
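One thing to keep in mind with the approach above: a set has no guaranteed iteration order, so the joined values can come out in a different order from run to run. Below is a minimal sketch, assuming the rows are already in a list of [name, place, activity] like l, that keeps one set per column and sorts before joining. The helper name summarize and the sample rows are made up for illustration.

```python
from collections import defaultdict


def summarize(rows):
    """Return one 'name to places for activities' line per person."""
    places = defaultdict(set)      # name -> unique places
    activities = defaultdict(set)  # name -> unique activities
    for name, place, activity in rows:
        places[name].add(place)
        activities[name].add(activity)
    # Sort names and values so the output is identical on every run.
    return [
        "{} to {} for {}".format(
            name,
            "; ".join(sorted(places[name])),
            "; ".join(sorted(activities[name])),
        )
        for name in sorted(places)
    ]


# Hypothetical sample rows, in the same shape as the list l above.
rows = [
    ["mike", "america", "hotel"],
    ["mike", "america", "meal"],
    ["jack", "america", "meal"],
    ["jack", "europe", "hotel"],
    ["dave", "europe", "meal"],
]
for line in summarize(rows):
    print(line)
```

Sorting is only one possible choice here; it trades the original row order for reproducible output.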

#2



First, extract the rows you want. Here I build a nested list, i.e. [[col1, col2, col3], [col1, col2, col3]]:

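# Rows with index 0-2 are skipped and columns 1-3 are read,
# which matches the layout of the sheet used for this answer.
bigbox = []
for i in range(3, the_sheet.nrows):
    # .value gives the cell content directly, instead of parsing str(cell)
    box = [str(the_sheet.cell(i, j).value) for j in range(1, 4)]
    bigbox.append(box)

print(bigbox)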

Then convert the nested list into a nested dictionary of sets, i.e. {'name': {'travel': set of travel, 'expense': set of expense}, ...}:

dbox = dict()

for name, travel, expense in bigbox:
    if name not in dbox:
        # First row for this name: start both sets
        dbox[name] = {'travel': {travel}, 'expense': {expense}}
    else:
        # Sets ignore values that are already present
        dbox[name]['travel'].add(travel)
        dbox[name]['expense'].add(expense)

print(dbox)

Lastly, print it out with a bit of str.join magic (read the docs for more info):

for name in dbox:
    print(name, 'to', "; ".join(dbox[name]['travel']), 'for', "; ".join(dbox[name]['expense']))

Hope this helps. One complaint: you didn't provide the Excel file, so I had to create it myself. Next time, please include it. Also, this looks similar to a programming course assignment?
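The format asked for in the question ("Mike to America for Hotel; Meal") lists values in the order they first appear, which a set does not preserve. Here is a minimal sketch of an order-preserving variant, assuming Python 3.7 or later, where dict keeps insertion order. The helper unique_in_order and the sample rows are hypothetical, not taken from the asker's file.

```python
def unique_in_order(values):
    """Drop duplicates while keeping the first occurrence of each value."""
    # dict keys are unique and, from Python 3.7 on, keep insertion order.
    return list(dict.fromkeys(values))


# Hypothetical rows in the same [name, travel, expense] shape as bigbox.
bigbox = [
    ["Mike", "America", "Hotel"],
    ["Mike", "America", "Meal"],
    ["Kate", "Europe", "Transport"],
    ["Kate", "America", "Transport"],
]

grouped = {}
for name, travel, expense in bigbox:
    # Plain lists keep every value in row order; duplicates are dropped later.
    entry = grouped.setdefault(name, {"travel": [], "expense": []})
    entry["travel"].append(travel)
    entry["expense"].append(expense)

lines = [
    "{} to {} for {}".format(
        name,
        "; ".join(unique_in_order(entry["travel"])),
        "; ".join(unique_in_order(entry["expense"])),
    )
    for name, entry in grouped.items()
]
print("\n".join(lines))
```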

#3



Here is the solution that comes to mind at the moment:

import xlrd
from collections import defaultdict

the_file = xlrd.open_workbook("4_test.xlsx")
the_sheet = the_file.sheet_by_name("Sheet1")

# A defaultdict whose missing keys are themselves nested defaultdicts
nested_dict = lambda: defaultdict(nested_dict)
_dict = nested_dict()

for row_index in range(1, the_sheet.nrows):
    expense = []
    travel = []
    name = str(the_sheet.cell(row_index, 0).value)
    # Rescan the sheet and collect every row that belongs to this name
    for row_index_1 in range(1, the_sheet.nrows):
        if name == str(the_sheet.cell(row_index_1, 0).value):
            travel.append(str(the_sheet.cell(row_index_1, 1).value))
            expense.append(str(the_sheet.cell(row_index_1, 2).value))
            _dict[name]['travel'] = travel
            _dict[name]['expense'] = expense

for name in _dict:
    # set() removes the duplicates before joining
    print(name + " to " + ",".join(set(_dict[name]['travel'])) + " for " + ",".join(set(_dict[name]['expense'])))

Output:

Pintu to Europe for Bonous
Jack to Europe,America for Hotel,Meal
Mike to America for Bonous,Hotel,Transport
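The answers above hard-code two value columns. If more columns might be added later, the same grouping idea can be written once for any number of columns. This is only a sketch under assumptions: combine_columns, its joiners parameter and the sample rows are hypothetical, the first column is taken as the key, and any value column without a matching joiner is left out of the output.

```python
from collections import defaultdict


def combine_columns(rows, joiners=(" to ", " for "), sep=";"):
    """Group rows by their first column and collect unique values per column."""
    grouped = defaultdict(list)  # key -> list of sets, one per value column
    for row in rows:
        key, values = row[0], row[1:]
        columns = grouped[key]
        # Make sure there is one set for each value column in this row.
        while len(columns) < len(values):
            columns.append(set())
        for column, value in zip(columns, values):
            column.add(value)

    lines = []
    for key in sorted(grouped):
        parts = [key]
        # zip stops at the shorter input, so extra columns are skipped.
        for joiner, column in zip(joiners, grouped[key]):
            parts.append(joiner + sep.join(sorted(column)))
        lines.append("".join(parts))
    return lines


# Hypothetical sample rows: (name, travel, expense)
rows = [
    ("Jack", "Europe", "Hotel"),
    ("Jack", "America", "Meal"),
    ("Jack", "Europe", "Meal"),
    ("Pintu", "Europe", "Bonus"),
]
for line in combine_columns(rows):
    print(line)
```

With only the first two columns passed in, the same function gives the 2-column output from the question, e.g. combine_columns([("Jack", "Europe"), ("Jack", "America")]).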
