I am familiar with the basic concepts of reading and writing a CSV file in Python, but I am stuck on the logic for this problem. I think a GROUP BY could solve it, but how can I do that in Python?
Category Data
A Once upon a time.
A There was a king.
A who ruled a great and glorious nation.
B He loved each of them dearly.
B One day, when the young ladies were of age to be married.
B terrible, three-headed dragon laid.
C It is so difficult to deny
C the reality
I want logic that produces output like this, where all the data for category A is merged into one row, and likewise for categories B and C:
Category Data
A Once upon a time. There was a king. who ruled a great and glorious nation.
B He loved each of them dearly. One day, when the young ladies were of age to be married. terrible, three-headed dragon laid.
C It is so difficult to deny the reality
If anyone can help me with this logic, I would appreciate it.
#1
With the pandas library you can use groupby and a custom aggregate function that simply concatenates each category's Data:
>>> import pandas as pd
>>> data = [['A', 'Once upon a time.'], ['A', 'There was a king.'], ['A', 'who ruled a great and glorious nation.'], ['B', 'He loved each of them dearly. '], ['B', 'One day, when the young ladies were of age to be married. '], ['B', 'terrible, three-headed dragon laid. '], ['C', 'It is so difficult to deny '], ['C', 'the reality']]
>>> df = pd.DataFrame(data=data, columns=['Category','Data'])
>>> df
Category Data
0 A Once upon a time.
1 A There was a king.
2 A who ruled a great and glorious nation.
3 B He loved each of them dearly.
4 B One day, when the young ladies were of age to ...
5 B terrible, three-headed dragon laid.
6 C It is so difficult to deny
7 C the reality
>>> df.groupby('Category').agg({'Data': lambda x : ' '.join(x)})
Data
Category
A Once upon a time. There was a king. who ruled ...
B He loved each of them dearly. One day, when t...
C It is so difficult to deny the reality
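Since the question is about CSV files, here is a sketch of the same groupby idea going from CSV in to CSV out. It assumes a comma-separated input with a Category,Data header (the inline text stands in for a real file; pass a path to pd.read_csv instead). It also strips stray whitespace before joining, so trailing spaces like those in the sample data do not produce double spaces, and uses sort=False to keep categories in first-seen order.

```python
import io

import pandas as pd

# Hypothetical CSV input; replace io.StringIO(csv_text) with a path such as "input.csv".
csv_text = """Category,Data
A,Once upon a time.
A,There was a king.
A,who ruled a great and glorious nation.
B,He loved each of them dearly.
B,"One day, when the young ladies were of age to be married."
B,"terrible, three-headed dragon laid."
C,It is so difficult to deny
C,the reality
"""

df = pd.read_csv(io.StringIO(csv_text))

# Strip surrounding whitespace, then join each category's Data values with a space.
# sort=False keeps the categories in the order they first appear.
merged = (
    df.assign(Data=df["Data"].str.strip())
      .groupby("Category", sort=False)["Data"]
      .agg(" ".join)
      .reset_index()
)

# Without a path, to_csv returns the CSV text; pass e.g. "merged.csv" to write a file.
print(merged.to_csv(index=False))
```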
#2
itertools.groupby can help, assuming the rows are already sorted by category (the letters in your first column):
from itertools import groupby
from io import StringIO

text = '''Category Data
A Once upon a time.
A There was a king.
A who ruled a great and glorious nation.
B He loved each of them dearly.
B One day, when the young ladies were of age to be married.
B terrible, three-headed dragon laid.
C It is so difficult to deny
C the reality
'''

with StringIO(text) as file:
    next(file)  # skip header
    # split only at the first space: [category, rest of the line]
    rows = (row.split(' ', 1) for row in file)
    for key, items in groupby(rows, key=lambda x: x[0]):
        phrases = (item[1].strip() for item in items)
        print(key, ' '.join(phrases))
which gives:
A Once upon a time. There was a king. who ruled a great and glorious nation.
B He loved each of them dearly. One day, when the young ladies were of age to be married. terrible, three-headed dragon laid.
C It is so difficult to deny the reality
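The sorted-input assumption matters because groupby only merges adjacent items with equal keys. A minimal sketch, assuming the rows are already parsed into (category, text) pairs (the sample rows are hypothetical), sorts by category first; sorted() is stable, so sentences keep their original order within each category.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows that are NOT sorted by category.
rows = [
    ("B", "He loved each of them dearly."),
    ("A", "Once upon a time."),
    ("C", "It is so difficult to deny"),
    ("A", "There was a king."),
    ("C", "the reality"),
]

# Grouping the unsorted rows directly would yield B, A, C, A, C as separate groups.
unsorted_group_count = len(list(groupby(rows, key=itemgetter(0))))

# Sort by category so equal keys become adjacent, then group and join.
rows_sorted = sorted(rows, key=itemgetter(0))
merged = {
    key: " ".join(text for _, text in items)
    for key, items in groupby(rows_sorted, key=itemgetter(0))
}
print(merged)
```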
If your data is in a file, replace with StringIO(text) as file: in the code above with:
with open('textfile.txt') as file:
    # do the same processing as above with file
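When the input is a real comma-separated file, the csv module can handle both reading and writing without requiring sorted input. This is a sketch under stated assumptions: the file has a Category,Data header, merge_csv is a hypothetical helper name, and the demo writes to temporary files standing in for your own input.csv and merged.csv. A dict is used instead of groupby, so rows for the same category need not be adjacent; dicts preserve insertion order in Python 3.7+, so categories come out in first-seen order.

```python
import csv
import os
import tempfile
from collections import defaultdict


def merge_csv(in_path, out_path):
    """Merge the Data column of all rows sharing a Category (input need not be sorted)."""
    groups = defaultdict(list)  # keeps categories in first-seen order
    with open(in_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            groups[row["Category"]].append(row["Data"].strip())

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Category", "Data"])
        for category, parts in groups.items():
            writer.writerow([category, " ".join(parts)])


# Demo with temporary files; "input.csv" and "merged.csv" are hypothetical names.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "input.csv")
out_path = os.path.join(tmpdir, "merged.csv")
with open(in_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([
        ["Category", "Data"],
        ["A", "Once upon a time."],
        ["B", "He loved each of them dearly."],
        ["A", "There was a king."],
        ["C", "the reality"],
    ])

merge_csv(in_path, out_path)
with open(out_path, newline="", encoding="utf-8") as f:
    result = list(csv.reader(f))
print(result)
```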