I have a file that contains three attributes: id, text and date. There are roughly 70K records in this file. I am looking to add this data to a dictionary and then sort it by date. Below is the code.
matchinput = csv.reader(open(filename,"rb"),delimiter=',', quotechar='|')
tweets = []
for row in matchinput:
    data = dict()
    data['id']=str(row[0])
    data['text']=str(row[1])
    data['date']=str(row[2])
    tweets.append(data)
sorted(tweets, key=lambda tweets: tweets[2])
print tweets
The code gives the following error:
sorted(tweets, key=lambda tweets: tweets[2])
KeyError: 2
Input File:
566561942949474304,"lala is only 52 runs and 7 wickets away from being the only player to score 8000 runs and take 400 wickets in odi's !!! #pakvsind #cwc15",2015-02-14 22:37:48
566561925178200064,"rt @shoaibakhtarpk: captain @misbahulhaqpk, speaking to media, says want to make history by wining match against india #cwc15#pakvind #ind",2015-02-14 22:37:43
Desired output file (sorted by date):
566561925178200064,"rt @shoaibakhtarpk: captain @misbahulhaqpk, speaking to media, says want to make history by wining match against india #cwc15#pakvind #ind",2015-02-14 22:37:43
566561942949474304,"lala is only 52 runs and 7 wickets away from being the only player to score 8000 runs and take 400 wickets in odi's !!! #pakvsind #cwc15",2015-02-14 22:37:48
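Your KeyError comes from indexing a dict by position: each element of tweets is a dict keyed by 'id', 'text' and 'date', so tweets[2] looks up the key 2, which does not exist. Also note that sorted() returns a new list rather than sorting in place, so you need to keep its result.

Why not store each row as a list or tuple instead, given that row[0] = id, row[1] = text and row[2] = date, as you already assume when parsing the CSV file? That way, each id/text/date combination is kept together: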
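import csv

# 'with' closes the file for you; 'rb' is for Python 2 (on Python 3 use open(filename, newline=''))
with open(filename, 'rb') as csvfile:
    # quotechar='"' matches the input file: the text field is double-quoted and can contain commas
    data = [row for row in csv.reader(csvfile, delimiter=',', quotechar='"')]
sorted_data = sorted(data, key=lambda t: t[-1])  # or t[2]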
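The approach above can be tried end to end with the two sample rows from the question. This is a minimal Python 3 sketch: io.StringIO stands in for the real file (an assumption for illustration only), and it relies on csv.reader's default quotechar='"', which keeps the comma inside the second tweet's text in a single field. It also shows the dict-based alternative from the question, fixed to sort on the 'date' key instead of index 2.

```python
import csv
import io

# In-memory stand-in for the input file (the two sample rows from the question).
SAMPLE = """\
566561942949474304,"lala is only 52 runs and 7 wickets away from being the only player to score 8000 runs and take 400 wickets in odi's !!! #pakvsind #cwc15",2015-02-14 22:37:48
566561925178200064,"rt @shoaibakhtarpk: captain @misbahulhaqpk, speaking to media, says want to make history by wining match against india #cwc15#pakvind #ind",2015-02-14 22:37:43
"""

# The default quotechar='"' keeps the quoted tweet text, commas included, in one field.
rows = list(csv.reader(io.StringIO(SAMPLE)))

# Option 1: keep each row as a list and sort by its third field (the date).
sorted_rows = sorted(rows, key=lambda row: row[2])

# Option 2: keep dicts as in the question, but sort by the 'date' key
# (indexing a dict with 2 raises KeyError because its keys are strings).
tweets = [{'id': r[0], 'text': r[1], 'date': r[2]} for r in rows]
sorted_tweets = sorted(tweets, key=lambda t: t['date'])

for row in sorted_rows:
    print(row[0], row[2])
```

This prints the id and date of each row, earliest first: 566561925178200064 (22:37:43) and then 566561942949474304 (22:37:48), matching the desired output order.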
If you want the ids, texts and dates as separate sequences, you can use zip:
ids, texts, dates = zip(*sorted_data)
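A small illustration of what zip(*sorted_data) does, as a Python 3 sketch; the rows below are hypothetical, reusing the ids and dates from the question with shortened texts:

```python
# Hypothetical sorted rows: ids and dates from the question, texts shortened.
sorted_data = [
    ['566561925178200064', 'rt @shoaibakhtarpk: captain ...', '2015-02-14 22:37:43'],
    ['566561942949474304', 'lala is only 52 runs ...', '2015-02-14 22:37:48'],
]

# zip(*rows) transposes the rows: the first tuple collects every row's
# field 0 (ids), the second every field 1 (texts), the third every field 2 (dates).
ids, texts, dates = zip(*sorted_data)

# zip produces tuples; convert to a list if you need to modify the values.
ids = list(ids)

print(ids)    # ['566561925178200064', '566561942949474304']
print(dates)  # ('2015-02-14 22:37:43', '2015-02-14 22:37:48')
```

If the file can be empty, guard the unpacking: zip(*[]) yields nothing, so assigning it to three names raises ValueError.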
Edit: regarding your concern about dates, the format in your example (YYYY-MM-DD HH:MM:SS, zero-padded, most significant field first) already sorts correctly as a plain string. More generally, you can parse the dates with strptime to make sure any date/time format sorts properly (the format string below corresponds to your current date/time format):
import datetime
date_key = lambda t: datetime.datetime.strptime(t[-1], '%Y-%m-%d %H:%M:%S')
sorted_data = sorted(data, key=date_key)
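To see why parsing matters, here is a Python 3 sketch with a hypothetical day/month/year format (not the format used in the question), where plain string sorting and strptime-based sorting disagree:

```python
import datetime

# Hypothetical rows whose dates use day/month/year, a format that does NOT
# sort correctly as plain strings.
rows = [
    ['a', 'first', '14/02/2015 22:37:48'],
    ['b', 'second', '01/03/2015 09:00:00'],
    ['c', 'third', '28/01/2015 12:00:00'],
]

# String sort compares character by character, so the day field dominates.
by_string = sorted(rows, key=lambda t: t[-1])

# Parsing with strptime makes the key a datetime, which compares real points in time.
FMT = '%d/%m/%Y %H:%M:%S'
by_datetime = sorted(rows, key=lambda t: datetime.datetime.strptime(t[-1], FMT))

print([t[0] for t in by_string])    # ['b', 'a', 'c']
print([t[0] for t in by_datetime])  # ['c', 'a', 'b']
```

Only the strptime version gives chronological order here (28 January, 14 February, 1 March). Parsing costs more than comparing strings, but sorted() calls the key function once per element, so each date is parsed only once.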