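Comprehensive exercise
Word-frequency counting: preprocessing
Download the lyrics of an English song, or an English article
Replace all separators such as , . ? ! ’ : with spaces
Convert all uppercase letters to lowercase
Build the word list
Build the word-frequency counts
Sort
Exclude function words: pronouns, articles, conjunctions
Output the 20 most frequent words (top 20)
Save the text to be analyzed as a UTF-8 encoded file, and load the content for the word-frequency analysis by reading that file.
Part 1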
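The file-based step can be sketched as follows. This is a minimal sketch, not the author's code: the helper names `save_text` and `load_text` are hypothetical, and the demo writes `news.txt` into a temporary directory so that it does not overwrite an existing file. Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding, which matters for characters such as the curly apostrophe ’.

```python
import os
import tempfile


def save_text(path, text):
    """Write text to path as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read the whole file at path as UTF-8 (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Demo in a temporary directory; the file name news.txt follows the exercise.
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "news.txt")
    save_text(path, "Oh ma baby, it’s only a sample line!")
    news = load_text(path)
    print(news)
```

The string returned by `load_text` can then be fed into the same separator replacement, lowercasing, and splitting steps as a hard-coded string.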
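news="wo are faimly.w aor kb.jg,wo are w wo Oh ma baby nal ddeo nan geu dae mam sok geu ro bul reo bon da"
# Alternative: read the text from a file instead of the hard-coded string
# f=open("news.txt","r")
# news=f.read()
# f.close()
# print(news)

# Count occurrences in a dictionary (key-value pairs)
sep=''',?!."'''
exclude={"wo","w"}
for c in sep:
    news=news.replace(c," ")
wordList=news.lower().split()  # convert uppercase to lowercase, then split into words

# Approach 1: dictionary traversal (count each unique word, skipping the excluded ones)
wordDict={}
wordSet=set(wordList)-exclude
for w in wordSet:
    wordDict[w]=wordList.count(w)
for w in wordDict:
    print(w,wordDict[w])

# Approach 2: list traversal (count while walking the list, then delete the excluded words)
# wordDict={}
# for w in wordList:
#     wordDict[w]=wordDict.get(w,0)+1
#
# for w in exclude:
#     del(wordDict[w])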
('are', 2)
('geu', 2)
('baby', 1)
('aor', 1)
('kb', 1)
('bul', 1)
('oh', 1)
('faimly', 1)
('bon', 1)
('jg', 1)
('reo', 1)
('nal', 1)
('nan', 1)
('ma', 1)
('sok', 1)
('da', 1)
('ro', 1)
('ddeo', 1)
('dae', 1)
('mam', 1)
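As a compact alternative to the two counting approaches in the listing, the standard library's `collections.Counter` can do the counting, sorting, and top-N selection in one step. The sketch below is an illustration rather than the author's method; the function name `top_words` and its parameters are hypothetical, while the sample string, separators, and excluded words are taken from the listing.

```python
from collections import Counter


def top_words(text, exclude, n=20, sep=''',?!."'''):
    """Return the n most frequent words as (word, count) pairs (hypothetical helper)."""
    # Replace every separator character with a space.
    for c in sep:
        text = text.replace(c, " ")
    # Lowercase, split on whitespace, and drop the excluded words.
    words = [w for w in text.lower().split() if w not in exclude]
    # most_common sorts by count, highest first.
    return Counter(words).most_common(n)


news = "wo are faimly.w aor kb.jg,wo are w wo Oh ma baby nal ddeo nan geu dae mam sok geu ro bul reo bon da"
for word, count in top_words(news, exclude={"wo", "w"}):
    print(word, count)
```

Filtering the excluded words before counting avoids having to delete keys afterwards, so no error can arise when an excluded word does not occur in the text.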
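# Save the results to a file
# dictList is not defined in the code above; it is assumed here to be the
# (word, count) pairs from wordDict, sorted by count in descending order
dictList = sorted(wordDict.items(), key=lambda item: item[1], reverse=True)
f = open('newscount.txt','a')
for word, count in dictList[:20]:  # at most the top 20 entries
    f.write(word+' '+str(count)+'\n')
f.close()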