Counting word frequencies in Python with the jieba library: how counts[word] = counts.get(word,0)+1 works

Posted: 2024-05-30 21:28:13
import jieba
txt = open("阿甘正传-网络版.txt", "r", encoding="utf-8").read()
words = jieba.lcut(txt)        # segment the text with jieba in precise mode; returns a list of tokens
counts = {}                    # empty dict mapping each word to its count
for word in words:
    if len(word) == 1:         # skip single-character tokens (they are not counted)
        continue
    else:
        counts[word] = counts.get(word, 0) + 1   # get returns the current count, or 0 if word is not yet a key
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)     # sort the (word, count) pairs by count, descending
for i in range(10):
    word, count = items[i]     # unpack the i-th most frequent (word, count) pair
    print("{0}:{1}".format(word, count))
    

Note: counts[word] = counts.get(word,0)+1 accumulates the frequency of each word. counts.get(word, 0) returns the count already stored for word if it is a key in counts, and 0 otherwise; adding 1 and assigning the result back to counts[word] updates the running count.
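
A minimal illustration of the idiom, using a small hypothetical token list rather than the novel's text:

counts = {}
for word in ["珍妮", "跑步", "珍妮"]:            # hypothetical tokens for illustration
    counts[word] = counts.get(word, 0) + 1      # 0 the first time a word appears, then +1 each time after
print(counts)                                   # {'珍妮': 2, '跑步': 1}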

Run result: the ten most frequent words in the text and their counts, printed one per line.