Counting Word Frequencies in an English Article with Python: A Worked Example

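This article walks through an example of counting word frequencies in an English article with Python. It is shared here for reference; the details are as follows: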

Application overview:

Counting how often each word appears in an English article is a very common requirement. This article implements it in Python.

Approach:

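1. Put every word of the English article into a list and record the length of the list;
2. Iterate over the list, count how many times each word occurs, and store the results in a dictionary;
3. Using the list length from step 1, compute the frequency of each word and store the results in a frequency dictionary;
4. Sort the dictionary by the "value" of each key-value pair and output the result. (A slice can also be used to output only the few most or least frequent words, because sorted() returns a list of tuples, each holding a word and its frequency.)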
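
The four steps can be traced on a single short sentence. This is only a minimal sketch: the sample sentence and the variable names are made up for illustration and are not taken from the article's script.

```python
# Hypothetical sample text standing in for the article.
text = "the cat and the dog and the bird"

# Step 1: build the word list and record its length.
words = text.split()
total = len(words)

# Step 2: count the occurrences of each word in a dictionary.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Step 3: divide each count by the list length to get frequencies.
freqs = {word: n / total for word, n in counts.items()}

# Step 4: sorted() returns a list of (word, frequency) tuples in ascending
# order, so a slice taken from the end gives the most frequent words.
ranked = sorted(freqs.items(), key=lambda pair: pair[1])
top_two = ranked[-2:]
print(top_two)  # [('and', 0.25), ('the', 0.375)]
```

Slicing from the front instead (for example `ranked[:2]`) would give the least frequent words.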

Code implementation:

    # Read the article. The file name is only an example; use any English text file.
    with open('The_Magic_Skin _Honore_de_Balzac.txt') as fin:
        lines = fin.readlines()


    def words_list():
        """Turn the article into a list of words, keeping only letters, digits and spaces."""
        chardigit = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 '
        all_lines = ''
        for line in lines:
            one_line = ''
            for ch in line:
                if ch in chardigit:
                    one_line = one_line + ch
            # The newline is filtered out above, so add a space to keep the last
            # word of this line from merging with the first word of the next.
            all_lines = all_lines + one_line + ' '
        return all_lines.split()


    def total_num(s):
        """Return the total number of words; s is the word list."""
        return len(s)


    def word_dic(t):
        """Count how many times each word occurs; t is the word list."""
        fre_dic = dict()
        for word in t:
            fre_dic[word] = fre_dic.get(word, 0) + 1
        return fre_dic


    def word_fre(w):
        """Convert counts to frequencies in place; w maps each word to its count."""
        for key in w:
            # Relies on the global total set below; / is true division in Python 3.
            w[key] = w[key] / total
        return w


    def word_sort(v):
        """Return (word, frequency) tuples sorted by ascending frequency; v is the frequency dict."""
        sort_dic = sorted(v.items(), key=lambda e: e[1])
        return sort_dic


    # Entry point: print the ten words with the highest frequency.
    words = words_list()
    total = total_num(words)
    print(word_sort(word_fre(word_dic(words)))[-10:])
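
For comparison, the same kind of result can be reached more compactly with the standard library. The sketch below is not the article's method: it swaps in `collections.Counter` and `re.findall`, and the function name `top_words` and the sample string are hypothetical. It also lowercases the text and splits on every non-alphanumeric character, both of which are assumptions about what should count as one word and differ from the character filter used above.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, frequency) pairs."""
    # Assumption: a word is a run of ASCII letters and digits, compared
    # case-insensitively.
    words = re.findall(r'[A-Za-z0-9]+', text.lower())
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    # most_common() orders by descending count, so the most frequent word
    # comes first (the reverse of the ascending sort used above).
    return [(word, count / total) for word, count in counts.most_common(n)]


sample = "The cat saw the dog. The dog ran."
print(top_words(sample, 2))  # [('the', 0.375), ('dog', 0.25)]
```

To apply it to a file, the file's contents could be read into a string and passed as `text`.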

PS: A related online statistics tool, recommended here for reference:

Online word count tool:
https://tool.zzvips.com/t/textcount/

I hope this article is helpful for your Python programming.

