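TihuanWords.txt file format
Note: words on the same line are separated by a single space, and the first word on each line is the replacement for the other words on that line.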
年休假 年假 年休
究竟 到底
回家场景 我回来了
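As a minimal sketch of how this format maps to a lookup table, the snippet below parses the three sample lines above from an in-memory list rather than from the file. The helper name `build_replacement_dict` is hypothetical and is not part of the original code.

```python
def build_replacement_dict(lines):
    """Map every non-leading word on a line to that line's first word."""
    mapping = {}
    for line in lines:
        words = line.split()
        if len(words) < 2:
            continue  # skip blank lines and lines with nothing to replace
        for word in words[1:]:
            mapping[word] = words[0]
    return mapping


# The three sample lines shown above, held in memory instead of read from disk.
sample_lines = ["年休假 年假 年休", "究竟 到底", "回家场景 我回来了"]
replacement_dict = build_replacement_dict(sample_lines)
print(replacement_dict)
```

Each key is a word to be replaced and each value is the first word of its line, so a lookup during replacement is a single dictionary access.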
Code
import jieba


def replaceSynonymWords(string1):
    # 1. Read the replacement table and build a dict that maps each synonym
    #    to its replacement word.
    combine_dict = {}
    # Each line of TihuanWords.txt is a group of synonyms separated by spaces;
    # the first word on the line replaces all the others.
    with open("TihuanWords.txt", "r", encoding="utf-8") as table:
        for line in table:
            seperate_word = line.strip().split()
            for word in seperate_word[1:]:
                combine_dict[word] = seperate_word[0]
    print(combine_dict)
    # 2. Raise the frequency of certain words so that jieba keeps them as single tokens.
    jieba.suggest_freq("年休假", tune=True)
    # 3. Segment the sentence into words.
    seg_list = list(jieba.cut(string1, cut_all=False))
    print("/".join(seg_list))
    # 4. Return the sentence with synonyms replaced.
    final_sentence = ""
    for word in seg_list:
        final_sentence += combine_dict.get(word, word)
    return final_sentence


string1 = '年休到底放几天?'
print(replaceSynonymWords(string1))
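Step 4 can be tried in isolation, without jieba installed. The sketch below assumes the sentence has already been segmented: the token list is a hand-written assumption about how the sentence might be split, not actual jieba output, and `replace_tokens` is a hypothetical helper name.

```python
def replace_tokens(tokens, mapping):
    # Swap each token for its replacement word when one is defined;
    # tokens with no entry in the mapping are kept unchanged.
    return "".join(mapping.get(token, token) for token in tokens)


# Assumed mapping, matching the sample TihuanWords.txt lines.
mapping = {"年假": "年休假", "年休": "年休假", "到底": "究竟"}
# Assumed segmentation of '年休到底放几天?' (hand-written, not produced by jieba).
tokens = ["年休", "到底", "放", "几天", "?"]
result = replace_tokens(tokens, mapping)
print(result)
```

If the segmenter splits a word differently from how it appears in the table, no dictionary key matches and the text passes through unchanged, so the token boundaries matter as much as the table itself.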
Result
Original article: https://blog.csdn.net/weixin_44208569/article/details/104048793