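Looking at user reviews, we found that attribute words tend to appear together with sentiment words: users usually express a sentiment while describing an attribute, so the attribute is the target of the sentiment. We also found that attribute words and domain-specific sentiment words are almost always nouns or adjectives (including predicative adjectives, 形谓词).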
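To make this observation concrete, here is a minimal sketch that checks, clause by clause, whether a noun and an adjective occur together. It assumes each review is already segmented and POS-tagged as space-separated `word/tag` tokens with Chinese tag names (名词, 形容词, 形谓词, 标点), which is the format the script in this post appears to read. The function names `split_clauses` and `has_cooccurrence` and the sample sentence are hypothetical and are not taken from the original data set.

```python
# Tags assumed for attribute candidates and opinion candidates.
ATTRIBUTE_TAGS = {"名词"}
OPINION_TAGS = {"形容词", "形谓词"}


def split_clauses(tagged_line):
    """Split a 'word/tag word/tag ...' line into clauses at punctuation tokens."""
    clauses, current = [], []
    for token in tagged_line.split():
        word, _, tag = token.rpartition("/")
        if tag == "标点":
            # Punctuation closes the current clause.
            if current:
                clauses.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        clauses.append(current)
    return clauses


def has_cooccurrence(clause):
    """True if the clause contains both a noun and an adjective."""
    tags = {tag for _, tag in clause}
    return bool(tags & ATTRIBUTE_TAGS) and bool(tags & OPINION_TAGS)


# Hypothetical tagged review: "the screen is very clear, received it yesterday".
sample = "屏幕/名词 很/副词 清晰/形容词 ,/标点 昨天/时间词 收到/动词"
clauses = split_clauses(sample)
flags = [has_cooccurrence(c) for c in clauses]
print(flags)  # [True, False]
```

Only the first clause pairs an attribute word with a sentiment word. The second clause describes an event and carries no opinion, so a noun-then-adjective pattern would skip it.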
The algorithm flowchart is as follows:
The review data is as follows:
The code is as follows:
# -*- coding: utf-8 -*-
#############################
#
# Purpose: given some Chinese product reviews, find the opinion targets
# (attribute words) and the opinion words that describe them.
#
# @author: licl
#
##############################

# Negation words that are prefixed to the opinion word that follows them.
NEGATIONS = ('不', '不怎么样', '不怎么', '不太')

try:
    # Assumes both files are UTF-8 encoded, matching the source encoding.
    with open('jd_dfb_comments_out.txt', 'r', encoding='utf-8') as fdata, \
         open('pattern_result.txt', 'a', encoding='utf-8') as output:
        for line in fdata:
            # Each line is "word/tag word/tag ...". After splitting, words sit
            # at even indices and their POS tags at odd indices.
            listline = line.strip().replace(" ", "/").split("/")
            i = 1
            while i < len(listline):
                if listline[i] != "名词":
                    i += 2
                    continue
                # Found a noun: treat it as a candidate attribute word.
                new_list = [listline[i - 1], "", ""]
                a = i - 1
                i += 2
                # Scan forward for an adjective until punctuation or end of line.
                while i < len(listline):
                    if listline[i] == "标点":
                        i += 2
                        break
                    if listline[i - 1] in NEGATIONS:
                        new_list[1] = listline[i - 1]
                    if listline[i] in ("形容词", "形谓词"):
                        new_list[1] += listline[i - 1]
                        b = i - 1
                        # Distance in tokens between the noun and the adjective.
                        new_list[2] = str((b - a) // 2)
                        output.write(" ".join(new_list) + "\n")
                        break
                    i += 2
except IOError:
    print("File does not exist or cannot be opened")
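The same noun-then-adjective pattern can also be written as a function over in-memory strings, which is easier to try out without the input file. This is a sketch, not the author's code: the name `extract_pairs` and the sample sentence are hypothetical, and the input is again assumed to be space-separated `word/tag` tokens. One deliberate difference: a negation word that is itself tagged as an adjective is treated as the opinion word rather than as a prefix.

```python
NEGATIONS = ("不", "不怎么样", "不怎么", "不太")
OPINION_TAGS = ("形容词", "形谓词")


def extract_pairs(tagged_line):
    """Return (attribute, opinion, distance) triples from one tagged line."""
    # Keep only well-formed tokens and split each into (word, tag).
    tokens = [tuple(t.rsplit("/", 1)) for t in tagged_line.split() if "/" in t]
    pairs = []
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        if tag != "名词":
            i += 1
            continue
        # A noun starts a search for the next adjective in the same clause.
        negation = ""
        j = i + 1
        while j < len(tokens):
            next_word, next_tag = tokens[j]
            if next_tag == "标点":
                break  # clause ended without an opinion word
            if next_tag in OPINION_TAGS:
                pairs.append((word, negation + next_word, j - i))
                break
            if next_word in NEGATIONS:
                negation = next_word
            j += 1
        # Resume after the token that stopped the scan.
        i = j + 1
    return pairs


# Hypothetical review: "the screen is not clear, the battery is durable".
sample = "屏幕/名词 不/副词 清晰/形容词 ,/标点 电池/名词 耐用/形容词"
result = extract_pairs(sample)
for attribute, opinion, distance in result:
    print(attribute, opinion, distance)
```

For the sample sentence this prints `屏幕 不清晰 2` and `电池 耐用 1`. The third value is the number of tokens between the attribute word and the opinion word, corresponding to the distance the script writes to `pattern_result.txt`.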
Original article: https://blog.csdn.net/m53931422/article/details/41042791