I am new to Python (please be gentle) and am trying to learn how to do sentiment analysis. I am using a combination of code I found in a tutorial and in this question: "Python - AttributeError: 'list' object has no attribute". However, I keep getting this error:
Traceback (most recent call last):
  File "C:/Python27/training", line 111, in <module>
    processedTestTweet = processTweet(row)
  File "C:/Python27/training", line 19, in processTweet
    tweet = tweet.lower()
AttributeError: 'list' object has no attribute 'lower'
This is my code:
import csv
#import regex
import re
import pprint
import nltk.classify

#start replaceTwoOrMore
def replaceTwoOrMore(s):
    #look for 2 or more repetitions of character
    pattern = re.compile(r"(.)\1{1,}", re.DOTALL)
    return pattern.sub(r"\1\1", s)

# process the tweets
def processTweet(tweet):
    #Convert to lower case
    tweet = tweet.lower()
    #Convert www.* or https?://* to URL
    tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+','AT_USER',tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #trim
    tweet = tweet.strip('\'"')
    return tweet

#start getStopWordList
def getStopWordList(stopWordListFileName):
    #read the stopwords file and build a list
    stopWords = []
    stopWords.append('AT_USER')
    stopWords.append('URL')

    fp = open(stopWordListFileName, 'r')
    line = fp.readline()
    while line:
        word = line.strip()
        stopWords.append(word)
        line = fp.readline()
    fp.close()
    return stopWords

def getFeatureVector(tweet, stopWords):
    featureVector = []
    words = tweet.split()
    for w in words:
        #replace two or more with two occurrences
        w = replaceTwoOrMore(w)
        #strip punctuation
        w = w.strip('\'"?,.')
        #check if it consists of only words
        val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*[a-zA-Z]+[a-zA-Z0-9]*$", w)
        #ignore if it is a stopWord
        if(w in stopWords or val is None):
            continue
        else:
            featureVector.append(w.lower())
    return featureVector

def extract_features(tweet):
    tweet_words = set(tweet)
    features = {}
    for word in featureList:
        features['contains(%s)' % word] = (word in tweet_words)
    return features

#Read the tweets one by one and process it
inpTweets = csv.reader(open('C:/GsTraining.csv', 'rb'),
                       delimiter=',',
                       quotechar='|')
stopWords = getStopWordList('C:/stop.txt')
count = 0;
featureList = []
tweets = []

for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
    processedTweet = processTweet(tweet)
    featureVector = getFeatureVector(processedTweet, stopWords)
    featureList.extend(featureVector)
    tweets.append((featureVector, sentiment))

# Remove featureList duplicates
featureList = list(set(featureList))

# Generate the training set
training_set = nltk.classify.util.apply_features(extract_features, tweets)

# Train the Naive Bayes classifier
NBClassifier = nltk.NaiveBayesClassifier.train(training_set)

# Test the classifier
with open('C:/CleanedNewGSMain.txt', 'r') as csvinput:
    with open('GSnewmain.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        reader = csv.reader(csvinput)

        all=[]
        row = next(reader)

        for row in reader:
            processedTestTweet = processTweet(row)
            sentiment = NBClassifier.classify(
                extract_features(getFeatureVector(processedTestTweet, stopWords)))
            row.append(sentiment)
            processTweet(row[1])

        writer.writerows(all)

Any help would be massively appreciated.
1 Answer
Each row that the csv reader yields is a list, and lower() only works on strings. Presumably it is a list of strings, so there are two options: either call lower() on each element, or turn the list into a single string and then call lower() on that.
# the first approach
[item.lower() for item in tweet]
# the second approach
' '.join(tweet).lower()
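To make the cause of the error concrete, here is a minimal sketch of what the two approaches return. It assumes Python 3 and uses made-up sample data in an in-memory file rather than the real CSV, so the column contents are purely illustrative.

```python
import csv
import io

# Made-up sample data standing in for the real file (an assumption).
sample = io.StringIO("positive,I LOVE this Phone\nnegative,Worst Service EVER\n")

rows = list(csv.reader(sample))
first = rows[0]

# Each row is a list of strings, one per column, so it has no .lower().
print(type(first).__name__)

# First approach: lowercase every column, keeping the list structure.
lowered_items = [item.lower() for item in first]

# Second approach: join the columns into one string, then lowercase it.
lowered_text = ' '.join(first).lower()

print(lowered_items)
print(lowered_text)
```

Note that the first approach still gives you a list, so it would not be enough on its own for a function like processTweet that goes on to call re.sub on its argument.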
More likely, though (it is hard to tell without more information), you only want one item out of each row. Something along the lines of:
for row in reader:
    processedTestTweet = processTweet(row[0]) # Again, can't know if this is actually correct without seeing the file
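Here is a rough sketch of how the whole test loop might look once a single column is passed in. It is not a drop-in fix: process_tweet and classify are simplified stand-ins for processTweet and the NBClassifier call in the question, the input is made-up in-memory data, and TWEET_COLUMN is an assumption you would adjust to match the real file. It also collects each row before writing, because the loop in the question never adds anything to the list it passes to writerows.

```python
import csv
import io

def process_tweet(text):
    # Simplified stand-in for processTweet from the question; it needs a string.
    return text.lower().strip()

def classify(text):
    # Hypothetical placeholder for NBClassifier.classify(extract_features(...)).
    return 'positive' if 'love' in text else 'negative'

TWEET_COLUMN = 1  # assumption: index of the column that holds the tweet text

# Made-up input with a header row, in place of the real test file.
csvinput = io.StringIO("id,text\n1,I LOVE it\n2,Not good\n")
csvoutput = io.StringIO()

reader = csv.reader(csvinput)
writer = csv.writer(csvoutput, lineterminator='\n')

header = next(reader)  # skip the header row
results = []
for row in reader:
    cleaned = process_tweet(row[TWEET_COLUMN])  # pass a string, not the list
    row.append(classify(cleaned))
    results.append(row)  # collect rows so there is something to write

writer.writerows(results)
print(csvoutput.getvalue())
```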
Also, I'm guessing that you aren't using the csv reader quite the way you think you are: as far as I can tell, right now you are training a naive Bayes classifier on a single example every time and then having it predict the one example it was trained on. Maybe explain what you're trying to do?
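For comparison, the usual shape is to collect every training example first, train once, and only then classify unseen tweets. The sketch below shows that ordering only. The train and classify functions are a toy word-count stand-in for nltk.NaiveBayesClassifier (an assumption so the example runs without NLTK), and the rows are made up, using the (sentiment, tweet) column order the question reads.

```python
from collections import Counter, defaultdict

def train(labeled_tweets):
    # Toy stand-in for nltk.NaiveBayesClassifier.train (not the real algorithm):
    # count how often each word appears under each label.
    counts = defaultdict(Counter)
    for words, label in labeled_tweets:
        counts[label].update(words)
    return counts

def classify(model, words):
    # Pick the label whose training tweets share the most words with this one.
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)

# Made-up training rows in (sentiment, tweet) order.
training_rows = [
    ['positive', 'great phone love it'],
    ['positive', 'love the camera'],
    ['negative', 'terrible battery hate it'],
]

# Step 1: collect every training example first.
tweets = [(row[1].split(), row[0]) for row in training_rows]

# Step 2: train once, after the loop, on the full set.
model = train(tweets)

# Step 3: only then classify a tweet the model has not seen.
prediction = classify(model, 'i love this camera'.split())
print(prediction)
```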