Python Bag of Words NameError: name 'unicode' is not defined

Posted: 2021-05-08 20:06:17

I have been following this site, https://radimrehurek.com/data_science_python/, to apply a bag-of-words model to a list of tweets.

import csv
from textblob import TextBlob
import pandas

messages = pandas.read_csv('C:/Users/Suki/Project/Project12/newData1.csv', sep='\t', quoting=csv.QUOTE_NONE,
                               names=["label", "message"])

def split_into_tokens(message):
    message = unicode(message, encoding="utf8")  # convert bytes into proper unicode
    return TextBlob(message).words

messages.message.head().apply(split_into_tokens)

print (messages)

However, I keep getting this error. I've checked that I am following the code on the site, but the error keeps occurring.

Error

Traceback (most recent call last):
  File "C:/Users/Suki/Project/Project12/projectBagofWords.py", line 34, in <module>
    messages.message.head().apply(split_into_tokens)
  File "C:\Program Files\Python36\lib\site-packages\pandas\core\series.py", line 2510, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src\inference.pyx", line 1521, in pandas._libs.lib.map_infer
  File "C:/Users/Suki/Project/Project12/projectBagofWords.py", line 31, in split_into_tokens
    message = unicode(message, encoding="utf8")  # convert bytes into proper unicode
NameError: name 'unicode' is not defined

Can someone offer advice on how I could rectify this?

Thanks

2 Answers

#1


2  

unicode is a Python 2.x built-in. If you are running Python 3.x, all strings (str) are already Unicode, so that call is not needed.

https://docs.python.org/3/howto/unicode.html
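
As a minimal sketch of that advice, assuming Python 3: values that pandas.read_csv has read as text are already str, so the unicode(...) line can simply be deleted. If some values might still arrive as raw bytes, bytes.decode does the job that unicode(...) did in Python 2. The helper name ensure_text is hypothetical and not part of the tutorial or of any library.

```python
def ensure_text(message, encoding="utf8"):
    """Return message as a Python 3 str (hypothetical helper)."""
    if isinstance(message, bytes):
        # Raw bytes still need decoding; this replaces unicode(message, encoding=...)
        return message.decode(encoding)
    # A str is already Unicode in Python 3; other values are converted with str()
    return str(message)


print(ensure_text(b"caf\xc3\xa9"))  # bytes are decoded to text
print(ensure_text("plain tweet"))   # text passes through unchanged
```

In the question's split_into_tokens, the first line would then become message = ensure_text(message), or be removed outright if the column is known to hold only str values.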

#2


1  

unicode is a Python 2 built-in. On Python 3 you can alias it to str, although str(message, encoding="utf8") only accepts bytes and raises TypeError if message is already a str. If you are not sure which version will run this code, add this at the beginning of your code:

import sys
if sys.version_info[0] >= 3:
    unicode = str
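
One way to keep that alias usable on both versions is sketched below. It rests on the fact that, on Python 3, str("text", encoding="utf8") raises TypeError because a str cannot be decoded again, so the call is guarded with isinstance. The function name to_unicode is hypothetical.

```python
import sys

if sys.version_info[0] >= 3:
    unicode = str  # the alias from the answer above


def to_unicode(message, encoding="utf8"):
    """Decode message only when it is not already text (hypothetical helper)."""
    if isinstance(message, unicode):
        # Already text: calling unicode(message, encoding=...) here would raise TypeError
        return message
    # Bytes (a plain str on Python 2) are decoded into text
    return unicode(message, encoding=encoding)


print(to_unicode(b"caf\xc3\xa9"))
print(to_unicode("already text"))
```

With this guard, the same split_into_tokens body can run under either interpreter, whether the CSV values arrive as bytes or as text.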
