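Installing nltk and Loading nltk_data on Windows and CentOS

I needed to learn some natural language processing for work. While setting up nltk by following the book Natural Language Processing with Python (《Python自然语言处理》), I ran into a problem, so I am recording the process here for reference.

The fix on CentOS is the same as on Windows, so this article uses Windows as the example.

This walkthrough uses Python 3.6.5, with pip3 as the installer.

Reference: "Python: Installing nltk and loading nltk-data on Ubuntu"

Installing NLTK

Install the nltk module with pip3: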
```
pip3 install nltk
```
Import nltk in a Python shell. If the import completes without an error, the installation succeeded:
```
import nltk
```
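
The same check can be scripted. The sketch below uses only the standard library's `importlib.util.find_spec` to report whether a module can be located, without actually importing it. The helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on this interpreter."""
    # find_spec returns None when the top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


# Report whether pip3 installed nltk for the interpreter running this script.
print("nltk installed:", is_installed("nltk"))
```

If this prints `False` right after a successful `pip3 install nltk`, one likely cause is that pip3 and the running interpreter belong to different Python installations.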

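Downloading and Loading nltk_data

Download through the official method. In my experience this is extremely slow and error-prone, so I do not recommend it.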
```
import nltk
nltk.download()
```
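
Called without arguments, `nltk.download()` opens the interactive downloader. As a hedged alternative, the function also accepts a package or collection id and a `download_dir`, so only the material used by the book can be requested. The sketch below assumes the `book` collection id and a hypothetical wrapper name, `download_book_collection`; it goes through the same servers, so the speed problem described above may still apply.

```python
def download_book_collection(target_dir):
    """Request only the 'book' collection and store it under target_dir."""
    # Imported lazily so this file can be loaded even before nltk is installed.
    import nltk

    # nltk.download reports success or failure; normalize it to a bool.
    return bool(nltk.download("book", download_dir=target_dir))


# Example call (needs nltk and network access), left commented out on purpose:
# download_book_collection(r"C:\nltk_data")
```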

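Manual download

Many users have downloaded nltk_data themselves and shared the archive on Baidu Cloud (百度云). Download one of these archives and extract it. At this point, running the following commands raises an error: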


import nltk
from nltk.book import *

Error output:

LookupError: 
**********************************************************************
  Resource 'corpora/gutenberg' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - 'C:\\Users\\user/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'D:\\Python\\anaconda3-5.0.1\\nltk_data'
    - 'D:\\Python\\anaconda3-5.0.1\\lib\\nltk_data'
    - 'C:\\Users\\user\\AppData\\Roaming\\nltk_data'
**********************************************************************

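As the message shows, Python did not find the nltk_data folder in any of the searched directories, so move the extracted folder into any one of the paths listed above (the root of the C: drive is recommended). Trying again still produced the same error.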
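
The "Searched in" list in the error corresponds to the directories NLTK keeps in `nltk.data.path`. To see which of those directories really contains a given resource, a small standard-library check is enough. This is a minimal sketch: the helper name `dirs_containing` is hypothetical, and the check for a `.zip` next to the folder rests on the assumption that a corpus may also be stored zipped.

```python
from pathlib import Path


def dirs_containing(search_dirs, resource="corpora/gutenberg"):
    """Return the search directories in which the resource exists."""
    hits = []
    for base in search_dirs:
        candidate = Path(base) / resource
        # Assumption: a corpus may be an unpacked folder or a .zip beside it.
        if candidate.exists() or candidate.with_suffix(".zip").exists():
            hits.append(str(base))
    return hits


# Check two of the locations from the error message above.
print(dirs_containing([Path.home() / "nltk_data", "C:/nltk_data"]))
```

An empty list means the folder is present but the resource is not at the expected relative path inside it.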

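Looking at the error more closely, the path that could not be found is "corpora/gutenberg". Searching nltk_data for "gutenberg" shows that it actually sits under "C:\nltk_data\packages\corpora", so the path contains an extra "packages" level. After moving the "C:\nltk_data\packages\corpora" folder to "C:\nltk_data\corpora" and trying again, the import succeeds and prints the following: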


*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

At this point, nltk_data has loaded successfully.
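
The manual move described above can also be scripted. The sketch below lifts every folder found under `packages` up one level. Note the assumption: the article only moved `corpora`, while this version moves all sibling folders as well, on the guess that they are misplaced in the same way. The function name `flatten_packages` is hypothetical.

```python
import shutil
from pathlib import Path


def flatten_packages(nltk_data_dir):
    """Move each entry of <nltk_data>/packages up into <nltk_data>."""
    root = Path(nltk_data_dir)
    packages = root / "packages"
    if not packages.is_dir():
        return []  # no extra "packages" level, nothing to fix

    moved = []
    for child in sorted(packages.iterdir()):
        target = root / child.name
        if target.exists():
            # Never overwrite something that is already in place.
            continue
        shutil.move(str(child), str(target))
        moved.append(child.name)
    return moved


# Example call, left commented out so nothing is moved by accident:
# flatten_packages(r"C:\nltk_data")
```

The function returns the names it moved, so a second run on the same folder returns an empty list.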
