How to install nltk on an Amazon EC2 instance?

Time: 2023-01-26 12:47:03

I am trying to install nltk on an Amazon EC2 instance using pip and a virtual environment. nltk is listed in the requirements.txt file that pip installs from. I also want to download the punkt tokenizer data, i.e. what nltk.download('punkt') fetches.
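
If the web app should fetch punkt by itself the first time it starts, a minimal sketch could look like the following. It is not part of the original setup: `ensure_punkt` is a hypothetical helper, and the default directory `/usr/local/share/nltk_data` is an assumption taken from nltk's "Searched in" list in the traceback below; it must be writable by the user the app runs as. `nltk.data.find`, `nltk.download` and `nltk.data.path` are real nltk names; they are parameters here only so the logic can be exercised without nltk or network access.

```python
# Hypothetical default: this directory appears in nltk's "Searched in" list
# in the traceback; the app's user must be able to write to it.
DEFAULT_NLTK_DATA_DIR = "/usr/local/share/nltk_data"


def ensure_punkt(target_dir=DEFAULT_NLTK_DATA_DIR, find=None, download=None,
                 data_path=None):
    """Return True if punkt is available, downloading it into target_dir if not.

    find, download and data_path default to nltk.data.find, nltk.download and
    nltk.data.path; they are injectable only so the logic can run without
    nltk or network access.
    """
    if find is None or download is None or data_path is None:
        import nltk  # imported lazily; not needed when all three are supplied

        find = find or nltk.data.find
        download = download or nltk.download
        data_path = nltk.data.path if data_path is None else data_path

    # Make sure nltk searches the directory we may download into.
    if target_dir not in data_path:
        data_path.append(target_dir)

    try:
        find("tokenizers/punkt")
        return True  # already present somewhere on the search path
    except LookupError:
        pass

    # nltk.download reports success as True and failure as False.
    return bool(download("punkt", download_dir=target_dir, quiet=True))


if __name__ == "__main__":
    print("punkt available:", ensure_punkt())
```

Calling this once at import time of the WSGI module, before sumy builds its `Tokenizer`, would avoid the `LookupError`, at the cost of a network download on a fresh instance.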

Here's what my makefile looks like:

INSTANCE_NAME=toolsapp
ENV=env
LOAD_ENV=source $(ENV)/bin/activate
VIRTUALENV_BIN:=$(shell which virtualenv || echo /usr/local/bin/virtualenv )
EB_BIN:=$(shell $(LOAD_ENV) && which eb || echo /usr/local/bin/eb )
AWS_BIN:=$(shell $(LOAD_ENV) && which aws || echo /usr/local/bin/aws )

default: dev
.PHONY: default

# Ensure virtualenv is installed
$(VIRTUALENV_BIN):
	pip3 install virtualenv

# Create a virtualenv folder called env/ and install dependencies
$(ENV): $(VIRTUALENV_BIN)
	virtualenv -p python3 $(ENV)
	$(LOAD_ENV) && pip install --upgrade pip && pip install -r requirements.txt

But this is the error I get when I try to run the webapp:

[Tue Jul 24 14:46:13.798460 2018] [:error] [pid 22801] model type: 9
[Tue Jul 24 14:46:13.801193 2018] [:error] [pid 22801] Traceback (most recent call last):
[Tue Jul 24 14:46:13.801227 2018] [:error] [pid 22801]   File "/opt/python/run/venv/local/lib/python3.6/site-packages/sumy/nlp/tokenizers.py", line 79, in _get_sentence_tokenizer
[Tue Jul 24 14:46:13.801232 2018] [:error] [pid 22801]     return nltk.data.load(path)
[Tue Jul 24 14:46:13.801239 2018] [:error] [pid 22801]   File "/opt/python/run/venv/local/lib/python3.6/site-packages/nltk/data.py", line 836, in load
[Tue Jul 24 14:46:13.801242 2018] [:error] [pid 22801]     opened_resource = _open(resource_url)
[Tue Jul 24 14:46:13.801247 2018] [:error] [pid 22801]   File "/opt/python/run/venv/local/lib/python3.6/site-packages/nltk/data.py", line 954, in _open
[Tue Jul 24 14:46:13.801251 2018] [:error] [pid 22801]     return find(path_, path + ['']).open()
[Tue Jul 24 14:46:13.801255 2018] [:error] [pid 22801]   File "/opt/python/run/venv/local/lib/python3.6/site-packages/nltk/data.py", line 675, in find
[Tue Jul 24 14:46:13.801259 2018] [:error] [pid 22801]     raise LookupError(resource_not_found)
[Tue Jul 24 14:46:13.801274 2018] [:error] [pid 22801] LookupError: 
[Tue Jul 24 14:46:13.801278 2018] [:error] [pid 22801] **********************************************************************
[Tue Jul 24 14:46:13.801280 2018] [:error] [pid 22801]   Resource punkt not found.
[Tue Jul 24 14:46:13.801283 2018] [:error] [pid 22801]   Please use the NLTK Downloader to obtain the resource:
[Tue Jul 24 14:46:13.801285 2018] [:error] [pid 22801] 
[Tue Jul 24 14:46:13.801288 2018] [:error] [pid 22801]   >>> import nltk
[Tue Jul 24 14:46:13.801299 2018] [:error] [pid 22801]   >>> nltk.download('punkt')
[Tue Jul 24 14:46:13.801301 2018] [:error] [pid 22801]
[Tue Jul 24 14:46:13.801303 2018] [:error] [pid 22801]   Searched in:
[Tue Jul 24 14:46:13.801305 2018] [:error] [pid 22801]     - '/home/wsgi/nltk_data'
[Tue Jul 24 14:46:13.801307 2018] [:error] [pid 22801]     - '/usr/share/nltk_data'
[Tue Jul 24 14:46:13.801310 2018] [:error] [pid 22801]     - '/usr/local/share/nltk_data'
[Tue Jul 24 14:46:13.801312 2018] [:error] [pid 22801]     - '/usr/lib/nltk_data'
[Tue Jul 24 14:46:13.801314 2018] [:error] [pid 22801]     - '/usr/local/lib/nltk_data'
[Tue Jul 24 14:46:13.801316 2018] [:error] [pid 22801]     - '/opt/python/run/venv/nltk_data'
[Tue Jul 24 14:46:13.801318 2018] [:error] [pid 22801]     - '/opt/python/run/venv/share/nltk_data'
[Tue Jul 24 14:46:13.801320 2018] [:error] [pid 22801]     - '/opt/python/run/venv/lib/nltk_data'
[Tue Jul 24 14:46:13.801322 2018] [:error] [pid 22801]     - ''
[Tue Jul 24 14:46:13.801325 2018] [:error] [pid 22801] **********************************************************************
[Tue Jul 24 14:46:13.801327 2018] [:error] [pid 22801] 
[Tue Jul 24 14:46:13.801342 2018] [:error] [pid 22801] 
[Tue Jul 24 14:46:13.801345 2018] [:error] [pid 22801] During handling of the above exception, another exception occurred:
[Tue Jul 24 14:46:13.801348 2018] [:error] [pid 22801] 
[Tue Jul 24 14:46:13.801352 2018] [:error] [pid 22801] Traceback (most recent call last):
[Tue Jul 24 14:46:13.801369 2018] [:error] [pid 22801]   File "/opt/python/current/app/summarization.py", line 40, in reform
[Tue Jul 24 14:46:13.801372 2018] [:error] [pid 22801]     parser = HtmlParser.from_url(inputFile, Tokenizer("english"))
[Tue Jul 24 14:46:13.801377 2018] [:error] [pid 22801]   File "/opt/python/run/venv/local/lib/python3.6/site-packages/sumy/nlp/tokenizers.py", line 67, in __init__
[Tue Jul 24 14:46:13.801380 2018] [:error] [pid 22801]     self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
[Tue Jul 24 14:46:13.801385 2018] [:error] [pid 22801]   File "/opt/python/run/venv/local/lib/python3.6/site-packages/sumy/nlp/tokenizers.py", line 82, in _get_sentence_tokenizer
[Tue Jul 24 14:46:13.801388 2018] [:error] [pid 22801]     "NLTK tokenizers are missing. Download them by following command: "
[Tue Jul 24 14:46:13.801397 2018] [:error] [pid 22801] LookupError: NLTK tokenizers are missing. Download them by following command: python -c "import nltk; nltk.download('punkt')"
[Tue Jul 24 14:46:13.801414 2018] [:error] [pid 22801] unable to read https://webapp-input.s3.amazonaws.com/43629be5a07a43029f359abd1340ad08.input.txt?AWSAccessKeyId=ASIAI4UVLLOHEUNNKWNQ&Signature=NBbRoAhFB5mP3SNi3jR6rDcz8LY%3D&x-amz-security-token=FQoDYXdzEHgaDKt%2FFOhHU4UQgQLgdCK3A3amyJ9mziqpLJ01DR5yYqszDzAfi8e9B9Uj1xw9pJw4yDqyF5KFtul7D7o6Xm2qX%2FQvSb9tbnMoW2r8Pur%2FbhlJnhfKFUriT6ggk0THgAgXQWQ8pDOIIMOjn7XZLtFvTfWttukS40VC17geWmEod%2FsO9IZh3LyhN46V%2FdDQo21YZfZFRoQbFHgTd823mnnTLwNoZs51B%2BluwOJ70U22P0K%2FdhzFGVEEGj%2FDiT1oC%2B1aGHQoK4h9JC45%2BqdetOoxZZsdc2z8hxFPQbTW59AT2L4PC2icjkzjJ9prhJvzU25iuZeYoO5tC3SZ1fpNtJ5QCiBYdK1R1c0TRygeOGbev24j5qlTb5DLG4HknH47S6XBMKE%2Fs4EyEo2zNbu%2Fg7QhcebwjJ9%2FMcCpmbMV60H2cj2zMxk8gzV83E%2B19CODShcUQ7WSmNcXj5dyupEJ8SCHRBABBXhOZ8wzMLwU%2BgJy59DTzXl7ZZH1t0LzaOQLPSvjdqu%2FPNGhn1M7K4vVNti93hOCWsK0T3tltg9gfIjFF90gHSawS708WvcyUTmN%2BLD7m3cMJ9B9g9adISSU5d4NS7ohPyX%2FEQMom%2FDc2gU%3D&Expires=1532447173.
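
The "Searched in" list is where nltk looked for the resource. To check an instance by hand, a small stdlib-only sketch like this reports which of those directories (if any) holds the punkt files. It assumes sumy's English tokenizer needs `tokenizers/punkt/english.pickle`, the unzipped layout the nltk downloader produces; unlike nltk itself it does not look inside zip files, and `find_punkt` is a hypothetical helper name.

```python
from pathlib import Path

# Directories copied from the "Searched in:" list in the traceback; the
# trailing '' entry (the current directory) is left out.
SEARCHED_DIRS = [
    "/home/wsgi/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "/opt/python/run/venv/nltk_data",
    "/opt/python/run/venv/share/nltk_data",
    "/opt/python/run/venv/lib/nltk_data",
]

# Assumption: the English punkt model sits at this path under an nltk_data dir.
RESOURCE = Path("tokenizers") / "punkt" / "english.pickle"


def find_punkt(dirs, resource=RESOURCE):
    """Return the first directory in dirs that contains resource, or None."""
    for d in dirs:
        if (Path(d) / resource).is_file():
            return d
    return None


if __name__ == "__main__":
    hit = find_punkt(SEARCHED_DIRS)
    print(hit or "punkt not found in any searched directory")
```

nltk also puts any directories listed in the `NLTK_DATA` environment variable at the front of this search path when it is imported, so setting `NLTK_DATA` for the app process is another way to point it at wherever the data was downloaded.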

1 answer

#1

After installing nltk, add commands that download the required models:

python3 -m nltk.downloader punkt
python3 -m nltk.downloader stopwords
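
The two commands above can also live in a small build script. The sketch below is one way to do it, not the answerer's code: it passes the downloader's `-d` option so the files land in `/usr/local/share/nltk_data`, a directory from the "Searched in" list in the question's traceback (an assumption; any listed directory you can write to would do), and then checks that punkt actually arrived. `downloader_command` and `punkt_installed` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: a directory from nltk's "Searched in" list that this step can write to.
TARGET_DIR = "/usr/local/share/nltk_data"
PACKAGES = ["punkt", "stopwords"]


def downloader_command(packages, target_dir, python=sys.executable):
    """Build the argv for `python -m nltk.downloader -d DIR PKG ...`."""
    return [python, "-m", "nltk.downloader", "-d", target_dir, *packages]


def punkt_installed(target_dir):
    """True if the unzipped punkt directory exists under target_dir."""
    return (Path(target_dir) / "tokenizers" / "punkt").is_dir()


def main():
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(downloader_command(PACKAGES, TARGET_DIR), check=True)
    # The downloader may not report every failure through its exit status,
    # so also look for the files and fail the build if punkt is missing.
    if not punkt_installed(TARGET_DIR):
        sys.exit("punkt was not installed under " + TARGET_DIR)


if __name__ == "__main__":
    main()
```

Run it with the virtualenv's interpreter after `pip install -r requirements.txt`, for example as an extra `$(LOAD_ENV) && python download_nltk_data.py` line in the Makefile's `$(ENV)` recipe (the file name is hypothetical). If the app is deployed with Elastic Beanstalk, as the `/opt/python/run/venv` paths and the `eb` binary in the Makefile suggest, the Makefile runs on the development machine rather than on the instance, so the script would instead have to run during deployment, for example from a `container_commands` entry in an `.ebextensions` config file.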