While working through the section "Training Classifier-Based Chunkers", I ran into a problem when testing the code.
class ConsecutiveNPChunkTagger(nltk.TaggerI):
    def __init__(self, train_sents):
        train_set = []
        for tagged_sent in train_sents:
            untagged_sent = nltk.tag.untag(tagged_sent)
            history = []
            for i, (word, tag) in enumerate(tagged_sent):
                featureset = npchunk_features(untagged_sent, i, history)
                train_set.append( (featureset, tag) )
                history.append(tag)
        self.classifier = nltk.MaxentClassifier.train(train_set, algorithm='megam', trace=0)

    def tag(self, sentence):
        history = []
        for i, word in enumerate(sentence):
            featureset = npchunk_features(sentence, i, history)
            tag = self.classifier.classify(featureset)
            history.append(tag)
        return zip(sentence, history)
class ConsecutiveNPChunker(nltk.ChunkParserI):
    def __init__(self, train_sents):
        tagged_sents = [[((w,t),c) for (w,t,c) in
                         nltk.chunk.tree2conlltags(sent)] for sent in train_sents]
        self.tagger = ConsecutiveNPChunkTagger(tagged_sents)

    def parse(self, sentence):
        tagged_sents = self.tagger.tag(sentence)
        conlltags = [(w,t,c) for ((w,t),c) in tagged_sents]
        return nltk.chunk.conlltags2tree(conlltags)
>>> def npchunk_features(sentence, i, history):
...     word, pos = sentence[i]
...     return {"pos": pos}
>>> chunker = ConsecutiveNPChunker(train_sents)
>>> print chunker.evaluate(test_sents)
The code above is what the book provides. The problem is that when I ran
chunker = ConsecutiveNPChunker(train_sents), it did not execute as expected and raised an error instead.
Traceback (most recent call last):
  File "<pyshell#119>", line 1, in <module>
    chunker = ConsecutiveNPChunker(train_sents)
  File "<pyshell#118>", line 5, in __init__
    self.tagger = ConsecutiveNPChunkTagger(tagged_sents)
  File "<pyshell#116>", line 11, in __init__
    self.classifier = nltk.MaxentClassifier.train(train_set, algorithm='megam', trace=0)
  File "D:\SpecialSoftware\Python25\Lib\site-packages\nltk\classify\maxent.py", line 319, in train
    gaussian_prior_sigma, **cutoffs)
  File "D:\SpecialSoftware\Python25\Lib\site-packages\nltk\classify\maxent.py", line 1522, in train_maxent_classifier_with_megam
    stdout = call_megam(options)
  File "D:\SpecialSoftware\Python25\Lib\site-packages\nltk\classify\megam.py", line 163, in call_megam
    config_megam()
  File "D:\SpecialSoftware\Python25\Lib\site-packages\nltk\classify\megam.py", line 59, in config_megam
    url='http://www.cs.utah.edu/~hal/megam/')
  File "D:\SpecialSoftware\Python25\Lib\site-packages\nltk\internals.py", line 528, in find_binary
    url, verbose)
  File "D:\SpecialSoftware\Python25\Lib\site-packages\nltk\internals.py", line 512, in find_file
    raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
LookupError:

===========================================================================
  NLTK was unable to find the megam file!
  Use software specific configuration paramaters or set the MEGAM environment variable.

  For more information, on megam, see:
    <http://www.cs.utah.edu/~hal/megam/>
===========================================================================
The error message does give a hint, but it is not complete.
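To make the hint about the MEGAM environment variable more concrete, here is a rough sketch (written for Python 3, not the Python 2.5 used above) that checks whether a megam executable is reachable before training is attempted. The function name `find_megam` is hypothetical, and the list of executable names is my assumption about what NLTK searches for; the `.exe` variants are added for Windows.

```python
import os

# Assumption: executable names NLTK's config_megam() is believed to look for.
CANDIDATE_NAMES = ["megam.opt", "megam", "megam_686", "megam_i686.opt"]


def find_megam(environ=None):
    """Return the path of a megam executable, or None if none is found.

    The MEGAM environment variable is checked first (it may name the
    binary itself or the directory holding it), then each PATH entry.
    """
    environ = os.environ if environ is None else environ
    # Also try Windows-style names with an ".exe" suffix.
    names = CANDIDATE_NAMES + [n + ".exe" for n in CANDIDATE_NAMES]

    search_dirs = []
    megam_var = environ.get("MEGAM")
    if megam_var:
        if os.path.isfile(megam_var):
            return megam_var
        search_dirs.append(megam_var)
    search_dirs.extend(p for p in environ.get("PATH", "").split(os.pathsep) if p)

    for directory in search_dirs:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    path = find_megam()
    if path is None:
        print("megam not found; build or download it, then set MEGAM")
    else:
        print("found megam at", path)
        # With NLTK installed, the path can be passed to it explicitly:
        #   import nltk
        #   nltk.classify.megam.config_megam(path)
```

If this reports that megam is missing, the LookupError above is expected, and the binary has to be obtained first.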
After searching on Google, I found some leads toward a solution.
My operating system is Windows 8.
The official NLTK website gives a hint:
https://sites.google.com/site/naturallanguagetoolkit/download
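Megam is an optional package that can be installed later, when it is actually needed. The download URL is: MegaM: http://hal3.name/megam/megam_src.tgz. A direct download did not work for me; I only managed to get the file with the Xunlei (Thunder) download manager.
After opening the archive, however, I found that it contains only source files. It does include a README file that explains how to use it, and from that I learned that something else has to be installed first.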
The README says: ocaml (http://caml.inria.fr)
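Opening the archive and reading its README can also be done from Python without unpacking anything. This is only a small sketch using the standard `tarfile` module (Python 3); the helper names `list_archive` and `read_readme` are hypothetical, and it assumes the download was saved locally, for example as megam_src.tgz.

```python
import tarfile


def list_archive(path):
    """Return the member names of a .tgz archive without unpacking it."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


def read_readme(path):
    """Return the text of the first file whose name starts with README, or None."""
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            base = member.name.rsplit("/", 1)[-1]
            if member.isfile() and base.upper().startswith("README"):
                data = archive.extractfile(member).read()
                return data.decode("utf-8", "replace")
    return None
```

Calling `list_archive("megam_src.tgz")` should show the source files, and `read_readme("megam_src.tgz")` should return the build instructions, assuming the file is in the current directory under that name.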
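So I had to download the compiler for these source files from that site, and I picked the version that matches my system. Judging from the instructions, the installation takes some figuring out. During installation it was flagged as a virus, but I chose to trust it anyway, since otherwise I could not continue.
[OCaml is installing right now. I selected the full installation, which is probably not necessary in practice. Once it finishes, I will keep working out how to solve this problem.]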