File name: Ngram Statistics Package (Text-NSP)
File size: 955KB
File format: GZ
Last updated: 2012-12-19 03:26:16
Ngram
The Ngram Statistics Package (NSP) is a suite of programs for analyzing Ngrams in text files. It consists of two core programs.

count.pl takes plain text files as input and generates a list of all the Ngrams that occur in them. The Ngrams are output with their frequencies, in descending order of frequency.

statistic.pl takes as input a list of Ngrams with their frequencies (in the format produced by count.pl) and applies a user-selected statistical measure of association to compute a score for each Ngram. The Ngrams are output with their scores, in descending order of score. The score computed for an Ngram can be used to decide whether there is enough evidence to reject the null hypothesis (that the Ngram is not a collocation) for that Ngram.
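NSP itself is written in Perl and supports many options and measures. The following is only a minimal Python sketch of the same two-step idea, not NSP's implementation. It assumes bigrams only, lowercase whitespace tokenization, and the log-likelihood ratio (G²) as the example association measure. The function names `count_bigrams`, `log_likelihood` and `score_bigrams`, and the 0.05 significance level, are hypothetical choices for illustration.

```python
import math
from collections import Counter


def count_bigrams(text):
    """Count adjacent word pairs; return (bigram, freq) pairs, most frequent first.

    Assumption: lowercase whitespace tokenization (illustrative only).
    """
    tokens = text.lower().split()
    counts = Counter(zip(tokens, tokens[1:]))
    # Descending frequency; ties broken alphabetically for deterministic output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def log_likelihood(n11, n1p, np1, npp):
    """Log-likelihood ratio (G^2) for one bigram's 2x2 contingency table."""
    observed = [
        [n11, n1p - n11],                   # w1 followed by w2 / by another word
        [np1 - n11, npp - n1p - np1 + n11], # another word before w2 / neither
    ]
    rows = [n1p, npp - n1p]
    cols = [np1, npp - np1]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 to the sum
                expected = rows[i] * cols[j] / npp  # count expected under independence
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def score_bigrams(bigram_counts):
    """Score a frequency list (as returned by count_bigrams); highest score first."""
    npp = sum(freq for _, freq in bigram_counts)  # total bigram tokens
    n1p = Counter()  # how often each word occurs first in a bigram
    np1 = Counter()  # how often each word occurs second in a bigram
    for (w1, w2), freq in bigram_counts:
        n1p[w1] += freq
        np1[w2] += freq
    scored = [
        ((w1, w2), log_likelihood(n11, n1p[w1], np1[w2], npp))
        for (w1, w2), n11 in bigram_counts
    ]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


# Chi-square critical value for 1 degree of freedom at alpha = 0.05 (assumed level).
CRITICAL_95 = 3.841

if __name__ == "__main__":
    counts = count_bigrams("the cat sat on the mat the cat ran")
    for bigram, score in score_bigrams(counts):
        verdict = "reject null" if score > CRITICAL_95 else "keep null"
        print(bigram, round(score, 3), verdict)
```

As with the two NSP programs, the scoring step consumes only the frequency list produced by the counting step. It derives the marginal counts (how often each word starts or ends a bigram) from that list. G² is compared against a chi-square critical value, which is how a score can be turned into a decision about the null hypothesis of independence. On a toy input like this one the counts are far too small for the test to be meaningful.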