File name: nltk_hadoop: NLTK on Hadoop
File size: 598KB
File format: ZIP
Last updated: 2024-06-15 08:08:02
Python
Setup

Materialize an NLTK corpus
Find a corpus, e.g. inaugural:
    python materialize_nltk_corpus.py inaugural

Set the appropriate environment variables:
    source ./settings.sh

Or simply set the variables manually:
    export HADOOP_VERSION=      # the version of hadoop you are using, e.g. 2.5.1
    export AVRO_VERSION=        # if you are using avro, the version, e.g. 1.7.7
    export HADOOP_HOME=         # the location of your hadoop installation
    export RELATIVE_PATH_JAR=   # location of hadoop streaming jar in HA
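Before launching any job, it can help to confirm that the environment variables named in the setup description are actually set. The following is a minimal sketch, not part of the repository: the helper `missing_settings` is hypothetical, and treating `AVRO_VERSION` as optional is an assumption based on the "if you are using avro" comment. The setup text is truncated, so the project may expect further variables that are not listed here.

```python
import os

# Variable names taken from the setup description; the list may be
# incomplete because the description is truncated.
REQUIRED_VARS = ("HADOOP_VERSION", "HADOOP_HOME", "RELATIVE_PATH_JAR")
# Assumption: AVRO_VERSION only matters when Avro is in use.
OPTIONAL_VARS = ("AVRO_VERSION",)


def missing_settings(env, use_avro=False):
    """Return the expected variable names that are unset or blank in env."""
    names = list(REQUIRED_VARS)
    if use_avro:
        names.extend(OPTIONAL_VARS)
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Unset variables: " + ", ".join(missing))
    else:
        print("All expected variables are set.")
```

Running the script after `source ./settings.sh` (with the variables exported) reports any names that are still empty. Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.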
[File preview]:
nltk_hadoop-master
----Patents_and_NLP.ipynb(5KB)
----cos_sim_map.py(1001B)
----map_reduce_utils.py(11KB)
----cos_sim_red.py(821B)
----create_db.py(4KB)
----word_freq_map.py(1KB)
----slurm_hadoop_tfidf.sb(615B)
----hadoop-tag.sh(337B)
----normalize_mapper.py(438B)
----cpp()
--------run-with-classpath.sh(144B)
--------cosine_similarity.cpp(12KB)
--------Makefile(652B)
--------HDFS.hpp(3KB)
--------srun-with-classpath.sh(147B)
----materialize_nltk_corpus.py(748B)
----word_freq_red.py(751B)
----mapred_tfidf.py(8KB)
----lib()
--------avro-mapred-1.7.7.jar(176KB)
--------avro-1.7.7.jar(426KB)
----compare_texts.py(2KB)
----word_count_red.py(912B)
----run.sh(202B)
----corp_freq_red.py(1012B)
----stopwords.txt(9KB)
----query_results.py(5KB)
----__init__.py(0B)
----mrjob()
--------PatentMap.py(831B)
--------BigramMap.py(1KB)
--------CosineSimilarityTooBig.py(2KB)
--------run.sh(2KB)
--------PatentCount.py(612B)
--------TFIDF.py(3KB)
--------CosineSimilarity3.py(2KB)
--------AvroReader.py(1KB)
--------Compare.py(961B)
----.travis.yml(228B)
----settings.sh(475B)
----corpus_size_map.py(343B)
----hadoop_utils.py(3KB)
----contents_mapper.py(2KB)
----radish()
--------tfidf.myl(1KB)
--------README.md(93B)
--------catalog.py(238B)
----README.md(6KB)
----invoke.sh(108B)
----corp_freq_map.py(813B)
----word_count_map.py(745B)
----corpus_size_red.py(419B)
----word_join_map.py(630B)
----tf_idf_map.py(1KB)
----tests()
--------unit_tests()
--------fixtures()
--------__init__.py(1B)
--------test_utils.py(9KB)
----.gitignore(14B)
----normalize_reducer.py(1KB)
----word_join_red.py(1KB)
----claims_mapper.py(1KB)