File name: CS544_Project
File size: 114.9MB
File format: ZIP
Last updated: 2024-06-28 10:44:12
Python
CS544_Project
10k_data.txt structure: { 'id' : ['title', 'body', 'code', 'tags'] }
1k_test_data structure: { 'id' : ['title', 'body', 'code', 'tags'] }
e.g. {"3911035": ["Should I convert RAW to jpeg before making an HDR?", "I've been going back and forth with this. ...", "", "raw hdr jpeg"] }
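The record layout described above can be read with a few lines of Python. This is a minimal sketch, not the project's own loader: it assumes the text is valid JSON (as the example record suggests) and that tags are separated by spaces. The names `Post` and `parse_records` are hypothetical and do not come from the repository.

```python
import json
from typing import List, NamedTuple


class Post(NamedTuple):
    """One record: id -> [title, body, code, tags] (hypothetical helper type)."""
    post_id: str
    title: str
    body: str
    code: str
    tags: List[str]


def parse_records(text: str) -> List[Post]:
    """Parse a JSON object that maps each id to [title, body, code, tags].

    Assumption: the tags field is a single space-separated string,
    as in the example record ("raw hdr jpeg").
    """
    raw = json.loads(text)
    posts = []
    for post_id, fields in raw.items():
        title, body, code, tags = fields
        posts.append(Post(post_id, title, body, code, tags.split()))
    return posts


# Build the sample from the example record shown in the description.
sample = json.dumps({
    "3911035": [
        "Should I convert RAW to jpeg before making an HDR?",
        "I've been going back and forth with this. ...",
        "",
        "raw hdr jpeg",
    ]
})

posts = parse_records(sample)
print(posts[0].post_id, posts[0].tags)
```

If the actual files store one JSON object per line rather than a single object, the same function could be applied line by line; the listing does not say which layout is used.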
[File preview]:
CS544_Project-master
----1k_test_data.txt(865KB)
----1000()
--------NN_100k_train.txt(49.86MB)
--------25k_test.txt(21.51MB)
--------100k_train.txt(86.02MB)
--------NN_25k_test.txt(12.44MB)
----10tags_test_dataset.txt(8.99MB)
----BogOfWords.py(3KB)
----getPython.sh(542B)
----file_cov.txt(568KB)
----learner_classifier.py(5KB)
----tfidf_learner_classifier.py(5KB)
----coverage.py(2KB)
----Tag_Extract.py(421B)
----tagDistance_BagOfWords.py(2KB)
----tagDistance.py(4KB)
----proba.py(4KB)
----data_desc.txt(2KB)
----multiprocess.py(215B)
----10k_data.txt(8.71MB)
----result_1.txt(9KB)
----result()
--------100tags()
--------10tags()
----plot()
--------tag_cdf.png(29KB)
--------figure_2.png(43KB)
--------test_dist.txt(1KB)
--------popular_tag_distribution.png(47KB)
--------tag_cdf_log.png(30KB)
--------tag_cdf.py(671B)
--------plot_tag_dist.py(1KB)
--------original_dist.txt(1KB)
--------plot.py(1KB)
--------figure_1.png(37KB)
--------train_dist.txt(1KB)
----dup.py(585B)
----word2vec()
--------RandomForest.py(6KB)
--------Word2Vec_AverageVectors.py(6KB)
--------Word2Vec_BagOfCentroids.py(5KB)
--------RandomForest_output(97KB)
--------BagOfWords.py(4KB)
--------KaggleWord2VecUtility.py(2KB)
----tag_analysis.py(898B)
----preprocess.py(3KB)
----cdf.txt(1.79MB)
----fscore.py(2KB)
----tag_count.txt(688KB)
----OnevsRest.py(3KB)
----tag_2000.txt(20KB)
----README.md(389B)
----data_clean()
--------generate_test_set_2000_tags.py(3KB)
--------generate_10k_data.py(2KB)
--------file.py(809B)
--------get_title.py(2KB)
--------segment.py(2KB)
--------replace.py(362B)
----10tags_dataset.txt(8.99MB)
----tagFeature.py(2KB)
----500()
--------50k_train.txt(42.95MB)
--------12k_test.txt(10.77MB)
--------NN_50k_train.txt(25.03MB)
--------NN_12k_test.txt(6.25MB)
----200()
--------20k_train.txt(17.22MB)
--------NN_20k_train.txt(9.97MB)
--------5k_test.txt(4.19MB)
--------NN_5k_test.txt(2.4MB)
----file_coverage.txt(2.19MB)