File name: twitter_hate_speech
File size: 32.06MB
File format: ZIP
Last updated: 2024-03-26 08:50:52
JupyterNotebook
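twitter_hate_speech is a completed end-to-end natural language processing project that distinguishes targeted hate speech from speech that merely contains offensive words. It covers data cleaning, feature extraction (bag of words, n-grams, TF-IDF BoW, word2vec embeddings), feature selection, and modeling. word2vec CBOW and Skip-gram are used for unsupervised learning. The project also builds a multilayer perceptron for transfer learning and a label propagation algorithm for semi-supervised learning.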
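As a rough illustration of the bag-of-words, n-gram, and TF-IDF features named above, here is a minimal standard-library sketch. It is not the project's code: the function names (`ngrams`, `bag_of_ngrams`, `tfidf`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the toy sentences are all assumptions made for the example. The notebooks themselves may well use a library vectorizer with different tokenization and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count all 1-grams up to max_n-grams in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def tfidf(docs, max_n=2):
    """Compute one {term: weight} dict per document.

    Assumed weighting: tf = count / total terms in the document,
    idf = log(N / df). Libraries often use smoothed variants instead.
    """
    bags = [bag_of_ngrams(d, max_n) for d in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    n_docs = len(docs)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights


# Toy, made-up sentences (not taken from labeled_data.csv).
docs = [
    "this movie is awful",
    "this movie is great",
    "awful awful service",
]
bags = [bag_of_ngrams(d) for d in docs]
weights = tfidf(docs)
```

In this sketch a term that occurs in fewer documents receives a larger weight than an equally frequent term that occurs in many, which is the property that makes TF-IDF useful for separating distinctive vocabulary from common words.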
[File preview]:
twitter_hate_speech-master
----.gitignore(50B)
----deliverables()
--------Milestone_3.pdf(1.02MB)
--------feature_selection_colin_-_Jupyter_Notebook.pdf(1.49MB)
--------feature_selection_colin_-_Jupyter_Notebook-merged.pdf(1.71MB)
--------Week2_Features_Sean - Jupyter Notebook.pdf(1024KB)
--------Week3 ML Implementation - Logistic regression + full dataset_Sean - Jupyter Notebook.pdf(908KB)
--------Hate_Speech_-_Milestone_Four.pptx(613KB)
--------Hate_Speech_-_Milestone_Two.pptx(1.77MB)
--------Milestone_3-merged.pdf(1.34MB)
--------Feature Engineering on Text Data - SZ.pdf(574KB)
--------Week2_Features_Sean - with supervised learning W2v.pdf(1.49MB)
--------Hate_Speech_-_Milestone_Two.pdf(2.07MB)
--------Feature Engineering Text Data - Traditional Strategies_SZ.pdf(590KB)
--------Week1_Problem,Dataset, Exploratory Data Analysis-checkpoint - Jupyter Notebook.pdf(985KB)
--------Hate_Speech_-_Milestone_One.pptx(809KB)
----data()
--------.ipynb_checkpoints()
--------labeled_data.csv(2.43MB)
--------Automated Hate Speech Detection and the Problem of Offensive Language Python 3.6.ipynb(53KB)
--------labeled_data.p(3.18MB)
--------Automated Hate Speech Detection and the Problem of Offensive Language.ipynb(82KB)
--------readme.md(642B)
----semi_supervised_learning.ipynb(38KB)
----.DS_Store(6KB)
----sample learning curve.png(27KB)
----transfer_learning()
--------tfidf_df.csv(48.64MB)
--------.ipynb_checkpoints()
--------labeled_data.csv(2.43MB)
--------Independent_Transfer_Learning_SZ.ipynb(34KB)
--------trained_embedding.csv(6.03MB)
--------df_clean.csv(1MB)
--------twitter_feature_array.csv(5.09MB)
----notebooks()
--------Final_Week_Feature_Selection_Colin.ipynb(295KB)
--------tfidf_df.csv(48.64MB)
--------.ipynb_checkpoints()
--------RF and LR on Balanced Set.ipynb(863KB)
--------labeled_data.csv(2.43MB)
--------Week3 ML Implementation - Logistic regression + full dataset_Sean.ipynb(456KB)
--------Week2_Feature Selection-checkpoint.ipynb(74KB)
--------feature.csv(5.49MB)
--------Create BOW.ipynb(18KB)
--------Full dataset cleaned and TFIDF.ipynb(34KB)
--------Final Feature selection _ Sean.ipynb(46KB)
--------Untitled.ipynb(72B)
--------univariate_dataset.csv(10.07MB)
--------Week3 Training curve (separate for memory reasons).ipynb(201KB)
--------trained_embedding.csv(6.03MB)
--------Week3 Training curve - Random Forest.ipynb(151KB)
--------Week2_Features_Sean.ipynb(329KB)
--------Week3 Stacking.ipynb(36KB)
--------twitter_hate.db(8.33MB)
--------feature_selection_colin.ipynb(3.51MB)
--------Week1_Problem,Dataset, Exploratory Data Analysis-checkpoint.ipynb(912KB)
--------Univariate Data Exploration.ipynb(82KB)
--------df_clean.csv(969KB)
--------Final_Week_Feature_Selection_Colin-checkpoint.ipynb(295KB)
--------twitter_feature_array.csv(5.09MB)
--------Final Modeling.ipynb(516KB)
--------seanz.db(0B)
--------Random Forest Full Dataset.ipynb(191KB)
----README.md(473B)
----individual_project_transfer_learning.ipynb(168KB)
----decisionregions.png(38KB)