File name: Mastering-Feature-Engineering-Principles-Techniques.pdf
File size: 3.63MB
File format: PDF
Last updated: 2023-06-16 08:51:44
ML, Feature, data
The Machine Learning Pipeline
Data
Tasks
Models
Features

2. Basic Feature Engineering for Text Data: Flatten and Filter
Turning Natural Text into Flat Vectors
Bag-of-words
Implementing bag-of-words: parsing and tokenization
Bag-of-N-Grams
Collocation Extraction for Phrase Detection
Quick summary
Filtering for Cleaner Features
Stopwords
Frequency-based filtering
Stemming
Summary

3. The Effects of Feature Scaling: From Bag-of-Words to Tf-Idf
Tf-Idf: A Simple Twist on Bag-of-Words
Feature Scaling
Min-max scaling
Standardization (variance scaling)
L2 normalization
Putting it to the Test
Creating a classification dataset
Implementing tf-idf and feature scaling
First try: plain logistic regression
Second try: logistic regression with regularization
Discussion of results
Deep Dive: What is Happening?