File name: feature-engineering-for-machine-learning: code repository for the online course "Feature Engineering for Machine Learning"
File size: 5.76MB
File format: ZIP
Updated: 2024-05-31 00:10:07
JupyterNotebook
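Feature Engineering for Machine Learning - Code Repository
Code repository for the online course. Published November 2017, last updated December 2020.
Table of contents:
Introduction (variable types): numerical variables (discrete and continuous); categorical variables (nominal and ordinal); date-time variables; mixed variables (strings and numbers).
Variable characteristics: missing data; cardinality; category frequency; distribution; outliers; magnitude.
Missing data imputation: mean and median imputation; arbitrary value imputation; end-of-tail imputation; frequent category imputation; adding a "Missing" string; random sample imputation; adding a missing indicator; imputation with Scikit-learn; imputation with Feature-engine.
Multivariate imputation: MICE.
Categorical variable encoding: one-hot encoding (simple and frequent categories); ordinal encoding (arbitrary and ordered); target mean encoding; weight of evidence; probability ratio; rare label encoding; encoding with Scikit-learn; encoding with Feature-engine; encoding with Category Encoders.
Variable transformation: log, power and reciprocal; Box-Cox; Yeo-Johnson; transformation with Scikit-learn; transformation with Feature-engine.
Discretisation: arbitrary; equal-frequency discretisation; equal-width discretisation; K-means discretisation; decision tree discretisation; using S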
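Several of the techniques in the table of contents can be illustrated in a few lines of pandas. The sketch below is not taken from the course notebooks: the toy DataFrame, its column names, the 0.2 rare-label threshold and the choice of 3 bins are all hypothetical. It shows adding a missing indicator, median imputation, rare label grouping and equal-width discretisation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not taken from the course notebooks.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0, np.nan, 41.0],
    "city": ["London", "Paris", "London", "Oslo", "London", "Paris"],
})

# Missing indicator: flag rows where 'age' was missing, before imputing.
df["age_na"] = df["age"].isna().astype(int)

# Median imputation: replace missing values with the median of observed values.
# In a real project the median should be learned from the training set only.
age_median = df["age"].median()
df["age"] = df["age"].fillna(age_median)

# Rare label encoding: group categories whose relative frequency is below a
# threshold (0.2 here, an arbitrary assumption) into a single "Rare" label.
freq = df["city"].value_counts(normalize=True)
frequent = freq[freq >= 0.2].index
df["city"] = df["city"].where(df["city"].isin(frequent), "Rare")

# Equal-width discretisation: split the range of 'age' into 3 bins of equal width.
df["age_bin"] = pd.cut(df["age"], bins=3, labels=False)

print(df)
```

Here the median of the observed ages (38.0) fills both gaps, "Oslo" (1 of 6 rows) falls below the threshold and becomes "Rare", and `pd.cut` returns integer bin indices because `labels=False`.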
File preview:
feature-engineering-for-machine-learning-master
----Section-07-Variable-Transformation()
--------07.03-Gaussian-transformation-feature-engine.ipynb(315KB)
--------07.02-Gaussian-transformation-sklearn.ipynb(438KB)
--------07.01-Gaussian-transformation.ipynb(312KB)
----feml_logo.png(144KB)
----Section-06-Categorical-Encoding()
--------06.10-Engineering-Rare-Categories.ipynb(161KB)
--------06.09-Comparison-categorical-encoding-techniques.ipynb(54KB)
--------06.04_Count_or_frequency_encoding.ipynb(18KB)
--------06.06-Mean-Encoding.ipynb(133KB)
--------06.02-One-hot-encoding-frequent_categories.ipynb(44KB)
--------06.07-Probability-Ratio-Encoding.ipynb(135KB)
--------06.01-One-hot-encoding.ipynb(81KB)
--------06.08-Weight-of-Evidence.ipynb(195KB)
--------06.05-Ordered-Integer-Encoding.ipynb(175KB)
--------06.03-Integer-Encoding.ipynb(35KB)
----Section-11-Mixed-Variables()
--------11.01-Engineering-mixed-variables.ipynb(62KB)
----Section-04-Missing-Data-Imputation()
--------04.12-Missing-Category-Imputation-Sklearn.ipynb(26KB)
--------04.03-Arbitrary-Value-Imputation.ipynb(161KB)
--------04.14-Automatic-Imputation-Method-Detection-Sklearn.ipynb(36KB)
--------04.21-Random-Sample-Imputation-Feature-Engine.ipynb(16KB)
--------04.01-Complete-Case-Analysis.ipynb(196KB)
--------04.06-Missing-Category-Imputation.ipynb(167KB)
--------04.04-End-Distribution-Imputation.ipynb(163KB)
--------04.16-Mean-Median-Imputation-Feature-Engine.ipynb(18KB)
--------04.09-Mean-Median-Imputation-Sklearn.ipynb(106KB)
--------04.11-Frequent-Category-Imputation-Sklearn.ipynb(29KB)
--------04.19-Frequent-Category-Imputation-Feature-Engine.ipynb(16KB)
--------04.17-Arbitrary-Value-Imputation-Feature-Engine.ipynb(40KB)
--------04.05-Frequent-Category-Imputation.ipynb(133KB)
--------04.13-MissingIndicator-Sklearn.ipynb(25KB)
--------04.18-End-Tail-Imputation-Feature-Engine.ipynb(30KB)
--------04.08-Missing-Indicator.ipynb(33KB)
--------04.20-Missing-Category-Imputation-Feature-Engine.ipynb(20KB)
--------04.10-Arbitrary-Value-Imputation-Sklearn.ipynb(123KB)
--------04.07-Random-Sample-Imputation.ipynb(199KB)
--------04.22-Missing-Indicator-Feature-Engine.ipynb(28KB)
--------04.02-Mean-Median-Imputation.ipynb(201KB)
----Section-08-Discretisation()
--------08.01-Equal-width-discretisation.ipynb(122KB)
--------08.02-Equal-frequency-discretisation.ipynb(97KB)
--------tree_model.txt(805B)
--------tree_visualisation.png(95KB)
--------08.07-Domain-knowledge-discretisation.ipynb(74KB)
--------08.06-Discretisation-using-Decision-Trees-and-Feature-Engine.ipynb(123KB)
--------08.04-Discretisation-plus-Encoding.ipynb(87KB)
--------08.05-Discretisation-using-Decision-Trees.ipynb(250KB)
--------08.03-Discretisation-with-kmeans.ipynb(41KB)
----Section-02-Types-of-Variables()
--------02.2-Categorical-Variables.ipynb(54KB)
--------02.3-Date-Time-Variables.ipynb(108KB)
--------02.1-Numerical-Variables.ipynb(91KB)
--------02.4-Mixed-Variables.ipynb(21KB)
----Section-05-Multivariate-Imputation()
--------05.02-MICE.ipynb(67KB)
--------05.01-KNN-imputation.ipynb(32KB)
----SAVE_DATASETS_HERE.txt(0B)
----Section-03-Variable-Characteristics()
--------03.4-Linear-Model-Assumptions.ipynb(704KB)
--------03.2-Cardinality.ipynb(49KB)
--------03.6-Outliers.ipynb(157KB)
--------03.3-Rare-Labels.ipynb(372KB)
--------03.1-Missing-Data.ipynb(31KB)
--------03.7-Variable-magnitude.ipynb(28KB)
----trainindata.png(63KB)
----Section-09-Outlier-Engineering()
--------09.03-Capping-gaussian-approximation.ipynb(276KB)
--------09.04-Capping-Quantiles.ipynb(218KB)
--------09.05-Capping-Arbitrary.ipynb(17KB)
--------09.02-Capping-IQR-proximity-rule.ipynb(220KB)
--------09.01-Outlier-Trimming.ipynb(168KB)
----LICENSE(2KB)
----requirements.txt(197B)
----Section-01-Introduction()
--------CreditApprovalUCI_dataPrep.ipynb(19KB)
--------Titanic_dataPrep.ipynb(9KB)
----.gitignore(293B)
----README.md(3KB)
----Section-12-Engineering-Date-Time()
--------12.02_Engineering_time.ipynb(31KB)
--------12.01_Engineering_dates.ipynb(39KB)
----Section-13-Putting-it-altogether()
--------13.02-Regression-house-prices.ipynb(912KB)
--------13.01-Classification-titanic.ipynb(78KB)
--------13.03-Assembling-pipeline-with-crossvalidation.ipynb(31KB)
----Section-10-Feature-Scaling()
--------10.07-Further-reading-on-scaling.ipynb(2KB)
--------10.04-Maximum-Absolute-Scaling.ipynb(150KB)
--------10.01-Standardisation.ipynb(142KB)
--------10.05-Robust-Scaling.ipynb(96KB)
--------10.06-Scaling-to-unit-length.ipynb(203KB)
--------10.02-Mean-normalisation.ipynb(118KB)
--------10.03-MinMaxScaling.ipynb(112KB)
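The notebook names in Sections 09 and 10 refer to capping with the IQR proximity rule and to standardisation. The following is a minimal pandas sketch of those two steps, not the course's own code. It assumes a hypothetical toy Series and the conventional multiplier of 1.5 for the IQR rule.

```python
import pandas as pd

# Hypothetical toy values with one extreme point; not data from the course.
s = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 95.0])

# IQR proximity rule: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers.
# k = 1.5 is the conventional choice and is assumed here.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Capping: clip values to the boundaries instead of removing the rows.
capped = s.clip(lower=lower, upper=upper)

# Standardisation: centre on the mean and divide by the standard deviation.
# ddof=0 (population standard deviation) matches scikit-learn's StandardScaler.
standardised = (capped - capped.mean()) / capped.std(ddof=0)

print(capped.tolist())
print(standardised.round(3).tolist())
```

With these values Q1 = 11.25 and Q3 = 12.75, so the boundaries are 9.0 and 15.0 and only the value 95.0 is capped (to 15.0). Capping before scaling keeps the extreme point from inflating the standard deviation.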