<Hands-on ML with Sklearn & TF> Chapter 1
- What is ML
  - A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.
- What problems to solve
  - problems whose existing solutions require a lot of hand-tuning or long lists of rules
  - complex problems with no good solution using a traditional approach
  - fluctuating environments
  - getting insights about complex problems and large amounts of data
- Types
  - whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, reinforcement)
  - whether or not they can learn incrementally on the fly (online, batch)
  - whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model (instance-based, model-based)
- Supervised/unsupervised learning
  - supervised : the training data includes the desired solutions, called labels
    - classification is a typical task; algorithms: K-Nearest Neighbors, Linear Regression, Logistic Regression, SVM, Decision Trees, Random Forests, Neural Networks
  - unsupervised : the training data has no labels
    - Clustering : k-means, HCA, Expectation Maximization
    - Visualization and dimensionality reduction : PCA, Kernel PCA, LLE, t-SNE
    - Association rule learning : Apriori, Eclat
  - semisupervised
    - unsupervised --> supervised (most algorithms combine the two)
  - reinforcement : an agent in an environment
    - observe the environment
    - select and perform actions
    - get rewards in return
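As a minimal sketch of the difference between the first two categories, the snippet below fits a supervised classifier (which sees the labels) and an unsupervised clustering algorithm (which does not) on the same points. The toy data and the variable names are made up for illustration and are not from the book.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, only used by the supervised model

# Supervised: the desired solutions (labels) are part of the training data
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X, y)
supervised_pred = knn_clf.predict([[1.1, 1.0], [5.1, 5.0]])

# Unsupervised: the algorithm only sees X and has to find the groups itself
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

print("supervised predictions:", supervised_pred)
print("cluster assignments:", cluster_ids)
```

The cluster ids are arbitrary: k-means may call the first group 0 or 1, because without labels it has no notion of which group is which.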
- Batch/online learning
  - batch : offline; to learn about new data, a new version of the system must be trained from scratch on the full dataset
  - online : incremental learning; the main challenge is bad data, which gradually degrades performance
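One way to picture online learning is scikit-learn's `partial_fit`, which updates a model one mini-batch at a time. This is only a sketch under assumptions: the data stream, the helper `next_mini_batch`, and the learning-rate settings are hypothetical choices for illustration, not something taken from the book.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)

def next_mini_batch(batch_size=50):
    # Hypothetical data stream: y = 3x + 2 plus a little noise
    X_batch = rng.uniform(0, 1, size=(batch_size, 1))
    y_batch = 3 * X_batch[:, 0] + 2 + rng.normal(0, 0.1, size=batch_size)
    return X_batch, y_batch

# Fixed held-out data, used only to watch the error evolve
X_val, y_val = next_mini_batch(200)

online_model = SGDRegressor(learning_rate="constant", eta0=0.05, random_state=42)

errors = []
for step in range(200):
    X_batch, y_batch = next_mini_batch()
    # Incremental update on the new batch only, no retraining from scratch
    online_model.partial_fit(X_batch, y_batch)
    errors.append(np.mean((online_model.predict(X_val) - y_val) ** 2))

print("validation MSE after first batch:", errors[0])
print("validation MSE after last batch:", errors[-1])
print("learned slope and intercept:", online_model.coef_[0], online_model.intercept_[0])
```

Monitoring a held-out error like this is also a simple way to notice bad data: if the stream starts feeding corrupted batches, the curve stops improving or gets worse.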
- Instance-based/model-based learning
  - instance-based : the system learns the examples by heart, then generalizes to new cases using a similarity measure
  - model-based : study the data; select a model; train it on the training data; apply the model to make predictions on new cases
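The two approaches can be compared on the same data. The sketch below uses made-up toy numbers (not the book's dataset): a linear regression stands in for the model-based approach and a k-nearest-neighbors regressor for the instance-based one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data following y = 2x exactly
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
X_new = np.array([[6.0]])  # a new case outside the training range

# Model-based: fit the parameters of a line, then predict from the model
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
model_based_pred = lin_reg.predict(X_new)[0]

# Instance-based: memorize the examples, then average the 3 most similar ones
knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(X_train, y_train)
instance_based_pred = knn_reg.predict(X_new)[0]

print("model-based prediction:", model_based_pred)        # extrapolates the fitted line
print("instance-based prediction:", instance_based_pred)  # mean of the targets for x = 3, 4, 5
```

The fitted line extrapolates to about 12 for x = 6, while the nearest-neighbors prediction is 8, the average of the three closest training targets. An instance-based learner can only answer from examples it has already seen.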
- Challenges
  - insufficient quantity of training data
  - nonrepresentative training data
  - poor-quality data
  - irrelevant features : feature selection; feature extraction; creating new features by gathering new data
  - overfitting : constrain the model with regularization; the amount of regularization is a hyperparameter
  - underfitting : use a more powerful model; feed it better features; reduce the constraints on the model
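To make the overfitting/regularization point concrete, here is a small sketch that fits the same high-degree polynomial with and without a Ridge penalty. The data, the degree, and the value of `alpha` are arbitrary assumptions chosen for illustration; Ridge is just one possible form of regularization.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical noisy data generated from a simple linear relationship
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.2, size=20)

def fit_polynomial(regressor, degree=8):
    # High-degree polynomial features give the model enough freedom to overfit
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        regressor,
    )
    return model.fit(X, y)

unregularized = fit_polynomial(LinearRegression())
regularized = fit_polynomial(Ridge(alpha=1.0))  # alpha is the regularization hyperparameter

# Regularization constrains the model: its weights are kept small
unreg_norm = np.linalg.norm(unregularized.steps[-1][1].coef_)
reg_norm = np.linalg.norm(regularized.steps[-1][1].coef_)
print("weight norm without regularization:", unreg_norm)
print("weight norm with Ridge (alpha=1.0):", reg_norm)
```

A larger `alpha` constrains the model more, which reduces the risk of overfitting but pushes it toward underfitting if taken too far.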
- Testing and validating
  - commonly 80% of the data is used for training and 20% is held out for testing
  - validating : a model and hyperparameters picked because they score best on the test set are unlikely to perform as well on new data, so hold out a separate validation set
    - train multiple models with various hyperparameters using the training set
    - select the model and hyperparameters that perform best on the validation set, then run a single final test against the test set to estimate the generalization error
  - cross-validating : the training set is split into complementary subsets, and each model is trained against a different combination of these subsets and validated against the remaining parts
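The workflow above can be sketched with scikit-learn's `train_test_split` and `cross_val_score`. The synthetic dataset, the Ridge model, and the candidate `alpha` values are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)

# Hypothetical dataset: 100 instances, 3 features, linear target plus noise
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.3, size=100)

# Hold out 20% of the data for the final test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Compare hyperparameter values with 5-fold cross-validation on the training set only
candidate_alphas = [0.01, 1.0, 100.0]
cv_scores = {}
for alpha in candidate_alphas:
    scores = cross_val_score(Ridge(alpha=alpha), X_train, y_train, cv=5)
    cv_scores[alpha] = scores.mean()  # mean R^2 over the 5 validation folds

best_alpha = max(cv_scores, key=cv_scores.get)

# Retrain the selected model on the full training set, then test exactly once
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)

print("mean CV score per alpha:", cv_scores)
print("selected alpha:", best_alpha)
print("R^2 on the held-out test set:", test_score)
```

The test set is touched only once, at the very end, so its score remains an honest estimate of the generalization error.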
Example 1-1:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("datasets/lifesat/oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("datasets/lifesat/gdp_per_capita.csv", thousands=',', delimiter='\t', encoding='latin1', na_values='n/a')

# Prepare the data
def prepare_country_stats(oecd_bli, gdp_per_capita):
    # Get the pandas DataFrame of GDP per capita and life satisfaction
    oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
    oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
    # Assumption: the GDP figures are in the "2015" column, as in the book's dataset
    gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
    gdp_per_capita.set_index("Country", inplace=True)
    full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
    return full_country_stats[["GDP per capita", 'Life satisfaction']]

country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
# regularization remove_indices = [0, 1, 6, 8, 33, 34, 35]
country_stats.to_csv('country_stats.csv', encoding='utf-8')
X = np.c_[country_stats["GDP per capita"]]
Y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x='GDP per capita', y='Life satisfaction')

# Select a linear model
lin_reg_model = sklearn.linear_model.LinearRegression()

# Train the model
lin_reg_model.fit(X, Y)

# Plot the regression model
t0, t1 = lin_reg_model.intercept_[0], lin_reg_model.coef_[0][0]
X = np.linspace(0, 110000, 1000)
plt.plot(X, t0 + t1 * X, "k")
plt.show()

# Make a prediction for Cyprus
X_new = [[22587]]
print(lin_reg_model.predict(X_new))
The end-of-chapter exercises are quite good.