Data Mining Study - Preparation - Python Basics

Time: 2022-06-15 18:41:09

Scientific computing with Python

1. Using scikit-learn's built-in datasets

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
>>> print(digits.data)
[[ 0. 0. 5. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 10. 0. 0.]
[ 0. 0. 0. ..., 16. 9. 0.]
...,
[ 0. 0. 1. ..., 6. 0. 0.]
[ 0. 0. 2. ..., 12. 0. 0.]
[ 0. 0. 10. ..., 12. 1. 0.]]
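A small sketch of what the loaders return, assuming a recent scikit-learn: each dataset is a Bunch whose `data` attribute is a 2-D (samples x features) array, and each row of `digits.data` is one 8x8 image from `digits.images` flattened into 64 values.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
digits = datasets.load_digits()

# iris: 150 flowers, 4 measurements each
print(iris.data.shape)      # (150, 4)
# digits: 1797 images of 8x8 pixels, also stored flattened as 64 features
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)

# each row of digits.data is the matching image flattened row by row
flattened = digits.images.reshape(len(digits.images), -1)
print(np.array_equal(flattened, digits.data))  # True
```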

2. Using an SVM (support vector machine)

>>> from sklearn import svm
>>> clf = svm.SVC(gamma=0.001, C=100.)


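3. Fitting with fit and predicting with predict

An estimator learns from data with fit(X, y) and predicts the targets of new samples T with predict(T). For iris, the samples and targets are: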

X, y = iris.data, iris.target
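A minimal sketch tying steps 2 and 3 together on iris: the SVC is fitted on every sample except the last, then asked to predict the held-out one. Holding out a single sample is only an illustration choice, and the prediction is not guaranteed to match the true label.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data, iris.target

# same hyperparameters as the SVC in step 2
clf = svm.SVC(gamma=0.001, C=100.)
# learn from every sample except the last one
clf.fit(X[:-1], y[:-1])
# predict expects a 2-D array, so pass the one-row slice X[-1:] rather than X[-1]
prediction = clf.predict(X[-1:])
print(prediction, y[-1])
```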

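4. Getting the size of an array: the shape attribute

>>> iris.data.shape
(150, 4)

That is, 150 samples with 4 features each. (iris itself is a Bunch object with no shape attribute; the array is iris.data.)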
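shape is an attribute of every NumPy array (no parentheses) and holds one entry per dimension. A NumPy-only sketch:

```python
import numpy as np

a = np.arange(12)        # 1-D array with 12 elements
print(a.shape)           # (12,)
b = a.reshape(3, 4)      # the same data viewed as 3 rows x 4 columns
print(b.shape)           # (3, 4)
rows, cols = b.shape     # unpack the number of rows and columns
print(rows, cols)        # 3 4
print(b.ndim, b.size)    # 2 dimensions, 12 elements in total
```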

5. target

digits.target holds the true digit (0 to 9) for each sample in the digits dataset; it is what our program has to learn.
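A short sketch for inspecting the targets, assuming a recent scikit-learn and NumPy (`np.unique` with `return_counts`):

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# one true label per sample
print(len(digits.target) == len(digits.data))  # True
# how many samples there are for each digit
labels, counts = np.unique(digits.target, return_counts=True)
print(labels)                                  # [0 1 2 3 4 5 6 7 8 9]
print(dict(zip(labels.tolist(), counts.tolist())))
```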

6. Saving scikit-learn models with pickle

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
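The same idea with a file instead of an in-memory string, as a sketch: the model is fitted first, written with `pickle.dump`, read back with `pickle.load`, and compared with the original. Only unpickle files you trust (loading a pickle can execute arbitrary code), and preferably load a model with the same scikit-learn version that saved it. The temporary path is just for the example.

```python
import os
import pickle
import tempfile

from sklearn import datasets, svm

iris = datasets.load_iris()
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(iris.data, iris.target)

# write the fitted model to a file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "svc.pkl")
with open(path, "wb") as f:   # pickle needs a binary file
    pickle.dump(clf, f)

# load it back and check that it predicts exactly like the original
with open(path, "rb") as f:
    clf2 = pickle.load(f)

same = bool((clf2.predict(iris.data) == clf.predict(iris.data)).all())
print(same)  # True
```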


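7. Estimator objects

An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm, or a transformer that transforms the data.

Whatever the algorithm, every estimator object exposes a fit method, which takes a dataset as its input.

All parameters of an estimator can be set when it is instantiated, or modified later through the corresponding attribute:

>>> estimator = Estimator(param1=1, param2=2)  # Estimator stands for any estimator class
>>> estimator.param1
1

predict(X) predicts labels for the unlabeled samples in dataset X and returns the predicted labels y.

Estimators that make predictions (classifiers and regressors) also expose a score method that evaluates them on test data: the higher the score, the better the estimator's model fits the data.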
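A sketch of the parameter and score conventions with a real estimator, `KNeighborsClassifier` (chosen here only as an example): parameters are stored as attributes, `get_params`/`set_params` read and change them, and for classifiers `score` returns the mean accuracy. Scoring on the training data, as done here for brevity, overestimates performance; section 8 uses a held-out test set.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# parameters set at construction time are stored as attributes
knn = KNeighborsClassifier(n_neighbors=3)
print(knn.n_neighbors)                  # 3
print(knn.get_params()["n_neighbors"])  # 3

# change a parameter afterwards via set_params
knn.set_params(n_neighbors=7)
print(knn.n_neighbors)                  # 7

knn.fit(X, y)
# for classifiers, score() is the mean accuracy on the given data
acc = knn.score(X, y)
print(acc)
```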

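8. KNN (k-nearest neighbors) classifier example:

>>> import numpy as np
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> iris_X, iris_y = iris.data, iris.target
>>> # Split iris data in train and test data
>>> # A random permutation, to split the data randomly
>>> np.random.seed(0)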
>>> indices = np.random.permutation(len(iris_X))
>>> iris_X_train = iris_X[indices[:-10]]
>>> iris_y_train = iris_y[indices[:-10]]
>>> iris_X_test = iris_X[indices[-10:]]
>>> iris_y_test = iris_y[indices[-10:]]
>>> # Create and fit a nearest-neighbor classifier
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier()
>>> knn.fit(iris_X_train, iris_y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=5, p=2,
weights='uniform')
>>> knn.predict(iris_X_test)
array([1, 2, 1, 0, 0, 0, 2, 1, 2, 0])
>>> iris_y_test
array([1, 1, 1, 0, 0, 0, 2, 1, 2, 0])
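To show what KNeighborsClassifier does conceptually, here is a from-scratch NumPy sketch (not scikit-learn's implementation): compute the Euclidean distance from each query point to every training point, take the k closest, and return the majority label. The helper name `knn_predict` and the toy data are hypothetical; ties go to whichever label `Counter` met first, which may differ from scikit-learn.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Predict a label for each query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        # Euclidean distance from x to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # most common label among those neighbors
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# hypothetical toy data: two well-separated clusters
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_toy, y_toy, [[0.2, 0.3], [5.5, 5.2]], k=3))  # [0 1]
```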

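9. To work with CSV files in Python, use the csv module from the standard library; its reader and writer objects read and write CSV files.

import csv
rf = open('bank.csv', newline='')
reader = csv.reader(rf)

Note: in Python 2 a CSV file had to be opened in binary mode ('rb'); in Python 3, open it in text mode with newline='' instead.
reader is an iterator, so it can only be consumed with next() or a for loop.
next(reader) returns the first row (in Python 2 this was reader.next()).

To see all the remaining rows, use a for loop:

for row in reader: print(row)

Next, writing a CSV file:

rf.close()
rf = open('bank.csv', newline='')  # reopen: the for loop above has exhausted the first reader
reader = csv.reader(rf)
wf = open('bank2.csv', 'w', newline='')
writer = csv.writer(wf)
writer.writerow(['id','age','sex','region','income','married','children','car','save_act','current_act','mortgage','pep'])
writer.writerow(next(reader))
wf.close()  # flush the rows to disk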
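A self-contained Python 3 round trip, as a sketch: the rows below are hypothetical sample values, not the contents of bank.csv, and the file goes to a temporary directory. `csv.DictReader` (also in the standard library) reads each row as a dict keyed by the header, so columns can be accessed by name. All values come back as strings.

```python
import csv
import os
import tempfile

# hypothetical sample data, not the real bank.csv
header = ['id', 'age', 'sex', 'income']
rows = [['1', '48', 'FEMALE', '17546.0'],
        ['2', '40', 'MALE', '30085.1']]

path = os.path.join(tempfile.mkdtemp(), 'sample.csv')

# newline='' lets the csv module handle line endings itself
with open(path, 'w', newline='') as wf:
    writer = csv.writer(wf)
    writer.writerow(header)
    writer.writerows(rows)

with open(path, newline='') as rf:
    reader = csv.reader(rf)
    first = next(reader)            # the header row
    body = [row for row in reader]  # the remaining rows, as lists of strings

with open(path, newline='') as rf:
    # DictReader uses the first row as the keys of each record
    ages = [int(record['age']) for record in csv.DictReader(rf)]

print(first)
print(body == rows)  # True
print(ages)          # [48, 40]
```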

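10. Linear regression (LR)

In its simplest form, linear regression fits a linear model to the dataset by adjusting a set of parameters so that the residual sum of squares is as small as possible.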

Linear model: $y = X\beta + \epsilon$

  • $X$: data
  • $y$: target variable
  • $\beta$: coefficients
  • $\epsilon$: observation noise
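>>> import numpy as np
>>> from sklearn import datasets, linear_model
>>> diabetes = datasets.load_diabetes()
>>> # hold out the last 20 samples for testing
>>> diabetes_X_train, diabetes_X_test = diabetes.data[:-20], diabetes.data[-20:]
>>> diabetes_y_train, diabetes_y_test = diabetes.target[:-20], diabetes.target[-20:]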
>>> regr = linear_model.LinearRegression()
>>> regr.fit(diabetes_X_train, diabetes_y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> print(regr.coef_)
[ 0.30349955 -237.63931533 510.53060544 327.73698041 -814.13170937
492.81458798 102.84845219 184.60648906 743.51961675 76.09517222]
>>> # The mean square error
>>> np.mean((regr.predict(diabetes_X_test)-diabetes_y_test)**2)
2004.56760268...
>>> # R^2 score (coefficient of determination): 1 is a perfect prediction,
>>> # and 0 means the model does no better than always predicting the mean,
>>> # i.e. no linear relationship between X and y is captured.
>>> regr.score(diabetes_X_test, diabetes_y_test)
0.5850753022690...
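To make the two printed metrics concrete, here is a NumPy-only sketch that computes the mean squared error and R² by hand on small hypothetical numbers (the helper names are illustrative, not a library API).

```python
import numpy as np


def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)


def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(mean_squared_error_manual(y_true, y_pred))  # 0.375
print(r2_manual(y_true, y_pred))                  # about 0.9486
# always predicting the mean gives R^2 = 0
print(r2_manual(y_true, [np.mean(y_true)] * 4))   # 0.0
```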

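11. Splitting a dataset into folds for training and testing

>>> import numpy as np
>>> from sklearn import datasets
>>> digits = datasets.load_digits()
>>> X_digits, y_digits = digits.data, digits.target
>>> X_folds = np.array_split(X_digits, 3)
>>> y_folds = np.array_split(y_digits, 3)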
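A sketch of using those folds for cross-validation by hand, assuming a linear-kernel SVC (the same estimator section 12 uses): each fold serves once as the test set while the remaining folds are concatenated for training.

```python
import numpy as np
from sklearn import datasets, svm

digits = datasets.load_digits()
X_folds = np.array_split(digits.data, 3)    # a list of 3 sub-arrays
y_folds = np.array_split(digits.target, 3)

svc = svm.SVC(C=1, kernel='linear')
scores = []
for k in range(3):
    # fold k is the test set; all other folds are concatenated into the training set
    X_train = np.concatenate(X_folds[:k] + X_folds[k + 1:])
    y_train = np.concatenate(y_folds[:k] + y_folds[k + 1:])
    svc.fit(X_train, y_train)
    scores.append(svc.score(X_folds[k], y_folds[k]))
print(scores)
```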

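12. Cross-validation generators

Cross-validation generators produce lists of train/test indices that are used to split the data. (The old sklearn.cross_validation module was removed in scikit-learn 0.20; these tools now live in sklearn.model_selection.)

>>> from sklearn.model_selection import KFold
>>> k_fold = KFold(n_splits=3)
>>> for train_indices, test_indices in k_fold.split(np.arange(6)):
...     print('Train: %s | test: %s' % (train_indices, test_indices))
Train: [2 3 4 5] | test: [0 1]
Train: [0 1 4 5] | test: [2 3]
Train: [0 1 2 3] | test: [4 5]

With a cross-validation generator, implementing cross-validation becomes simple:

>>> from sklearn import svm
>>> svc = svm.SVC(C=1, kernel='linear')
>>> kfold = KFold(n_splits=3)
>>> [svc.fit(X_digits[train], y_digits[train]).score(X_digits[test], y_digits[test])
...  for train, test in kfold.split(X_digits)]
[0.93489148580968284, 0.95659432387312182, 0.93989983305509184]

To compute the estimator's score on each fold directly, sklearn provides a helper function, cross_val_score:

>>> from sklearn.model_selection import cross_val_score
>>> cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=-1)  # n_jobs=-1: use all CPU cores
array([ 0.93489149, 0.95659432, 0.93989983])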
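To show what KFold generates without shuffling, here is a NumPy-only sketch (the function name `kfold_indices` is hypothetical). It mimics the rule described in the KFold documentation: consecutive folds, where the first `n_samples % n_splits` folds get one extra sample.

```python
import numpy as np


def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index arrays for consecutive, unshuffled folds."""
    indices = np.arange(n_samples)
    # every fold gets n_samples // n_splits samples ...
    fold_sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    # ... and the first n_samples % n_splits folds get one extra
    fold_sizes[: n_samples % n_splits] += 1
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = indices[start:stop]
        train = np.concatenate([indices[:start], indices[stop:]])
        yield train, test
        start = stop


for train, test in kfold_indices(6, 3):
    print('Train: %s | test: %s' % (train, test))
```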

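13. The NumPy library

(1) Computing the mean, variance and covariance

Mean: mean()
print('Mean of the squared residuals: %.2f' % np.mean((model.predict(X) - y) ** 2))  # model: a fitted regression model; X, y: its data
Variance: var()
print(np.var([6, 8, 10, 14, 18], ddof=1))  # numpy.var() computes the variance directly; ddof=1 gives the sample variance
Covariance: cov()
print(np.cov([6, 8, 10, 14, 18], [7, 9, 13, 17.5, 18])[0][1])  # numpy.cov() returns the covariance matrix; [0][1] is the covariance of the two lists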
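The three statistics from (1) on the same small lists, as a NumPy sketch. Note that `np.var` divides by n unless `ddof=1` is passed, while `np.cov` divides by n - 1 by default. Values in the comments are rounded.

```python
import numpy as np

x = [6, 8, 10, 14, 18]
y = [7, 9, 13, 17.5, 18]

mean_x = np.mean(x)             # about 11.2
pop_var = np.var(x)             # divides by n: about 18.56
sample_var = np.var(x, ddof=1)  # divides by n - 1: about 23.2
# np.cov returns the 2x2 covariance matrix (divides by n - 1 by default);
# the off-diagonal entry [0][1] is the covariance of x and y
cov_xy = np.cov(x, y)[0][1]     # about 22.65
print(mean_x, pop_var, sample_var, cov_xy)
```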

(2) Matrix computation: inverse inv, dot product dot, transpose transpose

from numpy.linalg import inv
from numpy import dot, transpose
X = [[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]]
y = [[7], [9], [13], [17.5], [18]]
print(dot(inv(dot(transpose(X), X)), dot(transpose(X), y)))  # normal equation: (X^T X)^-1 X^T y
(3) Solving the least-squares problem with lstsq()
from numpy.linalg import lstsq
print(lstsq(X, y, rcond=None)[0])  # [0] is the solution; rcond=None avoids a FutureWarning in newer NumPy
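A sketch checking that (2) and (3) solve the same problem: the normal equation $\beta = (X^T X)^{-1} X^T y$ and `lstsq` give the same coefficients for this design matrix, whose first column of ones provides the intercept. `lstsq` is usually preferred numerically because it avoids forming an explicit inverse.

```python
import numpy as np
from numpy.linalg import inv, lstsq

X = np.array([[1, 6, 2], [1, 8, 1], [1, 10, 0], [1, 14, 2], [1, 18, 0]], dtype=float)
y = np.array([[7], [9], [13], [17.5], [18]])

# normal equation: beta = (X^T X)^-1 X^T y
beta_inv = inv(X.T @ X) @ X.T @ y
# lstsq solves the same least-squares problem without an explicit inverse
beta_lstsq = lstsq(X, y, rcond=None)[0]

print(beta_inv.ravel())
print(np.allclose(beta_inv, beta_lstsq))  # True
```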

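14. The sklearn library

(1) Build one-variable and multiple linear regression models with linear_model.LinearRegression; build one-variable polynomial regression models with LinearRegression plus preprocessing.PolynomialFeatures; build stochastic gradient descent (SGD) models with linear_model.SGDRegressor;

(2) Build a model with model.fit(), predict with model.predict(), and compute the test-set R-squared with model.score();

(3) Split data into training and test sets with train_test_split(), and compute cross-validated R-squared scores with cross_val_score() (both were in sklearn.cross_validation and now live in sklearn.model_selection);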

  1. Import the linear regression class LinearRegression:
    from sklearn.linear_model import LinearRegression
  2. Build a simple (one-variable) linear regression model with fit():
    X = [[8], [9], [11], [16], [12]]
    y = [[11], [8.5], [15], [18], [11]]
    model = LinearRegression()
    model.fit(X, y)  # fit the one-variable linear regression model
  3. Build a multiple linear regression model with fit():
    X = [[6, 2], [8, 1], [10, 0], [14, 2], [18, 0]]
    y = [[7], [9], [13], [17.5], [18]]
    model = LinearRegression()
    model.fit(X, y)  # fit a linear regression model with two explanatory variables
  4. predict() applies the model defined by the parameters that fit() learned to new values of the explanatory variables and returns the predicted values:
    # single-value prediction with the one-feature model from step 2; the input must be 2-D ([[12]], not [12])
    print('Predicted price of a 12-inch pizza: $%.2f' % model.predict([[12]])[0][0])

    X_test = [[8, 2], [9, 0], [11, 2], [16, 2], [12, 0]]
    predictions = model.predict(X_test)  # batch prediction with the two-feature model from step 3
  5. model.score() computes the coefficient of determination R-squared (R²):
    model.score(X_test, y_test)  # y_test: true targets for X_test; for LinearRegression, score returns R²
  6. Build a one-variable polynomial regression model:
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    X_train = [[6], [8], [10], [14], [18]]
    y_train = [[7], [9], [13], [17.5], [18]]  # pass column vectors, not row vectors
    X_test = [[6], [8], [11], [16]]  # test data
    y_test = [[8], [12], [15], [18]]

    quadratic_featurizer = PolynomialFeatures(degree=2)  # highest degree of the polynomial
    X_train_quadratic = quadratic_featurizer.fit_transform(X_train)  # expand the inputs for fit()
    regressor_quadratic = LinearRegression()
    regressor_quadratic.fit(X_train_quadratic, y_train)  # fit the model

    X_test_quadratic = quadratic_featurizer.transform(X_test)  # expand the test inputs the same way
    y_test_quadratic = regressor_quadratic.predict(X_test_quadratic)  # predict with the model

    print('Quadratic regression r-squared', regressor_quadratic.score(X_test_quadratic, y_test))  # R-squared on the test set
  7. train_test_split() splits a dataset into a training set and a test set; the split ratio is configurable, and by default 25% of the samples go to the test set:
    import pandas as pd
    from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation
    df = pd.read_csv('mlslpic/winequality-red.csv', sep=';')
    X = df[list(df.columns)[:-1]]  # every column except the last one as features
    y = df['quality']
    X_train, X_test, y_train, y_test = train_test_split(X, y)
  8. train_size changes the ratio between the training and test sets (here half of the samples go to training):
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5)
  9. cross_val_score() returns the score of each cross-validation fold:
    from sklearn.model_selection import cross_val_score
    regressor = LinearRegression()
    scores = cross_val_score(regressor, X, y, cv=5)  # 5 folds; for regressors the default score is R²
    print(scores.mean(), scores)
  10. Load a scikit-learn dataset:
    from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; newer versions offer e.g. fetch_california_housing
  11. Load SGDRegressor and StandardScaler (standardization), build the model and compute R-squared:
    import numpy as np
    from sklearn.linear_model import SGDRegressor
    from sklearn.preprocessing import StandardScaler
    X_scaler = StandardScaler()
    y_scaler = StandardScaler()

    X_train = X_scaler.fit_transform(X_train)
    y_train = y_scaler.fit_transform(np.asarray(y_train).reshape(-1, 1)).ravel()  # StandardScaler expects 2-D input
    X_test = X_scaler.transform(X_test)
    y_test = y_scaler.transform(np.asarray(y_test).reshape(-1, 1)).ravel()

    regressor = SGDRegressor(loss='squared_error')  # called 'squared_loss' before scikit-learn 1.0
    regressor.fit(X_train, y_train)  # build the model

    print('Test-set R-squared:', regressor.score(X_test, y_test))
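A sketch connecting this section with section 13: for a single explanatory variable, the least-squares slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x), so LinearRegression on the pizza data used in sections 13 and 14 should reproduce those values. The 1-D target list is used here only so that `coef_` is a flat array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# pizza example data from sections 13 and 14 (diameter -> price)
X = [[6], [8], [10], [14], [18]]
y = [7, 9, 13, 17.5, 18]

model = LinearRegression()
model.fit(X, y)

# closed-form simple linear regression, using the NumPy functions from section 13
x = np.ravel(X)
slope = np.cov(x, y)[0][1] / np.var(x, ddof=1)
intercept = np.mean(y) - slope * np.mean(x)

print(model.coef_[0], slope)        # both about 0.976
print(model.intercept_, intercept)  # both about 1.966
```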

15. List indexing and slicing

>>> a = [1, 2, 3, 4, 5, 6]
>>> a[:-1]  # output: [1, 2, 3, 4, 5]
>>> a[1:2]  # output: [2]
>>> a[1:]  # output: [2, 3, 4, 5, 6]
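A few more slicing patterns on the same list, as a sketch (the general form is `a[start:stop:step]`, with `stop` excluded):

```python
a = [1, 2, 3, 4, 5, 6]

print(a[-1])     # 6: negative indices count from the end
print(a[-2:])    # [5, 6]: the last two elements
print(a[::2])    # [1, 3, 5]: every second element
print(a[::-1])   # [6, 5, 4, 3, 2, 1]: a reversed copy
print(a[2:100])  # [3, 4, 5, 6]: out-of-range slice bounds are clipped, no IndexError
```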

16. Use enumerate() in a loop to get each element's index idx (first) and value val (second):

ints = [8, 23, 45]  # example list
for idx, val in enumerate(ints):
    print(idx, val)
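enumerate also accepts a `start` argument and yields (index, value) tuples; a small sketch with a hypothetical example list:

```python
ints = [8, 23, 45]  # hypothetical example list

# start=1 numbers the elements from 1 instead of 0
for idx, val in enumerate(ints, start=1):
    print(idx, val)

# enumerate produces (index, value) tuples
pairs = list(enumerate(ints))
print(pairs)  # [(0, 8), (1, 23), (2, 45)]
```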

17. linspace() divides an interval into evenly spaced points

xx = np.linspace(0, 26, 5)
xx  # output: array([ 0. , 6.5, 13. , 19.5, 26. ])

Dividing an interval: arange vs. linspace
arange steps from the start by a given step size and excludes the stop value, so the result covers [start, stop);
linspace divides the interval from start to stop into a given number of evenly spaced points, and by default the stop value is the last element.
X = np.arange(-6, 6, 5); X

# output: array([-6, -1, 4])

X = np.arange(-6, 6, 1); X

# output: array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])

X = np.linspace(-6, 6, 5); X

# output: array([-6., -3., 0., 3., 6.])
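A sketch contrasting the two functions on another interval. With a non-integer step, the length of an `arange` result can be affected by floating-point rounding, which is why the NumPy documentation suggests `linspace` in that case.

```python
import numpy as np

a = np.arange(0, 10, 2)    # step 2, stop value 10 excluded: 0, 2, 4, 6, 8
b = np.linspace(0, 10, 5)  # 5 evenly spaced points, stop included: 0, 2.5, 5, 7.5, 10
# endpoint=False leaves out the stop value; retstep=True also returns the spacing
c, step = np.linspace(0, 10, 5, endpoint=False, retstep=True)  # 0, 2, 4, 6, 8 and step 2.0
print(a, b, c, step)
```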

18. Training and classifying with a LogisticRegression classifier

from sklearn.linear_model import LogisticRegression  # the old path sklearn.linear_model.logistic was removed in recent versions
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
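A self-contained sketch on iris, assuming a recent scikit-learn (`sklearn.model_selection`): split the data, fit LogisticRegression, then compare predictions with the true labels. `max_iter=1000` is an assumption chosen to avoid convergence warnings; `predict_proba` returns one probability per class for each sample.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# default split: 25% of the samples go to the test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

accuracy = np.mean(predictions == y_test)  # fraction of correctly classified test samples
proba = classifier.predict_proba(X_test)   # shape (n_test_samples, n_classes)
print(accuracy)
print(proba.shape, np.allclose(proba.sum(axis=1), 1.0))
```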

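19. Basic usage: math

  1. Basic arithmetic (addition, subtraction, multiplication, division, remainder, absolute value, power) does not need the math module
  2. Other mathematical functions (trigonometric, inverse trigonometric, exponential, logarithmic) require import math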
    <code class="language-python hljs  has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">import math

print(math.sin(10))  # sine (argument in radians)
print(math.cos(10))  # cosine
print(math.tan(10))  # tangent

print(math.asin(0.5))  # arc sine; the argument must lie in [-1, 1], so asin(10) raises ValueError
print(math.acos(0.5))  # arc cosine; same [-1, 1] domain as asin
print(math.atan(10))  # arc tangent (result in radians)

print(math.sinh(10))  # hyperbolic sine
print(math.cosh(10))  # hyperbolic cosine
print(math.tanh(10))  # hyperbolic tangent

print(math.pow(2, 4))  # 2 raised to 4 -> 16.0 (math.pow always returns a float)
print(math.exp(4))  # e ^ 4
print(math.sqrt(10))  # square root
print(math.pow(5, 1 / 3.0))  # cube root of 5
print(math.log(3))  # ln; natural logarithm
print(math.log(100, 10))  # logarithm with base 10

print(math.ceil(2.3))  # ceiling -> 3
print(math.floor(2.7))  # floor -> 2

print(math.pi)
print(math.e)</code>
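
Item 1 above can be illustrated with a short Python 3 sketch; the values 7 and 2 are arbitrary. The operators and the built-ins `abs` and `pow` work without importing `math`. One practical difference: `**` and the built-in `pow` keep integers as integers, whereas `math.pow` converts its arguments to float.

```python
import math  # only needed for the math.pow comparison at the end

# Built-in arithmetic operators and functions: no import required
a, b = 7, 2

print(a + b)      # addition        -> 9
print(a - b)      # subtraction     -> 5
print(a * b)      # multiplication  -> 14
print(a / b)      # true division   -> 3.5
print(a // b)     # floor division  -> 3
print(a % b)      # remainder       -> 1
print(abs(-a))    # absolute value  -> 7
print(a ** b)     # power           -> 49
print(pow(a, b))  # built-in pow    -> 49

# math.pow always returns a float
print(math.pow(a, b))  # -> 49.0
```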

20. Computing eigenvalues and eigenvectors with numpy.linalg.eig()

<code class="language-python hljs  has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">import numpy as np
# w: array of eigenvalues; v: matrix whose column v[:, i] is the unit-length eigenvector for w[i]
w, v = np.linalg.eig(np.array([[1, -2], [2, -6]]))</code>
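
A quick check of what `eig` returns for the matrix above. `w` holds the eigenvalues and column `v[:, i]` is the unit-length eigenvector paired with `w[i]`, so `A @ v[:, i]` should equal `w[i] * v[:, i]` up to rounding. The name `A` is introduced here for readability. For this matrix the characteristic polynomial is λ² + 5λ - 2 = 0, so the eigenvalues are (-5 ± √33)/2, roughly 0.372 and -5.372.

```python
import numpy as np

A = np.array([[1, -2], [2, -6]])
w, v = np.linalg.eig(A)

# Each column v[:, i] is an eigenvector paired with eigenvalue w[i]
for i in range(len(w)):
    lhs = A @ v[:, i]        # A v
    rhs = w[i] * v[:, i]     # lambda v
    print(w[i], np.allclose(lhs, rhs))   # expect True for every pair

# In general: sum of eigenvalues = trace, product of eigenvalues = determinant
print(np.isclose(w.sum(), np.trace(A)), np.isclose(w.prod(), np.linalg.det(A)))
```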



sklearn user manual: http://download.csdn.net/detail/ssrob/8757217