Logistic Regression: A Bank Loan Approval Example

Problem Definition

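This is a loan approval problem. Suppose you are a loan officer at a bank. Customers are applying for loans of a given amount and have filled in their personal information (provided in datas.txt). Based on that information, you need to build a classification model that decides whether each applicant should be granted a loan.

Using the information provided, build a classification model, evaluate it, briefly describe how the model was built, and briefly explain each feature.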

Dataset

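User ID, age, gender, application amount, occupation type, education level, marital status, housing type, household registration (hukou) type, loan purpose, company type, salary, loan label (0 = reject, 1 = approve)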

Data preprocessing

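The user ID carries no predictive information, so it is excluded from modelling. Of the fields that describe each user, age, application amount, and salary are continuous; the rest are discrete (categorical).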
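
One way to act on this split, shown as a minimal sketch rather than the article's own method: drop the ID, one-hot encode the categorical columns with `pandas.get_dummies`, and standardise the continuous ones. The toy rows and the `UserId` column name are invented for illustration; the other column names follow the modelling code.

```python
import pandas as pd

# Hypothetical toy rows; the values are invented for illustration only.
toy = pd.DataFrame({
    "UserId": [1, 2, 3, 4],
    "Age": [25.0, 40.0, 31.0, 52.0],
    "AppAmount": [5000.0, 20000.0, 12000.0, 8000.0],
    "Salary": [3000.0, 9000.0, 6500.0, 4000.0],
    "Education": ["bachelor", "master", "bachelor", "highschool"],
    "Marital": ["single", "married", "married", "single"],
    "Label": [0, 1, 1, 0],
})

continuous_cols = ["Age", "AppAmount", "Salary"]
categorical_cols = ["Education", "Marital"]

# Drop the ID and the target, then expand each categorical column
# into 0/1 indicator columns.
features = toy.drop(columns=["UserId", "Label"])
encoded = pd.get_dummies(features, columns=categorical_cols, dtype=int)

# Standardise the continuous columns so they share a comparable scale.
encoded[continuous_cols] = (
    encoded[continuous_cols] - encoded[continuous_cols].mean()
) / encoded[continuous_cols].std()

print(encoded.columns.tolist())
```

Whether standardising helps depends on the solver and regularisation settings; it is included here only because the three continuous fields sit on very different scales.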

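Inspection shows that some records have missing values. These need to be handled, for example by deleting the affected rows or by imputing the missing values.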
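
The two options can be sketched as follows. The toy frame is invented, and the choice of median for continuous columns and most frequent value for categorical ones is an assumption, not something the dataset description prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with a few gaps.
toy = pd.DataFrame({
    "Age": [25.0, np.nan, 31.0, 52.0],
    "Salary": [3000.0, 9000.0, np.nan, 4000.0],
    "Education": ["bachelor", None, "bachelor", "highschool"],
})

# Option 1: delete every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: impute. Median for continuous columns,
# most frequent value for the categorical column.
imputed = toy.copy()
for col in ["Age", "Salary"]:
    imputed[col] = imputed[col].fillna(imputed[col].median())
imputed["Education"] = imputed["Education"].fillna(imputed["Education"].mode()[0])

print(len(dropped), int(imputed.isna().sum().sum()))
```

Deleting rows is simpler but discards data; imputing keeps every row at the cost of introducing estimated values.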

The Logit Function
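
In its standard textbook form, the logit function maps a probability to log-odds, and its inverse is the sigmoid (logistic) function. The symbols are notation assumed here for illustration: p is the probability that a loan is approved (label 1) and z is any real number.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}, \quad 0 < p < 1
\qquad\Longleftrightarrow\qquad
p = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = \operatorname{logit}(p)
```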

The Logistic Regression
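
As a rough sketch of what the model computes: logistic regression takes a weighted sum of the features plus an intercept and passes it through the sigmoid function to obtain the probability of label 1. The weights, intercept, and input below are made-up values, not fitted from the loan data, and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of sigmoid: map a probability to log-odds."""
    return math.log(p / (1.0 - p))

def predict_proba(weights, bias, x):
    """Probability of label 1 for one feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

# Hypothetical weights for two standardised features.
weights = [0.8, -0.5]
bias = 0.1
x = [1.0, 2.0]

p = predict_proba(weights, bias, x)
# Classify as 1 (approve) when the probability reaches 0.5.
print(p, p >= 0.5)
```

Applying `logit` to the predicted probability recovers the linear score, which is why each weight can be read as a change in log-odds per unit of its feature.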

Model Data

 # Logistic regression model:
 # classify whether the bank should approve each customer's loan
 import pandas
 import numpy
 import matplotlib.pyplot as plt
 from sklearn.linear_model import LogisticRegression
 from sklearn.metrics import roc_curve, roc_auc_score

 # Feature columns; the user ID is deliberately left out.
 # The categorical columns are assumed to be numerically encoded already.
 features = ['Age', 'Gender', 'AppAmount', 'Occupation',
             'Education', 'Marital', 'Property', 'Residence',
             'LoanUse', 'Company', 'Salary']

 # Load the data and drop rows with missing values
 data = pandas.read_csv("datas.csv")
 data = data.dropna()

 # Randomly shuffle the rows before splitting into training and test sets
 shuffled = data.loc[numpy.random.permutation(data.index)]

 # Train on the first 14968 shuffled rows and test on the remaining rows
 num_train = 14968
 data_train = shuffled[:num_train]
 data_test = shuffled[num_train:]

 # Fit logistic regression to predict Label from the features, using the training set
 logistic_model = LogisticRegression()
 logistic_model.fit(data_train[features], data_train['Label'])

 # Print the model's coefficients (one per feature, in the order of `features`)
 print(logistic_model.coef_)

 # .predict() uses a probability threshold of 0.50 by default
 predicted = logistic_model.predict(data_train[features])
 # The mean of the boolean "prediction == label" array is the accuracy
 accuracy_train = (predicted == data_train['Label']).mean()
 print("Accuracy in Training Set = {s}".format(s=accuracy_train))

 # Repeat on the held-out test set
 predicted = logistic_model.predict(data_test[features])
 accuracy_test = (predicted == data_test['Label']).mean()
 print("Accuracy in Test Set = {s}".format(s=accuracy_test))

 # Predicted probability of Label == 1 for the training and test sets
 train_probs = logistic_model.predict_proba(data_train[features])[:, 1]
 test_probs = logistic_model.predict_proba(data_test[features])[:, 1]

 # Compute AUC on both sets; a large gap between them suggests overfitting
 auc_train = roc_auc_score(data_train["Label"], train_probs)
 auc_test = roc_auc_score(data_test["Label"], test_probs)
 auc_diff = auc_train - auc_test
 print("AUC train = {a}, AUC test = {b}, difference = {d}".format(
     a=auc_train, b=auc_test, d=auc_diff))

 # Compute ROC curves; roc_curve returns (fpr, tpr, thresholds)
 roc_train = roc_curve(data_train["Label"], train_probs)
 roc_test = roc_curve(data_test["Label"], test_probs)

 # Plot true positive rate against false positive rate
 plt.plot(roc_train[0], roc_train[1], label="train")
 plt.plot(roc_test[0], roc_test[1], label="test")
 plt.xlabel("False positive rate")
 plt.ylabel("True positive rate")
 plt.legend()
 plt.show()
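
The script prints `coef_` as a bare array, which makes the per-feature explanation asked for in the problem statement hard to read off. A possible way to present it, sketched on synthetic stand-in data rather than the bank dataset, is to pair each coefficient with its feature name and exponentiate it into an odds ratio. The two-feature setup and the `summary` frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in features (already standardised), not the bank data.
X = pd.DataFrame({
    "Salary": rng.normal(0.0, 1.0, n),
    "AppAmount": rng.normal(0.0, 1.0, n),
})

# Simulated labels: approval is more likely with a higher salary
# and less likely with a larger requested amount.
true_logit = (1.5 * X["Salary"] - 1.0 * X["AppAmount"]).to_numpy()
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)

# One row per feature: the coefficient is the change in log-odds per unit
# increase, and exp(coefficient) is the corresponding odds ratio.
summary = pd.DataFrame({
    "feature": X.columns,
    "coefficient": model.coef_[0],
    "odds_ratio": np.exp(model.coef_[0]),
})
print(summary)
```

An odds ratio above 1 means the feature pushes toward approval and below 1 pushes against it, but the magnitudes are only comparable across features when the features share a scale.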
