课后作业:广义线性模型

作者:欧新宇(Xinyu OU)

【作业提交】

将测试结果Excel电子表格形式进行提交,同时提交源代码。

  1. 测试结果命名为: ex04-结果-你的学号-你的姓名.xls (.xlsx)
  2. 源代码命名为: ex04-01-你的学号-你的姓名.py

*结果文件,要求每小题标注题号,两题之间要求空一行*


分别使用线性回归、岭回归及套索回归对"波士顿房价"数据集进行回归分析,要求:

  1. 在进行数据集划分的时候,训练集占70%,测试集占30%
  2. 分别测试岭回归 $alpha = [0, 0.1, 2, 5]$ 时的准确率
  3. 分别测试套索回归 $alpha = [0.001, 0.2, 1]$ 时的准确率
  4. 分别给出训练集和测试机的得分,要求结果保留4位小数 (ex04-01)
  1. 导入相关库
In [1]:
# 1.导入基本运算库
import numpy as np
import matplotlib.pyplot as plt

# 配置参数使 matplotlib绘图工具可以显示中文
plt.rcParams['font.sans-serif'] = [u'Microsoft YaHei']

# 2.导入线性回归模型
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split


# 3. 载入糖尿病情数据集,并使用train_test_split()函数对数据集进行划分
from sklearn.datasets import load_boston
X = load_boston().data
y = load_boston().target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 83, test_size = 0.3)

# 4. 训练模型
lr = LinearRegression().fit(X_train, y_train)
ridge0 = Ridge(alpha = 0).fit(X_train, y_train)
ridge01 = Ridge(alpha = 0.1).fit(X_train, y_train)
ridge2 = Ridge(alpha = 2).fit(X_train, y_train)
ridge5 = Ridge(alpha = 5).fit(X_train, y_train)
lasso001 = Lasso(alpha = 0.001, max_iter = 100000).fit(X_train, y_train)
lasso02 = Lasso(alpha = 0.2, max_iter = 100000).fit(X_train, y_train)
lasso1 = Lasso(alpha = 1, max_iter = 100000).fit(X_train, y_train)

# 4. 性能分析
print("线性回归:训练集得分: {0:.4f},测试集得分: {1:.4f}".format(lr.score(X_train, y_train), lr.score(X_test, y_test)))
print("岭回归(alpha=0):训练集得分: {0:.4f},测试集得分: {1:.4f}".format(ridge0.score(X_train, y_train), ridge0.score(X_test, y_test)))
print("岭回归(alpha=0.1):训练集得分: {0:.4f},测试集得分: {1:.4f}".format(ridge01.score(X_train, y_train), ridge01.score(X_test, y_test)))
print("岭回归(alpha=2):训练集得分: {0:.4f},测试集得分: {1:.4f}".format(ridge2.score(X_train, y_train), ridge2.score(X_test, y_test)))
print("岭回归(alpha=5):训练集得分: {0:.4f},测试集得分: {1:.4f}".format(ridge5.score(X_train, y_train), ridge5.score(X_test, y_test)))
print("套索回归(alpha=0.001):训练集得分: {0:.4f},测试集得分: {1:.4f}".format(lasso001.score(X_train, y_train), lasso001.score(X_test, y_test)))
print("套索回归(alpha=0.2):训练集得分: {0:.4f},测试集得分: {1:.4f}".format(lasso02.score(X_train, y_train), lasso02.score(X_test, y_test)))
print("套索回归(alpha=1):训练集得分: {0:.4f},测试集得分: {1:.4f}".format(lasso1.score(X_train, y_train), lasso1.score(X_test, y_test)))
线性回归:训练集得分: 0.7469,测试集得分: 0.7129
岭回归(alpha=0):训练集得分: 0.7469,测试集得分: 0.7129
岭回归(alpha=0.1):训练集得分: 0.7468,测试集得分: 0.7138
岭回归(alpha=2):训练集得分: 0.7404,测试集得分: 0.7143
岭回归(alpha=5):训练集得分: 0.7362,测试集得分: 0.7123
套索回归(alpha=0.001):训练集得分: 0.7469,测试集得分: 0.7131
套索回归(alpha=0.2):训练集得分: 0.7232,测试集得分: 0.7069
套索回归(alpha=1):训练集得分: 0.6793,测试集得分: 0.6261