Import the required data-analysis libraries:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Load the dataset of advertising spend and sales:
path = 'Advertising.csv'
data = pd.read_csv(path)
x = data[['TV', 'Radio', 'Newspaper']]  # features: spend per channel
y = data['Sales']                       # target: sales
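Before modelling, it can help to confirm that the expected columns are actually present. The following is a minimal sketch, assuming the CSV contains the four columns used above (TV, Radio, Newspaper, Sales). The helper name `load_advertising` is hypothetical, and the three rows in the in-memory sample are made-up placeholder values, not data from Advertising.csv.

```python
import io

import pandas as pd

FEATURES = ['TV', 'Radio', 'Newspaper']
TARGET = 'Sales'


def load_advertising(source):
    """Hypothetical helper: read the CSV and return (features, target)."""
    data = pd.read_csv(source)
    # Fail early with a clear message if a required column is absent.
    missing = [c for c in FEATURES + [TARGET] if c not in data.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return data[FEATURES], data[TARGET]


# Made-up placeholder rows, used only so the sketch runs without the real file.
sample_csv = io.StringIO(
    "TV,Radio,Newspaper,Sales\n"
    "100.0,20.0,30.0,10.0\n"
    "200.0,40.0,10.0,18.0\n"
    "50.0,5.0,60.0,6.0\n"
)
x_demo, y_demo = load_advertising(sample_csv)
print(x_demo.shape, y_demo.shape)
```

With the real file, passing the path 'Advertising.csv' instead of the in-memory sample should behave the same way, provided the column names match.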
Plot advertising spend against sales:
# The two rcParams lines are only needed when labels contain Chinese text.
mpl.rcParams['font.sans-serif'] = ['simHei']
mpl.rcParams['axes.unicode_minus'] = False
plt.figure(facecolor='w')
plt.plot(data['TV'], y, 'ro', label='TV')
plt.plot(data['Radio'], y, 'g^', label='Radio')
plt.plot(data['Newspaper'], y, 'mv', label='Newspaper')
plt.legend(loc='lower right')
plt.xlabel('Advertising spend', fontsize=16)
plt.ylabel('Sales', fontsize=16)
plt.title('Advertising spend vs. sales', fontsize=20)
plt.grid()
plt.show()
Split the data into training and test sets:
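# 80% of the rows go to training; random_state fixes the shuffle for reproducibility.
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, random_state=1)
print("x_train.shape=", x_train.shape, "y_train.shape=", y_train.shape)
Fit a linear regression model to the training data: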
linreg = LinearRegression()
model = linreg.fit(x_train, y_train)  # fit() returns the estimator itself
print("Coefficients:", linreg.coef_, "Intercept:", linreg.intercept_)
Compute the prediction error:
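# Sort the test set by true sales so the comparison plot is easier to read.
order = np.argsort(y_test.values)
y_test = y_test.values[order]
x_test = x_test.iloc[order]  # keep the DataFrame so feature names match those seen in fit
y_hat = linreg.predict(x_test)
mse = np.mean((y_hat - y_test) ** 2)
rmse = np.sqrt(mse)
print('MSE = ', mse)
print('RMSE = ', rmse)
print('R² (train) = ', linreg.score(x_train, y_train))
print('R² (test) = ', linreg.score(x_test, y_test))
Plot the true values against the predictions: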
plt.figure(facecolor='w')
t = np.arange(len(x_test))
plt.plot(t, y_test, 'r-', linewidth=2, label='True values')
plt.plot(t, y_hat, 'g-', linewidth=2, label='Predictions')
plt.legend(loc='upper right')
plt.title('Sales predicted by linear regression', fontsize=18)
plt.grid(True)  # the b= keyword is not accepted by recent Matplotlib releases
plt.show()
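The MSE, RMSE and R² values from the error step can be cross-checked against the helpers in sklearn.metrics. The sketch below does this on synthetic data rather than on Advertising.csv; the data-generating formula, the noise level and all variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3 + 2*x0 - x1 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=1
)
model = LinearRegression().fit(X_train, y_train)
y_hat = model.predict(X_test)

# Manual formulas, matching the hand-written computation in the walkthrough.
mse_manual = np.mean((y_hat - y_test) ** 2)
rmse_manual = np.sqrt(mse_manual)
ss_res = np.sum((y_test - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

# Library equivalents.
mse_lib = mean_squared_error(y_test, y_hat)
r2_lib = r2_score(y_test, y_hat)

print("MSE agrees:", np.isclose(mse_manual, mse_lib))
print("R² agrees:", np.isclose(r2_manual, r2_lib))
print("score() agrees:", np.isclose(r2_lib, model.score(X_test, y_test)))
```

Comparing the training and test R² is the more informative check: a test score far below the training score would suggest overfitting, which is unlikely with three features and a plain linear model but still worth confirming.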
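Main methods of LinearRegression:
fit(X, y, [sample_weight]): fit the linear model
    X: training data, shape (n_samples, n_features)
    y: target values, shape (n_samples,) or (n_samples, n_targets)
    sample_weight: per-sample weights, shape (n_samples,)
predict(X): predict with the fitted model
    X: samples to predict, shape (n_samples, n_features)
score(X, y, [sample_weight]): return the coefficient of determination R² of the prediction
    X: samples to evaluate, shape (n_samples, n_features)
    y: true values for X, shape (n_samples,) or (n_samples, n_outputs)
    sample_weight: per-sample weights, shape (n_samples,)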
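The three methods listed above can be exercised together in a few lines. This is a minimal sketch on noise-free synthetic data; the generating formula, the weights and the names `X_new`, `pred` and `r2` are assumptions for illustration and do not come from the advertising dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data (assumption for illustration): y = 1 + 2*x0 + 3*x1.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 2))   # shape (n_samples, n_features)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]  # shape (n_samples,)

# Optional per-sample weights: here the first ten rows count five times as much.
weights = np.ones(len(y))
weights[:10] = 5.0

model = LinearRegression()
model.fit(X, y, sample_weight=weights)

# predict() takes an array of shape (n_samples, n_features).
X_new = np.array([[1.0, 1.0], [0.0, 2.0]])
pred = model.predict(X_new)

# score() returns R²; the same weights can be passed for a weighted score.
r2 = model.score(X, y, sample_weight=weights)

print("coef:", model.coef_, "intercept:", model.intercept_)
print("predictions:", pred, "R²:", r2)
```

Because the data here is exactly linear, the fitted coefficients should land very close to 2 and 3 and R² very close to 1 regardless of the weights; on real data the weights would shift the fit toward the heavily weighted rows.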