
Solving for the Parameters of a Univariate Linear Model with the Least-Squares Method

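Next, taking the following formula as the optimization (cost) function, let us solve for the parameters of the model:

$$
J\left( \mathbf{\theta} \right) = \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( y_{pi} - y_{i} \right)^{2}
$$

Here $y_{pi} = \theta_{1}x_{i} + \theta_{0}$ is the model's prediction for the $i$-th sample, $y_{i}$ is the observed value, and $m$ is the number of samples.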
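As a small illustration of this cost function, the following sketch evaluates $J(\mathbf{\theta})$ for two candidate lines. The function names `predict` and `cost` and the sample data are hypothetical, chosen only for this example; they do not come from the original text.

```python
def predict(theta0, theta1, x):
    """Prediction of the straight-line model: y_p = theta1 * x + theta0."""
    return theta1 * x + theta0


def cost(theta0, theta1, xs, ys):
    """Cost J(theta) = 1/(2m) * sum over i of (y_p_i - y_i)^2."""
    m = len(xs)
    squared_errors = ((predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


# Hypothetical sample data lying roughly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

good = cost(1.0, 2.0, xs, ys)  # candidate line y = 2x + 1, close to the data
bad = cost(0.0, 0.0, xs, ys)   # candidate line y = 0, far from the data
print(good, bad)
```

The line that follows the data closely gives a much smaller cost than the line $y = 0$, which is exactly the property that the minimization below exploits.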

At an extremum the derivative must be 0; a derivative of 0 means that the function's rate of change with respect to the unknown variable is 0. First, take the partial derivative with respect to $\theta_{0}$:

$$
\begin{aligned}
\frac{\partial J\left( \mathbf{\theta} \right)}{\partial\theta_{0}}
&= \frac{\partial}{\partial\theta_{0}}\left( \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)^{2} \right) \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\frac{\partial}{\partial\theta_{0}}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)^{2} \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( 2\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)\frac{\partial}{\partial\theta_{0}}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right) \right) \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( 2\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)\frac{\partial}{\partial\theta_{0}}\left( \theta_{1}x_{i} + \theta_{0} \right) \right) \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( 2\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right) \right) \\
&= \frac{1}{m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)
\end{aligned}
$$

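To build a solid mathematical foundation for machine learning, see the 人人可懂 ("Understandable by Everyone") series from Tsinghua University Press, which includes 《人人可懂的微积分》 (calculus, already on sale), 《人人可懂的线性代数》 (linear algebra, forthcoming), and 《人人可懂的概率统计》 (probability and statistics, forthcoming).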

Next, take the partial derivative with respect to $\theta_{1}$:

$$
\begin{aligned}
\frac{\partial J(\mathbf{\theta})}{\partial\theta_{1}}
&= \frac{\partial}{\partial\theta_{1}}\left( \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)^{2} \right) \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\frac{\partial}{\partial\theta_{1}}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)^{2} \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( 2\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)\frac{\partial}{\partial\theta_{1}}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right) \right) \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( 2\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)\frac{\partial}{\partial\theta_{1}}\left( \theta_{1}x_{i} + \theta_{0} \right) \right) \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( 2\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)\frac{\partial}{\partial\theta_{1}}\left( \theta_{1}x_{i} \right) \right) \\
&= \frac{1}{2m}\sum_{i = 0}^{m - 1}\left( 2\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)x_{i} \right) \\
&= \frac{1}{m}\sum_{i = 0}^{m - 1}\left( \left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right)x_{i} \right) \\
&= \frac{1}{m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right)x_{i} - y_{i}x_{i} \right)
\end{aligned}
$$
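One way to gain confidence in the two partial derivatives is to compare them with finite-difference estimates of the cost function. The sketch below does this for one arbitrary point; the function names, the step size `h`, and the sample data are assumptions made for illustration only.

```python
def cost(theta0, theta1, xs, ys):
    """Cost J(theta) = 1/(2m) * sum((theta1 * x_i + theta0 - y_i)^2)."""
    m = len(xs)
    return sum((theta1 * x + theta0 - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient(theta0, theta1, xs, ys):
    """Analytic partial derivatives (dJ/dtheta0, dJ/dtheta1) as derived in the text."""
    m = len(xs)
    residuals = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]
    d_theta0 = sum(residuals) / m
    d_theta1 = sum(r * x for r, x in zip(residuals, xs)) / m
    return d_theta0, d_theta1


def numeric_gradient(theta0, theta1, xs, ys, h=1e-6):
    """Central finite differences, used only as an independent check."""
    d0 = (cost(theta0 + h, theta1, xs, ys) - cost(theta0 - h, theta1, xs, ys)) / (2 * h)
    d1 = (cost(theta0, theta1 + h, xs, ys) - cost(theta0, theta1 - h, xs, ys)) / (2 * h)
    return d0, d1


# Hypothetical sample data and an arbitrary evaluation point.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
analytic = gradient(0.5, 1.5, xs, ys)
numeric = numeric_gradient(0.5, 1.5, xs, ys)
print(analytic)
print(numeric)
```

The two printed pairs should agree to several decimal places. Because $J$ is quadratic in $\theta_{0}$ and $\theta_{1}$, central differences are exact here apart from floating-point rounding.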

Setting both partial derivatives to 0 yields the system of equations:

$$
\left\{ \begin{matrix}
\dfrac{\partial J(\mathbf{\theta})}{\partial\theta_{0}} = \dfrac{1}{m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right) = 0 \\
\dfrac{\partial J(\mathbf{\theta})}{\partial\theta_{1}} = \dfrac{1}{m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right)x_{i} - y_{i}x_{i} \right) = 0 \\
\end{matrix} \right.
$$

(System 1)
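For reference, System 1 can be rearranged into the standard form of two linear equations in the two unknowns $\theta_{0}$ and $\theta_{1}$. This is only a restatement: both equations are multiplied by $m$ and the $y$ terms are moved to the right-hand side. The matrix notation is an addition that the original text does not use.

$$
\left\{ \begin{matrix}
m\theta_{0} + \theta_{1}\sum_{i = 0}^{m - 1}x_{i} = \sum_{i = 0}^{m - 1}y_{i} \\
\theta_{0}\sum_{i = 0}^{m - 1}x_{i} + \theta_{1}\sum_{i = 0}^{m - 1}x_{i}^{2} = \sum_{i = 0}^{m - 1}y_{i}x_{i} \\
\end{matrix} \right.
\quad\Longleftrightarrow\quad
\begin{pmatrix}
m & \sum_{i = 0}^{m - 1}x_{i} \\
\sum_{i = 0}^{m - 1}x_{i} & \sum_{i = 0}^{m - 1}x_{i}^{2}
\end{pmatrix}
\begin{pmatrix}
\theta_{0} \\
\theta_{1}
\end{pmatrix}
=
\begin{pmatrix}
\sum_{i = 0}^{m - 1}y_{i} \\
\sum_{i = 0}^{m - 1}y_{i}x_{i}
\end{pmatrix}
$$

Written this way, the problem is an ordinary 2-by-2 linear system, and it can be solved by elimination.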

How do we solve this system? From its first equation we get:

$$
\begin{aligned}
& \frac{1}{m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right) - y_{i} \right) = 0 \\
\Longrightarrow\ & \frac{1}{m}\sum_{i = 0}^{m - 1}\left( \theta_{1}x_{i} + \theta_{0} - y_{i} \right) = 0 \\
\Longrightarrow\ & \sum_{i = 0}^{m - 1}\theta_{0} = \sum_{i = 0}^{m - 1}\left( y_{i} - \theta_{1}x_{i} \right) \\
\Longrightarrow\ & m\theta_{0} = \sum_{i = 0}^{m - 1}\left( y_{i} - \theta_{1}x_{i} \right) \\
\Longrightarrow\ & m\theta_{0} = \sum_{i = 0}^{m - 1}y_{i} - \sum_{i = 0}^{m - 1}\left( \theta_{1}x_{i} \right) \\
\Longrightarrow\ & m\theta_{0} = \sum_{i = 0}^{m - 1}y_{i} - \theta_{1}\sum_{i = 0}^{m - 1}x_{i} \\
\Longrightarrow\ & \theta_{0} = \frac{1}{m}\sum_{i = 0}^{m - 1}y_{i} - \frac{1}{m}\theta_{1}\sum_{i = 0}^{m - 1}x_{i}
\end{aligned}
$$
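This result has a compact reading in terms of sample means. The symbols $\bar{x}$ and $\bar{y}$ below are notation introduced here for convenience; they do not appear in the original text.

$$
\bar{x} = \frac{1}{m}\sum_{i = 0}^{m - 1}x_{i}, \qquad
\bar{y} = \frac{1}{m}\sum_{i = 0}^{m - 1}y_{i}
\quad\Longrightarrow\quad
\theta_{0} = \bar{y} - \theta_{1}\bar{x}
\quad\Longleftrightarrow\quad
\bar{y} = \theta_{1}\bar{x} + \theta_{0}
$$

In other words, the first equation of the system requires the fitted line to pass through the point of means $(\bar{x}, \bar{y})$.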

From the second equation of System 1 we get:

$$
\begin{aligned}
& \frac{1}{m}\sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right)x_{i} - y_{i}x_{i} \right) = 0 \\
\Longrightarrow\ & \sum_{i = 0}^{m - 1}\left( \left( \theta_{1}x_{i} + \theta_{0} \right)x_{i} - y_{i}x_{i} \right) = 0 \\
\Longrightarrow\ & \sum_{i = 0}^{m - 1}\left( \theta_{1}x_{i}^{2} + \theta_{0}x_{i} - y_{i}x_{i} \right) = 0 \\
\Longrightarrow\ & \sum_{i = 0}^{m - 1}\left( \theta_{0}x_{i} \right) = \sum_{i = 0}^{m - 1}\left( y_{i}x_{i} - \theta_{1}x_{i}^{2} \right) \\
\Longrightarrow\ & \theta_{0}\sum_{i = 0}^{m - 1}x_{i} = \sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \theta_{1}\sum_{i = 0}^{m - 1}x_{i}^{2} \\
\Longrightarrow\ & \theta_{0} = \frac{\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \theta_{1}\sum_{i = 0}^{m - 1}x_{i}^{2}}{\sum_{i = 0}^{m - 1}x_{i}}
\end{aligned}
$$

The last step divides by $\sum_{i = 0}^{m - 1}x_{i}$, so it assumes that this sum is nonzero.

Equating the two expressions for $\theta_{0}$ therefore gives:

$$
\begin{aligned}
& \frac{1}{m}\sum_{i = 0}^{m - 1}y_{i} - \frac{1}{m}\theta_{1}\sum_{i = 0}^{m - 1}x_{i} = \frac{\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \theta_{1}\sum_{i = 0}^{m - 1}x_{i}^{2}}{\sum_{i = 0}^{m - 1}x_{i}} \\
\Longrightarrow\ & \frac{1}{m}\sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i} - \frac{1}{m}\theta_{1}\sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i} = \sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \theta_{1}\sum_{i = 0}^{m - 1}x_{i}^{2} \\
\Longrightarrow\ & \sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i} - \theta_{1}\sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i} = m\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - m\theta_{1}\sum_{i = 0}^{m - 1}x_{i}^{2} \\
\Longrightarrow\ & m\theta_{1}\sum_{i = 0}^{m - 1}x_{i}^{2} - \theta_{1}\sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i} = m\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i} \\
\Longrightarrow\ & \theta_{1}\left( m\sum_{i = 0}^{m - 1}x_{i}^{2} - \sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i} \right) = m\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i} \\
\Longrightarrow\ & \theta_{1} = \frac{m\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i}}{m\sum_{i = 0}^{m - 1}x_{i}^{2} - \sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i}}
\end{aligned}
$$

At this point, the two parameters of the straight-line model have been solved:

$$
\left\{ \begin{matrix}
\theta_{1} = \dfrac{m\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) - \sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i}}{m\sum_{i = 0}^{m - 1}x_{i}^{2} - \sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i}} \\
\theta_{0} = \dfrac{1}{m}\sum_{i = 0}^{m - 1}y_{i} - \dfrac{1}{m}\theta_{1}\sum_{i = 0}^{m - 1}x_{i} \\
\end{matrix} \right.
$$

(Solution of System 1)
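The closed-form solution translates almost directly into code. The following is a minimal sketch in plain Python; the function name `fit_line`, the guard against a zero denominator, and the sample data are assumptions added for illustration, not part of the original text.

```python
def fit_line(xs, ys):
    """Return (theta0, theta1) for the model y = theta1 * x + theta0,
    using the closed-form least-squares solution."""
    m = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    denominator = m * sum_xx - sum_x * sum_x
    if denominator == 0:
        # Happens when all x values are equal: the slope is then undetermined.
        raise ValueError("cannot fit a line: all x values are equal")

    theta1 = (m * sum_xy - sum_y * sum_x) / denominator
    theta0 = sum_y / m - theta1 * sum_x / m
    return theta0, theta1


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = fit_line(xs, ys)
print(theta0, theta1)  # 1.0 2.0
```

Computing $\theta_{1}$ first and then substituting it into the expression for $\theta_{0}$ mirrors the order of the solution above. The denominator $m\sum x_{i}^{2} - \left( \sum x_{i} \right)^{2}$ is zero exactly when all $x_{i}$ are equal, which is why the sketch checks for that case.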


Note: $\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right)$ is not the same as $\sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i}$. Expanding the two expressions makes this clear:

$$
\sum_{i = 0}^{m - 1}\left( y_{i}x_{i} \right) = y_{0}x_{0} + \ldots + y_{m - 1}x_{m - 1}
$$

$$
\sum_{i = 0}^{m - 1}y_{i}\sum_{i = 0}^{m - 1}x_{i} = \left( y_{0} + \ldots + y_{m - 1} \right)\left( x_{0} + \ldots + x_{m - 1} \right)
$$

These two expressions are clearly different. Now compare $\sum_{i = 0}^{m - 1}x_{i}^{2}$ with $\sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i}$. Expanding them again shows the difference:

$$
\sum_{i = 0}^{m - 1}x_{i}^{2} = x_{0}^{2} + \ldots + x_{m - 1}^{2}
$$

$$
\sum_{i = 0}^{m - 1}x_{i}\sum_{i = 0}^{m - 1}x_{i} = \left( x_{0} + \ldots + x_{m - 1} \right)^{2}
$$

These two expressions are clearly different as well.
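A tiny numeric example makes the distinction concrete. The three data points below are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
xs = [1, 2, 3]
ys = [4, 5, 6]

# Sum of products: multiply pairwise first, then add.
sum_of_products = sum(x * y for x, y in zip(xs, ys))  # 1*4 + 2*5 + 3*6 = 32

# Product of sums: add each list first, then multiply.
product_of_sums = sum(ys) * sum(xs)                   # 15 * 6 = 90

# Sum of squares versus square of the sum.
sum_of_squares = sum(x * x for x in xs)               # 1 + 4 + 9 = 14
square_of_sum = sum(xs) ** 2                          # 6 ** 2 = 36

print(sum_of_products, product_of_sums)  # 32 90
print(sum_of_squares, square_of_sum)     # 14 36
```

Mixing up these quantities when implementing the formula for $\theta_{1}$ is an easy mistake to make, so it can help to compute each sum into its own clearly named variable.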
