
[Numerical Analysis] 11. Least-Squares

hwijin97 2022. 5. 23. 16:03

Least-Squares Fitting

- Least-Squares Line

- Power Fit

- Data Linearization

- Linear Least-Squares

- Nonlinear Least-Squares

 

Least-Squares Line

- Given data points $(x_1, y_1), \cdots, (x_N, y_N)$, where the abscissas $\{x_k\}$ are distinct

- The numerical problem is to determine a fitting function $y = f(x)$

- Linear approximation : $y = f(x) = Ax + B$

- If the data points contain error, then $f(x_k) = y_k + e_k$, where $e_k = f(x_k) - y_k$ is the error at $x_k$

- How do we find the best linear approximation that passes near the points?

 

Errors

- Errors : $e_k = f(x_k) - y_k$ for $ 1 \leq k \leq N$

- Norms to measure how far the curve $y=f(x)$ lies from the data

- Maximum error : $E_\infty (f) = \max_{1 \leq k \leq N} { | f(x_k)-y_k | }$

- Average error : $E_1 (f) = \frac{1}{N} \sum_{k=1}^N | f(x_k) - y_k | $

- Root-mean-square error : $E_2 (f) = \sqrt{ \frac{1}{N} \sum_{k=1}^N | f(x_k) - y_k |^2 } $
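
The three norms above are easy to compute directly from the residuals. Below is a minimal NumPy sketch (not part of the original notes); the sample data and the trial line $f(x) = 1.5x + 0.5$ are made-up values for illustration.

```python
import numpy as np

def error_norms(f, x, y):
    """Return (E_inf, E_1, E_2) for a fitted function f on data (x, y)."""
    r = np.abs(f(x) - y)              # pointwise errors |f(x_k) - y_k|
    return r.max(), r.mean(), np.sqrt((r ** 2).mean())

# made-up sample data and a trial line f(x) = 1.5x + 0.5
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.4, 2.1, 3.4, 5.1])
print(error_norms(lambda t: 1.5 * t + 0.5, x, y))
```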

 

Least-Squares Line

- The best-fitting line is the one that minimizes the error

- The root-mean-square error is the traditional choice

- Let $\{(x_k, y_k)\}_{k=1}^N$ be a set of $N$ points

- Least-squares line $y=f(x)= Ax+B$ minimizes Root-Mean-Square error $E_2(f)$

- $E_2(f)$ is minimized iff $\sum_{k=1}^N (Ax_k + B - y_k)^2 $ is minimized

- Write $E(A,B) = \sum_{k=1}^N e_k^2 = \sum_{k=1}^N (Ax_k + B - y_k)^2$ and set both partial derivatives to zero:

- $\frac{\partial E}{\partial A} = 0 = \sum 2(Ax_k + B - y_k)x_k \to (\sum x_k^2)A + (\sum x_k)B = \sum x_k y_k$  

- $\frac{\partial E}{\partial B} = 0 = \sum 2(Ax_k + B - y_k) \to (\sum x_k) A + NB = \sum y_k$ (these normal equations are solved in the sketch below)
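
A sketch of the least-squares line, assuming NumPy and made-up data; it assembles the two normal equations above into a $2 \times 2$ system and solves it.

```python
import numpy as np

def least_squares_line(x, y):
    """Fit y = A*x + B by solving the 2x2 normal equations."""
    N = len(x)
    S_x, S_y = x.sum(), y.sum()
    S_xx, S_xy = (x * x).sum(), (x * y).sum()
    # [ sum x^2   sum x ] [A]   [ sum x*y ]
    # [ sum x     N     ] [B] = [ sum y   ]
    A, B = np.linalg.solve([[S_xx, S_x], [S_x, N]], [S_xy, S_y])
    return A, B

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([10.0, 9.0, 7.0, 5.0, 4.0])
print(least_squares_line(x, y))   # A = -1.6, B = 8.6
```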

 

Power Fit

- Fitting $f(x) = Ax^M$, where $M$ is known

- $\{ (x_k, y_k) \}_{k=1}^N$ are $N$ points, $E(A) = \sum e_k^2 = \sum (Ax_k^M - y_k)^2 $

- Least-squares power curve : minimizing the RMS (root-mean-square) error, i.e. setting $\frac{dE}{dA} = 0$, gives $A = \frac{\sum_{k=1}^N x_k^M y_k}{ \sum_{k=1}^N x_k^{2M}}$ (see the sketch below)
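
A one-line sketch of the closed-form coefficient, assuming NumPy; the data are made-up points lying near $y = 2x^2$.

```python
import numpy as np

def power_fit(x, y, M):
    """Least-squares coefficient A for f(x) = A * x**M with M known."""
    return np.sum(x ** M * y) / np.sum(x ** (2 * M))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 8.3, 18.5, 31.9])   # roughly y = 2x^2
print(power_fit(x, y, M=2))            # close to 2.0
```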

 

 

Data Linearization

- Data linearization : the process of transforming a nonlinear relation into a linear relation

- Fitting $y = C e^{Ax}$ directly : $E(A,C) = \sum ( Ce^{Ax_k} - y_k)^2$, $\frac{\partial E}{\partial C} = 0 = 2 \sum ( Ce^{2Ax_k} - y_k e^{Ax_k})$ ; the resulting equations are nonlinear in $A$, so instead we linearize the data

- Transform the data : $(x_k, y_k) \to (X_k, Y_k)$

- Take the logarithm of both sides : $\ln (y) = Ax + \ln (C)$

- Assign new variables : $Y = \ln (y)$, $ X = x$, $B = \ln (C)$

- Linear relation : $Y = AX + B$ $\to$ $(X_k, Y_k) = (x_k, \ln(y_k))$

- Data linearization of exponential functions : $(X_k, Y_k) = (x_k, \ln (y_k))$

- Solve the normal equations $(\sum_{k=1}^N X_k^2) A + (\sum_{k=1}^N X_k) B = \sum_{k=1}^N X_k Y_k$ and $(\sum_{k=1}^N X_k) A + NB = \sum_{k=1}^N Y_k$ for $A$ and $B$, then $C = e^B$ (see the sketch below)
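
A sketch of the full linearization procedure, assuming NumPy and made-up data. Note that the fit minimizes the squared error in $Y = \ln(y)$, not in $y$ itself.

```python
import numpy as np

def exp_fit(x, y):
    """Fit y = C * exp(A*x): set Y = ln(y), solve the normal equations for Y = A*X + B, then C = e^B."""
    X, Y = x, np.log(y)
    N = len(X)
    A, B = np.linalg.solve(
        [[np.sum(X * X), np.sum(X)], [np.sum(X), N]],
        [np.sum(X * Y), np.sum(Y)],
    )
    return A, np.exp(B)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 2.5, 3.5, 5.0, 7.5])
A, C = exp_fit(x, y)
print(A, C)   # parameters of y ≈ C * e^(A x)
```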

 

 

Linear Least Squares

- The linear least-squares problem :

- $N$ data points $\{(x_k, y_k)\}$

- a set of $M$ linearly independent functions $\{f_j(x)\}$

- We want to find $M$ coefficients $\{c_j\}$ so that $f(x) = \sum_{j=1}^M c_j f_j(x)$ fits the data

- Minimize $E(c_1, c_2, \cdots, c_M) = \sum_{k=1}^N (f(x_k) - y_k)^2 = \sum_{k=1}^N(( \sum_{j=1}^M c_j f_j(x_k) ) - y_k)^2 $

- Each partial derivative should be zero $\frac{\partial E}{\partial c_i}  = 0$ for $ i = 1, 2, \cdots, M$ : $\sum_{k=1}^N(( \sum_{j=1}^M c_j f_j(x_k) ) - y_k) (f_i(x_k)) = 0$ for $ i = 1,2, \cdots, M$

- $M \times M$ linear system for the coefficients $\{c_j\}$ : $\sum_{j=1}^M ( \sum_{k=1}^N f_i(x_k)f_j(x_k))c_j = \sum_{k=1}^N f_i(x_k)y_k$ for $i = 1,2, \cdots, M$

- In matrix form, the coefficient vector $C$ satisfies the normal equations $F^T F C = F^T Y$, where

$$F = \begin{bmatrix}
f_1(x_1) & f_2(x_1) & \cdots & f_M(x_1) \\
f_1(x_2) & f_2(x_2) & \cdots & f_M(x_2) \\
\vdots & \vdots & & \vdots \\
f_1(x_N) & f_2(x_N) & \cdots & f_M(x_N) \\
\end{bmatrix} $$
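
A sketch of the general linear least-squares solve, assuming NumPy; the basis $\{1, x, \sin x\}$ and the noisy data are hypothetical choices for illustration.

```python
import numpy as np

def linear_least_squares(funcs, x, y):
    """Solve the normal equations F^T F c = F^T y, where F[k, j] = funcs[j](x[k])."""
    F = np.column_stack([f(x) for f in funcs])   # N x M design matrix
    return np.linalg.solve(F.T @ F, F.T @ y)

# hypothetical basis {1, x, sin(x)} and made-up noisy data
funcs = [lambda t: np.ones_like(t), lambda t: t, np.sin]
x = np.linspace(0.0, 4.0, 9)
rng = np.random.default_rng(0)
y = 1.0 + 0.5 * x + 2.0 * np.sin(x) + 0.05 * rng.standard_normal(x.size)
print(linear_least_squares(funcs, x, y))   # close to [1.0, 0.5, 2.0]
```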

 

Polynomial Fitting

- Polynomial function : $f(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_{M+1} x^M$

- Least-squares parabola : $\{(x_k, y_k)\}_{k=1}^N$ are $N$ points, $y = f(x) = Ax^2 + Bx + C$

- Minimize the sum of squares of the errors : $E(A, B, C) = \sum_{k=1}^N (Ax_k^2 + Bx_k + C - y_k)^2$

- Set the partial derivatives to zero : $\frac{\partial E}{\partial A} = \frac{\partial E}{\partial B} = \frac{\partial E}{\partial C} = 0$

- Solve the resulting $3 \times 3$ linear system for $A$, $B$, $C$ (see the sketch below)
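
A sketch of the least-squares parabola, assuming NumPy and made-up data; the design matrix has columns $x^2$, $x$, $1$, so the normal equations are the same $3 \times 3$ system described above.

```python
import numpy as np

def least_squares_parabola(x, y):
    """Fit y = A*x^2 + B*x + C by solving the 3x3 normal equations."""
    F = np.column_stack([x ** 2, x, np.ones_like(x)])   # columns for A, B, C
    A, B, C = np.linalg.solve(F.T @ F, F.T @ y)
    return A, B, C

x = np.array([-3.0, -1.0, 1.0, 3.0])
y = np.array([3.0, 1.0, 1.0, 3.0])
print(least_squares_parabola(x, y))   # A = 0.25, B = 0.0, C = 0.75
```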

 

Nonlinear Least-Squares Method

- The nonlinear system can be solved with Newton's Method : time-consuming, and it requires good starting values for $A$ and $C$

- Alternatively, minimize $E(A, C)$ directly using optimization methods (see the sketch below)
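
A sketch of the direct minimization, assuming SciPy is available (the notes do not name a specific optimizer); the data and starting values are made up, and a derivative-free method is used so no gradients are needed.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 2.5, 3.5, 5.0, 7.5])

def E(params):
    """Sum of squared residuals for y = C * exp(A*x)."""
    A, C = params
    return np.sum((C * np.exp(A * x) - y) ** 2)

# starting values could come from the data-linearization fit above
result = minimize(E, x0=[0.4, 1.5], method="Nelder-Mead")
A_opt, C_opt = result.x
print(A_opt, C_opt, result.fun)
```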