# Optimization: Least Squares

Hongtao Hao / 2023-03-22

Images in this post come from the slides of CS524 at UW-Madison, Spring 2023.

This notebook runs in Julia and is rendered by Jupyter.

# Least squares #

Matrix equations: $Ax = b$

If you have more equations than variables, i.e., when $A$ is tall (overdetermined), for example

$$A = \begin{bmatrix} 2 & 3 \\ 4 & -1 \\ 2 & 1 \end{bmatrix}$$ And $$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$ $$b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

More often than not, no exact solution exists. One fix is to use least squares.
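As a quick numeric check (a minimal sketch using Julia's built-in `\` operator, which this post introduces later), the tall system above has no exact solution, but a least-squares solution still exists:

```julia
using LinearAlgebra

# The tall (overdetermined) system from the text: 3 equations, 2 unknowns
A = [2 3; 4 -1; 2 1]
b = [1, 2, 3]

# Backslash returns the least-squares solution when A is tall
x = A \ b            # ≈ [20/31, 4/31]

# The residual is nonzero: no exact solution exists
println(norm(A*x - b))
```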

If you have more variables than equations (e.g., you only have $2x + 3y = 0$), i.e., when $A$ is wide:

$$A = \begin{bmatrix} 2 & 3 \end{bmatrix}$$ And $$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$ $$b = 0$$

it means you either have no solutions or have infinitely many solutions. One solution is to use regularization.

Least squares: minimize $||Ax - b||^2$ over the choice of $x$.

## Interpretation #

The graphical interpretation of minimizing $||Ax - b||^2$:

Take the overdetermined example above as an instance.

$$A = \begin{bmatrix} 2 & 3 \\ 4 & -1 \\ 2 & 1 \end{bmatrix}$$ $$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$ $$\vec{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

We write

$$a_1 = \begin{bmatrix} 2 \\ 4 \\ 2 \end{bmatrix}$$ and

$$a_2 = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}$$

So we are actually minimizing

$$||\vec{a_1}x_1 + \vec{a_2}x_2 - \vec{b}||^2$$ It should be noted that $\vec{a_1}x_1 + \vec{a_2}x_2$ is a vector itself (let's denote it as $\vec{t}$) and it lies on the plane defined by $\vec{a_1}$ and $\vec{a_2}$. So the objective above is to minimize the length of the vector $\vec{t} - \vec{b}$, which is equivalent to finding the projection of $\vec{b}$ onto the plane defined by $\vec{a_1}$ and $\vec{a_2}$. Also note that if the equations have a solution, for example

$$A = \begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix}$$ $$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$ $$\vec{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$ So $$a_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$ and

$$a_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$ We have

$$\vec{a_1}x_1 + \vec{a_2}x_2 = \vec{b}$$

In this case, $\vec{b}$ is on the plane defined by $\vec{a_1}$ and $\vec{a_2}$.
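We can confirm this numerically (a sketch using the `\` operator introduced later in the post): because $\vec{b}$ lies on the plane, the system solves exactly and the residual is zero:

```julia
# The square, solvable system from the text
A = [2 1; 3 -1]
b = [1, 2]

x = A \ b          # exact solution: [0.6, -0.2]
println(A*x - b)   # zero residual: b is in the column space of A
```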

In the earlier, overdetermined case, there are no solutions, so $\vec{b}$ is not on the plane defined by $\vec{a_1}$ and $\vec{a_2}$. What we want is a vector on that plane that is as close to $\vec{b}$ as possible.
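The projection view can be checked numerically (a sketch, reusing the tall system from above): the error vector $\vec{t} - \vec{b}$ must be orthogonal to both $\vec{a_1}$ and $\vec{a_2}$:

```julia
using LinearAlgebra

A = [2 3; 4 -1; 2 1]
b = [1, 2, 3]

x = A \ b   # least-squares solution
t = A * x   # projection of b onto the plane spanned by a1 and a2

# Orthogonality: the residual is perpendicular to each column of A
println(dot(A[:, 1], t - b))   # ≈ 0
println(dot(A[:, 2], t - b))   # ≈ 0
```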

## Norm #

- 1-norm: $||r||_1 = |r_1| + |r_2| + \cdots + |r_n|$
- 2-norm (aka the Euclidean norm): $||r||_2 = \sqrt{{r_1}^2 + {r_2}^2 + \cdots + {r_n}^2}$
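Both norms are available through Julia's LinearAlgebra standard library, for example:

```julia
using LinearAlgebra

r = [3, -4]
println(norm(r, 1))   # 1-norm: |3| + |-4| = 7
println(norm(r, 2))   # 2-norm: sqrt(9 + 16) = 5; norm(r) defaults to the 2-norm
```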

## Curve Fitting (Regression) #

Suppose we are given two series of numbers:

x = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 ]
y = [ 1, 3, 0, 1, 2, 4, 6, 7, 5, 5, 6, 7.2, 5.5,  4, 3.2, 5]


And we suspect that they are related by a fourth-degree polynomial $y = u_1x^4 + u_2x^3 + u_3x^2 + u_4x + u_5$. How can we find all the $u$s that best agree with our data? Using least squares!

using Plots, Gurobi, JuMP, Polynomials

x = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 ]
y = [ 1, 3, 0, 1, 2, 4, 6, 7, 5, 5, 6, 7.2, 5.5,  4, 3.2, 5]

Plots.scatter(x, y,
label="",
xlabel="x",
ylabel="y"
)

savefig("/en/blog/2023-03-22-least-square_files/least-squares-01.png")

k = 4
n = length(x)
A = zeros(n, k+1)

# get the A matrix
for i in 1:n
    A[i,:] = [x[i]^m for m in k:-1:0]
end


Each row of $A$ is $[x_i^4, x_i^3, x_i^2, x_i, 1]$, so $A$ has the structure of a Vandermonde matrix. Now set up and solve the least-squares problem:

m = Model(Gurobi.Optimizer)

@variable(m, u[1:k+1])
@objective(m, Min, sum( (y - A*u).^2 ))

optimize!(m)

Set parameter Username
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (mac64[x86])

CPU model: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Thread count: 2 physical cores, 4 logical processors, using up to 4 threads

Optimize a model with 0 rows, 5 columns and 0 nonzeros
Model fingerprint: 0xc477cc36
Model has 15 quadratic objective terms
Coefficient statistics:
Matrix range     [0e+00, 0e+00]
Objective range  [1e+02, 2e+06]
QObjective range [3e+01, 2e+10]
Bounds range     [0e+00, 0e+00]
RHS range        [0e+00, 0e+00]
Warning: Model contains large quadratic objective coefficients
Consider reformulating model or setting NumericFocus parameter
to avoid numerical issues.
Presolve time: 0.00s
Presolved: 0 rows, 5 columns, 0 nonzeros
Presolved model has 15 quadratic objective terms
Ordering time: 0.00s

Barrier statistics:
Free vars  : 9
AA' NZ     : 6.000e+00
Factor NZ  : 1.000e+01
Factor Ops : 3.000e+01 (less than 1 second per iteration)

Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
0   3.35330000e+02  3.35330000e+02  0.00e+00 2.34e+06  0.00e+00     0s
1   3.33379918e+02  3.35327161e+02  2.23e-08 2.33e+06  0.00e+00     0s
2   3.33281966e+02  3.35327055e+02  4.91e-08 2.33e+06  0.00e+00     0s
3   3.33104809e+02  3.35326933e+02  1.08e-07 2.33e+06  0.00e+00     0s
4   3.32715344e+02  3.35326658e+02  2.37e-07 2.33e+06  0.00e+00     0s
5   3.20366412e+02  3.35178088e+02  5.17e-07 2.29e+06  0.00e+00     0s
6   3.18195814e+02  3.35161569e+02  1.14e-06 2.29e+06  0.00e+00     0s
7   3.13444319e+02  3.35122582e+02  2.50e-06 2.28e+06  0.00e+00     0s
8   2.81119294e+02  3.33872605e+02  5.38e-06 2.18e+06  0.00e+00     0s
9   2.60552748e+02  3.33436335e+02  1.18e-05 2.16e+06  0.00e+00     0s
10   2.67193397e+02  3.30286001e+02  2.97e-06 2.05e+06  0.00e+00     0s
11   2.65244952e+02  3.28704816e+02  6.46e-06 2.00e+06  0.00e+00     0s
12   2.77148108e+02  3.27643838e+02  1.41e-05 1.98e+06  0.00e+00     0s
13   2.48136085e+02  3.09760405e+02  2.89e-05 1.68e+06  0.00e+00     0s
14   9.38163318e+01  2.76831470e+02  1.16e-05 1.34e+06  0.00e+00     0s
15   1.10524099e+02  2.61479626e+02  3.38e-06 1.21e+06  0.00e+00     0s
16   1.03515571e+02  2.45285834e+02  7.12e-06 1.10e+06  0.00e+00     0s
17   1.15518922e+02  2.36704952e+02  1.53e-05 1.04e+06  0.00e+00     0s
18   4.94511564e+01  2.11255549e+02  5.38e-06 8.79e+05  0.00e+00     0s
19   5.54851811e+01  1.91957180e+02  1.68e-06 7.69e+05  0.00e+00     0s
20   5.84647149e+01  1.86886513e+02  3.99e-06 7.42e+05  0.00e+00     0s
21   6.15863372e+01  1.70310423e+02  8.34e-06 6.55e+05  0.00e+00     0s
22   7.64449887e+01  1.52720041e+02  1.73e-05 5.67e+05  0.00e+00     0s
23   1.21617080e+01  1.71645472e+01  2.14e-06 5.67e-01  0.00e+00     0s
24   1.71642681e+01  1.71643923e+01  5.68e-11 5.67e-07  0.00e+00     0s

Barrier solved model in 24 iterations and 0.00 seconds (0.00 work units)
Optimal objective 1.71642681e+01

User-callback calls 88, time in user-callback 0.00 sec

u = value.(u)

5-element Vector{Float64}:
0.002320865140005629
-0.08461679037317738
0.969732004634473
-3.3515882584235666
4.320673076823761


### A simpler way #

In fact, there is a much simpler way. Because the operation above is so common, Julia has a dedicated operator for it: `\`

u = A\y

5-element Vector{Float64}:
0.002320865086925458
-0.08461679037336926
0.9697320046439619
-3.3515882584477747
4.3206730769230814
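On a tall matrix, `\` computes a least-squares solution (typically via a QR factorization), which matches solving the normal equations $A^TAu = A^Ty$ directly. A quick check (a sketch that rebuilds $A$ from the data above):

```julia
using LinearAlgebra

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
y = [1, 3, 0, 1, 2, 4, 6, 7, 5, 5, 6, 7.2, 5.5, 4, 3.2, 5]

# Rebuild A: row i is [x_i^4, x_i^3, x_i^2, x_i, 1]
k = 4
A = [xi^m for xi in x, m in k:-1:0]

u_bs = A \ y                 # QR-based least squares (numerically stable)
u_ne = (A' * A) \ (A' * y)   # normal equations (can lose accuracy when A is ill-conditioned)

println(maximum(abs.(u_bs - u_ne)))   # tiny
```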

Plots.scatter(x, y,
label="",
xlabel="x",
ylabel="y"
)

f(x) = u[1]*x^4 + u[2]*x^3 + u[3]*x^2 + u[4]*x + u[5]

# use continuous x range; otherwise the fit line is discrete
xs = 1:0.01:16

Plots.plot!(xs, f,
linecolor = "green",
linewidth = 2,
label = "fourth-degree polynomial fit line"
)

savefig("/en/blog/2023-03-22-least-square_files/least-squares-02.png")

When polynomials have much higher degrees, it is best to use the Polynomials package rather than writing the function manually like f(x) = u[1]*x^4 + u[2]*x^3 + u[3]*x^2 + u[4]*x + u[5]. In Polynomials, coefficients are listed in increasing power order, i.e., $p(x) = c_1 + c_2 x + c_3 x^2 + \cdots$, so we need to reverse the order of u:

p = Polynomial(reverse(u))


4.3206730769230814 - 3.3515882584477747∙x + 0.9697320046439619∙x² - 0.08461679037336926∙x³ + 0.002320865086925458∙x⁴

ys = p.(xs)
Plots.scatter(x, y,
label="",
xlabel="x",
ylabel="y"
)

# note it is (xs, ys) rather than (xs, p)
Plots.plot!(xs, ys,
linecolor = "green",
linewidth = 2,
label = "fourth-degree polynomial fit line"
)

savefig("/en/blog/2023-03-22-least-square_files/least-squares-03.png")

#Optimization