Processing math: 100%
  • References

The least absolute deviation (LAD) — also known as L1 regression — is a statistical optimization technique attempting to find a function f(xi) which closely approximates a data set of the points (xi, yi) with i=1,,n. It minimizes the sum of absolute errors between points generated by the function (ˆyi) and the corresponding data points. minimize  ni|yiˆyi| As such can be classified as a non-linear optimization problem.

However, following, e.g., Brooks and Dula (2013) the objective function can be expressed as

minimizeβ0,β,e+,e  ni=1e+i+ei                 subject to                                                         β0+βxi+e+iei=0      i=1,,n                              βj=1                                            e+i,ei0     i=1,,n given a set of points xiRm, i=1,,n and the jth column representing the dependent variable y. In comparison to the original formulation we differentiate between positive and negative deviations/errors (e+i and ei, respectively) to get to the linear programming problem shown above.


LP Implementation

The linear programming formulation of the L1 regression problem as shown above can be constructed using ROI methods via the following R function.

create_L1_problem <- function(x, j) {
  m <- ncol(x) + 1L
  n <- 2 * nrow(x)
  beta <- double(m + n)
  beta[j + 1] <- 1
  OP(objective = L_objective(c(rep(0, m), rep(1, n))),
    constraints = rbind(
      L_constraint(L = cbind(1, x, diag(nrow(x)), -diag(nrow(x))),
        dir = eq(nrow(x)), rhs = rep(0, nrow(x))),
      L_constraint(L = beta, dir = "==", rhs = -1)),
    bounds = V_bound(li = seq_len(m), lb = rep(-Inf, m),
      nobj = length(beta)))
}

With ROI one can solve e.g., Brownlee’s stack loss plant data example from the stats package using the above OP and the solver GLPK as follows.

Sys.setenv(ROI_LOAD_PLUGINS = FALSE)
library(ROI)
library(ROI.plugin.glpk)

data(stackloss)
l1p <- create_L1_problem(as.matrix(stackloss), 4)
L1_res <- ROI_solve(l1p, solver = "glpk")

The first value corresponds to the intercept and the others to the model coefficients.

Comparison

For comparison purposes we solve the same problem with R package quantreg.

library("quantreg")
fm <- rq(stack.loss ~ stack.x, 0.5)
n <- ncol(stackloss)
cbind(ROI = head(solution(L1_res), n),
      quantreg = coef(fm))
##                            ROI     quantreg
## (Intercept)       -39.68985507 -39.68985507
## stack.xAir.Flow     0.83188406   0.83188406
## stack.xWater.Temp   0.57391304   0.57391304
## stack.xAcid.Conc.  -0.06086957  -0.06086957

References

  • Brooks JP, Dula JH (2013). The L1-norm best-fit hyperplane problem. Applied Mathematics Letters, 26(1), 51–55. URL doi:10.1016/j.aml.2012.03.031