The logistic model (or logit model) belongs to the generalized linear models family (GLM). It is widely used in regression analysis to model a binary dependent variable.
Since GLMs are commonly used R has already built-in functionality to estimate GLMs. Specifically the glm
function from the stats package, with a binomial family and logit link can be used to estimate a logistic model.
The following logistic regression example is from Brian S. Everitt and Torsten Hothorn (2017).
options(width = 10000)
library(HSAUR)
## Loading required package: tools
glm(ESR ~ fibrinogen, data = plasma, family = binomial(link = "logit"))
m <-coef(m)
## (Intercept) fibrinogen
## -6.845075 1.827081
Making use of maximum likelihood estimation the logistic regression model can also be estimated in ROI. Here either a conic solver or a general purpose solver can be used. The conic solvers have the advantages that they are designed to find the global optimum and are (typically) faster.
The log-likelihood \[ \underset{\beta}{\text{maximize}} \sum_{i = 1}^n ~ y_i ~ \log \left(\frac{\exp(X_{i*} \beta)}{1 + \exp(X_{i*} \beta)} \right) + \sum_{i = 1}^n ~ (1 - y_i) ~ \log \left(1 - \frac{\exp(X_{i*} \beta)}{1 + \exp(X_{i*} \beta)} \right) \] can be simplified to \[ \underset{\beta}{\text{maximize}} ~ \sum_{i = 1}^n y_i ~ X_{i*} \beta - \sum_{i = 1}^n \log(1 + \exp(X_{i*} \beta)). \]
Sys.setenv(ROI_LOAD_PLUGINS = FALSE)
library(ROI)
library(ROI.plugin.optimx)
library(ROI.plugin.ecos)
cbind(Intercept = 1, fibrinogen = plasma$fibrinogen)
X <- as.integer(plasma$ESR) - 1L y <-
function(beta) {
mle <-drop(y %*% X %*% beta - sum(log(1 + exp(X %*% beta))))
}
OP(F_objective(mle, n = ncol(X)), maximum = TRUE,
op_gps <-bounds = V_bound(ld = -Inf, nobj = ncol(X)))
ROI_solve(op_gps, "optimx", start = double(ncol(X))) s1 <-
## Maximizing -- use negfn and neggr
## Warning in max(logpar): no non-missing arguments to max; returning -Inf
## Warning in min(logpar): no non-missing arguments to min; returning Inf
solution(s1)
## [1] -6.845075 1.827081
For conic optimization we need to build the matrices necessary to express the objective function.
library(slam)
function(y, X, solver = "ecos", ...) {
logistic_regression <- simple_triplet_matrix
stm <- simple_triplet_zero_matrix
stzm <-stopifnot(is.vector(y), length(y) == nrow(X))
nrow(X); n <- ncol(X)
m <- 3 * seq_len(m) - 2
i <- OP(c(-(y %*% X), rep.int(1, m), double(m)), maximum = FALSE)
op <- stm(rep(i, n), rep(seq_len(n), each = m), -drop(X), 3 * m, n)
C11 <- stm(i, seq_len(m), rep.int(1, m), 3 * m, m)
C12 <- stm(i + 2, seq_len(m), rep.int(-1, m), 3 * m, m)
C13 <- cbind(C11, C12, C13)
C1 <- cbind(stzm(3 * m, n), C12, -C13)
C2 <- rbind(C1, C2)
C <- K_expp(2 * m)
cones <- c(rep(c(0, 1, 1), m), rep(c(0, 1, 0), m))
rhs <-constraints(op) <- C_constraint(C, cones, rhs)
bounds(op) <- V_bound(ld = -Inf, nobj = ncol(C))
ROI_solve(op, solver = solver, ...)
}
logistic_regression(y, X)
s2 <-head(solution(s2), ncol(X))
## [1] -6.845075 1.827081
https://CRAN.R-project.org/package=HSAUR