Model

The multinomial logistic model (or multinomial logit model) is widely used in regression analysis to model unordered categorical variables.

Likelihood

Assume we have a categorical dependent variable \(y_{i} \in {1, \ldots, J}\) which can take a value out of \(J\) unordered categories for each observation \(i = 1,\ldots, n\). Moreover, for each observation we observe a vector of \(p\) covariates \(\boldsymbol x_i\) which do not depend on the category (this assumption can easily be relaxed). Let us create a binary vector \(\boldsymbol {\tilde y}_i\) where \(\tilde y_{ij}=1\) if \(y_i=j\). The likelihood is given by: \[ \ell(\boldsymbol\beta_1, \ldots, \boldsymbol\beta_J) = \prod_{i=1}^n\prod_{j=1}^J\left[\frac{\exp(\boldsymbol x^\top_i \boldsymbol\beta_j)}{\sum_{l=1}^J\exp(\boldsymbol x_i^\top \boldsymbol\beta_l)}\right]^{\tilde y_{ij}} \] and the log-likelihood is: \[ \sum_{i=1}^n \sum_{j=1}^J \tilde y_{ij} \log\left[\frac{\exp(\boldsymbol x^\top_i \boldsymbol\beta_j)}{\sum_{l=1}^J\exp(\boldsymbol x_i^\top \boldsymbol\beta_l)}\right]= \sum_{i=1}^n \left(\sum_{j=1}^J \tilde y_{ij}\boldsymbol x^\top_i \boldsymbol\beta_j\right) - \mathrm{log}\sum_{l=1}^J\exp(\boldsymbol x_i^\top \boldsymbol\beta_l) \] For identification purposes one must set the \(\boldsymbol\beta\) equal to 0 for one baseline category.

Conic program

The second term of the log-likelihood can be modeled by conic programming. Assuming that the first category is the baseline category, the problem of maximizing the log-likelihood can be written as: \[\begin{align} \min_{\substack{\boldsymbol\beta_l,\\ l=1,\ldots,J}}\quad &- \sum_{i=1}^n \left(\sum_{j=1}^J \tilde y_{ij}\boldsymbol x^\top_i \boldsymbol\beta_j\right) + \sum_{i=1}^n t_i\\ \text{s.t.}\quad & u^{1}_i + \ldots + u^{J}_i \leq 1, \quad \forall i=1,\ldots, n\\ & (-t_{i}, 1, u^1_{i})^\top\in\mathcal{K}_\text{expp}\\ &(x_i^\top\boldsymbol \beta_j - t_i, 1, u^j_i)^\top\in\mathcal{K}_\text{expp}, \quad \forall j = 2,\ldots,J. \end{align}\]

Estimation

In R several packages have built in functionality for estimating the multinomial logistic regression. Among others, the multinom() function from nnet package (Venables & Ripley, 2002), the vglm() and multinomial() functions of the VGAM package (Yee, 2010) and the mlogit() function from the mlogit package (Croissant, 2020).

When implementing the function in ROI, the conic program above must be specified by constructing the appropriate matrices.

mlogit_roi <- function(X, y, solver = "auto", ...) {
  stm <- simple_triplet_matrix
  stzm <- simple_triplet_zero_matrix
  y <- as.numeric(y)
  stopifnot(is.vector(y), length(y) == nrow(X))
  ymat <- model.matrix(~ as.factor(y))[, - 1] 
  xtilde <- model.matrix(~ 0 + ymat : X)    
  ytilde <- (y != min(y)) + 0 # indicator taking zero for category to be excluded
  n <- nrow(X); p <- ncol(X); J <- max(y); ptilde <- ncol(xtilde)
  
  i <- 3 * seq_len(n) - 2 ## triplets for cones
  ## Variables: beta_2, .., beta_J, t_i, u^1,..., u^J
  op <- OP(c(- (ytilde %*% xtilde), rep.int(1, n), double(n * J)), maximum = FALSE)
  Ct <- stm(i, seq_len(n), rep.int(1, n), 3 * n, n)  
  Cu <- stm(i + 2, seq_len(n), rep.int(-1, n), 3 * n, n)
  Clist <- lapply(seq_len(J), function(j) {
    Cx <- if(j == 1) stzm(3 * n, ptilde) else
                 stm(rep(i, p), rep((seq_len(p) - 1) * (J - 1) + j - 1, each = n),
                  -drop(X), 3 * n, ptilde)
    CC <- cbind(Cx, Ct, stzm(3 * n, n * (j - 1)), Cu, stzm(3 * n, n * (J - j)))
  })
  
  C <- do.call("rbind", Clist)
  cones <- K_expp(J * n)
  rhs <- rep(c(0, 1, 0), n * J)
  
  CL <- cbind(stzm(n, ptilde + n),
              stm(rep(seq_len(n), J), seq_len(n * J), rep.int(1, n * J), n, n * J))
  
  rhs <- rep(c(0, 1, 0), n * J)
  constraints(op) <- rbind(C_constraint(C, cones, rhs),
                           L_constraint(CL, 
                                        dir = rep("<=", nrow(CL)), 
                                        rhs =  rep(1, nrow(CL))))
  
  bounds(op) <- V_bound(ld = -Inf, nobj = ncol(C))
  ROI_solve(op, solver = solver, ...)
}

Examples

Heating data

We using the Heating data set from the mlogit package as an illustration:s

library("mlogit")
data("Heating", package = "mlogit")

We estimate the model using the function mlogit(), which uses the Newton-Raphson algorithm.

data("Heating", package = "mlogit")

H <- dfidx(Heating, choice = "depvar", varying = c(3:12))
coef(mlogit(depvar ~ 0 | rooms +  region | 0, data = H, 
            reflevel = "gc"))

##  (Intercept):ec  (Intercept):er  (Intercept):gr  (Intercept):hp        rooms:ec        rooms:er        rooms:gr        rooms:hp regionscostl:ec regionscostl:er regionscostl:gr regionscostl:hp regionmountn:ec regionmountn:er regionmountn:gr regionmountn:hp regionncostl:ec regionncostl:er regionncostl:gr regionncostl:hp 
##    -2.397389558    -1.959492165    -1.329071339    -2.277360440     0.064488335     0.039762875    -0.010950178     0.020221356    -0.076876160    -0.008165969     0.040204869    -0.216228239     0.119548090     0.108706856     0.131126030     0.059236047    -0.225780841    -0.551739531    -0.553304337    -0.639282368

Now using ROI.

library(ROI)
library(ROI.plugin.ecos)
library(slam)
y <- Heating$depvar
X <- model.matrix(~ rooms +  region, data = Heating)
res <- mlogit_roi(X, y)
s2 <- solution(res)[1:20]
names(s2) <- apply(expand.grid(levels(y)[-1], colnames(X)), 1,
                   function(x) paste0(x[2], ":", x[1]))
s2

##  (Intercept):gr  (Intercept):ec  (Intercept):er  (Intercept):hp        rooms:gr        rooms:ec        rooms:er        rooms:hp regionscostl:gr regionscostl:ec regionscostl:er regionscostl:hp regionmountn:gr regionmountn:ec regionmountn:er regionmountn:hp regionncostl:gr regionncostl:ec regionncostl:er regionncostl:hp 
##    -1.329071700    -2.397389592    -1.959492204    -2.277360469    -0.010950099     0.064488342     0.039762884     0.020221361     0.040204860    -0.076876160    -0.008165968    -0.216228243     0.131126046     0.119548092     0.108706857     0.059236047    -0.553308096    -0.225780854    -0.551739549    -0.639282359

Fishing data

Using the Fishing data set in mlogit, we estimate the multinomial model introduced in the first section using the function mlogit() from the mlogit package

data("Fishing", package = "mlogit")
Fish <- dfidx(Fishing, varying = 2:9, shape = "wide", choice = "mode")
coef(mlogit(mode ~ 0 | income, data = Fish))

##    (Intercept):boat (Intercept):charter    (Intercept):pier         income:boat      income:charter         income:pier 
##        7.389208e-01        1.341291e+00        8.141503e-01        9.190636e-05       -3.163988e-05       -1.434029e-04

In ROI:

y <- Fishing$mode
X <- model.matrix(~ income, data = Fishing)
res2 <- mlogit_roi(X, y)
nam <- apply(expand.grid(levels(y)[-1], colnames(X)), 1,
             function(x) paste0(x[2], ":", x[1]))

s1 <- solution(res2)[1:6]
names(s1) <- nam 
s1

##    (Intercept):pier    (Intercept):boat (Intercept):charter         income:pier         income:boat      income:charter 
##        8.141508e-01        7.389215e-01        1.341292e+00       -1.434031e-04        9.190626e-05       -3.163994e-05

Extensions

Constraints on the coefficients

Often more parsimonious models should be employed where constraints on the \(\boldsymbol\beta\)’s are desired. An example of such a model is: \[\begin{align*} \eta_{ij} &= \beta_{0j} + \boldsymbol x_i^\top \boldsymbol\beta\\ P(y_i = j | \boldsymbol x_i) &= \frac{\exp(\eta_{ij})}{\sum_{l=1}^J\exp(\eta_{il})} \end{align*}\] We can introduce the VGAM type constraints where for each covariate a full-rank matrix of constraints \(H_p\) is specified, which in the most general case are all equal to the identity matrix. The rows of each matrix correspond to the category \(j=1\ldots J\) and each column stands for a parameter to be estimated. Combining these \(H_1, \ldots, H_P\) matrices into a block diagonal matrix gives rise to the \(H_\beta\) matrix of constraints.

Estimation

We interact each column of the covariate matrix \(X\) with the \(n\times J\) design matrix \(\tilde Y\) and obtain the model matrix: \[\begin{align*} \tilde X&= \left(\mathrm{diag}(X\cdot \boldsymbol e_1){\tilde{Y}}|\ldots|\mathrm{diag}(X\cdot\boldsymbol e_{P}){\tilde{Y}} \right)\\ \end{align*}\] where \(\boldsymbol e_p\) for \(p=1,\ldots P\) is the orthonormal basis. The total number of coefficients \(P^*\) is equal to the number of columns of \(H\): \(P^*=\mathrm{ncol}(H)\). Let \(\tilde H^{(j)}_\beta\) be the \((P \times P^*)\) matrix of constraints corresponding to the \(j\)-th category. This is obtained by taking the rows in \(H_p\) that correspond to the \(j\)-th category.

For example, the matrix \(H_\text{(Intercept)}\) for the model above is (assuming the first category is the baseline): \[ \begin{pmatrix} & \beta_{02} & \beta_{03}&\ldots & \beta_{0J}\\ j = 1 & 0 &0 & \ldots&0 \\ j = 2 & 1 & 0&\ldots& 0\\ j = 3 & 0 & 1&\ldots& 0\\ \vdots & \vdots & \ddots &\ldots& \vdots\\ j = J & 0&0&\ldots& 1\\ \end{pmatrix}. \] Note that there is no column corresponding to \(\beta_{01}\), as for identifiability one of the \(\beta_{0\cdot}\) parameters should be set to zero. The matrix \(H_\text{(X1)}\) for the first covariate would be: \[ \begin{pmatrix} & \beta_{\text{X1}}\\ j = 1 & 1 \\ j = 2 & 1 \\ j = 3 & 1\\ \vdots & \vdots\\ j = J & 1\\ \end{pmatrix}. \] Let \(\boldsymbol{\tilde \beta}\) be the vector of coefficients to be estimated (in the example above \(\boldsymbol{\tilde \beta}=(\beta_{02}, \beta_{03}, \ldots, \beta_{0J}, \beta_{\text{X1}}, \ldots)^\top\)).

The problem including constraints is: \[\begin{align*} \min_{\substack{\boldsymbol\beta_l,\\ l=1,\ldots,J}}\quad &\sum_{i=1}^n \left(\sum_{j=1}^J \tilde y_{ij}\boldsymbol {\tilde x}^\top_i H_\beta \boldsymbol{\tilde\beta}\right) + \sum_{i=1}^n t_i\\ \text{s.t.}\quad & u^{1}_i + \ldots + u^{J}_i \leq 1, \quad \forall i=1,\ldots, n\\ &(x_i^\top\tilde{H}^{(j)}_\beta\tilde{\boldsymbol\beta}- t_i, 1, u^j_i)^\top\in\mathcal{K}_\text{expp}, \quad \forall j = 1,\ldots,J. \end{align*}\]

mlogit_hbeta_roi <- function(X, y, Hbeta = NULL, 
                             solver = "auto", ...) {
  stm <- simple_triplet_matrix
  stzm <- simple_triplet_zero_matrix
  y <- as.numeric(y)
  stopifnot(is.vector(y), length(y) == nrow(X))
  n <- nrow(X); p <- ncol(X); J <- max(y); 
  if (is.null(Hbeta)) Hbeta <- diag(p * J) 
  if (is.list(Hbeta)) Hbeta <- Matrix::bdiag(Hbeta)
  if (!is.matrix(Hbeta)) Hbeta <- as.matrix(Hbeta) 
  ptilde <- ncol(Hbeta)

  ymat <- model.matrix(~ -1 + as.factor(y))
  xtilde <- model.matrix(~ 0 + ymat : X)    


  H <- lapply(seq_len(J), function(j) {
   Hbeta[c((seq_len(p) - 1) * J + j), ]
  })
  i <- 3 * seq_len(n) - 2 ## triplets for cones

  op <- OP(c(- drop(colSums(xtilde %*% Hbeta)), rep.int(1, n),
             double(n * J)), 
    maximum = FALSE)
  Ct <- stm(i, seq_len(n), rep.int(1, n), 3 * n, n)  
  Cu <- stm(i + 2, seq_len(n), rep.int(-1, n), 3 * n, n)
  Clist <- lapply(seq_len(J), function(j) {
    Cx <- stm(rep(i, ptilde), rep(seq_len(ptilde), each = n),
              -drop(X %*% H[[j]]), 3 * n, ptilde)
    
    CC <- cbind(Cx, Ct, stzm(3 * n, n * (j - 1)), Cu, 
                stzm(3 * n, n * (J - j)))
  })
  
  C <- do.call("rbind", Clist)
  cones <- K_expp(J * n)
  rhs <- rep(c(0, 1, 0), n * J)

  CL <- cbind(stzm(n, ptilde + n), 
              stm(rep(seq_len(n), J), seq_len(n * J),
                  rep.int(1, n * J), n, n * J))

  constraints(op) <- rbind(C_constraint(C, cones, rhs),
                           L_constraint(CL, 
                                        dir = rep("<=", nrow(CL)), 
                                        rhs =  rep(1, nrow(CL))))
  
  bounds(op) <- V_bound(ld = -Inf, nobj = ncol(C))
  ROI_solve(op, solver = solver, ...)
}

Example

For comparison purposes we use the VGAM package to estimate a multinomial logistic model with constraints. The data set Fishing is used for illustration. We estimate the model with different intercepts for each category where \(\beta_{04}=0\) with one common \(\boldsymbol\beta=\boldsymbol\beta_1=\boldsymbol\beta_2=\boldsymbol\beta_3\) and \(\boldsymbol\beta_4=0\).

library(VGAM)
pneumo <- transform(pneumo, let = log(exposure.time))
coef(vglm(mode ~ income, multinomial, 
          data = Fishing, 
          constraints = list("(Intercept)" = diag(3),
                             "income" = cbind(c(1, 1, 1)))))

## (Intercept):1 (Intercept):2 (Intercept):3        income 
## -1.459912e+00 -1.175968e+00 -3.222706e-01  6.023268e-05

Now using ROI.

y <- Fishing$mode
X <- model.matrix(~ income, data = Fishing)
J <- max(as.numeric(y))
Hbeta <- list(rbind(diag(J - 1), 0),  
              c(rep(1L, J - 1), 0))
Hbeta

## [[1]]
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
## [4,]    0    0    0
## 
## [[2]]
## [1] 1 1 1 0

res <- mlogit_hbeta_roi(X, y, Hbeta = Hbeta)
s1 <- solution(res)[1:4]
s1

## [1] -1.459912e+00 -1.175968e+00 -3.222704e-01  6.023264e-05

Individual and alternative specific covariates

We illustrate how a multinomial logistic model with individual and alternative-specific covariates (such as the ones introduced in mlogit) can be estimated using ROI. Consider the following model \(j\in \{1,\ldots,J\}\).

\[\begin{align*} \eta_{ij} &= \beta_{0j} + \boldsymbol x_i^\top \boldsymbol\beta_j + \boldsymbol z_{ij}^\top \boldsymbol\gamma_j\\ P(y_i = j | \boldsymbol x_i, \boldsymbol z_{ij}) &= \frac{\exp(\eta_{ij})}{\sum_{l=1}^J\exp(\eta_{il})} \end{align*}\]

Estimation

For identifiability, one of the intercepts and one of the \(\beta\)’s should be fixed to zero. The parameters of the alternative specific covariates can all be estimated.

The problem is: \[\begin{align*} \min_{\substack{\boldsymbol\beta_l,\\ l=1,\ldots,J}}\quad &\sum_{i=1}^n \left(\sum_{j=2}^J \tilde y_{ij}\boldsymbol {x}^\top_i \boldsymbol{\beta}_j + \sum_{j=1}^J \tilde y_{ij} \boldsymbol {z}_{ij}^\top \boldsymbol{\gamma}_j\right) + \sum_{i=1}^n t_i\\ \text{s.t.}\quad & u^{1}_i + \ldots + u^{J}_i \leq 1, \quad \forall i=1,\ldots, n\\ &(\boldsymbol {z}_{ij}^\top \boldsymbol{\gamma}_j - t_i, 1, u^1_i)^\top\in\mathcal{K}_\text{expp}, \\ &(x_i^\top\boldsymbol \beta_j + \boldsymbol {z}_{ij}^\top \boldsymbol{\gamma}_j - t_i, 1, u^j_i)^\top\in\mathcal{K}_\text{expp}, \quad \forall j = 2,\ldots,J. \end{align*}\].

We also include constraints on both the \(\boldsymbol \beta\) and \(\boldsymbol \gamma\) coefficients, similar to the setup introduced in the previous section: \[\begin{align*} \min_{\substack{\boldsymbol\beta_l,\\ l=1,\ldots,J}}\quad &\sum_{i=1}^n \left(\sum_{j=1}^J \tilde y_{ij}\boldsymbol {\tilde x}^\top_i H_\beta \boldsymbol{\tilde\beta}+ \sum_{j=1}^J \tilde y_{ij}\boldsymbol{z}^\top_{ij} H_\gamma \boldsymbol{\tilde\gamma}\right) + \sum_{i=1}^n t_i\\ \text{s.t.}\quad & u^{1}_i + \ldots + u^{J}_i \leq 1, \quad \forall i=1,\ldots, n\\ &(x_i^\top\tilde{H}^{(j)}_\beta\tilde{\boldsymbol\beta} + z_{ij}^\top\tilde{H}^{(j)}_\gamma\tilde{\boldsymbol\gamma} - t_i, 1, u^j_i)^\top\in\mathcal{K}_\text{expp}, \quad \forall j = 1,\ldots,J. \end{align*}\]

mlogit_roi_xz <- function(X, Z, y, Hbeta = NULL, Hgamma = NULL, 
                          solver = "auto", ...) {
  stm <- simple_triplet_matrix
  stzm <- simple_triplet_zero_matrix
  lev <- levels(as.factor(y))
  y <- as.numeric(y)
  Z <- as.matrix(Z)
  stopifnot(is.vector(y), length(y) == nrow(X))
  varz <- unique(gsub("\\..*", "", colnames(Z)))
  px <- ncol(X); pz <- length(varz)
  n <- nrow(X); p <- px + pz; J <- max(y); 
  if (is.null(Hbeta))  Hbeta <- diag(px * J) 
  if (is.null(Hgamma)) Hgamma <- diag(pz * J) 
  if (is.list(Hbeta))  Hbeta <- Matrix::bdiag(Hbeta)
  if (is.list(Hgamma))  Hgamma <- Matrix::bdiag(Hgamma)
  if (!is.matrix(Hbeta)) Hbeta <- as.matrix(Hbeta) 
  if (!is.matrix(Hgamma)) Hgamma <- as.matrix(Hgamma) 
  pxtilde <- ncol(Hbeta); pztilde <- ncol(Hgamma)
  ptilde  <- pxtilde + pztilde
  Hx <- lapply(seq_len(J), function(j) {
   Hbeta[c((seq_len(px) - 1) * J + j), ]
  })
  Hz <- lapply(seq_len(J), function(j) {
   Hgamma[c((seq_len(pz) - 1) * J + j), ]
  })
  
  ymat <- model.matrix(~ -1 + as.factor(y))
  colnames(ymat) <- lev
  xtilde <- model.matrix(~ 0 + ymat : X)  

  yZ <- c(sapply(varz, function(x) 
    colSums(ymat * Z[, grep(x, colnames(Z))])))
  
  i <- 3 * seq_len(n) - 2 ## triplets for cones
  
  op <- OP(c(- drop(colSums(xtilde %*% Hbeta)), - drop(yZ %*% Hgamma), 
             rep.int(1, n), double(n * J)), 
           maximum = FALSE)
  Ct <- stm(i, seq_len(n), rep.int(1, n), 3 * n, n)  
  Cu <- stm(i + 2, seq_len(n), rep.int(-1, n), 3 * n, n)
  Clist <- lapply(seq_len(J), function(j) {
    Cx <- stm(rep(i, pxtilde), rep(seq_len(pxtilde), each = n),
              -drop(X %*% Hx[[j]]), 3 * n, pxtilde)
    Cz <- stm(rep(i, pztilde), rep(seq_len(pztilde), each = n), 
              - drop(Z[, grepl(lev[j], colnames(Z))] %*% Hz[[j]]), 
              3 * n, pztilde)
    CC <- cbind(Cx, Cz, Ct, stzm(3 * n, n * (j - 1)), Cu, stzm(3 * n, n * (J - j)))
  })
  
  C <- do.call("rbind", Clist)
  cones <- K_expp(J * n)
  rhs <- rep(c(0, 1, 0), n * J)

  CL <- cbind(stzm(n, ptilde + n), 
              stm(rep(seq_len(n), J), seq_len(n * J),
                  rep.int(1, n * J), n, n * J))

  constraints(op) <- rbind(C_constraint(C, cones, rhs),
                           L_constraint(CL, 
                                        dir = rep("<=", nrow(CL)), 
                                        rhs =  rep(1, nrow(CL))))
  
  bounds(op) <- V_bound(ld = -Inf, nobj = ncol(C))
  ROI_solve(op, solver = solver, ...)
}

Examples

data("Fishing", package = "mlogit")
head(Fishing)

##      mode price.beach price.pier price.boat price.charter catch.beach catch.pier catch.boat catch.charter   income
## 1 charter     157.930    157.930    157.930       182.930      0.0678     0.0503     0.2601        0.5391 7083.332
## 2 charter      15.114     15.114     10.534        34.534      0.1049     0.0451     0.1574        0.4671 1250.000
## 3    boat     161.874    161.874     24.334        59.334      0.5333     0.4522     0.2413        1.0266 3750.000
## 4    pier      15.134     15.134     55.930        84.930      0.0678     0.0789     0.1643        0.5391 2083.333
## 5    boat     106.930    106.930     41.514        71.014      0.0678     0.0503     0.1082        0.3240 4583.332
## 6 charter     192.474    192.474     28.934        63.934      0.5333     0.4522     0.1665        0.3975 4583.332

We estimate the following model for the Fishing data: \[ \eta_{ij} = \beta_{0j} + \text{income}_i \beta_j + \text{price}_{ij} \gamma_{\text{price},j} + \text{catch}_{ij} \gamma_{\text{catch},j}, \quad j\in\{\text{beach, pier, boat, charter}\} \] \[ P(\text{mode}_i = j |\cdot )= \frac{\exp(\eta_{ij})}{\sum_{l=1}^J\exp(\eta_{il})} \]

where we fix \(\beta_{0\text{beach}}=0\) and \(\beta_\text{beach} = 0\).

Fish <- dfidx(Fishing, varying = 2:9, shape = "wide", choice = "mode")
coef(mlogit(mode ~0|income|price+catch, data = Fish))

##    (Intercept):boat (Intercept):charter    (Intercept):pier         income:boat      income:charter         income:pier         price:beach          price:boat       price:charter          price:pier         catch:beach          catch:boat       catch:charter          catch:pier 
##        0.8640023382        1.8473698326        1.1318876044       -0.0001105399       -0.0002780873       -0.0001282887       -0.0379576275       -0.0208554401       -0.0160143807       -0.0392180091        4.9522607681        2.4704939055        0.7610421776        4.8834835714

y <- Fishing$mode
J <- nlevels(y)
X <- model.matrix(~ income, data = Fishing) 
Z <- Fishing[, grep("price|catch", colnames(Fishing))]
head(X)

##   (Intercept)   income
## 1           1 7083.332
## 2           1 1250.000
## 3           1 3750.000
## 4           1 2083.333
## 5           1 4583.332
## 6           1 4583.332

head(Z)

##   price.beach price.pier price.boat price.charter catch.beach catch.pier catch.boat catch.charter
## 1     157.930    157.930    157.930       182.930      0.0678     0.0503     0.2601        0.5391
## 2      15.114     15.114     10.534        34.534      0.1049     0.0451     0.1574        0.4671
## 3     161.874    161.874     24.334        59.334      0.5333     0.4522     0.2413        1.0266
## 4      15.134     15.134     55.930        84.930      0.0678     0.0789     0.1643        0.5391
## 5     106.930    106.930     41.514        71.014      0.0678     0.0503     0.1082        0.3240
## 6     192.474    192.474     28.934        63.934      0.5333     0.4522     0.1665        0.3975

Hbeta <- list(
  "(Intercept)" = rbind(0, diag(J - 1)),
  "income"      = rbind(0, diag(J - 1))
)
Hgamma <- list(
  "price"      = diag(J),
  "catch"      = diag(J)
)
res <- mlogit_roi_xz(X, Z, y, Hbeta, Hgamma)
s1 <- solution(res)[1:14]

names(s1) <-   c(apply(expand.grid(levels(y)[-1], colnames(X)),1, 
      function(x) paste0(x[2], ":", x[1])), 
  colnames(Z))
s1

##    (Intercept):pier    (Intercept):boat (Intercept):charter         income:pier         income:boat      income:charter         price.beach          price.pier          price.boat       price.charter         catch.beach          catch.pier          catch.boat       catch.charter 
##        1.1318876547        0.8640016741        1.8473696889       -0.0001282887       -0.0001105399       -0.0002780873       -0.0379576629       -0.0392180451       -0.0208554561       -0.0160143959        4.9522619460        4.8834848920        2.4704949136        0.7610424568

Let us have a look at the following modification: \[ \eta_{ij} = \beta_{0j} + \text{income}_i \beta_j + \text{price}_{ij} \gamma_{\text{price}} + \text{catch}_{ij} \gamma_{\text{catch},j}, \quad j\in\{\text{beach, pier, boat, charter}\} \]

where we fix \(\beta_{0\text{beach}}=0\) and \(\beta_\text{beach} = 0\).

coef(mlogit(mode ~ price | income | catch, data = Fish))

##    (Intercept):boat (Intercept):charter    (Intercept):pier               price         income:boat      income:charter         income:pier         catch:beach          catch:boat       catch:charter          catch:pier 
##        8.418450e-01        2.154866e+00        1.043026e+00       -2.528145e-02        5.542799e-05       -7.233725e-05       -1.355007e-04        3.117711e+00        2.542482e+00        7.594943e-01        2.851215e+00

Hbeta <- list(
  "(Intercept)" = rbind(0, diag(J - 1)),
  "income"      = rbind(0, diag(J - 1))
)
Hgamma <- list(
  "price"      = rep.int(1L, J),
  "catch"      = diag(J)
)
res <- mlogit_roi_xz(X, Z, y, Hbeta, Hgamma)
s2 <- solution(res)[1:11]
names(s2) <- c(apply(expand.grid(levels(y)[-1], colnames(X)),1, 
      function(x) paste0(x[2], ":", x[1])), 
  "price",  "catch.beach", "catch.pier", "catch.boat", "catch.charter")
s2

##    (Intercept):pier    (Intercept):boat (Intercept):charter         income:pier         income:boat      income:charter               price         catch.beach          catch.pier          catch.boat       catch.charter 
##        1.043026e+00        8.418447e-01        2.154866e+00       -1.355007e-04        5.542808e-05       -7.233716e-05       -2.528146e-02        3.117710e+00        2.851215e+00        2.542482e+00        7.594945e-01

References

Croissant, Y. (2020). mlogit: Multinomial Logit Models. R package version 1.1-0. https://CRAN.R-project.org/package=mlogit
Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0
Yee, T. W. (2010). The VGAM Package for Categorical Data Analysis. Journal of Statistical Software, 32(10), 1-34. URL http://www.jstatsoft.org/v32/i10/.