Interaction term in a non-linear model

In a non-linear model (for example, logit or poisson model), the interpretation of the coefficient on the interaction term is tricky. Ai and Norton (2003) points out that the interaction term coefficient is not the same as people can interpret as in a linear model; that is, how much effect of \(x1\) changes with the value of \(x2\). They interpret this as a cross

If we have a linear model with interaction:

\[ E(y) = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1*x_2 \]

Then, the marginal effect

\[ \frac{\partial^2 E(y)}{\partial x_1 \partial x_2} = \beta_{12} \]

That is, \(\beta_{12}\) is the second derivative of \(E(y)\) on \(x_1\) and \(x_2\). The marginal effect of \(x_1\)

In a non-linear model,

\[ F(E(y)) = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1*x_2 \]

\[ \frac{\partial^2 F(E(y))}{\partial x_1 \partial x_2} = \beta_{12} \]

Here, the partial derivative of \(F(E(y))\) on \(x_1\) and \(x_2\) is still \(\beta_{12}\). However, most people are interested in \(\frac{\partial^2 E(y)}{\partial x_1 \partial x_2}\).

\[ \frac{\partial^2 E(y)}{\partial x_1 \partial x_2} = \beta_{12} G'() + (\beta_{1} + \beta_{12} x_2)(\beta_2 + \beta_{12} x_1) G''()\]

where \(G()\) is the inverse function of \(F()\).

It is true that in a non-linear model with interaction, the marginal effect of \(x_1\) differs with different values of \(x_2\). However, even if we have a non-linear model without interaction, the marginal effect of \(x_1\) is still different with different values of \(x_2\). To see this,

\[ F(E(y)) = \beta_1 x_1 + \beta_2 x_2 \]

\[ \frac{\partial^2 E(y)}{\partial x_1 \partial x_2} = (\beta_{1} \beta_2 ) G''()\]

Therefore, when we set up our model,

\[ F(E(y)) = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1*x_2 \]

we have in mind that we allow interaction of \(x_1\) and \(x_1\) to interact for the effect on \(F(E(y))\); not on \(E(y)\).

We agree with Bill Greene, 2013. In a nonlinear model, the partial effects (as Greene calls it) is nonlinear, regardless of the model. For example, in a logit model, even if you don’t have an interaction term in your model, the effect of \(x_1\) will still be different for every value of \(x_2\), simply because it’s a nonlinear model.

As Greene put it at the summary section, “Build the model based on appropriate statistical procedures and principles. Statistical testing about the model specification is done at this step Hypothesis tests are about model coefficients and about the structural aspects of the model specifications. Partial effects are neither coefficients nor elements of the specification of the model. They are implications of the specified and estimated model.”

We also agree with Maarten Buis 2010, that we should use multiplicative effect in a non-linear model. That is, in a non-linear model,

\[ F(E(y)) = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1*x_2 \]

We should pay more attention to

\[ \frac{\partial^2 F(E(y))}{\partial x_1 \partial x_2} = \beta_{12} \]

For example, in a logit model,

\[ log(P(y=1)/(1-P(y=1))) = \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1*x_2 \]

That is, the log of odds is a linear function of \(x_1\) and \(x_2\) and interaction. The interaction effect has the same interpretation as the linear model, in terms of log of odds.

Or, it becomes multiplicative effect when we talk about odds ratios. Stata’s “margins” command is a great tool to calculate marginal effects in various situations, as shown in Maarten Buis 2010.

Senior Statistician / Data Scientist

I enjoy statistics and programming.