marginaleffects

Two Problems

1. R outputs are inconsistent. This simple command:

predict(mod, se.fit = TRUE)

produces different objects depending on the model class:

- glm(): fit, se.fit, residual.scale
- MASS::polr(): predicted classes or a matrix of probabilities, with no standard errors
2. Parameters are hard to interpret. Consider this logistic regression model: ultra-simple, yet surprisingly hard to interpret.
|             | (1)     |
|-------------|---------|
| (Intercept) | -1.490  |
|             | (0.327) |
| Num         | 0.189   |
|             | (0.330) |
| CatB        | 0.840   |
|             | (0.424) |
| CatC        | 3.142   |
|             | (0.455) |
| Num × CatB  | -0.339  |
|             | (0.451) |
| Num × CatC  | -0.402  |
|             | (0.430) |

Standard errors in parentheses.
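The coefficients above could come from a call along these lines; a minimal sketch, where the outcome Bin, the data frame dat, and the variable names are assumptions inferred from the term labels:

```r
# Hypothetical fit: binary outcome regressed on a numeric predictor, a
# 3-level factor, and their interaction, matching the six terms in the table.
mod <- glm(Bin ~ Num * Cat, family = binomial, data = dat)
summary(mod)
```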
\[\ln \left[ \left (\frac{p_{1}}{1-p_1}\right ) / \left (\frac{p_0}{1-p_0} \right ) \right]\]
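Each coefficient can thus be read as a log odds ratio, and exponentiating recovers the odds ratio; a sketch, assuming mod is the logistic model above:

```r
# Odds ratios and matching confidence intervals for the logit model.
exp(coef(mod))
exp(confint(mod))
```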
There are logit-specific tricks, but what about interactions, splines, multinomial logit, or XGBoost?
¯\_(ツ)_/¯
One Solution
“A parameter is just a resting stone on the road to prediction.”
-Philip Dawid (via Stephen Senn)
In other words:
“Parameter estimates are usually very difficult (or impossible) to interpret as-is. We must transform them into quantities that stakeholders will understand and care about.” -Vincent (citing myself)
marginaleffects 📦

- Works with a very wide range of R model classes, plus tidymodels and mlr3.
- Free online book with 30+ chapters and case studies.
- Computes predictions, counterfactual comparisons, and slopes (see the table below).
- Hypothesis tests to compare all those quantities.
| predictions()      | comparisons()      | slopes()      |
|--------------------|--------------------|---------------|
| avg_predictions()  | avg_comparisons()  | avg_slopes()  |
| plot_predictions() | plot_comparisons() | plot_slopes() |
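Each column follows the same pattern: a unit-level function, its avg_*() counterpart that averages over rows, and a plot_*() visualization. A minimal sketch, assuming a fitted model mod and a predictor named Num:

```r
library(marginaleffects)

predictions(mod)                          # one fitted value per row of the data
avg_predictions(mod)                      # average of those fitted values
plot_predictions(mod, condition = "Num")  # predictions across the (assumed) predictor Num
```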
Demo
hypotheses(hypothesis = )
Coefficients:
Null hypothesis: \(\beta_3=\beta_4\)
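A sketch of the call behind this test; in marginaleffects, coefficients can be referenced as b1, b2, ... inside the hypothesis string:

```r
# Test whether the 3rd and 4th coefficients are equal (H0: b3 = b4).
hypotheses(mod, hypothesis = "b3 = b4")
```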
hypothesis argument

Hypothesis tests on:
Scientific questions:
predictions()
Base R:
[1] 0.1839088
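That single number is a fitted probability with no uncertainty attached; a sketch of the kind of base-R call that produces it (the row selection and data name are assumptions):

```r
# One fitted probability on the response scale; base R returns a bare number,
# without standard errors or confidence intervals.
predict(mod, newdata = dat[1, ], type = "response")
```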
Same syntax, but richer results:
predictions()
Fitted values for every row in the original data:
Estimate Pr(>|z|) S 2.5 % 97.5 %
0.197 <0.001 15.4 0.113 0.320
0.316 0.0417 4.6 0.180 0.493
0.185 <0.001 17.6 0.108 0.300
0.213 0.0016 9.3 0.107 0.379
0.852 <0.001 20.9 0.744 0.920
--- 190 rows omitted. See ?avg_predictions and ?print.marginaleffects ---
0.351 0.0258 5.3 0.240 0.481
0.829 <0.001 19.6 0.719 0.902
0.351 0.0248 5.3 0.240 0.481
0.846 <0.001 22.3 0.743 0.913
0.314 0.0460 4.4 0.176 0.497
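A sketch of the call that produces output in this shape, with standard errors and intervals computed by the delta method:

```r
# One row of output per row of the data used to fit mod.
predictions(mod)
```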
avg_predictions()
Average of the fitted values:
Average by subgroup:
avg_predictions(hypothesis = )
Cat Estimate Pr(>|z|) S 2.5 % 97.5 %
A 0.186 <0.001 17.6 0.108 0.301
B 0.344 0.0166 5.9 0.236 0.471
C 0.843 <0.001 22.6 0.741 0.909
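A sketch of the grouped call, plus the hypothesis argument used to compare subgroup averages (the specific contrast string is an assumption):

```r
# Average fitted values within each level of Cat
avg_predictions(mod, by = "Cat")

# Compare two subgroup averages, e.g. C versus A (rows 3 and 1 of the output above)
avg_predictions(mod, by = "Cat", hypothesis = "b3 - b1 = 0")
```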
plot_predictions()
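The plot itself is not reproduced here; a sketch of the kind of call that draws predicted probabilities with confidence bands (the condition variables are assumptions):

```r
# Predicted probability over Num, one curve per level of Cat.
plot_predictions(mod, condition = c("Num", "Cat"))
```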
Counterfactual comparisons
Contrast, risk difference, risk ratio, odds, lift, etc.
comparisons()
Functions of two predictions. “All else equal” model-based comparisons:
\[\hat{Y}_{X=1} - \hat{Y}_{X=0}\]
\[\hat{Y}_{X=x + 1} - \hat{Y}_{X=x}\] \[\hat{Y}_{X=x + \sigma_X} - \hat{Y}_{X=x}\]
\[\frac{\hat{Y}_{X=1}}{\hat{Y}_{X=0}}\] \[\frac{\hat{Y}_{X=1} - \hat{Y}_{X=0}}{\hat{Y}_{X=0}}\]
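Each of these quantities maps onto the comparison argument of comparisons() / avg_comparisons(); a sketch, assuming the logistic model mod from earlier:

```r
avg_comparisons(mod)                        # default: average difference in predictions
avg_comparisons(mod, comparison = "ratio")  # risk ratio
avg_comparisons(mod, comparison = "lift")   # lift
avg_comparisons(mod, comparison = "lnor")   # log odds ratio
```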
comparisons()
One estimate per row:
Average risk difference for changes in each predictor:
Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Cat mean(B) - mean(A) 0.1600 0.0776 2.06 0.0392 4.7 0.00795 0.3120
Cat mean(C) - mean(A) 0.6531 0.0645 10.13 <0.001 77.7 0.52667 0.7795
Num mean(+1) -0.0118 0.0309 -0.38 0.7039 0.5 -0.07235 0.0488
MOAR!!!
tidymodels + mlr3
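marginaleffects also works with models fit through these frameworks; a hedged sketch with a hypothetical tidymodels workflow (the formula interface, the glm engine, and the variable names are assumptions, and newdata typically must be supplied explicitly):

```r
library(tidymodels)
library(marginaleffects)

# Hypothetical workflow: same formula as before, assuming Bin is a factor.
wf <- workflow() |>
  add_formula(Bin ~ Num * Cat) |>
  add_model(logistic_reg() |> set_engine("glm")) |>
  fit(data = dat)

# Pass the data explicitly so marginaleffects can form predictions.
avg_comparisons(wf, newdata = dat, variables = "Cat")
```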
Things I want to 🔌
tinytable
modelsummary