HS.model <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
"
fit <- acfa(HS.model, HolzingerSwineford1939, meanstructure = TRUE,
verbose = FALSE)
(res <- loo(fit))
#> Taylor leave-one-subject-out cross-validation (INLAvaan)
#> Computed from 301 subjects (second-order Taylor approximation)
#>
#> Estimate SE
#> elpd_loo -3769.1 42.9
#> p_loo 32.4 2.2
#> looic 7538.2 85.9Introduction
How well would my model predict new data? Leave-one-out (LOO) cross-validation answers this by holding out one unit at a time, refitting the model to the remaining data, and scoring the held-out unit under the refitted posterior. The total score is the expected log predictive density, which rewards models that predict well and automatically penalises overfitting – making it a natural criterion for comparing models.
Computed naively, LOO needs refits. MCMC-based packages such as blavaan avoid this by importance-sampling over posterior draws (Vehtari et al. 2017), but this still requires the full set of MCMC draws. INLAvaan instead exploits its Laplace machinery: loo() approximates each case-deletion posterior by a Taylor expansion around the full-data posterior summary, so the entire LOO is computed from a single fit – no refitting and no sampling. On the Holzinger–Swineford example below this takes a fraction of a second.
Two unit types are scored, resolved automatically from the model:
- LOSO (leave-one-subject-out): single-level models, one unit per row.
- LOCO (leave-one-cluster-out): two-level models, one unit per cluster – the relevant predictive question is “how well would the model predict a new cluster?”.
How it works
Write for unit ’s log-likelihood contribution, with score and Hessian evaluated at the posterior summary from the fit. loo() reports first- and second-order Taylor approximations of the log conditional predictive ordinate: The headline elpd_loo is the sum of the second-order terms, with standard error , and looic on the familiar information-criterion scale. The effective number of parameters uses the analogous expansion of the full-posterior pointwise predictive density . The rare unit whose second-order curvature matrix is not positive definite falls back to first order (flagged in the output); a warning is raised if this happens for many units, since it suggests the Gaussian posterior summary itself is poor.
A first example
We fit the classic three-factor CFA to the Holzinger–Swineford data. Fitting with meanstructure = TRUE is recommended for LOO: otherwise unit log-likelihoods are evaluated at zero means and absolute ELPD values are biased (comparisons between models on the same data remain valid either way).
The pointwise contributions are available for inspection – useful for spotting influential observations (a large score_norm or unusually low log_cpo_2 flags a unit the model predicts poorly):
head(res$per_unit)
#> unit nobs l_star score_norm lpd_1 lpd_2 log_cpo_1 log_cpo_2
#> 1 1 1 -17.35080 6.660341 -17.23618 -17.28232 -17.46543 -17.51525
#> 2 2 1 -13.81769 5.472756 -13.73389 -13.80482 -13.90148 -13.97719
#> 3 3 1 -11.11048 3.763302 -11.07508 -11.16502 -11.14589 -11.23819
#> 4 4 1 -10.25806 2.576577 -10.24387 -10.29030 -10.27225 -10.31970
#> 5 5 1 -10.70333 2.968563 -10.68998 -10.73711 -10.71669 -10.76489
#> 6 6 1 -13.41261 5.032849 -13.34949 -13.43705 -13.47572 -13.56755
#> det_term ok
#> 1 -0.04818982 TRUE
#> 2 -0.07305219 TRUE
#> 3 -0.09161483 TRUE
#> 4 -0.04736834 TRUE
#> 5 -0.04815107 TRUE
#> 6 -0.08981841 TRUEComparing models
Because ELPD differences between models fitted to the same data are paired, their standard errors should be computed from the pointwise differences rather than the marginal SEs. compare(..., loo = TRUE) does this automatically:
one.factor <- "g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9"
fit1f <- acfa(one.factor, HolzingerSwineford1939, meanstructure = TRUE,
verbose = FALSE)
compare(fit, fit1f, loo = TRUE)
#> Bayesian Model Comparison (INLAvaan)
#> Models ordered by ELPD (Taylor LOO)
#> elpd_diff/se_diff are paired differences vs the best model
#>
#> Model npar Marg.Loglik logBF DIC pD ELPD SE p_loo
#> fit 30 -3885.211 0.000 7534.755 29.370 -3769.109 42.945 32.433
#> fit1f 27 -3990.563 -105.352 7758.007 27.401 -3878.134 46.800 27.516
#> elpd_diff se_diff
#> 0.000 0.000
#> -109.025 17.072Models are sorted by descending ELPD; elpd_diff and se_diff are relative to the best model. A common heuristic is that a difference smaller than a couple of se_diff units is not practically meaningful (Vehtari et al. 2017) – here the three-factor model is preferred by a wide margin.
Two-level models
For models fitted with a cluster argument, loo() automatically switches to per-cluster scoring (LOCO). Here the covariates are modelled jointly (fixed.x = FALSE); the next section explains what happens under lavaan’s default covariate treatment.
model2l <- "
level: 1
fw =~ y1 + y2 + y3
fw ~ x1 + x2 + x3
level: 2
fb =~ y1 + y2 + y3
fb ~ w1 + w2
"
fit2l <- asem(model2l, Demo.twolevel, cluster = "cluster",
meanstructure = TRUE, fixed.x = FALSE, verbose = FALSE)
loo(fit2l)
#> Taylor leave-one-cluster-out cross-validation (INLAvaan)
#> Computed from 200 clusters (second-order Taylor approximation)
#>
#> Estimate SE
#> elpd_loo -23344.2 731.4
#> p_loo 34.3 2.0
#> looic 46688.3 1462.9Exogenous covariates: joint and conditional scores
When a model contains exogenous covariates, the flavour of the LOO score follows the likelihood the model was fitted with. Under fixed.x = FALSE the covariates receive a saturated Gaussian block and each unit is scored by the joint predictive density of its outcomes and covariates (“how surprised is the model by a brand-new unit, characteristics included?”). Under fixed.x = TRUE – lavaan’s default – the fitted likelihood is the conditional one, and each unit is scored by the predictive density of its outcomes given its covariates (“given a new unit with known characteristics, how well do we predict its outcomes?”), the familiar regression cross-validation convention. No refitting trickery is involved: the conditional likelihood is exactly invariant to the fixed covariate moments, so the conditional score carries no additional approximation.
model_x <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~ ageyr + grade
textual ~ ageyr + grade
"
dat_x <- na.omit(
HolzingerSwineford1939[, c(paste0("x", 1:9), "ageyr", "grade")]
)
# lavaan's default fixed.x = TRUE: scored conditionally on the covariates
fit_cond <- asem(model_x, dat_x, meanstructure = TRUE, verbose = FALSE)
loo(fit_cond)
#> Taylor leave-one-subject-out cross-validation (INLAvaan)
#> Computed from 300 subjects (second-order Taylor approximation)
#> Scored conditionally on the exogenous covariates (fixed.x fit)
#>
#> Estimate SE
#> elpd_loo -3748.1 44.7
#> p_loo 45.1 2.7
#> looic 7496.2 89.5The two flavours estimate different quantities whose scales differ by the covariate predictive density, so a joint and a conditional elpd must never appear in the same comparison – compare(..., loo = TRUE) refuses mixed-flavour comparisons outright. Within the conditional flavour, however, models conditioning on different covariate sets are directly comparable as long as the outcome variables match, which makes covariate selection straightforward:
model_x1 <- "
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
visual ~ ageyr
"
fit_cond1 <- asem(model_x1, dat_x, meanstructure = TRUE, verbose = FALSE)
compare(fit_cond, fit_cond1, loo = TRUE)
#> Bayesian Model Comparison (INLAvaan)
#> Models ordered by ELPD (Taylor LOO)
#> elpd_diff/se_diff are paired differences vs the best model
#>
#> Model npar Marg.Loglik logBF DIC pD ELPD SE p_loo
#> fit_cond 32 -3875.892 0.000 7540.322 60.511 -3748.090 44.737 45.076
#> fit_cond1 29 -3903.093 -27.201 7568.875 30.392 -3787.678 43.881 38.271
#> elpd_diff se_diff
#> 0.000 0.000
#> -39.587 10.261(Under the joint flavour the same comparison would require retaining grade in both models, since joint scores of models spanning different variable sets live on different sample spaces.) Both flavours support any covariate placement, including cluster-level and within-level covariates in two-level models.
Storing the result with the fit
loo(fit) never modifies the fitted object – but under the default test = "standard", the fit itself already computes and stores both the full LOO and the WAIC whenever the model is supported, has a mean structure, and (for LOO) the predicted serial cost is within a 10-second budget. The prediction is calibrated at run time by timing a single score evaluation, so on typical single-level models – where the full LOO costs a fraction of the fit itself – you simply get loo(fit) and waic(fit) for free. The WAIC reuses the very draws the fit produced for its posterior summaries, so it costs only one casewise pass.
For expensive cases you can force the LOO regardless of the budget at fit time,
or store it afterwards with an explicit reassignment:
fit <- add_loo(fit)A stored result is returned instantly by loo(fit) and reused by fitMeasures() (where the blavaan-style names appear) and compare(..., loo = TRUE):
fitMeasures(fit, c("elpd_loo", "se_loo", "p_loo", "looic"))
#> elpd_loo p_loo looic se_loo
#> -3769.109 32.433 7538.217 85.890WAIC
The widely applicable information criterion (Watanabe 2010) is asymptotically equivalent to LOO and is also available. Unlike loo(), waic() is sampling-based: it evaluates unit log-likelihoods over posterior draws, so results carry Monte Carlo error, and units with trigger a reliability warning – in which case prefer loo() (Vehtari et al. 2017).
Scoring submodels without refitting
The theta and Sigma arguments evaluate the LOO at an arbitrary Gaussian posterior summary instead of the fit’s own. Combined with Gaussian conditioning, this scores a constrained submodel from the encompassing fit alone. For example, to score the submodel with the visual ~~ speed covariance fixed to zero, condition the summary on that parameter and re-evaluate:
int <- get_inlavaan_internal(fit)
theta <- int$theta_star
Sigma <- int$Sigma_theta
p <- which(names(coef(fit)) == "visual~~speed")
theta_c <- theta - Sigma[, p] * (theta[p] / Sigma[p, p])
Sigma_c <- Sigma - tcrossprod(Sigma[, p]) / Sigma[p, p]
loo(fit, theta = theta_c, Sigma = Sigma_c)
#> Taylor leave-one-subject-out cross-validation (INLAvaan)
#> Computed from 301 subjects (second-order Taylor approximation)
#>
#> Estimate SE
#> elpd_loo -3785.9 45.0
#> p_loo 34.0 2.5
#> looic 7571.9 90.1
#>
#> Evaluated at a user-supplied (theta, Sigma) summary.No refit took place: the conditioned summary has the covariance locked at zero (its row and column of Sigma_c vanish), and the LOO machinery automatically restricts to the remaining parameters. This pair of arguments is the building block for custom model-search strategies – screen many candidate restrictions by conditioning, score each with loo(), and only refit the winners. INLAvaan deliberately provides just this evaluation API; the search logic is yours to design.
Practical notes
-
Supported models. Continuous-indicator models fitted with the
MLestimator, single-group or multigroup – groups are independent, so each unit is scored against its own group’s moments, with agroupcolumn in the pointwise table (see the multigroup article for the measurement-invariance workflow). Fits with missing data are scored under full-information maximum likelihood (missing = "ml"), single- or two-level; see the missing-data article. Ordinal (PML) and multigroup two-level models are not supported yet. Models with exogenous covariates are scored jointly (fixed.x = FALSE) or conditionally (fixed.x = TRUE), following the fitted likelihood, for any covariate placement. -
Missing data. Under FIML each unit is scored on the entries it actually has – the observed-data predictive, with the full row (single-level) or whole cluster (two-level LOCO) deleted from the conditioning set – carrying the same missing-at-random assumption as the fit. A single-level unit with fewer observed entries self-weights, contributing a smaller score; a two-level cluster contributes its observed-data marginal likelihood. Two missing-data fits are comparable with
compare()only when they share the same observed entries (the same data and the same holes). The two-level conditional predictive (type = "loso") is available under missing data too. -
Parallelism is opt-in. The default runs serially; pass
loo(fit, cores = 2)to parallelise the Hessian stage via forking (not available on Windows). -
First order vs second order. The second-order correction matters: in our validation it tracks brute-force refits to within about one ELPD unit, while the first-order score can be off by tens of units. Use
second_order = FALSEonly for quick screening. -
Marginal vs conditional predictive on two-level models. The default
type = "loco"is the marginal predictive (leave-one-cluster-out: prediction for a new cluster).loo(fit2l, type = "loso")– andwaic(fit2l, type = "loso")– instead score the conditional predictive (leave-one-unit-out: a new observation within an observed cluster, each contribution the conditional density of a row given the rest of its cluster). These answer different questions and are easily conflated (Merkle et al. 2019), so the marginal is the default and the conditional warns. It is available with and without missing data and is expensive for large datasets – subset withunits.
