Cluster-Robust SEs by id_cluster in marginaleffects After MatchIt

Question

How to compute cluster-robust standard errors by subject ID (idcluster) in marginaleffects::avgcomparisons() after MatchIt generalized full matching?

I'm using R packages MatchIt and marginaleffects on a large observational dataset with clustered observations (multiple records per subject ID in id_cluster column). The data is not independent within clusters.

Workflow:
Perform generalized full matching: matchit(method = "quick", estimand = "ATE")
Fit weighted Poisson regression (log link) for risk ratio estimation.
Compute average comparisons with avg_comparisons() for average risk ratio.

Core question: What is the correct way to specify vcov in avgcomparisons() for cluster-robust standard errors clustered by idcluster, while respecting MatchIt matching weights and structure?
Cluster on matching subclass, original id_cluster, or both?
Does marginaleffects support multi-way clustering (e.g., vcov = ~subclass + id_cluster)?
If needed, how to use external packages like sandwich or clubSandwich to compute and pass a precomputed vcov matrix?

Reproducible example:

Accepted Answer

To compute cluster-robust standard errors clustered by subject ID (idcluster) in marginaleffects::avgcomparisons() after MatchIt generalized full matching, precompute a variance-covariance matrix with clubSandwich::vcovCR() on idcluster (or both subclass and idcluster for multi-way), then pass it directly to vcov=. This approach fully respects the matching weights and subclass structure from full matching while handling dependence within subjects—far better than ignoring clusters altogether. Single-way vcov=~idcluster works in a pinch, but multi-way clustering on matching subclass plus idcluster is the gold standard for valid risk ratio inference in your weighted Poisson GLM.

Contents
Understanding Cluster-Robust Standard Errors After MatchIt Full Matching
Why Cluster on Subclass and id_cluster
Single-Way Clustering: vcov=~subclass or ~id_cluster
Multi-Way Clustering: Precompute vcov with clubSandwich
Full Reproducible Example: Poisson GLM Risk Ratios
Common Pitfalls and Alternatives
Best Practices for Valid Inference
Sources
Conclusion

Understanding Cluster-Robust Standard Errors After MatchIt Full Matching

Picture this: you've got a massive observational dataset where subjects in idcluster spit out multiple records, and observations aren't independent within those clusters. Propensity score matching via MatchIt's method="quick" (generalized full matching) creates matched sets captured in the subclass column—think pairs or groups where treated and control units live together. But slam a weighted Poisson GLM on top for risk ratios, and marginaleffects::avgcomparisons() spits out point estimates. The real magic? Getting standard errors that don't pretend your data is i.i.d.

Cluster-robust standard errors adjust for that intra-cluster correlation, inflating SEs appropriately so your p-values and CIs aren't misleadingly tight. In the MatchIt estimating effects vignette, they stress always using robust SEs with weights, and clustering on subclass to honor the matching dependence—units in the same subclass aren't random anymore. Add subject-level clustering via id_cluster, and you're golden for ATE estimands.

Why does this matter for your workflow? Full matching induces dependence across the subclass strata, on top of your natural clustering. Skip it, and your risk ratio contrasts (via comparison="ratioavg") could look significant when they're not. marginaleffects handles this seamlessly through its vcov argument, pulling from sandwich under the hood.

Why Cluster on Subclass and id_cluster

Ever wonder why one clustering level isn't enough? subclass from MatchIt full matching groups units that were matched together—dependence there mimics randomization within strata, per Abadie and Spiess (as detailed in the MatchIt documentation). Cluster solely on it, and you capture matching structure but miss subject-level correlation if one id_cluster spans multiple subclasses or records.

Your id_cluster? That's the natural unit—multiple obs per subject means errors correlate within. The Stack Overflow thread on robust SEs post-full matching nails it: cluster on both for two-way robustness, especially with clustered data. MatchIt maintainer Kosuke Imai echoes this—subclass first, then application-specific clusters like subjects.

Does marginaleffects support vcov=~subclass + idcluster natively? Not as a formula for multi-way; it handles single-cluster formulas like ~idcluster via sandwich, but for combos, precompute a matrix. That's where clubSandwich shines—conservative CR2/CR3 types that stack multiple clusters without drama.

Single-Way Clustering: vcov=~subclass or ~id_cluster

Quick win if multi-way feels overkill: just pass a formula to vcov. For matching structure alone:

This uses sandwich::vcovHC() internally, as explained in the marginaleffects uncertainty chapter. Swap to ~id_cluster for subject clustering:

But which first? Start with subclass—it's non-negotiable for full matching validity. Test both; SEs will differ based on your data's correlation structure. Pro: Dead simple, no extra packages. Con: Misses the other dependence, potentially understating variance.

Multi-Way Clustering: Precompute vcov with clubSandwich

For the full monty—cluster-robust SEs accounting for both subclass and id_cluster—grab clubSandwich. It computes multi-way vcov matrices tailored for clustered designs.

Install if needed: install.packages("clubSandwich"). Then:

vcovCRCR() handles two-way by averaging over all pairs, conservative yet efficient. Check the marginaleffects comparisons reference—it accepts matrices directly, propagating via delta method for your risk ratios. Boom: SEs robust to both sources of dependence.

Full Reproducible Example: Poisson GLM Risk Ratios

Let's fix your code. Assume df has id_cluster; if not, add it upstream. Here's a complete, runnable workflow with synthetic data mimicking your setup (n=2000 obs, 500 subjects):

Output snippet: Expect something like contrast estimate std.error 2.5 % 97.5 % p.value with inflated SEs reflecting clusters. The risk ratio estimate hovers near exp(0.5)≈1.65, but CIs widen realistically.

Common Pitfalls and Alternatives

Dropped a subclass cluster? Your SEs deflate, p-values plummet—classic invalid inference, flagged in marginaleffects GitHub issues. Weights wrong (wts="weights" must match column)? Biased ATE. Poisson overdispersion? Switch to quasipoisson, but robust SEs still needed.

No clubSandwich? Fallback sandwich::vcovHC(m1.fit, "HC3", cluster = "idcluster")—less flexible for multi-way. Bootstrap alt: avgcomparisons(..., vcov = NULL, inferences = "boot", R=1000)—slow on large data, but nonparametric.

Mixed models? Tempting for clustering, but experts advise against—weights mess up random effects; stick to GLM + robust vcov.

Best Practices for Valid Inference

Always plot diagnostics: plot(m.out) for balance, tidy(comp) for results. Validate clusters: table(m.dataid_cluster)—ensure no empty cells. For huge data, parallelize vcov with clubSandwich's options.

Scale up? Subsample for testing. Report both single/multi-way SEs in supplements. And hey, cite the MatchIt effects guide—transparency wins.

Sources
MatchIt Vignette: Estimating Effects — Guidance on robust SEs, weights, and clustering post-matching: https://cran.r-project.org/web/packages/MatchIt/vignettes/estimating-effects.html
MatchIt Articles: Estimating Effects — Details on subclass clustering and multi-way vcov in full matching: https://kosukeimai.github.io/MatchIt/articles/estimating-effects.html
marginaleffects Uncertainty Chapter — vcov formula support and sandwich integration for contrasts: https://marginaleffects.com/chapters/uncertainty.html
Stack Overflow: Robust SEs After Full Matching — Maintainer advice on two-way clustering with subclass and subjects: https://stackoverflow.com/questions/78460537/calculating-robust-ses-following-full-matching-matchit-with-mixed-effect-model
marginaleffects Comparisons Chapter — Examples of precomputed vcov matrices in avg_comparisons: https://marginaleffects.com/chapters/comparisons.html
marginaleffects GitHub Issue 1166 — Warnings and fixes for vcov with clustered models: https://github.com/vincentarelbundock/marginaleffects/issues/1166

Conclusion

Nail cluster-robust standard errors in marginaleffects::avgcomparisons() post-MatchIt full matching by precomputing multi-way vcov with clubSandwich::vcovCRCR() on subclass and idcluster—it safeguards your weighted Poisson risk ratios against all dependence. Single-way vcov=~id_cluster is a solid backup, but don't skip the matching structure. Run the repro above on your data, tweak type="CR3" if clusters are tiny, and you'll have credible inference that holds up to scrutiny.