contact@mightymetrika.com
Lately, I've been reading up on statistical methods for small sample sizes when observations are not independent. There seems to be this dilemma where:
In Small Samples in Multilevel Modeling by Hox & McNeish, the authors mention that:
A study by Cameron, Gelbach, and Miller (2008) showed that the “wild bootstrap”, which is similar to the residual bootstrap, was effective with as few as five clusters, which is even lower than the minimal sample size reported in Yung and Chan (1999). Unfortunately, the residual bootstrap is not implemented in all software (but is available in MLwiN (Rasbash, Steele, Browne, & Goldstein, 2019) and Mplus (Muthén & Muthén, 2017)); the wild bootstrap can be carried out in the R package clusterSEs (Esarey & Menger, 2018).
I was intrigued by this WILD BOOTSTRAP and I immediately went to read the Esarey & Menger paper. On first read (I'm still digesting and rereading), I came away with two major takeaways:
Given item 1, and that an easy-to-use implementation of CATs is available on CRAN, I am extremely excited to start experimenting and learning more about CATs. Given item 2, I wonder how well CATs would perform if stats::glm were swapped out for robust::glmRob/robust::lmRob or the like. I don't know, but I am excited to find out.
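To make the swap idea concrete, here is a minimal base-R sketch of the general idea behind CATs (cluster-adjusted t-statistics): fit the model separately within each cluster, then run a one-sample t-test on the per-cluster estimates. It is not the clusterSEs implementation. The simulated data, the helper name `cats_sketch`, and its arguments (`cluster`, `term`, `fit_fun`) are all hypothetical and exist only for illustration. The fitting function is passed in as an argument so that stats::glm can, in principle, be replaced by another fitter.

```r
# Sketch of the CATs idea: one model per cluster, then a one-sample t-test
# on the per-cluster coefficient estimates. Illustration only.
set.seed(42)

# Simulated data (hypothetical): 8 clusters with 50 observations each.
n_clusters <- 8
n_per <- 50
cluster_id <- rep(seq_len(n_clusters), each = n_per)
cluster_effect <- rnorm(n_clusters)[cluster_id]  # shock shared within a cluster
x <- rnorm(n_clusters * n_per)
y <- 1 + 0.5 * x + cluster_effect + rnorm(n_clusters * n_per)
dat <- data.frame(y = y, x = x, cluster_id = cluster_id)

# cats_sketch() is a hypothetical helper, not part of any package.
# fit_fun is the per-cluster fitting function; stats::glm is the default.
cats_sketch <- function(formula, data, cluster, term,
                        fit_fun = stats::glm, ...) {
  pieces <- split(data, data[[cluster]])
  # Extract the coefficient of interest from each within-cluster fit.
  est <- vapply(
    pieces,
    function(d) unname(stats::coef(fit_fun(formula, data = d, ...))[term]),
    numeric(1)
  )
  # One-sample t-test of the cluster estimates against zero;
  # degrees of freedom = number of clusters - 1.
  tt <- stats::t.test(est)
  list(
    cluster_estimates = est,
    estimate = unname(tt$estimate),
    statistic = unname(tt$statistic),
    df = unname(tt$parameter),
    p_value = tt$p.value,
    conf_int = as.numeric(tt$conf.int)
  )
}

res <- cats_sketch(y ~ x, dat, cluster = "cluster_id", term = "x")
print(res[c("estimate", "statistic", "df", "p_value")])
```

Under this design, trying a robust fitter would amount to passing something like `fit_fun = robust::lmRob`. That assumes the robust package is installed and that its fitted objects work with `coef()`; this has not been checked here. The sketch also needs enough observations in every cluster to fit the model, which matches the many-observations-per-cluster setting of the Esarey & Menger paper.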
As I learn more, I will be building some simple 'shiny' applications, games, and simulation functions focused on CATs in a GitHub repo titled mmiCATs.
I should mention that the Esarey & Menger paper is written with a political science bent, in that each cluster contains a large number of observations.