contact@mightymetrika.com

Approaching CATs

December 15, 2023

Lately, I've been reading up on statistical methods for small sample sizes when observations are not independent. There seems to be a dilemma:

Mixed-effects models: efficient, but sensitive to violations of assumptions
Cluster-robust standard errors: robust, but do not perform well in smaller samples

In Small Samples in Multilevel Modeling by Hox & McNeish, the authors mention that:

A study by Cameron, Gelbach, and Miller (2008) showed that the “wild bootstrap”, which is similar to the residual bootstrap, was effective with as few as five clusters, which is even lower that the minimal sample size reported in Yung and Chan (1999). Unfortunately, the residuals bootstrap is not implemented in all software (but is available in MLwiN (Rasbash, Steele, Browne, & Goldstein, 2019) and Mplus (Muthén & Muthén, 2017); the wild bootstrap can be carried out in the R package clusterSEs (Esarey & Menger, 2018).

I was intrigued by this WILD BOOTSTRAP and I immediately went to read the Esarey & Menger paper. On first read (I'm still digesting and rereading), I came away with two major takeaways:

1. The cluster-adjusted t-statistics (CATs) method performs better than the wild bootstrap method
2. CATs "often produce a small number of outlying cluster coefficient estimates"

Given item 1, and given that an easy-to-use implementation of CATs is available on CRAN, I am extremely excited to start experimenting and learning more about CATs. Given item 2, I wonder how well CATs would perform if stats::glm were swapped out for robust::glmRob, robust::lmRob, or the like. I don't know, but I am excited to find out.
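A minimal base-R sketch of the general idea behind CATs, as commonly described, may help here: fit the same model separately within each cluster, then run a one-sample t-test on the cluster-level coefficient estimates with G - 1 degrees of freedom. This is an illustration under stated assumptions and not the clusterSEs implementation. The simulated data, the function names cluster_coefs() and cats_sketch(), and the fit_fun argument are all hypothetical and exist only to show where a different fitting function could be plugged in.

```r
# Simulated clustered data (illustrative values only, not from any study).
set.seed(123)
n_clusters <- 8
n_per <- 40
dat <- data.frame(cluster = rep(seq_len(n_clusters), each = n_per))
cluster_effect <- rnorm(n_clusters, sd = 0.5)
dat$x <- rnorm(nrow(dat))
dat$y <- 1 + 0.5 * dat$x + cluster_effect[dat$cluster] + rnorm(nrow(dat))

# Fit the same model within each cluster and stack the coefficients.
# `fit_fun` is a hypothetical hook: any fitter that accepts `formula` and
# `data` and supports coef() could be passed in (for example, a robust
# regression function), assuming each cluster has enough observations.
cluster_coefs <- function(formula, data, cluster, fit_fun = stats::glm, ...) {
  pieces <- split(data, data[[cluster]])
  est <- lapply(pieces, function(d) stats::coef(fit_fun(formula, data = d, ...)))
  do.call(rbind, est)  # one row per cluster, one column per coefficient
}

# One-sample t-test across the G cluster-level estimates, using G - 1 df.
cats_sketch <- function(coefs, ci_level = 0.95) {
  G <- nrow(coefs)
  est <- colMeans(coefs)
  se <- apply(coefs, 2, stats::sd) / sqrt(G)
  t_stat <- est / se
  crit <- stats::qt(1 - (1 - ci_level) / 2, df = G - 1)
  data.frame(
    estimate = est,
    se = se,
    t = t_stat,
    p = 2 * stats::pt(-abs(t_stat), df = G - 1),
    lower = est - crit * se,
    upper = est + crit * se
  )
}

coefs <- cluster_coefs(y ~ x, dat, "cluster")
res <- cats_sketch(coefs)
print(res)
```

Because every cluster contributes one row to coefs, a single outlying cluster estimate can move both the mean and the standard deviation, which is why the choice of fit_fun seems worth experimenting with.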

As I learn more, I will be building some simple 'shiny' applications, games, and simulation functions focused on CATs in a GitHub repo titled mmiCATs.

I should mention that the Esarey & Menger paper is written with a political science bent, in that each cluster contains a lot of observations.


