
Release Party

February 23, 2024

Mighty Metrika Interface to Cluster Adjusted t Statistics

The goal of the 'mmiCATs' R package is to introduce researchers to cluster adjusted t statistics (CATs), help them better understand when to use them, and provide a tool that makes the method easy to start using.

My Introduction to CATs

As you may already know, mightymetrika's mission is centered on advancing statistical methods that improve the analysis of small sample size data. A lot of the studies I work on have a double complication:


  1. Small number of participants
  2. A clustering structure such as repeated measurements within participant


When looking for papers that discuss methods for handling this double complication, I came across Small Samples in Multilevel Modeling by Joop Hox & Daniel McNeish (2020). I often reference this paper in my work as an applied statistician, and two parts of it in particular are extremely helpful. The first part that I often return to is Table 15.1 (see image below) and the surrounding discussion.



The other part of the paper which I find very useful is the discussion of what results from Bayesian models might look like in comparison to the results presented in Table 15.1:


"Table 15.1 does not mention Bayesian estimation because suggestions are highly dependent on specification of the prior. With uninformative priors, Bayesian estimation should work with the sample sizes indicated for ML. Bayesian estimation with weakly informative priors roughly corresponds to the REML column, and Bayesian estimation with strongly informative priors is typically appropriate with lower samples than suggested for REML with the Kenward–Roger correction (McNeish, 2016b). "


Originally, I read this paper as an applied statistician looking for more efficient ways to implement mixed effects modeling with smaller sample sizes. When I started working on mightymetrika, however, I reached a point where I wanted a small sample size method for situations where the data generating process suggests that the "independent observations" assumption of a statistical model is violated, but I was not ready to start a project focused on prior distributions. So I went back and read the Hox & McNeish (2020) paper again, and this time the following passage stood out to me:


A study by Cameron, Gelbach, and Miller (2008) showed that the “wild bootstrap”, which is similar to the residual bootstrap, was effective with as few as five clusters, which is even lower than the minimal sample size reported in Yung and Chan (1999)... the wild bootstrap can be carried out in the R package 'clusterSEs' (Esarey & Menger, 2018).


To learn more about this fascinating wild bootstrap method, I turned to the Esarey & Menger (2018) paper which studies four methods side-by-side:


  •  Cluster-robust standard error
      •  Which might be considered the reference method which inspires the search for a replacement
  •  Pairs cluster bootstrapped t-statistics
  •  Wild cluster bootstrapped t-statistics
      •  Which I came to learn about
  •  Cluster-adjusted t-statistics
      •  Which turned out to be the star


One section of the paper provides a relatively simple overview of the CATs method:
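In case that screenshot does not come through here, the gist as I understand it is: fit the model separately within each of the G clusters, collect the per-cluster coefficient estimates, and then run an ordinary one-sample t-test on those G estimates with G - 1 degrees of freedom. Below is a toy R sketch of that idea on made-up data; it is my own illustration, not the 'clusterSEs' implementation, so treat the details as a rough approximation.

# Toy sketch of the CATs idea on simulated data (my own illustration,
# not the 'clusterSEs' code): fit the model within each cluster, then run a
# one-sample t-test on the per-cluster coefficient estimates (G - 1 df).
set.seed(123)
G <- 10                                   # number of clusters
n <- 30                                   # observations per cluster
dat <- data.frame(id = rep(seq_len(G), each = n), x = rnorm(G * n))
dat$y <- 0.5 * dat$x + rep(rnorm(G, sd = 0.4), each = n) + rnorm(G * n)

# Per-cluster estimates of the coefficient on x
b_g <- sapply(split(dat, dat$id), function(d) coef(glm(y ~ x, data = d))["x"])

# Cluster-adjusted t statistic: treat the G estimates as a simple sample
t_cat <- mean(b_g) / (sd(b_g) / sqrt(G))
p_cat <- 2 * pt(abs(t_cat), df = G - 1, lower.tail = FALSE)
c(estimate = mean(b_g), t = t_cat, p = p_cat)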



When to Use CATs

The conclusion section of Esarey & Menger (2018) has a paragraph that provides a nice outline of when to select which method:



Often, researchers select a random effects model to handle clustered data. However, when the sample size is small, a statistician might try to avoid specifying a model that is too complex for the data by choosing a simpler random effects model than the one considered to be the correct specification. The 'mmiCATs' R package comes with a card game called CloseCATs(). The 'mmiCATs' GitHub README describes the game as follows:


How to Start Using CATs


To start using CATs, I recommend three resources:


  1. Esarey & Menger (2018) for a better understanding of the method
  2. The clusterSEs R package which you can find on CRAN
  3. The MetaCran GitHub repository for clusterSEs where you can browse the code for the method


If you have a CSV of clustered data and you would like to start experimenting with CATs, you can also use the mmiCATs::mmiCATs() shiny application on the mightymetrika website. This app implements the basic functionality of the clusterSEs::cluster.im.glm() function.
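If you would rather work from the R console than the app, a call along the following lines should get you started. The data set and column names below are made up, and I am recalling the argument names from the 'clusterSEs' documentation, so please check ?cluster.im.glm before relying on them.

# Hedged example of running CATs on a CSV of clustered data via 'clusterSEs'.
# Column names (y, x, id) are made up; check ?cluster.im.glm for the exact
# arguments, which I am quoting from memory of the documentation.
library(clusterSEs)

dat <- read.csv("my_clustered_data.csv")   # assumed columns: y, x, id
fit <- glm(y ~ x, data = dat, family = gaussian)

cats_out <- cluster.im.glm(
  mod     = fit,        # fitted glm to adjust
  dat     = dat,        # the data used to fit it
  cluster = ~ id,       # clustering variable as a one-sided formula
  report  = TRUE        # print the cluster-adjusted results
)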


Coming Soon

In the coming weeks, mightymetrika.com is planning to release a few blog posts to help you delve deeper into CATs, including:


  1. A tutorial which will use screenshots to walk through a CATs analysis using the mmiCATs::mmiCATs() application
  2. A blog post that will walk through a hand or two of the mmiCATs::CloseCATs() game


In addition to the two main 'mmiCATs' functions (mmiCATs() & CloseCATs()), the package also includes two functions that experiment with using CATs with robust regression models in place of stats::glm():


  1. mmiCATs::cluster_im_lmRob()
  2. mmiCATs::cluster_im_glmRob()


These functions can be used with robust models from both 'robust' and 'robustbase'.
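To make the 'robust' versus 'robustbase' distinction concrete, the two packages fit robust linear models with different functions, and (as I understand the package) these are the kinds of per-cluster fits that the cluster_im_*Rob() functions swap in for stats::glm(). The data file and column names below are illustrative, and the exact mmiCATs interface is documented in its help pages, so check those rather than relying on this sketch.

# The two robust engines fit the same kind of model with different functions.
# (Data file and column names here are illustrative.)
library(robust)      # provides lmRob()
library(robustbase)  # provides lmrob()

dat <- read.csv("my_clustered_data.csv")   # assumed columns: y, x, id

fit_robust     <- robust::lmRob(y ~ x, data = dat)      # 'robust' engine
fit_robustbase <- robustbase::lmrob(y ~ x, data = dat)  # 'robustbase' engine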


The mmiCATs::pwr_func_lmer() function can be used to streamline the workflow of conducting simulation studies by:


  •  Simulating clustered data sets
  •  Fitting the following statistical models to each data set:
      •  Linear mixed effects model
      •  Random intercept model
      •  CATs
      •  CATs with truncation
      •  mmiCATs::cluster_im_lmRob() with 'robust'
      •  mmiCATs::cluster_im_lmRob() with 'robustbase'


The results can then be compared to give users a better understanding of each model's performance under various data generation schemes.
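I will not try to reproduce the pwr_func_lmer() interface from memory here, but to give a sense of the kind of comparison it streamlines, the sketch below simulates one clustered data set and fits two of the candidates: a random intercept model via 'lme4' and the toy CATs calculation from earlier in this post. pwr_func_lmer() automates this sort of comparison across many replicates and across the full list of models above.

# Rough sense of one simulation replicate (not the pwr_func_lmer() interface):
# simulate clustered data, then fit a random intercept model and a toy CATs
# calculation so their estimates and p-values can be compared.
library(lme4)
set.seed(42)

G <- 12; n <- 20
dat <- data.frame(id = rep(seq_len(G), each = n), x = rnorm(G * n))
dat$y <- 0.4 * dat$x + rep(rnorm(G, sd = 0.5), each = n) + rnorm(G * n)

# Random intercept model
ri_fit <- lmer(y ~ x + (1 | id), data = dat)
summary(ri_fit)$coefficients["x", ]

# Toy CATs: per-cluster estimates, one-sample t-test with G - 1 df
b_g <- sapply(split(dat, dat$id), function(d) coef(lm(y ~ x, data = d))["x"])
c(est = mean(b_g),
  p   = 2 * pt(abs(mean(b_g) / (sd(b_g) / sqrt(G))), df = G - 1, lower.tail = FALSE))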
