
Categories of Small

December 18, 2023

There are a few categories of "small" that come to mind when I hear the term small sample size study. All of these categories can lead to misleading inferences if they are not handled correctly. Here are the categories on my radar.

1) Small Number of Observations

This is the most common situation that comes to mind when I think of a small sample size study: you are only capable of collecting data on very few individuals. Oftentimes this arises from budget constraints, such as data collection being expensive. It can also arise when studying a hard-to-reach population or a rare event. In less savory cases, it arises from poor study planning, such as collecting data without first conducting a sample size estimation or power analysis.


When there is a small number of observations, one of the best ways to understand the data is through an in-depth description via descriptive statistics and visualizations. Visualizations that show all of the data points for a variable can be particularly helpful.
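
For example, here is a minimal sketch of this kind of description in base R, assuming a small hypothetical data set with a numeric outcome y and a two-level factor group:

    # Hypothetical small data set: outcome y, two-group factor group
    set.seed(42)
    dat <- data.frame(
      y = c(rnorm(6, mean = 10), rnorm(6, mean = 12)),
      group = factor(rep(c("control", "treatment"), each = 6))
    )

    # Descriptive statistics by group
    aggregate(y ~ group, data = dat,
              FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

    # Plot every observation instead of a summary
    stripchart(y ~ group, data = dat, method = "jitter",
               vertical = TRUE, pch = 19)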


In order to get adequate statistical power when there is a small number of observations, it may be necessary to spend a lot of time and effort during the planning phase considering what is known and how this knowledge can be ingested into the statistical modeling process. Statistical methods such as Bayesian statistics with informative priors and informative hypothesis testing (see restriktor and bain for introductions to informative hypothesis testing) can be used to ingest prior knowledge into the model and can yield greater statistical power than standard methods.
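
For instance, here is a minimal sketch of an informative hypothesis test with restriktor, assuming a hypothetical data set with outcome y and a three-level factor group whose means are hypothesized, before seeing the data, to be ordered:

    library(restriktor)

    # Hypothetical data: outcome y under three ordered conditions
    set.seed(1)
    dat <- data.frame(
      y = c(rnorm(8, 0), rnorm(8, 0.4), rnorm(8, 0.8)),
      group = factor(rep(c("a", "b", "c"), each = 8))
    )

    # Drop the intercept so the coefficients are the group means
    fit <- lm(y ~ -1 + group, data = dat)

    # Informative hypothesis: the group means are ordered a < b < c
    iht(fit, constraints = "groupa < groupb < groupc")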


The obvious downside to "ingesting knowledge" into statistical models is that it can be seen as highly subjective and, when done after looking at the data, as a form of p-hacking. For example, consider the following fictional analysis:


"I ran an ANOVA via anova(stats::lm(formula, data)) and my p-value was 0.09, so I plugged the fitted model into restriktor::iht with a constraint that I obtained by reading my fitted model's summary, and now I've obtained a Type B p-value of 0.90 and a Type A p-value of 0.03, rendering statistically significant support for my hypothesis test."  Ficticious P-hacker


In order to conduct a trustworthy study which ingests knowledge into the model, I would recommend taking the advice given in a bain tutorial and pre-registering the study. This gives you the opportunity to formulate your knowledge as a statistical analysis plan that specifies prior distributions and constraints before the data are collected.

2) Small Number of Clusters

It is common to have a nested data generating process, such as polling citizens within districts. In this situation, you might have a large number of citizens but a small number of districts. If the nested structure of the data is likely to bias the standard errors, for example when the intraclass correlation is large, then it will be necessary to use a method that adjusts for the lack of independence between observations within a cluster.
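
One way to gauge the issue is to estimate the intraclass correlation from an intercept-only random intercept model. Here is a minimal sketch with lme4, assuming a hypothetical data frame polls with a numeric outcome approval and a factor district:

    library(lme4)

    # Random intercept model: approval nested within district
    fit <- lmer(approval ~ 1 + (1 | district), data = polls)

    # ICC = between-district variance / total variance
    vc <- as.data.frame(VarCorr(fit))
    vc$vcov[vc$grp == "district"] / sum(vc$vcov)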


When the number of clusters is large, cluster-robust standard errors provide a solution that is easy to implement and relatively insensitive to model misspecification. However, when the number of clusters is small, this method tends to produce confidence intervals that are too narrow and false positive rates that are too high.
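
For reference, here is a minimal sketch of cluster-robust standard errors with the sandwich and lmtest packages, again assuming the hypothetical polls data along with a hypothetical predictor income:

    library(sandwich)
    library(lmtest)

    # Fit the model ignoring the clustering
    fit <- lm(approval ~ income, data = polls)

    # Re-test the coefficients with standard errors clustered by district
    coeftest(fit, vcov = vcovCL(fit, cluster = ~ district))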


Mixed effects models are a very good option when the number of clusters is small; however, they are sensitive to model specification, so they take more knowledge and care to implement successfully. In addition, with smaller samples, it may not be possible to specify the correct model without overfitting.


The cluster-adjusted t-statistics (CATs) approach often performs well with a small number of clusters. When it is possible to specify a mixed effects model correctly, the mixed effects model will be more efficient and powerful than CATs, but CATs is a safer option when correctly specifying a mixed effects model is difficult.
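
One implementation of CATs is cluster.im.glm() from the clusterSEs package, which estimates the model within each cluster and bases inference on the distribution of the cluster-level estimates. Here is a minimal sketch, still assuming the hypothetical polls data:

    library(clusterSEs)

    # Fit the model ignoring the clustering
    fit <- glm(approval ~ income, data = polls, family = gaussian)

    # Cluster-adjusted t-statistics: estimate within each district,
    # then test the mean of the district-level coefficients
    cluster.im.glm(fit, dat = polls, cluster = ~ district, drop = TRUE)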


As of now, I am not sure what methods should be recommended in the case of a small number of clusters and a small number of observations within each cluster. For some effect sizes, I have obtained sufficient statistical power using Bayesian random intercept models; however, in many situations, a simple random intercept model may not be the correct specification of the random effects structure.
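
For what it is worth, here is a minimal sketch of the kind of Bayesian random intercept model I have in mind, fit with brms and a weakly informative prior on the slope (the data frame and variable names are again hypothetical):

    library(brms)

    # Bayesian random intercept model with a weakly informative slope prior
    fit <- brm(
      approval ~ income + (1 | district),
      data = polls,
      prior = set_prior("normal(0, 1)", class = "b"),
      chains = 4, iter = 2000, seed = 123
    )
    summary(fit)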

3) Small Number of Observations Relative to Predictors

Another important category is when the statistical model has a large number of predictors relative to the number of observations. This situation can lead to overfitting if the statistical model is not specified in a way that can handle the complexity. Mighty Metrika may have more apps focused on this issue in 2024, but for now, one method worth learning about in this regard is Bayesian penalized regression with shrinkage priors.
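
As a starting point, here is a minimal sketch of one such shrinkage prior, the regularized horseshoe, fit with brms (the data frame dat and predictors x1 through x5 are hypothetical stand-ins for a wide design matrix):

    library(brms)

    # Shrink many coefficients toward zero with a horseshoe prior
    fit <- brm(
      y ~ x1 + x2 + x3 + x4 + x5,
      data = dat,
      prior = set_prior("horseshoe(1)", class = "b"),
      chains = 4, iter = 2000, seed = 123
    )
    summary(fit)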
