There are a few categories of "small" that come to mind when I hear the term small sample size study. All of these categories can lead to misleading inferences if they are not handled correctly. Here are the categories on my radar.
The first category is a small number of observations: you are only capable of collecting data on very few individuals. This is the most common situation that comes to mind when I think of a small sample size study. Often, it arises due to budget constraints, such as data collection being expensive. It can also arise when studying a hard-to-reach population or a rare event. In less savory cases, it may arise from poor study planning, such as collecting data without first conducting a sample size estimation or power analysis.
When there is a small number of observations, one of the best ways to understand the data is through an in-depth description via descriptive statistics and visualizations. Visualizations that show all of the data points for a variable can be particularly helpful.
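For example, here is a minimal base R sketch (the variable and sample size are hypothetical) that pairs descriptive statistics with a plot of every individual observation:

```r
# Hypothetical small sample: one outcome measured on 12 individuals
set.seed(1)
dat_small <- data.frame(score = rnorm(12, mean = 50, sd = 10))

# Descriptive statistics
summary(dat_small$score)
sd(dat_small$score)

# A strip chart shows every observation rather than a summary shape
stripchart(dat_small$score, method = "jitter", pch = 19,
           xlab = "score", main = "All 12 observations")
```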
In order to get adequate statistical power with a small number of observations, it may be necessary to spend considerable time and effort during the planning phase considering what is known and how this knowledge can be ingested into the statistical modeling process. Methods such as Bayesian statistics with informative priors and informative hypothesis testing (see restriktor & bain for introductions to informative hypothesis testing) can be used to ingest prior knowledge into the model and can yield greater statistical power than standard methods.
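As a rough illustration, here is a sketch of an informative hypothesis test with restriktor::iht, assuming a hypothetical three-group design where prior knowledge suggests an ordering of the group means (all data and names are fabricated):

```r
library(restriktor)

# Hypothetical data: outcome y in three groups, with prior knowledge
# suggesting the ordering placebo < low < high
set.seed(1)
dat <- data.frame(
  y     = c(rnorm(10, 0), rnorm(10, 0.4), rnorm(10, 0.8)),
  group = factor(rep(c("placebo", "low", "high"), each = 10),
                 levels = c("placebo", "low", "high"))
)

fit <- lm(y ~ -1 + group, data = dat)

# Test the order-constrained hypothesis rather than the omnibus ANOVA
iht(fit, constraints = "groupplacebo < grouplow; grouplow < grouphigh")
```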
The obvious downside to "ingesting knowledge" into statistical models is that it can be seen as highly subjective, and it can potentially be seen as a form of p-hacking. For example, consider the following fictional analysis:
"I ran an ANOVA via anova(stats::lm(formula, data)) and my p-value was 0.09, so I plugged the fitted model into restriktor::iht with a constraint that I obtained by reading my fitted model's summary, and now I've obtained a Type B p-value of 0.90 and a Type A p-value of 0.03, rendering statistically significant support for my hypothesis test." Ficticious P-hacker
In order to conduct a trustworthy study which ingests knowledge into the model, I would recommend taking the advice given in a bain tutorial and pre-registering the study. This gives you the opportunity to formulate your knowledge as a statistical analysis plan, discussing prior distributions and constraints before data is collected.
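To make the pre-registration concrete, here is a sketch of how an informative hypothesis could be written down before data collection and later evaluated with bain, reusing the hypothetical design from the sketch above:

```r
library(bain)

# Hypothesis written into the pre-registration, before data collection:
# the high group outperforms the low group, which outperforms placebo
hyp <- "grouphigh > grouplow > groupplacebo"

# After data collection, fit the model and evaluate the hypothesis
fit <- lm(y ~ -1 + group, data = dat)  # dat as in the earlier sketch
set.seed(1)
bain(fit, hypothesis = hyp)
```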
It is common to have a nested data generating process, such as polling citizens within districts. In this situation, you might have a large number of citizens but a smaller number of districts. If the nested structure of the data is likely to bias the standard errors, for example when the intraclass correlation is large, then it will be necessary to use a method that adjusts for the lack of independence between observations within a cluster.
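A quick way to gauge this is to fit an intercept-only random intercept model and compute the intraclass correlation as the share of total variance that sits between districts. Here is a sketch with lme4 on fabricated polling data (all names are hypothetical):

```r
library(lme4)

# Fabricated polling data: 40 citizens in each of 8 districts
set.seed(1)
polls <- data.frame(
  district = factor(rep(1:8, each = 40)),
  policy   = rnorm(320)
)
polls$support <- 0.3 * polls$policy +
  rnorm(8, sd = 0.7)[polls$district] +   # district-level shifts
  rnorm(320, sd = 1)                     # citizen-level noise

# Intercept-only model partitions the variance
m0 <- lmer(support ~ 1 + (1 | district), data = polls)
vc <- as.data.frame(VarCorr(m0))

# ICC = between-district variance / total variance
vc$vcov[1] / sum(vc$vcov)
```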
When the number of clusters is large, cluster-robust standard errors provide a solution that is easy to implement and is not sensitive to model misspecification. However, when the number of clusters is small, this method tends to produce confidence intervals that are too narrow and false positive rates that are too high.
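A sketch of the cluster-robust approach with the sandwich and lmtest packages, reusing the hypothetical polls data from above:

```r
library(sandwich)
library(lmtest)

# Point estimates from ordinary least squares, ignoring the clustering
fit <- lm(support ~ policy, data = polls)

# Re-test the coefficients with standard errors clustered by district
coeftest(fit, vcov = vcovCL(fit, cluster = ~district))
```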
Mixed effects models are a very good option when the number of clusters is small; however, they are sensitive to model specification, so this method takes more knowledge and care to implement successfully. In addition, with smaller samples, it may not be possible to specify the correct model without overfitting.
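The mixed effects counterpart with lme4, again on the hypothetical polls data:

```r
library(lme4)

# Random intercept for district absorbs within-district dependence
fit_mm <- lmer(support ~ policy + (1 | district), data = polls)
summary(fit_mm)

# A random slope for policy may be more faithful to the data generating
# process, but with few clusters it can overfit or fail to converge:
# fit_rs <- lmer(support ~ policy + (1 + policy | district), data = polls)
```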
The cluster adjusted t-statistics (CATs) approach often performs well with a small number of clusters. When it is possible to specify a mixed effects model correctly, the mixed effects model will be more efficient and powerful than CATs, but CATs is a safer option when correctly specifying a mixed effects model presents difficulties.
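Here is a hand-rolled sketch of the CATs idea (due to Ibragimov and Mueller): fit the model separately within each cluster, then run a one-sample t-test on the per-cluster estimates. Packages such as clusterSEs implement refinements of this approach; the sketch below just shows the core logic on the hypothetical polls data:

```r
# Fit the model within each district and collect the policy slopes
slopes <- sapply(split(polls, polls$district), function(d) {
  coef(lm(support ~ policy, data = d))["policy"]
})

# One-sample t-test of the cluster-level slopes against zero
t.test(slopes, mu = 0)
```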
As of now, I am not sure what methods should be recommended in the case of a small number of clusters and a small number of observations within each cluster. For some effect sizes, I have obtained sufficient statistical power using Bayesian random intercept models; however, in many situations, a simple random intercept model may not be the correct specification of the random effects structure.
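For reference, here is a sketch of the kind of Bayesian random intercept model I have in mind, using brms on the hypothetical polls data; the priors are placeholders and should encode actual prior knowledge:

```r
library(brms)

# Bayesian random intercept model with weakly informative priors
fit_b <- brm(
  support ~ policy + (1 | district),
  data  = polls,
  prior = c(
    set_prior("normal(0, 1)", class = "b"),      # fixed effect
    set_prior("exponential(1)", class = "sd")    # district SD
  ),
  chains = 4, cores = 4, seed = 1
)
summary(fit_b)
```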
Another important category is when the statistical model has many predictors relative to the number of observations. This situation can lead to overfitting if the statistical model is not specified in a way that can handle the complexity. Mighty Metrika may have more apps focused on this issue in 2024, but for now, one method worth learning about in this regard is Bayesian penalized regression with shrinkage priors.
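As a closing sketch, here is what that method can look like in brms with the regularized horseshoe prior; the data are fabricated with many predictors relative to the number of observations:

```r
library(brms)

# Fabricated wide data: n = 40 observations, p = 20 candidate predictors
set.seed(1)
n <- 40; p <- 20
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
dat_wide <- data.frame(y = 0.8 * X[, 1] + rnorm(n), X)

# Horseshoe prior shrinks small coefficients aggressively toward zero
f <- reformulate(paste0("x", 1:p), response = "y")
fit_hs <- brm(f, data = dat_wide,
              prior = set_prior(horseshoe(df = 1), class = "b"),
              chains = 4, cores = 4, seed = 1)
summary(fit_hs)
```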