Handling Factors in Formulas

December 11, 2023

Currently, two tools on mightymetrika.com handle factors (also called categorical variables) in a way that is likely to cause confusion.



Keep in mind that this post pertains to using formulas within the 'shiny' web applications. When using 'mmirestriktor', 'mmibain', 'restriktor', or 'bain' directly in R, users can handle factors in the regular R fashion before fitting statistical models.
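As a minimal sketch of what "the regular R fashion" can look like in a script, the block below converts a numerically coded grouping column with factor() before fitting. The data frame and its values are made up for illustration, and base R's lm() is only a stand-in for whichever model you would hand on to 'restriktor' or 'bain'.

```r
# Hypothetical data: grp is coded 1, 2, 3 but is meant to be categorical
df <- data.frame(
  x   = c(4.1, 5.0, 4.6, 6.2, 5.9, 6.4, 7.8, 8.1, 7.5),
  grp = rep(1:3, each = 3)
)

# Convert the numeric codes to a factor before fitting
df$grp <- factor(df$grp)
levels(df$grp)

# lm() is used here as a stand-in model; with a factor, the
# no-intercept formula estimates one coefficient per group
fit <- lm(x ~ -1 + grp, data = df)
length(coef(fit))
```

Because the conversion happens in your own script, the values in the csv file do not need to contain letters in this workflow.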

The Factor Issue

Consider these two csv files:

When processing the csv on the left (data_f_grpnum) the grp variable will typically get treated as a numeric variable. On the other hand, when working with the csv on the right (data_f_grpfac) the grp variable will get treated as a factor.
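The difference can be reproduced in plain R. The sketch below is an illustration only: the csv contents are hypothetical stand-ins for the two files, and it assumes base R's read.csv(), which may not be the reader the apps use internally.

```r
# Hypothetical contents standing in for the two csv files
csv_grpnum <- "x,grp\n4.1,1\n5.0,1\n6.2,2\n5.9,2\n7.8,3\n8.1,3"
csv_grpfac <- "x,grp\n4.1,Group1\n5.0,Group1\n6.2,Group2\n5.9,Group2\n7.8,Group3\n8.1,Group3"

data_f_grpnum <- read.csv(text = csv_grpnum)
data_f_grpfac <- read.csv(text = csv_grpfac)

# Digits only: the column is parsed as a number
is.numeric(data_f_grpnum$grp)

# Letters present: the column cannot be parsed as a number, so it is
# kept as text (or as a factor when stringsAsFactors = TRUE)
is.numeric(data_f_grpfac$grp)
```

R's modeling functions treat a text column as categorical, which is why the second file behaves like a factor.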


When uploading a csv file for mightymetrika.com 'shiny' applications, you should make sure that variables that you want treated as factors cannot be automatically parsed as numeric. Put another way, make sure your factor variables include letters in their values.


For the csv files above, we force our grp variable to behave as a factor by prefixing the values with "Group". Any letter would achieve the same goal.
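If your data already lives in R, one way to prepare such a file is sketched below. The data values and the temporary file path are hypothetical; only the "Group" prefix comes from the example above.

```r
# Hypothetical data with a numerically coded grouping variable
df <- data.frame(
  x   = c(4.1, 5.0, 6.2, 5.9, 7.8, 8.1),
  grp = rep(1:3, each = 2)
)

# Prefix the codes with letters so they can no longer be parsed as numbers
df$grp <- paste0("Group", df$grp)

# Write the csv that would be uploaded (a temporary file in this sketch)
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = FALSE)

# Reading the file back confirms that grp is no longer numeric
reread <- read.csv(out_file)
is.numeric(reread$grp)
```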

What Happens When We Read a Factor as Numeric?

Let's see what happens when we accidentally run the mmirestriktor app on the data_f_grpnum.csv file even though we want the grp variable to function as a categorical variable (i.e., a factor).


In the example below, we:

  • Read in the data_f_grpnum.csv file
  • Notice that the available variables are 'x' & 'grp'
  • Set up the formula x ~ -1 + grp


Up to this point, everything looks as expected. However, after we fit the model, the Available Terms for Constraint shows only the variable 'grp'. Since we wanted the grp variable to act as a categorical variable, we expected the available terms to show something that corresponds to the individual groups. Because the "terms" (i.e., grp) are not consistent with the goal (comparing groups, or any other goal that would inspire us to use a categorical variable), we know that something has gone horribly wrong.
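The same symptom shows up when fitting the formula in plain R. The sketch below uses hypothetical data and base R's lm(); whether the app fits the model in exactly this way is an assumption.

```r
# Hypothetical data in which grp was read as a number
data_f_grpnum <- data.frame(
  x   = c(4.1, 5.0, 4.6, 6.2, 5.9, 6.4, 7.8, 8.1, 7.5),
  grp = rep(1:3, each = 3)
)

fit_num <- lm(x ~ -1 + grp, data = data_f_grpnum)

# A numeric grp contributes a single slope, so there is only one term
names(coef(fit_num))  # "grp"
```

With only one coefficient available, there is nothing to compare across groups, which matches what the app displays.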

What Happens When We Read a Factor as Factor?

Since we want the grp variable to function as a categorical variable, we now run the mmirestriktor app on data_f_grpfac.csv and get the following Available Terms for Constraint.

Now we can use the information in the Available Terms for Constraint to specify a constraint that compares the groups in our 'grp' variable, and then run an Informative Hypothesis Test analysis.
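For comparison, here is a rough sketch of the same fit in plain R when grp holds text values. The data are hypothetical, lm() is a stand-in for the app's model fitting, and the constraint string is only an illustration written in the style 'restriktor' uses; the exact syntax the app accepts is an assumption.

```r
# Hypothetical data in which grp was read as text
data_f_grpfac <- data.frame(
  x   = c(4.1, 5.0, 4.6, 6.2, 5.9, 6.4, 7.8, 8.1, 7.5),
  grp = rep(c("Group1", "Group2", "Group3"), each = 3)
)

fit_fac <- lm(x ~ -1 + grp, data = data_f_grpfac)

# A categorical grp yields one term per group
names(coef(fit_fac))  # "grpGroup1" "grpGroup2" "grpGroup3"

# Illustrative order constraint built from those term names
# (assumed syntax, not taken from the app)
constraint <- "grpGroup1 < grpGroup2; grpGroup2 < grpGroup3"
```

The term names combine the variable name with each group label, so they are the names a constraint has to refer to.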
