
Handling Factors in Formulas pt 2

December 29, 2023

In a recent blog post we discussed the process for reading in variables correctly. The gist was this:


If you want your variable treated as a factor (i.e., a categorical variable) then ensure that the values have letters.


This is still good advice. But an ongoing update to Mighty Metrika tools (note: CRAN is closed for the holidays, so the updates are taking a while) will add another way to make sure your variables are being handled correctly. This post gives a basic overview of the new method. Other blog posts, which will drop within the next few weeks, will also feature it.


First, let's use mmirestriktor to read in the data_f_grpnum.csv file, which gave us issues in the Handling Factors in Formulas post.


As in the previous blog post, notice that the grp variable has the type integer when we know that we want type factor. Before, this meant that we would need to refresh the app, ensure that all the values have a letter (i.e., g1, g2, g3 instead of 1, 2, 3), and re-import the data. Now you can fix the issue as follows:

  1. Double-click 'integer' in the grp row
  2. Replace 'integer' with 'factor'
  3. Click in the white space anywhere beneath the table
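
For readers who also work in the R console, the table edit above is roughly analogous to converting the column with factor(). This is only a sketch: the data frame is a made-up stand-in for data_f_grpnum.csv (the y column and all values are assumptions), and it is not the code the app runs internally.

```r
# Hypothetical stand-in for data_f_grpnum.csv: grp is coded 1, 2, 3
df <- data.frame(
  grp = rep(1:3, each = 4),
  y   = c(2.1, 1.8, 2.5, 2.0, 3.2, 3.0, 3.6, 3.1, 4.4, 4.0, 4.7, 4.2)
)

class(df$grp)   # "integer": R treats grp as a number

# Console analogue of replacing 'integer' with 'factor' in the table
df$grp <- factor(df$grp)

class(df$grp)   # "factor"
levels(df$grp)  # "1" "2" "3"
```

The values stay the same; only the way R interprets them changes, from a quantity to a set of group labels.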


Now you should see the following:


Now you can continue the analysis with the correct variable types.


I am hoping that this will make the app easier to use. However, the process still needs a few major fixes. For example, look what happens when you accidentally enter the type incorrectly. Instead of "factor" we accidentally enter "fctor":


Notice that we get an error in the bottom right-hand corner alerting us that we made a mistake. This should help us understand that we did something wrong and that we should not continue without correcting the mistake. However, if we are rushing or absent-minded, then we might accidentally continue and get unfortunate results.
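
The kind of check behind such an error can be sketched in a few lines of R. Everything here is hypothetical: the helper name check_type, the set of allowed types, and the message text are assumptions for illustration, not the app's actual implementation.

```r
# Hypothetical set of type names that a user may enter in the table
allowed_types <- c("integer", "numeric", "character", "factor", "logical")

# Hypothetical helper: stop with an error if the typed name is not allowed
check_type <- function(type_name) {
  if (!type_name %in% allowed_types) {
    stop(sprintf("'%s' is not a recognized variable type", type_name),
         call. = FALSE)
  }
  invisible(TRUE)
}

check_type("factor")   # passes silently

# Capture the error message produced by the typo instead of halting
result <- tryCatch(check_type("fctor"),
                   error = function(e) conditionMessage(e))
result   # "'fctor' is not a recognized variable type"
```

A check like this reports the typo but does not, on its own, stop a user from carrying on with the old type, which is the risk described above.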


Notice that we cannot set up constraints based on the groups we expect to see since our data was not processed as a factor. To fix this:

  1. Go back up to the table and correct the type
  2. Click Fit Model again
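
To see why the variable type matters for group-based constraints, compare the terms a linear model produces when grp is an integer versus a factor. This is a plain lm() sketch on simulated data (the data and the seed are assumptions), not the app's own code.

```r
set.seed(1)  # hypothetical simulated data, for illustration only
df <- data.frame(grp = rep(1:3, each = 10), y = rnorm(30))

# grp as integer: the model has a single slope and no per-group terms
fit_int <- lm(y ~ grp, data = df)
names(coef(fit_int))   # "(Intercept)" "grp"

# grp as factor: the model has one term per non-reference group
df$grp <- factor(df$grp)
fit_fac <- lm(y ~ grp, data = df)
names(coef(fit_fac))   # "(Intercept)" "grp2" "grp3"
```

With the integer version there are no group terms to refer to, so a constraint that compares groups has nothing to act on.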


Now just delete the error flag (I left it there for didactic purposes but remove it whenever you like) and finish the analysis.


The process is not so pretty, but I believe it is better than before. I will make the same update to mmibain as soon as CRAN gets back from vacation. Blog posts with examples of the process will be posted soon too.
