My first paper was published in the Journal of Statistical Research

by John Ryan
23 July 2023


The paper “Debias Random Forest Regression Predictors”, co-authored with Professors Lihua Chen and Prabhashi Withana Gamage, was accepted and published in the Journal of Statistical Research.

The abstract is as follows:

The random forest can reduce the variance of regression predictors through bagging while leaving the bias mostly unchanged. In general, the bias is not negligible and consequently bias correction is necessary. The default bias correction method implemented in the R package randomForest often works poorly. Several approaches have been developed which in general outperform the R default. However, little work has been done to comprehensively evaluate the performance of these methods and thus guide users to select an appropriate method for bias correction. This paper fills this gap by providing an informative ranking of these bias correction methods based on an extensive numerical study. We further offered practical suggestions on the application of the winner of these methods and suggested a visualization technique to help users decide when bias correction is needed.

The goal was to test, review and categorize the different bias reduction methods for the popular machine learning estimator, Random Forest. The work bears directly on the bias-variance tradeoff that is omnipresent in machine learning and statistics. We aimed to improve the use of this already powerful estimator and to give users guidance on choosing a bias correction method.

This paper grew out of my participation in an NSF-funded Research Experiences for Undergraduates program, headed by Professors Chen and Withana Gamage. Because of the analytically complex nature of the Random Forest, much of the study consisted of implementing different methods and performing countless Monte Carlo simulations in R. I thank Professors Chen and Withana Gamage for including me in this project and for their guidance throughout. As this is my first publication and the first project in which I played a significant role, I learned a lot about machine learning, statistics, programming and the research process along the way.
