Congratulations to Alison Weingarten, who defended her dissertation proposal, “Balancing Inference and Prediction in Institutional Research: A practical comparison of logistic regression with machine learning techniques in modeling student persistence.”
Jay Verkuilen, Associate Professor, Educational Psychology, CUNY Graduate Center (Chairperson)
David Rindskopf, Distinguished Professor, Educational Psychology/Psychology, CUNY Graduate Center
Paul Attewell, Distinguished Professor, Urban Education/Sociology/Social Welfare, CUNY Graduate Center
In education research, statistical models have traditionally been designed to understand and explain the relationships between variables, with causal inference taking priority over predictive power. In the field of institutional research in particular, there is a growing need not only to understand these causal relationships but also to predict what is likely to occur in the future (Which students are most likely to succeed at our college, and who should we admit? Which students are we most likely to lose to attrition, and how can we engage them? Which students are most likely to struggle academically, and what interventions can we provide?). While many institutional researchers are adept at statistical analysis, machine learning methods, widely touted as more nimble and powerful at making predictions, are still relatively untapped in the field.
This study will compare the efficacy of conventional logistic regression with that of several machine learning classification techniques in predicting secondary educational outcomes. The analysis will use data from the Education Longitudinal Study of 2002, which surveyed students throughout their secondary and postsecondary years to learn about their trajectories into college, the workforce, and beyond. The public-use dataset includes student characteristics, demographics, activities, and high-level academic achievements, as well as information about life after high school (academic, professional, and personal). The analysis will provide insight into whether various machine learning techniques yield better predictions about student persistence, and will inform a pragmatic discussion of the trade-offs between inference and predictive accuracy.
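To make the kind of comparison described above concrete, here is a minimal sketch in Python, not the study's actual analysis. It rests on several assumptions: the data are synthetic rather than drawn from the Education Longitudinal Study of 2002, the two predictors (`gpa`, `engagement`) are hypothetical, and k-nearest neighbors stands in for the machine learning classifiers, which the proposal does not name. The sketch fits a logistic regression by gradient descent, then compares the holdout accuracy of both models.

```python
import math
import random


def make_synthetic_students(n, seed=0):
    """Generate fake (features, persisted) rows. This is NOT ELS:2002 data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gpa = rng.gauss(0.0, 1.0)         # hypothetical standardized GPA
        engagement = rng.gauss(0.0, 1.0)  # hypothetical standardized engagement score
        logit = 0.5 + 1.5 * gpa + 1.0 * engagement  # assumed "true" relationship
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append(([gpa, engagement], 1 if rng.random() < p else 0))
    return rows


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.1, epochs=300):
    """Batch gradient descent on mean log-loss; returns [intercept, b_gpa, b_engagement]."""
    w = [0.0] * (len(rows[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def predict_logistic(w, x):
    return 1 if sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) >= 0.5 else 0


def predict_knn(train, x, k=15):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)


data = make_synthetic_students(600, seed=42)
train, test = data[:400], data[400:]  # simple holdout split

weights = fit_logistic(train)
acc_logit = accuracy(lambda x: predict_logistic(weights, x), test)
acc_knn = accuracy(lambda x: predict_knn(train, x), test)

# The logistic model yields coefficients that can be interpreted; k-NN yields only predictions.
print("logistic coefficients (intercept, gpa, engagement):", [round(v, 2) for v in weights])
print("holdout accuracy: logistic=%.3f  kNN=%.3f" % (acc_logit, acc_knn))
```

The sketch also hints at the trade-off the proposal raises: the logistic regression returns coefficients whose signs and magnitudes can be read directly, while the nearest-neighbor classifier offers predictions with no comparable summary of how each variable relates to persistence. Because the synthetic outcome is generated from a logistic model, this toy setup favors logistic regression and says nothing about which approach performs better on real data.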