Date of Award
Spring 4-18-2024
Degree Type
Thesis
Degree Name
MS Data Science
Department
Mathematics and Computer Science
Advisor
Manfred Minimair, PhD
Committee Member
Nathan Kahl, PhD
Committee Member
Kobi Abayomi, PhD
Keywords
regression models, reading complexity, readability, reading scores, excerpt
Abstract
Reading is an essential skill for academic success. To help students develop their critical reading skills, it is essential to offer texts that are both within their capability and challenging enough to push them beyond their current boundaries. At present, the majority of excerpts are matched to reader level using readability tools such as the Flesch-Kincaid Grade Level. The core issue with reading scores built around the Flesch-Kincaid Grade Level is that the scoring process is not readily available to the public and lacks validation studies. In this project, reading complexity is determined by applying exploratory data analysis (EDA) and training regression models to predict a readability score. This score is then used to determine a passage’s reading complexity: a positive target value denotes a simple excerpt of text, while a negative target value denotes a more difficult or complex excerpt. The focus of this M.S. thesis is to build an understanding of the ‘Target’ score so that readability scores can be established as a trusted resource. We use EDA to examine the ‘Target’ variable and the textual excerpts and to gather insight into how the regression models should be trained on the data. Each regression model outputs predicted readability scores along with an accuracy score that measures how closely its predictions match the proper readability score. The accuracy of a given regression model therefore indicates how valid that model is as a means of determining readability.
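The workflow described in the abstract (derive features from an excerpt, fit a regression model against the ‘Target’ score, then measure accuracy) can be illustrated with a minimal sketch. The abstract does not name the features, models, or accuracy metric used in the thesis, so everything below is an assumption made for illustration: the single feature (mean word length), the ordinary least squares fit, the RMSE metric, and the four toy excerpts with made-up target values. Only the sign convention (positive means simpler, negative means more complex) comes from the abstract.

```python
import math
import re


def mean_word_length(text: str) -> float:
    """Average number of letters per word (a hypothetical complexity feature)."""
    words = re.findall(r"[A-Za-z']+", text)
    return sum(len(w) for w in words) / len(words)


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted targets."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy excerpts and target values, invented purely for illustration.
# They are NOT taken from the thesis data set.
excerpts = [
    "The cat sat on the mat. It was a big cat.",
    "We went to the park and played with a red ball.",
    "The committee deliberated extensively regarding constitutional amendments.",
    "Thermodynamic equilibrium necessitates uniform temperature distribution throughout.",
]
targets = [1.0, 0.5, -1.5, -2.0]  # positive = simpler, negative = more complex

features = [mean_word_length(e) for e in excerpts]
slope, intercept = fit_simple_linear_regression(features, targets)
predictions = [slope * x + intercept for x in features]
error = rmse(targets, predictions)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, RMSE={error:.3f}")
```

On this toy data the fitted slope is negative, which matches the sign convention: excerpts with longer words receive lower (more complex) predicted scores. A realistic model would use many more excerpts, richer features, and a held-out test set to estimate accuracy.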
Recommended Citation
Talamayan, Joshua, "Determining Reading Complexity using Regression Models" (2024). Seton Hall University Dissertations and Theses (ETDs). 3207.
https://scholarship.shu.edu/dissertations/3207