Date of Award
Fall 2024
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Nasim Yahyasoltani
Second Advisor
Praveen Madiraju
Third Advisor
Walter Bialkowski
Abstract
Radiological exams are initiated by an order that includes a “reason for the exam” (Indication) text. Because the Indication field is known before the exam, it may help predict whether the exam will yield significant findings. Such predictive information has applications in value-based payment modeling, differential billing, study prioritization, and similar areas. To date, however, no sustained work has been done on determining how much information the Indication field contains and how readily it can be brought to bear in radiology data intelligence contexts. This study begins such work using the MIMIC-CXR-JPG dataset. It extracts Indication text from the radiology reports in the dataset and uses the included CheXpert-generated labels as ground truth for the Finding field. It compares Naïve Bayes models and BERT models with a classification layer to begin exploring the potential of Indication text to predict the absence or presence of a significant finding. The results indicate that while both approaches show promise for yielding actionable information, the semantically sophisticated BERT models as implemented here are not clearly superior.
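The pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation. The section-header pattern, the helper names (`extract_indication`, `tokenize`, `NaiveBayes`), and the toy training examples are all assumptions made for illustration, and the labels (1 = finding present, 0 = no finding) merely stand in for the CheXpert-derived ground truth.

```python
import math
import re
from collections import Counter

# Assumed report layout: the section starts with "INDICATION:" and ends at the
# next all-caps header followed by a colon (or at the end of the report).
INDICATION_RE = re.compile(
    r"INDICATION:\s*(.*?)(?=\n\s*[A-Z][A-Z ]+:|\Z)", re.DOTALL
)


def extract_indication(report):
    """Return the Indication text with whitespace collapsed, or None if absent."""
    match = INDICATION_RE.search(report)
    if match is None:
        return None
    return " ".join(match.group(1).split())


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Invented toy data, for illustration only (1 = finding, 0 = no finding).
train_texts = [
    "fever cough evaluate for pneumonia",
    "shortness of breath hypoxia",
    "preoperative evaluation",
    "routine screening preoperative",
]
train_labels = [1, 1, 0, 0]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `extract_indication` on a report and passing the result to `model.predict` returns the more probable class under the toy model. A BERT-based counterpart would replace the bag-of-words likelihood with a pretrained encoder and a classification layer, which is beyond what a dependency-free sketch can show.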