Date of Award

Fall 2024

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Nasim Yahyasoltani

Second Advisor

Praveen Madiraju

Third Advisor

Walter Bialkowski

Abstract

Radiological exams are initiated by an order that includes a “reason for the exam” (Indication) text. The Indication field is known before the exam and may help predict whether there will be significant findings. Such predictive information has applications in value-based payment modeling, differential billing, study prioritization, etc. To date, however, no sustained work has been done on determining the extent of the information present in the Indication field and how readily it can be brought to bear in radiology data intelligence contexts. This study begins such work using the MIMIC-CXR-JPG dataset. It extracts Indication text from the radiology reports in the dataset and uses the included CheXpert-generated labels as ground truth for the Finding field. It compares Naïve Bayes models and BERT models with a classification layer to begin exploring the potential of Indication text to predict the absence or presence of a significant finding. The results show that while both approaches show promise for yielding actionable information, the semantically sophisticated BERT models as implemented here are not clearly superior.
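
As a rough illustration of the kind of pipeline the abstract describes, the following is a minimal sketch and not the thesis's implementation. It assumes that reports use upper-case section headers followed by a colon (for example "INDICATION:"), and it uses a small hand-written multinomial Naïve Bayes with add-one smoothing. The function names, the toy Indication strings, and the binary labels (1 for a finding, 0 for none) are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Assumption: section headers are upper-case words at the start of a line, ending in a colon.
SECTION_RE = re.compile(r"^[ \t]*([A-Z][A-Z ]+):", re.MULTILINE)


def extract_indication(report):
    """Return the whitespace-normalized INDICATION section text, or '' if absent."""
    matches = list(SECTION_RE.finditer(report))
    for i, m in enumerate(matches):
        if m.group(1).strip() == "INDICATION":
            end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
            return " ".join(report[m.end():end].split())
    return ""


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(texts, labels):
    """Collect class and per-class word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        total_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total_docs)
        for w in tokenize(text):
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)


# Toy, made-up training data (not drawn from any dataset).
toy_texts = [
    "cough and fever evaluate for pneumonia",
    "shortness of breath and fever",
    "preoperative evaluation",
    "routine preoperative screening",
]
toy_labels = [1, 1, 0, 0]
toy_model = train_nb(toy_texts, toy_labels)
```

A call such as `predict_nb(toy_model, extract_indication(report_text))` would then chain the two steps. A BERT-based classifier would replace the bag-of-words counts with contextual embeddings, which is the comparison the study draws.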

Available for download on Thursday, October 29, 2026
