Date of Award

Fall 2020

Document Type


Degree Name

Master of Science (MS)


Computer Science



First Advisor

Madiraju, Praveen

Second Advisor

Ahamed, Iqbal

Third Advisor

Kaczmarek, Thomas


Data mining for drug-reaction associations is a major topic in the pharmaceutical industry. Historically, the focus has been on privately owned and maintained datasets consisting of information transformed from the FDA Adverse Event Reporting System (FAERS) and from private reporting systems that house clinical trial data. Our focus is on building a pipeline that demonstrates an open-source solution for constructing a drug's safety profile, from data collection through signal detection. In contrast to the private approach, this pipeline primarily uses openFDA data and social media data available through Reddit, with all analysis performed in the R statistical programming language. The aim was to collect the information available in these public sources and apply popular data mining methods for identifying and predicting the occurrence of adverse events. The results show that openFDA and social media sites can be used to create real-time drug safety profiles by applying the same statistical methods used in clinical trials. Social media is shown to provide the best results for prescribed daily-use medications, compared with common over-the-counter drugs or last-line-of-defense medications.

The information and results reported in this paper are not intended as, and do not imply, a substitute for professional medical advice, diagnosis, or treatment. Do not delay seeking medical treatment or advice because of something you have read in this paper.