Date of Award
Summer 2022
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Mathematics, Statistics and Computer Science
Program
Computational Mathematical and Statistical Sciences
First Advisor
Ahamed, Sheikh Iqbal
Second Advisor
Bansal, Naveen
Third Advisor
Maadooliat, Mehdi
Abstract
Temporal sentiment labels are used in various multimedia studies. They are useful for numerous classification and detection tasks, such as video tagging, segmentation, and labeling. However, generating a large-scale sentiment dataset through manual labeling is usually expensive and challenging. Some recent studies have explored the possibility of using online Time-Sync Comments (TSCs) as the primary source of their sentiment maps. Although this approach has shown positive results, existing TSC datasets are limited in scale and content categories, and guidelines for generating such data within a constrained budget have yet to be developed and discussed. This dissertation addresses these issues by leveraging existing live comments from a popular video distribution platform, YouTube, as a primary time-synchronized data source and by exploring efficient strategies for generating TSC data within a constrained budget. An automatic data mining system was first developed and deployed across multiple platforms. Long-running experiments were then conducted to test the efficiency of the framework. Additionally, two large-scale TSC datasets were created through the proposed data framework and analyzed for their characteristics. Finally, the outcomes were tested against the original temporal Automatic Speech Recognition (ASR) sentiment labeling to validate their accuracy. The experiments show the potential of automatically generating temporal sentiment datasets through the proposed mapping system. This project also provides valuable tools for future multimedia research.