Date of Award

Fall 2018

Document Type


Degree Name

Master of Science (MS)


Electrical and Computer Engineering

First Advisor

Ababei, Cristinel

Second Advisor

Medeiros, Henry

Third Advisor

Povinelli, Richard


This thesis studies parallelization methods that reduce the runtime of the popular Viola-Jones face detection algorithm. The methods use two approaches: multithreaded programming and CUDA programming. The thesis begins with the history and a description of the Viola-Jones algorithm, explaining in detail each step used to detect a face. It then provides background on parallel processing, covering both standard multithreaded program design and CUDA programming. Next, it proposes new algorithm designs that apply these parallelization techniques to the original Viola-Jones algorithm and discusses their potential advantages and disadvantages. The implementations of the new algorithms are then presented, along with a detailed explanation of the functionality used. Finally, the thesis reports test results for all versions of the algorithm, including the original, compares them, and outlines possible future improvements.

Simulation results indicate that the multithreaded version achieved a maximum speedup of 7.8x over the original version when running on 16 processing cores, and that the CUDA version achieved a maximum speedup of 47x over the original version. More detailed results and comparisons show that each version has advantages and disadvantages: the multithreaded version was much simpler to code and would run on a wider range of hardware, whereas the CUDA version was significantly faster. In addition, the CUDA version leaves considerable room for future optimizations that could further increase the speed of the algorithm.
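
To put the reported multithreaded figure in context, the short sketch below computes the parallel efficiency implied by a 7.8x speedup on 16 cores. It also estimates the parallelizable fraction of the work under the assumption that Amdahl's law describes the scaling. That model, and the function names used here, are illustrative assumptions and are not taken from the thesis.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


def amdahl_parallel_fraction(speedup: float, workers: int) -> float:
    """Parallelizable fraction f implied by Amdahl's law.

    Amdahl's law: speedup = 1 / ((1 - f) + f / workers).
    Solving for f gives f = (1 - 1/speedup) / (1 - 1/workers).
    Assumption: the measured speedup follows this idealized model.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


# Figures reported in the abstract: 7.8x speedup on 16 processing cores.
efficiency = parallel_efficiency(7.8, 16)
fraction = amdahl_parallel_fraction(7.8, 16)

print(f"parallel efficiency: {efficiency:.2f}")          # 0.49
print(f"implied parallel fraction: {fraction:.2f}")      # 0.93
```

Under these assumptions, the multithreaded version reaches roughly half of ideal linear scaling on 16 cores. The same calculation is not applied to the 47x CUDA figure because the abstract does not state how many GPU cores were used.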