Abstract
Research interest in data-driven COVID-19 detection surged with unprecedented urgency to support overburdened testing infrastructures. Given the crucial role accessible testing played in pandemic management, this raises the question: why was digital screening not officially endorsed for widespread healthcare use? This thesis investigates a significant factor: the need for quantifiable confidence in Digital Health solutions. A critical review of the state-of-the-art literature revealed that proposals were largely evaluated empirically, without guarantees of future performance. To address this gap, this thesis introduces novel confidence techniques based on the foundational Conformal Prediction framework, a statistical approach to quantifying uncertainty. In particular, novel conformal approaches were developed for three high-priority challenges of digital COVID-19 testing: confident sample prediction, confident model learning, and confident data synthesis.
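The Conformal Prediction framework underpinning these contributions can be illustrated with a minimal split-conformal sketch: a held-out calibration set yields nonconformity scores, and a test sample's prediction set contains every label whose conformal p-value exceeds the significance level. The function name, score choice, and toy data below are illustrative assumptions, not the thesis's own measures:

```python
# Illustrative sketch of split Conformal Prediction for classification.
# All names and data are hypothetical examples of the general framework.
import numpy as np

def conformal_prediction_set(cal_scores, test_probs, epsilon=0.1):
    """Return the labels whose conformal p-value exceeds epsilon.

    cal_scores: nonconformity scores of a held-out calibration set
                (here, 1 - probability assigned to the true class).
    test_probs: predicted class probabilities for one test sample.
    epsilon:    significance level; the set errs at rate <= epsilon.
    """
    n = len(cal_scores)
    prediction_set = []
    for label, p in enumerate(test_probs):
        score = 1.0 - p  # candidate nonconformity if `label` were true
        # p-value: fraction of calibration scores at least as extreme
        p_value = (np.sum(cal_scores >= score) + 1) / (n + 1)
        if p_value > epsilon:
            prediction_set.append(label)
    return prediction_set

# Toy usage: eight calibration scores, one confident test prediction
cal = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.05, 0.35])
print(conformal_prediction_set(cal, [0.9, 0.1], epsilon=0.2))  # → [0]
```

The validity guarantee holds regardless of the underlying model, which is why the framework can wrap any classifier.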
Machine Learning has achieved notable success in predicting COVID-19, particularly from cough samples. However, state-of-the-art evaluations generally omitted confidence characteristics, casting doubt on their transferability to real-world use. Therefore, a domain-specific conformal measure was devised explicitly for COVID-19 detection to statistically guarantee any underlying algorithm's prediction performance. Furthermore, the model's confidence was informed by the samples' difficulty, further bolstering the predictions' reliability.
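Informing confidence with sample difficulty resembles the normalised nonconformity scores known from the Conformal Prediction literature, where a sample's error is scaled by how hard that sample is estimated to be. The sketch below is an illustrative stand-in for that general idea, not the thesis's actual domain-specific measure:

```python
# Illustrative difficulty-normalised nonconformity (a hypothetical
# stand-in, not the thesis's COVID-19-specific measure).
import numpy as np

def normalized_nonconformity(errors, difficulty, beta=0.1):
    """Scale each sample's error by its estimated difficulty.

    A large error on an easy sample (low difficulty) yields a high
    nonconformity score; the same error on a hard sample is forgiven.
    beta smooths the ratio and avoids division by zero.
    """
    return errors / (difficulty + beta)

# Toy usage: identical errors, but the first sample is much easier
scores = normalized_nonconformity(np.array([0.2, 0.2]),
                                  np.array([0.1, 0.9]))
print(scores)  # the easy sample scores as far more nonconforming
```

Difficulty estimates might come from a secondary model or from signal-quality features; either way, the coverage guarantee is preserved while prediction sets become more adaptive.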
Instead of quantifying confidence as a post-prediction step, as with a traditional Conformal Prediction wrapper, an increasingly popular model design philosophy asserts that critical characteristics should be inherent to the predictor itself. In line with this argument, a unique uncertainty-aware loss function was developed that approximated the conformal output distribution and, consequently, conformal validity. This approach successfully embedded confidence guarantees into stand-alone Deep Learning models during training, maintaining approximate upper-bound error rates for COVID-19 detection.
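One way such a training objective might combine accuracy with calibrated uncertainty is to pair cross-entropy with a term that discourages overconfident outputs. The entropy-reward term below is a hypothetical stand-in for the conformal-validity approximation, not the loss developed in the thesis:

```python
# Hypothetical uncertainty-aware loss: cross-entropy plus an entropy
# reward (a stand-in for the thesis's conformal-validity term).
import numpy as np

def uncertainty_aware_loss(probs, targets, lam=0.5):
    """probs:   (n, k) predicted class probabilities.
    targets: (n,) integer class labels.
    lam:     weight of the uncertainty term.
    """
    eps = 1e-12
    # Standard cross-entropy on the true class
    ce = -np.mean(np.log(probs[np.arange(len(targets)), targets] + eps))
    # Mean predictive entropy; subtracting it rewards calibrated
    # uncertainty rather than overconfident point predictions
    entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return ce - lam * entropy

# Toy usage: a confident correct prediction vs. a uniform one
confident = uncertainty_aware_loss(np.array([[0.7, 0.3]]), np.array([0]))
uniform = uncertainty_aware_loss(np.array([[0.5, 0.5]]), np.array([0]))
```

Because the loss is differentiable, it can be minimised with ordinary gradient descent, embedding the confidence behaviour into the model's weights rather than applying it after prediction.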
The third obstacle addressed in digital COVID-19 detection is data-hungry models' need for large datasets. Data synthesis is a promising solution for generating additional samples. However, reliably assessing the generated samples' trustworthiness is an open research question and a critical challenge for high-stakes prediction tasks. Consequently, a novel conformal synthesis algorithm was proposed to generate samples from high-confidence feature-space regions identified by a user-selected significance level. Through extension with conformal samples, prediction performance on challenging COVID-19 datasets was consistently and significantly improved.
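The filtering step of such an approach can be sketched as follows: candidate samples survive only if their conformal p-value, computed against a calibration set, exceeds the chosen significance level, restricting synthesis to high-confidence regions. The distance-based nonconformity score and all data below are illustrative assumptions, not the thesis's algorithm:

```python
# Illustrative conformal filter for synthetic 1-D samples; the
# nearest-neighbour nonconformity score is an assumption for the sketch.
import numpy as np

def conformal_filter(candidates, cal_points, epsilon=0.1):
    """Keep candidates that lie in high-confidence feature-space regions.

    Nonconformity is distance to the nearest calibration point; a
    candidate survives if its conformal p-value exceeds epsilon.
    """
    n = len(cal_points)
    # Nonconformity of each calibration point: distance to its
    # nearest other calibration point (leave-one-out)
    cal_scores = np.array([
        np.min(np.abs(np.delete(cal_points, i) - cal_points[i]))
        for i in range(n)
    ])
    kept = []
    for x in candidates:
        score = np.min(np.abs(cal_points - x))
        p_value = (np.sum(cal_scores >= score) + 1) / (n + 1)
        if p_value > epsilon:
            kept.append(x)
    return kept

# Toy usage: a candidate near the data is kept, an outlier is rejected
cal = np.array([0.0, 0.1, 0.2, 0.9, 1.0])
print(conformal_filter([0.15, 5.0], cal, epsilon=0.2))  # → [0.15]
```

Raising the significance level tightens the accepted region, trading sample quantity for confidence in the synthesised data.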
The three proposed confidence algorithms were meticulously and systematically evaluated throughout this thesis to demonstrate their theoretical foundation and empirical success. Rigorous experiments were conducted primarily on real-world COVID-19 datasets, including respiratory audio and medical imaging data. Additionally, prediction tasks from diverse domains were included to illustrate the proposals' generalisation potential, providing a robust understanding of their effectiveness and versatility.
| Date of Award | Nov 2024 |
| --- | --- |
| Original language | English |
| Awarding Institution | |
| Supervisor | Marcus Winter (Supervisor), Khuong An Nguyen (Supervisor) & Alison Bruce (Supervisor) |