Fine-grained e-commerce product classification using image and text modalities

Student thesis: Doctoral Thesis

Abstract

Accurately classifying e-commerce products, and in particular distinguishing between visually similar items at a fine-grained level, is a significant challenge. Misclassifications propagate to downstream processes: they disrupt logistical operations such as automated warehouse sorting, and they are especially costly for efforts that promote product reuse and recycling to minimise electronic waste (e-waste). This thesis addresses the challenge by developing novel methods that improve fine-grained classification of e-commerce products in the Waste Electrical and Electronic Equipment (WEEE) category. A further objective is for these methods to apply beyond WEEE classification, benefiting a wider range of downstream e-commerce applications.
This research advances fine-grained classification by developing techniques that improve both performance and robustness. A central theme is exploiting the complementary information in the image and text modalities and determining the most effective way to combine their features. To this end, we developed and implemented a multimodal learning architecture over image and text, which establishes a baseline of 75% precision, 71% recall, and 67% F1-score on the TAIMD-17k dataset. We then introduce a novel weighted ensemble method that estimates each modality's contribution by computing its Shapley value; this substantially improves fine-grained classification performance, reaching 80% precision, 82% recall, and 79% F1-score.
Because practical applications require reliable uncertainty estimates, we also integrate conformal prediction into the multimodal learning framework. Conformal prediction generates prediction sets with guaranteed coverage probabilities (under the standard assumption that calibration and test data are exchangeable), enabling more confident and better-informed decisions. Exploring different conformity measures and adapting conformal prediction to the multimodal setting are significant components of this research.
Beyond model development, this thesis creates and publicly releases a large-scale multimodal dataset tailored to fine-grained classification of WEEE e-commerce products. Finally, we develop a practical framework that facilitates the deployment and integration of the proposed multimodal models into real-world applications.
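The abstract does not state the exact value function behind the Shapley-weighted ensemble or which conformity measures the thesis compares, so the following Python sketch is only an illustration under loudly stated assumptions, not the thesis's implementation. It assumes: the value v(S) of a set of modalities S is the accuracy of their averaged softmax outputs on a held-out validation split, with v(empty set) = 0; ensemble weights are the Shapley values clipped at zero and normalised to sum to one; and uncertainty is handled with split conformal prediction using the common score 1 - p(true class). All function names (`coalition_accuracy`, `shapley_values`, `shapley_weights`, `fuse`, `conformal_threshold`, `prediction_sets`, `toy_softmax`) are hypothetical, the data are synthetic stand-ins rather than TAIMD-17k, and NumPy 1.22 or newer is assumed for the `method=` argument of `np.quantile`.

```python
import itertools
import math

import numpy as np


def coalition_accuracy(probs_by_modality, labels, coalition):
    """Hypothetical value function v(S): accuracy of the averaged class
    probabilities of the modalities in coalition S; v(empty set) = 0 is assumed."""
    if not coalition:
        return 0.0
    avg = np.mean([probs_by_modality[m] for m in coalition], axis=0)
    return float(np.mean(avg.argmax(axis=1) == labels))


def shapley_values(probs_by_modality, labels):
    """Exact Shapley value of each modality (enumerates every coalition)."""
    players = list(probs_by_modality)
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            for subset in itertools.combinations(others, size):
                # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing p
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                gain = (coalition_accuracy(probs_by_modality, labels, subset + (p,))
                        - coalition_accuracy(probs_by_modality, labels, subset))
                total += weight * gain
        phi[p] = total
    return phi


def shapley_weights(phi):
    """Turn Shapley values into ensemble weights (negatives clipped to zero)."""
    clipped = {m: max(v, 0.0) for m, v in phi.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {m: 1.0 / len(clipped) for m in clipped}
    return {m: v / total for m, v in clipped.items()}


def fuse(probs_by_modality, weights):
    """Weighted average of per-modality class probabilities (late fusion)."""
    return sum(weights[m] * probs_by_modality[m] for m in weights)


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with the conformity score s = 1 - p(true class)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_level = min(math.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, q_level, method="higher"))


def prediction_sets(probs, qhat):
    """All classes whose score 1 - p(class) does not exceed the calibrated threshold."""
    return [np.flatnonzero(1.0 - row <= qhat) for row in probs]


def toy_softmax(labels, n_classes, noise, rng):
    """Synthetic stand-in for a classifier's softmax output (not real model output)."""
    logits = rng.normal(0.0, noise, size=(len(labels), n_classes))
    logits[np.arange(len(labels)), labels] += 2.0
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
K = 5
splits = {name: rng.integers(0, K, 1000) for name in ("val", "cal", "test")}
# Toy assumption: the text branch is noisier than the image branch.
probs = {name: {"image": toy_softmax(y, K, 1.0, rng), "text": toy_softmax(y, K, 2.0, rng)}
         for name, y in splits.items()}

phi = shapley_values(probs["val"], splits["val"])  # 1) modality contributions
weights = shapley_weights(phi)  # 2) ensemble weights
qhat = conformal_threshold(fuse(probs["cal"], weights), splits["cal"], alpha=0.1)  # 3) calibrate
sets = prediction_sets(fuse(probs["test"], weights), qhat)  # 4) prediction sets
coverage = np.mean([y in s for y, s in zip(splits["test"], sets)])
print(phi, weights, round(qhat, 3), coverage)
```

The validation split used to fit the weights is kept separate from the calibration split on purpose: the coverage guarantee of split conformal prediction relies on calibration and test scores being exchangeable, which reusing the calibration data to fit the weights would undermine. Exact Shapley computation enumerates all 2^(n-1) coalitions per modality, which is trivial for two modalities but grows exponentially with more.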
Date of Award: Jun 2025
Original language: English
Awarding Institution: University of Brighton
Supervisors: Michalis Pavlidis & Khuong An Nguyen
