Coreset-Based data compression for Logistic Regression

Nery Riquelme-Granada, Khuong An Nguyen, Zhiyuan Luo

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN › peer-review

Abstract

The coreset paradigm is a fundamental tool for analysing complex and large datasets. Although coresets are used as an acceleration technique for many learning problems, the algorithms used to construct them may become computationally expensive in some settings. We show that this can easily happen when computing coresets for learning a logistic regression classifier. We overcome this issue with two methods: Accelerating Clustering via Sampling (ACvS) and the Regressed Data Summarisation Framework (RDSF). The former is an acceleration procedure based on a simple theoretical observation about using Uniform Random Sampling for clustering problems; the latter is a coreset-based data-summarising framework that builds on ACvS and extends it by using a regression algorithm as part of the construction. We tested both procedures on five public datasets and observed that computing the coreset and learning from it is 11 times faster than learning directly from the full input data in the worst case, and 34 times faster in the best case. We further observed that Ordinary Least Squares (OLS) is the best regression algorithm for creating data summaries with RDSF.
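As a rough illustration of the idea behind ACvS described in the abstract, the sketch below clusters a small uniform random sample instead of the full dataset, then uses the resulting centres to draw a weighted subset of the data. It is not the authors' implementation: the function names, the 50/50 mixing of uniform and distance-based sampling probabilities, and the choice of k-means are all assumptions made for illustration, since the abstract does not give these details.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns a list of k centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centres[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster ends up empty
                centres[j] = tuple(sum(col) / len(g) for col in zip(*g))
    return centres


def centres_via_uniform_sample(points, k, sample_size, seed=0):
    """Cluster a uniform random sample rather than the full data (hypothetical helper)."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return kmeans(sample, k, seed=seed)


def importance_sample(points, centres, m, seed=0):
    """Draw m weighted points; the probability mix used here is an assumption.

    Each point's probability is half uniform, half proportional to its squared
    distance to the nearest centre. The weight 1 / (m * p) keeps weighted sums
    unbiased estimates of sums over the full data.
    """
    rng = random.Random(seed)
    n = len(points)
    d = [min(sq_dist(p, c) for c in centres) for p in points]
    total = sum(d)
    probs = [0.5 / n + 0.5 * (di / total if total > 0 else 1.0 / n) for di in d]
    idx = rng.choices(range(n), weights=probs, k=m)
    return [(points[i], 1.0 / (m * probs[i])) for i in idx]
```

The weighted points returned by `importance_sample` could then be passed to any logistic regression learner that accepts per-sample weights. The expensive clustering step only ever sees `sample_size` points, which is where the saving would come from under these assumptions.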
Original language: English
Title of host publication: Data Management Technologies and Applications - 9th International Conference, DATA 2020, Revised Selected Papers
Editors: Slimane Hammoudi, Christoph Quix, Jorge Bernardino
Place of Publication: Switzerland
Publisher: Springer
Pages: 195-222
Number of pages: 28
Volume: 1446
ISBN (Electronic): 9783030830144
ISBN (Print): 9783030830137
Publication status: Published - 23 Jul 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1446
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Bibliographical note

Funding Information: This research is supported by AstraZeneca and the Paraguayan Government.

Keywords

  • Coresets
  • Logistic Regression
  • Data compression
