On generating efficient data summaries for logistic regression: A coreset-based approach

Nery Riquelme-Granada, Khuong An Nguyen, Zhiyuan Luo

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Conference contribution with ISSN or ISBN › peer-review

Abstract

In the era of datasets of unprecedented size, data compression techniques are an attractive approach for speeding up machine learning algorithms. One of the most successful paradigms for achieving good-quality compression is that of coresets: small summaries of data that act as proxies for the original input data. Even though coresets have proved extremely useful for accelerating unsupervised learning problems, applying them to supervised learning problems may introduce unexpected computational bottlenecks.

We show that this is the case for Logistic Regression classification, and hence propose two methods for accelerating the computation of coresets for this problem. When coresets are computed using our methods on three public datasets, computing the coreset and learning from it is, in the worst case, 11 times faster than learning directly from the full input data, and 34 times faster in the best case. Furthermore, our results indicate that our accelerating approaches do not degrade the empirical performance of coresets.
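
To make the "compute a summary, then learn from it" pipeline described in the abstract concrete, here is a minimal sketch in Python. It is not either of the two methods the authors propose, which this record does not describe. It assumes the simplest possible stand-in, a uniformly sampled, reweighted subset, and a hand-written gradient-descent solver. The function names (`uniform_coreset`, `fit_logistic`, `accuracy`), the synthetic data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def uniform_coreset(X, y, m, rng):
    """Naive baseline summary (an assumption, not the paper's method):
    draw m points uniformly without replacement and weight each by n/m,
    so the weighted subset stands in for all n points."""
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)
    return X[idx], y[idx], weights

def fit_logistic(X, y, weights=None, lr=0.5, steps=500):
    """Weighted logistic regression by plain gradient descent (labels in {0, 1})."""
    n, d = X.shape
    w = np.ones(n) if weights is None else weights
    theta = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))   # predicted probabilities
        grad = X.T @ (w * (p - y)) / w.sum()     # weighted average gradient
        theta -= lr * grad
    return theta

def accuracy(theta, X, y):
    """Fraction of points whose predicted label matches y."""
    return np.mean(((X @ theta) > 0) == (y == 1))

# Synthetic data (hypothetical): n points in d dimensions with logistic labels.
rng = np.random.default_rng(0)
n, d = 20000, 5
X = rng.normal(size=(n, d))
true_theta = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)

# Learn from the full data and from the 500-point weighted summary.
Xc, yc, wc = uniform_coreset(X, y, m=500, rng=rng)
theta_full = fit_logistic(X, y)
theta_core = fit_logistic(Xc, yc, wc)

# Both models are evaluated on the full input data.
acc_full = accuracy(theta_full, X, y)
acc_core = accuracy(theta_core, X, y)
print("accuracy, full data:", acc_full)
print("accuracy, summary  :", acc_core)
```

Uniform sampling carries no approximation guarantee. Coreset constructions typically replace the uniform draw with a data-dependent importance distribution, and computing that distribution is a natural place for the kind of bottleneck the abstract mentions.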

Original language: English
Title of host publication: DATA 2020 - Proceedings of the 9th International Conference on Data Science, Technology and Applications
Editors: Slimane Hammoudi, Christoph Quix, Jorge Bernardino
Pages: 78-89
Number of pages: 12
Volume: 1
ISBN (Electronic): 9789897584404
DOIs
Publication status: Published - Jul 2020
Event: 9th International Conference on Data Science, Technology and Applications (DATA 2020) - , France
Duration: 6 Jul 2020 to 9 Jul 2020
http://www.dataconference.org/

Publication series

Name: DATA 2020 - Proceedings of the 9th International Conference on Data Science, Technology and Applications

Conference

Conference: 9th International Conference on Data Science, Technology and Applications (DATA 2020)
Country/Territory: France
Period: 6/07/20 to 9/07/20

Bibliographical note

Winner of the Best Paper Award

Keywords

  • data summaries
  • logistic regression
  • coresets
  • large-data
  • computing time
