Automatic Identification of Information Quality Metrics in Health News Stories

Majed Al-Jefri, Roger Evans, Joon Lee, Pietro Ghezzi

    Research output: Contribution to journal › Article › peer-review


    Objective: Many online and printed media publish health news of questionable trustworthiness and it may be difficult for laypersons to determine the information quality of such articles. The purpose of this work was to propose a methodology for the automatic assessment of the quality of health-related news stories using natural language processing and machine learning.

    Materials and Methods: We used a database from a website that aims to improve the public dialogue about health care. The website developed a set of criteria to critically analyze claims about health care interventions. In this work, we attempt to automate the evaluation process by identifying the indicators of those criteria using natural language processing-based machine learning on a corpus of more than 1,300 news stories. We explored features ranging from simple n-grams to more advanced linguistic features and optimized the feature selection for each task. Additionally, we experimented with the pre-trained language model BERT.
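
    To make the "simple n-grams" end of that feature range concrete, the following is a minimal sketch of n-gram counting over one sentence. The whitespace tokenizer, the function names, and the example sentence are assumptions made for illustration; they are not the authors' pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count all 1..max_n-grams in a lowercased, whitespace-tokenized text.

    Whitespace tokenization is an assumption made to keep the sketch short.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical sentence, not taken from the corpus described above.
features = ngram_features("The drug costs about 100 dollars per month")
```

    Counts like these would typically be passed to a feature-selection step and a classifier, one per criterion.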

    Results: For some criteria, such as mention of costs, benefits, harms, and “disease-mongering,” the evaluation results were promising, with an F1 measure reaching 81.94%, while for others the results were less satisfactory due to the dataset size, the need for external knowledge, or the subjectivity in the evaluation process.
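
    For readers unfamiliar with the F1 measure reported above, it is the harmonic mean of precision and recall. The sketch below computes it for one binary criterion. The label lists are invented for illustration, and the abstract does not state which averaging scheme was used across classes, so the single-positive-class form here is an assumption.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: F1 is defined as 0 here to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = criterion satisfied, 0 = not satisfied.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)  # tp=2, fp=1, fn=1
```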

    Conclusion: The criteria used here are more challenging than those addressed by previous work, and our aim was to investigate how much more difficult the machine learning task was, and how and why it varied between criteria. For some criteria, the obtained results were promising; however, automated evaluation of the other criteria may not yet replace the manual evaluation process, in which human experts interpret text senses and make use of external knowledge in their assessment.
    Original language: English
    Article number: 515347
    Pages (from-to): 1-10
    Number of pages: 10
    Journal: Frontiers in Public Health
    Publication status: Published - 18 Dec 2020

    Bibliographical note

    Copyright © 2020 Al-Jefri, Evans, Lee and Ghezzi. This is an open-access article
    distributed under the terms of the Creative Commons Attribution License (CC BY).
    The use, distribution or reproduction in other forums is permitted, provided the
    original author(s) and the copyright owner(s) are credited and that the original
    publication in this journal is cited, in accordance with accepted academic practice.
    No use, distribution or reproduction is permitted which does not comply with these

