The Use of a Large Language Model for Cyberbullying Detection

Bayode Ogunleye; Babitha Dharmaraj

doi:10.3390/analytics2030038

School of Architecture, Technology and Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

The dominance of social media has given perpetrators additional channels for bullying. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today’s cyber world and is a severe threat to the mental and physical health of citizens. This creates a need for a robust system that keeps bullying content off online forums, blogs, and social media platforms and so limits its impact on society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performance is inconsistent because of high class imbalance and generalisation issues. In recent years, large language models (LLMs) such as BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Nevertheless, LLMs have not been applied extensively to CB detection. In this paper, we explore the use of these models for CB detection. We prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results on datasets D1 and D2 show that RoBERTa outperformed the other models.

Original language	English
Pages (from-to)	694-707
Number of pages	14
Journal	Analytics
Volume	2
Issue number	3
DOIs	https://doi.org/10.3390/analytics2030038
Publication status	Published - 6 Sept 2023

Keywords

BERT
cyberbullying
large language model
machine learning
natural language processing

online abuse
RoBERTa
social media analytics

Access to Document

10.3390/analytics2030038 (Licence: CC BY)

analytics-02-00038: Final published version, 1.69 MB (Licence: CC BY)

https://www.mdpi.com/2813-2203/2/3/38 (Licence: CC BY)

Cite this

@article{40a7863609124643abc7c93eb82d384a,

title = "The Use of a Large Language Model for Cyberbullying Detection",

abstract = "The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today{\textquoteright}s cyber world, and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for dataset D1 and D2 showed that RoBERTa outperformed other models.",

keywords = "BERT, cyberbullying, large language model, machine learning, natural language processing, online abuse, RoBERTa, social media analytics",

author = "Bayode Ogunleye and Babitha Dharmaraj",

year = "2023",

month = sep,

day = "6",

doi = "10.3390/analytics2030038",

language = "English",

volume = "2",

pages = "694--707",

journal = "Analytics",

issn = "2813-2203",

publisher = "MDPI",

number = "3",

}

TY - JOUR

T1 - The Use of a Large Language Model for Cyberbullying Detection

AU - Ogunleye, Bayode

AU - Dharmaraj, Babitha

PY - 2023/9/6

Y1 - 2023/9/6

N2 - The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today’s cyber world, and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for dataset D1 and D2 showed that RoBERTa outperformed other models.

AB - The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today’s cyber world, and is a severe threat to the mental and physical health of citizens. This opens the need to develop a robust system to prevent bullying content from online forums, blogs, and social media platforms to manage the impact in our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performances are not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, the LLMs have not been applied extensively for CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for dataset D1 and D2 showed that RoBERTa outperformed other models.

KW - BERT

KW - cyberbullying

KW - large language model

KW - machine learning

KW - natural language processing

KW - online abuse

KW - RoBERTa

KW - social media analytics

U2 - 10.3390/analytics2030038

DO - 10.3390/analytics2030038

M3 - Article

SN - 2813-2203

VL - 2

SP - 694

EP - 707

JO - Analytics

JF - Analytics

IS - 3

ER -
