Use of Large Language Model for Cyberbullying Detection

B Ogunleye, B Dharmaraj

Research output: Working paper › Preprint

Abstract

The dominance of social media has added new channels through which perpetrators can bully their victims. Cyberbullying (CB) is now a prevalent phenomenon in the online world and poses a severe threat to the mental and physical health of citizens. This creates a need for robust systems that detect and prevent bullying content on online forums, blogs, and social media platforms, and thereby limit its impact on society. Several machine learning (ML) algorithms have been proposed for this purpose; however, their performance is inconsistent, owing to severe class imbalance and poor generalisation. In recent years, large language models (LLMs) such as BERT and RoBERTa have achieved state-of-the-art (SOTA) results on several natural language processing (NLP) tasks, yet they have not been applied extensively to cyberbullying detection. In this paper, we explore the use of these models for CB detection. We prepared a new dataset (D2) by combining data from existing studies (Formspring and Twitter). Our experimental results on datasets D1 and D2 show that RoBERTa outperformed the other models.
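The approach described in the abstract, fine-tuning a pretrained LLM such as RoBERTa as a binary cyberbullying classifier, can be illustrated with a minimal sketch. This is an assumed HuggingFace-style setup; the model checkpoint, toy examples, column names, and hyperparameters are illustrative and not taken from the paper.

```python
# Minimal sketch: fine-tuning RoBERTa for binary cyberbullying detection.
# Assumptions: "roberta-base" checkpoint, toy data standing in for the
# Formspring/Twitter corpora, and default Trainer hyperparameters.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
from datasets import Dataset

# Toy examples; labels: 0 = not bullying, 1 = bullying.
train_data = Dataset.from_dict({
    "text": ["you are awesome", "nobody likes you, just leave"],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)

def tokenize(batch):
    # Tokenise and pad/truncate each post to a fixed length.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(output_dir="cb-roberta",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args, train_dataset=train_data)
trainer.train()
```

In practice, class imbalance of the kind noted in the abstract is typically addressed with class-weighted loss or resampling before fine-tuning.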
Original language: English
Number of pages: 15
Publication status: Published - 15 Jun 2023

Keywords

  • BERT
  • Cyberbullying
  • RoBERTa
  • Language model
  • Machine learning
  • Online abuse
  • Natural language processing
  • NLP
