Implementation of machine learning and data mining to improve cybersecurity and limit vulnerabilities to cyber attacks

Mohamed Alloghani; Dhiya Al-Jumeily; Abir Hussain; Jamila Mustafina; Thar Baker; Ahmed J. Aljaaf

doi:10.1007/978-3-030-28553-1_3

School of Arch, Tech and Eng

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN › Chapter › peer-review

Abstract

Of the many challenges that continue to make cyber-attack detection elusive, lack of training data remains the biggest. Even though organizations and businesses turn to well-known network monitoring tools such as Wireshark, millions of people remain vulnerable because they lack information about the website behaviors and features that can amount to an attack. In fact, most attacks occur not because threat actors resort to complex coding and evasion techniques but because victims lack the basic tools to detect and avoid them. Despite these challenges, machine learning is revolutionizing the understanding of the nature of cyber-attacks. This study applied machine learning techniques to Phishing Website data with the objective of comparing five algorithms and providing insight that the general public can use to avoid phishing pitfalls. The findings suggest that Neural Network is the best-performing algorithm, and the model identifies the following as the leading features of phishing websites: inclusion of an IP address in the domain name, a longer URL, use of URL shortening services, inclusion of the “@” symbol in the URL, inclusion of the “−” symbol in the URL, use of non-trusted SSL certificates with an expiry duration of less than 6 months, domains registered for less than one year, and a favicon redirecting from other URLs. The Neural Network is based on a multi-layer perceptron and is the basis of intelligence, so that in future phishing detection will be automated and rendered an artificial intelligence task.
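The URL-level warning signs named in the abstract can be illustrated with a short Python sketch. This is not the study's model or feature-extraction code: the function names, the length threshold, and the list of shortening hosts are hypothetical placeholders chosen for illustration, the "−" symbol is read as an ASCII hyphen and checked in the host name only, and the certificate, domain-age, and favicon features are left out because they need network lookups. The sketch also assumes each URL includes its scheme (for example `https://`).

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical values: neither the threshold nor the host list comes from the study.
LONG_URL_THRESHOLD = 54
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}


def host_is_ip(host: str) -> bool:
    """Return True when the host part is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_warning_signs(url: str) -> dict:
    """Flag the URL-level warning signs listed in the abstract.

    Assumes the URL carries a scheme; without one, urlsplit finds no host
    and the host-based checks all return False.
    """
    host = urlsplit(url).hostname or ""
    return {
        "ip_address_as_host": host_is_ip(host),
        "long_url": len(url) >= LONG_URL_THRESHOLD,
        "shortening_service": host in SHORTENER_HOSTS,
        "at_symbol": "@" in url,
        # Assumption: the hyphen check is narrowed to the host name.
        "hyphen_in_host": "-" in host,
    }


if __name__ == "__main__":
    for sample in ("http://192.168.0.1/login", "https://bit.ly/abc"):
        print(sample, url_warning_signs(sample))
```

Each flag is only a heuristic signal; in the study these kinds of features feed a trained classifier rather than acting as stand-alone rules.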

Original language	English
Title of host publication	Studies in Computational Intelligence
Publisher	Springer-Verlag
Pages	47-76
Number of pages	30
ISBN (Electronic)	9783030285531
ISBN (Print)	9783030285524
DOIs	https://doi.org/10.1007/978-3-030-28553-1_3
Publication status	Published - 4 Sept 2019

Publication series

Name	Studies in Computational Intelligence
Volume	855
ISSN (Print)	1860-949X
ISSN (Electronic)	1860-9503

Bibliographical note

Funding Information:
The challenges of accessing reliable cybersecurity datasets are well documented and common among researchers. As such, we are grateful to Rami Mustafa and Lee McCluskey of the University of Huddersfield and Fadi Thabtah of the Canadian University of Dubai for preparing and sharing the data.

Publisher Copyright:
© Springer Nature Switzerland AG 2020.

Keywords

Cybersecurity
Data mining
Machine learning
Phishing websites

Cite this

Alloghani, M., Al-Jumeily, D., Hussain, A., Mustafina, J., Baker, T., & Aljaaf, A. J. (2019). Implementation of machine learning and data mining to improve cybersecurity and limit vulnerabilities to cyber attacks. In Studies in Computational Intelligence (pp. 47-76). (Studies in Computational Intelligence; Vol. 855). Springer-Verlag. https://doi.org/10.1007/978-3-030-28553-1_3

@inbook{1fd21d3e08f1448f85a94e61efd46753,

title = "Implementation of machine learning and data mining to improve cybersecurity and limit vulnerabilities to cyber attacks",

abstract = "Of the many challenges that continue to make cyber-attack detection elusive, lack of training data remains the biggest. Even though organizations and businesses turn to well-known network monitoring tools such as Wireshark, millions of people remain vulnerable because they lack information about the website behaviors and features that can amount to an attack. In fact, most attacks occur not because threat actors resort to complex coding and evasion techniques but because victims lack the basic tools to detect and avoid them. Despite these challenges, machine learning is revolutionizing the understanding of the nature of cyber-attacks. This study applied machine learning techniques to Phishing Website data with the objective of comparing five algorithms and providing insight that the general public can use to avoid phishing pitfalls. The findings suggest that Neural Network is the best-performing algorithm, and the model identifies the following as the leading features of phishing websites: inclusion of an IP address in the domain name, a longer URL, use of URL shortening services, inclusion of the “@” symbol in the URL, inclusion of the “−” symbol in the URL, use of non-trusted SSL certificates with an expiry duration of less than 6 months, domains registered for less than one year, and a favicon redirecting from other URLs. The Neural Network is based on a multi-layer perceptron and is the basis of intelligence, so that in future phishing detection will be automated and rendered an artificial intelligence task.",

keywords = "Cybersecurity, Data mining, Machine learning, Phishing websites",

author = "Mohamed Alloghani and Dhiya Al-Jumeily and Abir Hussain and Jamila Mustafina and Thar Baker and Aljaaf, {Ahmed J.}",

note = "Funding Information: The challenges of accessing reliable cybersecurity datasets are well documented and common among researchers. As such, we are grateful to Rami Mustafa and Lee McCluskey of the University of Huddersfield and Fadi Thabtah of the Canadian University of Dubai for preparing and sharing the data. Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2020.",

year = "2019",

month = sep,

day = "4",

doi = "10.1007/978-3-030-28553-1_3",

language = "English",

isbn = "9783030285524",

series = "Studies in Computational Intelligence",

publisher = "Springer-Verlag",

pages = "47--76",

booktitle = "Studies in Computational Intelligence",

}

Implementation of machine learning and data mining to improve cybersecurity and limit vulnerabilities to cyber attacks. / Alloghani, Mohamed; Al-Jumeily, Dhiya; Hussain, Abir et al.
Studies in Computational Intelligence. Springer-Verlag, 2019. p. 47-76 (Studies in Computational Intelligence; Vol. 855).

TY - CHAP

T1 - Implementation of machine learning and data mining to improve cybersecurity and limit vulnerabilities to cyber attacks

AU - Alloghani, Mohamed

AU - Al-Jumeily, Dhiya

AU - Hussain, Abir

AU - Mustafina, Jamila

AU - Baker, Thar

AU - Aljaaf, Ahmed J.

N1 - Funding Information: The challenges of accessing reliable cybersecurity datasets are well documented and common among researchers. As such, we are grateful to Rami Mustafa and Lee McCluskey of the University of Huddersfield and Fadi Thabtah of the Canadian University of Dubai for preparing and sharing the data. Publisher Copyright: © Springer Nature Switzerland AG 2020.

PY - 2019/9/4

Y1 - 2019/9/4

N2 - Of the many challenges that continue to make cyber-attack detection elusive, lack of training data remains the biggest. Even though organizations and businesses turn to well-known network monitoring tools such as Wireshark, millions of people remain vulnerable because they lack information about the website behaviors and features that can amount to an attack. In fact, most attacks occur not because threat actors resort to complex coding and evasion techniques but because victims lack the basic tools to detect and avoid them. Despite these challenges, machine learning is revolutionizing the understanding of the nature of cyber-attacks. This study applied machine learning techniques to Phishing Website data with the objective of comparing five algorithms and providing insight that the general public can use to avoid phishing pitfalls. The findings suggest that Neural Network is the best-performing algorithm, and the model identifies the following as the leading features of phishing websites: inclusion of an IP address in the domain name, a longer URL, use of URL shortening services, inclusion of the “@” symbol in the URL, inclusion of the “−” symbol in the URL, use of non-trusted SSL certificates with an expiry duration of less than 6 months, domains registered for less than one year, and a favicon redirecting from other URLs. The Neural Network is based on a multi-layer perceptron and is the basis of intelligence, so that in future phishing detection will be automated and rendered an artificial intelligence task.

KW - Cybersecurity

KW - Data mining

KW - Machine learning

KW - Phishing websites

UR - http://www.scopus.com/inward/record.url?scp=85072073611&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-28553-1_3

DO - 10.1007/978-3-030-28553-1_3

M3 - Chapter

AN - SCOPUS:85072073611

SN - 9783030285524

T3 - Studies in Computational Intelligence

SP - 47

EP - 76

BT - Studies in Computational Intelligence

PB - Springer-Verlag

ER -
