Implementation of machine learning and data mining to improve cybersecurity and limit vulnerabilities to cyber attacks

Mohamed Alloghani, Dhiya Al-Jumeily, Abir Hussain, Jamila Mustafina, Thar Baker, Ahmed J. Aljaaf

Research output: Chapter in Book/Conference proceeding with ISSN or ISBN; Chapter; peer-review

Abstract

Of the many challenges that continue to make cyber-attack detection elusive, the lack of training data remains the biggest. Even though organizations and businesses turn to well-known network monitoring tools such as Wireshark, millions of people remain vulnerable because they lack information about the website behaviors and features that can amount to an attack. In fact, most attacks succeed not because threat actors resort to complex coding and evasion techniques but because victims lack the basic tools to detect and avoid them. Despite these challenges, machine learning is changing the understanding of the nature of cyber-attacks. This study applied machine learning techniques to phishing website data, with the objective of comparing five algorithms and providing insight that the general public can use to avoid phishing pitfalls. The findings suggest that the neural network is the best-performing algorithm, and the model identifies the following as the leading features of phishing websites: inclusion of an IP address in the domain name, a longer URL, use of URL shortening services, inclusion of the “@” symbol in the URL, inclusion of the “−” symbol in the URL, use of non-trusted SSL certificates with an expiry duration of less than 6 months, domains registered for less than one year, and a favicon redirecting from other URLs. The neural network is based on a multi-layer perceptron, which forms a basis for machine intelligence, so that in future phishing detection can be automated and treated as an artificial intelligence task.
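
Several of the URL-based features named in the abstract can be checked directly from the URL string. The following is a minimal Python sketch of such checks, not the chapter's actual feature-extraction code. The function names, the shortener list, and the length threshold are hypothetical and chosen only for illustration, and the “−” symbol is assumed to mean an ordinary hyphen.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical, illustrative list of URL shortening hosts (not from the chapter).
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}

# Hypothetical cut-off for a "longer URL"; the abstract gives no number.
LONG_URL_THRESHOLD = 75


def host_is_ip(host: str) -> bool:
    """Return True if the host part is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Compute a few boolean URL features of the kind listed in the abstract."""
    host = urlparse(url).hostname or ""
    return {
        "ip_in_domain": host_is_ip(host),
        "long_url": len(url) >= LONG_URL_THRESHOLD,
        "shortening_service": host in SHORTENER_HOSTS,
        "at_symbol": "@" in url,
        "hyphen_in_url": "-" in url,
    }


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/login"))
```

Features such as SSL certificate trust, domain registration age, and favicon origin need network or registry lookups and are left out of this sketch. The resulting feature dictionaries could then be fed to any of the compared classifiers.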

Original language: English
Title of host publication: Studies in Computational Intelligence
Publisher: Springer-Verlag
Pages: 47-76
Number of pages: 30
ISBN (Electronic): 9783030285531
ISBN (Print): 9783030285524
Publication status: Published - 4 Sep 2019

Publication series

Name: Studies in Computational Intelligence
Volume: 855
ISSN (Print): 1860-949X
ISSN (Electronic): 1860-9503

Bibliographical note

Funding Information:
The challenges of accessing reliable cybersecurity datasets are well documented and common among researchers. As such, we are grateful to Rami Mustafa and Lee McCluskey of the University of Huddersfield and Fadi Thabtah of the Canadian University of Dubai for preparing and sharing the data.

Publisher Copyright:
© Springer Nature Switzerland AG 2020.

Keywords

  • Cybersecurity
  • Data mining
  • Machine learning
  • Phishing websites
