Detecting Cyber Security Related Twitter Accounts and Different Sub-Groups: A Multi-Classifier Approach

Citation information:
Title: “Detecting Cyber Security Related Twitter Accounts and Different Sub-Groups: A Multi-Classifier Approach”.
Authors: Mohamad Imad Mahaini and Shujun Li.
Conference: The 2nd International Symposium on Foundations of Open Source Intelligence and Security Informatics (FOSINT-SI 2021), co-located with the 13th IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM 2021).
Publisher: ACM, ISBN 978-1-4503-9128-3 (doi:10.1145/3487351.3492716).
Paper's official web page at ACM and Kent KAR.
Download BibTeX entry.

Abstract: Many cyber security experts, organizations and cyber criminals are active users on online social networks (OSNs). Therefore, detecting cyber security related accounts on OSNs and monitoring their activities can be very useful for different purposes such as cyber threat intelligence, detecting and preventing cyber attacks and online harms on OSNs, and evaluating effectiveness of cyber security awareness activities on OSNs.

In this paper, we report our work on developing a number of machine learning based classifiers for detecting cyber security related accounts on Twitter, including a base-line classifier for detecting cyber security related accounts in general, and three sub-classifiers for detecting three subsets of cyber security related accounts (individuals, hackers, and academia). To train and test the classifiers, we followed a more systemic approach (based on a cyber security taxonomy, real-time sampling of tweets, and crowdsourcing) to construct a dataset of cyber security related accounts with multiple tags assigned to each account. For each classifier, we considered a richer set of features than those used in past studies. Among five machine learning models tested, the Random Forest model achieved the best performance: 93% for the base-line classifier, 88-91% for the three sub-classifiers. We also studied feature reduction of the base-line classifier and showed that using just six features we can already achieve the same performance.
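The workflow summarised above (train a Random Forest, then check whether a six-feature subset performs comparably) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic data in place of the labelled Twitter dataset, scores with F1 (the abstract does not name the metric), and selects features by impurity-based importance, which is only one possible reduction method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an account-level feature matrix (NOT the paper's data).
# Label 1 plays the role of "cyber security related", label 0 of "not related".
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def fit_and_score(train_X, test_X):
    """Train a Random Forest and return it with its F1 score on the test split."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(train_X, y_train)
    return model, f1_score(y_test, model.predict(test_X))


# Baseline classifier trained on all features.
full_model, full_f1 = fit_and_score(X_train, X_test)

# Assumed reduction step: keep the six features with the highest
# impurity-based importance, then retrain on that subset only.
top6 = np.argsort(full_model.feature_importances_)[::-1][:6]
reduced_model, reduced_f1 = fit_and_score(X_train[:, top6], X_test[:, top6])

print(f"F1 with all features: {full_f1:.3f}")
print(f"F1 with top 6 features: {reduced_f1:.3f}")
```

The same pattern could be repeated per sub-group (individuals, hackers, academia) by swapping in the corresponding binary labels. The scores printed here reflect only the synthetic data and say nothing about the results reported in the paper.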

Figure: The proposed methodology for developing classifiers for detecting cyber security related OSN accounts and different sub-groups.

Anonymised Feature-set Files

Due to privacy concerns, we cannot share the original labelled dataset of Twitter accounts. Instead, we can provide the extracted, anonymised feature-set files upon request. Please contact the authors.