Abstract: Many cyber security experts, organizations and cyber criminals are active users of online social networks (OSNs). Therefore, detecting cyber security-related accounts on OSNs and monitoring their activities can be useful for several purposes, such as cyber threat intelligence, detecting and preventing cyber attacks and online harms on OSNs, and evaluating the effectiveness of cyber security awareness activities on OSNs.
In this paper, we report our work on developing a number of machine learning-based classifiers for detecting cyber security-related accounts on Twitter: a baseline classifier for detecting cyber security-related accounts in general, and three sub-classifiers for detecting three subsets of such accounts (individuals, hackers, and academia). To train and test the classifiers, we followed a more systematic approach (based on a cyber security taxonomy, real-time sampling of tweets, and crowdsourcing) to construct a dataset of cyber security-related accounts, with multiple tags assigned to each account. For each classifier, we considered a richer set of features than those used in past studies. Among the five machine learning models tested, the Random Forest model achieved the best performance: 93% for the baseline classifier and 88-91% for the three sub-classifiers. We also studied feature reduction for the baseline classifier and showed that just six features are enough to achieve the same performance.
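The general idea of training a Random Forest and then reducing it to a few top-ranked features can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes binary labels, Gini-impurity splits, bootstrap sampling with a random subset of features per split, and feature ranking by sample-weighted impurity decrease. The abstract does not say how the six features were chosen, so importance-based ranking is an assumption. The data generator `make_data`, the helpers `select_columns` and `top_k_features`, and all parameter values are hypothetical and use synthetic numbers, not the paper's dataset.

```python
import random
from collections import Counter


def gini(counts, n):
    """Gini impurity of a node, given a Counter of class labels over n samples."""
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def best_split(X, y, feature_ids):
    """Return (gain, feature, threshold) of the best Gini split, or None."""
    n = len(y)
    parent = gini(Counter(y), n)
    best = None
    for f in feature_ids:
        order = sorted(range(n), key=lambda i: X[i][f])
        left, right = Counter(), Counter(y)
        for pos in range(n - 1):
            i = order[pos]
            left[y[i]] += 1  # move sample i from the right side to the left side
            right[y[i]] -= 1
            lo, hi = X[i][f], X[order[pos + 1]][f]
            if lo == hi:
                continue  # no threshold separates equal values
            n_left = pos + 1
            child = (n_left * gini(left, n_left) + (n - n_left) * gini(right, n - n_left)) / n
            if best is None or parent - child > best[0]:
                best = (parent - child, f, (lo + hi) / 2)
    return best


def build_tree(X, y, n_sub, depth, max_depth, rng, importances):
    """Grow one tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    counts = Counter(y)
    majority = counts.most_common(1)[0][0]
    if depth >= max_depth or len(counts) == 1:
        return majority
    feats = rng.sample(range(len(X[0])), n_sub)  # random feature subset for this split
    split = best_split(X, y, feats)
    if split is None or split[0] <= 0.0:
        return majority
    gain, f, t = split
    importances[f] += gain * len(y)  # sample-weighted impurity decrease
    sides = ([i for i in range(len(y)) if X[i][f] <= t],
             [i for i in range(len(y)) if X[i][f] > t])
    left, right = (build_tree([X[i] for i in s], [y[i] for i in s], n_sub,
                              depth + 1, max_depth, rng, importances) for s in sides)
    return (f, t, left, right)


def train_forest(X, y, n_trees=15, max_depth=5, seed=0):
    """Train a forest on bootstrap samples; return (trees, normalized importances)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    n_sub = max(1, int(n_features ** 0.5))  # features tried per split
    importances = [0.0] * n_features
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        trees.append(build_tree([X[i] for i in boot], [y[i] for i in boot],
                                n_sub, 0, max_depth, rng, importances))
    total = sum(importances) or 1.0
    return trees, [v / total for v in importances]


def predict(trees, row):
    """Majority vote over the trees."""
    votes = Counter()
    for node in trees:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes[node] += 1
    return votes.most_common(1)[0][0]


def accuracy(trees, X, y):
    return sum(predict(trees, r) == label for r, label in zip(X, y)) / len(y)


def top_k_features(importances, k):
    """Hypothetical helper: indices of the k most important features."""
    return sorted(range(len(importances)), key=lambda f: importances[f], reverse=True)[:k]


def select_columns(X, keep):
    """Hypothetical helper: project every row onto the kept feature indices."""
    return [[row[f] for f in keep] for row in X]


def make_data(n, rng):
    """Hypothetical synthetic data: 6 numeric features, only features 0 and 1 matter."""
    X = [[rng.random() for _ in range(6)] for _ in range(n)]
    y = [1 if r[0] > 0.5 and r[1] > 0.3 else 0 for r in X]
    return X, y


if __name__ == "__main__":
    data_rng = random.Random(42)
    X_train, y_train = make_data(300, data_rng)
    X_test, y_test = make_data(300, data_rng)
    trees, imp = train_forest(X_train, y_train)
    keep = top_k_features(imp, 2)
    print("top features:", keep, "full accuracy:", accuracy(trees, X_test, y_test))
    # Feature reduction: retrain using only the selected columns.
    small, _ = train_forest(select_columns(X_train, keep), y_train)
    print("reduced accuracy:", accuracy(small, select_columns(X_test, keep), y_test))
```

On this toy data, the importance ranking recovers the two informative features, and the reduced model matches the full one. That is the kind of outcome the feature-reduction study reports, although the paper's actual features and numbers are not reproduced here. In practice, a library model such as scikit-learn's `RandomForestClassifier` would replace the hand-rolled forest. Its `feature_importances_` attribute reports the same kind of impurity-based importance.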
Due to privacy issues, we cannot share the original labelled dataset of Twitter accounts. Instead, we can provide the extracted, anonymised feature-set files upon request; please contact the authors.