Abstract: Taxonomies and ontologies are handy tools in many application domains such as knowledge systematization and automatic reasoning. In the cyber security field, many researchers have proposed such taxonomies and ontologies, most of which were built based on manual work. Some researchers proposed the use of computing tools to automate the building process, but mainly on very narrow sub-areas of cyber security. Thus, there is a lack of general cyber security taxonomies and ontologies, possibly due to the difficulties of manually curating keywords and concepts for such a diverse, inter-disciplinary and dynamically evolving field.
This paper presents a new human-machine teaming based process to build taxonomies, which allows human experts to work with automated natural language processing (NLP) and information retrieval (IR) tools to co-develop a taxonomy from a set of relevant textual documents. The proposed process could be generalized to support non-textual documents and to build (more complicated) ontologies as well. Using cyber security as an example, we demonstrate how the proposed taxonomy building process has allowed us to build a general cyber security taxonomy covering a wide range of data-driven keywords (topics) with a reasonable amount of human effort.
We have done a lot of work on the taxonomy since we published the paper in August 2019, and the current taxonomy contains more than 1900 nodes.
For an interactive visualization of the taxonomy, please click here.
We used the following libraries to create our visualizations: JavaScript InfoVis Toolkit and D3.
The taxonomy files are available in different formats here.
Please note that we do not claim that this taxonomy is complete or comprehensive. We will keep improving it and updating it periodically, so please revisit this page from time to time.
We welcome comments and suggestions from the community and anyone interested. You can contact us by email (see our personal pages above).