Semantic control for the Cybersecurity domain: investigation on the representativeness of a domain-specific terminology referring to lexical variation

dc.contributor.author: Lanza, Claudia
dc.contributor.author: Guarasci, Roberto
dc.contributor.author: Crupi, Felice
dc.date.accessioned: 2025-11-21T08:39:16Z
dc.date.issued: 2021-05-12
dc.description: Università della Calabria. Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica. PhD programme in Information and Communication Technologies. Cycle XXXIII
dc.description.abstract: The underlying idea of this PhD research project is to develop a model that guarantees the terminological coverage of a semantic resource, such as a thesaurus, and its representativeness threshold with respect to semantic variation over time within a highly specialized domain, such as Cybersecurity. By building an Italian thesaurus for the Cybersecurity domain, the project aims to offer organizations a knowledge representation of Information and Communications Technology (ICT) security that is as complete as possible. The development of the Italian thesaurus is part of the activities of the main project "Cybersecurity Observatory", run by the Institute of Informatics and Telematics (IIT) of the National Research Council (CNR) in Pisa, Italy. The thesis describes the steps followed to construct the Italian Cybersecurity thesaurus and to assess a multi-domain methodology for fixing a semantic representativeness threshold, with reference to the qualitative richness of terms within a specialized domain and to the variation over time of the information related to that domain.
The main phases described are: (1) a presentation of the principal reasons for building a semantic tool, such as a thesaurus, as a means of semantic control for a specific domain; (2) a description of the steps involved in creating the corpus and extracting terminology through specific Natural Language Processing (NLP) tasks and the configuration of linguistic patterns within the software employed; (3) the way a bilingual thesaurus and a bilingual ontology were built by creating parallel and comparable corpora; (4) a model, implemented through Python scripts, for mapping existing English-language Cybersecurity standards to all the head terms contained in the Italian source corpus, in order to evaluate which candidate terms should be included in the thesaurus; (5) a description of the work done to migrate the terms and their relationships from the Italian Cybersecurity thesaurus to an ontology system; (6) the extraction of keyphrases from the source documents with document-oriented algorithms, i.e., Multipartite Rank or TopicRank, carried out to obtain a targeted clustering of the domain and as an aid to the semantic abstraction needed to better systematize the thesaurus's main entry categories; (7) the exploration of new methodologies, i.e., distributional semantics, term variation, pattern-based detection schemes, or inference from Web Ontology Language (OWL) properties, to derive the technical information contained in the source corpus, with the goal of automatically generating the semantic network of connections between the representative terms of the Cybersecurity domain in a thesaurus system; (8) a future perspective, accompanied by evolving practical examples, on creating an additional database that populates the Cybersecurity source corpus with content from social media.
Twitter is one of the preferred web sources from which to retrieve information about the domain: this new information flow should give the semantic resources set up for Cybersecurity knowledge organization a higher level of terminological density, which can then be analyzed to improve their semantic coverage.
dc.identifier.uri: http://hdl.handle.net/10955/5676
dc.language.iso: en
dc.publisher: Università della Calabria
dc.subject: Cybersecurity
dc.subject: Thesauri
dc.subject: Ontologies
dc.subject: Distributional semantics
dc.subject: Representativeness
dc.title: Semantic control for the Cybersecurity domain: investigation on the representativeness of a domain-specific terminology referring to lexical variation
dc.type: Thesis

Files

Original bundle

Name:
Tesi dottorato Lanza Claudia (2).pdf
Size:
2.73 MB
Format:
Adobe Portable Document Format
