Constraint specifications for domain-specific systems: ontology-driven approach
Shreya Banerjee; Anirban Sarkar
International Journal of Metadata, Semantics and Ontologies, Vol. 13, No. 3 (2019) pp. 227 - 253
Well-formed Domain Specific Modelling Languages (DSMLs) are devised based on well-defined sets of syntaxes and semantics. This precision is obtained by raising the abstraction level of domain specifications. Yet appropriate representations of constraints are also important to limit the meanings of generic concepts at different abstraction levels in a DSML. To address this issue, this paper develops a constraint specification language. The proposed language is capable of restricting the meanings of general concepts represented in an upper-level ontology, Generalised Ontology Modelling (GOM) (Banerjee and Sarkar, 2016b), to domain-specific systems in a systematic way. Further, several automated methods impose distinct constraints at different levels of abstraction in domain-specific modelling. These methods also validate distinct constructs and constraints in domain-specific systems against the general concepts of GOM. The applicability of the proposed work is demonstrated using a couple of case studies based on applications in data-intensive and service-based domains.
A data exchange solution for emergency response systems based on the EDXL-RESCUER ontology
Laís Do Nascimento Salvador; Rebeca Barros; Vaninha Vieira Dos Santos; Félix Simas De Souza Neto; Renato Lima Novais; Simone Da Silva Amorim; Marian Weber
International Journal of Metadata, Semantics and Ontologies, Vol. 13, No. 3 (2019) pp. 264 - 283
Handling an emergency requires the coordination and cooperation of several people and systems from various agencies and organisations, including the government and society in general. A wide range of heterogeneous data is managed by different stakeholders, thus demanding solutions to support integration issues such as interoperability and ambiguity. In a previous work, we proposed an ontology for Emergency Response Systems, called EDXL-RESCUER, in the scope of the RESCUER project. In this paper, we present the usage of this ontology in a data exchange solution (DILS: Data Integration with Legacy Systems), which aims to provide semantic interoperability between Emergency Response Systems, and the evolution of EDXL-RESCUER. To evaluate the proposed solution, EDXL-RESCUER & DILS, we performed two studies: (i) simulations using two real data sources, the Police Reports from Bahia Public Safety and Security Department, Brazil; and the Canadian Disaster Database; (ii) an emergency simulation at an Industrial Park in Bahia, Brazil. The results demonstrate the potential use of the EDXL-RESCUER as a common vocabulary to support semantic interoperability between emergency response systems. We also verified the feasibility of using the DILS solution in emergency management scenarios. Besides EDXL-RESCUER & DILS, a mapping and integration of concepts related to data exchange are presented.
Mining annotators' common knowledge for automatic text revision
Giovanni Siragusa; Luigi Di Caro; Marco Tosalli
International Journal of Metadata, Semantics and Ontologies, Vol. 13, No. 3 (2019) pp. 254 - 263
Many natural language understanding tasks require clean input textual data in order to train systems with the highest precision. Such data, usually collected from surveys or the web, are manually processed in order to remove morphosyntactic variability, spelling errors and incoherence in naming entities. Since these operations are conducted by domain experts and annotators, they are usually costly and time-consuming. Furthermore, this scenario is very common in industrial tasks where annotators are hired. In this context, we propose an innovative and simple method that extracts correction patterns, i.e., <expression, replacement> pairs, where expression is a matching string and replacement indicates how to rewrite the matched string. Such a tool can be used both to evaluate annotators (since it provides a deep understanding of their work) and to automatically revise the texts. We extensively tested our method in a multilingual setting, obtaining outstanding results over baseline approaches.
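To illustrate the idea of <expression, replacement> pairs described above, the following is a minimal sketch of how such pairs could be applied to revise a text. It is not the authors' extraction method: the patterns in PATTERNS and the function name revise are hypothetical, and the sketch assumes that each expression can be treated as a regular expression.

```python
import re

# Hypothetical correction patterns, written by hand for illustration.
# Each entry is an (expression, replacement) pair; the abstract's method
# would mine such pairs from annotators' edits instead.
PATTERNS = [
    (r"\bNYC\b", "New York City"),   # normalise an entity name
    (r"\bcolour\b", "color"),        # remove a spelling variant
    (r"\s{2,}", " "),                # collapse repeated whitespace
]

def revise(text, patterns=PATTERNS):
    """Apply every (expression, replacement) pair to the text, in order."""
    for expression, replacement in patterns:
        text = re.sub(expression, replacement, text)
    return text
```

Calling revise("I love  NYC and its colour") rewrites the entity name, the spelling variant and the doubled space in a single pass over the pattern list.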
Ontology-based modelling of extended web service secure conversation pattern
Ashish Kumar Dwivedi
International Journal of Metadata, Semantics and Ontologies, Vol. 13, No. 4 (2019) pp. 285 - 299
Securing an application based on Service Oriented Architecture provides defences against a number of security threats arising from exposing applications and data to the internet. Various security guidelines are available to apply security in web applications, but these guidelines are sometimes difficult to understand and generate inconsistencies. In this study, an extended web service secure conversation pattern is presented in the presence of a man-in-the-middle attack. An ontology-based modelling and refinement framework is presented for semantically analysing an extended web service secure conversation pattern. A metamodel is introduced to provide rigorous modelling of security services in terms of concepts, properties, and relationships. Finally, the proposed approach is evaluated through experiments that check security requirements against security policies in the presence of the proposed description logic rules.
A data model-independent approach to big research data integration
Valentina Bartalesi; Carlo Meghini; Costantino Thanos
International Journal of Metadata, Semantics and Ontologies, Vol. 13, No. 4 (2019) pp. 330 - 345
The paper addresses the data integration problem in the context of the scientific domain. The main characteristics of the big research data that make the traditional approach of data integration unfeasible are presented. Two new emerging practices, i.e. an exploratory approach to data seeking and an empiricist epistemological approach to knowledge creation, are discussed. Based on these considerations, we present a new paradigm of data integration and an application ontology that supports it. The ontology is based on five types of events and every event is extensionally modelled as an input/output operation on the involved data entity. The strong point of the ontology and of the whole approach to data integration is that no assumption is made on the data models in which the databases or the views are expressed. This provides a level of generality that successfully deals with the heterogeneity of the domain.
An analysis of the semantic annotation task on the linked data cloud
Michel Gagnon; Amal Zouaq; Francisco Aranha; Faezeh Ensan; Ludovic Jean-Louis
International Journal of Metadata, Semantics and Ontologies, Vol. 13, No. 4 (2019) pp. 317 - 329
Semantic annotation, the process of identifying key phrases in texts and linking them to concepts in a knowledge base, is an important basis for semantic information retrieval and the semantic web uptake. Despite the emergence of semantic annotation systems, very few comparative studies have been published on their performance. In this paper, we provide an evaluation of the performance of existing systems over three tasks: full semantic annotation, named entity recognition, and keyword detection. More specifically, the spotting capability (recognition of relevant surface forms in text) is evaluated for all three tasks, whereas the disambiguation (correctly associating an entity from Wikipedia or DBpedia to the spotted surface forms) is evaluated only for the first two tasks. We use logistic regression to identify significant performance differences. Although some of the annotators are specifically targeted at some task (NE, SA, KW), our results show that they do not necessarily obtain the best performance on those tasks. In fact, systems identified as full semantic annotators beat all other systems on all data sets. We also show that there is still much room for improvement for the identification of the most relevant entities described in a text.
Collections revisited from the perspective of historical testimonies
Annamaria Goy; Diego Magro
International Journal of Metadata, Semantics and Ontologies, Vol. 13, No. 4 (2019) pp. 300 - 316
This paper presents the results of an ontological analysis of collective entities, as an essential step towards the definition of a rich semantic model underlying ontology-driven applications in the historical and cultural heritage domains. The major contributions of our proposal are the following: (a) An explicit distinction of contingent and necessary features, that led us to formalise our ontology by means of modal logics. (b) A description of collective entities from a diachronic perspective (thus including singletons and empty collections). (c) An analysis of the inferences enabled by the characterisation of collective entities and by the inclusion relationships. (d) A representation of the inclusion relationships that does not imply existential dependence. (e) A distinction between emerging and created collective entities. In this paper, we present an in-depth ontological analysis of these aspects and provide a sound formalisation for it.
Semantic architectures and dashboard creation processes within the data and analytics framework
Michele Petito; Francesca Fallucchi; Ernesto William De Luca
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 1 (2020) pp. 1 - 15
The open data tools currently on the market do not exploit the semantic web, or provide tools for data analysis and visualisation. Most of them are simple open data portals that display a data catalogue, often not even fulfilling the lowest level of the famous five-star model. The Data and Analytics Framework (DAF), a project run by the Italian government, is enabled to extract knowledge from the immense amount of data owned by the State. It favours the spread of Linked Open Data thanks to the integration of the network of controlled ontologies and vocabularies (OntoPiA). The research outlined in this paper illustrates some of the platform's competitive solutions and introduces the five-step process to create a DAF dashboard, as well as the related data story. The case study created by the authors concerns tourism in Sardinia and represents one of the few demonstrations of a real case being tested in DAF.
EngMeta: metadata for computational engineering
Björn Schembera; Dorothea Iglezakis
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 1 (2020) pp. 26 - 38
The huge amounts of data produced in computational engineering make the handling and documentation of the resulting data a challenge. EngMeta is a metadata model based on existing standards and developed to enable a structured documentation of the research process and the simulation environment, together with discipline-specific information about the simulated system. A qualitative analysis shows that EngMeta fulfils the criteria of a good metadata model. According to a quantitative survey, EngMeta meets the needs of engineering scientists. The metadata model is defined as an XSD schema and is in practical use in an institutional data repository. Supported by automated metadata extraction and a repository, EngMeta enables specific research data management in computational engineering.
HSLD: a hybrid similarity measure for linked data resources
Gabriela Oliveira Mota Da Silva; Paulo Roberto De Souza; Frederico Araújo Durão
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 1 (2020) pp. 16 - 25
The web of data is a set of deeply linked resources that can be instantly read and understood by both humans and machines. A vast amount of RDF data has been published in freely accessible and interconnected data sets creating the so-called Linked Open Data cloud. Such a huge amount of data available along with the development of semantic web standards has opened up opportunities for the development of semantic applications. However, most of the semantic recommender systems use only the link structure between resources to calculate the similarity between resources. In this paper we propose HSLD, a hybrid similarity measure for Linked Data that exploits information present in RDF literals besides the links between resources. We evaluate the proposed approach in the context of a LOD-based Recommender System using data from DBpedia. Experiment results indicate that HSLD increases the precision of the recommendations in comparison to pure link-based baseline methods.
Document-based RDF storage method for parallel evaluation of basic graph pattern queries
Eleftherios Kalogeros; Manolis Gergatsoulis; Matthew Damigos
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 1 (2020) pp. 63 - 80
In this paper, we investigate the problem of efficiently evaluating Basic Graph Pattern (BGP) SPARQL queries over a large amount of RDF data. We propose an effective data model for storing RDF data in a document database using a maximum replication factor of 2 (i.e., in the worst-case scenario, the data graph will be doubled in storage size). The proposed storage model is utilised for efficiently evaluating SPARQL queries in a distributed manner. Each query is decomposed into a set of generalised star queries, which are queries that allow both subject-object and object-subject edges from a specific node, called the central node. The proposed data model ensures that no joining operations over multiple data sets are required to evaluate generalised star queries. The results of evaluating the generalised star sub-queries of a query Q are then combined in order to compute the answers of the query Q posed over the RDF data. The proposed approach has been implemented using MongoDB and Apache Spark.
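The decomposition into generalised star queries can be pictured with a small sketch. The abstract does not state how the decomposition is chosen, so the greedy strategy below is an assumption made only for illustration, as are the function name decompose_into_stars and the example query. Each star groups the triple patterns in which the central node appears as subject or as object.

```python
from collections import defaultdict

def decompose_into_stars(triples):
    """Greedily cover triple patterns with generalised stars (illustrative only).

    Returns a list of (central_node, patterns) where every pattern has the
    central node as its subject or its object. Each pattern is used once.
    """
    remaining = list(triples)
    stars = []
    while remaining:
        # Index the still-uncovered patterns by every node they touch.
        touching = defaultdict(list)
        for s, p, o in remaining:
            touching[s].append((s, p, o))
            if o != s:
                touching[o].append((s, p, o))
        # Pick the node touching the most patterns; sorting makes ties deterministic.
        centre = max(sorted(touching), key=lambda node: len(touching[node]))
        stars.append((centre, touching[centre]))
        covered = set(touching[centre])
        remaining = [t for t in remaining if t not in covered]
    return stars

# Hypothetical BGP with four triple patterns.
QUERY = [
    ("?x", "knows", "?y"),
    ("?x", "name", "?n"),
    ("?z", "worksFor", "?x"),
    ("?y", "livesIn", "?c"),
]
STARS = decompose_into_stars(QUERY)
```

For this query the first star is centred on ?x and holds three patterns, two with ?x as subject and one with ?x as object; the remaining pattern forms a second star. The sub-query results would then be joined on shared variables such as ?y.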
Layout logical labelling and finding the semantic relationships between citing and cited paper content
Sergey Parinov; Amir Bakarov; Daniil Vodolazcky
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 1 (2020) pp. 54 - 62
Currently, large data sets of in-text citations and citation contexts are becoming available for research and tool development. Using the "topic model" method to analyse these data, one can characterise thematic relationships between citation contexts from citing and the cited paper content. However, to build relevant topic models and to compare them accurately for papers linked by citation relationships, we have to know the semantic labels of PDF papers' layout such as section titles, paragraph boundaries, etc. Recent achievements in converting papers from PDF into a rich attributed JSON format allow us to develop new approaches for the logical labelling of the papers' layout. This paper presents a re-usable method and open source software for the logical labelling of PDF papers, which achieved good recognition quality for layout elements on a set of research papers. Using these semantic labels, we made a precise comparison of topic models built for citing and cited papers and found some level of similarity between them.
SWRL reasoning on ontology-based clinical dengue knowledge base
Runumi Devi; Deepti Mehrotra; Hajer Baazaoui Zghal; Ghada Besbes
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 1 (2020) pp. 39 - 53
Dengue is a widespread mosquito-borne viral illness that may lead to death if not treated promptly and properly. The aim of this study is to propose a semantic rule-based modelling and reasoning approach directed towards formalising the dengue disease definition in conjunction with operational definitions (semantics) that support clinical and diagnostic reasoning. The operational definitions are incorporated using Semantic Web Rule Language (SWRL) as logical rules that enhance the expressive capability of the knowledge base. A dengue knowledge base has been designed and extended with the International Classification of Diseases (ICD) ontology to associate dengue fever with its ICD code. Besides offering interoperability, the resulting knowledge base supports reasoning for diagnostic classification that can discover dengue symptoms and predict the likelihood that a patient suffers from the disease. A total of 153 real patient cases are classified successfully against the operational definitions incorporated as SWRL rules.
Extending the GLOBDEF framework with support for semantic enhancement of various data formats
Maria Nisheva-Pavlova; Asen Alexandrov
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 158 - 168
Semantic enhancement links sections of data files with well-described concepts from some knowledge domain. This allows for further automated reasoning about that data and can be especially useful for extracting value from Big Data. Most of the available enhancement tools focus on specific enhancement needs and data types. In this paper we present our efforts to expand the GLOBDEF framework, introduced in an earlier work, which aims to find a way for processing of large amounts of data and enhancing the data automatically. The framework is designed to leverage a variety of external enhancement tools and has no limitations on the format of the enhanced data. We demonstrate how the framework behaves on a mixed data set of texts and images and explain how an image can be semantically enhanced with a simple automated combination of an object recogniser and a text-based automated enhancer.
The future of interlinked, interoperable and scalable metadata
Getaneh Alemu; Emmanouel Garoufallou
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 81 - 87
With the growing diversity of information resources the emphasis on data-centric applications such as big data, metadata, semantics and ontologies has become central. This editorial paper presents a summary of recent developments in metadata, semantics and ontologies - focusing in particular on metadata enriching, linking and interoperability. National libraries and archives are devising new bibliographic models and metadata presentation formats. Bibliographic metadata sets are being made available using these new data formats such as RDF. The new formats are aiming to represent data in granular structures and define unique identification protocols such as URIs. The paper concludes by introducing the five papers included in the special issue. The papers in this special issue present novel approaches to metadata integration, interoperability frameworks, re-use of metadata ontologies and methods of metadata quality analysis.
Exploring the utility of metadata record graphs and network analysis for metadata quality evaluation and augmentation
Mark Edward Phillips; Oksana L. Zavalina; Hannah Tarver
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 112 - 123
Our study explores the possible uses and effectiveness of network analysis, including Metadata Record Graphs, for evaluating collections of metadata records at scale. We present the results of an experiment applying these methods to records in the University of North Texas (UNT) Digital Library and two sub-collections of different compositions: the UNT Scholarly Works collection, which functions as an institutional repository, and a collection of architectural slide images. The data includes count- and value-based statistics with network metrics for every Dublin Core element in each set. The study finds that network analysis provides useful information that supplements other metrics, for example by identifying records that are completely unconnected to other items through the subject, creator, or other field values. Additionally, network density may help managers identify collections or records that could benefit from enhancement. We also discuss the constraints of these metrics and suggest possible future applications.
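As a rough illustration of the kind of network measures mentioned above, the sketch below builds a graph in which two records are linked when they share a value in a chosen field, then reports unconnected records and network density. It is a simplified reading of the idea, not the study's implementation; the record data, field names and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def record_graph(records, field):
    """Return edges linking records that share at least one value in `field`."""
    by_value = defaultdict(set)
    for rec_id, rec in records.items():
        for value in rec.get(field, []):
            by_value[value].add(rec_id)
    edges = set()
    for ids in by_value.values():
        # Sorting keeps each undirected edge in one canonical orientation.
        for a, b in combinations(sorted(ids), 2):
            edges.add((a, b))
    return edges

def isolated(records, edges):
    """Records that share no field value with any other record."""
    connected = {node for edge in edges for node in edge}
    return sorted(set(records) - connected)

def density(records, edges):
    """Fraction of possible record pairs that are actually linked."""
    n = len(records)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))

# Hypothetical records with a Dublin Core style "subject" field.
RECORDS = {
    "r1": {"subject": ["maps", "texas"]},
    "r2": {"subject": ["texas"]},
    "r3": {"subject": ["maps"]},
    "r4": {"subject": ["poetry"]},
}
EDGES = record_graph(RECORDS, "subject")
```

Here r4 shares no subject with any other record, so isolated(RECORDS, EDGES) flags it as a candidate for enhancement, and density(RECORDS, EDGES) gives 2 linked pairs out of 6 possible.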
Unique challenges facing Linked Data implementation for National Educational Television
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 98 - 111
Implementing Linked Data involves a costly process of converting metadata into an exchange format substantially different from traditional library "records-based" exchange. To achieve full implementation, it is necessary to navigate a complex process of data modelling, crosswalking, and publishing. This paper documents the transition of a data set of National Educational Television (NET) collection records to a "data-based" exchange environment of Linked Data by discussing challenges faced during the conversion. These challenges include silos like the Library's media asset management system Merged Audio-Visual Information System (MAVIS), aligning PBCore with the bibliographic Linked Data model BIBFRAME, modelling differences in works between archival moving image cataloguing and other domains using Entertainment Identifier Registry IDs (EIDR IDs), and possible alignments with EBUCore (the European Broadcasting Union Linked Data model) to address gaps between PBCore and BIBFRAME.
Intermediary XML schemas: constraint, templating and interoperability in complex environments
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 88 - 97
This article introduces the methodology of intermediary schemas for complex metadata environments. Metadata in instances conforming to these is not generally intended for dissemination but must usually be transformed by XSLT transformations to generate instances conforming to the referent schemas to which they mediate. The methodology is designed to enhance the interoperability of complex metadata within XML architectures. This methodology incorporates three subsidiary methods: these are project-specific schemas which represent constrained mediators to over-complex or over-flexible referents (Method 1), templates or conceptual maps from which instances may be generated (Method 2) and serialised maps of instances conforming to their referent schemas (Method 3). The three methods are detailed and their applications to current research in digital ecosystems, archival description and digital asset management and preservation are examined. A possible synthesis of the three is also proposed in order to enable the methodology to operate within a single schema, the Metadata Encoding and Transmission Standard (METS).
CMDIfication process for textbook resources
Francesca Fallucchi; Ernesto William De Luca
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 135 - 148
Interoperability between heterogeneous resources and services is the key to a correctly functioning digital infrastructure, which can provide shared resources at once. We analyse the establishment of a standardised common infrastructure covering metadata, content, and inferred knowledge to allow collaborative work between researchers in the humanities. In this paper, we discuss how to provide a CMDI (Component MetaData Infrastructure) profile for textbooks, in order to integrate it into the Common Language Resources and Technology Infrastructure (CLARIN) and thus to make the data available in an open way and according to the FAIR principles. We focus on the 'CMDIfication' process, which fulfils the needs of our related projects. We describe a process of building resources using CMDI description from Text Encoding Initiative (TEI), Metadata Encoding and Transmission Standard (METS) and Dublin Core (DC) metadata, testing it on the textbook resources of the Georg Eckert Institute (GEI).
Service traceability in SOA-based software systems: a traceability network add-in for BPAOntoSOA framework
Rana Yousef; Sarah Imtera
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 169 - 183
BPAOntoSOA is a generic framework that generates a service model from a given organisational business process architecture. Service Oriented Architecture (SOA) traceability is essential to facilitate change management and support reusability of an SOA; it has wide application in the development and maintenance process. Such a traceability network is not available for the BPAOntoSOA framework. This paper introduces an ontology-based traceability network for the BPAOntoSOA framework that semantically generates trace links between services and business process architectural elements in both forward and backward directions. The proposed traceability approach was evaluated using the postgraduate faculty information system case study in order to assess the framework behaviour in general. As a continued evaluation effort, a group of parameters was selected to create an evaluation criterion, which was used to compare the BPAOntoSOA trace solution with one of the most closely related traceability frameworks, the STraS traceability framework.
From the web of bibliographic data to the web of bibliographic meaning: structuring, interlinking and validating ontologies on the semantic web
Helena Simões Patrício; Maria Inês Cordeiro; Pedro Nogueira Ramos
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 124 - 134
Bibliographic data sets have revealed good levels of technical interoperability observing the principles and good practices of linked data. However, they have a low level of quality from the semantic point of view, due to many factors: lack of a common conceptual framework for a diversity of standards often used together, reduced number of links between the ontologies underlying data sets, proliferation of heterogeneous vocabularies, underuse of semantic mechanisms in data structures, "ontology hijacking" (Feeney et al., 2018), point-to-point mappings, as well as limitations of semantic web languages for the requirements of bibliographic data interoperability. After reviewing such issues, a research direction is proposed to overcome the misalignments found by means of a reference model and a superontology, using Shapes Constraint Language (SHACL) to solve current limitations of RDF languages.
Citation content/context data as a source for research cooperation analysis
Sergey Parinov; Victoria Antonova
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 149 - 157
Using citation relationships, one can build three groups of papers: (1) the papers of a selected author; (2) those papers cited by the author; (3) papers citing the author. Authors of papers from these three groups can be presented as a fragment of a research cooperation network, because they use/cite research outputs of each other. Their papers' full texts and especially the contexts of their in-text citations contain some information about the character of this research cooperation. We present a concept of research cooperation, based on publications and the current results of the Cirtec project for building the research cooperation characteristics. This work is based on the processing of citation content/context data. The results include an on-line service for authors to monitor the citation content data extractions and three types of built indicators/parameters: co-citation statistics, spatial distribution of citations over papers' body and topic models for citation contexts.
Towards linked open government data in Canada
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 3 (2020) pp. 209 - 217
Governments are publishing enormous amounts of open data on the web every day in an effort to increase transparency and reusability. Linking data from multiple sources on the web enables the performance of advanced data analytics, which can lead to the development of valuable services and data products. However, Canada's open government data portals are isolated from one another and remain unlinked to other resources on the web. In this paper, we first expose the statistical data sets in Canadian provincial open data portals as Linked Data, and then integrate them using RDF Cube vocabulary, thereby making different open data portals available through a single search endpoint. We leverage Semantic Web Technologies to publish open data sets taken from two provincial portals (Nova Scotia and Alberta) as RDF (the Linked Data format), and to connect them to one another. The success of our approach illustrates its high potential for linking open government data sets across Canada, which will in turn enable greater data accessibility and improved search results.
An algorithm to generate short sentences in natural language from linked open data based on linguistic templates
Augusto Lopes Da Silva; Sandro José Rigo; Jéssica Braun De Moraes
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 3 (2020) pp. 197 - 208
The generation of natural language phrases from Linked Open Data can benefit from the significant amount of information available on the internet, as well as from the properties it contains, which appear mostly in the RDF format. These properties can represent semantic relationships between concepts that might help in creating sentences in natural language. Nevertheless, research in this field tends not to use the information in RDF. We argue that using this information might foster the generation of more natural phrases. Accordingly, this research explores these RDF properties for the generation of natural language phrases. The short sentences generated by the algorithm implementation were evaluated regarding their fluency by linguists and native English speakers. The results show that the sentences generated are promising regarding sentence fluency.
Modelling weightlifting 'Training-Diet-Competition' cycle following a modular and scalable approach
Piyaporn Tumnark; Paulo Cardoso; Jorge Cabral; Filipe Conceição
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 3 (2020) pp. 185 - 196
Studies in weightlifting have been characterised by unclear results and information paucity, mainly due to the lack of information sharing between athletes, coaches, biomechanists, physiologists and nutritionists. These experts' knowledge is not captured, classified or integrated into an information system for decision-making. An ontology-driven knowledge model for Olympic weightlifting was developed to leverage a better understanding of the weightlifting domain as a whole, bringing together related knowledge domains of training methodology, weightlifting biomechanics, and dietary regimes, while modelling the synergy among them. It unifies terminology, semantics, and concepts among sport scientists, coaches, nutritionists, and athletes to partially obviate the recognised limitations and inconsistencies, leading to the provision of superior coaching and a research environment which promotes better understanding and more conclusive results. The ontology-assisted weightlifting knowledge base consists of 110 classes, 50 object properties, 92 data properties, and 167 inheritance relationships between concepts, for a total of 1761 axioms, alongside 23 SWRL rules.