News in eLiteracias

International Journal of Metadata, Semantics and Ontologies

CIDaTa: an ontology-based framework for international data transfers and GDPR compliance

Mohammad Mahmudul Hasan; Marcelo Corrales Compagnucci; George Kousiouris; Dimosthenis Anagnostopoulos
International Journal of Metadata, Semantics and Ontologies, Vol. 16, No. 3 (2023) pp. 195 - 209
Cross-border data transfers and their legal aspects have created a daunting landscape for application and service providers, in which rules and regulations need to be constantly monitored and addressed, especially in dynamic scenarios such as cloud brokerage or cloud/edge operations. The aim of this work is to semantically model several concepts surrounding international data transfers based on the current changes and formulate them around a newly defined ontology (CIDaTa). The work exploits 23 existing ontologies, as dictated by the Linked Data paradigm, and introduces 54 links between them. This aids the IT professional in answering questions regarding the legality of a transfer or the necessary steps needed to achieve it. Example questions posed to the framework are demonstrated that can enhance the understanding of the implications of a data transfer, enabling future additions that can lead to more automated management of these transfers.
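The kind of question answering such a framework targets can be illustrated with a toy sketch. The triples, predicate names, and the simplified adequacy/safeguard logic below are all invented for illustration and are not taken from the actual CIDaTa ontology:

```python
# Toy triple store; predicate and individual names are illustrative.
TRIPLES = {
    ("AcmeCloud", "transfersDataTo", "us-east"),
    ("us-east", "locatedIn", "USA"),
    ("USA", "hasAdequacyDecision", "no"),
    ("AcmeCloud", "hasSafeguard", "StandardContractualClauses"),
}

def objects(s, p):
    """All objects o such that (s, p, o) is asserted."""
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}

def transfer_requires_safeguard(provider):
    """Extra safeguards are needed when a destination country lacks
    an adequacy decision (heavily simplified GDPR Chapter V logic)."""
    for region in objects(provider, "transfersDataTo"):
        for country in objects(region, "locatedIn"):
            if "yes" not in objects(country, "hasAdequacyDecision"):
                return True
    return False

def transfer_is_lawful(provider):
    """Lawful if no safeguard is needed, or one is in place."""
    if not transfer_requires_safeguard(provider):
        return True
    return bool(objects(provider, "hasSafeguard"))
```

In this sketch the provider transfers data to a country without an adequacy decision, so the transfer is deemed lawful only because a safeguard is asserted.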

Domain-specific schema discovery from general-purpose knowledge base

Everaldo Costa Neto; Johny Moreira; Luciano Barbosa; Ana Carolina Salgado
International Journal of Metadata, Semantics and Ontologies, Vol. 16, No. 3 (2023) pp. 210 - 226
General-purpose Knowledge Bases (KBs) have been used for various applications. An essential step for leveraging the content of KBs on domain-specific tasks is to discover their schema. In this paper, we propose ANCHOR, an end-to-end pipeline for schema discovery from general-purpose KB in an automated way. ANCHOR identifies a domain of interest based on category mapping from KB. Next, it learns representations of entities in this domain based on the entity-category mappings and uses these representations to identify the entities' topics within this domain. Finally, ANCHOR generates a profile for each topic using a strategy based on attributes co-occurrence. We have evaluated ANCHOR on four domains. The results show that: (1) the learned entity representation effectively produces better entity clusters than some traditional and embedding-based baselines; (2) our solution produces a high-quality profile for the discovered topics.
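The clustering step can be sketched with toy entity-category mappings. The greedy Jaccard grouping below is only an illustration of grouping entities by shared categories, not ANCHOR's learned representations; all names are invented:

```python
# Toy entity -> category mappings standing in for a general-purpose KB.
ENTITY_CATEGORIES = {
    "Casablanca": {"films", "1940s_films"},
    "Jaws": {"films", "thriller_films"},
    "Thriller": {"albums", "pop_albums"},
    "Abbey_Road": {"albums", "rock_albums"},
}

def jaccard(a, b):
    """Overlap of two category sets."""
    return len(a & b) / len(a | b)

def cluster(entities, threshold=0.3):
    """Greedy single-pass clustering by category overlap: an entity
    joins the first cluster whose representative it resembles."""
    clusters = []
    for name, cats in entities.items():
        for c in clusters:
            if jaccard(cats, entities[c[0]]) >= threshold:
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Here the film entities and the album entities end up in separate clusters, mirroring how category overlap can surface topics within a domain.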

Assessing the effectiveness of image recognition tools in metadata identification through semantic and label-based analysis

Akara Thammastitkul
International Journal of Metadata, Semantics and Ontologies, Vol. 16, No. 3 (2023) pp. 227 - 237
This study evaluates the performance of four image recognition tools (Amazon Rekognition, Clarifai, Imagga and Google Cloud Vision API) for automatic image metadata generation. The experiment was conducted on various image categories, including human, animal, plant and flower, view and landscape, vegetable and fruit, food, vehicle, tourist landmark, art and culture and old book cover and posters. Semantic and label-based analysis was used to evaluate the performance of each tool. Results indicate that each tool performed differently across categories, demonstrating the importance of selecting the appropriate tool for specific tasks. Clarifai was found to perform best for human, animal and food image tagging, while Amazon Rekognition was best for vegetable-fruit and vehicle images. Imagga performed best for plant and flower, art and culture and old book cover and posters image recognition, while Google Cloud Vision API performed best for view and landscape and tourist landmark recognition.
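Label-based evaluation of this kind boils down to comparing a tool's predicted tags against human ground truth. A minimal sketch (the tag sets are invented, and the paper's exact metric may differ):

```python
def label_scores(predicted, truth):
    """Label-based precision/recall/F1 between a tool's predicted tags
    and human ground-truth tags (both sets of lowercase strings)."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # tags the tool got right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, predicting `{"dog", "animal", "grass"}` against ground truth `{"dog", "animal", "pet", "outdoor"}` gives precision 2/3 and recall 1/2.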

Towards the generalisation of the generation of answerable questions from ontologies for education

Toky Hajatiana Raboanary; Steve Wang; C. Maria Keet
International Journal of Metadata, Semantics and Ontologies, Vol. 16, No. 1 (2022) pp. 86 - 103
Generating questions automatically from ontologies, and marking thereof, may support teaching and learning activities and therewith alleviate a teacher's workload. Numerous studies considered this for MCQs; however, learners also have to be confronted with, for instance, yes/no and short answer questions. We investigated ten types of educationally valuable questions. For each question type, we determined the axiom prerequisites to be able to generate and answer it and declared a set of template specifications as question sentence plans. Three algorithmic approaches were devised for generating the text from the ontology: semantics-based with 1) template variables using foundational ontology categories, or 2) using main classes from the domain ontology and 3) generation mostly driven by NLP techniques. User evaluation demonstrated that option three far outperformed the ontology-based ones on syntactic and semantic correctness of the generated questions, and it generated 98.45% of the questions from all valid axiom prerequisites in our experiment.
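The core idea, axiom prerequisites paired with sentence-plan templates, can be sketched as follows. The axioms and templates here are invented, far simpler than the paper's ten question types:

```python
# Toy ontology axioms as (subject, relation, object) tuples.
AXIOMS = [
    ("Lion", "subClassOf", "Mammal"),
    ("Lion", "eats", "Gazelle"),
]

# Template per relation: (question sentence plan, answer plan).
TEMPLATES = {
    "subClassOf": ("Is every {s} a {o}?", "yes"),  # yes/no question
    "eats": ("What does a {s} eat?", "{o}"),       # short-answer question
}

def generate_questions(axioms):
    """Yield (question, answer) pairs for each axiom whose relation
    matches a declared template (the axiom prerequisite)."""
    out = []
    for s, rel, o in axioms:
        if rel in TEMPLATES:
            q_tpl, a_tpl = TEMPLATES[rel]
            out.append((q_tpl.format(s=s, o=o), a_tpl.format(s=s, o=o)))
    return out
```

Because every generated question is grounded in an axiom, the answer is known by construction, which is what makes automated marking possible.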

Semantic association rules for data interestingness using domain ontology

C.B. Abhilash; Kavi Mahesh
International Journal of Metadata, Semantics and Ontologies, Vol. 16, No. 1 (2022) pp. 47 - 67
The COVID-19 pandemic is a major public health crisis threatening people's health, well-being, freedom to travel and the global economy. Understanding COVID-19 symptoms for determining the severity of cases is critical. This study aimed to discover interesting facts from the COVID-19 data set considering symptoms, medicines and comorbidity. For data mining research, the semantic web raises new possibilities. Resource Description Framework (RDF) triple format is commonly used to express semantic web data. Association Rule Mining (ARM) is one of the most effective methods of detecting frequent patterns. However, finding potential rules is a difficult task. We propose an improved method that uses ontology with ARM for finding semantic-rich rules from COVID-19 data sets. The outcomes are semantic association rules that are potentially beneficial for decision-makers. We compare our results with one of the most recent approaches in this field to demonstrate the importance of ontology-based methods.
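The ARM step can be illustrated with support/confidence over toy patient transactions. The data below is invented, and this plain-pairs sketch omits the ontology enrichment that distinguishes the paper's semantic rules:

```python
from itertools import combinations

# Toy transactions of symptoms/comorbidities (illustrative only).
TRANSACTIONS = [
    {"fever", "cough", "diabetes"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fever", "cough", "fatigue"},
]

def support(itemset, transactions):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(transactions, min_support=0.5, min_conf=0.7):
    """Frequent pairs turned into rules A -> B with confidence."""
    items = set().union(*transactions)
    out = []
    for a, b in combinations(sorted(items), 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs}, transactions)
            if conf >= min_conf:
                out.append((lhs, rhs, round(conf, 2)))
    return out
```

On this toy data, "fever -> cough" holds with confidence 1.0, while the reverse holds with 0.75; an ontology would then be used to keep only semantically interesting rules.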

O'FAIRe makes you an offer: metadata-based automatic FAIRness assessment for ontologies and semantic resources

Emna Amdouni; Syphax Bouazzouni; Clement Jonquet
International Journal of Metadata, Semantics and Ontologies, Vol. 16, No. 1 (2022) pp. 16 - 46
We have not yet seen a clear methodology implemented and tooled to automatically assess the level of FAIRness of semantic resources. We propose a metadata-based automatic FAIRness assessment methodology for ontologies and semantic resources called Ontology FAIRness Evaluator (O'FAIRe). It is based on the projection of the 15 foundational FAIR principles for ontologies, and it is aligned and nourished with relevant state-of-the-art initiatives for FAIRness assessment. We propose 61 questions of which 80% are based on the resource metadata descriptions and we review the standard metadata properties (taken from the MOD 1.4 ontology metadata model) that could be used to implement these metadata. We also demonstrate the importance of relying on ontology libraries or repositories to harmonise and harness unified metadata and thus allow FAIRness assessment. Moreover, we have implemented O'FAIRe in the AgroPortal semantic resource repository and produced a preliminary FAIRness analysis over 149 semantic resources in the agri-food/environment domain.
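Metadata-based FAIRness assessment amounts to scoring a resource's metadata record against a battery of checks. A minimal sketch; the four questions and property names below are invented placeholders, not O'FAIRe's actual 61 questions or the MOD 1.4 property set:

```python
# Each question inspects one metadata property of the resource.
QUESTIONS = {
    "has_identifier": lambda m: bool(m.get("identifier")),
    "has_license": lambda m: bool(m.get("license")),
    "machine_readable_format": lambda m: m.get("format") in {"OWL", "SKOS", "RDFS"},
    "has_provenance": lambda m: bool(m.get("creator")),
}

def fairness_score(metadata):
    """Fraction of metadata-based questions the resource passes."""
    passed = sum(check(metadata) for check in QUESTIONS.values())
    return passed / len(QUESTIONS)
```

A record with an identifier, a license and an OWL serialisation but no creator would score 0.75 under this toy battery.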

Automated subject indexing using word embeddings and controlled vocabularies: a comparative study

Michalis Sfakakis; Leonidas Papachristopoulos; Kyriaki Zoutsou; Christos Papatheodorou; Giannis Tsakonas
International Journal of Metadata, Semantics and Ontologies, Vol. 15, No. 4 (2021) pp. 233 - 243
Text mining methods contribute significantly to the understanding and the management of digital content, increasing the potential of entry links. This paper introduces a method for subject analysis combining topic modelling and automated labelling of the generated topics exploiting terms from existing knowledge organisation systems. A testbed was developed in which the Latent Dirichlet Allocation (LDA) algorithm was deployed for modelling the topics of a corpus of papers related to the Digital Library Evaluation domain. The generated topics were represented in the form of bags-of-words word embeddings and were utilised for retrieving terms from the EuroVoc Thesaurus and the Computer Science Ontology (CSO). The results of this study show that the domain of DL can be described with different vocabularies, but during the process of automatic labelling the context needs to be taken into account.
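The labelling step, matching a topic's bag of words against vocabulary terms, can be sketched with cosine similarity. The vocabulary and topic below are invented; the paper retrieves terms from EuroVoc and CSO using embeddings rather than raw counts:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy controlled vocabulary: term -> descriptive words.
VOCAB = {
    "information retrieval": ["search", "query", "retrieval", "ranking"],
    "usability": ["user", "interface", "evaluation", "task"],
}

def best_label(topic_words, vocabulary):
    """Label a topic (top words from e.g. LDA) with the closest
    vocabulary term."""
    topic = Counter(topic_words)
    return max(vocabulary,
               key=lambda t: cosine(topic, Counter(vocabulary[t])))
```

A topic whose top words are "user, task, evaluation, interface, study" would be labelled "usability" here.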

Integrated classification schemas to interlink cultural heritage collections over the web using LOD technologies

Carlos Henrique Marcondes
International Journal of Metadata, Semantics and Ontologies, Vol. 15, No. 3 (2021) pp. 170 - 177
Libraries, archives and museum collections are now being published over the web using LOD technologies. Many of them have thematic intersections or are related to other web subjects and resources such as authorities, sites for historic events, online exhibitions, or to articles in Wikipedia and its sibling resources DBpedia and Wikidata. The full potential of such published initiatives using LOD rests heavily on the meaningful interlinking of such collections. Within these contextual vocabularies and classifications, schemas are important, as they provide meaning and context to heritage data. This paper proposes comprehensive classification schemas - a Culturally Relevant Relationships (CRR) vocabulary and a classification schema of types of heritage objects - to order, integrate and provide structure to cultural heritage data brought about with the publication of heritage collections as LOD.

Systematic design and implementation of a semantic assistance system for aero-engine design and manufacturing

Sonika Gogineni; Jörg Brünnhäußer; Kai Lindow; Erik Paul Konietzko; Rainer Stark; Jonas Nickel; Heiko Witte
International Journal of Metadata, Semantics and Ontologies, Vol. 15, No. 2 (2021) pp. 87 - 103
Data in organisations is often spread across various Information and Communication Technology (ICT) systems, leading to redundancies, lack of overview and time wasted searching for information while carrying out daily activities. This paper focuses on addressing these problems for an aerospace company by using semantic technologies to design and develop an assistance system using existing infrastructure. In the aero-engine industry, complex data systems for design, configuration, manufacturing and service data are common. Additionally, unstructured data and information from numerous sources become available during the product's life cycle. In this paper, a systematic approach is followed to design a system, which integrates data silos by using a common ontology. This paper highlights the problems being addressed, the approach selected to develop the system, along with the implementation of two use cases to support user activities in an aerospace company.

Keyphrase extraction from single textual documents based on semantically defined background knowledge and co-occurrence graphs

Mauro Dalle Lucca Tosi; Julio Cesar Dos Reis
International Journal of Metadata, Semantics and Ontologies, Vol. 15, No. 2 (2021) pp. 121 - 132
The keyphrase extraction task is a fundamental and challenging task designed to extract a set of keyphrases from textual documents. Keyphrases are essential to assist publishers in indexing documents and readers in identifying the most relevant ones. They are short phrases composed of one or more terms used to represent a textual document and its main topics. In this article, we extend our research on C-Rank, which is an unsupervised approach that automatically extracts keyphrases from single documents. C-Rank uses concept-linking to link concepts in common between single documents and an external background knowledge base. We advance our study over C-Rank by evaluating it using different concept-linking approaches - Babelfy and DBPedia Spotlight. We evaluated C-Rank on data sets composed of academic articles, academic abstracts, and news articles. Our findings indicate that C-Rank achieves state-of-the-art results extracting keyphrases from scientific documents by experimentally comparing it to existing unsupervised approaches.
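The co-occurrence-graph idea behind approaches like C-Rank can be sketched as follows. The candidate phrases are invented, and this generic TextRank-style scorer omits C-Rank's concept-linking against background knowledge:

```python
from collections import defaultdict
from itertools import combinations

# Toy candidate keyphrases per sentence (illustrative only).
SENTENCES = [
    ["keyphrase extraction", "graph"],
    ["graph", "ranking"],
    ["keyphrase extraction", "graph", "ranking"],
]

def cooccurrence_graph(candidates_per_sentence):
    """Undirected weighted graph of phrases co-occurring in a sentence."""
    graph = defaultdict(lambda: defaultdict(int))
    for sent in candidates_per_sentence:
        for a, b in combinations(set(sent), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def rank(graph, damping=0.85, iters=30):
    """TextRank-style scores: a phrase is important if it co-occurs
    with other important phrases."""
    scores = {n: 1.0 for n in graph}
    for _ in range(iters):
        scores = {
            n: (1 - damping) + damping * sum(
                w * scores[m] / sum(graph[m].values())
                for m, w in graph[n].items())
            for n in graph}
    return scores
```

On this toy corpus, "graph" co-occurs most widely and so receives the highest score; the top-scoring phrases would be returned as keyphrases.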

Applying cross-data set identity reasoning for producing URI embeddings over hundreds of RDF data sets

Michalis Mountantonakis; Yannis Tzitzikas
International Journal of Metadata, Semantics and Ontologies, Vol. 15, No. 1 (2021) pp. 1 - 22
There is a proliferation of approaches that exploit RDF data sets for creating URI embeddings, i.e., embeddings that are produced by taking as input URI sequences (instead of simple words or phrases), since they can be of primary importance for several tasks (e.g., machine learning tasks). However, existing techniques exploit either a single or a few data sets for creating URI embeddings. For this reason, we introduce a prototype, called LODVec, which exploits LODsyndesis for enabling the creation of URI embeddings by using hundreds of data sets simultaneously, after enriching them with the results of cross-data set identity reasoning. By using LODVec, it is feasible to produce URI sequences by following paths of any length (according to a given configuration), and the produced URI sequences are used as input for creating embeddings through the word2vec model. We provide comparative results for evaluating the gain of using several data sets for creating URI embeddings, for the tasks of classification and regression, and for finding the most similar entities to a given one.
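Producing URI sequences from RDF paths can be sketched with random walks over toy triples. The triples below are invented (URIs abbreviated), and real systems like LODVec operate over enriched, multi-data-set graphs:

```python
import random
from collections import defaultdict

# Toy RDF triples; URIs abbreviated for readability.
TRIPLES = [
    ("db:Aristotle", "influenced", "db:Alexander"),
    ("db:Aristotle", "bornIn", "db:Stagira"),
    ("db:Alexander", "bornIn", "db:Pella"),
    ("db:Stagira", "locatedIn", "db:Greece"),
    ("db:Pella", "locatedIn", "db:Greece"),
]

def walks(triples, start, length, n, seed=0):
    """Generate n URI sequences by randomly following outgoing edges
    for up to `length` hops; each hop appends the predicate and the
    object URI. Such sequences replace word sentences as word2vec input."""
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
    rng = random.Random(seed)  # seeded for reproducibility
    sequences = []
    for _ in range(n):
        node, seq = start, [start]
        for _ in range(length):
            if not out_edges[node]:
                break  # dead end: stop the walk early
            p, o = rng.choice(out_edges[node])
            seq += [p, o]
            node = o
        sequences.append(seq)
    return sequences
```

Each resulting sequence, e.g. `["db:Aristotle", "bornIn", "db:Stagira", "locatedIn", "db:Greece"]`, is then treated as a "sentence" when training word2vec.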

An ontology-driven perspective on the emotional human reactions to social events

Danilo Cavaliere; Sabrina Senatore
International Journal of Metadata, Semantics and Ontologies, Vol. 15, No. 1 (2021) pp. 23 - 38
Social media has become a fulcrum for sharing information on everyday-life events: people, companies, and organisations express opinions about new products, political and social situations, football matches, and concerts. The recognition of feelings and reactions to events from social networks requires dealing with great amounts of data streams, especially for tweets, to investigate the main sentiments and opinions that justify some reactions. This paper presents an emotion-based classification model to extract feelings from tweets related to an event or a trend, described by a hashtag, and build an emotional concept ontology to study human reactions to events in a context. From the tweet analysis, terms expressing a feeling are selected to build a topological space of emotion-based concepts. The extracted concepts serve to train a multi-class SVM classifier that is used to perform soft classification aimed at identifying the emotional reactions towards events. Then, an ontology allows arranging classification results, enriched with additional DBpedia concepts. SPARQL queries on the final knowledge base provide specific insights to explain people's reactions towards events. Practical case studies and test results demonstrate the applicability and potential of the approach.
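The soft-classification idea, scoring each emotion rather than picking one, can be sketched with a toy lexicon. The emotion terms are invented, and this counting scorer is only a lightweight stand-in for the paper's trained multi-class SVM:

```python
from collections import Counter

# Toy emotion lexicon (illustrative only).
EMOTION_TERMS = {
    "joy": ["happy", "great", "love", "win"],
    "anger": ["furious", "hate", "unfair"],
    "sadness": ["sad", "loss", "cry"],
}

def soft_classify(tweet_words):
    """Return a normalised score per emotion class, so a tweet can
    express several emotions at once (soft classification)."""
    counts = Counter(tweet_words)
    raw = {emo: sum(counts[t] for t in terms)
           for emo, terms in EMOTION_TERMS.items()}
    total = sum(raw.values()) or 1  # avoid division by zero
    return {emo: score / total for emo, score in raw.items()}
```

A tweet mixing "happy ... win ... sad" comes out two-thirds joy and one-third sadness, the kind of graded result the ontology then arranges and enriches.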

An ontology-based method for improving the quality of process event logs using database bin logs

Shokoufeh Ghalibafan; Behshid Behkamal; Mohsen Kahani; Mohammad Allahbakhsh
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 4 (2020) pp. 279 - 289
The main goal of process mining is discovering models from event logs. The usefulness of these discovered models is directly related to the quality of event logs. Researchers proposed various solutions to detect deficiencies and improve the quality of event logs; however, only a few have considered the application of a reliable external source for the improvement of the quality of event data. In this paper, we propose a method to repair the event log using the database bin log. We show that database operations can be employed to overcome the inadequacies of the event logs, including incorrect and missing data. To this end, we first extract an ontology from each of the event logs and the bin log. Then, we match the extracted ontologies and remove inadequacies from the event log. The results show the stability of our proposed model and its superiority over related works.
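The repair idea, treating the database bin log as ground truth for the event log, can be sketched with flat records. Field names and the matching key are invented here; the paper matches via extracted ontologies rather than a simple key:

```python
# Toy logs (illustrative only).
EVENT_LOG = [
    {"case": "c1", "activity": "create", "timestamp": None},      # missing data
    {"case": "c1", "activity": "ghost", "timestamp": "10:02"},    # incorrect event
]
BIN_LOG = [
    {"case": "c1", "activity": "create", "timestamp": "10:00"},
]

def repair_log(event_log, bin_log):
    """Drop events the bin log never recorded and fill missing
    timestamps from the matching database operation."""
    truth = {(e["case"], e["activity"]): e for e in bin_log}
    repaired = []
    for event in event_log:
        key = (event["case"], event["activity"])
        if key not in truth:
            continue  # no matching DB operation: treat as incorrect
        fixed = dict(event)
        if fixed.get("timestamp") is None:
            fixed["timestamp"] = truth[key]["timestamp"]
        repaired.append(fixed)
    return repaired
```

The "ghost" event disappears and the missing timestamp is recovered from the bin log, the two inadequacy types the abstract mentions.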

Stress-testing big data platform to extract smart and interoperable food safety analytics

Ioanna Polychronou; Giannis Stoitsis; Mihalis Papakonstantinou; Nikos Manouselis
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 4 (2020) pp. 306 - 314
One of the significant challenges for the future is to guarantee safe food for all inhabitants of the planet. During the last 15 years, very important fraud issues like the '2013 horse meat scandal' and the '2008 Chinese milk scandal' have greatly affected the food industry and public health. One of the alternatives for this issue consists of increasing production, but to accomplish this, it is necessary that innovative options be applied to enhance the safety of the food supply chain. For this reason, it is quite important to have the right infrastructure in order to manage data of the food safety sector and provide useful analytics to Food Safety Experts. In this paper, we describe Agroknow's Big Data Platform architecture and examine its scalability for data management and experimentation.

Semantic similarity measurement: an intrinsic information content model

Abhijit Adhikari; Biswanath Dutta; Animesh Dutta; Deepjyoti Mondal
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 3 (2020) pp. 218 - 233
Ontology dependent Semantic Similarity (SS) measurement has emerged as a new research paradigm in finding the semantic strength between any two entities. In this regard, as observed, the information theoretic intrinsic approach yields better accuracy in correlation with human cognition. The precision of such a technique highly depends on how accurately we calculate Information Content (IC) of concepts and its compatibility with a SS model. In this work, we develop an intrinsic IC model to facilitate better SS measurement. The proposed model has been evaluated using three vocabularies, namely SNOMED CT, MeSH and WordNet against a set of benchmark data sets. We compare the results with the state-of-the-art IC models. The results show that the proposed intrinsic IC model yields a high correlation with human assessment. The article also evaluates the compatibility of the proposed IC model and the other existing IC models in combination with a set of state-of-the-art SS models.
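The intrinsic flavour of IC derives a concept's informativeness from the taxonomy alone, typically from its hyponym count. A sketch in the style of Seco et al.'s classic formula, IC(c) = 1 − log(hypo(c)+1)/log(max_nodes), over an invented toy taxonomy (not SNOMED CT, MeSH or WordNet, and not the paper's own model):

```python
import math

# Toy taxonomy as child -> parent links (illustrative only).
PARENT = {"dog": "mammal", "cat": "mammal",
          "mammal": "animal", "bird": "animal"}

def hyponym_counts(parent):
    """Count the descendants (hyponyms) of every concept."""
    counts = {c: 0 for c in set(parent) | set(parent.values())}
    for child in parent:
        node = parent[child]
        while True:            # credit every ancestor of this child
            counts[node] += 1
            if node not in parent:
                break          # reached the root
            node = parent[node]
    return counts

def intrinsic_ic(concept, counts):
    """Seco-style intrinsic IC: leaves score 1, the root scores 0."""
    max_nodes = len(counts)
    return 1 - math.log(counts[concept] + 1) / math.log(max_nodes)
```

Leaves like "dog" are maximally informative (IC = 1), the root "animal" carries no information (IC = 0), and intermediate concepts fall in between; an SS model then combines such IC values for pairs of concepts.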

Formalisation and classification of grammar and template-mediated techniques to model and ontology verbalisation

Zola Mahlaza; C. Maria Keet
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 3 (2020) pp. 249 - 262
Computational tools that translate modelling languages to a restricted natural language can improve end-user involvement in modelling. Templates are a popular approach for such a translation and are often paired with computational grammar rules to support grammatical complexity to obtain better quality sentences. There is no explicit specification of the relations used for the pairing of templates with grammar rules, so it is challenging to compare the latter templates' suitability for less-resourced languages, where grammar reuse is vital in reducing development effort. In order to enable such comparisons, we devise a model of pairing templates and rules, and assess its applicability by considering 54 existing systems for classification, and 16 of them in detail. Our classification shows that most grammar-infused template systems support detachable grammar rules and half of them introduce syntax trees for multilingualism or error checking. Furthermore, out of the 16 considered grammar-infused template systems, most do not currently support any form of aggregation (63%) or the embedding of verb conjugation rules (81%); hence, if such features would be required, then they would need to be implemented from the ground up.
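What "pairing a template with a detachable grammar rule" means can be sketched concretely. The slot syntax and the naive pluralisation rule below are invented, not taken from any of the surveyed systems:

```python
import re

def pluralise(noun):
    """Detachable grammar rule: naive English pluralisation."""
    return noun + ("es" if noun.endswith(("s", "x", "ch")) else "s")

# Rules are registered by name so templates can invoke them by slot.
GRAMMAR_RULES = {"plural": pluralise}

def verbalise(template, **slots):
    """Fill a template whose slots may invoke a paired grammar rule,
    written as {slot:rule}; plain {slot} is string substitution."""
    def fill(match):
        slot, _, rule = match.group(1).partition(":")
        value = slots[slot]
        return GRAMMAR_RULES[rule](value) if rule else value
    return re.sub(r"\{(\w+:?\w*)\}", fill, template)
```

Because the rule is detachable, swapping `pluralise` for another language's rule reuses the same template, which is exactly why such pairings matter for less-resourced languages.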

Automatic classification of digital objects for improved metadata quality of electronic theses and dissertations in institutional repositories

Lighton Phiri
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 3 (2020) pp. 234 - 248
Higher education institutions typically employ Institutional Repositories (IRs) in order to curate and make available Electronic Theses and Dissertations (ETDs). While most of these IRs are implemented with self-archiving functionalities, self-archiving practices are still a challenge. This arguably leads to inconsistencies in the tagging of digital objects with descriptive metadata, potentially compromising searching and browsing of scholarly research output in IRs. This paper proposes an approach to automatically classify ETDs in IRs, using supervised machine learning techniques, by extracting features from the minimum possible input expected from document authors: the ETD manuscript. The experiment results demonstrate the feasibility of automatically classifying IR ETDs and, additionally, ensuring that repository digital objects are appropriately structured. Automatic classification of repository objects has the obvious benefit of improving the searching and browsing of content in IRs and further presents opportunities for the implementation of third-party tools and extensions that could potentially result in effective self-archiving strategies.
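Supervised classification of manuscripts from their text can be sketched with a tiny multinomial Naive Bayes over word counts. The training data is invented and Naive Bayes stands in generically for whichever classifiers the paper evaluated:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        def log_prob(label):
            total = sum(self.word_counts[label].values())
            lp = math.log(self.label_counts[label]
                          / sum(self.label_counts.values()))
            for w in words:
                # add-one smoothing over the shared vocabulary
                lp += math.log((self.word_counts[label][w] + 1)
                               / (total + len(self.vocab)))
            return lp
        return max(self.label_counts, key=log_prob)
```

Trained on toy "Mathematics" and "Biology" word lists, the classifier assigns a new abstract to the discipline whose vocabulary it most resembles, the same shape of decision an IR would use to tag an incoming ETD.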

Modelling weightlifting 'Training-Diet-Competition' cycle following a modular and scalable approach

Piyaporn Tumnark; Paulo Cardoso; Jorge Cabral; Filipe Conceição
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 3 (2020) pp. 185 - 196
Studies in weightlifting have been characterised by unclear results and information paucity, mainly due to the lack of information sharing between athletes, coaches, biomechanists, physiologists and nutritionists. These experts' knowledge is not captured, classified or integrated into an information system for decision-making. An ontology-driven knowledge model for Olympic weightlifting was developed to leverage a better understanding of the weightlifting domain as a whole, bringing together related knowledge domains of training methodology, weightlifting biomechanics, and dietary regimes, while modelling the synergy among them. It unifies terminology, semantics, and concepts among sport scientists, coaches, nutritionists, and athletes to partially obviate the recognised limitations and inconsistencies, leading to the provision of superior coaching and a research environment which promotes better understanding and more conclusive results. The ontology-assisted weightlifting knowledge base consists of 110 classes, 50 object properties, 92 data properties and 167 inheritance relationships, in a total of 1761 axioms, alongside 23 SWRL rules.

CMDIfication process for textbook resources

Francesca Fallucchi; Ernesto William De Luca
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 135 - 148
Interoperability between heterogeneous resources and services is the key to a correctly functioning digital infrastructure, which can provide shared resources at once. We analyse the establishment of a standardised common infrastructure covering metadata, content, and inferred knowledge to allow collaborative work between researchers in the humanities. In this paper, we discuss how to provide a CMDI (Component MetaData Infrastructure) profile for textbooks, in order to integrate it into the Common Language Resources and Technology Infrastructure (CLARIN) and thus to make the data available in an open way and according to the FAIR principles. We focus on the 'CMDIfication' process, which fulfils the needs of our related projects. We describe a process of building resources using CMDI description from Text Encoding Initiative (TEI), Metadata Encoding and Transmission Standard (METS) and Dublin Core (DC) metadata, testing it on the textbook resources of the Georg Eckert Institute (GEI).
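The "CMDIfication" of a flat record can be sketched with the standard library's XML tools. The component structure below is a deliberate simplification; a real CMDI profile is defined in the CLARIN Component Registry and carries headers and namespaces omitted here:

```python
import xml.etree.ElementTree as ET

def dc_to_cmdi(dc: dict) -> ET.Element:
    """Wrap a flat Dublin Core-style record in a minimal CMDI-like
    envelope (toy profile; element names are illustrative)."""
    root = ET.Element("CMD", CMDVersion="1.2")
    components = ET.SubElement(root, "Components")
    textbook = ET.SubElement(components, "Textbook")
    for field in ("title", "creator", "date", "language"):
        if field in dc:
            ET.SubElement(textbook, field).text = dc[field]
    return root
```

A record like `{"title": "Geschichte 7", "creator": "GEI", "date": "1954"}` becomes a `CMD/Components/Textbook` tree; absent fields are simply omitted rather than emitted empty.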

Intermediary XML schemas: constraint, templating and interoperability in complex environments

Richard Gartner
International Journal of Metadata, Semantics and Ontologies, Vol. 14, No. 2 (2020) pp. 88 - 97
This article introduces the methodology of intermediary schemas for complex metadata environments. Metadata in instances conforming to these schemas is generally not intended for dissemination but must usually be transformed by XSLT transformations to generate instances conforming to the referent schemas to which they mediate. The methodology is designed to enhance the interoperability of complex metadata within XML architectures. This methodology incorporates three subsidiary methods: these are project-specific schemas which represent constrained mediators to over-complex or over-flexible referents (Method 1), templates or conceptual maps from which instances may be generated (Method 2) and serialised maps of instances conforming to their referent schemas (Method 3). The three methods are detailed and their applications to current research in digital ecosystems, archival description and digital asset management and preservation are examined. A possible synthesis of the three is also proposed in order to enable the methodology to operate within a single schema, the Metadata Encoding and Transmission Standard (METS).
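The intermediary-schema idea, a constrained project-specific instance transformed into its referent schema, can be sketched in miniature. In practice the transformation would be an XSLT stylesheet and the output real namespaced METS; here both schemas are invented toys and the transform is plain Python:

```python
import xml.etree.ElementTree as ET

# Toy instance of a constrained, project-specific intermediary schema.
INTERMEDIARY = ('<object id="b123">'
                '<file href="page1.tif"/><file href="page2.tif"/>'
                '</object>')

def to_mets(intermediary_xml: str) -> ET.Element:
    """Transform an intermediary instance into a METS-like wrapper
    (simplified: no namespaces, headers or structMap)."""
    src = ET.fromstring(intermediary_xml)
    mets = ET.Element("mets", OBJID=src.get("id"))
    file_sec = ET.SubElement(mets, "fileSec")
    grp = ET.SubElement(file_sec, "fileGrp")
    for f in src.findall("file"):
        file_el = ET.SubElement(grp, "file")
        ET.SubElement(file_el, "FLocat", href=f.get("href"))
    return mets
```

The intermediary instance stays simple for project staff to author, while the transform produces the richer referent structure for dissemination, which is the division of labour the methodology proposes.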
