News on eLiteracias

Knowledge and Information Systems

Node classification across networks via category-level domain adaptive network embedding

Abstract

To improve the performance of classifying nodes on unlabeled or scarcely labeled networks, the task of node classification across networks has been proposed for transferring knowledge from similar networks with rich labels. Because the data distribution shifts across networks, domain adaptive network embedding has been proposed to overcome this challenge by learning network-invariant and discriminative node embeddings, applying domain adaptation techniques to network embedding to reduce domain discrepancy. However, existing works rarely discuss category-level domain discrepancy, which is crucial to better adaptation and classification. In this paper, we propose category-level domain adaptive network embedding. The key idea is to minimize intra-class domain discrepancy and maximize inter-class domain discrepancy between source and target networks simultaneously. To further enhance classification performance on the target network, we reduce embedding variation inside each class and enlarge it between different classes. A graph attention network is adopted for learning network embeddings. In addition, a novel pseudo-labeling strategy for the target network is developed to better compute category-level information. Theoretical analysis guarantees the effectiveness of our model. Furthermore, extensive experiments on real-world datasets show that our model achieves state-of-the-art performance, in particular outperforming existing domain adaptive network embedding models by up to 32%.

  • 1 December 2023, 00:00

Comment on “New cosine similarity and distance measures for Fermatean fuzzy sets and TOPSIS approach”

Abstract

In the above paper, Kirisci (Knowl Inform Syst 65:855–868, 2023) proposed new cosine similarity and distance measures for Fermatean fuzzy sets. In this comment, we point out some mistakes in the definitions. In addition, in light of the problem identified in the paper, we propose two improved cosine similarity measures that are superior and capable of solving the problem.

  • 1 December 2023, 00:00

Ontology extension with NLP-based concept extraction for domain experts in catalytic sciences

Abstract

Ontologies store semantic knowledge in a machine-readable way and represent domain knowledge in a controlled vocabulary. In this work, a workflow is set up to derive classes from a text dataset using natural language processing (NLP) methods. Furthermore, ontologies and thesauri are browsed for those classes, and corresponding existing textual definitions are extracted. A base ontology is selected to be extended with knowledge from catalysis science, while word similarity is used to introduce new classes to the ontology based on the class candidates. Relations are introduced to automatically reference them to already existing classes in the selected ontology. The workflow is conducted for a text dataset related to catalysis research on methanation of CO\(_2\) and seven semantic artifacts assisting ontology extension by domain experts. Undefined concepts and unstructured relations can be more easily introduced automatically into existing ontologies. Domain experts can then revise the resulting extended ontology by choosing the best fitting definition of a class and specifying suggested relations between concepts of catalyst research. A structured extension of ontologies supported by NLP methods is made possible to facilitate a Findable, Accessible, Interoperable, Reusable (FAIR) data management workflow.

  • 1 December 2023, 00:00

Entropic principal component analysis using Cauchy–Schwarz divergence

Abstract

Modern pattern recognition applications are frequently associated with high-dimensional datasets. In recent decades, different approaches have been proposed to address the curse of dimensionality present in this kind of data. Principal component analysis (PCA) is a classic method that is still widely used for this purpose. However, its procedure is based on the covariance matrix, which is built from the scalar product of feature vectors in a point-wise fashion. This makes it very sensitive to noise and outliers. This work presents a patch-based mapping to an entropic space. Given a data sample neighborhood, each feature value set is mapped to a univariate Gaussian distribution described by its parameters. Then, each scalar coordinate of a data sample is replaced by the parameter tuple that describes each feature. The difference between two data sample vectors in the entropic space is defined as the vector where each scalar coordinate is given by the stochastic divergence between two probability distributions. The covariance matrix is still defined by the scalar product between the difference vector of a data sample and the average sample, so it can be used transparently in the original PCA algorithm. This patch mapping in the entropic space aims to mitigate the effect of noise and outliers. Experiments adopting the Cauchy–Schwarz divergence show that the new framework can outperform several existing dimensionality reduction algorithms in cluster analysis tasks on multiple real datasets.

  • 1 December 2023, 00:00
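
As a minimal sketch related to the entropic PCA abstract above: when each feature neighborhood is summarized by a univariate Gaussian, the Cauchy–Schwarz divergence between two such Gaussians has the closed form \(D_{CS} = \frac{(\mu_1-\mu_2)^2}{2(\sigma_1^2+\sigma_2^2)} + \frac{1}{2}\log\frac{\sigma_1^2+\sigma_2^2}{2\sigma_1\sigma_2}\). The function names and sample values below are illustrative assumptions, not taken from the paper, and the paper's full patch mapping and PCA step are not reproduced here.

```python
import math
import statistics


def fit_gaussian(values):
    """Summarize a set of feature values by (mean, population std).

    Hypothetical helper: stands in for mapping one feature of a sample
    neighborhood to the parameters of a univariate Gaussian.
    """
    return statistics.fmean(values), statistics.pstdev(values)


def cs_divergence_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form Cauchy-Schwarz divergence between two univariate Gaussians.

    D_CS(p, q) = -log( int(p*q) / sqrt(int(p^2) * int(q^2)) ), which for
    N(mu1, sigma1^2) and N(mu2, sigma2^2) reduces to the two terms below.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Penalty for the distance between the means.
    mean_term = (mu1 - mu2) ** 2 / (2.0 * var_sum)
    # Penalty for a mismatch between the spreads (zero when sigma1 == sigma2).
    spread_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term


if __name__ == "__main__":
    # Two made-up neighborhoods of values for the same feature.
    p = fit_gaussian([1.0, 1.2, 0.9, 1.1])
    q = fit_gaussian([2.0, 2.5, 1.8, 2.2])
    print(cs_divergence_gaussian(*p, *q))
```

The divergence is symmetric and equals zero only when both Gaussians coincide, which is why it can serve as a per-coordinate "difference" in the sense the abstract describes. A neighborhood with zero variance would need separate handling (here it raises an error).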

Drug-CoV: a drug-origin knowledge graph discovering drug repurposing targeting COVID-19

Abstract

Drug repurposing is a technique for exploring new uses of existing medicines, but its traditional methods, such as computational approaches, can be time-consuming and laborious. Recently, knowledge graphs (KGs) have emerged as a powerful approach for graph-based representation in drug repurposing, encoding entities and relations to predict new connections and facilitate drug discovery. As COVID-19 has become a major public health concern, it is critical to establish an appropriate COVID-19 KG for drug repurposing to combat the spread of the virus. However, most publicly available COVID-19 KGs lack support for multi-relations and comprehensive entity types. Moreover, none of them originates from COVID-19-related drugs, making it challenging to identify effective treatments. To tackle these issues, we developed Drug-CoV, a drug-origin and multi-relational COVID-19 KG. We evaluated the quality of Drug-CoV by performing link prediction and comparing the results to another publicly available COVID-19 KG. Our results showed that Drug-CoV outperformed the comparison KG in predicting new links between entities. Overall, Drug-CoV represents a valuable resource for COVID-19 drug repurposing efforts and demonstrates the potential of KGs for facilitating drug discovery.

  • 1 December 2023, 00:00

A novel quasi-oppositional chaotic student psychology-based optimization algorithm for deciphering global complex optimization problems

Abstract

This work presents a novel quasi-oppositional chaotic student psychology-based optimization (QOCSPBO) algorithm for solving global optimization problems. To address the identified flaws of standard student psychology-based optimization (SPBO), the proposed QOCSPBO algorithm combines two search strategies within the standard SPBO framework. The results show that the proposed QOCSPBO algorithm outperforms SPBO and recently published algorithms in optimizing a set of well-known benchmark test functions. QOCSPBO also determines the optimal site and size of distributed generation and shunt capacitors in two radial distribution systems, considering different types of load models at three load levels. The results indicate that the proposed method is well suited to solving real-time power system optimization problems with constrained and unknown search spaces.

  • 1 December 2023, 00:00

Federated search techniques: an overview of the trends and state of the art

Abstract

Conventional search engines, such as Bing, Baidu, and Google, offer a convenient way for users to seek information on the web. However, despite all the benefits they provide, one major limitation is that a sizable portion of the information sources on the web may not be available to them for commercial or proprietary reasons. Federated search solves this problem by providing a single user interface through which multiple independent resources can be searched and their results combined for end users. Federated search has become a well-established research area, with many systems developed and algorithms proposed to deal with three major issues: resource description, resource selection, and results merging. This paper reviews state-of-the-art federated search techniques developed over the past three decades, with more attention to recent achievements. Both resource selection and results merging methods are categorized into three types: heuristic, machine learning-based, and other methods. Apart from the three major issues mentioned above, we also discuss systems and prototypes that have been developed, and datasets used for federated search experiments. Some other related issues, including retrieval evaluation, aggregated search, metasearch, and supporting personalization in federated search, are also covered. Finally, we conclude by discussing some directions for future research.

  • 1 December 2023, 00:00

Fuzzy twin support vector machine based on affinity and class probability for class imbalance learning

Abstract

Recently, a robust and efficient classifier termed affinity and class probability-based fuzzy support vector machine (ACFSVM) was proposed to address the binary class imbalance and noisy data classification problems. Despite the excellent generalization ability of ACFSVM, there is scope to improve its classification ability. To enhance the classification performance of ACFSVM, this work suggests a novel fuzzy twin support vector machine based on affinity and class probability (ACFTSVM). In ACFTSVM, regularization terms are added to the primal problems, which diminishes the negative influence of noise. The affinity (AF) of the majority (MJ) class datapoints is measured using the support vector data description model trained in kernel space using only the MJ class training samples. The k-nearest neighbour method is used to estimate the class probability (CP) of the MJ class datapoints in the same kernel space to reduce the potential influence of noise. Lower CP samples are prone to noise, and their contribution to learning is reduced by their low memberships, which are calculated by adding the AFs and the CPs. ACFTSVM, like ACFSVM, gives preference to MJ class datapoints with higher AFs and CPs, while minimizing the influence of minority class samples with lower AFs and CPs. As a result, the decision boundary is skewed towards the MJ class. Five artificially imbalanced datasets and a few notable real-world datasets are used in numerical simulations.

  • 1 December 2023, 00:00

Temporal word embedding with predictive capability

Abstract

Semantics in natural language processing is largely dependent on contextual relationships between words and entities in a document collection. The context of a word may evolve. For example, the word “apple” currently has two contexts: a fruit and a technology company. The changes in the context of words or entities in text data such as scientific publications and news articles can help us understand the evolution of innovation or events of interest. In this work, we present a new diffusion-based temporal word embedding model that can capture short- and long-term changes in the semantics of entities in different domains. Our model captures how the context of each entity shifts over time. Existing temporal word embeddings capture semantic evolution at a discrete/granular level, aiming to study how a language developed over a long period. Unlike existing temporal embedding methods, our approach provides temporally smooth embeddings, which facilitate prediction and trend analysis better than existing models do. Extensive evaluations demonstrate that our proposed temporal embedding model performs better than other existing models in sense-making and in predicting future relationships between words and entities.

  • 1 December 2023, 00:00

EcoLight+: a novel multi-modal data fusion for enhanced eco-friendly traffic signal control driven by urban traffic noise prediction

Abstract

Urban traffic congestion is a pressing concern for modern societies, driven by population and economic growth. It contributes to environmental problems such as increasing greenhouse gas emissions and noise pollution. Improved traffic flow in urban networks relies heavily on traffic signal control. Hence, optimizing cycle timing at many intersections is paramount to reducing congestion and increasing sustainability. This paper introduces an alternative to conventional traffic signal control, EcoLight+, which incorporates future noise predictions with the deep dueling Q-network reinforcement learning algorithm to reduce noise levels, CO\(_2\) emissions, and fuel consumption. An innovative data fusion approach is also proposed to improve our LSTM-based noise prediction model by integrating heterogeneous data from different sources. Based on real-world data from Tallinn, Estonia, our proposed solution achieves higher efficiency than its competitors.

  • 1 December 2023, 00:00

Fuzzy clustering analysis for the loan audit short texts

Abstract

In China, post-loan management is usually executed in the form of a visit survey conducted by a credit manager. Through quarterly visit surveys, a large number of loan audit short texts, which contain valuable information for evaluating the credit status of small and micro-enterprises, are collected. However, methods for analysing this type of short text remain lacking. This study proposes a method for processing short loan audit texts called fuzzy clustering analysis (FCA). This method first transforms short texts into a fuzzy matrix through lexical analysis; it then calculates the similarity between records based on each fuzzy matrix and constructs an association graph with this similarity. Finally, it uses a prism minimum spanning tree to extract clusters based on different \(\alpha\)-cuts. Experiments using actual data from a commercial bank in China revealed that FCA yields suitable clustering results when handling loan audit briefs. Moreover, it exhibited superior performance compared to BIRCH, k-means, and fuzzy c-means.

  • 1 December 2023, 00:00

Concise and interpretable multi-label rule sets

Abstract

Multi-label classification is becoming increasingly ubiquitous, but not much attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple “if-then” rules, and thus, it offers better interpretability compared to black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, while existing rule-based classifiers are myopic and wasteful in searching rules, requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules to maximize a target function, which considers not only discrimination ability with respect to labels, but also diversity. Accounting for diversity helps to avoid redundancy, and thus, to control the number of rules in the solution set. To tackle the said maximization problem, we propose a 2-approximation algorithm, which circumvents the exponential-size search space of rules using a novel technique to sample highly discriminative and diverse rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation and a case study, which indicate that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work.

  • 1 December 2023, 00:00

A representation learning model based on stochastic perturbation and homophily constraint

Abstract

The network representation learning task of fusing node multi-dimensional classification information aims to effectively combine node multi-dimensional classification information and network structure information for representation learning, thereby improving the performance of network representation. However, existing methods treat multi-dimensional classification information only as prior features that assist the representation learning of network structure information; they lack a coping mechanism for missing data and have low robustness when information is incomplete. To address these issues, in this paper, we propose a representation learning model based on stochastic perturbation and homophily constraint, called IMCIN. On the one hand, data transformation is carried out through a random perturbation strategy to improve the adaptability of the model to incomplete information. On the other hand, in the process of learning fusion representation vectors, an attribute similarity retention method based on the principle of homophily is designed to further mine the effective semantic information in the incomplete information. Experiments show that our method can effectively deal with the problem of incomplete information and improve the performance of node classification and link prediction tasks.

  • 1 December 2023, 00:00

STORM-GAN+: spatio-temporal meta-GAN for cross-city estimation of heterogeneous human mobility responses to COVID-19

Abstract

Estimating human mobility is essential during the COVID-19 pandemic because it provides policymakers with important information for non-pharmaceutical actions. Deep learning methods perform better than traditional estimating techniques on tasks with enough training data. However, estimating human mobility during the rapidly developing pandemic is challenging because of data non-stationarity, a lack of observations, and complicated social situations. Prior studies on estimating mobility either concentrate on a single city or cannot represent the spatio-temporal relationships across cities and time periods. To address these issues, we solve the cross-city human mobility estimation problem using a deep meta-generative framework. Recently, we proposed the Spatio-Temporal Meta-Generative Adversarial Network (STORM-GAN) model, which estimates dynamic human mobility responses under social and policy conditions relevant to COVID-19 and is facilitated by a novel spatio-temporal task-based graph (STTG) embedding. Although STORM-GAN achieves good average estimation accuracy, it produces higher errors and exhibits overfitting in particular cities due to spatial heterogeneity. To address these issues, in this paper, we extend our prior work by introducing an improved spatio-temporal deep generative model, namely STORM-GAN+. STORM-GAN+ deals with these difficulties by incorporating a distance-based weighted training technique into the STTG embedding component to better represent the variety of knowledge transfer across cities. Furthermore, to mitigate overfitting, we modify the meta-learning training objective to teach estimated mobility. Finally, we propose a conditional meta-learning algorithm that explicitly tailors transferable knowledge to various task clusters. We perform comprehensive evaluations, and STORM-GAN+ approximates real-world human mobility responses more accurately than previous methods, including STORM-GAN.

  • 1 November 2023, 00:00

Continuous prediction of a time intervals-related pattern’s completion

Abstract

In many everyday applications, such as meteorology or patient data, the starting and ending times of events are stored in a database, resulting in time interval data. Mining time interval data can reveal informative patterns in which the time intervals are related by temporal relations, such as before or overlaps. When multiple temporal variables are sampled in a variety of forms and frequencies, alongside irregular events that may or may not have a duration, time intervals patterns can be a powerful way to discover temporal knowledge, since these temporal variables can be transformed into a uniform format of time intervals. Predicting the completion of such patterns is useful when the pattern ends with an event of interest, such as the recovery of a patient, or an undesirable event, such as a medical complication. In recent years, an increasing number of studies have been published on time intervals-related patterns (TIRPs), their discovery, and their use as features for classification. However, as far as we know, no study has investigated the prediction of the completion of a TIRP. The main challenge in such a completion prediction arises when time intervals coincide and have not yet finished, which introduces uncertainty in the evolving temporal relations and thus in the TIRP’s evolution process. To overcome this challenge, we propose a new structure to represent the TIRP’s evolution process and calculate the TIRP’s completion probabilities over time. We introduce two continuous prediction models (CPMs), the segmented continuous prediction model (SCPM) and the fully continuous prediction model (FCPM), to estimate the TIRP’s completion probability. With the SCPM, the TIRP’s completion probability changes only at the starting or ending points of the TIRP’s time intervals. The FCPM additionally incorporates the duration between the starting and ending time points of the TIRP’s time intervals.
A rigorous evaluation on four real-life medical and non-medical datasets was performed. The FCPM outperformed the SCPM and the baseline models (random forest, artificial neural network, and recurrent neural network) on all datasets. However, there is a trade-off between prediction performance and earliness, since new starting and ending time points of the TIRP’s time intervals are revealed over time, which increases the CPMs’ prediction performance.

  • 1 November 2023, 00:00

An efficient deep learning framework for occlusion face prediction system

Abstract

Face detection, prediction, and tracking technology is a critical research direction for target tracking and for identifying criminal activities. However, crime detection in a surveillance system is complex to use. Moreover, the preprocessing layer is time-consuming and requires high-quality data. This research designed a novel Crow Search-based Recurrent Neural Scheme to enhance the prediction performance for occluded faces and improve classification results. The developed model was implemented in Python, and the online COFW dataset was collected and used to train the system. Furthermore, the Crow Search fitness function is used to improve prediction accuracy and classify persons accurately. The designed optimization technique tracks and searches the person's location and predicts occluded faces using labels. Finally, experimental outcomes of the developed model show better performance in predicting occluded faces, and the results are validated against prevailing models. The designed model achieved 98.75% accuracy, 99% recall, and 98.56% precision in predicting occluded faces. This shows the efficiency of the developed model, which attains better performance than the other models compared.

  • 1 November 2023, 00:00

A new type of cosine similarity measures based on intuitionistic hesitant fuzzy rough sets for the evaluation of volatile currency: evidence from the Pakistan economy

Abstract

This article proposes novel cosine and weighted cosine similarity measures based on intuitionistic hesitant fuzzy rough sets and examines their fundamental characteristics. Similarity measures are crucial and advantageous tools that have a broad range of applications in decision making, data mining, medical diagnosis, and pattern recognition. To demonstrate the validity of the proposed similarity measures and verify the efficacy of our approach, an illustrative example on the evaluation of volatile currency in Pakistan is presented. Additionally, the rankings produced by the suggested similarity measures are compared to those identified in the literature. The findings demonstrate that the new similarity measures lead to consistent ranking patterns. The comparison confirms that the suggested similarity measures may achieve precise classification results and are applicable to real-world challenges involving hesitancy and uncertainty.

  • 1 November 2023, 00:00

Near-optimal Steiner tree computation powered by node embeddings

Abstract

The Steiner minimum tree problem on a graph, i.e., finding a tree with the minimum weight that covers the set of terminal nodes, is a classical NP-hard problem. In this paper, we therefore develop a method based on supervised learning to produce a near-optimal Steiner tree. It contains three main phases, namely node embedding, candidate set generation, and tree construction. Leveraging compressed sensing, we devise a novel node embedding that is well suited to reversing sparse linear aggregations, which powers learning a mapping function from the terminal set to the optimal Steiner tree. Finally, we propose efficient pruning techniques to improve the solution quality. The experimental results show that our approach delivers high-quality solutions and runs faster than the competitors by one or two orders of magnitude on graphs with more than 200 nodes.

  • 1 November 2023, 00:00

GWNN-HF: beyond assortativity in graph wavelet neural network

Abstract

Graph wavelet neural networks show a powerful learning ability in assortative networks, where most adjacent nodes have the same label as the target node. However, they do not perform well in disassortative networks, where most adjacent nodes have a different label from the target node. As a result, graph wavelet neural networks cannot extract the most useful information across different types of networks. On the one hand, they cannot flexibly extract the similarity information of same-label neighbor nodes and the difference information of different-label neighbor nodes. On the other hand, they aggregate only neighbor nodes, so they cannot obtain information from nodes that have features similar to the target node but are far from it. To solve these problems, we propose the GWNN-HF model, which can effectively adapt to different types of networks and obtain a better node representation. First, we design low-pass and high-pass filter convolution kernels to obtain low-pass and high-pass signals, and then use an adaptive fusion method to fuse them, which effectively captures the commonality of same-label nodes and the difference between different-label nodes. Second, we use the Relaxed Minimum Spanning Tree algorithm to construct a feature correlation graph and use an attention mechanism to fuse the representations of the original graph and the feature correlation graph. Extensive experiments on benchmark datasets indicate that GWNN-HF performs well on different types of network structures.

  • 1 November 2023, 00:00

Leveraging BERT for extractive text summarization on federal police documents

Abstract

A document known as notitia criminis (NC) is used by the Brazilian Federal Police as the starting point of a criminal investigation. An NC aims to report a summary of investigative activities. Thus, it contains all relevant information about a supposed crime that occurred. To manage an inquiry and correlate similar investigations, the Federal Police usually needs to extract essential information from an NC document. Manual extraction (reading and understanding the entire content) can be mentally exhausting, due to the size and complexity of the documents. In this light, natural language processing (NLP) techniques are commonly used for automatic information extraction from textual documents. Deep neural networks have been successfully applied to many different NLP tasks. One neural network model that improved the results in a wide range of NLP tasks is BERT (Bidirectional Encoder Representations from Transformers). In this article, we propose approaches based on the BERT model to extract relevant information from textual documents using automatic text summarization techniques. In other words, we aim to analyze the feasibility of using the BERT model to extract and synthesize the most essential information of an NC document. We evaluate the performance of the proposed approaches using two real-world datasets: the Federal Police dataset (a private domain dataset) and the Brazilian WikiHow dataset (a public domain dataset). Experimental results using different variants of the ROUGE metric show that our approaches can significantly increase extractive text summarization effectiveness without sacrificing efficiency.

  • 1 November 2023, 00:00

A two-way dense feature pyramid networks for object detection of remote sensing images

Abstract

The bird’s-eye view, multiple scales, and dense classes of remote sensing images make object detection in such images challenging. Directly applying object detection methods designed for natural scene images to remote sensing images is not satisfactory. In this paper, we propose a detector with enhanced feature extraction ability, namely TWDFPN, to address these challenges. TWDFPN uses a two-way feature pyramid network (TWFPN) structure that combines feature maps with different generation directions and different spatial resolutions, which not only improves the utilization of the underlying feature information but also strengthens the reuse of the backbone network’s feature information, and ultimately improves the feature extraction ability of the network. Meanwhile, a densely connected module is used in TWFPN to enhance the feature representation ability at limited additional computation cost, which extends and deepens the network. To evaluate the effectiveness of the proposed algorithm, we carried out experiments on the NWPU VHR-10 and RSOD public remote sensing datasets, obtaining a mean average precision (mAP) of 92.98% and 96.16%, respectively, which is an advanced level of performance.

  • 1 November 2023, 00:00

A hybrid clustering approach for link prediction in heterogeneous information networks

Abstract

In recent years, researchers from academic and industrial fields have become increasingly interested in social network data to extract meaningful information. This information is used in applications such as link prediction between people groups, community detection, protein module identification, etc. Therefore, the clustering technique has emerged as a solution to finding similarities between social network members. Recently, in most graph clustering solutions, the structural similarity of nodes is combined with their attribute similarity. The results of these solutions indicate that the graph's topological structure is more important. Since most social networks are sparse, these solutions often suffer from insufficient use of node features. This paper proposes a hybrid clustering approach as an application for link prediction in heterogeneous information networks (HINs). In our approach, an adjacency vector is determined for each node until, in this vector, the weight of the direct edge or the weight of the shortest communication path among every pair of nodes is considered. A similarity metric is presented that calculates similarity using the direct edge weight between two nodes and the correlation between their adjacency vectors. Finally, we evaluated the effectiveness of our proposed method using DBLP, Political blogs, and Citeseer datasets under entropy, density, purity, and execution time metrics. The simulation results demonstrate that while maintaining the cluster density significantly reduces the entropy and the execution time compared with the other methods.

  • 1 November 2023, 00:00

A robust adaptive linear regression method for severe noise

Abstract

Inaccurate supervision caused by label noise remains a major challenge for regression modeling. Regularized noise-robust models provide a valid way of dealing with label noise in regression tasks: they generally use robust losses to cope with the noise and further enhance model robustness through feature selection. However, most of them may not work well on data sets contaminated by severe noise (noise of extreme magnitude), because such noise violates their noise assumptions. To address this concern, this paper proposes a robust adaptive linear regression method named TC-ALASSO (Truncated Cauchy Adaptive LASSO), in which model learning and feature selection are performed simultaneously. The fat-tailed Cauchy distribution and truncation theory are adopted to deal with moderate noise and identified extreme noise, respectively, and together yield the truncated Cauchy loss for regression tasks. Moreover, TC-ALASSO applies an adaptive regularizer to perform feature selection, with the regularizer weights obtained from regression coefficient estimates under the truncated Cauchy loss. We also theoretically analyze the robustness of TC-ALASSO. Experimental results on artificial and benchmark data sets confirm the robustness and effectiveness of TC-ALASSO. In addition, experimental results on face recognition databases validate the performance advantage of TC-ALASSO over state-of-the-art methods in dealing with extreme illumination variations.
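
To make the idea of a truncated Cauchy loss with an adaptive penalty more concrete, here is a minimal sketch. It is not the paper's definition: the abstract gives no formulas, so the sketch assumes the common Cauchy loss log(1 + (r/c)^2), caps it at a hypothetical residual `cutoff`, and uses the standard adaptive LASSO weighting 1/|b|^gamma. The parameter names `scale`, `cutoff`, `gamma`, `eps`, and `lam` are all illustrative.

```python
import math


def truncated_cauchy_loss(residual, scale=1.0, cutoff=3.0):
    """Cauchy loss log(1 + (r/scale)^2), held constant beyond |r| = cutoff.

    Assumed form: residuals larger than the cutoff (treated as extreme
    noise) contribute no more than a residual exactly at the cutoff.
    """
    capped = min(abs(residual), cutoff)
    return math.log1p((capped / scale) ** 2)


def adaptive_weights(initial_coefs, gamma=1.0, eps=1e-8):
    """Standard adaptive LASSO weights: small initial coefficients get
    large penalties. eps guards against division by zero."""
    return [1.0 / (abs(b) + eps) ** gamma for b in initial_coefs]


def objective(coefs, X, y, weights, lam=0.1, scale=1.0, cutoff=3.0):
    """Robust data term plus weighted L1 penalty (illustrative objective)."""
    data_term = 0.0
    for row, target in zip(X, y):
        prediction = sum(c * x for c, x in zip(coefs, row))
        data_term += truncated_cauchy_loss(target - prediction, scale, cutoff)
    penalty = lam * sum(w * abs(c) for w, c in zip(weights, coefs))
    return data_term + penalty


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    y = [2.0, 0.1, 500.0]  # the last target is an extreme outlier
    w = adaptive_weights([2.0, 0.1])  # weights from a hypothetical initial fit
    print(objective([2.0, 0.0], X, y, w))
```

Because the loss is flat beyond the cutoff, the outlier in the usage example cannot dominate the objective, which is the behaviour the abstract attributes to truncation.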

  • 1 November 2023, 00:00

Revisiting of peer-to-peer traffic: taxonomy, applications, identification techniques, new trends and challenges

Abstract

Services provided through peer-to-peer (P2P) architecture involve the transmission of text, images, documents, and multimedia. The distribution of multimedia content such as video and audio is in particularly high demand among clients and has become the major source of traffic, consuming significant bandwidth. This traffic is mostly generated by P2P applications such as Napster, Gnutella, BitTorrent, PPTV, YuppTV, and many more. To use network bandwidth efficiently, classification and identification of this Internet traffic have become necessary. Moreover, the traffic of specific P2P applications needs to be classified so that data distribution over the P2P network can be improved. This survey discusses how the different P2P applications that generate this traffic work and raises related issues. It examines the various techniques and overlays used to provide services over the P2P network, and covers the feature selection techniques and machine learning algorithms used for the identification and classification of Internet traffic. It also reviews recent developments and highlights future directions for research in P2P networks.

  • 1 November 2023, 00:00

Seq2EG: a novel and effective event graph parsing approach for event extraction

Abstract

Event extraction is a fundamental task in information extraction. Most previous approaches split event extraction into two subtasks, trigger classification and argument classification, and solve them with classification-based methods, which suffer from some inherent drawbacks. To overcome these issues, we propose Seq2EG, a novel event extraction model that first formulates event extraction as an event graph parsing problem and then exploits a pre-trained sequence-to-sequence (seq2seq) model to transduce an input sentence into an accurate event graph without the need for trigger words. Based on this generative event graph parsing formulation, Seq2EG can explicitly model correlations among multiple events and argument sharing, and can naturally incorporate graph-structured features and the rich semantic information conveyed by the labels of event types and argument roles. Extensive experiments on the public ACE2005 dataset show that our approach outperforms all previous state-of-the-art models for event extraction by a large margin, obtaining an improvement of 3.4% in F1 score for event detection and of 4.7% in F1 score for argument classification over the best baselines.

  • 1 October 2023, 00:00