News on eLiteracias

✇ Knowledge and Information Systems

Using topic-noise models to generate domain-specific topics across data sources

1 May 2023, 00:00

Abstract

Domain-specific document collections, such as data sets about the COVID-19 pandemic, politics, and sports, have become more common as platforms grow and develop better ways to connect people whose interests align. These data sets come from many different sources, ranging from traditional sources like open-ended surveys and newspaper articles to one of the dozens of online social media platforms. Most topic models are equipped to generate topics from one or more of these data sources, but models rarely work well across all types of documents. The main problem that many models face is the varying noise levels inherent in different types of documents. We propose topic-noise models, a new type of topic model that jointly models topic and noise distributions to produce a more accurate, flexible representation of documents regardless of their origin and varying qualities. Our topic-noise model, Topic Noise Discriminator (TND), approximates topic and noise distributions side-by-side with the help of word embedding spaces. While topic-noise models are important for the types of short, noisy documents that often originate on social media platforms, TND can also be used with more traditional data sources like newspapers. TND itself generates a noise distribution that, when ensembled with other generative topic models, can produce more coherent and diverse topic sets. We show the effectiveness of this approach using Latent Dirichlet Allocation (LDA), and demonstrate the ability of TND to improve the quality of LDA topics in noisy document collections. Finally, researchers are beginning to generate topics using multiple sources and finding that they need a way to identify a core set based on text from different sources. We propose using cross-source topic blending (CSTB), an approach that maps topic sets to an s-partite graph and identifies core topics that blend topics from across s sources by identifying subgraphs with certain linkage properties.
We demonstrate the effectiveness of topic-noise models and CSTB empirically on large real-world data sets from multiple domains and data sources.

✇ Knowledge and Information Systems

Falcon: lightweight and accurate convolution based on depthwise separable convolution

1 May 2023, 00:00

Abstract

How can we efficiently compress convolutional neural networks (CNNs) using depthwise separable convolution, while retaining their accuracy on classification tasks? Depthwise separable convolution, which replaces a standard convolution with a depthwise convolution and a pointwise convolution, has been used for building lightweight architectures. However, previous works based on depthwise separable convolution are limited when compressing a trained CNN model since (1) they are mostly heuristic approaches without a precise understanding of their relations to standard convolution, and (2) their accuracies do not match that of the standard convolution. In this paper, we propose Falcon, an accurate and lightweight method to compress CNNs based on depthwise separable convolution. Falcon uses generalized elementwise product (GEP), our proposed mathematical formulation to approximate the standard convolution kernel, to interpret existing convolution methods based on depthwise separable convolution. By exploiting the knowledge of a trained standard model and carefully determining the order of depthwise separable convolution via GEP, Falcon achieves accuracy close to that of the trained standard model. Furthermore, this interpretation leads to a generalized version, rank-k Falcon, which performs k independent Falcon operations and sums up the results. Experiments show that Falcon (1) provides higher accuracy than existing methods based on depthwise separable convolution and tensor decomposition and (2) reduces the number of parameters and FLOPs of standard convolution by up to a factor of 8 while ensuring similar accuracy. We also demonstrate that rank-k Falcon further improves the accuracy while sacrificing a bit of compression and computation reduction rates.
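
To make the compression argument concrete, the following is a minimal sketch of the textbook parameter counts for a standard convolution versus a plain depthwise separable convolution. It is not Falcon's GEP-based layer; the function names are hypothetical, and biases are ignored by assumption.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out


def compression_ratio(c_in, c_out, k):
    """How many times fewer weights the separable form needs."""
    return standard_conv_params(c_in, c_out, k) / depthwise_separable_params(c_in, c_out, k)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    for channels in (64, 256, 512):
        ratio = compression_ratio(channels, channels, 3)
        print(f"{channels} channels, 3x3 kernel: {ratio:.2f}x fewer weights")
```

For a 3 x 3 kernel the ratio simplifies to 9·c_out / (9 + c_out), so it approaches 9 as the number of output channels grows. Whole-network figures, such as those reported in the abstract, also depend on which layers are replaced.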

✇ Knowledge and Information Systems

A jointly non-cooperative game-based offloading and dynamic service migration approach in mobile edge computing

1 May 2023, 00:00

Abstract

With the increase in the use of compute-intensive applications, the demand to continuously boost the efficiency of data processing increases. Offloading compute-intensive application tasks to edge servers can effectively solve problems for resource-constrained mobile devices. However, computation offloading may increase network load and transmission delay, which will influence the user experience. In addition, because of user mobility, the constantly changing distance between the local device and the edge server can also affect service quality. This paper proposes offloading and service migration methods for compute-intensive applications to deal with these issues. First, a fine-grained computation offloading algorithm based on a non-cooperative game is proposed. The overhead on both the local side and the edge side is analyzed. Moreover, service migration path selection based on the Markov decision process is proposed by considering user mobility, energy cost, migration cost, available storage, and bandwidth. The optimal service migration path is selected according to the Markov decision process, which can improve service quality. Experimental results show that the proposed offloading strategy reduces energy consumption by more than 10% and latency by more than 6.2% compared with other baseline algorithms. It saves mobile device energy and reduces task response time, saving over 10% of time and energy consumption compared to similar algorithms. The proposed service migration scheme can reduce migration times and maintain a success rate of more than 90% while guaranteeing service continuity in a multi-user scenario.

✇ Knowledge and Information Systems

Group decision-making with interval multiplicative preference relations

1 May 2023, 00:00

Abstract

This paper discusses group decision-making (GDM) with interval multiplicative preference relations (IMPRs) based on geometric consistency. We propose a logarithmically geometric compatibility degree between two IMPRs and then define a geometrically logarithmic consistency index of IMPRs. The new consistency index of IMPRs is invariant under permutation of alternatives and transposition of IMPRs. Using statistical theory, thresholds of the geometrically logarithmic consistency index are provided. For an unacceptably consistent IMPR, an interactive iterative algorithm is designed to improve its consistency level. Using the relationship between an interval weight vector (IWV) and an IMPR, a fuzzy programming model is established to derive an IWV. This model is converted into a linear programming model for resolution. Subsequently, a new individual decision-making (IDM) method with an IMPR is put forward. By minimizing the logarithmically geometric compatibility degree between each individual IMPR and the collective one, a convex programming model is built to determine experts’ weights. Consequently, a novel GDM method with IMPRs is presented. Numerical examples and simulation experiments are conducted to reveal the superiority of the proposed IDM method and GDM method.

✇ Knowledge and Information Systems

COOT optimization algorithm on training artificial neural networks

1 August 2023, 00:00

Abstract

In recent years, significant advancements have been made in artificial neural network models and they have been applied to a variety of real-world problems. However, one of the limitations of artificial neural networks is that they can get stuck in local minima during the training phase, which is a consequence of their use of gradient descent-based techniques. This negatively impacts the generalization performance of the network. This study proposes a new hybrid artificial neural network model called COOT-ANN, which is the first to use the coot optimization algorithm, a metaheuristic-based approach, to optimize the parameters of artificial neural networks. The COOT-ANN model does not get stuck in local minima during the training phase due to the use of a metaheuristic-based optimization algorithm. The results of the study demonstrate that the proposed method is quite successful in terms of accuracy, cross-entropy, F1-score, and Cohen’s Kappa metrics when compared to gradient descent, scaled conjugate gradient, and Levenberg–Marquardt optimization techniques.

✇ Knowledge and Information Systems

Range-constrained probabilistic mutual furthest neighbor queries in uncertain databases

1 June 2023, 00:00

Abstract

For decades, query processing over uncertain databases has received much attention from the database community due to the pervasive data uncertainty in many real-world applications such as location-based services (LBS), sensor networks, business planning, biological databases, and so on. In this paper, we will study a novel query type, namely range-constrained probabilistic mutual furthest neighbor query (PMFN), over uncertain databases. PMFN retrieves a set of object pairs, \((o_i, o_j)\), within a given query range \(Q\), such that uncertain objects \(o_i\) and \(o_j\) are furthest neighbors of each other with high probabilities. In order to efficiently tackle the PMFN problem, we propose effective pruning methods, range, convex hull, and hypersphere pruning, for filtering out uncertain objects that can never appear in the PMFN answer set. Then, we also design spatial and probabilistic pruning methods to rule out false alarms of PMFN candidate pairs. Finally, we utilize a variant of the R\(^*\)-tree to integrate our proposed pruning methods and efficiently process ad hoc PMFN queries. Extensive experiments show the efficiency and effectiveness of our pruning techniques and PMFN query processing algorithms over real and synthetic data sets.
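
The query semantics can be illustrated with a deliberately simplified sketch. It assumes certain (non-probabilistic) points, an axis-aligned rectangular query range, and furthest neighbors computed only among the points inside that range; it uses brute force rather than the pruning methods or R\(^*\)-tree index described in the abstract. All names are hypothetical.

```python
from math import dist


def in_range(point, lower, upper):
    """True if the point lies inside the axis-aligned box [lower, upper]."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))


def mutual_furthest_pairs(points, lower, upper):
    """Return index pairs (i, j), i < j, of points inside the query box
    that are each other's furthest neighbor among the in-range points."""
    candidates = [i for i, p in enumerate(points) if in_range(p, lower, upper)]
    if len(candidates) < 2:
        return []

    def furthest(i):
        # Brute-force scan; ties are broken by the lowest index.
        others = (j for j in candidates if j != i)
        return max(others, key=lambda j: dist(points[i], points[j]))

    furthest_of = {i: furthest(i) for i in candidates}
    return sorted(
        (i, j) for i, j in furthest_of.items() if i < j and furthest_of[j] == i
    )


if __name__ == "__main__":
    pts = [(0, 0), (10, 0), (5, 1), (100, 100)]
    print(mutual_furthest_pairs(pts, (0, 0), (20, 20)))
```

In the probabilistic setting each object would be a distribution over locations, and a pair would qualify only if the probability of being mutual furthest neighbors exceeds a threshold.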

✇ Knowledge and Information Systems

Early portfolio pruning: a scalable approach to hybrid portfolio selection

1 June 2023, 00:00

Abstract

Driving the decisions of stock market investors is among the most challenging financial research problems. Markowitz’s approach to portfolio selection models stock profitability and risk level through a mean–variance model, which involves estimating a very large number of parameters. In addition to requiring considerable computational effort, this raises serious concerns about the reliability of the model in real-world scenarios. This paper presents a hybrid approach that combines itemset extraction with portfolio selection. We propose to adapt Markowitz’s model logic to deal with sets of candidate portfolios rather than with single stocks. We overcome some of the known issues of the Markowitz model as follows: (i) Complexity: we reduce the model complexity, in terms of parameter estimation, by studying the interactions among stocks within a shortlist of candidate stock portfolios previously selected by an itemset mining algorithm. (ii) Portfolio-level constraints: we not only perform stock-level selection, but also support the enforcement of arbitrary constraints at the portfolio level, including the properties of diversification and the fundamental indicators. (iii) Usability: we simplify the decision-maker’s work by proposing a decision support system that enables flexible use of domain knowledge and human-in-the-loop feedback. The experimental results, achieved on the US stock market, confirm the proposed approach’s flexibility, effectiveness, and scalability.

✇ Knowledge and Information Systems

Garden: a real-time processing framework for continuous top-k trajectory similarity search

1 September 2023, 00:00

Abstract

Continuous top-k trajectory similarity search (CkSearch) is now commonly required in real-time large-scale trajectory analysis, enabling distributed stream processing engines to discover various dynamic patterns. As a fundamental operator, CkSearch empowers various applications, e.g., contact tracing during an outbreak and smart transportation. Although extensive efforts have been made to improve the efficiency of non-continuous top-k search, they do not consider the dynamic capability of indexing (R1) and the incremental capability of computing (R2). Therefore, in this paper, we propose a generic CkSearch-oriented framework for distributed real-time trajectory stream processing on Apache Flink, termed Garden. To answer R1, we design a sophisticated distributed dynamic spatial index called Y-index, which consists of a real-time load scheduler and a two-layer indexing structure. To answer R2, we introduce a state reusing mechanism and index-based pruning methods that significantly reduce the computational cost. Empirical studies on real-world data validate the usefulness of our proposal and demonstrate the large advantage of our approach over state-of-the-art solutions in the literature.

✇ Knowledge and Information Systems

Migrating federated learning to centralized learning with the leverage of unlabeled data

1 September 2023, 00:00

Abstract

Federated learning carries out cooperative training without local data sharing; the obtained global model generally performs better than independent local models. Because no local data are shared, federated learning preserves the privacy of local users. However, the performance of the global model might be degraded if diverse clients hold non-IID training data. This is because the different distributions of local data lead to weight divergence of local models. In this paper, we introduce a novel teacher–student framework to alleviate the negative impact of non-IID data. On the one hand, we maintain the privacy-preserving advantage of federated learning; on the other hand, we take advantage of the accuracy of centralized learning. We use unlabeled data and global models as teachers to generate a pseudo-labeled dataset, which can significantly improve the performance of the global model. At the same time, the global model as a teacher provides more accurate pseudo-labels. In addition, we perform a model rollback to mitigate the impact of latent noisy labels and data imbalance in the pseudo-labeled dataset. Extensive experiments have verified that our teacher ensemble yields more robust training. The empirical study verifies that relying on the centralized pseudo-labeled data makes the global model almost immune to non-IID data.

✇ Knowledge and Information Systems

A novel scheme to detect the best cloud service provider using logarithmic operational law in generalized spherical fuzzy environment

1 September 2023, 00:00

Abstract

The generalized spherical fuzzy number (GSFN) is an extension of the spherical fuzzy number (SFN) that deals with the uncertainties involved in real-life problems in a much better way than other fuzzy numbers. So far, some fundamental operational laws of GSFNs have been characterized, but not the logarithmic operation. In this manuscript, we define and discuss various algebraic properties of the logarithmic operational law (LOL) for GSFNs, where the logarithmic base \(\delta\) is a positive real number. Moreover, we develop weighted averaging and weighted geometric aggregation operators and use them to construct a multi-criteria group decision making (MCGDM) technique in the generalized spherical fuzzy (GSF) environment, which is used to solve a problem of cloud service management. We show the utility and reliability of the proposed MCGDM technique through sensitivity analysis. Finally, a comparative study using a real data set is presented to justify the rationality and efficiency of our proposed method relative to existing methods.

✇ Knowledge and Information Systems

FoodRecNet: a comprehensively personalized food recommender system using deep neural networks

1 September 2023, 00:00

Abstract

Today, the huge variety of foods and the existence of different food preferences among people have made it difficult to choose the right food according to people's food preferences for different meals. Also, achieving a pleasant balance between users’ food preferences and health requirements, considering the physical condition, diseases/allergies of users, and having a suitable dietary diversity, has become a requirement in the field of nutrition. Therefore, the need for an intelligent system to recommend and choose the proper food based on these criteria is felt more and more. In this paper, a deep learning-based food recommender system, termed “FoodRecNet”, is presented using a comprehensive set of characteristics and features of users and foods, including users’ long-term and short-term preferences, users’ health conditions, demographic information, culture, religion, food ingredients, type of cooking, food category, food tags, diet, allergies, text description, and even the images of the foods. The appropriate combination of features used in the proposed system has been identified based on detailed investigations conducted in this research. In order to achieve a desired architecture of the deep artificial neural network for our purpose, five different architectures are designed and evaluated, considering the specific characteristics of the intended application. In addition, to evaluate FoodRecNet, this work constructs a large-scale annotated dataset, consisting of 3,335,492 records of food information, users and their scores, and 54,554 food images. The experiments conducted on this dataset and the “FOOD.COM” benchmark dataset confirm the effectiveness of the combination of features used in FoodRecNet. The RMSE rates obtained by FoodRecNet on these two datasets are 0.7167 and 0.4930, respectively, which are much better than those of competitors.
All the implementation source codes and datasets of this research are made publicly available at https://github.com/saeedhamdollahi/FoodRecNet.

✇ Knowledge and Information Systems

Knowledge-based system for three-way decision-making under uncertainty

1 September 2023, 00:00

Abstract

Knowledge-based systems developed based on Dempster–Shafer theory and prospect theory enhance decision-making under uncertainty. But at times, the traditional two-way decision approach may not be able to suggest a suitable decision confidently. This work proposes a three-way decision support system which divides the alternatives into three disjoint sets. A nonparametric Gaussian kernel and mid-range values are used to compute basic probabilities and reference points, respectively. The difference between basic probabilities and reference points is considered for assigning gain–loss values based on the value function from prospect theory. Ten publicly available benchmark data sets are considered, and the effectiveness of the proposed system is affirmed by comparing its performance with traditional machine learning models and other relevant decision-making systems in the literature. A case study related to evaluation of candidates is included, and it is also compared with other reference point estimation methods. From the results, it can be inferred that considering mid-range values as reference generates a preference order that is intuitive and compliable.
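
The gain–loss step can be sketched as follows, assuming the standard Tversky–Kahneman value function with its commonly cited parameters (α = β = 0.88, λ = 2.25) and a mid-range reference point computed as (min + max) / 2. The abstract does not state the parameters or the exact procedure, so the function names and defaults here are illustrative assumptions, not the authors' implementation.

```python
def mid_range(values):
    """Mid-range reference point: halfway between the smallest and largest value."""
    return (min(values) + max(values)) / 2.0


def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value function: concave for gains, convex and
    steeper (loss aversion factor lam) for losses."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)


def gain_loss(probabilities):
    """Gain-loss value of each basic probability relative to the
    mid-range reference point (assumed procedure)."""
    reference = mid_range(probabilities)
    return [prospect_value(p - reference) for p in probabilities]


if __name__ == "__main__":
    # Hypothetical basic probabilities for three alternatives.
    for p, v in zip([0.2, 0.5, 0.8], gain_loss([0.2, 0.5, 0.8])):
        print(f"probability {p:.1f} -> gain-loss value {v:+.3f}")
```

Because λ > 1, an alternative that falls below the reference point is penalized more heavily than an equally distant alternative above it is rewarded.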

✇ Knowledge and Information Systems

Tracking social provenance in chains of retweets

1 October 2023, 00:00

Abstract

In the era of massive sharing of information, the term social provenance is used to denote the ownership, source or origin of a piece of information which has been propagated through social media. Tracking the provenance of information is becoming increasingly important as social platforms acquire more relevance as sources of news. In this scenario, Twitter is considered one of the most important social networks for information sharing and dissemination, which can be accelerated through the use of retweets and quotes. However, the Twitter API does not provide complete tracking of retweet chains, since only the connection between a retweet and the original post is stored, while all the intermediate connections are lost. This can limit the ability to track the diffusion of information and to estimate the importance of specific users, who can rapidly become influencers, in news dissemination. This paper proposes an innovative approach for rebuilding the possible chains of retweets and for estimating the contribution of each user to the information spread. For this purpose, we define the concept of Provenance Constraint Network and a modified version of the Path Consistency Algorithm. An application of the proposed technique to a real-world dataset is presented at the end of the paper.

✇ Knowledge and Information Systems

Exploiting spatial relations for grammar-based specification of multidimensional languages

1 October 2023, 00:00

Abstract

Unlike those for textual programming languages, compiler construction paradigms for multidimensional languages lack standardization. To address this, we present the spatial grammar (SG) formalism, a grammar model for multidimensional languages which has string-like productions containing spatial relations more general than string concatenation, and we provide mapping rules to translate an SG specification into a translation schema. In this way, the SG formalism inherits the concepts and techniques of standard compiler generation tools like YACC and extends them to the multidimensional context.

✇ Knowledge and Information Systems

Analysis and price prediction of cryptocurrencies for historical and live data using ensemble-based neural networks

1 October 2023, 00:00

Abstract

The popularity of cryptocurrencies has been on the rise with the emergence of blockchain technologies. There have been enormous investments in the cryptocurrency market over the past few years. However, the volatile nature and significant price fluctuations of cryptocurrencies have made these assets a high investment risk. In this paper, an improved neural network (NN) ensemble-based approach is proposed that combines a Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM), i.e., CNN-BiLSTM, for long-term price prediction of cryptocurrencies using both a live data API and historical data. The CNN learns an internal representation of each cryptocurrency independently and extracts useful features. The LSTM layers, in turn, are used to predict time-series data and recognize both long-term and short-term dependencies efficiently and accurately. The proposed CNN-BiLSTM ensemble consists of three CNN layers and two BiLSTM layers to improve the accuracy of the predictions. Moreover, MLP, GRU, CNN and LSTM models are also implemented and compared with the proposed model on the test datasets, followed by error evaluation. To evaluate the error of each model, Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) scores are analyzed for four cryptocurrencies (Bitcoin, Ethereum, Dogecoin and Litecoin) on historical data and the live data API. It is observed that the proposed CNN-BiLSTM ensemble model has the lowest RMSE score of 0.164 on the live data API for Bitcoin and 0.166 on the historical dataset for Dogecoin. An MSE score of 0.027 is observed for both Bitcoin and Dogecoin on the live data API, and 0.027 for Dogecoin on the historical dataset. Thus, the RMSE and MSE scores of the proposed approach are much lower than those of the MLP, GRU, CNN and LSTM models for cryptocurrency price prediction on both datasets.
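
For reference, the two error metrics used above are related by RMSE = √MSE, which is consistent with the reported figures (0.164² ≈ 0.027). The sketch below computes both from scratch. The function names and sample prices are hypothetical, and whether the authors computed the metrics on normalized prices is not stated in the abstract.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return sqrt(mse(y_true, y_pred))


if __name__ == "__main__":
    # Hypothetical normalized prices, for illustration only.
    actual = [0.50, 0.55, 0.60, 0.58]
    predicted = [0.48, 0.57, 0.65, 0.55]
    print(f"MSE  = {mse(actual, predicted):.5f}")
    print(f"RMSE = {rmse(actual, predicted):.5f}")
```

Because RMSE is expressed in the same units as the predicted quantity, it is usually the easier of the two to interpret.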

✇ Knowledge and Information Systems

Continual text classification based on knowledge distillation and class-aware experience replay

1 October 2023, 00:00

Abstract

Continual text classification aims at constantly classifying the texts from an infinite text stream while preserving stable classification performance on the seen texts. How to avoid catastrophic forgetting is a core issue in continual text classification. Most existing methods for handling catastrophic forgetting are based on regularization or replay. Usually, regularization-based strategies only consider one neural network layer and ignore the knowledge contained in other layers, and replay-based strategies neglect the class information. In the present paper, we introduce two strategies, knowledge distillation and class-aware experience replay, to consider two-level knowledge in a neural network and the class information to mitigate catastrophic forgetting. We use BERT as the encoder of our method. Extensive experimental results obtained on large-scale benchmarks show that our method is superior to the state-of-the-art methods under the continual learning setting.

✇ Knowledge and Information Systems

MIN: multi-dimensional interest network for click-through rate prediction

1 October 2023, 00:00

Abstract

Click-through rate (CTR) prediction is a critical task in recommender systems and online advertising systems. The extensive collection of behavior data has become popular for building prediction models by capturing user interests from behavior sequences. There are two types of entities involved in behavior sequences, users and items, which form three kinds of relationships: user-to-user, user-to-item, and item-to-item. Most related work focuses on only one or two of these relationships, often ignoring the association between users, which also helps discover potential user interests. In this paper, we consider all three relationships useful and propose a Multi-dimensional Interest Network (MIN) to focus on their impact on CTR prediction simultaneously. It consists of three sub-networks that capture users’ preferences regarding group interests and individual interests. Specifically, the u-u sub-network models the relationship between the target user and those who have clicked on the target item. It takes user representations learned from behavior sequences via a transformer as input. Two other sub-networks capture the individual interest of the target user. The u-i sub-network models the relationship between the target user and the target item. The i-i sub-network models the relationship between the target item and the items the target user has interacted with in the past. Extensive evaluations on real datasets show that our MIN model outperforms the state-of-the-art solutions in prediction accuracy (+5.0% in AUC and −17.2% in Logloss on average). The ablation experiments also validate that each sub-network in MIN helps improve CTR prediction performance, with the u-u sub-network playing a more critical role. The source code is available at https://github.com/cocolixiao/MIN.

✇ Knowledge and Information Systems

Imbalance factor: a simple new scale for measuring inter-class imbalance extent in classification problems

1 October 2023, 00:00

Abstract

Learning from datasets that suffer from differences in absolute frequency of classes is one of the most challenging tasks in the machine learning field. Efforts have been made to tackle the problem of class imbalance by providing solutions at data and algorithmic levels. In these cases, in order to categorize the solutions according to the class imbalance level of the problem and to obtain meaningful and consistent interpretations from the experiments, it is essential to be able to quantify the extent of dataset imbalance. A competent scale for summarizing the severity of inter-class imbalance must meet at least the following three conditions: (1) the ability to calculate the imbalance extent for both binary and multi-class datasets, (2) output within a definite and fixed range of values, and (3) correlation with the performance of different classifiers. Nevertheless, none of the scales introduced so far satisfies all of these requirements. In this study, we propose an informative scale called imbalance factor (IF) based on information theory, which, independent of the number of data classes, quantifies dataset imbalance extent in a single value in the range of [0, 1]. Besides, IF offers various limiting cases with different growth rates according to its α order. This property is critical as it can settle the possibility of having the same extent for distinct distributions. Finally, empirical experiments indicate that, with an average correlation of 0.766 with the classification accuracies over 15 real datasets, IF is remarkably more sensitive to class imbalance changes than previous scales.
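
The abstract does not give the IF formula, so the sketch below is not the authors' imbalance factor. It is a simpler information-theoretic stand-in, one minus the normalized Shannon entropy of the class distribution, that satisfies the first two conditions listed above: it works for binary and multi-class data and returns a value in [0, 1]. The function name is hypothetical.

```python
from collections import Counter
from math import log


def entropy_imbalance(labels):
    """Illustrative imbalance score in [0, 1]: 0 for perfectly balanced
    classes, approaching 1 as a single class dominates.
    Assumption: 1 - H(p) / log(k), NOT the paper's IF definition."""
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two observed classes")
    n = len(labels)
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - entropy / log(k))


if __name__ == "__main__":
    balanced = ["a", "b", "c"] * 100
    skewed = ["a"] * 280 + ["b"] * 15 + ["c"] * 5
    print(f"balanced: {entropy_imbalance(balanced):.3f}")
    print(f"skewed:   {entropy_imbalance(skewed):.3f}")
```

A fixed-entropy score like this can assign the same value to different distributions, which is the ambiguity the abstract says the α order of IF is meant to resolve.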

✇ Knowledge and Information Systems

Heterogeneous graph neural network with semantic-aware differential privacy guarantees

1 October 2023, 00:00

Abstract

Most social networks can be modeled as heterogeneous graphs. Recently, advanced graph learning methods have exploited rich node properties and topological relationships for downstream tasks. This means that more private information is embedded in the representation. However, existing privacy-preserving methods focus only on protecting a single type of node attribute or relationship, neglecting the significance of high-order semantic information. To address this issue, we propose a novel Heterogeneous graph neural network with Semantic-aware Differential privacy Guarantees named HeteSDG, which provides a double privacy guarantee and performance trade-off in terms of both graph features and topology. In particular, we first reveal the privacy leakage in heterogeneous graphs and define a membership inference attack with a semantic enhancement (MIS) that strengthens membership inference attacks by obtaining side background knowledge through semantics. Then we design a two-stage mechanism, which includes the feature attention personalized mechanism and the topology gradient perturbation mechanism, where the privacy-preserving technologies are based on differential privacy. These mechanisms defend against MIS and provide stronger interpretation, but simultaneously introduce noise into representation learning. To better balance the noise perturbation and learning performance, we utilize a bi-level optimization pattern to allocate a suitable privacy budget for the above two modules. We conduct performance experiments, ablation studies, inference attack verification, etc., on four public benchmarks. The results show the privacy protection capability and generalization of HeteSDG.

✇ Knowledge and Information Systems

Read to grow: exploring metadata of books to make intriguing book recommendations for teenage readers

1 November 2023, 00:00

Abstract

It is clearly established that spending time reading is beneficial for an individual’s development in terms of their social, emotional, and intellectual capabilities. This is especially true for teenagers, who are still growing and for whom reading can improve memory, vocabulary, concentration and attention span, creativity and imagination, and writing skills. With the overwhelming volume of (online) books available these days, it becomes a huge challenge to find suitable and appealing books to read. Current book recommender systems, however, do not adequately capitalize on teenagers’ specific needs, such as readability levels, emotional capabilities, and subject comprehension, which are more at the forefront for teenage readers than for adults and children. To make appropriate recommendations on books for teenagers, we propose a book recommender system, called TBRec. TBRec recommends books to teenagers based on their personal preferences and needs that are determined by using various book features. These features, which include book genres, topic relevance, emotion traits, readers’ advisory, predicted user rating, and readability level, have a significant impact on teenagers’ preference for and satisfaction with a book. These distinguished parts of a book, which are premeditated and essential criteria for book selection, identify the type, subject area, state of consciousness, appeal factors, (un)likeness, and complexity of the book content, respectively. Experimental results reveal that TBRec outperforms Amazon, Barnes and Noble, and LibraryThing, three of the widely used book recommenders, in making book recommendations for teenagers, and the results are statistically significant.

✇ Knowledge and Information Systems

Human activity recognition using fuzzy proximal support vector machine for multicategory classification

1 November 2023, 00:00

Abstract

Applying machine learning tools to human activity analysis presents two major challenges. First, transforming actions into multiple attributes significantly increases training and testing time. Second, noise and outliers in the dataset add complexity and make it difficult to implement the activity detection system efficiently. This paper addresses both challenges by proposing a kernel fuzzy proximal support vector machine as a robust classifier for multicategory classification problems. It transforms the input patterns into a higher-dimensional space and assigns each pattern an appropriate membership degree to reduce the effect of noise and outliers. The proposed method requires only the solution of a set of linear equations to obtain the classifiers; thus, it is computationally efficient. Computer simulation results on ten UCI benchmark problems show that the proposed method outperforms established methods in predictive accuracy. Finally, numerical results from three human activity recognition problems validate the applicability of the proposed method.
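To make the "set of linear equations" concrete, here is a minimal sketch of a linear, binary proximal SVM with per-sample membership weights. It is a simplification under stated assumptions: the paper's method is kernelized and multicategory, whereas this sketch is neither, and the distance-to-centroid membership rule is one common heuristic chosen for illustration. All function names are hypothetical. With E = [A, -e], memberships S, and labels y, the weights solve (I/nu + E'SE) [w; gamma] = E'Sy.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def centroid_memberships(points, labels, floor=0.1):
    """Assumed heuristic: points far from their class centroid get low weight."""
    result = [0.0] * len(points)
    dim = len(points[0])
    for cls in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        centroid = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        dists = {i: math.dist(points[i], centroid) for i in idx}
        radius = max(dists.values()) + 1e-9
        for i in idx:
            result[i] = max(floor, 1.0 - dists[i] / radius)
    return result


def fit_fuzzy_psvm(points, labels, memberships, nu=1.0):
    """Return (w, gamma) by solving one (dim + 1)-sized linear system."""
    dim = len(points[0])
    size = dim + 1
    # Start from I / nu, then accumulate E' S E and E' S y sample by sample.
    matrix = [[(1.0 / nu if i == j else 0.0) for j in range(size)] for i in range(size)]
    rhs = [0.0] * size
    for point, y, s in zip(points, labels, memberships):
        row = list(point) + [-1.0]
        for i in range(size):
            rhs[i] += s * y * row[i]
            for j in range(size):
                matrix[i][j] += s * row[i] * row[j]
    solution = solve_linear_system(matrix, rhs)
    return solution[:dim], solution[dim]


def predict(point, w, gamma):
    """Classify by the sign of x . w - gamma."""
    score = sum(wi * xi for wi, xi in zip(w, point)) - gamma
    return 1 if score >= 0 else -1
```

Because training reduces to one small linear solve rather than a quadratic program, the cost is dominated by accumulating the system matrix, which is linear in the number of samples.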

✇ Knowledge and Information Systems

Scholarly recommendation systems: a literature survey

1 November 2023, 00:00

Abstract

A scholarly recommendation system is an important tool for identifying prior and related resources such as literature, datasets, grants, and collaborators. A well-designed scholarly recommender saves researchers significant time and can surface information that would not otherwise be considered. The usefulness of scholarly recommendations, especially literature recommendations, has been established by the widespread acceptance of web search engines such as CiteSeerX, Google Scholar, and Semantic Scholar. This article discusses different aspects and developments of scholarly recommendation systems. We searched the ACM Digital Library, DBLP, IEEE Xplore, and Scopus for publications in the domain of scholarly recommendations for literature, collaborators, reviewers, conferences and journals, datasets, and grant funding. In total, 225 publications were identified in these areas. We discuss the methodologies used to develop scholarly recommender systems. Content-based filtering is the most commonly applied technique, whereas collaborative filtering is more popular among conference recommenders. The implementation of deep learning algorithms in scholarly recommendation systems is rare among the screened publications. We found fewer publications in the areas of dataset and grant funding recommenders than in other areas. Furthermore, studies that analyze users’ feedback to improve scholarly recommendation systems are rare. This survey provides background knowledge on existing research on scholarly recommenders and aids in developing future recommendation systems in this domain.

✇ Knowledge and Information Systems

Recommender Systems in Cybersecurity

1 December 2023, 00:00

Abstract

With the growth of CyberTerrorism, enterprises worldwide have been struggling to stop intruders from obtaining private data. Despite the efforts made by Cybersecurity experts, the shortage of skillful security teams and the usage of intelligent attacks have slowed down the enhancement of defense mechanisms. Furthermore, the pandemic in 2020 forced organizations to work in remote environments with poor security, leading to increased cyberattacks. One possible solution for these problems is the implementation of Recommender Systems to assist Cybersecurity human operators. Our goal is to survey the application of Recommender Systems in Cybersecurity architectures. These decision-support tools deal with information overload through filtering and prioritization methods, allowing businesses to increase revenue, achieve better user satisfaction, and make faster and more efficient decisions in various domains (e-commerce, healthcare, finance, and other fields). Several reports demonstrate the potential of using these recommendation structures to enhance the detection and prevention of cyberattacks and aid Cybersecurity experts in treating client incidents. This survey discusses several studies where Recommender Systems are implemented in Cybersecurity with encouraging results. One promising direction explored by the community is using Recommender Systems as attack predictors and navigation assistance tools. As contributions, we show the recent efforts in this area and summarize them in a table. Furthermore, we provide an in-depth analysis of potential research lines. For example, the inclusion of Recommender Systems in security information event management systems and security orchestration, automation, and response applications could decrease their complexity and information overload.

✇ Knowledge and Information Systems

Meta-heuristic endured deep learning model for big data classification: image analytics

1 November 2023, 00:00

Abstract

Image processing is emerging as a distinctive and inventive field of computer research and applications in the modern era. Most image processing algorithms produce a large quantity of data as output, which is termed big data. These algorithms process and store bulky information as either structured or unstructured data. The use of big data analytics to mine the data produced by image processing technology has huge potential in areas such as education, government, medical establishments, production units, finance and banking, and retail business centers. This paper describes the innovations made in big data analytics and image processing. In this study, a novel data classification approach, designed especially for image analytics, is proposed. To improve image quality, pre-processing is applied to the large volume of gathered data. Then, the most relevant features, such as spatial information, GLCM texture, color, and shape features, are extracted from the pre-processed image. Since the extracted features are high-dimensional, an adaptive map-reduce framework with Improved Shannon Entropy is introduced to reduce their dimensionality. Then, in the big data classification phase, an optimized deep learning classifier, a deep convolutional neural network (DCNN), is introduced to classify the images accurately. The weight function of the DCNN is fine-tuned using the newly proposed dragonfly updated moth search (DAUMS) algorithm to enhance the classification accuracy and to solve the optimization problems of the research work. The hybrid DAUMS algorithm combines concepts from the moth search algorithm and the dragonfly algorithm.
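For readers unfamiliar with the GLCM texture features mentioned in the abstract, the following minimal sketch computes a gray-level co-occurrence matrix and two standard statistics derived from it (contrast and energy). It illustrates the general technique only. The abstract does not state which offsets, gray-level counts, or statistics the authors use, so those choices and all function names here are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dy, dx).

    image is a list of rows of integer gray levels in range(levels).
    Entry [i][j] is the fraction of pixel pairs where a pixel with level i
    has a neighbor with level j at the given offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                counts[image[r][c]][image[nr][nc]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[value / total for value in row] for row in counts]


def glcm_contrast(p):
    """Weighted sum of squared gray-level differences; high for rough texture."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def glcm_energy(p):
    """Sum of squared probabilities; high for uniform texture."""
    return sum(value * value for row in p for value in row)


if __name__ == "__main__":
    tiny = [[0, 0, 1],
            [1, 2, 2]]
    p = glcm(tiny, levels=3)  # horizontal neighbor, one pixel to the right
    print(glcm_contrast(p), glcm_energy(p))
```

In practice, several offsets and directions are usually combined, and the resulting statistics are concatenated with the color and shape features before dimensionality reduction.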

✇ Knowledge and Information Systems

Enhanced U-Net segmentation with ensemble convolutional neural network for automated skin disease classification

1 October 2023, 00:00

Abstract

In recent years, skin-related problems have caused psychological distress as well as physical harm, particularly when the patient’s face is disfigured or damaged. Smart devices are used to gather medical images for assessing patients’ skin condition. Skin disease diagnosis is a complex task, which can be addressed by adopting different lesion detection and classification approaches. However, the existing challenges cannot be solved by mixing disease samples from diverse data sources using simple data fusion approaches. Traditional deep learning-based computer-aided diagnosis approaches suffer from poor extraction of skin lesions due to complicating factors such as limited training datasets, low contrast with the background, the presence of artifacts, and fuzzy boundaries. They also face problems such as complex computation, poor generalization, and over-fitting when tuning large-scale parameters. This paper proposes a new framework that uses skin lesion segmentation and classification procedures for the automated diagnosis of various skin diseases. The main stages of the proposed method are pre-processing, lesion segmentation, and classification. First, grey-level conversion, hair removal, and contrast enhancement are performed to make the image fit for effective classification. Once image pre-processing is complete, the skin lesions are segmented by the enhanced U-Net, in which the improvement is attained through a proposed hybrid optimization algorithm. Moreover, the hybridized optimization algorithm addresses local optimum issues and is able to resolve a finite set of problems. Merging the optimization algorithms balances exploration and exploitation, owing to their convergence speed, global search ability, and simplicity. The classification is then performed by the optimized ensemble-convolutional neural network (E-CNN). Instead of the fully connected layer in the CNN, five different expert systems, namely random forest, artificial neural network, support vector machine, AdaBoost, and Extreme Gradient Boosting (XGBoost), are used to classify the skin disease. The system also optimizes different parameters in the classification stage to improve computing efficiency and reduce network complexity. The hybrid meta-heuristic, termed whale-electric fish optimization (W-EFO) and based on EFO and the whale optimization algorithm, is used to improve the segmentation and classification tasks. The comparative analysis against conventional models shows that the developed model performs effectively across diverse measures.
