# News on eLiteracias

Knowledge and Information Systems

# A review on matrix completion for recommender systems

### Abstract

Recommender systems that predict the preferences of users have attracted increasing attention in recent decades. One of the most popular methods in this field is collaborative filtering, which employs explicit or implicit feedback to model the user–item connections. Most collaborative filtering methods are based on matrix completion techniques, which recover the missing values of user–item interaction matrices. The low-rank assumption is a critical premise for matrix completion in recommender systems; it posits that most information in interaction matrices is redundant. Based on this assumption, a large number of methods have been developed, including matrix factorization models, rank optimization models, and frameworks based on neural networks. In this paper, we first provide a brief description of recommender systems based on matrix completion. Next, several classical and state-of-the-art algorithms related to matrix completion for collaborative filtering are introduced, most of which are based on the low-rank assumption. Moreover, the performance of these algorithms is evaluated and discussed by conducting substantial experiments on different real-world datasets. Finally, we provide open research issues for future exploration of matrix completion in recommender systems.
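As a rough illustration of the low-rank idea described in this abstract, the sketch below fits a rank-1 factorization to the observed entries of a small synthetic user–item matrix and then predicts a hidden entry. It is not taken from the paper: the data, the function names (`factorize`, `predict`), and the hyperparameters are all assumptions chosen for the example, and plain stochastic gradient descent stands in for the many algorithms the review covers.

```python
import random

def factorize(observed, n_users, n_items, rank=1, lr=0.02, reg=0.01,
              epochs=2000, seed=0):
    """Fit observed[(u, i)] ~= dot(P[u], Q[i]) by SGD over observed entries only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.5, 1.0) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.5, 1.0) for _ in range(rank)] for _ in range(n_items)]
    entries = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (u, i), r in entries:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(rank):
                p, q = P[u][f], Q[i][f]  # use the old values for both updates
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, u, i):
    """Predicted interaction value for user u and item i."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

# Synthetic, exactly rank-1 "ratings" (hypothetical data for illustration).
user_scale = [1.0, 1.5, 2.0, 2.5]
item_scale = [1.0, 1.5, 2.0]
full = {(u, i): user_scale[u] * item_scale[i]
        for u in range(4) for i in range(3)}

# Hide two entries; the model never sees them during training.
hidden = {(3, 2), (0, 0)}
observed = {k: v for k, v in full.items() if k not in hidden}

P, Q = factorize(observed, n_users=4, n_items=3)
train_rmse = (sum((predict(P, Q, u, i) - r) ** 2
                  for (u, i), r in observed.items()) / len(observed)) ** 0.5
print(round(train_rmse, 3), round(predict(P, Q, 3, 2), 2), full[(3, 2)])
```

Because the synthetic matrix really is rank 1, the hidden entries are recoverable from the observed ones. Real interaction matrices are only approximately low rank, which is why the choice of rank and regularization matters in practice.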

Knowledge and Information Systems

# Scalable force-directed graph representation learning and visualization

### Abstract

A graph embedding algorithm embeds a graph into a low-dimensional space such that the embedding preserves the inherent properties of the graph. While graph embedding is fundamentally related to graph visualization, prior work did not exploit this connection explicitly. We develop Force2Vec, which uses force-directed graph layout models in a graph embedding setting with the aim of excelling in both machine learning (ML) and visualization tasks. We make Force2Vec highly parallel by mapping its core computations to linear algebra and utilizing multiple levels of parallelism available in modern processors. The resultant algorithm is an order of magnitude faster than existing methods (43× faster than DeepWalk, on average) and can generate embeddings from graphs with billions of edges in a few hours. In comparison to existing methods, Force2Vec is better in graph visualization and performs comparably or better in ML tasks such as link prediction, node classification, and clustering. Source code is available at https://github.com/HipGraph/Force2Vec. This paper is an extension of a conference paper by Rahman et al. (in: 20th IEEE international conference on data mining, IEEE ICDM, 2020b) published in IEEE ICDM 2020.

Knowledge and Information Systems

# Efficient multi-attribute precedence-based task scheduling for edge computing in geo-distributed cloud environment

### Abstract

In order to realize globalization of cloud computing, joint use of different services of different cloud providers has become an inevitable trend. The geo-distributed cloud consists of several different clouds, providing a general environment for cloud computing. In data placement, many recently proposed algorithms use only a single performance index to evaluate their performance. In task scheduling, when tasks are allocated excess cloud resources, resources are wasted; when too few cloud resources are allocated to a complex task, the overall progress of the system stalls. To solve these problems, a data placement method and a task scheduling method are proposed. In the proposed data placement scheme, multiple performance indicators are considered. The detection of straggling nodes and the reasonable allocation of cloud resources are taken into account when a task is scheduled. To demonstrate the superiority of the proposed methods, extensive experiments are conducted. In terms of data placement, when the number of files is set to 800, the safety level of the proposed data placement algorithm is 7.0, which is 27.3% higher than that of the IDP algorithm, 45.8% higher than that of the GA-DPSO algorithm and 16.7% higher than that of the H2DP algorithm. As for task scheduling, the percentage improvement in the time overhead of the proposed task scheduling method is the lowest, which implies that the time overhead of the proposed task scheduling algorithm is closest to the optimal time and is the shortest.
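The relative improvements quoted in this abstract can be read back into approximate baseline values. The short calculation below assumes that "X% higher" means the proposed value equals the baseline multiplied by (1 + X/100). The resulting baseline figures are implied by that reading only; they are not reported in the abstract.

```python
# Safety level of the proposed algorithm at 800 files, as stated in the abstract.
proposed = 7.0

# Relative gains over each baseline, as stated in the abstract.
gains = {"IDP": 0.273, "GA-DPSO": 0.458, "H2DP": 0.167}

# Assumption: proposed = baseline * (1 + gain), so baseline = proposed / (1 + gain).
implied = {name: round(proposed / (1 + gain), 1) for name, gain in gains.items()}
print(implied)
```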

# Editorial Board

Publication date: January 2022

Source: The Journal of Academic Librarianship, Volume 48, Issue 1

Author(s):

Journal of the American Society for Information Science and Technology

# Improving the effectiveness of voice search systems through partial query modification

## Abstract

This paper addresses the importance of improving the effectiveness of voice search systems through partial query modification. A user-centered experiment was designed to compare the effectiveness of an experimental system with a partial query modification feature to a baseline system in which users could issue complete queries only, with 32 participants each searching on eight different tasks. The results indicate that the participants spent significantly more time preparing the modification but significantly less time speaking the modification when using the experimental system than when using the baseline system. Compared with the baseline system, the participants found that the experimental system (a) was more effective, (b) gave them more control, (c) was easier to use for the search tasks, and (d) saved them time. The results contribute to improving future voice search system design and benefit the research community in general. System implications and future work are discussed.

# Student Learning Outcomes Assessment in Higher Education and in Academic Libraries: A Review of the Literature

Publication date: March 2022

Source: The Journal of Academic Librarianship, Volume 48, Issue 2

Author(s): Harold Goss

Journal of Informetrics

# Patent collaborations: From segregation to globalization

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Maria Tsouchnika, Alex Smolyak, Panos Argyrakis, Shlomo Havlin

Journal of Informetrics

# Which types of online resource support US patent claims?

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Cristina I Font-Julián, José-Antonio Ontalba-Ruipérez, Enrique Orduña-Malea, Mike Thelwall

Journal of Informetrics

# Dynamics of senses of new physics discourse: Co-keywords analysis

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Yurij L. Katchanov, Yulia V. Markova

Journal of Informetrics

# The continuity and citation impact of scientific collaboration with different gender composition

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Hongquan Shen, Juan Xie, Weiyi Ao, Ying Cheng

Journal of Informetrics

# Analysis of duplicated publications in Russian journals

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Yury V. Chekhovich, Andrey V. Khazov

Journal of Informetrics

# Effectiveness of research grants funded by European Research Council and Polish National Science Centre

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Maciej Dzieżyc, Przemysław Kazienko

Journal of Informetrics

# New directions in science emerge from disconnection and discord

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Yiling Lin, James A. Evans, Lingfei Wu

Journal of Informetrics

# Scores of a specific field-normalized indicator calculated with different approaches of field-categorization: Are the scores different or similar?

Publication date: February 2022

Source: Journal of Informetrics, Volume 16, Issue 1

Author(s): Robin Haunschild, Angela D. Daniels, Lutz Bornmann

Knowledge and Information Systems

# Random pairwise shapelets forest: an effective classifier for time series

### Abstract

A shapelet is a discriminative subsequence of a time series. An advanced shapelet-based method is to embed shapelets into the accurate and fast random forest. However, this approach has several limitations. First, the random shapelet forest incurs a large training cost for split threshold searching. Second, a single shapelet provides limited information for only one branch of the decision tree, resulting in insufficient accuracy. Third, the randomized ensemble decreases comprehensibility. To address these limitations, this paper presents Random Pairwise Shapelets Forest (RPSF). RPSF combines a pair of shapelets from different classes to construct a random forest. It omits threshold searching to be more efficient, and it includes more information at each node of the forest to be more effective. Moreover, a discriminability measure, Decomposed Mean Decrease Impurity, is proposed to identify the influential region for each class. Extensive experiments show that RPSF is competitive with other methods while improving the training speed of shapelet-based forests.
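To make the shapelet idea concrete, the sketch below computes the usual shapelet-to-series distance (the minimum Euclidean distance over all same-length subsequences) and then routes a series according to which of two shapelets it is closer to. The second function is only one plausible reading of how a pair of shapelets can replace a searched threshold; it is an assumption for illustration, not the RPSF algorithm itself, and the names `shapelet_distance` and `pairwise_split` are hypothetical.

```python
import math

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    same-length contiguous subsequence of the series."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )

def pairwise_split(series, shapelet_a, shapelet_b):
    """Route the series by the closer of two shapelets.
    Comparing two distances needs no split threshold (illustrative assumption)."""
    if shapelet_distance(series, shapelet_a) <= shapelet_distance(series, shapelet_b):
        return "a"
    return "b"

# Toy data: a 'bump' shapelet and a 'dip' shapelet.
bump = [1, 2, 1]
dip = [-1, -2, -1]
series_with_bump = [0, 0, 1, 2, 1, 0]
series_with_dip = [0, -1, -2, -1, 0, 0]

print(shapelet_distance(series_with_bump, bump))
print(pairwise_split(series_with_bump, bump, dip))
print(pairwise_split(series_with_dip, bump, dip))
```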

Knowledge and Information Systems

# Ensemble of classifier chains and decision templates for multi-label classification

### Abstract

Multi-label classification is the task of inferring the label sets of unseen instances using the knowledge obtained through the analysis of a set of training examples with known label sets. In this paper, a multi-label classifier fusion ensemble approach named decision templates for ensemble of classifier chains is presented, which is derived from the decision templates method. The proposed method estimates two decision templates per class, one representing the presence of the class and the other representing its absence, based on the same examples used for training the set of classifiers. For each unseen instance, a new decision profile is created, and the similarity between the decision templates and the decision profile determines the resulting label set. The method is incorporated into a traditional multi-label classifier algorithm: the ensemble of classifier chains. Empirical evidence indicates that the use of the proposed decision templates adaptation can improve performance over the traditionally used combining schemes, especially for domains with a large number of instances available, improving the performance of an already high-performing multi-label learning method.
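The two-templates-per-class scheme described above can be sketched in a few lines. In this simplified version, a decision profile is a flat list of scores produced by the ensemble members, each template is the mean profile of the training examples that have (or lack) a label, and similarity is measured by Euclidean distance. The distance measure, the toy scores, and the function names are assumptions for illustration; the paper's exact formulation may differ.

```python
import math

def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_templates(profiles, label_sets, labels):
    """For each label, build a (presence, absence) pair of templates.
    Assumes every label is present in some examples and absent in others."""
    templates = {}
    for label in labels:
        present = [p for p, ls in zip(profiles, label_sets) if label in ls]
        absent = [p for p, ls in zip(profiles, label_sets) if label not in ls]
        templates[label] = (mean_vector(present), mean_vector(absent))
    return templates

def predict_labels(profile, templates):
    """Assign a label when the profile is closer to its presence template."""
    return {
        label
        for label, (present, absent) in templates.items()
        if math.dist(profile, present) < math.dist(profile, absent)
    }

# Hypothetical decision profiles: two classifiers, each scoring labels x and y,
# flattened as [c1_x, c1_y, c2_x, c2_y].
train_profiles = [
    [0.9, 0.1, 0.8, 0.2],
    [0.8, 0.2, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.8],
    [0.9, 0.8, 0.8, 0.9],
]
train_labels = [{"x"}, {"x"}, {"y"}, {"x", "y"}]

templates = fit_templates(train_profiles, train_labels, labels=["x", "y"])
print(predict_labels([0.85, 0.15, 0.9, 0.1], templates))
print(predict_labels([0.2, 0.9, 0.1, 0.9], templates))
```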

Knowledge and Information Systems

# The phantom steering effect in Q&A websites

### Abstract

Virtual rewards, such as badges, are commonly used in online platforms as incentives for promoting contributions from a userbase. It is widely accepted that such rewards “steer” people’s behaviour towards increasing their rate of contributions before obtaining the reward. This paper provides a new probabilistic model of user behaviour in the presence of threshold rewards, such as badges. We find, surprisingly, that while steering does affect a minority of the population, the majority of users do not change their behaviour around the achievement of these virtual rewards. In particular, we find that only approximately 5–30% of Stack Overflow users who achieve the rewards appear to respond to the incentives. This result is based on the analysis of thousands of users’ activity patterns before and after they achieve the reward. Our conclusion is that the phenomenon of steering is less common than has previously been claimed. We identify a statistical phenomenon, termed “Phantom Steering”, that can account for the interaction data of the users who do not respond to the reward. The presence of phantom steering may have contributed to some previous conclusions about the ubiquity of steering. We conduct a qualitative survey of the users on Stack Overflow which supports our results, suggesting that the motivating factors behind user behaviour are complex, and that some of the online incentives used in Stack Overflow may not be solely responsible for changes in users’ contribution rates.

Knowledge and Information Systems

# Computing top-k temporal closeness in temporal networks

### Abstract

The closeness centrality of a vertex in a classical static graph is the reciprocal of the sum of the distances to all other vertices. However, networks are often dynamic and change over time. Temporal distances take these dynamics into account. In this work, we consider the harmonic temporal closeness with respect to the shortest duration distance. We introduce an efficient algorithm for computing the exact top-k temporal closeness values and the corresponding vertices. The algorithm can be generalized to the task of computing all closeness values. Furthermore, we derive heuristic modifications that perform well on real-world data sets and drastically reduce the running times. For the case in which edge traversal takes an equal amount of time for all edges, we lift two approximation algorithms to the temporal domain. The algorithms approximate the transitive closure of a temporal graph (which is an essential ingredient for the top-k algorithm) and the temporal closeness for all vertices, respectively, with high probability. We experimentally evaluate all our new approaches on real-world data sets and show that they lead to drastically reduced running times while keeping high quality in many cases. Moreover, we demonstrate that the top-k temporal and static closeness vertex sets differ substantially in the considered temporal networks.
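The static definitions mentioned at the start of this abstract are easy to state in code. The sketch below computes classical closeness (reciprocal of the sum of distances) and harmonic closeness (sum of reciprocal distances) on a small unweighted static graph using breadth-first search, then picks the top-k vertices naively. It covers only the static case: the temporal variant studied in the paper replaces hop distance with the shortest-duration temporal distance, and the paper's efficient top-k algorithm is not reproduced here. The graph and function names are assumptions for illustration.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(graph, v):
    """Reciprocal of the sum of distances to all other reachable vertices."""
    total = sum(d for u, d in bfs_distances(graph, v).items() if u != v)
    return 1 / total if total else 0.0

def harmonic_closeness(graph, v):
    """Sum of reciprocal distances; unreachable vertices contribute zero."""
    return sum(1 / d for u, d in bfs_distances(graph, v).items() if u != v)

def top_k(graph, k):
    """Naive top-k: score every vertex, then sort (no pruning)."""
    return sorted(graph, key=lambda v: harmonic_closeness(graph, v), reverse=True)[:k]

# Path graph a - b - c - d (undirected adjacency lists).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(closeness(graph, "b"))
print(harmonic_closeness(graph, "b"))
print(top_k(graph, 2))
```

The harmonic form is the natural choice when some vertices cannot reach others, which is common in temporal networks, because an unreachable vertex simply adds nothing to the sum.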

Knowledge and Information Systems

# A hybrid quantum approach to leveraging data from HTML tables

### Abstract

The Web provides a great deal of data encoded in HTML tables. This facilitates rendering them, but obfuscates their structure and makes it difficult for automated business processes to leverage them. This has motivated many authors to work on proposals to extract them as automatically as possible. In this article, we present a new unsupervised proposal that uses a hybrid approach in which a standard computer is used to perform pre- and post-processing tasks and a quantum computer is used to perform the core task: guessing whether the cells have labels or values. The problem is addressed using a clustering approach that is known to be NP using standard computers, but our proposal can solve it in polynomial time, which implies a significant performance improvement. It is novel in that it relies on an entropy-preservation metaphor that has proven to work very well on two large collections of real-world tables from Wikipedia and the Dresden Web Table Corpus. Our experiments show that our proposal can beat the state-of-the-art proposal in terms of both effectiveness and efficiency; the key difference is that our proposal is totally unsupervised, whereas the state-of-the-art proposal is supervised.

Journal of the American Society for Information Science and Technology

# Information: Keywords. Kennerly, Michele, Frederick, Samuel, and Abel, Jonathan E. New York: Columbia University Press, 2021. 232 pp. \$110.00 (hardcover). (ISBN: 9780231198769)

Journal of the Association for Information Science and Technology, EarlyView.

Journal of the American Society for Information Science and Technology

# Citing criteria and its effects on researcher's intention to cite: A mixed‐method study

## Abstract

This study explored users' criteria for citation decisions and investigated their effects on users' intention to cite using a mixed-method approach. A qualitative study was conducted first, in which 16 citing criteria were identified based on interviews and inductive analysis. The findings were then used to develop hypotheses and extend the information adoption model. A questionnaire was designed to collect data from users in Chinese universities to test the research model. The findings indicated that pleasure, topicality, and functionality significantly increased users' perceived information usefulness, while familiarity and accessibility significantly enhanced users' perceived ease of use. Information usefulness and information ease of use further contributed to users' intention to cite, with an adjusted R² of 44.6%. It was also found that perceived academic quality based on 5 antecedents (i.e., reliability, comprehensiveness, novelty, author credibility, and source reputation) significantly increased users' pleasure. Implications and limitations are discussed.

# Don’t Fear the Union: Successful Management in a Unionized Library

Volume 62, Issue 1, January 2022, Page 122-131

# Information Literacy Behavior and Practice: An Assessment of Undergraduate Students at Ada College of Education, Ghana

Volume 62, Issue 1, January 2022, Page 132-151