News in eLiteracias

Journal of eScience Librarianship

Special Issue: 2021 Research Data Access and Preservation Summit

The Journal of eScience Librarianship has partnered with the Research Data Access and Preservation (RDAP) Association for a fourth year to publish selected conference proceedings.

The fully-virtual 2021 Research Data Access and Preservation (RDAP) Summit focused on the theme of Radical Change and Data. This editorial introduces the 2021 RDAP Special Issue of the Journal of eScience Librarianship.

  • 12 November 2021, 19:07

Data Consultations, Racism, and Critiquing Colonialism in Demographic Datasheets

Objective: We consider how data librarians can take antiracist action in education and consultations. We attempt to apply QuantCrit thinking, particularly to demographic datasheets.

Methods: We synthesize historical context with modern critical thinking about race and data to examine the origins of current assumptions about data. We then present examples of how racial categories can hide, rather than reveal, racial disparities. Finally, we apply the Model of Domain Learning to explain why data science and data management experts can and should expose experts in subject research to the idea of critically examining demographic data collection.

Results: There are good reasons why patrons who are experts in topics other than racism can find it challenging to change entrenched approaches to race. Nevertheless, the Census categories explicitly say that they have no basis in research or science. Therefore, social justice requires that data librarians expose researchers to this fact. If possible, data librarians should also consult on alternatives to habitual use of the Census racial categories.

Conclusions: We suggest that many studies are harmed by including race and should remove it entirely. Those studies that are truly examining race should reflect on their research question and seek more relevant racial questions for data collection.

  • 10 November 2021, 20:22

Preparing a Data Archive or Repository for Changing Research Data and Materials Retention Policies

Archival expectations and requirements for researchers’ data and code are changing rapidly, both among publishers and institutions, in response to what has been referred to as a “reproducibility crisis.” In an effort to address this crisis, a number of publishers have added requirements or recommendations to increase the availability of supporting information behind the research, and academic institutions have followed. Librarians should focus on ways to make it easier for researchers to effectively share their data and code with reproducibility in mind. At the Cornell Center for Social Sciences, we have instituted a Results Reproduction Service (R-Squared) for Cornell researchers. Part of this service includes archiving the R-Squared package in our CoreTrustSeal certified Data and Reproduction Archive, which has been rebuilt to accommodate both the unique requirements of those packages and the traditional role of our data archive. Librarians need to consider roles that archives and institutional repositories can play in supporting researchers with reproducibility initiatives. Our commentary closes with some suggestions for more information and training.

  • 10 November 2021, 20:22

Do I Have To Be An “Other” To Be Myself? Exploring Gender Diversity In Taxonomy, Data Collection, And Through The Research Data Lifecycle

Objective: Existing studies estimate that between 0.3% and 2% of adults in the U.S. (between 900,000 and 2.6 million in 2020) identify as a nonbinary gender or otherwise gender nonconforming. In response to the RDAP 2021 theme of radical change, this article examines the need to change how datasets represent nonbinary persons and how research involving gender data should approach the curation of this data at each stage of the research lifecycle.

Methods: In this article, we examine some of the known challenges of gender inclusion in datasets and summarize some solutions underway. Using a critical lens, we examine the difference between current practice and inclusive practice in gender representation, describing inclusive practices at each stage of the research lifecycle from writing a data management plan to sharing data.

Results: Data structures that limit gender to “male” and “female” or ontological structures that use mapping to collapse gender demographics to binary values exclude nonbinary and gender diverse populations. Some data collection instruments attempt inclusivity by adding the gender category of “other,” but using the “other” gender category labels nonbinary persons as intrinsically alien. Inclusive change must go farther, to move from alienation to inclusive categories. We describe several techniques for inclusively representing gender in data, from the data management planning stage, to collecting data, cleaning data, and sharing data. To facilitate better sharing of gender data, repositories must also allow mapping that includes nonbinary genders explicitly and allow for ontological mapping for long-term representation of diverse gender identities.

Conclusions: A good practice during research design is to consider two levels of critique in the data collection plan. First, consider the research question at hand and remove unnecessary gendering from the data. Second, if the research question needs gender, make sure to include nonbinary genders explicitly. Allies must take on this problem without leaving it to those who are most affected by it. Further, as more voices join the call for inclusive data practices, it rises to a crescendo that cannot be ignored.
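The data-cleaning stage described above can be sketched as a small harmonization step. This is a minimal, illustrative sketch only — the label sets and the `clean_gender` helper are hypothetical, not a recommended standard or the authors' instrument; the point is that unrecognized self-descriptions are preserved verbatim rather than collapsed into an "other" bucket.

```python
# Sketch: harmonize free-text gender responses without collapsing
# them to binary values or a generic "other" category.
# The label sets below are illustrative, not a recommended standard.
HARMONIZE = {
    "f": "woman", "female": "woman", "woman": "woman",
    "m": "man", "male": "man", "man": "man",
    "nb": "nonbinary", "non-binary": "nonbinary", "nonbinary": "nonbinary",
    "agender": "agender",
}

def clean_gender(response):
    key = response.strip().lower()
    # Preserve unrecognized self-descriptions verbatim rather than
    # forcing them into an "other" bucket.
    return HARMONIZE.get(key, response.strip())

print([clean_gender(r) for r in ["F", "non-binary", "genderfluid"]])
```

A repository ingesting such data would additionally need a codebook entry documenting that the category list is open-ended.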

  • 10 November 2021, 20:22

Data Curation Implications of Qualitative Data Reuse and Big Social Research

Objective: Big social data (such as social media and blogs) and archived qualitative data (such as interview transcripts, field notebooks, and diaries) are similar, but their respective communities of practice are under-connected. This paper explores shared challenges in qualitative data reuse and big social research and identifies implications for data curation.

Methods: This paper uses a broad literature search and inductive coding of 300 articles relating to qualitative data reuse and big social research. The literature review produces six key challenges relating to data use and reuse that are present in both qualitative data reuse and big social research—context, data quality, data comparability, informed consent, privacy & confidentiality, and intellectual property & data ownership.

Results: This paper explores six key challenges related to data use and reuse for qualitative data and big social research and discusses their implications for data curation practices.

Conclusions: Data curators can benefit from understanding these six key challenges and examining data curation implications. Data curation implications from these challenges include strategies for: providing clear documentation; linking and combining datasets; supporting trustworthy repositories; using and advocating for metadata standards; discussing alternative consent strategies with researchers and IRBs; understanding and supporting deidentification challenges; supporting restricted access for data; creating data use agreements; supporting rights management and data licensing; developing and supporting alternative archiving strategies. Considering these data curation implications will help data curators support sounder practices for both qualitative data reuse and big social research.

  • 10 November 2021, 20:22

Reflections from Transitioning Carpentries Workshops Online

Objectives: As certified Carpentries instructors, the authors organized and co-taught the University of Montana’s first in-person Carpentries workshop focused on the R programming language during early 2020. Due to the COVID-19 pandemic, a repeated workshop was postponed to the fall of 2020 and was adapted for a fully online setting. The authors share their Carpentries journey from in-person to online instruction, hoping to inspire those interested in organizing Carpentries at their institution for the first time and those interested in improving their existing Carpentries presence.

Methods: The authors reflected on their experience facilitating the same Carpentries workshop in-person and online. They used this unique opportunity to compare the effectiveness of a face-to-face environment versus a virtual modality for delivering an interactive workshop.

Results: When teaching in the online setting, the authors learned to emphasize the basics, create many opportunities for feedback using formative assessments, reduce the amount of material presented, and include helpers who are familiar with technology and troubleshooting.

Conclusions: Although the online environment came with challenges (i.e., Zoom logistics and challenges, the need to further condense curricula, etc.), the instructors were surprised at the many advantages of hosting an online workshop. With some adaptations, Carpentries workshops work well in online delivery.

  • 9 November 2021, 15:17

Data Management for Systematic Reviews: Guidance is Needed

Data management practices for systematic reviews and other types of knowledge syntheses are variable, with some reviews following open science practices and others with poor reporting practices leading to lack of transparency or reproducibility. Reporting standards have improved the level of detail being shared in published reviews, and also encourage more open sharing of data from various stages of the review process. Similar to project planning or completion of an ethics application, systematic review teams should create a data management plan alongside creation of their study protocol. This commentary provides a brief description of a Data Management Plan Template created specifically for systematic reviews. It also describes the companion LibGuide which was created to provide more detailed examples, and to serve as a living document for updates and new guidance. The creation of the template was funded by the Portage Network.

  • 9 November 2021, 15:17

Using Customer Journey Mapping and Design Thinking to Understand the Library’s Role in Supporting the Research Data Lifecycle

Objective: Customer journey mapping and design thinking were identified as useful tools for identifying deeper insights into the research data service needs of researchers on our campus with their direct input. In this article we discuss ways to improve the process in order to identify data needs earlier in the project life and at a more granular level.

Methods: Customer journey mapping and design thinking were employed to get direct input from researchers about their research processes and data management needs. Responses from mapping templates and follow-up interviews were then used to identify themes to be explored using design thinking. Finally, a toolkit was created in the Open Science Framework to guide other libraries who wish to employ these techniques.

Results: Outcomes from the customer journey mapping and design thinking sessions identified needs in the areas of data storage, organization and sharing. We also identified project-management lessons learned. The first lesson was to ensure the researchers who participate adequately represent the range of data needs on campus. Another was that customer journey mapping would be more effective if the responses were collected in real time and researchers were allowed more flexibility in the mapping process.

Conclusions: Modifications to the customer journey mapping and design thinking techniques will provide real-time responses and deeper insights into the research data service needs of researchers on our campus. Our pilot identified some important gaps but we felt that more subtle and useful outcomes were possible by making changes to our process.

  • 9 November 2021, 15:17

Introduction to the Special JeSLIB Issue on Data Curation in Practice

Research data curation is a set of scientific communication processes and activities that support the ethical reuse of research data and uphold research integrity. Data curators act as key collaborators with researchers to enrich the scholarly value and potential impact of their data through preparing it to be shared with others and preserved for the long term. This special issue focuses on practical data curation workflows and tools that have been developed and implemented within data repositories, scholarly societies, research projects, and academic institutions.

  • 11 August 2021, 15:43

Active Curation of Large Longitudinal Surveys: A Case Study

In this paper we take an in-depth look at the curation of a large longitudinal survey and the activities and procedures involved in moving the data from its generation to the state needed for scientific analysis. Using a case study approach, we describe how large surveys generate a range of data assets that require many decisions well before the data is considered for analysis and publication. We use the notion of active curation to describe activities and decisions about the data objects that are “live,” i.e., when they are still being collected and processed for the later stages of the data lifecycle. Our efforts illustrate a gap in the existing discussions on curation. On one hand, there is an acknowledged need for active or upstream curation as an engagement of curators close to the point of data creation. On the other hand, the recommendations on how to do that are scattered across multiple domain-oriented data efforts.

In describing the complexities of active curation of survey data and providing general recommendations we aim to draw attention to the practices of active curation, stimulate the development of interoperable tools, standards, and techniques needed at the initial stages of research projects, and encourage collaborations between libraries and other academic units.

  • 11 August 2021, 15:43

(Hyper)active Data Curation: A Video Case Study from Behavioral Science

Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation—where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary—a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.

  • 11 August 2021, 15:43

Plain Text & Character Encoding: A Primer for Data Curators

Plain text data consists of a sequence of encoded characters or “code points” from a given standard such as the Unicode Standard. Some of the most common file formats for digital data used in eScience (CSV, XML, and JSON, for example) are built atop plain text standards. Plain text representations of digital data are often preferred because plain text formats are relatively stable, and they facilitate reuse and interoperability. Despite its ubiquity, plain text is not as plain as it may seem. The set of standards used in modern text encoding (principally, the Unicode Character Set and the related encoding format, UTF-8) have complex architectures when compared to historical standards like ASCII. Further, while the Unicode standard has gained in prominence, text encoding problems are not uncommon in research data curation. This primer provides conceptual foundations for modern text encoding and guidance for common curation and preservation actions related to textual data.
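The distinction the primer draws between code points and their encoded bytes, and the kind of encoding problem a curator meets in practice, can be shown in a few lines. This is a generic sketch, not an example from the primer itself; the mojibake repair shown (UTF-8 bytes mistakenly decoded as Latin-1, then round-tripped) is one common case, not a universal fix.

```python
# Code points vs. encoded bytes: "café" is four code points but
# five bytes in UTF-8, because "é" (U+00E9) encodes as two bytes.
text = "café"
utf8_bytes = text.encode("utf-8")
assert len(text) == 4 and len(utf8_bytes) == 5

# Classic mojibake: UTF-8 bytes decoded as Latin-1 turn "é" into
# "Ã©". Re-encoding as Latin-1 and decoding as UTF-8 repairs it.
garbled = utf8_bytes.decode("latin-1")
repaired = garbled.encode("latin-1").decode("utf-8")
assert garbled == "cafÃ©" and repaired == text
```

Declaring the encoding explicitly at every read and write (rather than relying on platform defaults) prevents most such problems from arising at all.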

  • 11 August 2021, 15:43

Data Curation in Practice: Extract Tabular Data from PDF Files Using a Data Analytics Tool

Data curation is the process of managing data to make it available for reuse and preservation and to enable FAIR (findable, accessible, interoperable, reusable) use. It is an important part of the research lifecycle, as researchers are often either required by funders or generally encouraged to preserve the dataset and make it discoverable and reusable. This has become especially important as Open Access (OA) policies are implemented at many institutions across the nation. In facilitating research data discovery and reuse, an efficient data repository and its data curation play key roles. In this article, we briefly discuss the local institutional repository at Penn State University and the general data curation practices we adopt for the deposited files and datasets, then we focus on a data analytics tool that has recently been applied to extract tabular data from PDF files. This is an enhancement to the existing data curation practices as it adds additional tabular data to deposits with PDF files where tables are often embedded and not easily reused.
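The core difficulty the article addresses — tables embedded in PDFs are text for humans, not data for machines — can be illustrated with a toy heuristic. This sketch is not the analytics tool the Penn State team used; it only shows the general idea of recovering rows and columns from already-extracted text lines, treating runs of two or more spaces as column separators (`rows_from_pdf_text` and the sample lines are hypothetical).

```python
import re

def rows_from_pdf_text(lines):
    """Split text lines extracted from a PDF into table rows,
    treating runs of two or more spaces as column separators.
    A heuristic sketch, not the tool used in the article."""
    rows = []
    for line in lines:
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) > 1:  # keep only multi-column (tabular) lines
            rows.append(cells)
    return rows

extracted = [
    "Sample ID    Depth (m)    pH",
    "A-01         2.5          6.8",
    "Figure 1. Site map.",  # non-tabular residue is dropped
    "A-02         4.0          7.1",
]
print(rows_from_pdf_text(extracted))
```

Real tools must additionally handle multi-line cells, merged headers, and tables split across pages, which is why dedicated extraction software and human review remain part of the workflow.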

  • 11 August 2021, 15:43

Data Curation through Catalogs: A Repository-Independent Model for Data Discovery

Institutional data repositories are the acknowledged gold standard for data curation platforms in academic libraries. But not every institution can sustain a repository, and not every dataset can be archived due to legal, ethical, or authorial constraints. Data catalogs—metadata-only indices of research data that provide detailed access instructions and conditions for use—are one potential solution, and may be especially suitable for "challenging" datasets. This article presents the strengths of data catalogs for increasing the discoverability and accessibility of research data. The authors argue that data catalogs are a viable alternative or complement to data repositories, and provide examples from their institutions' experiences to show how their data catalogs address specific curatorial requirements. The article also reports on the development of a community of practice for data catalogs and data discovery initiatives.

  • 11 August 2021, 15:42

Computational Reproducibility: A Practical Framework for Data Curators

Introduction: This paper presents concrete and actionable steps to guide researchers, data curators, and data managers in improving their understanding and practice of computational reproducibility.

Objectives: Focusing on incremental progress rather than prescriptive rules, researchers and curators can build their knowledge and skills as the need arises. This paper presents a framework of incremental curation for reproducibility to support open science objectives.

Methods: A computational reproducibility framework developed for the Canadian Data Curation Forum serves as the model for this approach. This framework combines learning about reproducibility with recommended steps to improving reproducibility.

Conclusion: Computational reproducibility leads to more transparent and accurate research. The authors warn that fear of a crisis and focus on perfection should not prevent curation that may be ‘good enough.’

  • 11 August 2021, 15:42

Touring a Data Curation Network Primer: A Focus on Neuroimaging Data

This video article provides an introduction to a data primer which leads data curators through the process of preparing a neuroimaging dataset for submission into a repository. A team of health sciences librarians and informationists created the primer which is focused on data from functional magnetic resonance images that are saved in either DICOM or NIfTI formats. The video walks through a flowchart discussing the process of preparing data sets to be deposited into a repository, key curatorial questions to ask for data that is highly sensitive, and how to suggest edits to this and other primers. The primer grew out of a data curation workshop hosted by the Data Curation Network.

A transcript of this interview is available for download under Additional Files.

  • 11 August 2021, 15:42

Introducing the Qualitative Data Repository's Curation Handbook

In this short practice paper, we introduce the public version of the Qualitative Data Repository’s (QDR) Curation Handbook. The Handbook documents and structures curation practices at QDR. We describe the background and genesis of the Handbook and highlight some of its key content.

  • 11 August 2021, 15:42

Implementing and Managing a Data Curation Workflow in the Cloud

Objective: To increase data quality and ensure compliance with appropriate policies, many institutional data repositories curate data that is deposited into their systems. Here, we present our experience as an academic library implementing and managing a semi-automated, cloud-based data curation workflow for a recently launched institutional data repository. Based on our experiences we then present management observations intended for data repository managers and technical staff looking to move some or all of their curation services to the cloud.

Methods: We implemented tooling for our curation workflow in a service-oriented manner, making significant use of our data repository platform’s application programming interface (API). With an eye towards sustainability, a guiding development philosophy has been to automate processes following industry best practices while avoiding solutions with high resource needs (e.g., maintenance), and minimizing the risk of becoming locked-in to specific tooling.

Results: The initial barrier for implementing a data curation workflow in the cloud was high in comparison to on-premises curation, mainly due to the need to develop in-house cloud expertise. However, compared to the cost for on-premises servers and storage, infrastructure costs have been substantially lower. Furthermore, in our particular case, once the foundation had been established, a cloud approach resulted in increased agility allowing us to quickly automate our workflow as needed.

Conclusions: Workflow automation has put us on a path toward scaling the service and a cloud based-approach has helped with reduced initial costs. However, because cloud-based workflows and automation come with a maintenance overhead, it is important to build tooling that follows software development best practices and can be decoupled from curation workflows to avoid lock-in.

  • 11 August 2021, 15:42

Responding to Reality: Evolving Curation Practices and Infrastructure at the University of Illinois at Urbana-Champaign

Objective: The Illinois Data Bank provides Illinois researchers with the infrastructure to publish research data publicly. During a five-year review of the Research Data Service at the University of Illinois at Urbana-Champaign, it was recognized as the most useful service offering in the unit. Internal metrics are captured and used to monitor the growth, document curation workflows, and surface technical challenges faced as we assist our researchers. Here we present examples of these curation challenges and the solutions chosen to address them.

Methods: Some Illinois Data Bank metrics are collected internally within the system, but most of the curation metrics reported here are tracked separately in a Google spreadsheet. The curator logs required information after curation is complete for each dataset. While the data is sometimes ambiguous (e.g., depending on researcher uptake of suggested actions), our curation data provide a general understanding about our data repository and have been useful in assessing our workflows and services. These metrics also help prioritize development needs for the Illinois Data Bank.

Results and Conclusions: The curatorial services polish and improve the datasets, which contributes to the spirit of data reuse. Although we continue to see challenges in our processes, curation makes a positive impact on datasets. Continued development and adaptation of the technical infrastructure allows for an ever-better experience for the curators and users. These improvements have helped our repository more effectively support the data sharing process by successfully fostering depositor engagement with curators to improve datasets and facilitating easy transfer of very large files.

  • 11 August 2021, 15:42

An Insider’s Take on Data Curation: Context, Quality, and Efficiency

This commentary describes how context, quality, and efficiency guide data curation at the University of Michigan's Inter-university Consortium for Political and Social Research (ICPSR). These three principles arise from necessity. A primary purpose of this work is to facilitate secondary data analysis, but in order to do so, the context of data must be documented. Since a mistake in this work would render any results published from the data inaccurate, quality is paramount. However, optimizing data quality can be time consuming, so automated curation practices are necessary for efficiency. The implementation of these principles (context, quality, and efficiency) is demonstrated by a recent case study with a high-profile dataset. As the nature of data work changes, these principles will continue to guide the practice of curation and establish valuable skills for future curators to cultivate.

  • 11 August 2021, 15:41

Creating Guidance for Canadian Dataverse Curators: Portage Network’s Dataverse Curation Guide

Purpose: This paper introduces the Portage Network’s Dataverse Curation Guide and the new bilingual curation framework developed to support it.

Brief Description: Canadian academic institutions and national organizations have been building infrastructure, staffing, and programming to support research data management. Amidst this work, a notable gap emerged between requirements for data curation in general repositories like Dataverse and the requisite workflows and guidance materials needed by curators to meet them. In response, Portage, a national network of data experts, organized a working group to develop a Dataverse curation guide built upon the Data Curation Network’s CURATED workflow. To create a bilingual resource, the original CURATE(D) acronym was modified to CURATION—which has the same meaning in both French and English—and steps were augmented with Dataverse-specific guidance and mapped to three conceptualized levels of curation to assist curators in prioritizing curation actions.

Methods: An environmental scan of relevant deposit and curation guidance materials from Canadian and international institutions identified the need for a comprehensive Dataverse Curation Guide, as most existing resources were either depositor-focused or contained only partial workflows. The resulting Guide synthesized these guidance materials into the CURATION steps and mapped actions to various theoretical levels of data repository services and levels of curation.

Resources: The following documents are supplemental to the Dataverse Curation Guide: the Portage Dataverse North Metadata Best Practices Guide, the Scholars Portal Dataverse Guide, and the Data Curation Network CURATED Workflow and Data Curation Primers.

  • 11 August 2021, 15:41

Not Forgetting – 80s Style

Keeping in mind the work done by data librarians is key to understanding the importance of providing open and free access to data. Standards such as persistent identifiers (PIDs) were created to provide long-lasting access to all types of digital materials and resources. Providing new ways to inform and instruct researchers and other users on the importance of making data available for sharing, reproducibility, and re-use helps in driving good and effective social policy for researchers.

  • 30 July 2021, 17:27

Introducing Reproducibility to Citation Analysis: a Case Study in the Earth Sciences

Objectives:

  • Replicate methods from a 2019 study of Earth Science researcher citation practices.
  • Calculate programmatically whether researchers in Earth Science rely on a smaller subset of literature than estimated by the 80/20 rule.
  • Determine whether these reproducible citation analysis methods can be used to analyze open access uptake.

Methods: Replicated methods of a prior citation study provide an updated transparent, reproducible citation analysis protocol that can be replicated with Jupyter Notebooks.

Results: This study replicated the prior citation study’s conclusions, and also adapted the author’s methods to analyze the citation practices of Earth Scientists at four institutions. We found that 80% of the citations could be accounted for by only 7.88% of journals, a key metric to help identify a core collection of titles in this discipline. We then demonstrated programmatically that 36% of these cited references were available as open access.

Conclusions: Jupyter Notebooks are a viable platform for disseminating replicable processes for citation analysis. A completely open methodology is emerging, and we consider this a step forward. Adherence to the 80/20 rule aligned with institutional research output, but citation preferences are evident. Reproducible citation analysis methods may be used to analyze open access uptake; however, results are inconclusive. It is difficult to determine whether an article was open access at the time of citation, or became open access after an embargo.
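The study's key metric — the smallest fraction of distinct journals accounting for 80% of all citations — is straightforward to compute programmatically, which is what makes the notebook-based protocol replicable. The sketch below shows the calculation on invented toy data (the journal names and counts are illustrative, not the study's corpus, and `core_fraction` is a hypothetical helper, not the authors' code).

```python
from collections import Counter

def core_fraction(cited_journals, coverage=0.80):
    """Smallest fraction of distinct journals whose citations,
    counted from most- to least-cited, reach `coverage` of all
    citations (cf. the 80/20 rule)."""
    counts = Counter(cited_journals)
    total = sum(counts.values())
    running, used = 0, 0
    for _, n in counts.most_common():
        running += n
        used += 1
        if running / total >= coverage:
            break
    return used / len(counts)

# Toy corpus: 10 citations spread over 4 journals.
cites = ["J.Geophys.Res."] * 6 + ["Geology"] * 2 + ["EPSL"] + ["Tectonics"]
print(core_fraction(cites))  # 2 of 4 journals cover 80% -> 0.5
```

Run over a real citation corpus, the same function yields figures like the study's 7.88%, and the computation is transparent enough to rerun when the corpus is updated.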

  • 13 May 2021, 15:22

Book Review: Data Feminism

Book review of: Data Feminism by Catherine D'Ignazio and Lauren F. Klein, The MIT Press (2020). Data Feminism combines intersectional feminism and critical data studies to invite the reader to consider: “How can we use data to remake the world?” As non-profit organizations with a mandate to provide equitable access to non-neutral information and services, libraries and library workers are uniquely positioned to advance the principles laid out in Data Feminism.

  • 6 May 2021, 17:21

Digital Object Identifier (DOI) Under the Context of Research Data Librarianship

A digital object identifier (DOI) is an increasingly prominent persistent identifier for finding and accessing scholarly information. This paper presents an overview of global developments and approaches in the field of DOIs and DOI services, with a slight geographical focus on Germany. First, the initiation and components of the DOI system and the structure of a DOI name are explored. Next, the fundamental and specific characteristics of DOIs are described, and DOIs for three kinds of typical intellectual entities in scholarly communication are addressed; then, a general DOI service pyramid is sketched with brief descriptions of the functions of institutions at different levels. After that, approaches of the research data librarianship community in the field of RDM, especially DOI services, are elaborated. As examples, the DOI services provided in German research libraries, as well as best practices of DOI services in a German library, are introduced; finally, current practices and some issues in dealing with DOIs are summarized. It is foreseeable that DOI, which is crucial to FAIR research data, will gain extensive recognition in the scientific world.
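The structure of a DOI name mentioned above is simple: a prefix beginning with the directory indicator "10." identifies the registrant, and everything after the first slash is the registrant-assigned suffix; prepending the resolver host yields an actionable URL. The sketch below illustrates this (the `split_doi` helper and the example DOI are illustrative, not part of the paper).

```python
def split_doi(doi):
    """Split a DOI name into its registrant prefix and suffix.
    Every DOI prefix begins with the directory indicator "10.",
    and the first "/" separates prefix from suffix."""
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix

prefix, suffix = split_doi("10.7191/jeslib.2021.1226")
print(prefix, suffix)
print("https://doi.org/" + prefix + "/" + suffix)  # actionable form
```

Because resolution goes through the Handle System rather than a publisher's web server, the same DOI keeps working even when the content moves — which is the property that makes DOIs suitable for FAIR research data.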

  • 24 March 2021, 17:07