It takes a strong data science community and many stakeholders to make disparate types of data work so that you and I can continue to explore and learn. Being proactive by including diversity, equity, inclusion, and accessibility (DEIA) policies and practices, along with evaluating liaison/outreach roles and established programs and tools, will go a long way in strengthening the library, its staff and services, and the institution. Fighting for the “food” you need to grow the profession and data services is key to the future of the RDM library community.
Objective: The purpose of this study is to examine the usability of the Texas Data Repository (TDR) for the data depositors who are unfamiliar with its interface and use the results to improve user experience.
Methods: This mixed-method research study collected qualitative and quantitative data through a pre-survey, a task-oriented usability test with a think-aloud protocol, and an exit questionnaire. Analyses of the quantitative data (i.e., descriptive statistics) and qualitative data (e.g., content analysis of the think-aloud protocols) were employed to examine the TDR’s usability for first-time data depositors at Texas A&M University.
Results: While the study revealed that the users were generally satisfied with their experience, the data suggest that a majority of the participants had difficulty understanding the difference between a dataverse collection and dataset, and often found adding or editing metadata overwhelming. The platform’s tiered model for metadata description is core to its function, but many participants did not have an accurate mental model of the platform, which left them scrolling up and down the page or jumping back and forth between different tabs and pages to perform a single task. Based on the results, the authors made some recommendations.
Conclusions: While this paper relies heavily on the context of the Harvard Dataverse repository platform, the authors posit that any self-deposit model, regardless of platform, could benefit from these recommendations. We noticed that completing various metadata fields in the TDR required participants to pivot their mindset from a data creator to that of a data curator. Moreover, the methods used to investigate the usability of the repository can be used to develop additional studies in a variety of repository and service model contexts.
Due to the COVID-19 pandemic, Delft University of Technology in the Netherlands (TU Delft) stopped its activities on campus until autumn 2021 and moved all teaching activities to an online setting. This article describes the challenges and lessons learned from successfully moving basic programming workshops, Software Carpentry workshops, online. The article details the local TU Delft context, the online workshop tools that were employed, and the roles that the organising team played to organise and run these online workshops. To successfully adapt to the online context, it was important to adjust the original planning and programme for the Carpentry workshops. General challenges of online workshops and solutions that worked for the TU Delft team are also shared. Through iteratively developing the online workshops over the past year, the team has enhanced both learners’ and organisers’ experience. The lessons learned will continue to be valuable when the workshops are transitioned back to a physical setting when COVID-19 protective measures are lifted.
Objective: This study explores the root causes that undermine successful collaborations between scientists and their library liaisons to improve outreach to this population.
Methods: This paper uses the Five Whys Technique to explore the reasons why many scientists are unaware of the breadth of services offered by liaison librarians. Existing outreach strategies that address these obstacles are interpreted through the lens of implementation science theories and process models, including Normalization Process Theory.
Results: A total of four recommendations—two for liaison librarians and two for libraries as institutions—are provided to enhance the perceived value of liaison services. The recommendations for individuals include aiming to understand scientists’ needs more comprehensively and actively increasing the visibility of services that respond to those needs. Those for libraries focus on cross-functional teams and new forms of assessment.
Conclusions: These recommendations emphasize the benefits of collaboration to liaisons, to library programs at large, and to the faculty that liaisons serve. Implementation science can help librarians to understand why certain outreach strategies bring success, and how new services can be implemented more effectively.
Objective: An underexplored area in Library and Information Science (LIS) is the development of educational offerings and partnerships in Health-Related Informatics (HRI) (e.g., bioinformatics, clinical informatics, health informatics). The purpose of this study is to identify which disciplines are collaborating in HRI education and how partnerships developed.
Methods: This study was conducted in two parts: a website review and a survey. Seventy-seven North American ALA-accredited and iSchool member websites were searched between November 2019 and March 2020 for HRI-related educational offerings and the academic units involved in them. Two hundred sixteen individuals involved in LIS and/or HRI education were contacted for a 40-question survey that covered their roles and responsibilities regarding HRI education; the alignment of this education with strategic plans or competencies; and how HRI partnerships developed. The survey also asked those who were not currently partnering in HRI education which factors influenced their circumstances.
Results: The website review identified 352 HRI educational offerings within ALA-accredited or iSchool programs. For almost two-thirds of these, there was no indication of partnership in that education (213/352, 60.5%). LIS or iSchool involvement in HRI accounted for just under one-third of all offerings (111/352, 31%). A total of 38 individuals (17.5%) responded to the survey. “Health or healthcare” informatics (35) and “biomedical or bioinformatics” were the most common types of HRI offered according to the website review and survey.
Conclusions: Opportunities exist for LIS programs to form HRI educational partnerships that will provide richer educational offerings for LIS students and health sciences librarians.
This study of data services librarians is part of a series of studies examining the current roles and perspectives on Research Data Management (RDM) services in higher education. Reviewing current best practices provides insights into the role-based responsibilities for RDM services that data services librarians perform, as well as ways to improve and create new services to meet the needs of their respective university communities.
Objectives: The objectives of this article are to provide the context of research data services through a review of past studies, explain how they informed this qualitative study, and provide the methods and results of the current study. This study provides an in-depth overview of the overall job responsibilities of data services librarians, as well as their perspectives on RDM, through job analyses.
Methods: Job analysis interviews provide insight into and context for the tasks employees do, as described in their own words. Ten data services librarians, recruited from the top 10 public and top 10 private universities according to the 2020 Best National University Rankings in U.S. News & World Report, were asked 30 questions concerning their overall job tasks and perspectives on RDM. Five librarians from public universities and five from private universities were interviewed. The interviews were recorded and transcribed. The transcriptions were analyzed in NVivo using a grounded theory application of open, axial, and selective coding to generate categories and broad themes based on the responses using synonymous meanings.
Results: The results presented here provide the typical job tasks of data services librarians, which include locating secondary data, reviewing data management plans (DMPs), conducting outreach, collaborating, and offering RDM training. Fewer data services librarians assisted with data curation or managed an institutional repository.
Discussion: The results indicate that there may be different types of data services librarians depending on the mix of responsibilities. Academic librarianship will benefit from further delineation of job titles using tasks while planning, advertising, hiring, and evaluating workers in this emerging area. There remain many other explorations needed to understand the challenges and opportunities for data services librarians related to RDM.
Conclusions: This article concludes with a proposed matrix of job tasks that indicates different types of data services librarians to inform further study. Future job descriptions, training, and education will all benefit from differentiating between the many associated research data services roles. With increased focus on research data, greater specializations will emerge.
Objective: While diversity, equity, inclusion, and accessibility (DEIA) principles and practices have been incorporated into much of academic librarianship, there has been less focus on the job postings.
Methods: In order to quantify ways in which DEIA is being integrated into job postings, we analyzed 48 job positions for engineering librarians posted in 2018 and 2019 via deductive thematic analysis, looking for trends in salary and qualifications related to education and academic or professional experience.
Results: Of postings that listed a quantitative salary value, salary ranged from $45,000 to $81,606; the median was $60,750. However, only 33% (n = 16) of positions listed a quantitative salary value. For educational qualifications, we found that 98% of job postings (n = 47) listed a Master’s in Library and Information Science (MLIS) as a required qualification; however, 34% of these postings (n = 16) would accept an equivalent degree in lieu of the MLIS. Additionally, 73% (n = 35) of positions sought candidates with an MLIS and another degree; 91% of these positions (n = 32) wanted the additional degree to be in a science, technology, engineering, and mathematics discipline. For academic or professional experience, 56% of positions (n = 27) sought candidates with previous academic library experience.
Conclusions: Using this data, we provide actionable recommendations on how to incorporate DEIA principles into any academic librarian job posting. Our study provides quantitative data and evidence-based recommendations that can be used to make DEIA an integral part of the job postings in academic librarianship.
Objective: This paper examines a unique data set disclosure process at a medium-sized, land-grant research university and the campus collaboration that led to its creation.
Methods: The authors utilized a single case study methodology, reviewing relevant documents and workflows. As first-hand participants in the collaboration and disclosure process development, their own accounts and experiences also were utilized.
Results: A collaborative approach to enhancing research data sharing is essential, considering the wide array of stakeholders involved across the life cycle of research data. A transparent, inclusive data set disclosure process is a viable route to ensuring research data can be appropriately shared.
Conclusions: Successful sharing of research data impacts a range of university units and individuals. The establishment of productive working relationships and trust between these stakeholders is critical to expanding the sharing of research data and to establishing shared workflows.
Data Soup is a collaboration between the Journal of eScience Librarianship (JeSLIB) and the Data Curation Network to host a series of community-focused webinars and discussions in which data curators exchange practices for curating research data of different formats or subject areas. The lineup of the inaugural webinar includes the following speakers and topics from the recent JeSLIB Special Issue: Data Curation in Practice:
The Journal of eScience Librarianship has partnered with the Research Data Access and Preservation (RDAP) Association for a fourth year to publish selected conference proceedings.
The fully-virtual 2021 Research Data Access and Preservation (RDAP) Summit focused on the theme of Radical Change and Data. This editorial introduces the 2021 RDAP Special Issue of the Journal of eScience Librarianship.
Objective: We consider how data librarians can take antiracist action in education and consultations. We attempt to apply QuantCrit thinking, particularly to demographic datasheets.
Methods: We synthesize historical context with modern critical thinking about race and data to examine the origins of current assumptions about data. We then present examples of how racial categories can hide, rather than reveal, racial disparities. Finally, we apply the Model of Domain Learning to explain why data science and data management experts can and should expose experts in subject research to the idea of critically examining demographic data collection.
Results: There are good reasons why patrons who are experts in topics other than racism can find it challenging to change habits from Interoperable approaches to race. Nevertheless, the Census categories explicitly say that they have no basis in research or science. Therefore, social justice requires that data librarians should expose researchers to this fact. If possible, data librarians should also consult on alternatives to habitual use of the Census racial categories.
Conclusions: We suggest that many studies are harmed by including race and should remove it entirely. Those studies that are truly examining race should reflect on their research question and seek more relevant racial questions for data collection.
Archival expectations and requirements for researchers’ data and code are changing rapidly, both among publishers and institutions, in response to what has been referred to as a “reproducibility crisis.” In an effort to address this crisis, a number of publishers have added requirements or recommendations to increase the availability of supporting information behind the research, and academic institutions have followed. Librarians should focus on ways to make it easier for researchers to effectively share their data and code with reproducibility in mind. At the Cornell Center for Social Sciences, we have instituted a Results Reproduction Service (R-Squared) for Cornell researchers. Part of this service includes archiving the R-Squared package in our CoreTrustSeal certified Data and Reproduction Archive, which has been rebuilt to accommodate both the unique requirements of those packages and the traditional role of our data archive. Librarians need to consider roles that archives and institutional repositories can play in supporting researchers with reproducibility initiatives. Our commentary closes with some suggestions for more information and training.
Objective: Existing studies estimate that between 0.3% and 2% of adults in the U.S. (between 900,000 and 2.6 million in 2020) identify as a nonbinary gender or otherwise gender nonconforming. In response to the RDAP 2021 theme of radical change, this article examines the need to change how datasets represent nonbinary persons and how research involving gender data should approach the curation of this data at each stage of the research lifecycle.
Methods: In this article, we examine some of the known challenges of gender inclusion in datasets and summarize some solutions underway. Using a critical lens, we examine the difference between current practice and inclusive practice in gender representation, describing inclusive practices at each stage of the research lifecycle from writing a data management plan to sharing data.
Results: Data structures that limit gender to “male” and “female” or ontological structures that use mapping to collapse gender demographics to binary values exclude nonbinary and gender diverse populations. Some data collection instruments attempt inclusivity by adding the gender category of “other,” but using the “other” gender category labels nonbinary persons as intrinsically alien. Inclusive change must go farther, to move from alienation to inclusive categories. We describe several techniques for inclusively representing gender in data, from the data management planning stage, to collecting data, cleaning data, and sharing data. To facilitate better sharing of gender data, repositories must also allow mapping that includes nonbinary genders explicitly and allow for ontological mapping for long-term representation of diverse gender identities.
Conclusions: A good practice during research design is to consider two levels of critique in the data collection plan. First, consider the research question at hand and remove unnecessary gendering from the data. Second, if the research question needs gender, make sure to include nonbinary genders explicitly. Allies must take on this problem without leaving it to those who are most affected by it. Further, more voices calling for inclusionary practices surrounding data rise to a crescendo that cannot be ignored.
Objective: Big social data (such as social media and blogs) and archived qualitative data (such as interview transcripts, field notebooks, and diaries) are similar, but their respective communities of practice are under-connected. This paper explores shared challenges in qualitative data reuse and big social research and identifies implications for data curation.
Methods: This paper uses a broad literature search and inductive coding of 300 articles relating to qualitative data reuse and big social research. The literature review produces six key challenges relating to data use and reuse that are present in both qualitative data reuse and big social research—context, data quality, data comparability, informed consent, privacy & confidentiality, and intellectual property & data ownership.
Results: This paper explores six key challenges related to data use and reuse for qualitative data and big social research and discusses their implications for data curation practices.
Conclusions: Data curators can benefit from understanding these six key challenges and examining their data curation implications. These implications include strategies for: providing clear documentation; linking and combining datasets; supporting trustworthy repositories; using and advocating for metadata standards; discussing alternative consent strategies with researchers and IRBs; understanding and supporting deidentification challenges; supporting restricted access for data; creating data use agreements; supporting rights management and data licensing; and developing and supporting alternative archiving strategies. Considering these data curation implications will help data curators support sounder practices for both qualitative data reuse and big social research.
Objectives: As certified Carpentries instructors, the authors organized and co-taught the University of Montana’s first in-person Carpentries workshop focused on the R programming language during early 2020. Due to the COVID-19 pandemic, a repeated workshop was postponed to the fall of 2020 and was adapted for a fully online setting. The authors share their Carpentries journey from in-person to online instruction, hoping to inspire those interested in organizing Carpentries at their institution for the first time and those interested in improving their existing Carpentries presence.
Methods: The authors reflected on their experience facilitating the same Carpentries workshop in-person and online. They used this unique opportunity to compare the effectiveness of a face-to-face environment versus a virtual modality for delivering an interactive workshop.
Results: When teaching in the online setting, the authors learned to emphasize the basics, create many opportunities for feedback using formative assessments, reduce the amount of material presented, and include helpers who are familiar with technology and troubleshooting.
Conclusions: Although the online environment came with challenges (e.g., Zoom logistics and the need to further condense curricula), the instructors were surprised at the many advantages of hosting an online workshop. With some adaptations, Carpentries workshops work well in online delivery.
Data management practices for systematic reviews and other types of knowledge syntheses are variable, with some reviews following open science practices and others with poor reporting practices leading to lack of transparency or reproducibility. Reporting standards have improved the level of detail being shared in published reviews, and also encourage more open sharing of data from various stages of the review process. Similar to project planning or completion of an ethics application, systematic review teams should create a data management plan alongside creation of their study protocol. This commentary provides a brief description of a Data Management Plan Template created specifically for systematic reviews. It also describes the companion LibGuide which was created to provide more detailed examples, and to serve as a living document for updates and new guidance. The creation of the template was funded by the Portage Network.
Objective: Customer journey mapping and design thinking were identified as useful tools for gaining deeper insights, with researchers' direct input, into the research data service needs of researchers on our campus. In this article we discuss ways to improve the process in order to identify data needs earlier in the project life and at a more granular level.
Methods: Customer journey mapping and design thinking were employed to get direct input from researchers about their research processes and data management needs. Responses from mapping templates and follow-up interviews were then used to identify themes to be explored using design thinking. Finally, a toolkit was created in Open Science Framework to guide other libraries that wish to employ these techniques.
Results: Outcomes from the customer journey mapping and design thinking sessions identified needs in the areas of data storage, organization and sharing. We also identified project-management lessons learned. The first lesson was to ensure the researchers who participate adequately represent the range of data needs on campus. Another was that customer journey mapping would be more effective if the responses were collected in real time and researchers were allowed more flexibility in the mapping process.
Conclusions: Modifications to the customer journey mapping and design thinking techniques will provide real-time responses and deeper insights into the research data service needs of researchers on our campus. Our pilot identified some important gaps but we felt that more subtle and useful outcomes were possible by making changes to our process.
Research data curation is a set of scientific communication processes and activities that support the ethical reuse of research data and uphold research integrity. Data curators act as key collaborators with researchers to enrich the scholarly value and potential impact of their data through preparing it to be shared with others and preserved for the long term. This special issue focuses on practical data curation workflows and tools that have been developed and implemented within data repositories, scholarly societies, research projects, and academic institutions.
In this paper we take an in-depth look at the curation of a large longitudinal survey and activities and procedures involved in moving the data from its generation to the state that is needed to conduct scientific analysis. Using a case study approach, we describe how large surveys generate a range of data assets that require many decisions well before the data is considered for analysis and publication. We use the notion of active curation to describe activities and decisions about the data objects that are “live,” i.e., when they are still being collected and processed for the later stages of the data lifecycle. Our efforts illustrate a gap in the existing discussions on curation. On one hand, there is an acknowledged need for active or upstream curation as an engagement of curators close to the point of data creation. On the other hand, the recommendations on how to do that are scattered across multiple domain-oriented data efforts.
In describing the complexities of active curation of survey data and providing general recommendations we aim to draw attention to the practices of active curation, stimulate the development of interoperable tools, standards, and techniques needed at the initial stages of research projects, and encourage collaborations between libraries and other academic units.
Video data are uniquely suited for research reuse and for documenting research methods and findings. However, curation of video data is a serious hurdle for researchers in the social and behavioral sciences, where behavioral video data are obtained session by session and data sharing is not the norm. To eliminate the onerous burden of post hoc curation at the time of publication (or later), we describe best practices in active data curation—where data are curated and uploaded immediately after each data collection to allow instantaneous sharing with one button press at any time. Indeed, we recommend that researchers adopt “hyperactive” data curation where they openly share every step of their research process. The necessary infrastructure and tools are provided by Databrary—a secure, web-based data library designed for active curation and sharing of personally identifiable video data and associated metadata. We provide a case study of hyperactive curation of video data from the Play and Learning Across a Year (PLAY) project, where dozens of researchers developed a common protocol to collect, annotate, and actively curate video data of infants and mothers during natural activity in their homes at research sites across North America. PLAY relies on scalable standardized workflows to facilitate collaborative research, assure data quality, and prepare the corpus for sharing and reuse throughout the entire research process.
Plain text data consists of a sequence of encoded characters or “code points” from a given standard such as the Unicode Standard. Some of the most common file formats for digital data used in eScience (CSV, XML, and JSON, for example) are built atop plain text standards. Plain text representations of digital data are often preferred because plain text formats are relatively stable, and they facilitate reuse and interoperability. Despite its ubiquity, plain text is not as plain as it may seem. The standards used in modern text encoding (principally, the Unicode Character Set and the related encoding format, UTF-8) have complex architectures when compared to historical standards like ASCII. Further, while the Unicode standard has gained in prominence, text encoding problems are not uncommon in research data curation. This primer provides conceptual foundations for modern text encoding and guidance for common curation and preservation actions related to textual data.
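The distinction drawn above between code points and their encoded bytes can be illustrated with a short Python sketch. It is not taken from the primer: the helper name `describe` and the sample strings are assumptions chosen for the example, and only the standard library is used. It shows two visually identical strings built from different code point sequences, and one common encoding problem (UTF-8 bytes misread as Latin-1).

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, UTF-8 bytes as hex) per character."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "<unnamed>")
        utf8_hex = ch.encode("utf-8").hex(" ")  # separator argument needs Python 3.8+
        rows.append((code_point, name, utf8_hex))
    return rows


# Two visually identical strings built from different code point sequences.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Normalizing to NFC makes the two sequences comparable.
normalized = unicodedata.normalize("NFC", decomposed)

# A common encoding problem: UTF-8 bytes misread as Latin-1 ("mojibake").
mojibake = composed.encode("utf-8").decode("latin-1")
# Reversing the wrong decode recovers the text when no bytes were lost.
repaired = mojibake.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    for label, value in [("composed", composed), ("decomposed", decomposed)]:
        print(label, describe(value))
    print("equal before normalization:", composed == decomposed)
    print("equal after NFC:", composed == normalized)
    print("mojibake:", mojibake, "-> repaired:", repaired)
```

Whether NFC is the appropriate normalization form, and whether a curator should repair or merely document such problems, depends on local repository policy; the sketch only shows how the mismatch can be detected.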
Data curation is the process of managing data to make it available for reuse and preservation and to allow FAIR (findable, accessible, interoperable, reusable) uses. It is an important part of the research lifecycle, as researchers are often either required by funders or generally encouraged to preserve their datasets and make them discoverable and reusable. This has been especially important as Open Access (OA) policies are being implemented in many institutions across the nation. An efficient data repository and its data curation play key roles in facilitating research data discovery and easing reuse. In this article, we briefly discuss the local institutional repository at Penn State University and the general data curation practices we adopt for the deposited files and datasets; we then focus on a data analytics tool that has recently been applied to extract tabular data from PDF files. This is an enhancement to the existing data curation practices, as it adds tabular data to deposits with PDF files, where tables are often embedded and not easily reused.
Institutional data repositories are the acknowledged gold standard for data curation platforms in academic libraries. But not every institution can sustain a repository, and not every dataset can be archived due to legal, ethical, or authorial constraints. Data catalogs—metadata-only indices of research data that provide detailed access instructions and conditions for use—are one potential solution, and may be especially suitable for "challenging" datasets. This article presents the strengths of data catalogs for increasing the discoverability and accessibility of research data. The authors argue that data catalogs are a viable alternative or complement to data repositories, and provide examples from their institutions' experiences to show how their data catalogs address specific curatorial requirements. The article also reports on the development of a community of practice for data catalogs and data discovery initiatives.
Introduction: This paper presents concrete and actionable steps to guide researchers, data curators, and data managers in improving their understanding and practice of computational reproducibility.
Objectives: Focusing on incremental progress rather than prescriptive rules, researchers and curators can build their knowledge and skills as the need arises. This paper presents a framework of incremental curation for reproducibility to support open science objectives.
Methods: A computational reproducibility framework developed for the Canadian Data Curation Forum serves as the model for this approach. This framework combines learning about reproducibility with recommended steps to improving reproducibility.
Conclusion: Computational reproducibility leads to more transparent and accurate research. The authors warn that fear of a crisis and focus on perfection should not prevent curation that may be ‘good enough.’
This video article provides an introduction to a data primer which leads data curators through the process of preparing a neuroimaging dataset for submission into a repository. A team of health sciences librarians and informationists created the primer which is focused on data from functional magnetic resonance images that are saved in either DICOM or NIfTI formats. The video walks through a flowchart discussing the process of preparing data sets to be deposited into a repository, key curatorial questions to ask for data that is highly sensitive, and how to suggest edits to this and other primers. The primer grew out of a data curation workshop hosted by the Data Curation Network.
A transcript of this interview is available for download under Additional Files.