Methods: We replicated the methods of a prior citation study to provide an updated, transparent, and reproducible citation analysis protocol that can itself be replicated with Jupyter Notebooks.
Results: This study confirmed the prior citation study’s conclusions and adapted its methods to analyze the citation practices of Earth scientists at four institutions. We found that only 7.88% of journals accounted for 80% of the citations, a key metric for identifying a core collection of titles in this discipline. We then demonstrated programmatically that 36% of these cited references were available as open access.
Conclusions: Jupyter Notebooks are a viable platform for disseminating replicable processes for citation analysis, and the emergence of a completely open methodology is a step forward. Adherence to the 80/20 rule aligned with institutional research output, but citation preferences are evident. Reproducible citation analysis methods may also be used to analyze open access uptake; however, those results are inconclusive, as it is difficult to determine whether an article was open access at the time of citation or became open access after an embargo.
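The core-collection metric reported above (80% of citations covered by 7.88% of journals) can be sketched in notebook-style Python. The journal names and citation counts below are invented placeholders, not data from the study:

```python
from collections import Counter

# Hypothetical citation counts per cited journal (placeholders, not study data)
citations = Counter({
    "Journal A": 500, "Journal B": 300, "Journal C": 120,
    "Journal D": 50, "Journal E": 20, "Journal F": 10,
})

def core_journal_share(citations, threshold=0.80):
    """Return the fraction of journals needed to cover `threshold` of all
    citations, counting journals from most- to least-cited."""
    total = sum(citations.values())
    covered = 0
    for rank, (journal, count) in enumerate(citations.most_common(), start=1):
        covered += count
        if covered / total >= threshold:
            return rank / len(citations)
    return 1.0

# With the placeholder data, 2 of 6 journals cover 80% of citations
print(core_journal_share(citations))
```

In a real replication, the counter would be populated from the cited-reference export rather than hard-coded.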
Book review of: Data Feminism by Catherine D'Ignazio and Lauren F. Klein, The MIT Press (2020). Data Feminism combines intersectional feminism and critical data studies to invite the reader to consider: “How can we use data to remake the world?” As non-profit organizations with a mandate to provide equitable access to non-neutral information and services, libraries and library workers are uniquely positioned to advance the principles laid out in Data Feminism.
A digital object identifier (DOI) is an increasingly prominent persistent identifier for finding and accessing scholarly information. This paper presents an overview of global developments and approaches in the field of DOIs and DOI services, with a geographical focus on Germany. First, the origins and components of the DOI system and the structure of a DOI name are explored. Next, the fundamental and specific characteristics of DOIs are described, and DOIs for three typical kinds of intellectual entities in scholarly communication are discussed. A general DOI service pyramid is then sketched, with brief descriptions of the functions of institutions at each level. After that, approaches of the research data librarianship community in the field of RDM, especially DOI services, are elaborated. As examples, the DOI services provided in German research libraries, as well as best practices of DOI services in one German library, are introduced. Finally, current practices and open issues in working with DOIs are summarized. It is foreseeable that the DOI, which is crucial to FAIR research data, will gain extensive recognition in the scientific world.
Objectives: Compare journal coverage of abstract and indexing tools commonly used within academic science and engineering research.
Methods: Title lists of Compendex, Inspec, Reaxys, SciFinder, and Web of Science were provided by their respective publishers. These lists were imported into Excel, and the overlap of the ISSN/EISSNs and journal titles was determined using the VLOOKUP function, which checks whether the value in one cell appears in a column of other cells.
Results: There is substantial overlap between the Web of Science’s Science Citation Index Expanded and Emerging Sources Citation Index combined (the largest database, with 17,014 titles) and Compendex (63.6%), Inspec (71.0%), Reaxys (67.0%), and SciFinder (75.8%). SciFinder also overlaps heavily with Reaxys (75.9%). Web of Science and Compendex combined contain 77.6% of the titles within Inspec.
Conclusion: Flat or decreasing library budgets combined with increasing journal prices result in an unsustainable system that will require a calculated allocation of resources at many institutions. The overlap of commonly indexed journals among abstracting and indexing tools could serve as one way to determine how these resources should be allocated.
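The VLOOKUP-based overlap check described in the Methods can be reproduced with set operations; a minimal Python sketch, where the ISSN lists are invented placeholders rather than vendor data:

```python
# Hypothetical ISSN lists for two A&I tools (placeholders, not vendor data)
web_of_science = {"0028-0836", "1095-9203", "0036-8075", "2041-1723"}
compendex = {"0036-8075", "2041-1723", "0018-9219"}

# Equivalent of an Excel VLOOKUP match: which Compendex ISSNs
# also appear in the Web of Science title list?
overlap = compendex & web_of_science
overlap_pct = len(overlap) / len(compendex) * 100
print(f"{overlap_pct:.1f}% of Compendex titles are also in Web of Science")
```

Set intersection scales to full title lists without per-cell formulas, and the same pattern extends to unions (for combined coverage such as Web of Science plus Compendex versus Inspec).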
A range of regulatory pressures emanating from funding agencies and scholarly journals increasingly encourage researchers to engage in formal data sharing practices. As academic libraries continue to refine their role in supporting researchers in this data sharing space, one particular challenge has been finding new ways to meaningfully engage with campus researchers. Libraries help shape norms and encourage data sharing through education and training, and there has been significant growth in the services these institutions are able to provide and the ways in which library staff are able to collaborate and communicate with researchers. Evidence also suggests that within disciplines, normative pressures and expectations around professional conduct have a significant impact on data sharing behaviors (Kim and Adler 2015; Sigit Sayogo and Pardo 2013; Zenk-Moltgen et al. 2018). Duke University Libraries' Research Data Management program has recently centered part of its outreach strategy on leveraging peer networks and social modeling to encourage and normalize robust data sharing practices among campus researchers. The program has hosted two panel discussions on issues related to data management—specifically, data sharing and research reproducibility. This paper reflects on some lessons learned from these outreach efforts and outlines next steps.
Objective: Investigate how different groups of depositors vary in their use of optional data curation features that provide support for FAIR research data in the Harvard Dataverse repository.
Methods: A numerical score based upon the presence or absence of characteristics associated with the use of optional features was assigned to each of the 29,295 datasets deposited in Harvard Dataverse between 2007 and 2019. Statistical analyses were performed to investigate patterns of optional feature use amongst different groups of depositors and their relationship to other dataset characteristics.
Results: Members of groups make greater use of Harvard Dataverse's optional features than individual researchers. Datasets that undergo a data curation review before submission to Harvard Dataverse, are associated with a publication, or contain restricted files also make greater use of optional features.
Conclusions: Individual researchers might benefit from increased outreach and improved documentation about the benefits and use of optional features to improve their datasets' level of curation beyond the FAIR-informed support that the Harvard Dataverse repository provides by default. Platform designers, developers, and managers may also use the numerical scoring approach to explore how different user groups use optional application features.
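The numerical scoring approach described above can be illustrated with a small sketch. The feature names here are assumptions for illustration only, not the study's actual rubric of Harvard Dataverse optional features:

```python
# Hypothetical optional-feature checklist (illustrative, not the study's rubric)
OPTIONAL_FEATURES = [
    "description", "keywords", "related_publication", "license", "file_metadata",
]

def curation_score(dataset: dict) -> int:
    """Score a dataset record by counting how many optional features
    are present and non-empty (one point each)."""
    return sum(1 for feature in OPTIONAL_FEATURES if dataset.get(feature))

# A record using 3 of the 5 hypothetical optional features scores 3
record = {"description": "Survey responses", "keywords": ["census"], "license": "CC0"}
print(curation_score(record))
```

Scores computed this way for each deposited dataset can then be grouped by depositor type for the kind of statistical comparison the study performs.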
Inspired by Reid Boehm’s presentation “Beyond Pronouns: Caring for Transgender Medical Research Data to Benefit All People,” at the Research Data Access and Preservation Summit (RDAP) in March 2018, four librarians from the University of Minnesota (UMN) set out to create a LibGuide to support research on transgender topics as a response to Boehm’s identification of insufficient traditional mechanisms for describing, securing, and accessing data on transgender people and topics. This commentary describes the process used to craft the LibGuide, "Library Resources for Transgender Topics," including assembling a team of interested library staff, defining the scope of the project, interacting with stakeholders and community partners, establishing a workflow, and designing an ongoing process to incorporate user feedback.
The Journal of eScience Librarianship has partnered with the Research Data Access & Preservation (RDAP) Association for a third year to publish selected conference proceedings. This issue highlights the research presented at the RDAP 2020 Summit and the community it has fostered.
Objective: Promoting discovery of research data helps archived data realize its potential to advance knowledge. Montana State University (MSU) Dataset Search aims to support discovery and reporting for research datasets created by researchers at the institution.
Methods and Results: The Dataset Search application consists of five core features: a streamlined browse and search interface, a data model based on dataset discovery, a harvesting process for finding and vetting datasets stored in external repositories, an administrative interface for managing the creation, ingest, and maintenance of dataset records, and a dataset visualization interface to demonstrate how data is produced and used by MSU researchers.
Conclusion: The Dataset Search application is designed to be easily customized and implemented by other institutions. Indexes like Dataset Search can improve search and discovery for content archived in data repositories, therefore amplifying the impact and benefits of archived data.
This commentary describes the experience of attending RDAP 2020 remotely after the author’s trip cancellation due to COVID-19 travel restrictions. The author describes the highs and lows of the remote viewing experience, and the potential future landscape of virtual conferences and remote attendance. Maintaining networking and casual conversation during a virtual conference is an area that needs improvement but has potential. Takeaways from several conference sessions, including the keynote speaker, are also included along with discussion of how the author learned valuable information or could apply the topics to her own work.
Objective: As electronic laboratory notebook (ELN) capability continues to expand, more researchers are turning to this digital format. The University of Massachusetts Medical School developed new guidelines to outline the retention and transferal of ELNs. How do other universities approach the retention and transferal of laboratory notebooks, including ELNs?
Methods: The websites of 25 universities were searched for policies or guidelines on laboratory notebook retention and transferal. A textual analysis of the policies was performed to find common themes.
Results: Information on the retention and transferal of laboratory notebooks was found in record retention and research data policies/guidelines. Out of the 25 institutional websites searched, 16 policies/guidelines on research notebook retention were found, and 10 institutions had policies/guidelines on transferring research notebooks when a researcher leaves the university. Only one policy had a retention recommendation for storage location specific to electronic media, including laboratory notebooks, that did not apply to its paper counterparts; the remaining policies either explicitly include multiple forms and media or do not mention multiple formats for research records at all. The minimum retention period for research notebooks ranged from immediately after report completion to 7 years after completing the research, with the possibility of extension depending on a wide range of external requirements. Most research notebook transferal policies and guidelines required associated researchers and students to request permission from their principal investigator (PI) before taking a copy of the notebook. Most institutions with policies also seek to retain access to research notebooks when a PI leaves, in order to protect intellectual property and respond to any cases of scientific misconduct or conflict of interest.
Conclusions: Other universities have a range of approaches for the retention and transferal of laboratory notebooks, but most provide the same recommendations for both electronic and physical laboratory notebooks in their research data or record retention policies/guidelines.
Key themes in Dickens’ novel (transformation and resurrection, darkness and light, and social justice) are firmly connected to the work being done in data. Data librarians can make a difference in times like these: resurrecting data; transforming how students, researchers, or the public think about and use data; unearthing and bringing to light historical data that will give context and meaning to an issue; and making data accessible to help address, and perhaps solve, social justice issues.
Objective: This eScience in Action article describes the collaborative development process and outputs for a qualitative data curation curriculum initiative led by a library faculty (research data specialist) at an R1 research university.
Methods: The collaborative curriculum development activities described in this article took place between 2015-2020 and included 1) a college-wide “call out” meeting with graduate methods instructors and additional one-on-one conversations, 2) a year-long training series for disciplinary faculty teaching graduate-level qualitative research methods courses, 3) guest lectures and co-curricular workshops, and 4) the development of a credit-bearing graduate-level course.
Results: This practice-based article includes a reflection on the collaborative curriculum development process and impacts, including the development of networks between the Library and qualitative researchers across campus. The article provides a proof-of-concept example for developing relevant and trustworthy library data services for humanities and qualitative social-science researchers.
Conclusions: Curriculum development activities focused predominantly on researcher-centered perspectives and identified needs. However, changes in institutional expectations for library faculty (i.e., the requirement to teach credit-bearing courses) played a major role in how the curriculum was implemented, its impact, and the continued sustainability of its outputs.
Researchers are faced with unprecedented challenges due to the size and complexity of data, and libraries are stepping in to help by providing guidance on research data management, primarily to graduate students and faculty. Currently, many universities encourage an undergraduate research experience in which students engage in research projects in the classroom and in research labs, yet research data management is often not included in these opportunities. At UW-Madison, we piloted researchERS (Emerging Research Scholars), a program for undergraduates from all disciplines to learn data management skills. Focusing on core concepts as well as data ethics, reproducibility, and research workflows, the program consisted of seven evening workshops, two networking events, and one field trip. Each workshop invited campus and community speakers relevant to the workshop’s theme, introducing students to the network of available resources and data expertise, and food was provided for attendees. The workshops also built in customized activities to show students how to incorporate best practices into their work. Local businesses provided a tour of their facilities as well as a talk on how they leverage data. This paper describes the program as well as the benefits and drawbacks of tailoring a research data management program toward undergraduates.
Objective: Data curation is becoming widely accepted as a necessary component of data sharing. Yet, as there are so many different types of data with various curation needs, the Data Curation Network (DCN) project anticipated that a collaborative approach to data curation across a network of repositories would expand what any single institution might offer alone. Now, halfway through a three-year implementation phase, we’re testing our assumptions using one year of data from the DCN.
Methods: Ten institutions participated in the implementation phase of a shared staffing model for curating research data. Starting on January 1, 2019, for 12 months we tracked the number, file types, and disciplines represented in data sets submitted to the DCN. Participating curators were matched to data sets based on their self-reported curation expertise. Aspects such as curation time, level of satisfaction with the assignment, and lack of appropriate expertise in the network were tracked and analyzed.
Results: Seventy-four data sets were submitted to the DCN in year one. Seventy-one of them were successfully curated by DCN curators. Each curation assignment took 2.4 hours on average, and data sets took a median of three days to pass through the network. By analyzing the domain and file types of first-year submissions, we found that our coverage is well distributed across domains and that our capacity is higher than the demand. We also observed that the high volume of data sets containing software code drew on certain curators’ expertise more often than others’, creating a potential imbalance.
Conclusions: The data from year one of the DCN pilot have verified key assumptions about our collaborative approach to data curation, and these results have raised additional questions about capacity, equitable use of network resources, and sustained growth that we hope to answer by the end of this implementation phase.
Objectives: This small-scale study explores the current state of connections between open data and open access (OA) articles in the life sciences.
Methods: This study involved 44 openly available life sciences datasets from the Illinois Data Bank that had 45 related research articles. For each article, I gathered the OA status of the journal and the article on the publisher website and checked whether the article was openly available via Unpaywall and Research Gate. I also examined how and where the open data was included in the HTML and PDF versions of the related articles.
Results: Of the 45 articles studied, fewer than half were published in Gold/Full OA journals; the remaining articles were published in Gold/Hybrid journals, and none of those were OA. This study found that OA articles pointed to the Illinois Data Bank datasets in much the same way as the related articles overall, most commonly with a data availability statement containing a DOI.
Conclusions: The findings indicate that Gold OA in hybrid journals does not appear to be a popular option, even for articles connected to open data, and this study emphasizes the importance of data repositories providing DOIs, since the related articles frequently used DOIs to point to the Illinois Data Bank datasets. This study also revealed concerns about free (not licensed OA) access to articles on publisher websites, which will be a significant topic for future research.
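Programmatic OA checks of the kind this study performed can be run against the Unpaywall API. The sketch below is a minimal, hedged example: the endpoint URL and the `is_oa`/`oa_status` fields follow Unpaywall's documented v2 schema, but the sample record is hypothetical and the field handling should be verified against the current API documentation:

```python
import json
from urllib.request import urlopen

# Unpaywall v2 endpoint (requires an email parameter per their documentation)
UNPAYWALL = "https://api.unpaywall.org/v2/{doi}?email={email}"

def oa_status(record: dict) -> tuple:
    """Interpret an Unpaywall-style record dict: return (is_oa, oa_status).
    Field names assume Unpaywall's v2 schema ('is_oa', 'oa_status')."""
    return bool(record.get("is_oa")), record.get("oa_status", "closed")

def check_doi(doi: str, email: str) -> tuple:
    """Fetch a DOI's record from Unpaywall and interpret it (network call)."""
    with urlopen(UNPAYWALL.format(doi=doi, email=email)) as resp:
        return oa_status(json.load(resp))

# Offline example with a hypothetical response record:
sample = {"is_oa": True, "oa_status": "gold"}
print(oa_status(sample))  # (True, 'gold')
```

Note that such a check reports OA status only at the time of the query, which is exactly the embargo-timing ambiguity the first study in this issue identifies.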
There are many courses available to teach research data management to librarians and researchers. While these courses can help with technical skills, like programming or statistics, and practical knowledge of data life cycles or data sharing policies, there are “soft skills” and non-technical skills that are needed to successfully start and run data services. While there are many important characteristics of a good data librarian, reference skills, relationship building, collaboration, listening, and facilitation are some of the most important. Giving consideration to these skills will help any data librarian with their multifaceted job.
Objective: Evaluate and examine Data Literacy (DL) in the supported disciplines of four liaison librarians at a large research university.
Methods: Using a framework developed by Prado and Marzal (2013), the study analyzed 378 syllabi from a two-year period across six departments—Criminal Justice, Geography, Geology, Journalism, Political Science, and Sociology—to see which classes included DLs.
Results: The study determined which classes addressed specific DLs and where those classes might need more support in other DLs. The most commonly taught DLs are Reading, Interpreting, and Evaluating Data, and Using Data; the least commonly taught are Understanding Data and Managing Data.
Conclusions: While all disciplines touched on data in some way, there is clear room for librarians to support DLs in the areas of Understanding Data and Managing Data.
The Journal of eScience Librarianship has partnered with the Research Data Access & Preservation (RDAP) Association for a second year to publish selected conference proceedings. This issue highlights the research presented at the RDAP 2019 Summit and the community it has fostered.
Objective: This paper compares the pedagogical theory driving current norms for novice instruction in both fields, focusing specifically on The Carpentries and ACRL Framework instruction. I identify key differences in theoretical and practical approaches to educating learners entirely new to a topic, centering on the choice between constructivist or experiential learning and direct instructional guidance.
Methods: Two case studies are explored through the lens of the Dreyfus Model of learning for their theoretical underpinnings in engaging novice learners: the ACRL Framework and the Carpentries’ Instructor Training.
Results: Applying the Dreyfus Model of learning and cognitive load theory shows theoretical benefits to direct instructional guidance over constructivist or minimally guided instruction.
Conclusions: The ACRL Framework and Carpentries workshops share teaching goals of creating new mental models and core skills to support future learning, but differ in their pedagogical approaches. For novice learners of information literacy, there may be value in considering a more guided approach. Concrete lesson-planning strategies are proposed.
The substance of this article is based upon a poster presented at RDAP Summit 2019.
Objective: Best practices such as the FAIR Principles (Findability, Accessibility, Interoperability, Reusability) were developed to ensure that published datasets are reusable. While we employ best practices in the curation of datasets, we wanted to learn how domain experts view the reusability of datasets in our institutional repository, ScholarsArchive@OSU. Curation workflows are designed by data curators based on their own recommendations, but research data is extremely specialized, and such workflows are rarely evaluated by researchers. In this project we used peer review by domain experts to evaluate the reusability of the datasets in our institutional repository, with the goal of informing our curation methods and ensuring that the limited resources of our library maximize the reusability of research data.
Methods: We asked all researchers who have submitted datasets to Oregon State University’s repository to refer us to domain experts who could review the reusability of their datasets. Two data curators who are non-experts also reviewed the same datasets. We gave both groups review guidelines based on those of several journals. Eleven domain experts and two data curators reviewed eight datasets. The reviews covered the quality of the repository record, the quality of the documentation, and the quality of the data. We then compared the comments given by the two groups.
Results: Domain experts and non-expert data curators largely converged on similar scores for reviewed datasets, but the focus of critique by domain experts was somewhat divergent. A few broad issues common across reviews were: insufficient documentation, the use of links to journal articles in the place of documentation, and concerns about duplication of effort in creating documentation and metadata. Reviews also reflected the background and skills of the reviewer. Domain experts expressed a lack of expertise in data curation practices and data curators expressed their lack of expertise in the research domain.
Conclusions: The results of this investigation could help guide future research data curation activities and align domain expert and data curator expectations for the reusability of datasets. We recommend further exploration of these common issues and additional domain expert peer-review projects to further refine and align expectations for research data reusability.
The substance of this article is based upon a panel presentation at RDAP Summit 2019.
Biodiversity research that informs conservation action is increasingly data intensive. Cutting-edge projects at large institutions use massive aggregated datasets to build dynamic models and conduct novel analyses of natural systems. Most of these research institutions are geographically distant from the highest-priority conservation areas, which are found in South America, Africa, and Southeast Asia. There, data is typically collected by or with the help of local residents hired as field assistants. These field assistants have few meaningful opportunities to participate in biodiversity research and conservation beyond data logging. The literature indicates the data revolution has increased demand for impersonal and integrated large-scale systems that aggregate biodiversity data across sources with minimal friction. In this study, interviews were conducted with six active conservation workers to identify elements of these data systems that create barriers to field assistants’ engagement with the projects they make possible. As both creators and consumers of data, all six relayed frustration with various aspects of their data workflows. Regarding field assistant interaction with digital data systems, they observed that their field assistants engaged only at the initial point of data entry or not at all. Some suggested mobile apps as a good solution for field data collection. However, some also expressed doubt that their local assistants had the necessary knowledge background to navigate digital systems or understand scientific methodologies. These results suggest that trying to mold field assistants to fit existing data infrastructure and adapting purpose-built data systems to nontechnical users are both sub-optimal solutions. A human-mediated capacity building paradigm, which requires embedding people who are both culturally literate and data literate alongside field assistants, is explored as an alternative path to making data meaningful. 
Improving the accessibility of data this way can empower local communities to share ownership in biodiversity conservation.
The substance of this article is based upon a panel presentation at RDAP Summit 2019.
The curation and preservation of scientific data has long been recognized as an essential activity for the reproducibility of science and the advancement of knowledge. While investment in data curation for specific disciplines and at individual research institutions has advanced the ability to preserve research data products, data curation for big interdisciplinary science remains relatively unexplored terrain. To fill this lacuna, this article presents a case study of the data curation for the National Centers for Coastal Ocean Science (NCCOS) funded project “Understanding Coral Ecosystem Connectivity in the Gulf of Mexico-Pulley Ridge to the Florida Keys,” undertaken from 2011 to 2018 by more than 30 researchers at several research institutions. The data curation process is described, and a discussion of strengths, weaknesses, and lessons learned is presented. Major conclusions from this case study include: reimplementing data repository infrastructure builds valuable institutional data curation knowledge but may not meet data curation standards and best practices; data from big interdisciplinary science can be treated as a special collection, with the implication that metadata takes the form of a finding aid or catalog of datasets within the larger project context; and there are opportunities for data curators and librarians to synthesize and integrate results across disciplines and to create exhibits as stories that emerge from interdisciplinary big science.
The substance of this article is based upon a poster presented at RDAP Summit 2019.
The principles of equity, diversity, and inclusion have long been incorporated into many aspects of the data practitioner profession. The hiring process is an exception; it is opaque, stress-inducing, and ultimately reinforces a homogeneous workforce. Job postings are important both as a window into the profession and as the first way that candidates interact with your institution. This Commentary article provides concrete and actionable recommendations on how you can start writing more equitable, diverse, and inclusive job postings at your institution.
The substance of this article is based upon a panel presentation at RDAP Summit 2019.
This commentary describes the impressions of a first-time attendee from a small liberal arts college (SLAC) to the Research Data Access and Preservation (RDAP) Summit, in May 2019, and observations about the makeup of the conference in terms of types of jobs and types of institutions represented among the attendees. The author also outlines a more general difficulty librarians from any institution face in adapting lessons learned and examples given by research data management librarians at other institutions, due to differences in institutional structure. The commentary suggests ways data management professionals might make reuse of ideas and solutions easier for one another, by analyzing why solutions work at different types of institutions, and by developing our understanding of how to replicate successful projects and practices in different organizational structures. The author discusses the value of attending RDAP Summit for librarians from smaller institutions such as SLACs, and compares the RDAP experience with professional development opportunities regarding data librarianship that are available on a region-by-region basis.
The substance of this article is based upon the author’s experience at RDAP Summit 2019.