News from eLiteracias

✇ The Code4Lib Journal

Works, Expressions, Manifestations, Items: An Ontology

By Karen Coyle — 10 May 2022, 00:26
The concepts first introduced in the FRBR document and known as "WEMI" have been employed in situations quite different from the library bibliographic catalog. This is evidence that a definition of similar classes that are more general than those developed for library usage would benefit metadata developers broadly. This article proposes a minimally constrained set of classes and relationships that could form the basis for a useful model of created works.
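A minimally constrained WEMI-style model might look like the following sketch in Python. The class and attribute names here are illustrative only and are not taken from the article's proposed ontology.

```python
from dataclasses import dataclass

# Illustrative sketch of minimally constrained WEMI-style classes:
# each level points to the level it derives from, with no further
# library-specific constraints. Names are hypothetical.

@dataclass
class Work:
    label: str

@dataclass
class Expression:
    realizes: Work
    label: str

@dataclass
class Manifestation:
    embodies: Expression
    label: str

@dataclass
class Item:
    exemplifies: Manifestation
    label: str

# A single chain: one Work realized, embodied, and exemplified.
work = Work("Moby-Dick")
expr = Expression(work, "English text")
manif = Manifestation(expr, "1992 Penguin paperback")
item = Item(manif, "Library copy, barcode 0001")
```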

Lantern: A Pandoc Template for OER Publishing

By Chris Diaz — 10 May 2022, 00:26
Lantern is a template and workflow for using Pandoc and GitHub to create and host multi-format open educational resources (OER) online. It applies minimal computing methods to OER publishing practices. The purpose is to minimize the technical footprint for digital publishing while maximizing control over the form, content, and distribution of OER texts. Lantern uses Markdown and YAML to capture an OER’s source content and metadata and Pandoc to transform it into HTML, PDF, EPUB, and DOCX formats. Pandoc’s options and arguments are pre-configured in a Bash script to simplify the process for users. Lantern is available as a template repository on GitHub. The template repository is set up to run Pandoc with GitHub Actions and serve output files on GitHub Pages for convenience; however, GitHub is not a required dependency. Lantern can be used on any modern computer to produce OER files that can be uploaded to any modern web server.
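The kind of pre-configured Pandoc invocation described above can be sketched as follows. This builds command lines with standard Pandoc options (`--metadata-file`, `--standalone`, `--output`); Lantern's actual Bash script uses its own configuration, and the filenames here are hypothetical.

```python
# Sketch: build one Pandoc command line per output format, the way
# Lantern's Bash script pre-configures Pandoc's options for users.
# Flags shown are standard Pandoc options, not Lantern's exact setup.

def pandoc_command(source: str, metadata: str, fmt: str) -> list[str]:
    """Build a Pandoc command line for one output format."""
    out = source.rsplit(".", 1)[0] + "." + fmt
    return [
        "pandoc", source,
        "--metadata-file", metadata,  # YAML metadata (title, authors, ...)
        "--standalone",               # produce a complete document
        "--output", out,
    ]

# One command per target format, from a single Markdown source.
commands = [pandoc_command("book.md", "metadata.yml", fmt)
            for fmt in ("html", "pdf", "epub", "docx")]
```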

Fractal in detail: What information is in a file format identification report?

By Ross Spencer — 10 May 2022, 00:26
A file format identification report, such as those generated by digital preservation tools like DROID, Siegfried, or FIDO, contains an incredible wealth of information. Used to scan discrete sets of files comprising part or all of a digital collection, these datasets can serve as entry points for further activities, including appraisal, identification of future work efforts, and the facilitation of transfer of digital objects into preservation storage. The information contained in them is fractal in detail, and numerous outputs can be generated from that detail. This paper describes the purpose of a file format identification report and the extensive information that can be extracted from one. It summarizes a number of ways of transforming reports into inputs for other systems and describes a handful of the tools already doing so. The paper concludes that a format identification report is a pivotal artefact in the digital transfer process, and asks readers to consider how they might leverage such reports and the benefits doing so might provide.
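One simple output that can be derived from such a report is a format frequency profile. The sketch below summarizes a small CSV-style report by PRONOM identifier (PUID); the column names are illustrative, and real DROID or Siegfried exports have a richer layout.

```python
import csv
import io
from collections import Counter

# Sketch: derive a format frequency profile from an identification
# report. Column names are illustrative; real DROID/Siegfried CSV
# exports contain many more columns.

REPORT = """\
filename,puid,format
a.pdf,fmt/19,Acrobat PDF 1.5
b.pdf,fmt/19,Acrobat PDF 1.5
c.tif,fmt/353,TIFF
"""

def format_profile(report_csv: str) -> Counter:
    """Count how many files were identified as each PUID."""
    reader = csv.DictReader(io.StringIO(report_csv))
    return Counter(row["puid"] for row in reader)

profile = format_profile(REPORT)
```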

Editorial — New name change policy

By Ron Peterson — 10 May 2022, 00:26
The Code4Lib Journal Editorial Committee is implementing a new name change policy aimed at facilitating the process and ensuring timely and comprehensive name changes for anyone who needs to change their name within the Journal.

Automating reference consultation requests with JavaScript and a Google Form

By Stephen Zweibel — 10 May 2022, 00:26
At the CUNY Graduate Center Library, reference consultation requests were previously sent to a central email address, then manually directed by our head of reference to the appropriate subject expert. This process was cumbersome and because the inbox was not checked every day, responses were delayed and messages were occasionally missed. In order to streamline this process, I created a form and wrote a script that uses the answers in the form to automatically forward any consultation requests to the correct subject specialist. This was done using JavaScript, Google Sheets, and the Google Apps Script backend. When a patron requesting a consultation fills out the form, they include their field of research. This field is associated in my script with a particular subject specialist librarian, who then receives an email with the pertinent information. Rather than requiring either that patrons themselves search for the right subject specialist, or that library faculty spend time distributing messages to the right liaison, this enables a smoother, more direct interaction. In this article, I will describe the steps I took to write this script, using only freely available online software.
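The routing logic described above can be sketched as follows, translated from Google Apps Script into Python for illustration. The subject names and email addresses are hypothetical, not CUNY's actual mapping.

```python
# Sketch: map a patron's stated field of research to the matching
# subject specialist, falling back to a general address. Addresses
# and subjects are hypothetical.

SPECIALISTS = {
    "History": "history-librarian@example.edu",
    "Sociology": "sociology-librarian@example.edu",
}
FALLBACK = "reference@example.edu"

def route_request(form_response: dict) -> dict:
    """Build the outgoing email from one form submission."""
    field = form_response.get("field_of_research", "")
    return {
        "to": SPECIALISTS.get(field, FALLBACK),
        "subject": f"Consultation request: {field or 'general'}",
        "body": form_response.get("details", ""),
    }

msg = route_request({"field_of_research": "History",
                     "details": "Looking for archival sources."})
```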

The DSA Toolkit Shines Light Into Dark and Stormy Archives

By Shawn M. Jones, Himarsha R. Jayanetti, Alex Osborne, Paul Koerbin, Martin Klein, Michele C. Weigle, Michael L. Nelson — 10 May 2022, 00:26
Themed web archive collections exist to make sense of archived web pages (mementos). Some collections contain hundreds of thousands of mementos. There are many collections about the same topic. Few collections on platforms like Archive-It include standardized metadata. Reviewing the documents in a single collection thus becomes an expensive proposition. Search engines help find individual documents but do not provide an overall understanding of each collection as a whole. Visitors need to be able to understand what individual collections contain so they can make decisions about individual collections and compare them to each other. The Dark and Stormy Archives (DSA) Project applies social media storytelling to a subset of a collection to facilitate collection understanding at a glance. As part of this work, we developed the DSA Toolkit, which helps archivists and visitors leverage this capability. As part of our recent International Internet Preservation Consortium (IIPC) grant, Los Alamos National Laboratory (LANL) and Old Dominion University (ODU) piloted the DSA toolkit with the National Library of Australia (NLA). Collectively we have made numerous improvements, from better handling of NLA mementos to native Linux installers to more approachable Web User Interfaces. Our goal is to make the DSA approachable for everyone so that end-users and archivists alike can apply social media storytelling to web archives.

Citation Needed: Adding Citations to CONTENTdm Records

By Jenn Randles & Andrew Bullen — 10 May 2022, 00:26
The Tennessee State Library and Archives and the Illinois State Library identified a need to add citation information to individual image records in OCLC’s CONTENTdm. Experience with digital archives at both institutions showed that citation information was one of the most requested features. Unfortunately, CONTENTdm does not natively display citation information for image records; to add this functionality, custom JavaScript had to be written to interact with the underlying React environment and parse out or retrieve the appropriate metadata to dynamically build record citations. Detailed code and a description of methods for building two different models of citation generators are presented.
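The core of a citation generator is assembling a citation string from a record's metadata fields. The sketch below shows that step in Python with hypothetical field names; the article's actual generators run as custom JavaScript inside CONTENTdm's React environment.

```python
# Sketch: dynamically build a record citation from parsed metadata.
# Field names and the citation style shown are illustrative only.

def build_citation(record: dict) -> str:
    parts = []
    if record.get("creator"):
        parts.append(record["creator"] + ".")
    parts.append(f'"{record["title"]}."')
    if record.get("date"):
        parts.append(record["date"] + ".")
    parts.append(record["collection"] + ",")
    parts.append(record["institution"] + ".")
    parts.append(record["url"])
    return " ".join(parts)

citation = build_citation({
    "creator": "Unknown photographer",
    "title": "Main Street, 1910",
    "date": "1910",
    "collection": "Postcard Collection",
    "institution": "Tennessee State Library and Archives",
    "url": "https://example.org/record/42",
})
```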

Supporting open access, integrating distributed research platforms, and building a research information management platform

By Daniel M. Coughlin, Cynthia Hudson Vitale — 10 May 2022, 00:26

Academic libraries are often called upon by their university communities to collect, manage, and curate information about the research activity produced at their campuses. Proper research information management (RIM) can be leveraged for multiple institutional contexts, including networking, reporting activities, building faculty profiles, and supporting the reputation management of the institution.

In the last ten to fifteen years, the adoption and implementation of RIM infrastructure has become widespread throughout the academic world. Approaches to developing and implementing this infrastructure have varied, from commercial and open-source options to locally developed instances. Each piece of infrastructure has its own functionality, features, and metadata sources. No single application or data source meets all the needs of these varying pieces of research information; rather, many of these systems together create an ecosystem that provides for the diverse set of needs and contexts.

This paper examines the systems at Pennsylvania State University that contribute to our RIM ecosystem, how and why we developed another piece of supporting infrastructure for our Open Access policy, and the successes and challenges of this work.


Strategies for Preserving Digital Scholarship / Humanities Projects

By Kirsta Stapelfeldt, Sukhvir Khera, Natkeeran Ledchumykanthan, Lara Gomez, Erin Liu, and Sonia Dhaliwal — 10 May 2022, 00:26
The Digital Scholarship Unit (DSU) at the University of Toronto Scarborough Library frequently partners with faculty to create digital scholarship (DS) projects. However, a completed project can be challenging to manage when it is no longer under active development by the original project team and resources allocated to its ongoing maintenance are scarce. Maintaining inactive projects on the live web bloats staff workloads or is not possible due to limited staff capacity. As technical obsolescence meets a lack of staff capacity, the gradual disappearance of digital scholarship projects forms a gap in the scholarly record. This article discusses the DSU’s experimentation with using web archiving technologies to capture and describe digital scholarship projects, with the goal of accessioning the resulting web archives into the Library’s digital collections. In addition to comparing some common technologies used for crawling and replay of archives, the article describes aspects of the technical infrastructure the DSU is building to make web archives discoverable and playable through the library’s digital collections interface.

Automated 3D Printing in Libraries

By Brandon Patterson, Ben Engel, and Willis Holle — 10 May 2022, 00:26
This article highlights the creation of an automated 3D printing system at a health sciences library at a large research university. As COVID-19 limited in-person interaction with 3D printers, a group of library staff came together to code a form that takes users’ 3D print files and sends them to machines automatically. A ticketing system and payment form were also automated via this system. The only remaining in-person task is performed by dedicated staff members who unload the prints. The article describes the journey toward an automated system and shares code and strategies so others can try it for themselves.

Core Concepts and Techniques for Library Metadata Analysis

By Stacie Traill and Martin Patrick — 22 September 2021, 15:52
Metadata analysis is a growing need in libraries of all types and sizes, as demonstrated in many recent job postings. Data migration, transformation, enhancement, and remediation all require strong metadata analysis skills. But there is no well-defined body of knowledge or competencies list for library metadata analysis, leaving library staff with analysis-related responsibilities largely on their own to learn how to do the work effectively. In this paper, two experienced metadata analysts will share what they see as core knowledge areas and problem solving techniques for successful library metadata analysis. The paper will also discuss suggested tools, though the emphasis is intentionally not to prescribe specific tools, software, or programming languages, but rather to help readers recognize tools that will meet their analysis needs. The goal of the paper is to help library staff and their managers develop a shared understanding of the skill sets required to meet their library’s metadata analysis needs. It will also be useful to individuals interested in pursuing a career in library metadata analysis and wondering how to enhance their existing knowledge and skills for success in analysis work.

Leveraging a Custom Python Script to Scrape Subject Headings for Journals

By Shelly R. McDavid, Eric McDavid, and Neil E. Das — 22 September 2021, 15:52
In the current library fiscal climate, with yearly inflationary cost increases of 2–6% or more for many journals and journal package subscriptions, it is imperative that libraries strive to make their budgets go further to expand their suite of resources. As a result, most academic libraries annually undertake some form of electronic journal review, employing factors such as cost per use to inform budgetary decisions. In this paper we detail the tech-savvy processes we created to leverage a Python script to automate journal subject heading generation within OCLC’s WorldCat catalog, the MOBIUS (a Missouri library consortium) catalog, and the VuFind library catalog, a now-retired catalog for CARLI (the Consortium for Academic and Research Libraries in Illinois). We also describe the rationale for the inception of this project, the methodology we utilized, the current limitations, and details of our future work in automating our annual analysis of journal subject headings by use of an OCLC API.
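The subject-heading step can be sketched as pulling topical headings out of a MARC-like record. The record structure below is simplified for illustration; the authors' script works against live catalog data rather than hand-built dicts.

```python
# Sketch: collect topical subject headings (MARC 650 fields) from a
# simplified record structure. The structure is illustrative only.

def subject_headings(record: dict) -> list[str]:
    """Join each 650 field's subfields into one heading string."""
    headings = []
    for tag, subfields in record.get("fields", []):
        if tag == "650":
            headings.append(" -- ".join(value for _, value in subfields))
    return headings

record = {
    "title": "Journal of Example Studies",
    "fields": [
        ("245", [("a", "Journal of Example Studies")]),
        ("650", [("a", "Library science"), ("x", "Periodicals")]),
        ("650", [("a", "Information science")]),
    ],
}
headings = subject_headings(record)
```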

Editorial: The Cost of Knowing Our Users

By Mark Swenson — 22 September 2021, 15:52
Some musings on the difficulty of wanting to know our users' secrets and simultaneously wanting to not know them.

Closing the Gap between FAIR Data Repositories and Hierarchical Data Formats

By Connor B. Bailey, Fedor F. Balakirev, and Lyudmila L. Balakireva — 22 September 2021, 15:52
Many in the scientific community, particularly in publicly funded research, are pushing to adhere to more accessible data standards to maximize the findability, accessibility, interoperability, and reusability (FAIR) of scientific data, especially with the growing prevalence of machine learning augmented research. Online FAIR data repositories, such as the Open Science Framework (OSF), help facilitate the adoption of these standards by providing frameworks for storage, access, search, APIs, and other features that create organized hubs of scientific data. However, the wider acceptance of such repositories is hindered by the lack of support of hierarchical data formats, such as Technical Data Management Streaming (TDMS) and Hierarchical Data Format 5 (HDF5), that many researchers rely on to organize their datasets. Various tools and strategies should be used to allow hierarchical data formats, FAIR data repositories, and scientific organizations to work more seamlessly together. A pilot project at Los Alamos National Laboratory (LANL) addresses the disconnect between them by integrating the OSF FAIR data repository with hierarchical data renderers, extending support for additional file types in their framework. The multifaceted interactive renderer displays a tree of metadata alongside a table and plot of the data channels in the file. This allows users to quickly and efficiently load large and complex data files directly in the OSF webapp. Users who are browsing files can quickly and intuitively see the files in the way they or their colleagues structured the hierarchical form and immediately grasp their contents. This solution helps bridge the gap between hierarchical data storage techniques and FAIR data repositories, making both of them more viable options for scientific institutions like LANL which have been put off by the lack of integration between them.
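The tree-of-metadata display described above can be sketched with a plain nested structure standing in for a real TDMS or HDF5 reader. The group and channel names below are invented.

```python
# Sketch: flatten a hierarchical file's group/channel structure into
# indented lines, the way a tree renderer would display it. A nested
# dict stands in for a real TDMS/HDF5 reader here.

def render_tree(node: dict, indent: int = 0) -> list[str]:
    """Depth-first walk producing one indented line per group/channel."""
    lines = []
    for name, child in node.items():
        lines.append("  " * indent + name)
        if isinstance(child, dict):
            lines.extend(render_tree(child, indent + 1))
    return lines

tree = {"measurement": {"channels": {"voltage": None, "current": None},
                        "metadata": {"operator": None}}}
lines = render_tree(tree)
```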

Using Low Code to Automate Public Service Workflows: Three Cases

By Dianna Morganti and Jess Williams — 22 September 2021, 15:52
Public service librarians without coding experience or technical education may not always be aware of or consider automation to be an option to streamline their regular work tasks, but the new prevalence of enterprise-level low code solutions allows novices to take advantage of technology to make their work more efficient and effective. Low code applications apply a graphical user interface on top of a coding platform to make it easy for novices to leverage automation at work. This paper presents three cases of using low code solutions for automating public service problems using the prevalent Microsoft Power Automate application, available in many library workplaces that use the Microsoft Office ecosystem. From simplifying the communication and scheduling process for instruction classes to connecting our student workers’ hourly floor counts to our administrators’ dashboard of building occupancy, we’ve leveraged simple low code automation in a scalable and replicable manner. Pseudo-code examples are provided.
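The floor-count case can be sketched in ordinary code: each hourly submission is recorded, and the dashboard shows the latest count per floor. In practice this logic lives in a Power Automate flow rather than a script; the data below is invented.

```python
# Sketch: reduce a stream of hourly floor-count submissions to the
# latest count per floor, as a building-occupancy dashboard would.

def latest_occupancy(submissions: list[dict]) -> dict:
    """Return the most recent count for each floor."""
    latest = {}
    for s in sorted(submissions, key=lambda s: s["time"]):
        latest[s["floor"]] = s["count"]  # later times overwrite earlier
    return latest

submissions = [
    {"time": "09:00", "floor": 1, "count": 12},
    {"time": "09:00", "floor": 2, "count": 30},
    {"time": "10:00", "floor": 1, "count": 25},
]
occupancy = latest_occupancy(submissions)
```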

Introducing SAGE: An Open-Source Solution for Customizable Discovery Across Collections

By David B. Lowe, James Creel, Elizabeth German, Douglas Hahn, and Jeremy Huff — 22 September 2021, 15:52
Digital libraries at research universities make use of a wide range of unique tools to enable the sharing of eclectic sets of texts, images, audio, video, and other digital objects. Presenting these assorted local treasures to the world can be a challenge, since text is often siloed with text, images with images, and so on, such that per type, there may be separate user experiences in a variety of unique discovery interfaces. One common tool that has been developed in recent years to potentially unite them all is the Apache Solr index. Texas A&M University (TAMU) Libraries has harnessed Solr for internal indexing for repositories like DSpace, Fedora, and Avalon. Impressed by frameworks like Blacklight at peer institutions, TAMU Libraries wrote an analogous set of tools in Java, and thus was born SAGE, the Solr AGgregation Engine, with two primary functions: 1) aggregating Solr indices, or “cores,” from various local sources, and 2) presenting a search facility to the user in a discovery interface.

Building and Maintaining Metadata Aggregation Workflows Using Apache Airflow

By Leanne Finnigan and Emily Toner — 22 September 2021, 15:52
PA Digital is a Pennsylvania network that serves as the state’s service hub for the Digital Public Library of America (DPLA). The group developed a homegrown aggregation system in 2014, used to harvest digital collection records from contributing institutions, validate and transform their metadata, and deliver aggregated records to the DPLA. Since our initial launch, PA Digital has expanded significantly, harvesting from an increasing number of contributors with a variety of repository systems. With each new system, our highly customized aggregator software became more complex and difficult to maintain. By 2018, PA Digital staff had determined that a new solution was needed. From 2019 to 2021, a cross-functional team implemented a more flexible and scalable approach to metadata aggregation for PA Digital, using Apache Airflow for workflow management and Solr/Blacklight for internal metadata review. In this article, we will outline how we use this group of applications and the new workflows adopted, which afford our metadata specialists more autonomy to contribute directly to the ongoing development of the aggregator. We will discuss how this work fits into our broader sustainability planning as a network and how the team leveraged shared expertise to build a more stable approach to maintenance.
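The harvest, validate, transform, and deliver stages described above can be sketched as chained functions. In PA Digital's system these are tasks in an Apache Airflow DAG; the record fields and validation rule below are simplified stand-ins.

```python
# Sketch: the aggregation pipeline's stages as plain functions.
# Record fields and the validation rule are illustrative only.

def harvest() -> list[dict]:
    """Stand-in for harvesting records from a contributor's repository."""
    return [{"title": "Map of Philadelphia", "rights": "public domain"},
            {"title": ""}]  # one record missing a required field

def validate(records: list[dict]) -> list[dict]:
    """Drop records that fail a (simplified) required-field check."""
    return [r for r in records if r.get("title")]

def transform(records: list[dict]) -> list[dict]:
    """Add aggregation-level metadata to each record."""
    return [{**r, "provider": "PA Digital"} for r in records]

def deliver(records: list[dict]) -> int:
    """Stand-in for pushing aggregated records to the DPLA."""
    return len(records)

delivered = deliver(transform(validate(harvest())))
```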

An XML-Based Migration from Digital Commons to Open Journal Systems

By Cara M. Key — 22 September 2021, 15:52
The Oregon Library Association has produced its peer-reviewed journal, the OLA Quarterly (OLAQ), since 1995, and OLAQ was published in Digital Commons beginning in 2014. When the host institution undertook to move away from Bepress, their new repository solution was no longer a good match for OLAQ. Oregon State University and University of Oregon agreed to move the journal into their joint instance of Open Journal Systems (OJS), and a small team from OSU Libraries carried out the migration project. The OSU project team declined to use PKP’s existing migration plugin for a number of reasons, instead pursuing a metadata-centered migration pipeline from Digital Commons to OJS. We used custom XSLT to convert tabular data exported from Bepress into PKP’s Native XML schema, which we imported using the OJS Native XML Plugin. This approach provided a high degree of control over the journal’s metadata and a robust ability to test and make adjustments along the way. The article discusses the development of the transformation stylesheet, the metadata mapping and cleanup work involved, as well as advantages and limitations of using this migration strategy.
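The tabular-to-XML step can be sketched with the standard library's ElementTree, turning one exported row into an XML fragment. The actual project used XSLT, and PKP's Native XML schema requires far more elements and attributes than shown in this toy version.

```python
import xml.etree.ElementTree as ET

# Sketch: convert one row of tabular export data into an XML element.
# The element names here are a toy stand-in, not PKP's Native XML schema.

def article_xml(row: dict) -> ET.Element:
    """Build a minimal article element from one exported row."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = row["title"]
    ET.SubElement(article, "pages").text = row["pages"]
    return article

row = {"title": "Library Instruction in Oregon", "pages": "12-18"}
xml = ET.tostring(article_xml(row), encoding="unicode")
```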

Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use

By Leanne Olson and Veronica Berry — 22 September 2021, 15:52
This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness of OCR software on resources we are digitizing. We tested three major OCR programs (Adobe Acrobat, ABBYY FineReader, Tesseract) for accuracy on three different digitized texts from our archives and special collections at the University of Western Ontario. Our test was divided into two parts: a word accuracy test (to determine how searchable the final documents were), and a test with a screen reader (to determine how accessible the final documents were). We share our findings from the tests and make recommendations for OCR work on digitized documents from archives and special collections.
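A word accuracy test of the kind described can be sketched as the share of ground-truth words that the OCR output reproduced. The counting method below is one simple approach and may differ from the authors' procedure.

```python
from collections import Counter

# Sketch: a simple word accuracy measure comparing OCR output against
# a ground-truth transcription. Method is illustrative, not the
# authors' exact test.

def word_accuracy(truth: str, ocr: str) -> float:
    """Fraction of ground-truth words present in the OCR output."""
    truth_words = Counter(truth.lower().split())
    ocr_words = Counter(ocr.lower().split())
    matched = sum(min(n, ocr_words[w]) for w, n in truth_words.items())
    return matched / sum(truth_words.values())

# "brown" misread as "brovvn" -- a classic OCR confusion.
score = word_accuracy("the quick brown fox", "the quick brovvn fox")
```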

On Two Proposed Metrics of Electronic Resource Use

By William Denton — 22 September 2021, 15:52
There are many ways to look at electronic resource use, individually or aggregated. I propose two new metrics to help give a better understanding of comparative use across an online collection. Users per mille is a relative annual measure of how many users a platform had for every thousand potential users: this tells us how many people used a given platform. Interest factor is the average number of uses of a platform by people who used it more than once: this tells us how much people used a given platform. These two metrics are enough to give us good insight into collection use. Dividing each into quartiles allows a quadrant comparison of lows and highs on each metric, giving a quick view of platforms many people use a lot (the big expensive ones), many people use very little (a curious subset), a few people use a lot (very specific to a narrow subject) and a few people use very little (deserves attention). This helps understand collection use and informs collection management.
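The two metrics follow directly from the definitions above. This sketch computes both for one platform; the variable names and sample numbers are mine, not the author's.

```python
# The two proposed metrics, computed from one platform's usage data.
# Formulas follow the abstract's definitions; sample data is invented.

def users_per_mille(users: int, potential_users: int) -> float:
    """How many users the platform had per thousand potential users."""
    return 1000 * users / potential_users

def interest_factor(use_counts: list[int]) -> float:
    """Average uses among people who used the platform more than once."""
    repeat = [n for n in use_counts if n > 1]
    return sum(repeat) / len(repeat) if repeat else 0.0

upm = users_per_mille(users=150, potential_users=50_000)
interest = interest_factor([1, 1, 2, 4, 6])  # two single-use users excluded
```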

Conspectus: A Syllabi Analysis Platform for Leganto Data Sources

By David Massey, Thomas Sødring — 22 September 2021, 15:52
In recent years, higher education institutions have implemented electronic solutions for the management of syllabi, resulting in new and exciting opportunities within the area of large-scale syllabi analysis. This article details an information pipeline that can be used to harvest, enrich and use such information.

Pythagoras: Discovering and Visualizing Musical Relationships Using Computer Analysis

By Brandon Bellanti — 14 June 2021, 21:40
This paper presents an introduction to Pythagoras, an in-progress digital humanities project using Python to parse and analyze XML-encoded music scores. The goal of the project is to use recurring patterns of notes to explore existing relationships among musical works and composers. An intended outcome of this project is to give music performers, scholars, librarians, and anyone else interested in digital humanities new insights into musical relationships as well as new methods of data analysis in the arts.
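The recurring-pattern idea can be sketched as finding note n-grams shared between two pitch sequences. A real score parse (e.g., from XML-encoded music) would supply the sequences; the ones below are hand-written examples.

```python
# Sketch: find recurring note patterns shared between two works by
# intersecting their pitch n-grams. Pitch sequences are invented.

def ngrams(notes: list[str], n: int) -> set[tuple]:
    """All length-n windows over a note sequence."""
    return {tuple(notes[i:i + n]) for i in range(len(notes) - n + 1)}

def shared_patterns(a: list[str], b: list[str], n: int = 3) -> set[tuple]:
    """Note patterns of length n that occur in both sequences."""
    return ngrams(a, n) & ngrams(b, n)

piece_a = ["C", "E", "G", "E", "C"]
piece_b = ["A", "C", "E", "G", "F"]
common = shared_patterns(piece_a, piece_b)
```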

On the Nature of Extreme Close-Range Photogrammetry: Visualization and Measurement of North African Stone Points

By Michael J. Bennett — 14 June 2021, 21:40
Image acquisition, visualization, and measurement are examined in the context of extreme close-range photogrammetric data analysis. Manual measurements commonly used in traditional stone artifact investigation are used as a starting point to better gauge the usefulness of high-resolution 3D surrogates and the flexible digital tool sets that can work with them. The potential of various visualization techniques is also explored in the context of future teaching, learning, and research in virtual environments.

Choose Your Own Educational Resource: Developing an Interactive OER Using the Ink Scripting Language

By Stewart Baker — 14 June 2021, 21:40
Learning games are games created with the purpose of educating, as well as entertaining, players. This article describes the potential of interactive fiction (IF), a type of text-based game, to serve as a learning game. After summarizing the basic concepts of interactive fiction and learning games, the article describes common interactive fiction programming languages and tools, including Ink, a simple markup language that can be used to create choice-based text games that play in a web browser. The final section of the article includes code putting the concepts of Ink, interactive fiction, and learning games into action using part of an interactive OER created by the author in December of 2020.
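The branching structure that Ink expresses can be sketched in Python as a dict of passages, each with text and labeled choices. The passage content below is invented, not taken from the author's OER, and Ink itself offers far more (variables, diverts, weaves) than this toy engine.

```python
# Sketch: a minimal choice-based story engine illustrating the kind of
# branching Ink encodes. Story content is invented for illustration.

STORY = {
    "start": {"text": "You need a source for your paper.",
              "choices": {"Search the catalog": "catalog",
                          "Ask a librarian": "librarian"}},
    "catalog": {"text": "You find a peer-reviewed article.", "choices": {}},
    "librarian": {"text": "The librarian suggests a database.", "choices": {}},
}

def play(story: dict, picks: list[str]) -> list[str]:
    """Follow a list of choice labels from 'start'; return the text seen."""
    node = "start"
    seen = [story[node]["text"]]
    for pick in picks:
        node = story[node]["choices"][pick]
        seen.append(story[node]["text"])
    return seen

path = play(STORY, ["Ask a librarian"])
```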

Institutional Data Repository Development, a Moving Target

By Colleen Fallaw, Genevieve Schmitt, Hoa Luong, Jason Colwell, and Jason Strutz — 14 June 2021, 21:40
At the end of 2019, the Research Data Service (RDS) at the University of Illinois at Urbana-Champaign (UIUC) completed its fifth year as a campus-wide service. To gauge the effectiveness of the RDS in meeting the needs of Illinois researchers, RDS staff developed a five-year review consisting of a survey and a series of in-depth focus group interviews. As a result, the Illinois Data Bank, our institutional data repository developed in-house by University Library IT staff, was recognized as the most useful service offering by our unit. When launched in 2016, storage resources and web servers for the Illinois Data Bank and supporting systems were hosted on premises at UIUC. As anticipated, researchers increasingly need to share large and complex datasets. In an effort to leverage the potentially more reliable, highly available, cost-effective, and scalable storage accessible to computation resources, we migrated our item bitstreams and web services to the cloud. Our efforts have met with success, but also with painful bumps along the way. This article describes how we supported data curation workflows through the transition from on-premises to cloud resource hosting. It details our approaches to ingesting, curating, and offering access to dataset files up to 2 TB in size, which may be archive-type files (e.g., .zip or .tar) containing complex directory structures.
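One curation step for archive-type deposits is inspecting the directory structure inside a .zip without extracting it, which the standard library supports directly. The deposit contents below are invented for illustration.

```python
import io
import zipfile

# Sketch: list the directory structure inside an archive-type deposit
# without extracting it, as a curator reviewing a dataset might.

def archive_listing(zip_bytes: bytes) -> list[str]:
    """Return the sorted member paths of a zip held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sorted(zf.namelist())

# Build a small in-memory deposit to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/readme.txt", "Survey data, 2019")
    zf.writestr("data/results.csv", "id,value\n1,3.2\n")
listing = archive_listing(buf.getvalue())
```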