News in eLiteracias

The Code4Lib Journal

The Forgotten Disc: Synthesis and Recommendations for Viable VCD Preservation

By Andrew Weaver and Ashley Blewer
As optical media held by cultural heritage institutions have fully transitioned from being a digital preservation ‘solution’ to being a digital preservation risk, increasing effort has been focused on exploring tools and workflows to migrate the data off these materials before it is permanently lost to physical degradation. One optical format, however, has been broadly ignored by the existing body of work: the humble Video CD. While never a dominant format in the Anglosphere, the Video CD, or VCD, was widely popular from the 1990s through the 2000s in Asia and other regions. As such, a dedicated exploration of preservation solutions for VCD has utility both as a resource for institutions that collect heavily in Pacific Rim materials and as a means to aid, in a minor way, the ongoing efforts to expand the digital preservation corpus beyond its traditional focus on issues prevalent in North America and Europe. This paper provides an overview of VCD as a format, summarizes its unique characteristics that impact preservation decisions, and presents the results of a survey of existing tools and methods for the migration of VCD contents. It conveys practical methods for migrating VCD material from the original carrier into both digital preservation and access workflows.

Breathing Life into Archon: A Case Study in Working with an Unsupported System

By Krista L. Gray
Archival repositories at the University of Illinois Urbana-Champaign Library have relied on Archon to represent archival description and finding aids to researchers worldwide since its launch in 2006. Archon has been officially unsupported software, however, for more than half of this time span. This article will discuss strategies and approaches used to enhance and extend Archon’s functionality during this period of little to no support for maintaining the software. Whether in enhancing accessibility and visual aesthetics through custom theming, considering how to present data points in new ways to support additional functions, or making modifications so that the database would support UTF-8 encoding, a wide variety of opportunities proved possible for enhancing user experience despite the inherent limitations of working with an unsupported system. Working primarily from the skill set of an archivist with programming experience, rather than that of a software developer, the author also discusses some of the strengths emerging from this “on the ground” approach to developing enhancements to an archival access and collection management system.

An introduction to using metrics to assess the health and sustainability of library open source software projects

By Jenn Colt
In LYRASIS 2021 Open Source Software Report: Understanding the Landscape of Open Source Software Support in American Libraries (Rosen & Grogg, 2021), responding libraries indicated that the sustainability of OSS projects is an important concern when making decisions about adoption. However, the methods libraries might use to gather information about sustainability are not discussed. Metrics defined by the Linux Foundation’s CHAOSS project (https://chaoss.community/) are designed to measure the health and sustainability of open source software (OSS) communities and may be useful for libraries that are making decisions about adopting particular OSS applications. I demonstrate the use of cauldron.io as one method to gather and visualize the data for these metrics, and discuss the benefits and limitations of using them for decision-making.

Searching for Meaning Rather Than Keywords and Returning Answers Rather Than Links

By Kent Fitch
Large language models (LLMs) have transformed the largest web search engines: for over ten years, public expectations of being able to search on meaning rather than just keywords have become increasingly realised. Expectations are now moving further: from a search query generating a list of "ten blue links" to producing an answer to a question, complete with citations. This article describes a proof-of-concept that applies the latest search technology to library collections by implementing a semantic search across a collection of 45,000 newspaper articles from the National Library of Australia's Trove repository, and using OpenAI's ChatGPT4 API to generate answers to questions on that collection that include source article citations. It also describes some techniques used to scale semantic search to a collection of 220 million articles.
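The ranking step at the heart of such a semantic search can be illustrated with a toy sketch. This is not the proof-of-concept's code: a production system would embed text with a trained model and, at the scale of 220 million articles, use an approximate nearest-neighbour index rather than a linear scan. The vectors below are hypothetical stand-ins for real embeddings.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]

# Toy 2-d "embeddings"; real ones come from a sentence-embedding model
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(rank([1.0, 0.0], docs))  # → [0, 2, 1]
```

The top-ranked articles, rather than being returned as links, can then be passed to an LLM as context for answer generation with citations.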

From DSpace to Islandora: Why and How

By Vlastimil Krejčíř, Alžbeta Strakošová, and Jan Adler
The article summarizes the experience of switching from DSpace to Islandora. It briefly gives the historical background and reasons for switching to Islandora. It then compares the basic features of the two systems: installation, updates, operations, and customization options. Finally, it summarizes practical lessons learned from the migration and provides examples of implemented digital libraries at Masaryk University.

The Brooklyn Health Map: Reflections on a Health Data Dashboard for Brooklyn, NY

By Sheena Philogene
Recent years have put a spotlight on the importance of searchers of all kinds being able to quickly and easily find relevant, timely, and useful health information. This article provides a general overview of the process used when creating the Brooklyn Health Map, an interactive Brooklyn-based health data dashboard that visualizes community health information at the census tract, zip code, and neighborhood levels. Built using HTML, CSS, Bootstrap, and JavaScript, the Brooklyn Health Map presents information in the form of interactive web maps, customizable graphs, and local level data summaries. This article also highlights the tools used to simplify the creation of various dynamic features of the dashboard.

The viability of using an open source locally hosted AI for creating metadata in digital image collections

By Ingrid Reiche
Artificial intelligence (AI) can support metadata creation for images by generating descriptions, titles, and keywords for digital collections in libraries. Many AI options are available, ranging from cloud-based corporate software solutions, including Microsoft Azure Custom Vision and Google Cloud Vision, to open-source locally hosted software packages. This case study examines the feasibility of deploying the open-source, locally hosted AI software, Sheeko, and the accuracy of the descriptions generated for images using two of the pre-trained models. The study aims to ascertain if Sheeko’s AI would be a viable solution for producing metadata in the form of descriptions or titles for digital collections in Libraries and Cultural Resources at the University of Calgary.

To Everything There Is a Session: A Time to Listen, a Time to Read Multi-session CDs

By Dianne Dietrich and Alex Nelson
When the cost of CD burners dropped precipitously in the late 1990s, consumers had access to the CD-R, a format with far greater storage capacity than floppy disks. Multiple session standards allowed users the flexibility to add subsequent content to an already-burned CD-R, which made them an attractive option for personal backups. In a digital preservation context, CDs with multiple sessions can pose significant challenges to workflows and can lead to data being missed during acquisition or review when users rely on a workflow designed for single-session, single-track CDs. In workflows that include CDs as software installation or transmission media, extra-session behavior can have an impact on software supply chain review. This article provides an overview of the structure of a multi-session CD and outlines how common tools behave with disk images generated from multi-session CDs. To support testing in specific contexts, we provide a guide to creating a multi-session CD that can be used when developing workflows. Finally, we provide techniques for extracting content from physical media as well as from existing disk images generated from multi-session CDs.

Editorial: Forget the AI, We Have Live Editors

By Sara Amato
Welcoming new editors to the Code4Lib Journal

Building a Large-Scale Digital Library Search Interface Using The Libraries Online Catalog

By Jason Griffith and Eric Weig
The Kentucky Digital Newspaper Program (KDNP) was born out of the University of Kentucky Libraries' (UKL) work in the National Digital Newspaper Program (NDNP) that began in 2005. In early 2021, a team of specialists at UKL from library systems, digital archives, and metadata management was formed to explore a new approach to searching this content by leveraging the power of the library services platform (Alma) and discovery system (Primo VE) licensed from Ex Libris. The result was the creation of a dedicated Primo VE search interface that would include KDNP content as well as all Kentucky newspapers held on microfilm in the UKL system. This article will describe the journey from the question of whether we could harness the power of Alma and Primo VE to display KDNP content, to the methodology used in creating a new dedicated search interface that can be replicated to create custom search interfaces of your own.

PREMIS Events Through an Event-sourced Lens

By Ross Spencer
The PREMIS metadata standard is widely adopted in the digital preservation community. Repository software often includes fully compliant implementations or asserts some level of conformance. Within PREMIS we have four semantic units, but “Events”, the topic of this paper, are particularly interesting as they describe “actions performed within or outside the repository that affects its capability to preserve Objects over the long term.” Events can help us to observe interactions with digital objects and understand where and when something may have gone wrong with them. Events in PREMIS, however, are slightly different from events in software development paradigms, specifically event-driven software development. Though similar, the design of PREMIS event logs does not promote their “being complete”, nor their consumption and reuse. Learning from logs in event-driven software development may therefore help us to simplify the PREMIS data model, plug identified gaps in implementations, and improve the ability to migrate digital content in future repositories.
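The event-sourcing idea the paper draws on can be sketched minimally: events are only ever appended, never updated in place, so the log remains a complete history. The dictionary keys below follow PREMIS Event semantic unit names; the function and identifiers are illustrative, not part of any repository's actual implementation.

```python
from datetime import datetime, timezone

def record_event(log, event_type, outcome, object_id):
    """Append a PREMIS-style event to an append-only log.

    Because entries are immutable once appended, the log can be
    replayed to reconstruct an object's full preservation history.
    """
    log.append({
        "eventType": event_type,  # e.g. "ingestion", "fixity check"
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "eventOutcomeInformation": outcome,
        "linkingObjectIdentifier": object_id,
    })
    return log

log = []
record_event(log, "ingestion", "success", "obj-001")
record_event(log, "fixity check", "success", "obj-001")
```

Consumers read the log; they never rewrite it, which is the "completeness" property the article argues PREMIS implementations often lack.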

Utilizing R and Python for Institutional Repository Daily Jobs

By Yongli Zhou
In recent years, the programming languages R and Python have become very popular and are being used by many professions. However, they are not limited to data scientists or programmers; they can also help librarians perform many tasks more efficiently and achieve goals that were previously nearly impossible. R and Python are scripting languages, which makes them comparatively approachable. With minimal programming experience, a librarian can learn how to program in these languages and start to apply them to daily work. This article provides examples of how to use R and Python to clean up metadata, resize images, and match transcripts with scanned images for the Colorado State University Institutional Repository.
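To give a flavour of the metadata cleanup task described, here is a minimal Python sketch (the field values and function are hypothetical, not the author's actual scripts) that normalizes whitespace and strips stray trailing punctuation from exported field values:

```python
import re

def clean_field(value):
    """Collapse internal whitespace and strip stray trailing
    punctuation from a metadata field value."""
    value = re.sub(r"\s+", " ", value).strip()
    return value.rstrip(" .;,")

# Hypothetical subject headings exported from a repository
subjects = ["Water  quality ;", "Colorado State University .", " Hydrology"]
cleaned = [clean_field(s) for s in subjects]
print(cleaned)  # → ['Water quality', 'Colorado State University', 'Hydrology']
```

A few lines like these, looped over a full metadata export, replace hours of manual find-and-replace in a spreadsheet.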

Strategies for Digital Library Migration

By Justin Littman, Mike Giarlo, Peter Mangiafico, Laura Wrubel, Naomi Dushay, Aaron Collier, Arcadia Falcone
A migration of the datastore and data model for Stanford Digital Repository’s digital object metadata was recently completed. This paper describes the motivations for this work and some of the strategies used to accomplish the migration. Strategies include: adopting a validatable data model, abstracting the datastore behind an API, separating concerns, testing metadata mappings against real digital objects, using reports to understand the data, templating unit tests, performing a rolling migration, and incorporating the migration into ongoing project work. These strategies may be useful to other repository or digital library application migrations.

Apples to Oranges: Using Python and the pymarc library to match bookstore ISBNs to locally held eBook ISBNs

By Mitchell Scott
To alleviate financial burdens faced by students and to provide additional avenues for the benefits shown to be present when no-cost materials are available to students (equity and access and an increase in student success metrics), more and more libraries are leveraging their collections and acquisition processes to provide no-cost eBook alternatives to students. It is common practice now for academic libraries to have a partnership with their campus bookstore and to receive a list of print and eBook materials required for an upcoming semester. Libraries take these lists and use various processes and workflows, some extremely labor intensive and others semi-labor intensive, to identify which of these titles they already own as unlimited access eBooks, and which titles could be purchased as unlimited access eBooks. The most common way to match bookstore titles to already licensed eBooks is by searching the bookstore provided ISBN or title in either the Library Management System (LMS), the Analytics and Reporting layer of the LMS, the Library Discovery Layer, or via another homegrown process. While some searching could potentially be automated, depending on the available functionality of the LMS or the Analytics component of the LMS, the difficulty lies in matching the bookstore ISBN, often the print ISBN, to the library eBook ISBN. This article will discuss the use of Python, the Pymarc library in Python, and Library eBook MARC records to create an automated identification process to accurately match bookstore lists to library eBook holdings.
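The crux of the matching problem is that a bookstore's print ISBN and a library's eBook ISBN rarely match as raw strings. One common tactic, sketched below under stated assumptions (this is not the author's code, and the ISBNs are placeholders), is to normalize every ISBN to its 13-digit form before intersecting the two lists; with pymarc, the library-side ISBNs would be pulled from the 020 fields of eBook MARC records, a step that plain sets stand in for here:

```python
def isbn10_to_isbn13(isbn10):
    """Convert a 10-digit ISBN to its 13-digit form so print and
    eBook ISBNs can be compared on a common key."""
    core = "978" + isbn10.replace("-", "")[:9]
    check = (10 - sum(int(d) * (1 if i % 2 == 0 else 3)
                      for i, d in enumerate(core)) % 10) % 10
    return core + str(check)

def normalize(isbn):
    """Strip hyphens and upgrade ISBN-10s to ISBN-13."""
    isbn = isbn.replace("-", "").strip()
    return isbn10_to_isbn13(isbn) if len(isbn) == 10 else isbn

# Placeholder lists; in practice the eBook set comes from MARC 020 fields
bookstore = {normalize(i) for i in ["0-306-40615-2", "9781492051367"]}
ebooks = {normalize(i) for i in ["9780306406157", "9780134757599"]}
matches = bookstore & ebooks  # titles already held as eBooks
```

Note this only catches cases where the e-record carries the same core ISBN; records that list the print ISBN only in 020$z need that subfield harvested as well.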

A Fast and Full-Text Search Engine for Educational Lecture Archives

By Arun F. Adrakatti and K.R. Mulla
E-lecturing and online learning have become more common and convenient than offline teaching and classroom learning in the academic community since the COVID-19 pandemic. Universities and research institutions record the lecture videos delivered by faculty members and archive them internally. Most of the lecture videos are hosted on popular video-sharing platforms through private channels, and students access published lecture videos independent of time and location. Searching large video repositories is difficult for students because search is restricted to metadata. We present the design and development of an open-source application to build an educational lecture archive with fast, full-text search within the video content.

Creating a Custom Queueing System for a Makerspace Using Web Technologies

By Jonathan Bradley
This article details the changes made to the queueing system used by Virginia Tech University Libraries' 3D Design Studio as the space was decommissioned and reabsorbed into the new Prototyping Studio makerspace. This new service, with its greatly expanded machine and tool offerings, required a revamp of the underlying data structure and was an opportunity to rethink the React and Electron app used previously in order to make the queue more maintainable and easier to deploy moving forward. The new Prototyping Queue application utilizes modular design and auto-building forms and queues in order to improve the upgradeability of the app. We also moved away from React and Electron to a web app that loads from the local filesystem of the computer in the studio and runs on the Svelte framework, using IBM's Carbon Design components to build out the frontend functionality. The deployment process was also streamlined, now relying on git and Windows Batch scripts to automate updating the app as changes are committed to the repository.

Data Preparation for Fairseq and Machine-Learning using a Neural Network

By John Schriner
This article aims to demystify data preparation and machine-learning software for sequence-to-sequence models in the field of computational linguistics. The tools, however, may be used in many different applications. In this article we detail what sequence-to-sequence learning looks like using code and results from different projects: predicting pronunciation in Esperanto, predicting the placement of stress in Russian, and how open data like WikiPron (mined pronunciation data from Wiktionary) makes projects like these possible. With scraped data, projects can be started in automatic speech recognition, text-to-speech tasks, and computer-assisted language-learning for under-resourced and under-researched languages. We will explain why and how datasets are split into training, development, and test sets. The article will discuss how to add features (i.e. properties of the target word that may or may not help in prediction). By scaffolding the tasks and using code and results from these projects, it’s our hope that the article will demystify some of the technical jargon and methods.
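The train/development/test split mentioned above can be sketched in a few lines of Python. This is a generic illustration, not the article's own preprocessing code; the 80/10/10 proportions and the seeded shuffle (for reproducibility) are common conventions, and the grapheme–phoneme pairs would in practice come from a source such as WikiPron:

```python
import random

def split(pairs, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle the dataset and split it into train/dev/test.

    The model learns from train, hyperparameters are tuned on dev,
    and test is held out for the final evaluation only.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    pairs = list(pairs)
    rng.shuffle(pairs)
    n = len(pairs)
    n_dev, n_test = int(n * dev_frac), int(n * test_frac)
    dev = pairs[:n_dev]
    test = pairs[n_dev:n_dev + n_test]
    train = pairs[n_dev + n_test:]
    return train, dev, test

# Hypothetical grapheme–phoneme pairs standing in for WikiPron data
data = [("saluton", "s a l u t o n"), ("kato", "k a t o")] * 50
train, dev, test = split(data)
```

Fairseq then consumes the three files separately, which is why keeping the test set untouched until the end matters: any leakage inflates the reported accuracy.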

Click Tracking with Google Tag Manager for the Primo Discovery Service

By Hui Zhang
This article introduces practices at the Oregon State University library for tracking the usage of Unpaywall links with Google Tag Manager in the Primo discovery interface. Unpaywall is an open database of links to full-text scholarly articles from open access sources[1]. The university library adds Unpaywall links to Primo to give patrons free and legal full-text access to journal articles and to promote more usage of open-access content. However, usage data for these links was unavailable because Primo does not track customized Unpaywall links. This article details how to set up Google Tag Manager to track the usage of Unpaywall links and create reports in Google Analytics. It provides step-by-step instructions, screenshots, and code snippets so that readers can customize the solution for their own integrated library systems.

Using Python Scripts to Compare Records from Vendors with Those from ILS

By Dan Lou
An increasing challenge libraries face is how to maintain and synchronize the electronic resource records from vendors with those in the integrated library system (ILS). Ideally vendors send record updates frequently to the library. However, this is not a perfect solution, and over time record discrepancies can become severe, with thousands of records out of sync. This is what happened when our acquisitions librarian and our cataloging librarian noticed a significant record discrepancy issue. In order to effectively identify the problematic records among the tens of thousands of records on both sides, the author developed solutions to analyze the data using Python functions and scripts. This data analysis helps to quickly narrow down the issue and reduce the cataloging effort.
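At its core, finding out-of-sync records is a set comparison on a shared identifier. The sketch below illustrates the idea under stated assumptions (hypothetical identifiers; the real scripts would first extract control numbers or similar keys from vendor files and ILS exports):

```python
def diff_records(vendor_ids, ils_ids):
    """Compare vendor-supplied record identifiers with those in the ILS.

    Returns two sets: identifiers present at the vendor but missing
    from the ILS (candidates to load), and identifiers present in the
    ILS but no longer supplied by the vendor (candidates to withdraw).
    """
    vendor, ils = set(vendor_ids), set(ils_ids)
    return vendor - ils, ils - vendor

# Placeholder identifiers standing in for extracted control numbers
to_add, to_remove = diff_records(["b1", "b2", "b3"], ["b2", "b3", "b4"])
```

Reducing tens of thousands of records to two short difference lists is what lets catalogers focus only on the records that actually need attention.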

Revamping Metadata Maker for ‘Linked Data Editor’: Thinking Out Loud

By Greta Heng, Myung-Ja Han
With the development of linked data technologies and the launch of the Bibliographic Framework Initiative (BIBFRAME), the library community has conducted several experiments to design and build linked data editors. While efforts have been made to create original linked data 'records' from scratch, less attention has been given to copy cataloging workflows in a linked data environment. Developed and released as an open-source application in 2015, Metadata Maker is a cataloging creation tool that allows users to create bibliographic metadata without previous knowledge of cataloging. With new linked data sources added, including auto-suggestion of Virtual International Authority File (VIAF) personal names and Library of Congress Subject Heading (LCSH) recommendations based on the users’ text input, Metadata Maker may have the potential to be adopted by paraprofessional catalogers in practice. This article introduces those new features, shares the user testing results, and discusses possible future steps.

Designing Digital Discovery and Access Systems for Archival Description

By Gregory Wiedeman
Archival description is often misunderstood by librarians, administrators, and technologists in ways that have seriously hindered the development of access and discovery systems. It is not widely understood that there is currently no off-the-shelf system that provides discovery and access to digital materials using archival methods. This article is an overview of the core differences between archival and bibliographic description, and discusses how to design access systems for born-digital and digitized materials using the affordances of archival metadata. It offers a custom indexer as a working example that adds the full text of digital content to an Arclight instance and argues that the extensibility of archival description makes it a perfect match for automated description. Finally, it argues that building archives-first discovery systems allows us to use our descriptive labor more thoughtfully, better enable digitization on demand, and overall make a larger volume of cultural heritage materials available online.

DRYing our library’s LibGuides-based webpage by introducing Vue.js

By Mark E. Eaton
At the Kingsborough Community College library, we recently decided to bring the library’s website more in line with DRY principles (Don’t Repeat Yourself). We felt this could improve the site by creating more concise and maintainable code. DRYer code would be easier to read, understand and edit. We adopted the Vue.js framework in order to replace repetitive, hand-coded dropdown menus with programmatically generated markup. Using Vue allowed us to greatly simplify the HTML documents, while also improving maintainability.

Editorial: Journal Updates and a Call for Editors

By Junior Tidal
Journal updates, recent policies, and a call for editors.

“You could use the API!”: A Crash Course in Working with the Alma APIs using Postman

By Rebecca Hyams and Tamara Pilko
While some within libraries are able to take vendor APIs and use them to power applications and innovative workflows in their respective systems, there are those of us who may have heard of APIs but have only the slightest idea of how to actually make use of them. Often colleagues in various forums will mention that a task could be “just done with the API” but provide little information to take us from “this is what an API is” or “here’s the API documentation” to actually putting them to use. Looking for a way to automate tasks in Alma, the authors of this article both found themselves in such a position and then discovered Postman, an API platform with a user-friendly interface that simplifies sending API calls as well as using bulk and chained requests. This article gives a basic primer on how to set up Postman and how to use it to work with Ex Libris’ Alma APIs, as well as the authors’ use cases in working with electronic inventory and course reserves.
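For readers who later outgrow Postman, the same calls can be scripted. The sketch below builds the URL for Alma's Retrieve Bib endpoint (`GET /bibs/{mms_id}`); the path and `apikey` query parameter follow Ex Libris' documented REST conventions, but the base URL varies by hosting region, and the MMS ID and key here are placeholders:

```python
from urllib.parse import urlencode

# North America gateway; other regions use api-eu, api-ap, etc.
ALMA_BASE = "https://api-na.hosted.exlibrisgroup.com/almaws/v1"

def bib_request_url(mms_id, api_key, fmt="json"):
    """Build the GET URL for Alma's Retrieve Bib endpoint."""
    query = urlencode({"apikey": api_key, "format": fmt})
    return f"{ALMA_BASE}/bibs/{mms_id}?{query}"

url = bib_request_url("9912345678902931", "l7xx-example-key")
# The request itself can then be sent with any HTTP client,
# e.g. requests.get(url).json()
```

This mirrors what Postman does under the hood: every request is just a URL, a method, and a few parameters, which is why collections built in Postman translate directly into scripts.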

Archiving an Early Web-Based Journal: Addressing Issues of Workflow, Authenticity, and Bibliodiversity

By Nick Szydlowski, Rhonda Holberton, Erika Johnson
SWITCH is a journal of new media art that has been published in an online-only format since 1995 by the CADRE Laboratory for New Media at San José State University (SJSU). The journal is distinctive in its commitment to presenting scholarship and criticism on new media art in a visual format that reflects and enhances its engagement with the subject. This approach, which includes the practice of redesigning the journal’s platform and visual presentation for each issue, raises significant challenges for the long-term preservation of the journal, as well as immediate issues related to indexing and discovery. This article describes the initial stages of a collaboration between the Martin Luther King, Jr. Library and the CADRE Laboratory at SJSU to archive and index SWITCH and to host a copy of the journal on SJSU’s institutional repository, SJSU ScholarWorks. It will describe the process of harvesting the journal, share scripts used to extract metadata and modify files to address accessibility and encoding issues, and discuss an ongoing curricular project that engages CADRE students in the process of augmenting metadata for SWITCH articles. The process reflects the challenges of creating an authentic version of this journal that is also discoverable and citable within the broader scholarly communication environment. This effort is part of a growing multi-institutional project to archive the new media art community in the Bay Area in a 3D web exhibition format.