News in eLiteracias

The Code4Lib Journal

Extra Editorial: On the Release of Patron Data in Issue 58 of Code4Lib Journal

By Code4Lib Editorial Board
We, the editors of the Code4Lib Journal, sincerely apologize for the recent incident in which Personally Identifiable Information (PII) was released through the publication of an article in issue 58.

Bringing it All Together: Data from Everywhere to Build Dashboards

By David Schuster
This article describes how Binghamton University approached building a data dashboard that brings together datasets from MySQL, vendor emails, Alma Analytics, and other sources. Using Power BI, Power Automate, and a Microsoft gateway, we can see the power of easy access to data without knowing all of the disparate systems. We will discuss why we did it, some of how we did it, and privacy concerns.
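
The dashboard itself is assembled with Power BI and Power Automate rather than Python; purely as a minimal sketch of staging one of the source datasets, the snippet below exports a hypothetical MySQL circulation table to a CSV file that a dashboard tool could ingest. The table, column names, and credentials are illustrative assumptions, not Binghamton's actual schema.

```python
# Minimal sketch: stage a MySQL dataset as CSV for a dashboard tool to ingest.
# Table, columns, and credentials are hypothetical; the article's pipeline
# uses Power BI / Power Automate rather than Python for this step.
import csv
import pymysql

conn = pymysql.connect(host="localhost", user="report", password="secret",
                       database="library_stats")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT month, checkouts, renewals FROM circulation_summary")
        rows = cur.fetchall()
finally:
    conn.close()

with open("circulation_summary.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["month", "checkouts", "renewals"])
    writer.writerows(rows)
```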

Leveraging Aviary for Past and Future Audiovisual Collections

By Tyler Mobley and Heather Gilbert
Now that audio and video recording hardware is easy to use, highly portable, affordable, and capable of producing high-quality content, many universities are seeing a rise in demand for oral history projects and programs on their campuses. The burden of preserving and providing access to this complex format typically falls on the library, oftentimes with no prior involvement or consultation with library staff. This can be challenging when many library staff have no formal training in oral history and only a passing familiarity with the format. To address this issue, librarians at the College of Charleston have implemented AVPreserve’s audiovisual content platform, Aviary, to build out a successful oral history program. The authors will share their experience building new oral history programs that coexist alongside migrated audiovisual materials from legacy systems. They will detail how they approached migrating legacy oral histories in batch form, and how they leveraged Aviary’s API and embed functionalities to present Aviary audiovisual materials seamlessly alongside other cultural heritage materials in a single, searchable catalog. This article will also discuss techniques for managing an influx of oral histories from campus stakeholders and detail how to make efficient use of time-coded transcripts and indices for the best possible user experience.

Beyond the Hype Cycle: Experiments with ChatGPT’s Advanced Data Analysis at the Palo Alto City Library

By M Ryan Hess and Chris Markman
In June and July of 2023, the Palo Alto City Library’s Digital Services team embarked on an exploratory journey applying Large Language Models (LLMs) to library projects. This article, complete with chat transcripts and code samples, highlights the challenges, successes, and unexpected outcomes encountered while integrating ChatGPT Pro into our day-to-day work. Our experiments utilized ChatGPT’s Advanced Data Analysis feature (formerly Code Interpreter). The first goal tested the Search Engine Optimization (SEO) potential of ChatGPT plugins. The second goal aimed to enhance our web user experience by revising our BiblioCommons taxonomy to better match customer interests and make the upcoming Personalized Promotions feature more relevant. ChatGPT helped us perform what would otherwise be a time-consuming analysis of customer catalog usage to determine a list of taxonomy terms better aligned with that usage. In the end, both experiments proved the utility of LLMs in the workplace and the potential for enhancing our librarians’ skills and efficiency. The thrill of this experiment was in ChatGPT’s unprecedented efficiency, adaptability, and capacity. We found it can solve a wide range of library problems and speed up project deliverables. The shortcomings of LLMs, however, were equally palpable. Each day of the experiment we grappled with the nuances of prompt engineering, contextual understanding, and occasional miscommunications with our new AI assistant. In short, a new class of skills for information professionals came into focus.
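
The analysis described above was carried out inside ChatGPT's Advanced Data Analysis sandbox, so the exact code is not reproduced here; the sketch below only illustrates the kind of aggregation involved, assuming a hypothetical export of catalog search activity with a column of subject terms. File and column names are placeholders.

```python
# Illustrative sketch of the kind of usage analysis described above: count how
# often each catalog subject term appears and keep the most-used ones as
# candidate taxonomy terms. Input file and column names are hypothetical.
import pandas as pd

usage = pd.read_csv("catalog_searches.csv")       # one row per search event
term_counts = (
    usage["subject_term"]
    .str.strip()
    .str.lower()
    .value_counts()
)
candidate_terms = term_counts.head(50)            # top 50 terms by usage
candidate_terms.to_csv("candidate_taxonomy_terms.csv", header=["searches"])
print(candidate_terms.head(10))
```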

A practical method for searching scholarly papers in the General Index without a high-performance computer

By Emily Cukier
The General Index is a free database that offers unprecedented access to keywords and ngrams derived from the full text of over 107 million scholarly articles. Its simplest use is looking up articles that contain a term of interest, but the data set is large enough for text mining and corpus linguistics. Despite being positioned as a public utility, there is no user interface; one must download, query, and extract results from raw data tables. Not only is computing skill a barrier to use, but the file sizes are too large for most desktop computers to handle. This article will show a practical way to use the General Index for researchers with moderate skills and resources. It will walk through building a bibliography of articles and visualizing the yearly prevalence of a topic in the General Index, using simple R programming commands and a modestly equipped desktop computer (code is available at https://osf.io/s39n7/). It will briefly discuss what else can be done (and how) with more powerful computational resources.
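
The article's own code (in R, at the OSF link above) is the authoritative version; as a rough Python analogue of the same chunked-filtering idea, the sketch below streams a General Index-style table in pieces and keeps only rows whose ngram matches a term of interest. The file name and column names are assumptions about an extracted table layout, not the distribution format itself.

```python
# Rough Python analogue of chunked filtering over a very large ngram table.
# Column names ("doi", "ngram") and the file name are assumptions about an
# extracted General Index slice; the article's actual R code is at
# https://osf.io/s39n7/.
import pandas as pd

matches = []
for chunk in pd.read_csv("general_index_ngrams.tsv", sep="\t",
                         usecols=["doi", "ngram"], chunksize=1_000_000):
    hits = chunk[chunk["ngram"].str.contains("corpus linguistics",
                                             case=False, na=False)]
    matches.append(hits)

result = pd.concat(matches, ignore_index=True)
result.to_csv("matching_articles.csv", index=False)
print(f"{len(result)} rows matched")
```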

Pipeline or Pipe Dream: Building a Scaled Automated Metadata Creation and Ingest Workflow Using Web Scraping Tools

By Matthew Krc and Anna Oates Schlaack
Since 2004, the FRASER Digital Library has provided free access to publications and archival collections related to the history of economics, finance, banking, and the Federal Reserve System. The agile web development team that supports FRASER’s digital asset management system embarked on an initiative to automate collecting documents and metadata from US governmental sources across the web. These sources present their content on web pages but do not serve the metadata and document links via an API or other semantic web technologies, making automation a unique challenge. Using a combination of third-party software, lightweight cloud services, and custom Python code, the FRASER Recurring Downloads project transformed what was previously a labor-intensive daily process into a metadata creation and ingest pipeline that requires minimal human intervention or quality control. This article will provide an overview of the software and services used for the Recurring Downloads pipeline, some of the struggles the team encountered during the design and build process, and the current use of the final product. In hindsight, the project required a more detailed plan than was initially designed and documented. The fully manual process was not intended to be automated when it was established, which introduced inherent complexity in creating the pipeline. A more comprehensive plan could have made the iterative development process easier by providing a defined data model and documentation of, and a strategy for, edge cases. Further initial analysis of the cloud services used would have defined the limitations of those services, and workarounds could have been accounted for in the project plan. While the labor-intensive manual workflow has been reduced significantly, the skill sets required to efficiently maintain the automated workflow present a sustainability challenge in distributing tasks between librarians and developers. This article will detail the challenges and limitations of transitioning and standardizing recurring web scraping across more than 50 sources to a semi-automated workflow, as well as potential future improvements to the pipeline.
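
FRASER's pipeline combines third-party software, cloud services, and custom Python; the snippet below is only a minimal illustration of the scraping step it describes, collecting document links and titles from a hypothetical agency publications page. The URL and the HTML structure assumed here are placeholders, not FRASER's actual sources.

```python
# Minimal illustration of the web-scraping step described above: collect PDF
# links and their link text from a (hypothetical) agency publications page.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

page_url = "https://example.gov/reports"      # placeholder source page
html = requests.get(page_url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

records = []
for link in soup.select("a[href$='.pdf']"):   # every link ending in .pdf
    records.append({
        "title": link.get_text(strip=True),
        "url": urljoin(page_url, link["href"]),
    })

for rec in records:
    print(rec["title"], "->", rec["url"])
```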

Standing Up Vendor-Provided Web Hosting Services at Florida State University Libraries: A Case Study

By Matthew E. Hunter, Devin Soper, and Sarah Stanley
CreateFSU is Florida State University Libraries’ branded instance of Reclaim Hosting’s Domain of One’s Own web-hosting service. CreateFSU provides current FSU faculty, staff, and some students with web domains and over 150 popular open-source content management systems, including WordPress, Drupal, Scalar, and Omeka. Since the launch of the service in September 2021, the Libraries have negotiated the demands of providing such a service with various administrative stakeholders across campus, expanded the target audience, provided support, and refined our workflows and documentation to make the service fit campus needs. Using this service, members of the FSU community showcase the fruits of their research to a broad audience in ways that are highly accessible and engaging. More work needs to be done to promote CreateFSU to the FSU community and identify opportunities to integrate the service into existing research and learning workflows. To expand the service to meet new use cases and ensure its scalability, the Libraries hope to convince campus partners to consider its utility to their missions and contribute funding. This article lays out our experiences in launching and hosting this service over its first two years and proposes steps for future development and growth.

Islandora for archival access and discovery

By Sarah Jones, Cory Lampert, Emily Lapworth, and Seth Shaw
This article is a case study describing the implementation of Islandora 2 to create a public online portal for the discovery, access, and use of archives and special collections materials at the University of Nevada, Las Vegas. The authors will explain how the goal of providing users with a unified point of access across diverse data (including finding aids, digital objects, and agents) led to the selection of Islandora 2, and they will discuss the benefits and challenges of using this open-source software. They will describe the various steps of implementation, including custom development, migration from CONTENTdm, integration with ArchivesSpace, and developing new skills and workflows to use Islandora most effectively. As hindsight always provides additional perspective, the case study will also offer reflection on lessons learned since the launch, insights on open-source repository sustainability, and priorities for future development.

Comparative analysis of automated speech recognition technologies for enhanced audiovisual accessibility

By Dave Rodriguez and Bryan J. Brown
The accessibility of digital audiovisual (AV) collections is a difficult legal and ethical area that nearly all academic libraries will need to navigate at some point. The inclusion of AV accessibility features like captions and transcripts enormously benefits users with disabilities in addition to providing extra value to the repository more universally. However, implementing these features has proven challenging for many reasons. Recent technological advancements in automatic speech recognition (ASR) and its underlying artificial intelligence (AI) technology offer librarians an avenue toward stewarding more accessible collections. This article will discuss these opportunities and present research from Florida State University Libraries evaluating the performance of different ASR tools. The authors will also present an overview of basic AV accessibility-related concepts, ethical issues in using AI technology, and a brief technical discussion of captioning formats.
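
The article's own evaluation methodology is detailed in the full text; one common metric for this kind of ASR comparison is word error rate (WER), sketched below with the jiwer library against a hand-corrected reference transcript. The file names are placeholders, and this is an illustration of the general technique rather than the authors' test harness.

```python
# Sketch of comparing ASR output against a human-corrected reference transcript
# using word error rate (WER) via the jiwer library. File names are placeholders;
# the article's own evaluation setup may differ.
import jiwer

with open("reference_transcript.txt") as fh:
    reference = fh.read()
with open("asr_output.txt") as fh:
    hypothesis = fh.read()

error_rate = jiwer.wer(reference, hypothesis)
print(f"WER: {error_rate:.2%}")   # lower is better; 0% means a perfect match
```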

Real-Time Reporting Using the Alma API and Google Apps Script

By David Fulmer
When the University of Michigan Library migrated from the Aleph Integrated Library System (ILS) to the Alma Library Services Platform (LSP), many challenges arose in migrating our workflows from a multi-tier client/server ILS, hosted locally on an in-house server and accessed by staff through a dedicated client, to a cloud-based LSP accessed by staff through a browser. Among those challenges were deficiencies in timely reporting functionality in the new LSP and incompatibility with the locally popular macro software that was in use at the time. While the Alma LSP includes a comprehensive business intelligence tool, Alma Analytics, which offers a wide variety of out-of-the-box reports and on-demand reporting, it suffers from one big limitation: the data on which the reports are based are a copy of the data from Alma extracted overnight. If you need a timely report of data from Alma, Analytics isn’t suitable. These issues necessitated the development of an application that brought together the utility of the Alma APIs and the convenience of the Google Apps Script platform. This article will discuss the resulting tool, which provides a real-time report on invoice data stored in Alma using the Google Apps Script platform.
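
The production tool described above runs in Google Apps Script; purely to illustrate the underlying API call, the Python sketch below queries the Alma Acquisitions invoices endpoint with an API key. The hostname, key, query parameters, and the response fields printed are placeholders based on the documented REST API, not the article's code.

```python
# Illustrative Python version of the underlying Alma REST API call (the
# article's actual report runs in Google Apps Script). Hostname, API key,
# parameters, and printed fields are placeholders.
import requests

BASE = "https://api-na.hosted.exlibrisgroup.com/almaws/v1/acq/invoices"
API_KEY = "l7xx...your-key..."                  # placeholder

resp = requests.get(
    BASE,
    params={"limit": 25, "offset": 0},
    headers={"Authorization": f"apikey {API_KEY}",
             "Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
for invoice in resp.json().get("invoice", []):
    # Field names assumed from the API's invoice object
    print(invoice.get("number"), invoice.get("total_amount"))
```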

Using Event Notifications, Solid and Orchestration for Decentralizing and Decoupling Scholarly Communication

By Patrick Hochstenbach, Ruben Verborgh and Herbert Van de Sompel
This paper presents the case for a decentralized and decoupled architecture for scholarly communication. It provides an introduction to the Event Notifications protocol as it is being applied in projects such as the international COAR Notify Initiative and the NDE-Usable program of memory institutions in the Netherlands. The paper provides an implementation of Event Notifications using a Solid server. The processing of notifications can be automated using an orchestration service called Koreografeye. Koreografeye will be applied to a citation extraction and relay experiment to show how all these tools fit together.
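
As a minimal sketch of the Event Notifications pattern described above (not the paper's own code), the snippet below POSTs an Activity Streams 2.0 payload as JSON-LD to a Linked Data Notifications inbox, such as one exposed by a Solid server. All URLs and identifiers are placeholders.

```python
# Minimal sketch of sending an Event Notification: an Activity Streams 2.0
# payload POSTed as JSON-LD to an LDN inbox (e.g. on a Solid server).
# URLs and identifiers are placeholders, not the paper's actual endpoints.
import json
import uuid
import requests

inbox = "https://solid.example.org/inbox/"      # placeholder LDN inbox URL

notification = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": f"urn:uuid:{uuid.uuid4()}",
    "type": "Announce",
    "actor": {"id": "https://repository.example.org/", "type": "Service"},
    "object": "https://repository.example.org/article/123",
    "target": {"id": "https://solid.example.org/", "type": "Service"},
}

resp = requests.post(
    inbox,
    data=json.dumps(notification),
    headers={"Content-Type": "application/ld+json"},
    timeout=30,
)
# A conforming LDN receiver replies 201 Created with a Location header
print(resp.status_code, resp.headers.get("Location"))
```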

Using Airtable to download and parse Digital Humanities Data

By William K. Dewey
Airtable is an increasingly popular cloud-based format for entering and storing research data, especially in the digital humanities. It combines the simplicity of spreadsheets like CSV or Excel with a relational database’s ability to model relationships and link records. The Center for Digital Research in the Humanities (CDRH) at Nebraska uses Airtable data for two projects, African Poetics (africanpoetics.unl.edu) and Petitioning for Freedom (petitioningforfreedom.unl.edu). In the first project, the data focuses on African poets and news coverage of them; in the second, the data focuses on habeas corpus petitions and the individuals involved in the cases. CDRH’s existing software stack (designed to facilitate display and discovery) can take in data in many formats, including CSV, parse it with Ruby scripts, and ingest it into an API based on the Elasticsearch search index. The first step in using Airtable data is to download and convert it into a usable data format. This article covers the command line tools that can download tables from Airtable, the formats that can be downloaded (JSON being the most convenient for automation), and access management for tables and authentication. Python scripts can process this JSON data into a CSV format suitable for ingesting into other systems. The article goes on to discuss how this data processing might work. It also discusses the process of exporting information from join tables, Airtable’s relational database-like functionality. Join data is not human-readable when exported, but it can be pre-processed in Airtable into parsable formats. After processing the data into CSV format, the article touches on how CDRH API fields are populated from plain values and from more complicated structures, including Markdown-style links. Finally, the article discusses the advantages and disadvantages of Airtable for managing data from a developer’s perspective.
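
A minimal sketch of the download-and-flatten step described above, using Airtable's REST API with a personal access token and writing each record's fields out to CSV. The base ID, table name, and token are placeholders; pagination follows the API's offset parameter. This is an illustration of the general approach, not CDRH's scripts.

```python
# Minimal sketch of downloading an Airtable table as JSON and flattening the
# records to CSV. Base ID, table name, and token are placeholders.
import csv
import requests

BASE_ID = "appXXXXXXXXXXXXXX"       # placeholder base ID
TABLE = "Poets"                     # placeholder table name
TOKEN = "pat_XXXX"                  # placeholder personal access token
url = f"https://api.airtable.com/v0/{BASE_ID}/{TABLE}"
headers = {"Authorization": f"Bearer {TOKEN}"}

records, offset = [], None
while True:
    params = {"offset": offset} if offset else {}
    data = requests.get(url, headers=headers, params=params, timeout=30).json()
    records.extend(data.get("records", []))
    offset = data.get("offset")     # present only while more pages remain
    if not offset:
        break

fieldnames = sorted({key for rec in records for key in rec["fields"]})
with open("airtable_export.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["id"] + fieldnames)
    writer.writeheader()
    for rec in records:
        writer.writerow({"id": rec["id"], **rec["fields"]})
```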

Using Scalable Vector Graphics (SVG) and Google Sheets to Build a Visual Tool Location Web App

By Jonathan Bradley
At the University Libraries at Virginia Tech, we recently built a visual kiosk web app to help patrons in our makerspace locate the tools they need and to assist our staff in returning and inventorying our large selection of tools, machines, and consumables. The app is built in Svelte and uses the Google Sheets “publish to web as CSV” feature to pull data from a staff-maintained list of equipment in the space. All of this is tied to a Scalable Vector Graphics (SVG) file that is controlled by JavaScript and CSS to provide an interactive map of our shelving and storage locations, highlighting bins as patrons select specific equipment from a searchable list on the kiosk, complete with photos of each piece of equipment. In this article, you will learn why the app was made, the problems it has solved, why certain technologies were used and others weren’t, the challenges that arose during development, and where the project stands to go from here.
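
The kiosk itself is written in Svelte; as a language-agnostic illustration of the "publish to web as CSV" data source it relies on, the Python sketch below fetches a published CSV and reads it into dictionaries. The URL is a placeholder for a real published-sheet link, and the column names are not the actual inventory schema.

```python
# Illustration of consuming a Google Sheet that has been published to the web
# as CSV (the kiosk does this in Svelte/JavaScript). The URL is a placeholder
# for a real "publish to web" CSV link; columns are not the actual schema.
import csv
import io
import requests

published_csv = ("https://docs.google.com/spreadsheets/d/e/EXAMPLE_ID/"
                 "pub?output=csv")              # placeholder published link

text = requests.get(published_csv, timeout=30).text
tools = list(csv.DictReader(io.StringIO(text)))

for tool in tools[:5]:
    print(tool)   # each row becomes a dict keyed by the sheet's header row
```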

Developing a Multi-Portal Digital Library System: A Case Study of the new University of Florida Digital Collections

By Todd Digby, Cliff Richmond, Dustin Durden, and Julio Munoz
The University of Florida (UF) launched the UF Digital Collections in 2006. Since then, the system has grown to over 18 million pages of content. The locally developed digital library system consisted of an integrated public frontend interface and a production backend. As with other monoliths, being able to adapt and make changes to the system became increasingly difficult as time went on and the size of the collections grew. As production processes changed, the system was modified to make improvements on the backend, but the public interface became dated and increasingly lacked mobile responsiveness. A decision was made to develop a new system, starting with decoupling the public interface from the production system. This article will examine our experience in rearchitecting our digital library system and deploying our new multi-portal, public-facing system. After an environmental scan of digital library technologies, it was decided not to use a current open-source digital library system. A relatively new programming team, whose members were new to the library ecosystem, allowed us to rethink many of our existing assumptions and provided new insights and development opportunities. Using technologies including Python, APIs, Elasticsearch, ReactJS, and PostgreSQL has allowed us to build a flexible and adaptable system that allows us to hire developers in the future who may not have experience building digital library systems.
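
As an illustrative sketch of one piece of such a decoupled stack (not UF's actual code), the snippet below queries an Elasticsearch index through its REST _search endpoint, the kind of call a Python API layer or ReactJS front end might make. The host, index name, and field names are placeholders.

```python
# Illustrative query against an Elasticsearch index via its REST _search
# endpoint; host, index name, and field names are placeholders, not the
# actual UF Digital Collections schema.
import requests

query = {
    "query": {"match": {"full_text": "citrus industry"}},
    "size": 10,
    "_source": ["title", "date"],
}
resp = requests.post(
    "http://localhost:9200/digital_objects/_search",
    json=query,
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```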

The Use of Python to Support Technical Services Work in Academic Libraries

By Maria Collins, Xiaoyan Song, and Sherri Schon
Technical services professionals in academic libraries are firmly committed to digital transformation and have embraced technologies and data practices that reshape their work to be more efficient, reliable, and scalable. Evolving systems, constantly changing workflows, and management of large-scale data are constants in the technical services landscape. Maintaining one’s ability to work effectively in this kind of environment involves embracing continuous learning cycles and incorporating new skills, which in effect means training people in a different way and re-conceptualizing how libraries provide support for technical services work. This article presents a micro lens into this space by examining the use of Python within a technical services environment. The authors conducted two surveys and eleven follow-up interviews to investigate how Python is used in academic libraries to support technical services work and to learn more about training and organizational support across the academic library community. The surveys and interviews conducted for this research indicate that understanding the larger context of culture and organizational support is highly important for illustrating the complications of this learning space for technical services. Consequently, this article will address themes that affect skills building in technical services at both a micro and macro level.

Enhancing Serials Holdings Data: A Pymarc-Powered Clean-Up Project

By Minyoung Chung and Phani Chaitanya Pendyala
Following the recent transition from Inmagic to Ex Libris Alma, the Technical Services department at the University of Southern California (USC) in Los Angeles undertook a post-migration cleanup initiative. This article introduces methodologies aimed at improving irregular summary holdings data within serials records using Pymarc, regular expressions, and the Alma API in MarcEdit. The challenge identified was the confinement of serials' holdings information exclusively to the 866 MARC tag for textual holdings. To address this challenge, Pymarc and regular expressions were leveraged to parse and identify various patterns within the holdings data, offering a nuanced understanding of the intricacies embedded in the 866 field. Subsequently, the script generated a new 853 field for captions and patterns, along with multiple instances of the 863 field for coded enumeration and chronology data, derived from the existing data in the 866 field. The final step involved utilizing the Alma API via MarcEdit, streamlining the restructuring of holdings data and updating nearly 5,000 records for serials. This article illustrates the application of Pymarc for both data analysis and creation, emphasizing its utility in generating data in the MARC format. Furthermore, it posits the potential application of Pymarc to enhance data within library and archive contexts.
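
The project's own script is not reproduced in the abstract; the sketch below shows the general pattern it describes, using pymarc (5.x) and a regular expression to read an 866 textual holdings field and derive a paired 853 caption/pattern field and an 863 enumeration/chronology field. The holdings pattern matched here ("v.1(1990)-v.12(2001)") and the indicator/subfield choices are simplified assumptions; production data requires many more patterns.

```python
# Sketch of the 866 -> 853/863 conversion pattern described above, using
# pymarc 5.x. The regular expression handles only a simplified
# "v.1(1990)-v.12(2001)" holdings statement; real data needs many more patterns.
import re
from pymarc import MARCReader, Field, Subfield

HOLDINGS_RE = re.compile(r"v\.(\d+)\((\d{4})\)\s*-\s*v\.(\d+)\((\d{4})\)")

with open("holdings.mrc", "rb") as fh, open("holdings_updated.mrc", "wb") as out:
    for record in MARCReader(fh):
        for f866 in record.get_fields("866"):
            text = f866["a"] or ""
            m = HOLDINGS_RE.search(text)
            if not m:
                continue  # leave unrecognized statements for manual review
            start_v, start_y, end_v, end_y = m.groups()
            # 853: captions and patterns (enumeration "v.", chronology "(year)")
            record.add_field(Field(
                tag="853", indicators=["0", "0"],
                subfields=[Subfield("8", "1"), Subfield("a", "v."),
                           Subfield("i", "(year)")]))
            # 863: coded enumeration and chronology derived from the 866 text
            record.add_field(Field(
                tag="863", indicators=["4", "0"],
                subfields=[Subfield("8", "1.1"),
                           Subfield("a", f"{start_v}-{end_v}"),
                           Subfield("i", f"{start_y}-{end_y}")]))
        out.write(record.as_marc())
```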

Editorial

By Brighid M. Gonzales
Issue 58 of the Code4Lib Journal is bursting at the seams with examples of how libraries are creating new technologies, leveraging existing technologies, and exploring the use of AI to benefit library work. We had an unprecedented number of submissions this quarter and the resulting issue features 16 articles detailing some of the more unique and innovative technology projects libraries are working on today.

Jupyter Notebooks and Institutional Repositories: A Landscape Analysis of Realities, Opportunities and Paths Forward

By Adrienne VandenBosch, Keith E. Maull, and Matthew Mayernik
Jupyter Notebooks are important outputs of modern scholarship, though the longevity of these resources within the broader scholarly record is still unclear. Creators and their communities have yet to holistically understand the creation, access, sharing, and preservation of computational notebooks, and such notebooks have yet to be designated a proper place among institutional repositories or other preservation environments as first-class scholarly digital assets. Before this can happen, repository managers and curators need to have the appropriate tools, schemas, and best practices to maximize the benefit of notebooks within their repository landscape and environments. This paper explores the landscape of Jupyter Notebooks today and focuses on the opportunities and challenges related to bringing Jupyter Notebooks into institutional repositories. We explore the extent to which Jupyter Notebooks are currently accessioned into institutional repositories, and how metadata schemas like CodeMeta might facilitate their adoption. We also discuss characteristics of Jupyter Notebooks created by researchers at the National Center for Atmospheric Research, to provide additional insight into how to assess and accession Jupyter Notebooks and related resources into an institutional repository.
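
As a small illustration of the kind of crosswalk the paper discusses (an assumption, not the authors' tooling), the sketch below reads a notebook with nbformat and emits a minimal CodeMeta record. The mapping of notebook metadata to CodeMeta properties is deliberately simplistic, and the file name and description are placeholders.

```python
# Illustrative crosswalk from a Jupyter notebook to a minimal CodeMeta record.
# The field mapping is a simplification for demonstration, not the paper's tooling.
import json
import nbformat

nb = nbformat.read("analysis.ipynb", as_version=4)   # placeholder notebook
kernel = nb.metadata.get("kernelspec", {})

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "analysis.ipynb",
    "programmingLanguage": kernel.get("language", "python"),
    "description": "Placeholder description supplied by the depositor.",
    "encodingFormat": "application/x-ipynb+json",
}

with open("codemeta.json", "w") as fh:
    json.dump(codemeta, fh, indent=2)
```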

Editorial: Big code, little code, open code, old code

By Péter Király
Paraphrasing the title of Christine L. Borgman’s inaugural lecture in Göttingen some years ago, “Big data, little data, open data,” I could say that the current issue of Code4Lib is about big code, little code, open code, old code. The good side of coding is that effective contributions can be made with different levels and types of background knowledge. This issue proves to us that even small modifications, or sharing knowledge about the command-line usage of a tool, can be very useful for the user community. Let’s see what we have!

Evaluating HTJ2K as a Drop-In Replacement for JPEG2000 with IIIF

By Glen Robson, Stefano Cossu, Ruven Pillay, Michael D. Smith
JPEG2000 is a widely adopted open standard for images in cultural heritage, used both for delivering access and for creating preservation files that are losslessly compressed. Recently, a new extension to JPEG2000 has been developed by the JPEG Committee: “High Throughput JPEG2000,” better known as HTJ2K. HTJ2K promises faster encoding and decoding speeds compared to traditional JPEG2000 Part 1, while requiring little or no change to existing code and infrastructure. The IIIF community has completed a project to evaluate HTJ2K as a drop-in replacement for encoding JPEG2000 and to validate the expected improvements in speed and efficiency. The group looked at a number of tools that support HTJ2K, including Kakadu, OpenJPEG, and Grok, and ran encoding tests comparing the encoding speeds and required disk space for these images. The group also set up decoding speed tests comparing HTJ2K with tiled pyramid TIFF and traditional JPEG2000 using one of the major open-source IIIF image servers, IIPImage. We found that HTJ2K is significantly faster than traditional JPEG2000, though the results are more nuanced when compared with TIFF.
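
The project's full benchmarking setup is described in the article; the sketch below only shows the general shape of such a test in Python, timing an arbitrary encoder command over a set of source images and recording the output size. The command template is a placeholder supplied by the user rather than a specific Kakadu, OpenJPEG, or Grok invocation.

```python
# Generic timing harness of the kind used to compare encoders: run a
# user-supplied encoder command per image and record wall-clock time and
# output size. The command template is a placeholder, not a specific
# Kakadu/OpenJPEG/Grok invocation.
import pathlib
import subprocess
import time

def benchmark(encoder_cmd, sources, out_dir):
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(exist_ok=True)
    results = []
    for src in sources:
        dst = out_dir / (pathlib.Path(src).stem + ".jph")
        cmd = [arg.format(src=src, dst=dst) for arg in encoder_cmd]
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        elapsed = time.perf_counter() - start
        results.append((src, elapsed, dst.stat().st_size))
    return results

# Example call with a placeholder command template:
# benchmark(["my_htj2k_encoder", "-i", "{src}", "-o", "{dst}"],
#           ["page1.tif", "page2.tif"], "encoded")
```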

Standardization of Journal Title Information from Interlibrary Loan Data: A Customized Python Code Approach

By Jennifer Ye Moon-Chung
Interlibrary loan (ILL) data plays a crucial role in making informed journal subscription decisions. However, journal titles and International Standard Serial Numbers (ISSNs) are often entered inaccurately by requestors, and the resulting inconsistent or incomplete data presents challenges when attempting to make use of ILL data. This article introduces a solution utilizing customized Python code to standardize journal titles obtained from user-entered data. The solution incorporates a preprocessing workflow that filters out irrelevant information and employs Application Programming Interfaces (APIs) to replace inaccurate titles with precise ones based on retrieved ISSNs, ensuring data accuracy. The solution then presents the processed data in a dashboard format, highlighting the most requested journals and enabling librarians to interactively explore the data. By adopting this approach, librarians can make well-informed decisions and conduct thorough analysis, resulting in more efficient and effective management of library resources.
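
The article does not name the specific APIs it uses; as one illustration of an ISSN-to-canonical-title lookup, the sketch below queries the Crossref journals endpoint and falls back to the original user-entered string when no match is found.

```python
# Illustrative ISSN-to-title lookup against the Crossref journals endpoint.
# The article's own workflow may use different APIs; this is one example of
# replacing a user-entered title with a registered canonical one.
import requests

def canonical_title(issn, fallback):
    """Return the registered journal title for an ISSN, or the fallback."""
    resp = requests.get(f"https://api.crossref.org/journals/{issn}", timeout=30)
    if resp.status_code != 200:
        return fallback
    return resp.json()["message"].get("title") or fallback

print(canonical_title("0028-0836", "nature magazine"))   # -> "Nature"
```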

ChronoNLP: Exploration and Analysis of Chronological Textual Corpora

By Erin Wolfe
This article introduces ChronoNLP, a free and open-source web application designed to enable the application of Natural Language Processing (NLP) techniques to textual datasets with a time-based component. This interactive Python platform allows users to filter, search, explore, and visualize the data, letting the temporal aspect play a central role in the analysis. ChronoNLP makes use of several powerful NLP libraries to facilitate various text analysis techniques, including topic modeling, term/TF-IDF frequency evaluation, automated keyword extraction, named entity recognition, and other tasks, through a graphical interface without the need for coding or technical knowledge. By highlighting the temporal aspect of specific types of corpora, ChronoNLP provides access to methods of parsing and visualizing the data in a user-friendly format to help uncover patterns and trends in text-based materials.
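
ChronoNLP exposes these techniques through a graphical interface; purely to illustrate one of the underlying methods (TF-IDF over a time-binned corpus), the sketch below groups toy documents by year and reports each year's top-weighted terms with scikit-learn. The documents and years are invented examples, not ChronoNLP's code or data.

```python
# Illustration of one underlying technique (TF-IDF over a time-binned corpus),
# not ChronoNLP's own code. The documents and years here are toy data.
from sklearn.feature_extraction.text import TfidfVectorizer

docs_by_year = {
    2020: "library closures remote services curbside pickup",
    2021: "vaccination clinics hybrid programming digital collections",
    2022: "open access publishing institutional repositories",
}

years = list(docs_by_year)
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([docs_by_year[y] for y in years])
terms = vectorizer.get_feature_names_out()

for row, year in zip(matrix.toarray(), years):
    top = sorted(zip(row, terms), reverse=True)[:3]
    print(year, [term for weight, term in top if weight > 0])
```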

A Very Small Pond: Discovery Systems That Can Be Used with FOLIO in Academic Libraries

By Aaron Neslin, Jaime Taylor
FOLIO, an open-source library services platform, does not have a front-end patron interface for searching and using library materials. Any library installing FOLIO will need at least one other piece of software to perform those functions. This article evaluates which systems, in a limited marketplace, are available for academic libraries to use with FOLIO.

Supporting Library Consortia Website Needs: Two Case Studies

By Elizabeth Joan Kelly
LOUIS: The Louisiana Library Network provides library technology infrastructure, electronic resources, affordable learning, and digital literacy support for its 47 academic library members. With this support comes a need to develop web solutions for members, a challenging task as the members have their own websites on a multitude of platforms, and a multitude of library faculty and staff with differing needs. This article details two case studies in developing consortia-specific web design projects. The first summarizes the LOUIS Tabbed Search Box Order Form, an opportunity for members to "order" a custom-made search box for the various services LOUIS supports that can then be embedded on their library's website. The second involves the LOUIS Community Jobs Board, a member-driven job listing tool that exists on the LOUIS site, but that members can publish jobs to using a Google Form. Both the Search Box Order Form and the Jobs Board have resulted in increased engagement with and satisfaction from member libraries. This article will include best practices, practical solutions, and sample code for both projects.

Creating a Full Multitenant Back End User Experience in Omeka S with the Teams Module

By Alexander Dryden and Daniel G. Tracy
When Omeka S appeared as a beta release in 2016, it offered the opportunity for researchers or larger organizations to publish multiple Omeka sites from the same installation. Multisite functionality was and continues to be a major advance for what had become the premier platform for scholarly digital exhibits produced by libraries, museums, researchers, and students. However, while geared to larger institutional contexts, Omeka S poses some user experience challenges on the back end for larger organizations with numerous users creating different sites. These challenges include a “cluttered” effect for many users seeing resources they do not need to access and data integrity challenges due to the possibility of users editing resources that other users need in their current state. The University of Illinois Library, drawing on two local use cases as well as two additional external use cases, developed the Teams module to address these challenges. This article describes the needs leading to the decision to create the module, the project requirement gathering process, and the implementation and ongoing development of Teams. The module and findings are likely to be of interest to other institutions adopting Omeka S but also, more generally, to libraries seeking to contribute successfully to larger open-source initiatives.