Research

Gauging Library Needs for Advanced AI-Assisted Cataloging

Project Introduction

The current era of unprecedented information proliferation and increasing multilingual diversity challenges libraries’ traditional cataloging and resource management processes. Cutting-edge artificial intelligence (AI) tools known as large language models (LLMs), which excel at processing natural language, have the potential to assist librarians in their quest to organize and provide access to their ever-growing collections. By combining the capability of AI with the expertise of catalogers, we aim to create a synergy that will empower catalogers to be as efficient and accurate as possible as they enhance the accessibility and inclusivity of library resources.

This 2-year Applied Research grant project will investigate the applicability of locally run LLMs to the subject cataloging of digital and print resources. We plan to address two main questions. RQ1: How can LLM-based models be developed to generate accurate cataloging results, particularly classification and subject analysis, for both English and foreign-language resources? RQ2: How can AI models be integrated into cataloging procedures to assist librarians? The project aims to build knowledge for the future development and deployment of LLM-based applications for cataloging.

Project Outcomes

Publications

Liu, J., Song, X., Zhang, D., Thomale, J., He, D., & Hong, L. (2025). A Hybrid Framework for Subject Analysis: Integrating Embedding-Based Regression Models with Large Language Models. Proceedings of the Association for Information Science and Technology. (Accepted)
Luo, P., Hong, L., & Nie, L. (2025). Automatic classification of research data sets into the Chinese Library Classification with generative large language model. The Electronic Library.

Presentations

Liu, J. & Hong, L. (2025). Applications of LLMs in Library Information Organization. ASIS&T 2025 IDEA Institute on AI. (Tutorial)

Workshop Organization

Large Language Models for Library Information Organization at iConference 2025 (March 18, 2025, in Bloomington, Indiana) [Proposal] [Call for Participation] [Presentations]

Data and Code

Project GitHub repository: https://github.com/llm4cat
Code for processing MARC records: https://github.com/llm4cat/filtermarc

PIs

Project Director and Lead Principal Investigator

Dr. Lingzi Hong, Assistant Professor in the Department of Data Science at UNT.

Co-Principal Investigator

Jason Thomale, Resource Discovery Systems Librarian at the University of North Texas Libraries.

Advisory Board

Kevin Yanowski: Department Head of Cataloging and Metadata Services at the University of North Texas Libraries.
Casey Mullin: Head of Cataloging and Metadata Services at Western Washington University Libraries.
Charlene Chou: Head of Knowledge Access at the New York University Libraries.
Sarah Hovde: Monographs & Media Cataloging Librarian (Librarian II) at the University of Maryland Libraries.
Dr. Jian Wu: Assistant Professor of Computer Science at Old Dominion University.
Dr. C. Lee Giles: David Reese Professor, College of Information Sciences and Technology at the Pennsylvania State University.

Relevant Resources

Library of Congress' recent experiments with AI for cataloging tasks: https://labs.loc.gov/work/experiments/ECD/
An interview study with catalogers on the applicability of AI to cataloging tasks: https://static.sched.com/hosted_files/2024coreforum/96/KateSlauson-ALACore.pdf
Blog about AI for cataloging: https://ruthtillman.com/talk/mcls-waiting-for-production/

Acknowledgement

This work was supported by the Institute of Museum and Library Services (IMLS) under Grant LG-256666-OLS-24. The opinions, findings, and conclusions expressed in this publication are those of the author(s) and do not necessarily reflect the views of IMLS.


Last updated Dec 2, 2024