Performance of the Annif Model in Enhancing Metadata Quality: A Bibliometric Perspective Across Multidisciplinary Research Outputs

Authors

Mondal, D.

DOI:

https://doi.org/10.17821/srels/2026/v63i1/171977

Keywords:

AI in Libraries, Annif Model, Automated Subject Indexing, Bibliometrics, Information Retrieval, Library and Information Science, Metadata Quality

Abstract

The rapid growth of scholarly publications across multiple disciplines has made high-quality metadata essential for effective information retrieval and resource discovery. Traditional human indexing, while accurate, is time-consuming, labour-intensive, and often inconsistent, highlighting the need for scalable automated solutions. This study evaluates the performance of the Annif model in enhancing metadata quality across four major academic domains: Library and Information Science (LIS), Computer Science, Health Sciences, and Social Sciences. Using a sample of 5,000 research outputs, Annif-generated keywords were compared with human-assigned metadata to assess improvements in keyword completeness, thematic specificity, vocabulary consistency, and retrieval performance. Evaluation metrics included precision, recall, F1-score, and user-oriented measures such as search satisfaction and ease of discovery. The findings indicate that Annif significantly enhances metadata quality, increasing keyword coverage by 38–57%, improving retrieval precision by 12–20%, and reducing indexing time from 18 hours per 1,000 records to 32 minutes. Error analysis revealed common challenges, including concept drift and over-generalisation, emphasising the continued need for domain oversight. Overall, the study demonstrates that Annif offers a scalable, efficient, and effective approach to metadata enhancement, supporting improved discoverability and knowledge organisation in multidisciplinary research repositories.
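
The comparison of Annif-generated keywords with human-assigned metadata by precision, recall, and F1-score can be illustrated with a minimal sketch. This is not the study's evaluation code: the function name `keyword_scores`, the exact-match rule after lowercasing, and the sample keywords are all hypothetical, chosen only to show how the three metrics relate for a single record.

```python
def keyword_scores(suggested, reference):
    """Set-based precision, recall and F1 for one record.

    Assumption: keywords match when they are identical after trimming
    whitespace and lowercasing. A real evaluation would more likely
    compare controlled-vocabulary concept identifiers.
    """
    suggested_set = {k.strip().lower() for k in suggested}
    reference_set = {k.strip().lower() for k in reference}
    true_positives = len(suggested_set & reference_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(suggested_set) if suggested_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical record: four automatically suggested keywords
# against three human-assigned ones, with two in common.
suggested = ["Metadata Quality", "Information Retrieval",
             "Machine Learning", "Bibliometrics"]
reference = ["metadata quality", "bibliometrics", "subject indexing"]

precision, recall, f1 = keyword_scores(suggested, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For this invented record, two of the four suggestions match two of the three human-assigned keywords, so precision is 2/4 and recall is 2/3. Averaging such per-record scores over a collection is one common way to report corpus-level figures, although the abstract does not state which averaging scheme the study used.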

Published

2026-03-23

How to Cite

Mondal, D. (2026). Performance of the Annif Model in Enhancing Metadata Quality: A Bibliometric Perspective Across Multidisciplinary Research Outputs. Journal of Information and Knowledge, 63(1), 19–27. https://doi.org/10.17821/srels/2026/v63i1/171977

References

Ahmed, M. (2023). Automatic indexing for agriculture: Designing a framework by deploying Agrovoc, AGRIS and Annif. Journal of Information and Knowledge, 60(2), 85-95. https://doi.org/10.17821/srels/2023/v60i2/170966

Ahmed, M., Mukhopadhyay, M., & Mukhopadhyay, P. (2023). Automated knowledge organization AI/ML based subject indexing system for libraries. DESIDOC Journal of Library and Information Technology, 43(1), 45-54. https://doi.org/10.14429/djlit.43.01.18619

Gobbo, L. (2023). Automated subject indexing: Testing of Annif software for Italian language. In Proceedings/Article in AIB (Associazione Italiana Biblioteche).

Golub, K., Suominen, O., Mohammed, A. T., Aagaard, H., & Osterman, O. (2024). Automated Dewey Decimal classification of Swedish library metadata using Annif software. Journal of Documentation, 80(5), 1057-1079. https://doi.org/10.1108/JD-01-2022-0026

Kasprzik, A. (2024). The automation of subject indexing and the role of Annif. In Proceedings of CRIS2024. ZBW/EuroCRIS.

Kerketta, S., & Mukhopadhyay, P. (2025). Automated class numbers prediction for books: An AI/ML based approach using Annif. In International Conference on Marching Beyond the Libraries (ICMBL 2024) (pp. 140-147). Atlantis Press. https://doi.org/10.2991/978-94-6463-712-0_12

Majal, G. M. (2024). The automation of subject indexing at ZBW. In Proceedings of SWIB 2024 - Semantic Web in Libraries. Leibniz Information Centre for Economics.

Mukhopadhyay, P. (2023). Machine learning and bibliographic data universe: Assessing efficacy of backend algorithms in Annif through retrieval metrics. Journal of Information and Knowledge, 60(1), 39-48. https://doi.org/10.17821/srels/2023/v60i1/170891

Suominen, O. (2019). Annif: DIY automated subject indexing using multiple algorithms. LIBER Quarterly: The Journal of the Association of European Research Libraries, 29(1), 1-25. https://doi.org/10.18352/lq.10285

Suominen, O. (2025). Annif at SemEval 2025 Task 5: Traditional XMTC augmented by LLMs. https://doi.org/10.48550/arXiv.2504.19675

Suominen, O., Inkinen, J., & Lehtinen, M. (2022). Annif and Finto AI: Developing and implementing automated subject indexing. JLIS.it, 13(1), 265-282. https://doi.org/10.4403/jlis.it-12740

Yang, H., Wang, N., Yang, L., Liu, W., & Wang, S. (2023). Research on the automatic subject indexing method of academic papers based on climate change domain ontology. Sustainability, 15(5), 3919. https://doi.org/10.3390/su15053919