Performance of the Annif Model in Enhancing Metadata Quality: A Bibliometric Perspective Across Multidisciplinary Research Outputs
DOI:
https://doi.org/10.17821/srels/2026/v63i1/171977
Keywords:
AI in Libraries, Annif Model, Automated Subject Indexing, Bibliometrics, Information Retrieval, Library and Information Science, Metadata Quality
Abstract
The rapid growth of scholarly publications across multiple disciplines has made high-quality metadata essential for effective information retrieval and resource discovery. Traditional human indexing, while accurate, is time-consuming, labour-intensive, and often inconsistent, highlighting the need for scalable automated solutions. This study evaluates the performance of the Annif model in enhancing metadata quality across four major academic domains: Library and Information Science (LIS), Computer Science, Health Sciences, and Social Sciences. Using a sample of 5,000 research outputs, Annif-generated keywords were compared with human-assigned metadata to assess improvements in keyword completeness, thematic specificity, vocabulary consistency, and retrieval performance. Evaluation metrics included precision, recall, F1-score, and user-oriented measures such as search satisfaction and ease of discovery. The findings indicate that Annif significantly enhances metadata quality, increasing keyword coverage by 38–57%, improving retrieval precision by 12–20%, and reducing indexing time from 18 hours per 1,000 records to 32 minutes. Error analysis revealed common challenges, including concept drift and over-generalisation, emphasising the continued need for domain oversight. Overall, the study demonstrates that Annif offers a scalable, efficient, and effective approach to metadata enhancement, supporting improved discoverability and knowledge organisation in multidisciplinary research repositories.
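The abstract names precision, recall, and F1-score as evaluation metrics but does not say how they were computed. As a minimal sketch, assuming exact-match comparison (after case folding) between the keywords suggested for one record and the human-assigned keywords for that record, the three scores could be calculated as below. The function name `keyword_scores` and the sample keywords are hypothetical illustrations, not taken from the study.

```python
def keyword_scores(suggested, reference):
    """Set-based precision, recall and F1 for a single record.

    Assumption: a suggested keyword counts as correct only if it matches
    a human-assigned keyword exactly after trimming and lower-casing.
    """
    suggested_set = {k.strip().lower() for k in suggested}
    reference_set = {k.strip().lower() for k in reference}
    matched = len(suggested_set & reference_set)

    # Guard against empty keyword lists to avoid division by zero.
    precision = matched / len(suggested_set) if suggested_set else 0.0
    recall = matched / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical record: four suggested keywords, three human-assigned ones.
suggested = ["Metadata Quality", "Information Retrieval",
             "Machine Learning", "Bibliometrics"]
reference = ["metadata quality", "bibliometrics", "subject indexing"]

precision, recall, f1 = keyword_scores(suggested, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Two of the four suggestions match in this made-up record, so precision is 2/4 and recall is 2/3. Averaging such per-record scores over a collection is one common way to report corpus-level figures, though the study may have aggregated differently.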
License
Copyright (c) 2026 Journal of Information and Knowledge

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The copyright of all articles published in Journal of Information and Knowledge is held by the Publisher. Sarada Ranganathan Endowment for Library Science (SRELS), as publisher, requires its authors to transfer the copyright prior to publication. This permits SRELS to reproduce, publish, distribute and archive the article in print and electronic form, and to defend against any improper use of the article.
References
Ahmed, M. (2023). Automatic indexing for agriculture: Designing a framework by deploying Agrovoc, AGRIS and Annif. Journal of Information and Knowledge, 60(2), 85-95. https://doi.org/10.17821/srels/2023/v60i2/170966
Ahmed, M., Mukhopadhyay, M., & Mukhopadhyay, P. (2023). Automated knowledge organization AI/ML based subject indexing system for libraries. DESIDOC Journal of Library and Information Technology, 43(1), 45-54. https://doi.org/10.14429/djlit.43.01.18619
Gobbo, L. (2023). Automated subject indexing: Testing of Annif software for Italian language. In Proceedings/Article in AIB (Associazione Italiana Biblioteche).
Golub, K., Suominen, O., Mohammed, A. T., Aagaard, H., & Osterman, O. (2024). Automated Dewey Decimal classification of Swedish library metadata using Annif software. Journal of Documentation, 80(5), 1057-1079. https://doi.org/10.1108/JD-01-2022-0026
Kasprzik, A. (2024). The automation of subject indexing and the role of Annif. In Proceedings of CRIS2024. ZBW/EuroCRIS.
Kerketta, S., & Mukhopadhyay, P. (2025). Automated class numbers prediction for books: An AI/ML based approach using Annif. In International Conference on Marching Beyond the Libraries (ICMBL 2024) (pp. 140-147). Atlantis Press. https://doi.org/10.2991/978-94-6463-712-0_12
Majal, G. M. (2024). The automation of subject indexing at ZBW. In Proceedings of SWIB 2024 - Semantic Web in Libraries. Leibniz Information Centre for Economics.
Mukhopadhyay, P. (2023). Machine learning and bibliographic data universe: Assessing efficacy of backend algorithms in Annif through retrieval metrics. Journal of Information and Knowledge, 60(1), 39-48. https://doi.org/10.17821/srels/2023/v60i1/170891
Suominen, O. (2019). Annif: DIY automated subject indexing using multiple algorithms. LIBER Quarterly: The Journal of the Association of European Research Libraries, 29(1), 1-25. https://doi.org/10.18352/lq.10285
Suominen, O. (2025). Annif at SemEval 2025 Task 5: Traditional XMTC augmented by LLMs. https://doi.org/10.48550/arXiv.2504.19675
Suominen, O., Inkinen, J., & Lehtinen, M. (2022). Annif and Finto AI: Developing and implementing automated subject indexing. JLIS.it, 13(1), 265-282. https://doi.org/10.4403/jlis.it-12740
Yang, H., Wang, N., Yang, L., Liu, W., & Wang, S. (2023). Research on the automatic subject indexing method of academic papers based on climate change domain ontology. Sustainability, 15(5), 3919. https://doi.org/10.3390/su15053919
Debdas Mondal




