Performance of the Annif Model in Enhancing Metadata Quality: A Bibliometric Perspective Across Multidisciplinary Research Outputs

Debdas Mondal

doi:10.17821/srels/2026/v63i1/171977

Authors

Debdas Mondal
S. R. Fatepuria College, Murshidabad – 742133, West Bengal https://orcid.org/0000-0003-3321-979X

Keywords:

AI in Libraries, Annif Model, Automated Subject Indexing, Bibliometrics, Information Retrieval, Library and Information Science, Metadata Quality

Abstract

The rapid growth of scholarly publications across multiple disciplines has made high-quality metadata essential for effective information retrieval and resource discovery. Traditional human indexing, while accurate, is time-consuming, labour-intensive, and often inconsistent, highlighting the need for scalable automated solutions. This study evaluates the performance of the Annif model in enhancing metadata quality across four major academic domains: Library and Information Science (LIS), Computer Science, Health Sciences, and Social Sciences. Using a sample of 5,000 research outputs, Annif-generated keywords were compared with human-assigned metadata to assess improvements in keyword completeness, thematic specificity, vocabulary consistency, and retrieval performance. Evaluation metrics included precision, recall, F1-score, and user-oriented measures such as search satisfaction and ease of discovery. The findings indicate that Annif significantly enhances metadata quality, increasing keyword coverage by 38–57%, improving retrieval precision by 12–20%, and reducing indexing time from 18 hours per 1,000 records to 32 minutes. Error analysis revealed common challenges, including concept drift and over-generalisation, emphasising the continued need for domain oversight. Overall, the study demonstrates that Annif offers a scalable, efficient, and effective approach to metadata enhancement, supporting improved discoverability and knowledge organisation in multidisciplinary research repositories.
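The precision, recall, and F1-score named above can be illustrated with a minimal sketch. The abstract does not state how suggested and human-assigned keywords were matched, so exact matching after case and whitespace normalisation is an assumption here, and the function names and sample keywords are hypothetical rather than taken from the study.

```python
from typing import Iterable, Set, Tuple


def normalise(labels: Iterable[str]) -> Set[str]:
    """Lower-case and strip labels so trivially different spellings match."""
    return {label.strip().lower() for label in labels if label.strip()}


def precision_recall_f1(
    suggested: Iterable[str], reference: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for a single record.

    Assumption: a suggested keyword counts as correct only if it exactly
    matches a human-assigned keyword after normalisation.
    """
    suggested_set = normalise(suggested)
    reference_set = normalise(reference)
    true_positives = len(suggested_set & reference_set)

    precision = true_positives / len(suggested_set) if suggested_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical record: illustrative keywords only, not data from the study.
human_keywords = ["Metadata Quality", "Information Retrieval", "Subject Indexing"]
annif_keywords = ["metadata quality", "subject indexing", "machine learning", "bibliometrics"]

precision, recall, f1 = precision_recall_f1(annif_keywords, human_keywords)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Averaging these per-record scores over a collection would give corpus-level figures; whether the study used this kind of averaging is not stated. As a separate arithmetic check on the reported timing, 18 hours is 1,080 minutes, so a drop to 32 minutes per 1,000 records is roughly a 34-fold reduction (1,080 / 32 = 33.75).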

Published

2026-02-28

How to Cite

Mondal, D. (2026). Performance of the Annif Model in Enhancing Metadata Quality: A Bibliometric Perspective Across Multidisciplinary Research Outputs. Journal of Information and Knowledge, 63(1), 19–27. https://doi.org/10.17821/srels/2026/v63i1/171977

Issue

Volume 63, Issue 1, February 2026

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright in all articles published in the Journal of Information and Knowledge is held by the publisher. The Sarada Ranganathan Endowment for Library Science (SRELS), as publisher, requires its authors to transfer copyright prior to publication. This permits SRELS to reproduce, publish, distribute, and archive the article in print and electronic form, and to defend against any improper use of the article.
