Mapping Indian Social Science Research to SDGs: An Automated Machine Learning Framework
DOI:
https://doi.org/10.17821/srels/2026/v63i1/172002
Keywords:
Annif, Categorisation of Publications, Machine Learning, Neural Network, Retrieval metrics, SDG, Social Science Research, India
Abstract
This study reports progress on a minor research project funded by the ICSSR that aims to categorise Indian social science research publications according to the United Nations' 17 Sustainable Development Goals (SDGs) using Machine Learning (ML)-based categorisation. To build the system, a dataset of 200,000 bibliographic records covering 2016 to 2024 was first assembled from major databases, both open access (OpenAlex, Lens, Dimensions) and commercial (Scopus, Web of Science), with filters applied to ensure that at least one author of each record is from India. A team of research scholars in Library and Information Science (LIS) manually classified these records on the basis of titles and abstracts, and assigned a certainty/confidence score to each classification to ensure data reliability. Subsequently, 100,000 quality records (confidence score ≥0.50) were used to train selected ML backends (mainly associative models) available in the open source Annif framework. Performance evaluation showed that a Neural Network backend, tuned through Annif's hyperparameter optimisation utility, outperformed the other ML backends on both metrics, F1@5 and NDCG. The study then applied the system to map recent research trends (social science publications from India from January to June 2025), and found that Clean Water and Sanitation (Goal 6) emerged as the most prominent theme. This theme was closely connected to 'People'-centric goals such as Good Health, Quality Education, and Gender Equality. The results suggest that Indian social science scholarship prioritises immediate human welfare and essential infrastructure over environmental goals such as Climate Action. These findings demonstrate the feasibility of this automated prototype for assessing the nation's contribution to these global development targets.
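To make the filtering step and the two retrieval metrics concrete, the following is a minimal Python sketch. It is not the authors' code and not Annif's own implementation: the record layout, the field name `confidence`, the label strings, and the function names are all hypothetical, and the metric definitions are the common textbook forms with binary relevance, which may differ in detail from what Annif reports.

```python
import math

def filter_by_confidence(records, threshold=0.50):
    """Keep manually classified records whose confidence score meets the threshold.

    `records` is assumed to be a list of dicts with a hypothetical
    'confidence' key; the threshold of 0.50 follows the abstract.
    """
    return [r for r in records if r["confidence"] >= threshold]

def f1_at_k(ranked, relevant, k=5):
    """F1 score over the top-k suggested labels for one document."""
    top = ranked[:k]
    hits = sum(1 for label in top if label in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

def ndcg(ranked, relevant, k=None):
    """Normalised discounted cumulative gain with binary relevance."""
    top = ranked if k is None else ranked[:k]
    # Each correct label contributes 1 / log2(rank + 1), with rank starting at 1.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, label in enumerate(top) if label in relevant)
    # Ideal ordering places all relevant labels at the top of the list.
    ideal_hits = min(len(relevant), len(top))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical records and labels, for illustration only.
    records = [
        {"title": "A", "confidence": 0.80},
        {"title": "B", "confidence": 0.50},
        {"title": "C", "confidence": 0.30},
    ]
    print(len(filter_by_confidence(records)))

    suggested = ["SDG6", "SDG3", "SDG13", "SDG4", "SDG5"]
    gold = {"SDG6", "SDG4"}
    print(round(f1_at_k(suggested, gold), 3), round(ndcg(suggested, gold), 3))
```

In practice, the per-document scores would be averaged over a held-out test set to compare backends.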
License
Copyright (c) 2026 Journal of Information and Knowledge

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The copyright of all articles published in Journal of Information and Knowledge is held by the Publisher. The Sarada Ranganathan Endowment for Library Science (SRELS), as the publisher, requires its authors to transfer the copyright prior to publication. This permits SRELS to reproduce, publish, distribute, and archive the article in print and electronic form, and to defend against any improper use of the article.
Mondrita Mukhopadhyay




