Mapping Indian Social Science Research to SDGs: An Automated Machine Learning Framework

Authors

DOI:

https://doi.org/10.17821/srels/2026/v63i1/172002

Keywords:

Annif, Categorisation of Publications, Machine Learning, Neural Network, Retrieval metrics, SDG, Social Science Research, India

Abstract

This study reports progress on a minor research project funded by the Indian Council of Social Science Research (ICSSR) that categorises Indian social science research publications against the United Nations’ 17 Sustainable Development Goals (SDGs) using Machine Learning (ML). To build the system, a dataset of 200,000 bibliographic records covering 2016 to 2024 was first assembled from major databases, both open access (OpenAlex, Lens, Dimensions) and commercial (Scopus, Web of Science), with filters applied to ensure that at least one author of each record is affiliated with India. A team of research scholars in Library and Information Science (LIS) manually classified these records on the basis of titles and abstracts and assigned a certainty/confidence score to each classification to support data reliability. Subsequently, the 100,000 records with a confidence score ≥0.50 were used to train selected ML backends (mainly associative models) available in the open-source Annif framework. Performance evaluation showed that a Neural Network backend, tuned with Annif’s hyperparameter optimisation utility, outperformed the other ML backends on both F1@5 and NDCG. The study then applied this system to map recent research trends in social science publications from India for January to June 2025, finding that Clean Water and Sanitation (Goal 6) was the most frequently addressed theme. This theme was closely connected to ‘People’-centric goals such as Good Health, Quality Education, and Gender Equality. The results suggest that Indian social science scholarship prioritises immediate human welfare and essential infrastructure over environmental goals such as Climate Action. These findings demonstrate the feasibility of this automated prototype for assessing the nation’s contribution to the global development targets.
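The two retrieval metrics named in the abstract can be illustrated with a minimal sketch for a single record. It assumes binary relevance (a suggested goal is either in the manually assigned set or not) and uses hypothetical labels and helper names (`f1_at_k`, `ndcg`); it is not the evaluation code of Annif or of the study, whose averaging over records may differ.

```python
import math

def f1_at_k(ranked, gold, k=5):
    """F1 over the top-k suggested labels for one record."""
    top = ranked[:k]
    hits = sum(1 for label in top if label in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

def ndcg(ranked, gold):
    """Binary-relevance NDCG over the ranked list for one record."""
    # Discounted gain: a hit at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, label in enumerate(ranked) if label in gold)
    # Ideal ordering places every gold label at the top of the list.
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(gold), len(ranked))))
    return dcg / ideal if ideal else 0.0

# Hypothetical record: two manually assigned goals, five ranked suggestions.
gold = {"SDG6", "SDG3"}
ranked = ["SDG6", "SDG4", "SDG3", "SDG5", "SDG13"]
print(f1_at_k(ranked, gold), ndcg(ranked, gold))
```

In this sketch F1@5 rewards getting the assigned goals anywhere in the top five, while NDCG additionally rewards ranking them near the top, which is why the two metrics are usually reported together.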


Published

2026-03-23

How to Cite

Mukhopadhyay, M., & Mukhopadhyay, P. (2026). Mapping Indian Social Science Research to SDGs: An Automated Machine Learning Framework. Journal of Information and Knowledge, 63(1), 09–18. https://doi.org/10.17821/srels/2026/v63i1/172002

