Developing a Hybrid Tool for Value Extraction from Social Media-Sourced Textual Corpora

Social media contain a wealth of user-generated data concerning values and norms. Conventional processes for identifying and extracting values require significant technical expertise, which limits their use in education and interdisciplinary research. This project will develop a hybrid natural language processing (NLP) tool for extracting values from social media-sourced textual corpora. The tool combines the training of domain-specific, embedding-based models with user-friendly prompt engineering for large language models (LLMs). We aim both to achieve validated extraction of heritage values in urban contexts and to apply the tool at scale to new domains such as health and nutrition. The datasets and the tool will be used in research and education activities and released as open source.