Detecting and Understanding of Information Pollution on Social Media

Abstract

Social media and the web have become primary sources of information and news. Given the speed and reach of information on social media, the effects of poor-quality information, especially health-related information, can be consequential. In recent years, researchers have worked on detecting different types of poor-quality information, e.g., fake news and misinformation, and on identifying the level of support for it. Knowledge of both can be used to mitigate the negative effects of this information pollution. Research on information pollution within computer science is growing rapidly; however, the state of the art still has a number of limitations, including the inability to accurately identify information pollution in noisy domains like social media, where high-quality labels are limited and information spreads rapidly.

In this dissertation, we aim to address some of these challenges, focusing on two types of information pollution: spam and misinformation. We show on different Twitter data sets that context-specific spam exists and is identifiable using only content-based features, and we present a comparative study of different detection algorithms in a low-resource setting. Next, we review the literature on misinformation detection and discuss the need for building bridges between misinformation detection research in computer science and research in other disciplines. To detect misinformation on Twitter in a resource-constrained environment, we develop a novel reinforcement learning framework for weak supervision and show that our model outperforms baseline models. To improve detection of multiple myths simultaneously and exploit information learned for different myth themes, we propose a novel cooperative learning method for multi-agent reinforcement learning, thereby improving the training process in our reinforcement learning framework. To understand whether there is support for the misinformation, we study stance detection. We propose a stance detection algorithm that uses the log-odds-ratio algorithm to identify distinguishable stance words, then build a novel attention mechanism that focuses on these words. We show that our approach outperforms state-of-the-art models on Twitter data sets about the 2020 US Presidential election. Next, we develop and release a pre-trained language model trained on a large amount of social media data about the US election in order to support those studying political (mis)information.
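
To make the idea of identifying distinguishable stance words concrete, the following is a minimal sketch of a smoothed log-odds ratio computed over two small groups of texts. It is an illustration only, not the dissertation's implementation: the function name `log_odds_ratio`, the smoothing parameter `alpha`, the whitespace tokenization, and the toy sentences are all assumptions, and the exact variant of the log-odds ratio used in the dissertation may differ.

```python
import math
from collections import Counter


def log_odds_ratio(docs_a, docs_b, alpha=0.5):
    """Return a smoothed log-odds ratio per word between two groups of texts.

    Positive scores mark words more typical of docs_a, negative scores
    mark words more typical of docs_b. Hypothetical helper for illustration.
    """
    # Naive lowercase whitespace tokenization (an assumption for this sketch).
    counts_a = Counter(w for d in docs_a for w in d.lower().split())
    counts_b = Counter(w for d in docs_b for w in d.lower().split())
    vocab = set(counts_a) | set(counts_b)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words to compute odds")

    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    v = len(vocab)

    scores = {}
    for w in vocab:
        # Additive smoothing keeps unseen words from producing log(0).
        a = counts_a[w] + alpha
        b = counts_b[w] + alpha
        odds_a = a / (total_a + alpha * v - a)
        odds_b = b / (total_b + alpha * v - b)
        scores[w] = math.log(odds_a) - math.log(odds_b)
    return scores


# Toy example with made-up sentences (not from any real data set).
pro = ["vote for the candidate", "great candidate great plan"]
anti = ["never vote for the candidate", "terrible plan terrible candidate"]
scores = log_odds_ratio(pro, anti)

# Words with the largest absolute scores are the most stance-specific.
top_words = sorted(scores, key=lambda w: abs(scores[w]), reverse=True)[:3]
print(top_words)
```

In a pipeline of the kind the abstract describes, words with large absolute scores would be the candidates for an attention mechanism to focus on. How those scores feed into the attention weights is not specified here and is left out of the sketch.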

Finally, we publish the models, data sets, and code, enabling future research on spam, misinformation, and stance detection. All of these contributions reduce the existing knowledge gaps, bringing us closer to a world free of information pollution.

Publication
In ProQuest Dissertations and Theses
Kornraphop Kawintiranon
