Investigating Scientific Misinformation Using Different Modes of Learning


Misinformation spread through social media poses a grave threat to public health because it undermines the best available scientific evidence. This spread was most visible during the COVID-19 pandemic. To track and curb misinformation, an essential first step is to detect it. One component of misinformation detection is finding examples of misinformation posts that can serve as training data for detection algorithms. In this paper, we focus on the challenge of collecting high-quality training data for misinformation detection applications. To that end, we demonstrate the effectiveness of a simple methodology and show its viability on five myths related to COVID-19. Our methodology combines dictionary-based sampling with predictions from weak learners to identify a reasonable number of myth examples for data labeling. To help researchers adapt this methodology to specific use cases, we use word usage entropy to describe when fewer iterations of sampling and training are needed to obtain high-quality samples. Finally, we present a case study that shows the prevalence of three of our myths on Twitter at the beginning of the pandemic.
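
As a rough illustration of two ideas mentioned in the abstract, the sketch below filters posts with a keyword dictionary and computes an entropy score over word usage. The abstract does not define word usage entropy, so this assumes plain Shannon entropy over whitespace-token frequencies; the function names `dictionary_sample` and `word_usage_entropy` and the example posts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def dictionary_sample(posts, keywords):
    """Keep posts that contain at least one dictionary keyword.

    Assumption: case-insensitive matching on whitespace tokens.
    """
    keys = {k.lower() for k in keywords}
    return [p for p in posts if keys & set(p.lower().split())]


def word_usage_entropy(posts):
    """Shannon entropy (in bits) of the token-frequency distribution.

    Assumption: this is one plausible reading of "word usage entropy",
    not necessarily the definition used in the paper.
    """
    counts = Counter(tok for p in posts for tok in p.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Made-up example posts, for illustration only.
    posts = [
        "Bleach cures covid",
        "Nice weather today",
        "Drinking bleach is dangerous",
    ]
    candidates = dictionary_sample(posts, ["bleach"])
    print(len(candidates), "candidate posts")
    print("entropy (bits):", word_usage_entropy(candidates))
```

Under this reading, a lower entropy means the sampled posts reuse a smaller vocabulary, which is the kind of signal the abstract associates with needing fewer rounds of sampling and training.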

In Proceedings of the International Conference on Data Science, Technology and Applications (DATA)
Kornraphop Kawintiranon
Ph.D. in CS

My research interests include AI/ML, NLP and Data Science.