Lionel Zhepeng Li
Prof. Lionel Zhepeng LI
Innovation and Information Management
External Data Scientist, Bank of Ningbo (SZSE: 002142)
Associate Professor

3910 2404

KK 822

Publications
Mitigating Bias in Hate Speech Detection With a Small Number of Expert Annotations: A Prompt-Based Learning Approach

Hate speech is a major problem on social media platforms. Automatic hate speech detection methods relying on machine learning models, which learn from manually labeled datasets, have been proposed in both academia and industry. However, there is increasing evidence that hate speech detection datasets labeled by general annotators (e.g., amateurs or MTurk workers) contain systematic bias, because such annotators cannot effectively account for differences in language use among speakers. When such biased datasets are used to train machine learning models, the resulting models will also be biased. Unlike general annotators, experts can produce much less biased annotations. However, expert annotations cannot be efficiently obtained in large quantities. This paper bridges the gap by adopting a weakly supervised learning method for hate speech detection using a small number of expert annotations. We propose a novel design that uses contrastive learning and prompt-based learning based on large language models, incorporating a group estimator, a pair generator, and knowledge injection. Using real-world Twitter posts written by African American English speakers and other racial groups as an example, extensive experiments were conducted to demonstrate the superior performance of the proposed method. The proposed approach was also evaluated on data from the LGBTQ+ community and achieved consistent results. The study has important academic and practical implications for hate speech detection and large language models.

How to Predict Popularity

If the many functions of digital social media networks could be summed up in one word, it would likely be “sharing”. Through a myriad of apps and platforms, we share our thoughts, feelings, opinions, ideas, and more – with our friends and family, with our online social circles, with strangers, and even with companies.

Thriving at the Forefront of Information Technologies – Dr. Zhepeng LI

Aspiring to make a difference rather than a fortune, Dr. Zhepeng Li is dedicated to propelling the development of information systems and machine learning technologies.

What Will Be Popular Next? Predicting Hotspots in Two-Mode Social Networks

In social networks, social foci are physical or virtual entities around which social individuals organize joint activities, for example, places and products (physical form) or opinions and services (virtual form). Forecasting which social foci will diffuse to more social individuals is important for managerial functions such as marketing and public management operations. In terms of diffusive social adoptions, prior studies on user adoptive behavior in social networks have focused on single-item adoption in homogeneous networks. We advance this body of research by modeling scenarios with multi-item adoption and learning the relative propagation of social foci in concurrent social diffusions for online social networking platforms. In particular, we distinguish two types of social nodes in our two-mode social network model: social foci and social actors. Based on social network theories, we identify and operationalize factors that drive social adoption within the two-mode social network. We also capture the interdependencies between social actors and social foci using a bilateral recursive process—specifically, a mutual reinforcement process that converges to an analytical form. Thus, we develop a gradient learning method based on a mutual reinforcement process that targets the optimal parameter configuration for pairwise ranking of social diffusions. Further, we demonstrate analytical properties of the proposed method such as guaranteed convergence and the convergence rate. In the evaluation, we benchmark the proposed method against prevalent methods, and we demonstrate its superior performance using three real-world data sets that cover the adoption of both physical and virtual entities in online social networking platforms.
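
To make the idea of a mutual reinforcement process between social actors and social foci more concrete, the following is a minimal sketch of a generic HITS-style iteration on a two-mode (bipartite) adoption network. It is not the method from the paper: the function name, the plain adoption-count weighting, and the toy data are all assumptions made for illustration, and the paper's operationalized factors and gradient-based pairwise ranking step are not reproduced here.

```python
def mutual_reinforcement(adoptions, n_actors, n_foci, iters=100, tol=1e-10):
    """Hypothetical HITS-style scoring on a two-mode network.

    adoptions: list of (actor_index, focus_index) pairs, one per adoption.
    Returns (actor_scores, focus_scores), each L2-normalized.
    """
    actor = [1.0] * n_actors
    focus = [1.0] * n_foci

    def normalize(vec):
        norm = sum(x * x for x in vec) ** 0.5
        return [x / norm for x in vec] if norm > 0 else vec

    for _ in range(iters):
        # A focus gains score from the actors that adopted it.
        new_focus = [0.0] * n_foci
        for a, f in adoptions:
            new_focus[f] += actor[a]
        new_focus = normalize(new_focus)

        # An actor gains score from the foci it adopted.
        new_actor = [0.0] * n_actors
        for a, f in adoptions:
            new_actor[a] += new_focus[f]
        new_actor = normalize(new_actor)

        # Stop once the focus scores no longer change noticeably.
        delta = max(abs(x - y) for x, y in zip(new_focus, focus))
        actor, focus = new_actor, new_focus
        if delta < tol:
            break

    return actor, focus


# Toy data (assumed): three actors, two foci.
# Focus 0 is adopted by every actor; focus 1 only by actor 2.
toy_adoptions = [(0, 0), (1, 0), (2, 0), (2, 1)]
actor_scores, focus_scores = mutual_reinforcement(toy_adoptions, 3, 2)

# Rank foci from highest to lowest score.
ranking = sorted(range(len(focus_scores)), key=lambda j: -focus_scores[j])
print(ranking)
```

In this toy setting the more widely adopted focus ranks first, and the actor who adopted both foci receives the highest actor score. A learned variant would replace the fixed unit weights with parameters fitted to observed pairwise orderings of diffusions.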