Background

Online extremist narratives can polarize communities and amplify misinformation. Traditional detection methods often overlook the relational dynamics of online discourse. This study introduces a hybrid BERT-GNN model that combines linguistic features with network structures to identify extremist content and map influence within BLM Twitter conversations.

Objectives

  • 01 — Develop and validate a tweet classification pipeline.
    Manually labeled tweets (Extremist / Non-extremist) were used to fine-tune a BERT model, producing robust content labels across the dataset.
  • 02 — Construct and train a GNN for link prediction.
    Build a social graph of users and labeled tweets, and train a GNN to predict which user–tweet pairs are likely to form engagement links.
  • 03 — Evaluate model performance and robustness.
    Measure performance with standard classification metrics and run ablation studies comparing GNN configurations.
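Objective 02 frames engagement as link prediction: observed user–tweet interactions serve as positive edges, and unobserved pairs are sampled as negatives for training. A minimal sketch of this setup (the helper names and uniform sampling strategy are illustrative assumptions, not the project's exact code):

```python
import random

def positive_edges(engagements):
    """Observed user-tweet interactions become positive training edges."""
    users = sorted({u for u, t in engagements})
    tweets = sorted({t for u, t in engagements})
    return users, tweets, set(engagements)

def sample_negatives(users, tweets, pos, k, seed=0):
    """Sample k user-tweet pairs with no observed engagement (negative edges)."""
    rng = random.Random(seed)
    neg = set()
    while len(neg) < k:
        pair = (rng.choice(users), rng.choice(tweets))
        if pair not in pos:
            neg.add(pair)
    return sorted(neg)
```

In practice, negative sampling ratios and strategies (e.g., hard negatives) materially affect link-prediction metrics, which is one motivation for the ablation studies in objective 03.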

Methodology

  • Data Collection
    Collect tweets using RapidAPI endpoints or scraping tools such as Tweety.
  • Data Preprocessing
    Clean and preprocess the collected data to ensure quality and consistency.
  • BERT-GNN
    Use the fine-tuned BERT model to label tweets, then feed the resulting labels and embeddings into the GNN for training and link prediction.
  • Evaluation & Deployment
    Evaluate model performance and deploy the pipeline for real-time analysis.
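The GNN stage propagates information by aggregating each node's neighborhood features. As an illustration of the mean-aggregation rule used by GraphSAGE-style layers (a toy NumPy sketch with a dense adjacency matrix, not the project's actual implementation, which would use a GNN library):

```python
import numpy as np

def sage_mean_layer(x, adj, w_self, w_neigh):
    """One GraphSAGE-style layer: combine each node's own features with
    the mean of its neighbours' features, then apply a ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0           # avoid division by zero for isolated nodes
    neigh_mean = (adj @ x) / deg  # mean over neighbour feature vectors
    h = x @ w_self + neigh_mean @ w_neigh
    return np.maximum(h, 0.0)     # ReLU non-linearity
```

Here the BERT outputs would supply the initial node features `x` for tweet nodes, so the learned representations mix textual and structural signals before the link-prediction head scores user–tweet pairs.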

Results

Model       Accuracy  Precision  Recall  F1 Score  PR AUC
GCN         0.71      0.471      0.567   0.508     0.758
GraphSAGE   0.887     0.815      0.999   0.898     0.925

The GraphSAGE model achieved the highest overall performance, with an accuracy of 0.887, F1-score of 0.898, and PR-AUC of 0.925, indicating strong capability in correctly identifying both positive and negative links within the graph. Its recall of 0.999 suggests that nearly all true positive links were detected, though the lower precision (0.815) indicates a modest number of false positives. This behavior implies that GraphSAGE effectively generalizes the contextual and relational features learned from the BERT HateXplain embeddings, capturing fine-grained semantic cues and user–tweet relationships within the network.
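The reported F1-score is internally consistent with the precision and recall above, since F1 is their harmonic mean. A quick check:

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.815, 0.999)
print(round(f1, 3))  # → 0.898, matching the reported GraphSAGE F1
```

The harmonic mean penalizes imbalance between the two components, which is why the near-perfect recall does not fully offset the lower precision here.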