Stanford CS230 Deep Learning Final Project
Course Website: https://cs230.stanford.edu
The rapid spread of misinformation and fake news on online platforms poses a significant threat to public trust, social cohesion, and informed decision-making. As multimodal posts combining text and images become increasingly common, they often leverage striking visuals and emotionally charged headlines to boost engagement, making them more persuasive and challenging to detect with text-only models.
Traditional fake news detection systems focus primarily on textual analysis, overlooking the visual cues and cross-modal inconsistencies often present in misinformation. Recent studies demonstrate that combining text and image signals substantially improves detection accuracy.
This project aims to design and evaluate a neural network model that can determine whether a given multimodal post (image + text) represents fake news. Each input consists of an image paired with an associated text caption or title. We leverage the BLIP-2 Vision-Language Model (VLM) to explicitly model semantic consistency between text and image, enabling deep cross-modal understanding rather than treating the two modalities separately.
Original Data Repository: https://github.com/entitize/Fakeddit
- Train Images: Google Drive Link (mapped via the 'id' column in the corresponding CSV file)
- Dev Images: Google Drive Link
- Baseline Model Checkpoints: Google Drive Link (too large to upload to GitHub)
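The images are tied to CSV rows through the `id` column. A minimal sketch of that lookup follows, assuming each image is stored as `<id>.jpg` directly inside the image folder. The file extension, the flat folder layout, and the helper name `image_paths_from_csv` are illustrative assumptions, not taken from the repository.

```python
import csv
from pathlib import Path


def image_paths_from_csv(csv_path, image_dir, delimiter=",", ext=".jpg"):
    """Return {id: Path} for every CSV row whose image file exists on disk.

    Assumes the CSV has an 'id' column and that images are named '<id><ext>'.
    """
    image_dir = Path(image_dir)
    found = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            candidate = image_dir / f"{row['id']}{ext}"
            if candidate.exists():
                # Rows without a downloaded image are skipped silently.
                found[row["id"]] = candidate
    return found
```

Pass `delimiter="\t"` if the files are tab-separated. Rows whose image is missing are dropped, so comparing the size of the result with the number of CSV rows shows how many images were not found.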
- [2-way] 0: True | 1: False
- [3-way] 0: True | 1: Fake with true text | 2: Fake with false text
- [6-way] 0: True | 1: Satire/Parody | 2: Misleading Content | 3: Imposter Content | 4: False Connection | 5: Manipulated Content
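For reference when decoding predictions, the three label schemes above can be written as a small lookup table. This is only a sketch: the names `LABELS` and `label_name` are hypothetical and do not come from the repository.

```python
# Class names per task, indexed by the integer label listed above.
LABELS = {
    2: ["True", "False"],
    3: ["True", "Fake with true text", "Fake with false text"],
    6: [
        "True",
        "Satire/Parody",
        "Misleading Content",
        "Imposter Content",
        "False Connection",
        "Manipulated Content",
    ],
}


def label_name(n_way, index):
    """Map an integer label to its class name for the 2-, 3-, or 6-way task."""
    return LABELS[n_way][index]
```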
Contains the preprocessed dataset used for training and evaluation:
- Training and validation CSV files with image IDs and labels
- Split datasets for 2-way, 3-way, and 6-way classification tasks
- train_images/ and dev_images/ folders containing the actual image files
- Grid search subset for hyperparameter tuning
Scripts and notebooks for data preprocessing and preparation:
- data_processing.ipynb: Jupyter notebook for data exploration and the preprocessing pipeline
- dataProcess_util.py: Utility functions for data loading, cleaning, and transformation
Baseline model implementation using traditional multimodal approaches:
- baseline_model.py: Core baseline model architecture
- finetune_model.py: Fine-tuning scripts for the baseline model
- evaluate_model.py: Evaluation scripts with comprehensive metrics
- checkpoints/: Saved model checkpoints from training
- evaluation_results/: Model performance results and metrics
- grid_search_results/: Hyperparameter tuning results
- logs/: Training logs and TensorBoard files
- Detailed documentation in README_BASELINE.md
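The evaluation scripts report metrics and confusion matrices. The sketch below shows one way such numbers can be computed from integer labels in plain Python. It is not the code in evaluate_model.py: the function names are hypothetical, and the choice of accuracy and per-class F1 is an assumption about which metrics are reported.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_f1(matrix):
    """F1 = 2*TP / (2*TP + FP + FN) for each class; 0.0 if a class never occurs."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # column total minus TP
        fn = sum(matrix[c]) - tp                       # row total minus TP
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores
```

Averaging the values returned by `per_class_f1` gives macro-F1, which weights the rarer classes of the 6-way task as heavily as the common ones.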
BLIP-2 Vision-Language Model implementation:
- model.py: BLIP-2 based multimodal architecture
- blip2_extractor.py: Feature extraction using BLIP-2 embeddings
- train.py: Training pipeline for the BLIP-2 model
- evaluate.py: Comprehensive evaluation with confusion matrices and metrics
- grid_search.py: Automated hyperparameter search
- config.py: Centralized configuration management
- checkpoints/: Saved BLIP-2 model checkpoints
- logs/: Training logs and experiment tracking
- Batch scripts for running experiments (train_all_full_data.bat, run_grid_search_all.bat)
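As an illustration of what an automated hyperparameter search involves, here is a minimal exhaustive grid search. It is a sketch under stated assumptions, not the contents of grid_search.py: the function name, the `evaluate` callback, and the hyperparameter names in the usage note are all hypothetical, and the real script may use a different search strategy.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return (best_params, best_score).

    param_grid: dict mapping a hyperparameter name to a list of candidate values.
    evaluate:   callable taking a params dict and returning a score to maximize
                (for example, validation accuracy after a short training run).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A call such as `grid_search({"lr": [1e-3, 1e-4], "dropout": [0.1, 0.3]}, evaluate)` runs `evaluate` four times, once per combination. Running it on a small subset of the data, like the grid search subset in the dataset folder, keeps that cost manageable.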
- analysis.ipynb: Data analysis and result visualization notebook
- confusion_matrix_*_comparison.png: Confusion matrices comparing model performance
- model_comparison_3metrics.png: Comparative visualization of model metrics
- training_validation_accuracy.png: Training curves and validation performance