UmdTask461_DATA605_Spring2026_Real_Time_Tweet_Sentiment_Analysis_Kafka#493

Open
aashishvinod wants to merge 5 commits into gpsaggese:master from aashishvinod:UmdTask461_DATA605_Spring2026_Real_Time_Tweet_Sentiment_Analysis_Kafka_2

@aashishvinod

Closes #461

This PR adds the Real-Time Tweet Sentiment Analysis project using Apache Kafka and HuggingFace Transformers.

Project: Real-Time Tweet Sentiment Analysis using Apache Kafka
Dataset: Sentiment140 (1.6M labeled tweets)
Model: cardiffnlp/twitter-roberta-base-sentiment (RoBERTa trained on 58M tweets)

What's included:

  • Kafka producer/consumer pipeline for streaming tweets
  • Pre-trained HuggingFace RoBERTa model for real-time sentiment classification
  • Live Streamlit dashboard with sentiment metrics, pie chart, and trend visualization
  • Comparative model analysis (RoBERTa vs DistilBERT)
  • Anomaly detection for sentiment spikes
  • Spark SQL analysis and aggregations
  • Working Docker setup with 3 containers (Zookeeper + Kafka + Jupyter)
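As a rough illustration of the producer/consumer hand-off listed above, the sketch below turns one Sentiment140 CSV row into a JSON payload and decodes it again. It assumes the dataset's usual six-column layout (polarity, id, date, query flag, user, text) and its polarity codes (0 = negative, 2 = neutral, 4 = positive). The function names, the payload fields, and the sample row are hypothetical and are not taken from this PR's code.

```python
import csv
import io
import json

# Assumed Sentiment140 column order (the published CSV has no header row).
COLUMNS = ["target", "id", "date", "flag", "user", "text"]

# Sentiment140 polarity codes: 0 = negative, 2 = neutral, 4 = positive.
POLARITY = {"0": "negative", "2": "neutral", "4": "positive"}


def row_to_message(row):
    """Serialize one CSV row into UTF-8 JSON bytes (hypothetical payload shape)."""
    record = dict(zip(COLUMNS, row))
    payload = {
        "id": record["id"],
        "user": record["user"],
        "text": record["text"],
        "label": POLARITY.get(record["target"], "unknown"),
    }
    return json.dumps(payload).encode("utf-8")


def message_to_tweet(raw):
    """Decode the bytes a consumer would receive back into a dict."""
    return json.loads(raw.decode("utf-8"))


# Illustrative row only; not real dataset content.
sample = '"0","1001","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","some_user","is upset about the delay"\n'
rows = list(csv.reader(io.StringIO(sample)))
message = row_to_message(rows[0])
tweet = message_to_tweet(message)
```

In a real pipeline the bytes from `row_to_message` would be handed to a Kafka client as the message value, for example through kafka-python's `KafkaProducer.send`. Whether this project uses that client or this payload shape is not stated in the PR.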

Author: @aashishvinod
Reviewers: @gpsaggese @protocorn

@aashishvinod
Author

Hi @protocorn,

I have updated my project to align with the official Apache Kafka project description as requested. This new PR replaces the stock market pipeline with a real-time tweet sentiment analysis application using Apache Kafka and HuggingFace Transformers.

The project includes:

  • Kafka producer/consumer pipeline streaming tweets from the Sentiment140 dataset
  • Pre-trained RoBERTa model (cardiffnlp/twitter-roberta-base-sentiment) for real-time classification
  • Live Streamlit dashboard with sentiment metrics, pie chart, and trend visualization
  • Comparative model analysis (RoBERTa vs DistilBERT)
  • Anomaly detection for sudden sentiment spikes
  • Spark SQL analysis and aggregations
  • Working Docker setup with 3 containers
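The PR does not say how sentiment spikes are detected. One common approach, sketched here purely as an assumption, is a trailing-window z-score over the share of negative tweets per time bucket. The function name, window size, threshold, and sample series are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(negative_share, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    history = deque(maxlen=window)
    spikes = []
    for i, value in enumerate(negative_share):
        # Only score a point once a full trailing window is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                spikes.append(i)
        history.append(value)
    return spikes


# Illustrative series: share of negative tweets per time bucket.
series = [0.30, 0.32, 0.29, 0.31, 0.30, 0.31, 0.80, 0.30]
spikes = detect_spikes(series)
```

With this series, only the jump to 0.80 at index 6 is flagged. The bucket after it is scored against a window that already contains the spike, so its inflated standard deviation keeps the return to 0.30 from being flagged too.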

Could you please review this PR and let me know if everything looks correct or if there is anything I should fix or improve? I want to make sure the submission meets all the requirements before the final deadline.

Thank you so much for your patience and guidance throughout this process. I really appreciate it!

Development

Successfully merging this pull request may close these issues.

DATA605_Spring2026_Real_Time_Stock_Market_Pipeline_Kafka_Spark