ThreadFlow

Team Omni
Prajith Ravisankar, Srijan Ravisankar


What is ThreadFlow?

ThreadFlow is a visual pipeline builder for analyzing Reddit discussions. Users drag and drop nodes onto a canvas to build data processing workflows—similar to how n8n or Node-RED work, but focused on social media analysis.

The tool connects to a local database of Reddit comments and uses Google's Gemini AI to perform analysis tasks like sentiment detection, bot identification, and content summarization.


The Problem

Manually analyzing large volumes of social media comments is time-consuming. A single Reddit thread can have thousands of comments, and popular subreddits accumulate millions over time.

Researchers, journalists, or anyone trying to understand public sentiment on a topic faces two options:

  1. Read comments one by one (not practical at scale)
  2. Write custom scripts for each analysis task (requires programming knowledge)

ThreadFlow provides a middle ground: a visual interface where non-programmers can build analysis workflows by connecting nodes.


What ThreadFlow Does

Data Loading

  • Loads Reddit comments from a local DuckDB database
  • Supports search queries to find relevant discussions
  • Filters by comment score or keywords
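A filter like the ones above can be expressed as a parameterized SQL query against DuckDB. The table and column names below (comments, body, score) are assumptions for illustration, not ThreadFlow's actual schema; a minimal sketch:

```python
def build_comment_query(keyword=None, min_score=None):
    """Assemble a parameterized SQL query over an assumed comments table.

    Hypothetical schema: comments(body TEXT, score INTEGER).
    """
    sql = "SELECT body, score FROM comments"
    clauses, params = [], []
    if keyword:
        clauses.append("body ILIKE ?")   # case-insensitive keyword match
        params.append(f"%{keyword}%")
    if min_score is not None:
        clauses.append("score >= ?")
        params.append(min_score)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# With the duckdb package installed, the query would run as:
#   import duckdb
#   con = duckdb.connect("reddit.duckdb")
#   rows = con.execute(*build_comment_query("carbon tax", min_score=10)).fetchall()
```

Parameterized queries keep user-entered keywords out of the SQL string itself, which matters once search input comes from a UI.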

AI Analysis (via Gemini API)

  • Sentiment Analysis: Classifies comments as positive, negative, or neutral
  • Bot Detection: Flags potentially automated or suspicious accounts
  • Evidence Extraction: Identifies factual claims in discussions
  • Summarization: Generates summaries of comment threads
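One common way to drive a classification task like sentiment analysis through an LLM is to prompt with a fixed label set and normalize the model's free-form reply. The prompt wording and helpers below are illustrative assumptions, not ThreadFlow's actual prompts:

```python
LABELS = ("positive", "negative", "neutral")

def sentiment_prompt(comment: str) -> str:
    # Hypothetical prompt text; the project's real prompts are not documented here.
    return (
        "Classify the sentiment of this Reddit comment as exactly one of "
        f"{', '.join(LABELS)}. Reply with the label only.\n\n"
        f"Comment: {comment}"
    )

def parse_sentiment(reply: str) -> str:
    """Normalize a free-form model reply to one of the known labels."""
    text = reply.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "neutral"  # fall back when the model answers off-script

# With the Gemini client and an API key configured, the call might look like:
#   reply = model.generate_content(sentiment_prompt(comment)).text
#   label = parse_sentiment(reply)
```

The fallback in parse_sentiment matters in practice: models sometimes add punctuation or explanation around the label, so the parser should never crash on unexpected output.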

Visualization

  • Data tables showing filtered results
  • 3D visualizations: Canada map (geographic distribution), political party breakdown, bar charts, pie charts
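Charts like the pie breakdown are driven by a simple aggregation over the analyzed comments before anything reaches the frontend. A stdlib sketch of that counting step (the "sentiment" field name is an assumption):

```python
from collections import Counter

def sentiment_breakdown(comments):
    """Count sentiment labels for a pie chart; the 'sentiment' key is assumed."""
    counts = Counter(c["sentiment"] for c in comments)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    # Return (label, share) pairs, largest slice first
    return [(label, n / total) for label, n in counts.most_common()]
```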

Pipeline Building

  • Drag-and-drop node interface
  • Connect nodes to create processing pipelines
  • Run pipelines with a single click
  • View results directly in the nodes
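Running a node pipeline amounts to executing nodes in dependency order and feeding each node's output to its successors. A minimal scheduler sketch, independent of ThreadFlow's real implementation (node callables and the single-upstream assumption are simplifications):

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_pipeline(nodes, edges, inputs=None):
    """Execute a DAG of nodes.

    nodes: {node_id: callable taking its upstream result (or the initial input)}
    edges: list of (src, dst) pairs meaning src feeds dst
    """
    deps = {nid: set() for nid in nodes}
    for src, dst in edges:
        deps[dst].add(src)
    results = {}
    for nid in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in deps[nid]]
        arg = upstream[0] if upstream else inputs  # source nodes get the raw input
        results[nid] = nodes[nid](arg)
    return results
```

For example, a load → filter → summarize chain is just three entries in nodes and two edges; TopologicalSorter guarantees each node runs only after everything it depends on.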

Technical Implementation

Frontend

  • Next.js 15 with React
  • React Flow for the node-based canvas
  • Three.js for 3D visualizations
  • TailwindCSS for styling

Backend

  • FastAPI (Python)
  • DuckDB for querying ~1.3GB of Reddit data locally
  • Google Gemini API for AI analysis

Data

  • r/Canada subreddit comments and threads (CSV format, ingested into DuckDB)

What ThreadFlow Does NOT Do

  • Does not scrape Reddit in real-time (uses a static dataset)
  • Does not deploy to the cloud (runs entirely on localhost)
  • Does not store user data or require authentication
  • Does not perform vector search or semantic similarity (uses keyword-based full-text search)
  • Does not guarantee AI accuracy (Gemini results depend on prompt quality and model limitations)

How to Run

  1. Set up the backend: install the Python dependencies, add your Gemini API key to .env, then run python ingest.py to build the DuckDB database
  2. Start the backend: uvicorn main:app --port 8000
  3. Start the frontend: npm install && npm run dev
  4. Open http://localhost:3000

Full setup instructions are in readme.md.


Limitations

  • Dataset is limited to r/Canada subreddit
  • Gemini API has rate limits (15 requests/minute on the free tier)
  • Large datasets may slow down the browser
  • 3D visualizations require WebGL support
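The 15 requests/minute cap can be respected client-side with a sliding-window throttle. A small sketch of the delay calculation (the limit comes from the free-tier figure above; the helper itself is hypothetical, not part of ThreadFlow):

```python
def seconds_to_wait(sent_at, now, limit=15, window=60.0):
    """How long to wait before sending the next API request.

    sent_at: timestamps (in seconds) of previously sent requests
    Returns 0.0 when fewer than `limit` requests fall inside the window.
    """
    recent = sorted(t for t in sent_at if now - t < window)
    if len(recent) < limit:
        return 0.0
    # Wait until the oldest in-window request ages out of the window
    return recent[0] + window - now
```

The caller would time.sleep() for the returned duration before each Gemini call, which keeps batch pipelines under the quota without dropping requests.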

Repository Structure

backend/     → FastAPI server, DuckDB queries, Gemini integration
frontend/    → Next.js app, React Flow canvas, 3D components
archive/     → Source CSV files for Reddit data

AI Collective Hackathon 2026