ThreadFlow

Team Omni
Prajith Ravisankar, Srijan Ravisankar


What is ThreadFlow?

ThreadFlow is a visual pipeline builder for analyzing Reddit discussions. Users drag and drop nodes onto a canvas to build data processing workflows—similar to how n8n or Node-RED work, but focused on social media analysis.

The tool connects to a local database of Reddit comments and uses Google's Gemini AI to perform analysis tasks like sentiment detection, bot identification, and content summarization.


The Problem

Manually analyzing large volumes of social media comments is time-consuming. A single Reddit thread can have thousands of comments, and popular subreddits accumulate millions over time.

Researchers, journalists, or anyone trying to understand public sentiment on a topic faces two options:

  1. Read comments one by one (not practical at scale)
  2. Write custom scripts for each analysis task (requires programming knowledge)

ThreadFlow provides a middle ground: a visual interface where non-programmers can build analysis workflows by connecting nodes.


What ThreadFlow Does

Data Loading

  • Loads Reddit comments from a local DuckDB database
  • Supports search queries to find relevant discussions
  • Filters by comment score or keywords
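A filter like the ones above can be expressed as a parameterized SQL query against DuckDB. The table and column names below (comments, body, score) are assumptions for illustration, not ThreadFlow's actual schema; a minimal sketch:

```python
def build_comment_query(keyword=None, min_score=None):
    """Assemble a parameterized SQL query over an assumed comments table.

    Hypothetical schema: comments(body TEXT, score INTEGER).
    """
    sql = "SELECT body, score FROM comments"
    clauses, params = [], []
    if keyword:
        clauses.append("body ILIKE ?")   # case-insensitive keyword match
        params.append(f"%{keyword}%")
    if min_score is not None:
        clauses.append("score >= ?")
        params.append(min_score)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# With the duckdb package installed, the query would run as:
#   import duckdb
#   con = duckdb.connect("reddit.duckdb")
#   rows = con.execute(*build_comment_query("carbon tax", min_score=10)).fetchall()
```

Parameterized queries keep user-entered keywords out of the SQL string itself, which matters once search input comes from a UI.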

AI Analysis (via Gemini API)

  • Sentiment Analysis: Classifies comments as positive, negative, or neutral
  • Bot Detection: Flags potentially automated or suspicious accounts
  • Evidence Extraction: Identifies factual claims in discussions
  • Summarization: Generates summaries of comment threads
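One common way to drive a classification task like sentiment analysis through an LLM is to prompt with a fixed label set and normalize the model's free-form reply. The prompt wording and helpers below are illustrative assumptions, not ThreadFlow's actual prompts:

```python
LABELS = ("positive", "negative", "neutral")

def sentiment_prompt(comment: str) -> str:
    # Hypothetical prompt text; the project's real prompts are not documented here.
    return (
        "Classify the sentiment of this Reddit comment as exactly one of "
        f"{', '.join(LABELS)}. Reply with the label only.\n\n"
        f"Comment: {comment}"
    )

def parse_sentiment(reply: str) -> str:
    """Normalize a free-form model reply to one of the known labels."""
    text = reply.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "neutral"  # fall back when the model answers off-script

# With the Gemini client and an API key configured, the call might look like:
#   reply = model.generate_content(sentiment_prompt(comment)).text
#   label = parse_sentiment(reply)
```

The fallback in parse_sentiment matters in practice: models sometimes add punctuation or explanation around the label, so the parser should never crash on unexpected output.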

Visualization

  • Data tables showing filtered results
  • 3D visualizations: Canada map (geographic distribution), political party breakdown, bar charts, pie charts
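Charts like the pie breakdown are driven by a simple aggregation over the analyzed comments before anything reaches the frontend. A stdlib sketch of that counting step (the "sentiment" field name is an assumption):

```python
from collections import Counter

def sentiment_breakdown(comments):
    """Count sentiment labels for a pie chart; the 'sentiment' key is assumed."""
    counts = Counter(c["sentiment"] for c in comments)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    # Return (label, share) pairs, largest slice first
    return [(label, n / total) for label, n in counts.most_common()]
```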

Pipeline Building

  • Drag-and-drop node interface
  • Connect nodes to create processing pipelines
  • Run pipelines with a single click
  • View results directly in the nodes
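Running a node pipeline amounts to executing nodes in dependency order and feeding each node's output to its successors. A minimal scheduler sketch, independent of ThreadFlow's real implementation (node callables and the single-upstream assumption are simplifications):

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_pipeline(nodes, edges, inputs=None):
    """Execute a DAG of nodes.

    nodes: {node_id: callable taking its upstream result (or the initial input)}
    edges: list of (src, dst) pairs meaning src feeds dst
    """
    deps = {nid: set() for nid in nodes}
    for src, dst in edges:
        deps[dst].add(src)
    results = {}
    for nid in TopologicalSorter(deps).static_order():
        upstream = [results[d] for d in deps[nid]]
        arg = upstream[0] if upstream else inputs  # source nodes get the raw input
        results[nid] = nodes[nid](arg)
    return results
```

For example, a load → filter → summarize chain is just three entries in nodes and two edges; TopologicalSorter guarantees each node runs only after everything it depends on.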

Technical Implementation

Frontend

  • Next.js 15 with React
  • React Flow for the node-based canvas
  • Three.js for 3D visualizations
  • TailwindCSS for styling

Backend

  • FastAPI (Python)
  • DuckDB for querying ~1.3GB of Reddit data locally
  • Google Gemini API for AI analysis

Data

  • r/Canada subreddit comments and threads (CSV format, ingested into DuckDB)

What ThreadFlow Does NOT Do

  • Does not scrape Reddit in real-time (uses a static dataset)
  • Does not deploy to the cloud (runs entirely on localhost)
  • Does not store user data or require authentication
  • Does not perform vector search or semantic similarity (uses keyword-based full-text search)
  • Does not guarantee AI accuracy (Gemini results depend on prompt quality and model limitations)

How to Run

  1. Set up the backend: install the Python dependencies, add your Gemini API key to .env, then run python ingest.py to build the DuckDB database
  2. Start the backend: uvicorn main:app --port 8000
  3. Start the frontend: npm install && npm run dev
  4. Open http://localhost:3000

Full setup instructions are in readme.md.


Limitations

  • Dataset is limited to r/Canada subreddit
  • Gemini API has rate limits (15 requests/minute on the free tier)
  • Large datasets may slow down the browser
  • 3D visualizations require WebGL support
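The 15 requests/minute cap can be respected client-side with a sliding-window throttle. A small sketch of the delay calculation (the limit comes from the free-tier figure above; the helper itself is hypothetical, not part of ThreadFlow):

```python
def seconds_to_wait(sent_at, now, limit=15, window=60.0):
    """How long to wait before sending the next API request.

    sent_at: timestamps (in seconds) of previously sent requests
    Returns 0.0 when fewer than `limit` requests fall inside the window.
    """
    recent = sorted(t for t in sent_at if now - t < window)
    if len(recent) < limit:
        return 0.0
    # Wait until the oldest in-window request ages out of the window
    return recent[0] + window - now
```

The caller would time.sleep() for the returned duration before each Gemini call, which keeps batch pipelines under the quota without dropping requests.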

Repository Structure

backend/     → FastAPI server, DuckDB queries, Gemini integration
frontend/    → Next.js app, React Flow canvas, 3D components
archive/     → Source CSV files for Reddit data

AI Collective Hackathon 2026