A Retrieval-Augmented Generation (RAG)-based AI teaching assistant that helps students find specific content within video lectures using semantic search and natural language queries.
- Automatic video subtitle extraction and processing
- Semantic search using BGE-M3 embeddings
- Context-aware answers with precise video timestamps
- Integration with both local LLMs (Ollama) and OpenAI GPT models
- Efficient vector similarity search using cosine similarity
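The cosine similarity at the heart of the retrieval step is compact enough to show inline. A minimal NumPy sketch (the function name is illustrative, not from this repo):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors:
    1.0 = same direction (very similar), 0.0 = orthogonal (unrelated)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```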
- Python: Core programming language
- Embeddings: BGE-M3 (via Ollama; see the sketch after this list)
- Vector Search: Cosine Similarity, NumPy
- LLM: Llama 3.2 (Ollama) or GPT-4 (OpenAI, highly recommended)
- Libraries: Pandas, Scikit-learn, Joblib, Requests
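Embedding a chunk of text through Ollama is one HTTP call. A minimal sketch, assuming a local Ollama server on its default port (11434) with the bge-m3 model already pulled; the helper name is illustrative:

```python
import requests

def embed_text(text: str) -> list[float]:
    """Embed one piece of text with BGE-M3 via a local Ollama server."""
    resp = requests.post(
        "http://localhost:11434/api/embed",
        json={"model": "bge-m3", "input": text},
    )
    resp.raise_for_status()
    # Ollama returns one embedding per input string.
    return resp.json()["embeddings"][0]
```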
1. Video Processing: Convert video files to MP3 audio format (steps 1-2 are sketched below)
2. Transcription: Extract subtitles/transcripts to JSON format
3. Embedding Generation: Create vector embeddings for each subtitle chunk
4. Query Processing: Convert user questions to embeddings (steps 4-6 are sketched below)
5. Retrieval: Find top-5 most relevant video segments using cosine similarity
6. Response Generation: LLM generates contextual answers with timestamps
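Steps 1 and 2 are not pinned to specific tools in this README. Here is one possible sketch using ffmpeg for audio extraction and openai-whisper for transcription; both tool choices are assumptions, and the function names are illustrative:

```python
import json
import subprocess
from pathlib import Path

import whisper  # pip install openai-whisper; an assumed ASR choice

# Step 1: strip the video track and re-encode the audio as MP3.
def video_to_mp3(video: Path, mp3: Path) -> None:
    subprocess.run(
        ["ffmpeg", "-i", str(video), "-vn", "-q:a", "2", str(mp3)],
        check=True,
    )

# Step 2: transcribe the MP3 into timestamped subtitle chunks.
def mp3_to_json(mp3: Path, out: Path) -> None:
    model = whisper.load_model("base")
    result = model.transcribe(str(mp3))
    chunks = [
        {"start": s["start"], "end": s["end"], "text": s["text"]}
        for s in result["segments"]
    ]
    out.write_text(json.dumps(chunks))
```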
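Steps 4-6 might look roughly like this at query time. A sketch, not the repo's exact code: it assumes embedding.joblib holds a pandas DataFrame with title, start, end, text, and embedding columns, and that Ollama serves both bge-m3 and llama3.2 locally:

```python
import joblib
import numpy as np
import requests
from sklearn.metrics.pairwise import cosine_similarity

def embed_text(text: str) -> list[float]:
    resp = requests.post("http://localhost:11434/api/embed",
                         json={"model": "bge-m3", "input": text})
    resp.raise_for_status()
    return resp.json()["embeddings"][0]

def answer(question: str, top_k: int = 5) -> str:
    df = joblib.load("embedding.joblib")

    # Step 4: embed the question with the same model used for the chunks.
    q_vec = np.array(embed_text(question)).reshape(1, -1)

    # Step 5: rank every subtitle chunk by cosine similarity, keep the top-5.
    chunk_matrix = np.vstack(df["embedding"].to_numpy())
    scores = cosine_similarity(q_vec, chunk_matrix)[0]
    top = df.iloc[np.argsort(scores)[::-1][:top_k]]

    # Step 6: let the LLM answer from the retrieved chunks and timestamps.
    context = "\n".join(
        f"[{r.title} {r.start:.0f}s-{r.end:.0f}s] {r.text}"
        for r in top.itertuples()
    )
    prompt = ("Answer using only the lecture excerpts below, citing the "
              "video and timestamps.\n\n"
              f"{context}\n\nQuestion: {question}")
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": "llama3.2", "prompt": prompt,
                               "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]
```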
1. Move all your video files to the videos folder.
2. Run video_to_mp3.py to convert the video files to MP3 audio.
3. Run mp3_to_json.py to generate the subtitle JSON files.
4. Run preprocess_json.py to convert the JSON files to vector embeddings and save them as embedding.joblib (a sketch of this step follows the list).
5. Execute the main script to start querying your video content.
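What preprocess_json.py does can be sketched as follows. This is an assumption about the actual script: it presumes each subtitle JSON file is a list of chunks with start, end, and text fields, and that the files live in a jsons folder:

```python
import json
from pathlib import Path

import joblib
import pandas as pd
import requests

def embed_batch(texts: list[str]) -> list[list[float]]:
    """Embed a batch of subtitle chunks with BGE-M3 via local Ollama."""
    resp = requests.post("http://localhost:11434/api/embed",
                         json={"model": "bge-m3", "input": texts})
    resp.raise_for_status()
    return resp.json()["embeddings"]

rows = []
for path in Path("jsons").glob("*.json"):  # folder name is an assumption
    chunks = json.loads(path.read_text())
    for chunk, emb in zip(chunks, embed_batch([c["text"] for c in chunks])):
        rows.append({"title": path.stem, "start": chunk["start"],
                     "end": chunk["end"], "text": chunk["text"],
                     "embedding": emb})

# One DataFrame on disk means queries never have to re-embed the chunks.
joblib.dump(pd.DataFrame(rows), "embedding.joblib")
```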
Ask a Question: How to create a responsive navbar?
Thinking...
This topic is covered in Video 15: "Building Navigation Bar", between timestamps 3:45 and 8:20...