
Susheyyy/Ask-PDF


PDF Chatbot

SDP_LLM (June 2025)

A lightweight chatbot app that lets users upload a PDF and ask questions about its content using Google's Gemini LLM. It combines PDF upload, text extraction, embedding generation, vector storage, and retrieval-based question answering in a single pipeline.

⚝ How it works:

  1. PDF Upload and Text Extraction: The user uploads a PDF file through a Gradio interface, and its text is extracted with pdfplumber.
  2. Text Preprocessing: The text is split into manageable chunks using LangChain's CharacterTextSplitter.
  3. Embeddings & Vector Store: Each chunk is embedded using HuggingFaceEmbeddings and stored in Chroma, a vector database. This enables efficient semantic similarity search when the user asks a question.
  4. Question Answering Chain: When the user enters a question, the chatbot retrieves the most relevant text chunks from ChromaDB and passes them, together with the question, to the Gemini model (gemini-2.5-pro). LangChain's RetrievalQA chain orchestrates this process.
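
The four steps above can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the app's code: `split_text` uses a fixed-size sliding window instead of CharacterTextSplitter's separator-based splitting, `embed` is a hypothetical bag-of-words stand-in for HuggingFaceEmbeddings, an in-memory list replaces Chroma, and `build_prompt` only approximates how a "stuff"-style RetrievalQA chain places the retrieved chunks into one prompt (the Gemini call itself is omitted).

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 40) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Simplified stand-in for CharacterTextSplitter, which splits on a separator first.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Hypothetical toy embedding: lowercase word counts (stand-in for HuggingFaceEmbeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (in-memory stand-in for Chroma)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Place the retrieved chunks and the question into a single prompt for the LLM."""
    context = "\n---\n".join(context_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    document = "Chroma stores embeddings for similarity search. Gradio builds web interfaces."
    chunks = split_text(document, chunk_size=50, chunk_overlap=10)
    question = "Where are embeddings stored?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In the real app, `embed`, the chunk list, and `retrieve` would correspond to HuggingFaceEmbeddings, a Chroma collection, and the chain's retriever. The overlap means text near a chunk boundary appears in two adjacent chunks, so it is less likely to be cut off from its surrounding context.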

⚝ Tech Stack:

  1. Programming Language: Python, used to build and deploy the app
  2. Libraries Used:
    ↣ Gradio: Provides the web interface for uploading PDFs and asking questions
    ↣ pdfplumber: For extracting text from PDF files
    ↣ LangChain: For creating the retrieval-based QA chain
    ↣ ChromaDB: Vector database used to store and retrieve text embeddings
  3. Large Language Model (LLM): Google Generative AI (gemini-2.5-pro)
  4. Embeddings: HuggingFace Embeddings, to convert text chunks into numerical vectors

⚝ Installation:

  • Install Dependencies:
 pip install langchain chromadb gradio google-generativeai pdfplumber transformers sentence-transformers langchain-google-genai langchain-community
  • Setting API Key:
 import os
 os.environ['GOOGLE_API_KEY'] = 'your_google_api_key_here'
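
Before launching the app, a quick pre-flight check can confirm that the packages above are importable and that a real key is set. This is a hypothetical helper, not part of the repository: the import names (for example `google.generativeai` for google-generativeai) are assumed from the packages' usual conventions, and the key check reads GOOGLE_API_KEY from the environment, so it works whether the key was set in code as above or exported in the shell.

```python
import importlib.util
import os

# Assumed import names for the packages in the pip command above.
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_google_genai",
    "chromadb",
    "gradio",
    "google.generativeai",
    "pdfplumber",
    "transformers",
]

PLACEHOLDER_KEY = "your_google_api_key_here"


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found on the current path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent package first
            # and raises if that parent is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


def get_google_api_key() -> str:
    """Read GOOGLE_API_KEY from the environment, rejecting empty or placeholder values."""
    key = os.environ.get("GOOGLE_API_KEY", "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError("GOOGLE_API_KEY is not set to a real key")
    return key


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
    get_google_api_key()  # raises RuntimeError if the key is missing or still the placeholder
```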
