UmdTask430_Data605_Spring2026_txtai_for_market_research_1#452
Open
Gauravp2104 wants to merge 7 commits into
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Architecture:
- Hot tier: KeyDB for caching and live data
- Warm tier: PostgreSQL + pgvector for filings and embeddings
- Cold tier: MinIO for raw document archive

Components:
- Storage clients (KeyDB, PostgreSQL, MinIO) with connection pooling
- FilingsManager for high-level warm tier operations
- SEC EDGAR collector with full pipeline support
- Data collectors for news, web, and social sources
- Ingestion pipeline orchestrator
- Docker Compose for local infrastructure

Scripts:
- run_sec_collector.py: CLI for SEC filings collection

Infrastructure:
- docker-compose.yml: KeyDB, pgvector, and MinIO services
- sql/init.sql: database schema with vector indexes
- .gitignore: comprehensive exclusions for Python, IDE files, and secrets
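The commit does not say how reads move between the three tiers. Below is a minimal sketch that assumes a read-through pattern: a lookup tries the hot tier first, falls back to the warm and then the cold tier, and caches any hit in the hot tier. `TieredReader`, its `hits` counters, and the `TICKER/FORM/YEAR` key format are hypothetical. Plain dicts stand in for the KeyDB, PostgreSQL, and MinIO clients.

```python
from typing import Callable, Dict, Optional

# A tier lookup returns the stored bytes, or None on a miss.
# In the real system these would wrap KeyDB, PostgreSQL/pgvector, and MinIO
# clients; here plain dict.get methods stand in for them (an assumption).
Lookup = Callable[[str], Optional[bytes]]


class TieredReader:
    """Hypothetical read-through reader: hot -> warm -> cold, caching hits in hot."""

    def __init__(self, hot: Dict[str, bytes], warm: Lookup, cold: Lookup) -> None:
        self.hot = hot    # stand-in for the KeyDB cache
        self.warm = warm  # stand-in for a PostgreSQL filings query
        self.cold = cold  # stand-in for a MinIO object fetch
        self.hits = {"hot": 0, "warm": 0, "cold": 0, "miss": 0}

    def get(self, key: str) -> Optional[bytes]:
        # 1. Serve from the hot cache when possible.
        if key in self.hot:
            self.hits["hot"] += 1
            return self.hot[key]
        # 2. Fall back to the slower tiers in order.
        for name, tier in (("warm", self.warm), ("cold", self.cold)):
            value = tier(key)
            if value is not None:
                self.hits[name] += 1
                # Promote so the next read of this key is a hot-tier hit.
                self.hot[key] = value
                return value
        # 3. Not found in any tier.
        self.hits["miss"] += 1
        return None


if __name__ == "__main__":
    warm_db = {"AAPL/10-K/2024": b"filing text"}
    archive = {"AAPL/10-K/2019": b"raw filing"}
    reader = TieredReader({}, warm_db.get, archive.get)
    reader.get("AAPL/10-K/2024")  # warm-tier hit, promoted to hot
    reader.get("AAPL/10-K/2024")  # now served from hot
    print(reader.hits)
```

Promoting lower-tier hits into the hot tier keeps repeat reads cheap. A real deployment would probably also expire KeyDB entries, for example by writing them with `SETEX`, so live data does not go stale. This in-memory stand-in leaves that out.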
Collaborator
This PR currently includes a very large number of unrelated file changes. Your PR is expected to include only your own project folder under: Please remove unrelated repository-wide changes and ensure that your PR only contains the files required for your project submission.
Adds an end-to-end agentic search system on top of the existing collector storage.

New components:
- app/agents/research_agent.py: streaming agent with a router, SEC/news sub-agents, and an extractive synthesizer (LLM-pluggable).
- app/api/server.py: FastAPI service with /research (sync) and /research/stream (SSE) endpoints.
- app/ui/research.py: Streamlit UI that shows the live agent trace, then collapses it into a clean answer + sources view.
- scripts/eval_research.py: benchmark harness reporting p50/p95/p99 latency, routing accuracy, and retrieval health.
- scripts/run_all_collectors.py, run_sec_bulk.py, backfill_txtai_from_chunks.py, check_storage_status.py: bulk collection and index utilities.

Removes the social and web collectors; the project now sources data only from SEC EDGAR and news APIs (NewsAPI + Alpha Vantage).

Updates the embeddings search() helper to expose per-chunk metadata (ticker, filing_type, filing_date) by joining txtai's data column.

RUN_INSTRUCTIONS.md gains a quickstart section for new users covering docker-compose, collectors, API/UI bring-up, and the eval harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
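The sketch below shows how the benchmark harness's headline metrics could be computed. It is not the code in scripts/eval_research.py. The `EvalResult` record, `percentile()`, and `summarize()` are hypothetical. Two choices are also assumptions: nearest-rank percentiles, and an empty-retrieval rate standing in for "retrieval health".

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalResult:
    """One benchmark query (hypothetical record shape, not the harness's schema)."""
    latency_ms: float
    expected_route: str  # e.g. "sec" or "news"
    actual_route: str    # route the agent's router actually chose
    num_sources: int     # retrieved chunks; 0 means retrieval came back empty


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    # Rank is 1-based; p * n is an exact integer, so the division is safe to ceil.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(results: List[EvalResult]) -> Dict[str, float]:
    """Aggregate latency percentiles, routing accuracy, and empty-retrieval rate."""
    latencies = [r.latency_ms for r in results]  # percentile() rejects an empty run
    correct = sum(r.expected_route == r.actual_route for r in results)
    empty = sum(r.num_sources == 0 for r in results)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "routing_accuracy": correct / len(results),
        "empty_retrieval_rate": empty / len(results),
    }
```

Nearest-rank is only one of several percentile definitions. NumPy's `percentile`, for instance, interpolates linearly by default, so p95/p99 figures from small query sets can differ between methods.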
Updated README to reflect project focus and removed outdated architecture details.
Related to #430
Progress update 1
Next steps
Reviewers: @gpsaggese @protocorn
Assignees: @Gauravp2104 @SanjanaK1801