UmdTask430_Data605_Spring2026_txtai_for_market_research_1 by Gauravp2104 · Pull Request #452 · gpsaggese/gpsaggese.github.io

Gauravp2104 · 2026-04-01T17:46:30Z

Related to #430

Progress update 1

Project template files (Dockerfile, requirements.txt, shell scripts)
README with architecture overview and setup instructions

Next steps

Implement txtai embeddings pipeline
Add data ingestion tools (NewsAPI, SEC EDGAR, web scraper)
Build out individual agents (sentiment, diligence, web research, earnings, regulatory)

Reviewers: @gpsaggese @protocorn
Assignee: @Gauravp2104 @SanjanaK1801

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Architecture:
- Hot tier: KeyDB for caching and live data
- Warm tier: PostgreSQL + pgvector for filings and embeddings
- Cold tier: MinIO for raw document archive

Components:
- Storage clients (KeyDB, PostgreSQL, MinIO) with connection pooling
- FilingsManager for high-level warm tier operations
- SEC EDGAR collector with full pipeline support
- Data collectors for news, web, and social sources
- Ingestion pipeline orchestrator
- Docker Compose for local infrastructure

Scripts:
- run_sec_collector.py: CLI for SEC filings collection

Infrastructure:
- docker-compose.yml: KeyDB, pgvector, MinIO services
- sql/init.sql: Database schema with vector indexes
- .gitignore: Comprehensive exclusions for Python, IDE, secrets
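For readers unfamiliar with the hot/warm/cold split described in this commit message, here is a minimal sketch of a read-through lookup across three tiers. It is not the code in this PR: the class `TieredStore`, its method names, and the promote-on-hit behavior are assumptions made for illustration, and plain dictionaries stand in for KeyDB, PostgreSQL + pgvector, and MinIO.

```python
from __future__ import annotations

from typing import Optional


class TieredStore:
    """Hypothetical read-through lookup over hot, warm, and cold tiers."""

    def __init__(self) -> None:
        # In-memory stand-ins; a real deployment would wrap client libraries.
        self.hot: dict[str, str] = {}   # stand-in for KeyDB (cache, live data)
        self.warm: dict[str, str] = {}  # stand-in for PostgreSQL + pgvector
        self.cold: dict[str, str] = {}  # stand-in for MinIO (raw archive)

    def put_raw(self, key: str, document: str) -> None:
        """Archive a raw document in the cold tier only."""
        self.cold[key] = document

    def get(self, key: str) -> Optional[tuple[str, str]]:
        """Return (tier_that_served_the_read, value), or None if absent.

        Assumption: a hit in a slower tier is copied into the faster tiers
        so that the next read is served from the hot tier.
        """
        if key in self.hot:
            return "hot", self.hot[key]
        if key in self.warm:
            value = self.warm[key]
            self.hot[key] = value
            return "warm", value
        if key in self.cold:
            value = self.cold[key]
            self.warm[key] = value
            self.hot[key] = value
            return "cold", value
        return None
```

With this sketch, the first `get` for an archived filing reports the cold tier and a repeated `get` for the same key reports the hot tier. Whether the actual storage clients promote data this way is not stated in the PR.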

protocorn · 2026-04-16T19:03:13Z

This PR currently includes a very large number of unrelated file changes.

Your PR is expected to include only your own project folder under:
class_project/data605/Spring2026/projects/

Please remove unrelated repository-wide changes and ensure that your PR only contains the files required for your project submission.

Adds an end-to-end agentic search system on top of the existing collector storage.

New components:
- app/agents/research_agent.py: streaming agent with router, SEC/news sub-agents, and an extractive synthesizer (LLM-pluggable).
- app/api/server.py: FastAPI service with /research (sync) and /research/stream (SSE) endpoints.
- app/ui/research.py: Streamlit UI showing live agent trace then collapsing to a clean answer + sources view.
- scripts/eval_research.py: benchmark harness reporting p50/p95/p99 latency, routing accuracy, and retrieval health.
- scripts/run_all_collectors.py, run_sec_bulk.py, backfill_txtai_from_chunks.py, check_storage_status.py: bulk collection + index utilities.

Removes the social and web collectors; the project now sources data only from SEC EDGAR and news APIs (NewsAPI + Alpha Vantage).

Updates the embeddings search() helper to expose per-chunk metadata (ticker, filing_type, filing_date) by joining txtai's data column.

RUN_INSTRUCTIONS.md gains a quickstart section for new users covering docker-compose, collectors, API/UI bring-up, and the eval harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
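The latency percentiles and routing accuracy that the benchmark harness reports could be computed roughly as sketched below. This is not the content of scripts/eval_research.py: the functions `percentile`, `route`, and `evaluate`, the keyword list, and the nearest-rank percentile definition are all assumptions chosen for illustration, and the toy keyword router only stands in for the agent's real router.

```python
from __future__ import annotations

import math
import time
from typing import Callable


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of the samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def route(query: str) -> str:
    """Hypothetical keyword router: 'sec' for filing-style questions,
    'news' for everything else."""
    sec_terms = ("10-k", "10-q", "8-k", "filing", "risk factor")
    lowered = query.lower()
    return "sec" if any(term in lowered for term in sec_terms) else "news"


def evaluate(
    cases: list[tuple[str, str]],
    router: Callable[[str], str] = route,
) -> dict[str, float]:
    """Time the router on each (query, expected_route) pair and summarize."""
    latencies_ms: list[float] = []
    correct = 0
    for query, expected in cases:
        start = time.perf_counter()
        predicted = router(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += int(predicted == expected)
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "routing_accuracy": correct / len(cases),
    }
```

A real harness would time the full /research request rather than the router alone, and would likely use many more labeled queries so that the p95 and p99 figures are meaningful.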

Updated README to reflect project focus and removed outdated architecture details.

Gaurav Prakash and others added 3 commits April 1, 2026 10:22

UmdTask430_DATA605_Spring2026_txtai_for_market_research_1

ec97843

Update helpers_root submodule

a09f282

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Gaurav Prakash and others added 4 commits May 7, 2026 01:47

Modified readme

933879e

Added earning transcript collector

a3c405e

Updated README to reflect project focus and removed outdated architecture details.

Changes: 1. Final cleanup

4e7826f

Gauravp2104 wants to merge 7 commits into gpsaggese:master from Gauravp2104:UmdTask430_DATA605_Spring2026_txtai_for_market_research_1


2 participants
