MathMex is a web application for mathematical search, powered by OpenSearch and machine learning. It consists of a Flask backend, React frontend, and a data pipeline that transforms TSVs into searchable indices.
- Python 3.8+ — Backend and data pipeline
- Node.js 18+ — Frontend
- Docker — OpenSearch (or use a remote instance)
- formula-search (optional) — For TangentCFT/formula search; add as git submodule
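
The prerequisites above can be checked before installing. The following is a minimal sketch, not part of the repository: the function names `parse_major` and `check_prerequisites` are hypothetical, and it assumes `node --version` prints a string such as `v18.19.0`.

```python
import shutil
import subprocess
import sys


def parse_major(version_text):
    """Return the major version from text such as 'v18.19.0', or None."""
    digits = version_text.strip().lstrip("v").split(".")[0]
    return int(digits) if digits.isdigit() else None


def check_prerequisites(which=shutil.which, run=subprocess.run):
    """Return a list of human-readable problems; an empty list means OK.

    `which` and `run` are injectable so the check can be exercised
    without the real tools installed.
    """
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")

    node = which("node")
    if node is None:
        problems.append("node not found on PATH")
    else:
        out = run([node, "--version"], capture_output=True, text=True).stdout
        major = parse_major(out)
        if major is None or major < 18:
            problems.append("Node.js 18+ is required, found: " + out.strip())

    # Docker is only needed when OpenSearch runs locally.
    if which("docker") is None:
        problems.append("docker not found on PATH (fine if OpenSearch is remote)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Run it from any directory; no output means the three tools were found at the expected versions.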
mathmex/
├── apps/
│ ├── backend/ # Flask API server
│ ├── data-processing/ # TSV → vectors → JSONL pipeline
│ ├── opensearch/ # OpenSearch scripts, schemas, docker-compose
│ └── frontend/ # React UI
├── bin/ # Shell scripts (run, stop, install, process)
└── data/ # TSVs, vectors, JSONL (gitignored)
See each app's README for details: backend, data-processing, opensearch, frontend.
Run all commands from the project root.
cp config.ini.example config.ini
cp .env.example .env
Edit config.ini:
- `[opensearch]`: host, username, password. For local Docker, use `host = localhost` and credentials matching your `.env`.
- `[general]`: `model` = path to a sentence-transformers model (local path or HuggingFace ID, e.g. `sentence-transformers/all-mpnet-base-v2`).
Edit .env:
- `OPENSEARCH_INITIAL_ADMIN_PASSWORD`: bootstrap password for the OpenSearch container. The `bin/` scripts pass this via `--env-file` when running `docker compose`. Set the `config.ini` `[opensearch]` password to match.
- `VITE_API_BASE`: (optional) for local frontend dev, set to `http://localhost:5001` to point at the backend. Omit for production.
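
Because the password has to agree between `config.ini` and `.env`, a quick consistency check can catch mismatches before OpenSearch is started. This is a minimal sketch under assumptions: the helper names `read_env` and `check_config` are hypothetical, the `[opensearch]` keys are assumed to be spelled `host`, `username`, and `password`, and `.env` is assumed to contain plain `KEY=VALUE` lines.

```python
import configparser
from pathlib import Path


def read_env(path):
    """Parse simple KEY=VALUE lines from a .env file, skipping comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config(config_path="config.ini", env_path=".env"):
    """Return a list of problems found in config.ini and .env."""
    problems = []
    # interpolation=None keeps '%' characters in passwords literal.
    config = configparser.ConfigParser(interpolation=None)
    config.read(config_path, encoding="utf-8")
    env = read_env(env_path)

    for key in ("host", "username", "password"):  # assumed key names
        if not config.get("opensearch", key, fallback=""):
            problems.append(f"[opensearch] {key} is missing or empty")
    if not config.get("general", "model", fallback=""):
        problems.append("[general] model is missing or empty")

    env_password = env.get("OPENSEARCH_INITIAL_ADMIN_PASSWORD", "")
    if not env_password:
        problems.append("OPENSEARCH_INITIAL_ADMIN_PASSWORD is not set in .env")
    elif env_password != config.get("opensearch", "password", fallback=""):
        problems.append("config.ini [opensearch] password does not match .env")
    return problems


if __name__ == "__main__":
    for problem in check_config():
        print("WARNING:", problem)
```

Run from the project root, it prints one warning per problem and nothing when both files agree.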
bin/install.sh
This builds the frontend and the OpenSearch Docker image.
Start OpenSearch and the backend:
bin/run.sh
Or manually:
cd apps/opensearch && docker compose --env-file ../../.env up -d
python apps/backend/app.py
(The `--env-file` option loads `OPENSEARCH_INITIAL_ADMIN_PASSWORD` from `.env`.)
Start the frontend (dev mode):
cd apps/frontend && npm run dev
Open the app (typically http://localhost:5173 for Vite).
To add searchable content:
1. Add a TSV to `data/tsvs/` (format: `title<TAB>description<TAB>url`, no header).
2. Run: `bin/process.sh SOURCE TSV_FILE`
3. Index: `python apps/opensearch/scripts/bulk_index.py SOURCE`
Or combine steps 2 and 3: bin/process.sh SOURCE TSV_FILE --index
Example: bin/process.sh wikipedia final_wikipedia.tsv --index
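
Before processing, it can help to confirm that a TSV matches the expected three-column, header-free layout. The sketch below is illustrative only: `validate_tsv` is a hypothetical helper, not part of the pipeline, and treating a first row of `title`, `description`, `url` as a stray header is an assumption.

```python
import sys
from pathlib import Path


def validate_tsv(path):
    """Yield (line_number, message) for each row that breaks the format."""
    with Path(path).open(encoding="utf-8", newline="") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip("\r\n")
            if not line:
                yield number, "empty line"
                continue
            fields = line.split("\t")
            if len(fields) != 3:
                yield number, f"expected 3 tab-separated fields, found {len(fields)}"
                continue
            # Assumption: a first row naming the columns is a stray header.
            names = [field.strip().lower() for field in fields]
            if number == 1 and names == ["title", "description", "url"]:
                yield number, "looks like a header row; none is expected"


if __name__ == "__main__":
    # Usage: python validate_tsv.py data/tsvs/final_wikipedia.tsv
    for number, message in validate_tsv(sys.argv[1]):
        print(f"line {number}: {message}")
```

No output means every row has exactly three tab-separated fields.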
For full search (TangentCFT, LateFusion), add the formula-search repo as a submodule.
First-time setup, for a fresh checkout (run from the project root):
git submodule add <formula-search-repo-url> formula-search
git submodule update --init --recursive
Clone mathmex with submodule:
git clone --recurse-submodules <mathmex-repo-url>
Existing clone (submodule not initialized):
git submodule update --init --recursive

| Command | Purpose |
|---|---|
| `bin/install.sh` | Build frontend and OpenSearch image |
| `bin/run.sh` | Start OpenSearch + backend |
| `bin/stop.sh` | Stop services, remove build artifacts |
| `bin/restart.sh` | Stop → install → run |
| `bin/process.sh SOURCE TSV [--index]` | Process data, optionally index |
| `python apps/backend/app.py` | Run backend only |
| `python apps/opensearch/scripts/bulk_index.py SOURCE` | Bulk index JSONL |
| `cd apps/frontend && npm run dev` | Frontend dev server |

- Fork the repository
- Create a branch: `git checkout -b my-feature`
- Make changes; tests are encouraged
- Commit: `git commit -m "Description"`
- Push: `git push origin my-feature`
- Open a Pull Request