
Commit 2370a22

batchalign3 initial.
1 parent 614a02d commit 2370a22

809 files changed: 130717 additions & 29215 deletions


.cargo/audit.toml

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# RustSec audit configuration
# Used by rustsec/audit-check@v2 in CI

[advisories]
ignore = [
    # rsa crate timing sidechannel (Marvin Attack). No patch available.
    # Transitive dep — not exploitable in our context.
    "RUSTSEC-2023-0071",
]
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
---
name: add-command
description: Scaffold a new batchalign3 CLI command end-to-end (Rust CLI + server orchestration + Python worker integration). Use when adding a new analysis/processing command.
disable-model-invocation: true
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
---

# Add a New Batchalign3 Command

Scaffold a new CLI command through all layers. `$ARGUMENTS` should specify the command name and description (e.g., `/add-command sentiment "Sentiment analysis on utterances"`).

## Architecture

```
CLI (batchalign-cli) → Server (batchalign-app) → Worker IPC → Python inference module
```

## Step 1: Determine Command Type

| Type | Example | Python Worker? | Orchestration |
|------|---------|---------------|---------------|
| **ML inference** | morphotag, utseg, translate | Yes — needs inference module | Server extracts words → worker infers → server injects results |
| **Audio processing** | transcribe, align | Yes — needs inference module | Server sends audio path → worker returns segments |
| **File processing** | opensmile, avqi | Yes — uses `process` op | Worker processes entire file |
| **Rust-only** | validate, normalize | No | Pure Rust, no worker needed |

## Step 2: Add CLI Subcommand

**File:** `crates/batchalign-cli/src/cli/args.rs`

```bash
grep -n "enum Commands" crates/batchalign-cli/src/cli/args.rs
```

Add a new variant to the `Commands` enum with clap attributes.

**File:** `crates/batchalign-cli/src/commands/`

Create a new module for the command dispatch logic. Read an existing command for the pattern:

```bash
ls crates/batchalign-cli/src/commands/
```

## Step 3: Add Server Orchestration (if ML inference)

**File:** `crates/batchalign-app/src/`

Create an orchestration module that:

1. Parses CHAT using `batchalign-chat-ops`
2. Extracts relevant data (words, audio paths, etc.)
3. Calls the Python worker via `batch_infer` IPC
4. Injects results back into the CHAT AST
5. Serializes the modified CHAT

Read existing orchestrators for the pattern:

```bash
ls crates/batchalign-app/src/*.rs
```

## Step 4: Add Worker Types

**File:** `batchalign/worker/_types.py`

Add a new `InferTask` variant matching the Rust enum:

```bash
grep -n "class InferTask" batchalign/worker/_types.py
```

Add Pydantic request/response models for the new task's input/output.
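
A minimal sketch of such a pair, assuming the hypothetical `sentiment` command from the example above; the class and field names are illustrative, not the repo's actual types:

```python
from pydantic import BaseModel


class SentimentRequest(BaseModel):
    """Illustrative input for one utterance."""

    utterance_id: int
    words: list[str]  # words extracted by the server, no CHAT markup


class SentimentResponse(BaseModel):
    """Illustrative raw model output for one utterance."""

    utterance_id: int
    label: str    # e.g. "positive" / "negative" / "neutral"
    score: float  # model confidence in [0, 1]
```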
**Rust side:** Add matching types to `batchalign-types::worker` (if it exists in the workspace).

## Step 5: Add Inference Module (if ML inference)

**File:** `batchalign/inference/<name>.py`

Create a new inference module following the pattern:

```python
def load_<name>_model(lang: str) -> ModelType:
    """Load the ML model. Called once at worker startup."""
    ...


def batch_infer_<name>(model: ModelType, items: list[InputType]) -> list[OutputType]:
    """Pure inference function. No CHAT, no domain logic."""
    ...
```

Read existing inference modules for the exact pattern:

```bash
ls batchalign/inference/
head -40 batchalign/inference/morphosyntax.py
```

Key rules:

- Heavy imports (torch, stanza) must be lazy
- Return raw model output — no CHAT text processing
- Use Pydantic models for structured I/O
- Type annotations on all functions

## Step 6: Wire Worker Dispatch

**File:** `batchalign/worker/_infer.py`

Add a case to the `batch_infer` dispatch router:

```bash
grep -n "def batch_infer" batchalign/worker/_infer.py
```
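
A minimal sketch of the dispatch shape, not the repo's actual router: it assumes the `InferTask` enum from Step 4 gains a new member and routes it to the pure inference function. All names below are illustrative.

```python
from enum import Enum


class InferTask(Enum):
    MORPHOTAG = "morphotag"
    SENTIMENT = "sentiment"  # new variant, mirrored on the Rust side


def batch_infer(task: InferTask, model: object, items: list) -> list:
    """Route a task to its pure inference function."""
    if task is InferTask.SENTIMENT:
        # Lazy import, mirroring the heavy-import rule in Step 5.
        from batchalign.inference.sentiment import batch_infer_sentiment

        return batch_infer_sentiment(model, items)
    raise ValueError(f"unsupported task: {task}")
```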
**File:** `batchalign/worker/_main.py`

Add model loading for the new command:

```bash
grep -n "def load_models" batchalign/worker/_main.py
```

## Step 7: Add CHAT Operations (if needed)

**File:** `crates/batchalign-chat-ops/src/`

If the command needs to extract data from or inject results into CHAT:

```bash
ls crates/batchalign-chat-ops/src/
```

Follow existing extraction/injection patterns. Use the content walker for AST traversal.

## Step 8: Add Tests

```bash
# Python inference test
cat > batchalign/tests/test_<name>.py

# Rust integration test
# Add to crates/batchalign-app/tests/

# Worker protocol test (manual)
uv run python -m batchalign.worker --command <name> --lang eng
# Then paste: {"op": "capabilities", "id": "test-1"}
```
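
To script that manual protocol check, a hedged sketch using `subprocess`: it assumes the worker speaks newline-delimited JSON over stdin/stdout and prints a `{"ready": true, ...}` line at startup, as the verify step elsewhere in this commit suggests; adjust if the framing differs.

```python
import json
import subprocess

# Start the worker exactly as the manual test does ("<name>" is the placeholder).
proc = subprocess.Popen(
    ["uv", "run", "python", "-m", "batchalign.worker",
     "--command", "<name>", "--lang", "eng"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
assert proc.stdin is not None and proc.stdout is not None

ready = json.loads(proc.stdout.readline())  # startup line, e.g. {"ready": true, ...}
proc.stdin.write(json.dumps({"op": "capabilities", "id": "test-1"}) + "\n")
proc.stdin.flush()
reply = json.loads(proc.stdout.readline())  # capabilities response
print(ready, reply)
proc.terminate()
```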
## Step 9: Verify

```bash
# Python tests
cd $REPO_ROOT && uv run pytest batchalign/tests/test_<name>.py -v

# Rust compile
cd $REPO_ROOT && cargo check --workspace

# Rust tests
cd $REPO_ROOT && cargo test --workspace

# Type check
cd $REPO_ROOT && uv run mypy batchalign/inference/<name>.py batchalign/worker/
```

## Key Files

| Purpose | Path |
|---------|------|
| CLI args definition | `crates/batchalign-cli/src/cli/args.rs` |
| CLI command dispatch | `crates/batchalign-cli/src/commands/` |
| Server orchestration | `crates/batchalign-app/src/` |
| Worker types (Pydantic) | `batchalign/worker/_types.py` |
| Worker dispatch | `batchalign/worker/_infer.py` |
| Worker model loading | `batchalign/worker/_main.py` |
| Inference modules | `batchalign/inference/` |
| CHAT operations | `crates/batchalign-chat-ops/src/` |
Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
---
name: add-inference
description: Add a new Python ML inference module for a new model or task. Use when adding a new ML backend (e.g., a new ASR engine, new FA model, new NLP task).
disable-model-invocation: true
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
---

# Add a New Inference Module

Create a new Python ML inference module. `$ARGUMENTS` should describe the module (e.g., `/add-inference "wav2vec2 ASR for Mandarin"`).

## Architecture

Each inference module is a **pure inference function**: receives structured input, runs ML model, returns structured output. No CHAT parsing, no pipeline orchestration, no domain logic.

```
Worker (_infer.py) → inference/<module>.py → ML library (torch, stanza, etc.) → structured output
```

## Step 1: Understand the Pattern

Read existing inference modules to understand the conventions:

```bash
ls batchalign/inference/
head -60 batchalign/inference/morphosyntax.py
head -60 batchalign/inference/fa.py
```

Every inference module has:

1. **Model loading function** — called once at worker startup
2. **Inference function** — called per-batch, pure computation
3. **Pydantic types** — for structured I/O at the IPC boundary

## Step 2: Define Types

**File:** `batchalign/worker/_types.py`

Add Pydantic models for the request and response. These mirror Rust types across the IPC boundary.

```bash
grep -n "class.*BaseModel" batchalign/worker/_types.py | head -20
```

Rules:

- Use domain types from `batchalign/inference/_domain_types.py` (AudioPath, TimestampMs, etc.)
- All fields must have type annotations
- No `Any` or `object` types
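
A short sketch of a request model following those rules, assuming the `AudioPath` and `TimestampMs` aliases named above are importable as shown; the task and field names are illustrative:

```python
from pydantic import BaseModel

# Aliases named in the rules above; assumed importable from _domain_types.
from batchalign.inference._domain_types import AudioPath, TimestampMs


class AlignRequest(BaseModel):
    """Illustrative request: one audio span to run inference on."""

    audio: AudioPath    # domain alias rather than a bare str
    start: TimestampMs  # millisecond offset, per the alias name
    end: TimestampMs
```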
If this is a new InferTask variant, add it to the `InferTask` enum:

```bash
grep -n "class InferTask" batchalign/worker/_types.py
```

## Step 3: Create the Inference Module

**File:** `batchalign/inference/<name>.py`

Template:

```python
"""<Name> inference module.

Receives structured input, runs ML model, returns raw output.
No CHAT parsing, no text processing, no domain logic.
"""

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    pass  # Import heavy ML types here

from batchalign.inference._domain_types import ...


def load_<name>_model(lang: str) -> <ModelType>:
    """Load the ML model for the given language.

    Called once at worker startup. Heavy imports go here.
    """
    import torch  # Lazy import
    ...


def batch_infer_<name>(
    model: <ModelType>,
    items: list[<InputType>],
) -> list[<OutputType>]:
    """Run inference on a batch of items.

    Pure computation — no CHAT text, no side effects.
    """
    ...
```

Key rules:

- **Lazy imports** for heavy libraries (torch, stanza, transformers) — put them inside the function
- **No CHAT text** — receive extracted words/audio, return structured results
- **Type annotations** on all functions and variables
- **No `Any`** — use specific types, `TYPE_CHECKING` guards for expensive imports
- **Pydantic models** at IPC boundaries

## Step 4: Wire into Worker

### Model loading

**File:** `batchalign/worker/_main.py`

Add model loading for the new module:

```bash
grep -n "def load_models" batchalign/worker/_main.py
```
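
A minimal sketch of the branch you would add, assuming `load_models` selects on the worker's `--command`; the actual function in `_main.py` may be shaped differently, and all names are illustrative.

```python
def load_models(command: str, lang: str) -> object:
    """Load the model(s) for the requested command, once at startup."""
    if command == "sentiment":  # the new module's command
        # Lazy import keeps worker startup cheap for other commands.
        from batchalign.inference.sentiment import load_sentiment_model

        return load_sentiment_model(lang)
    raise ValueError(f"unknown command: {command}")
```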
### Dispatch

**File:** `batchalign/worker/_infer.py`

Add a case to route the new InferTask to your inference function:

```bash
grep -n "def batch_infer" batchalign/worker/_infer.py
```

## Step 5: Add to Worker Capabilities

**File:** `batchalign/worker/_handlers.py`

Ensure the new task appears in the capabilities response:

```bash
grep -n "capabilities" batchalign/worker/_handlers.py
```
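
Illustrative only, since the handler's real shape is in `_handlers.py`: the point is that the new task name must be advertised in whatever list the capabilities response carries.

```python
SUPPORTED_TASKS = ["morphotag", "utseg", "sentiment"]  # append the new task


def capabilities_response(request_id: str) -> dict[str, object]:
    """Assumed response shape; only the task list matters here."""
    return {"id": request_id, "tasks": SUPPORTED_TASKS}
```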
## Step 6: Add Tests

```python
# batchalign/tests/test_<name>.py
"""Tests for <name> inference module."""


def test_<name>_basic():
    """Test basic inference with minimal input."""
    ...
```

Rules:

- No mocks (`unittest.mock` is banned)
- Use real models or skip if not available (`pytest.mark.skipif`)
- Test with minimal valid input
- Verify output types match Pydantic models
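
A sketch of the skip-if-unavailable pattern from the rules above, with placeholder module and function names:

```python
import importlib.util

import pytest

# Skip (rather than mock) when the heavy dependency is absent.
torch_missing = importlib.util.find_spec("torch") is None


@pytest.mark.skipif(torch_missing, reason="torch not installed")
def test_sentiment_basic() -> None:
    """Run the real model on a minimal input and check output types."""
    from batchalign.inference.sentiment import (
        batch_infer_sentiment,
        load_sentiment_model,
    )

    model = load_sentiment_model("eng")
    out = batch_infer_sentiment(model, [["hello", "world"]])
    assert len(out) == 1  # one output per input item
```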
## Step 7: Verify

```bash
# Run the new tests
uv run pytest batchalign/tests/test_<name>.py -v

# Type check
uv run mypy batchalign/inference/<name>.py

# Test worker starts with new command
uv run python -m batchalign.worker --command <name> --lang eng
# Should print {"ready": true, ...}

# Full test suite
uv run pytest batchalign --disable-pytest-warnings -x -q
```

## Key Files

| Purpose | Path |
|---------|------|
| Existing inference modules | `batchalign/inference/` |
| Domain type aliases | `batchalign/inference/_domain_types.py` |
| Worker types (Pydantic) | `batchalign/worker/_types.py` |
| Worker dispatch router | `batchalign/worker/_infer.py` |
| Worker model loading | `batchalign/worker/_main.py` |
| Worker capabilities | `batchalign/worker/_handlers.py` |
| HK/Cantonese engines | `batchalign/inference/hk/` |
