bettyguo/README.md

Dongxin (Betty) Guo

Final-year PhD @ HKU CS  ·  Hong Kong
I prove the architectural limits of LLM reasoning, and build the systems that route around them.

Homepage  ·  Google Scholar  ·  ORCID  ·  OpenReview  ·  LinkedIn  ·  X / Twitter  ·  Email


About

I'm a final-year PhD candidate in the Department of Computer Science at The University of Hong Kong, advised by Prof. Siu-Ming Yiu. My research sits at the intersection of three threads that keep refusing to be separate:

  • What transformers can actually reason about. Tight architectural bounds, plus the tool-delegation systems those bounds force you to build.
  • Trustworthy LLMs in regulated settings. Compliance-grade explainability, distribution-free coverage, atomic claim verification.
  • Serving infrastructure that respects both. Workflow-atomic GPU scheduling with per-tenant fairness guarantees.

Theorems tell you what cannot be done. Systems make precise what can.

The cycle runs both ways: deployment surfaces the limits worth proving, and the proofs become the constraints that keep deployment honest.


🎉 News

  • [05.2026] 🎉🎉🎉 Two papers accepted to ICML 2026: The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary (Main) and Current XAI Methods Cannot Satisfy Financial AI Explainability Requirements (Position Track).
  • [05.2026] ✨ On the postdoc market for Fall 2026. Trustworthy / compliance-grade AI, multi-agent systems & mechanism design, LLM theory, and serving systems. Reach out at bettyguo@connect.hku.hk.
  • [05.2026] 📝 Serving as a reviewer for NeurIPS, ACM Multimedia (Main & Dataset Tracks), and UAI.
  • [04.2026] 🏆🏆🏆 Four papers accepted to ACL 2026 Industry Track: FinGround (atomic claim verification), RouteNLP (conformal LLM routing), AgentEval (DAG-structured agent evaluation), and ComplianceNLP (KG-augmented regulatory gap detection).
  • [04.2026] 🚀🚀🚀 SAGA accepted to HPDC 2026. It's a workflow-atomic scheduler for AI agent inference on GPU clusters, with per-tenant fairness guarantees that hold under real multi-tenant load.
  • [03.2026] 📣 Adaptive Retrieval for Large Reasoning Models accepted to SIGIR 2026. When to retrieve during reasoning, with bounds, not heuristics.
  • [02.2026] 💼 Conformal-bound risk management at Brain Investing is now running against live P&L. That's our HKU FinTech spin-out, and the lab's coverage work has finally made it onto a real trading book.
  • [01.2026] 🛠️ Shipped multi-tenant scheduling and conformal-coverage pipelines at Stellaris AI for native-safe foundation-model deployment in regulated industries.
  • [09.2025] 🎓 Began the final year of my PhD at HKU CS, advised by Prof. Siu-Ming Yiu. The thesis focuses on the theory-meets-deployment cycle: bounds on transformer reasoning, and the systems those bounds force.
  • [08.2025] 🏅 Continuing Cyberport Incubation (2023–2025 intake). That keeps an unbroken 2018–2025 funding run going across TSSSU, HKSTP Incu-Tech, HKU iAXON Deep Tech, and Cyberport.

Theory. Production. Curation.

Eight ICML / SIGIR / ACL / HPDC papers this cycle. Conformal-bound risk on a live trading book. HMAC-signed agent memory. A 60-paper survey atlas of LLM reasoning theory. Built across HKU CS, Stellaris AI, and Brain Investing.


⚡   At a Glance

8 papers, 2026 cycle: ICML × 2  ·  SIGIR  ·  HPDC  ·  ACL Industry × 4

54 original OSS repos: 15 research  ·  8 MCP  ·  6 agents  ·  5 benchmarks  ·  8 tools  ·  12 curated

2 in production: Stellaris AI  ·  Brain Investing  ·  conformal-bound risk on live P&L

10 years of continuous funding: TSSSU  ·  HKSTP Incu-Tech  ·  Cyberport (×2)  ·  iAXON

HKU CS  ·  Stellaris AI  ·  Brain Investing

Python  ·  PyTorch  ·  C++  ·  Go  ·  Rust  ·  TypeScript  ·  Jupyter  ·  DuckDB  ·  Next.js  ·  OpenTelemetry  ·  MCP


🌟   Showcase

Four projects worth a second look:

ReaLM-Retrieve

SIGIR 2026. When to retrieve during reasoning, with bounds rather than heuristics. Highest-cloned repo in this account.

Python  ·  ⭐ 18  ·  🍴 7  ·  most-cloned

🚀   SAGA

HPDC 2026. Workflow-atomic GPU-cluster scheduler for AI agents. Within 1.31× of Bélády-optimal KV-cache eviction, with OpenMP-accelerated C++ kernels and LangChain / AutoGen / CrewAI bridges.

Python C++  ·  concrete-metric flagship

🧬   Vannevar

Open-source agentic harness with citation-grade memory: source URI, temporal validity window, append-only provenance ledger. MCP-native, multi-frontend, fully self-hostable.

Rust  ·  flagship infrastructure

🔐   agent-memory

Verifiable memory for LLM agents. Every recalled claim is HMAC-signed back to its originating trajectory span.

Python  ·  cryptographically grounded


📚   Selected Publications

| Paper | Venue | Code |
| --- | --- | --- |
| The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary | ICML 2026 | deterministic-horizon |
| When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models | SIGIR 2026 | realm-retrieve |
| Current XAI Methods Cannot Satisfy Financial AI Explainability Requirements | ICML 2026 Position | position paper |
| SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters | HPDC 2026 | SAGA |
| FinGround: Atomic Claim Verification for Financial LLM Outputs | ACL 2026 Industry | FinGround |
| ComplianceNLP: KG-Augmented Regulatory Gap Detection | ACL 2026 Industry | ComplianceNLP |
| RouteNLP: Conformal LLM Routing | ACL 2026 Industry | RouteNLP |
| AgentEval: DAG-Structured Agent Evaluation | ACL 2026 Industry | AgentEval |

Full publication list, PDFs, and BibTeX at bettyguo.github.io.


🧭   Research Threads

Three lines that keep crossing in our papers. Each thread proves a bound and ships the system that meets it.

🧠   Reasoning & tool use

What softmax attention can realize at inference time, and what it provably cannot. The matching upper and lower bounds become the spec for the tool-delegation layer above them.

📄   The Deterministic Horizon  ·  ICML 2026  ·  Adaptive Retrieval for Large Reasoning Models  ·  SIGIR 2026  ·  code: deterministic-horizon, realm-retrieve

🛡️   Trustworthy LLMs for regulated settings

Explainability and verification that survive a financial-services audit, not just benchmark conditions. Distribution-free coverage, atomic claim verification, knowledge-graph-augmented regulatory gap detection.

📄   Current XAI Methods Cannot Satisfy Financial AI Explainability Requirements  ·  ICML 2026 Position  ·  FinGround, ComplianceNLP  ·  ACL 2026 Industry × 2  ·  code: FinGround, ComplianceNLP, TrustKGRAG
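For readers new to the term, "distribution-free coverage" usually refers to conformal prediction. The block below is a minimal sketch of textbook split conformal prediction, not code from FinGround, ComplianceNLP, or RouteNLP. The function names and the convention that a lower nonconformity score means a better fit are assumptions made for illustration. Given n exchangeable calibration scores, the threshold is the ceil((n + 1)(1 - alpha))-th smallest score, and a new score falls at or below it with probability at least 1 - alpha.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal threshold for miscoverage level alpha.

    cal_scores are nonconformity scores from a held-out calibration set
    (hypothetical convention: lower score = better fit).
    """
    n = len(cal_scores)
    # Finite-sample correction: use rank ceil((n + 1) * (1 - alpha)), not n * (1 - alpha).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return math.inf
    return sorted(cal_scores)[rank - 1]


def prediction_set(label_scores, threshold):
    """Keep every candidate label whose nonconformity score is within the threshold."""
    return {label for label, score in label_scores.items() if score <= threshold}
```

The guarantee is marginal and relies only on exchangeability between calibration and test points, which is why no assumption about the underlying model or data distribution is needed.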

⚡   Serving & agent infrastructure

Workflow-atomic GPU scheduling with per-tenant fairness guarantees that hold under real multi-tenant load. DAG-structured evaluation harnesses and conformal routing for agent cascades.

📄   SAGA  ·  HPDC 2026  ·  RouteNLP, AgentEval  ·  ACL 2026 Industry × 2  ·  code: SAGA, RouteNLP, AgentEval


📐   Method, in four habits

How we approach problems, across every thread:

  1. Tight bounds with explicit constants. Upper and lower bounds in the same paper. No asymptotic hand-waving.
  2. Impossibility paired with construction. When a thing can't be done, that result becomes a design constraint, not a stopping point.
  3. Guarantees that survive reality. Distribution-free coverage, conformal prediction, fair scheduling. No idealized assumptions.
  4. Theory and the system that meets it, shipped together. The proof tells the algorithm what to achieve; the algorithm tells the proof what's worth bounding.

"Theorems tell you what cannot be done. Systems make precise what can."


🗂️   What lives in this account

54 original public repos. Research code behind every paper, plus the developer infrastructure our team relies on every day across HKU CS, Stellaris AI, and Brain Investing.
Browse the full index → github.com/bettyguo?tab=repositories

🔬 15 research  ·  🔌 8 MCP servers  ·  🤖 6 agent systems  ·  🧪 5 benchmarks  ·  🛠️ 8 dev tools  ·  📚 12 atlases & lists

🔬   Research code

One repo per paper. Theory and the system that meets it, in the same artifact.

Reasoning & retrieval

| Repo | What it is |
| --- | --- |
| deterministic-horizon | ICML '26 companion. Bounds on extended reasoning, and the regime where tool delegation becomes necessary. Explicit constants. |
| realm-retrieve | ReaLM-Retrieve · SIGIR '26 companion. When to retrieve during reasoning, with bounds rather than heuristics. |

Serving & agent infrastructure

| Repo | What it is |
| --- | --- |
| SAGA | HPDC '26 companion. Workflow-atomic GPU-cluster scheduler. Within 1.31× of Bélády-optimal KV-cache eviction, with OpenMP-accelerated C++ kernels and LangChain / AutoGen / CrewAI bridges. |
| RouteNLP | ACL '26 Industry companion. Conformal-coverage router for LLM cascade serving. |
| AgentEval | ACL '26 Industry companion. DAG-structured evaluation harness for multi-step agents. |
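To make the "Bélády-optimal" reference point concrete, here is a minimal sketch of Bélády's offline eviction rule (evict the entry whose next use is farthest in the future) next to an LRU baseline. It is not SAGA's scheduler or its KV-cache code. The function names, the toy trace, and the choice of miss count as the comparison metric are all assumptions made for illustration, and both functions assume a capacity of at least 1.

```python
from collections import defaultdict, deque


def belady_misses(trace, capacity):
    """Count cache misses under Bélády's offline-optimal eviction (needs the full trace)."""
    # For every key, queue up the positions at which it will be accessed.
    future = defaultdict(deque)
    for pos, key in enumerate(trace):
        future[key].append(pos)

    cache, misses = set(), 0
    for key in trace:
        future[key].popleft()  # consume the current access; the rest is the future
        if key in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            # Evict the resident key that is reused farthest away, or never again.
            victim = max(cache, key=lambda k: future[k][0] if future[k] else float("inf"))
            cache.remove(victim)
        cache.add(key)
    return misses


def lru_misses(trace, capacity):
    """Count misses under least-recently-used eviction, an online baseline."""
    cache, misses = [], 0  # ordered from least to most recently used
    for key in trace:
        if key in cache:
            cache.remove(key)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(0)
        cache.append(key)
    return misses


trace = ["a", "b", "c", "a", "b", "d", "a", "b", "c", "d"]
gap = lru_misses(trace, 3) / belady_misses(trace, 3)  # online policy relative to the optimum
```

Because Bélády's rule needs future knowledge, it serves as a yardstick rather than a deployable policy; a ratio like `gap` is one way to express how close an online policy comes to it.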

Trustworthy & regulated AI

| Repo | What it is |
| --- | --- |
| FinGround | ACL '26 Industry companion. Atomic claim verification for financial LLM outputs. |
| ComplianceNLP | ACL '26 Industry companion. KG-augmented regulatory gap detection. |
| TrustKGRAG | Probabilistic certified robustness and anomaly detection against knowledge-graph poisoning in RAG. |
| conformalized-neural-operators | Distribution-free, spatially adaptive UQ for neural-operator PDE surrogates via physics-informed conformal prediction. |

Theory & foundations

| Repo | What it is |
| --- | --- |
| SafeAnchor | Safety-preserving continual domain adaptation of LLMs via Fisher-based subspace identification and orthogonal gradient projection. |
| SigGate-GT | Sigmoid-gated attention for graph transformers. Eliminates over-smoothing and stabilizes training via element-wise output gating. |
| pac-learned-index | PAC learning with tight VC-dimension bounds and provable sample-complexity guarantees for learned database indexes. |
| JoinPAC | PAC learnability for join cardinality estimation. Decomposition bounds, drift detection, hybrid-estimation guarantees. |
| neural-precond-spectral | Spectral-equivalence theory with mesh-independent convergence bounds for neural-operator preconditioning of PDE systems. |
| sae-brain-topography | Sparse-autoencoder decomposition of brain–LLM alignment with a priori cortical semantic topography mapping. |

🔌   MCP servers

Eight live integrations across our research workflow: code, data, papers, knowledge bases.

| Repo | What it is | Lang |
| --- | --- | --- |
| mcp-gateway | Any OpenAPI 3.x spec into a Model Context Protocol server. Auth, rate-limiting, OpenTelemetry baked in. | Go |
| mcp-postgres | Postgres MCP server for agents. Four-tier safety: role grants, pglast AST guard, per-tx envelope, audit log. Schema introspection, EXPLAIN analysis, pgvector. PG 13 to 17. | Python |
| mcp-jupyter | MCP server for Jupyter. Live kernel state (variables, dataframes, plots, tracebacks) instead of just the .ipynb JSON. | Python |
| mcp-wandb-2 | Analytical MCP server for Weights & Biases: hparam importance, sweep summaries, run-delta analysis, inline charts, gated Launch actions. | Python |
| paperbase-mcp | Research-grade MCP composing arXiv, Semantic Scholar, and OpenAlex. Related work, citation graphs, BibTeX in your chat. | Python |
| mcp-overleaf | MCP server and Skills bundle for finishing a LaTeX paper: bib cleanup, venue rule packs, latexdiff, related-work drafting. | Python |
| obsidian_mcp | MCP plus 7 Claude skills for Obsidian vaults. Read, search, write, and link notes from Claude / Cursor / ChatGPT. Filesystem-direct, local-first, round-trip safe. | Python |
| semantic-grep | Local semantic code search. CLI and MCP server, all on your machine. (pre-alpha) | Python |

🤖   Agent systems & runtimes

Local-first when possible; verifiable when not.

| Repo | What it is | Lang |
| --- | --- | --- |
| Vannevar | Open-source agentic harness with citation-grade memory. Every fact carries a source URI, a temporal validity window, and an append-only provenance ledger. MCP-native, multi-frontend, fully self-hostable. | Rust |
| agent-memory | Verifiable memory for LLM agents. Every recalled claim is HMAC-signed back to its originating trajectory span. | Python |
| computer_use_agent | Open-source local-VLM browser agent. AT-tree-first routing with VLM fallback, refusals enforced in code, honest benchmarks including the failure atlas. | Python |
| whisper_agent | Hands-free local voice agent: faster-whisper STT, local LLM with tool use, TTS. Runs entirely on your machine. | Python |
| agent-tracer-2 | OpenTelemetry-native, local-first observability for AI agents. DuckDB on disk, Next.js viewer on localhost, no SaaS. Adapters for Anthropic, OpenAI, LangGraph, AutoGen, CrewAI. | Python |
| local-deep-research | Self-hosted deep-research agent: multi-step query planning, source synthesis, report generation. Ollama / llama.cpp / vLLM friendly, with SearXNG, FAISS, and BM25. | Python |
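As an illustration of what "HMAC-signed back to its originating trajectory span" could look like, the sketch below binds a claim to a trajectory ID and a span using Python's standard `hmac` module. It is not agent-memory's actual API: the function names, the record fields, and the JSON canonicalization are hypothetical choices made for this example.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Stable byte encoding so the same record always produces the same tag.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_claim(key: bytes, claim: str, trajectory_id: str, span: tuple[int, int]) -> dict:
    """Attach an HMAC-SHA256 tag covering the claim and its (hypothetical) provenance fields."""
    record = {"claim": claim, "trajectory_id": trajectory_id, "span": list(span)}
    record["tag"] = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return record


def verify_claim(key: bytes, record: dict) -> bool:
    """Recompute the tag over everything except the tag itself and compare in constant time."""
    body = {k: v for k, v in record.items() if k != "tag"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("tag", ""))
```

Any change to the claim text, the trajectory ID, or the span after signing makes `verify_claim` return `False`, which is the property that lets a recalled claim be traced to exactly one span.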

🧪   Benchmarks & evaluation

Reproducible by default. Probe for contamination and reward hacks before declaring a number.

| Repo | What it is | Lang |
| --- | --- | --- |
| agent_eval | Open-source benchmark for Claude Code skill bundles. Pass@k plus cost plus reliability, content-addressed leaderboard across Anthropic / OpenAI / Google. | Python |
| bench_audit | Library of probes for agent benchmarks: contamination, gold-answer leaks, harness-injection vulnerabilities, reward hacking. CIs on every result. | Python |
| rag-bench | Small, reproducible benchmark for RAG pipelines. | Python |
| agent-arena | Arena-style framework for head-to-head agent comparison. | Python |
| paper-replay | Replay and reproduce paper experiments with locked seeds, environments, and artifacts. | Python |
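The Pass@k metric mentioned for agent_eval is commonly computed with the unbiased estimator 1 - C(n - c, k) / C(n, k), where n is the number of sampled attempts and c the number that passed. The sketch below shows that standard estimator; whether agent_eval uses exactly this form is an assumption, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k attempts passes, given c of n passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts must include a pass.
        return 1.0
    # Probability that a random size-k subset of the n attempts contains no pass.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging this value over tasks gives the benchmark-level Pass@k. Sampling n larger than k and using the estimator reduces variance compared with running exactly k attempts per task.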

🛠️   Developer tools & skills

Quality layers, lockfiles, and ergonomics for the agent stack.

| Repo | What it is | Lang |
| --- | --- | --- |
| promptlock | Production prompt workflow: semantic diff, eval-on-PR, lockfile, drift detection, and rollback for plain-markdown prompts in your repo. | Go |
| skill-forge-2 | Quality layer for Claude Code Skills: lint, test, and bench before you ship. | Rust |
| browser-skills | 15 reusable, agent-agnostic browser recipes plus an MCP server. Cookie banners, infinite scroll, calendar widgets, all solved once. | Python |
| diagram-skills | Generate validated diagrams across Mermaid, PlantUML, Graphviz, D2, and Excalidraw. MCP server, CLI, and Claude Code skills. | Python |
| see-the-ai-think | Watch an LLM think. Interactive interpretability tool that visualizes sparse-autoencoder features firing live across every token. Runs on your laptop, no GPU required. | Python |
| paper_pod | Local-first audio overviews for academic papers. Take an arXiv URL, PDF, or BibTeX in, get an 8 to 15 minute two-host podcast out. | Python |
| paper2repro | Paper to reproducible experiment scaffold. | Python |
| test_forge | Test-generation toolkit for Python research code. | Python |

📚   Curated knowledge

What we had to learn the hard way, written down for the next person.

📓   Atlases & annotated notebooks

| Repo | What it is |
| --- | --- |
| awesome-llm-circuits-atlas | Interactive atlas of discovered circuits and SAE features in large language models, with Colab reproductions on open-weights models. |
| awesome-reasoning-models-theory | Theory-first map of why reasoning models (o1/o3, DeepSeek-R1, Claude-thinking, Qwen-QwQ) actually work. 8 chapters, 60+ annotated papers, 13 models compared, 5 reproduction notebooks, live benchmarks. |
| retrieval-from-scratch | Modern Information Retrieval from scratch in PyTorch. BM25, dense bi-encoders, ColBERT late interaction, cross-encoder reranking, and RAG, in annotated notebooks that run on a single GPU. |

🗺️   Maps, lists & roadmaps

| Repo | What it is |
| --- | --- |
| awesome-why-llms-work | Falsifiable-hypothesis atlas of why LLMs work. Five competing research programmes, 41 tracked claims with epistemic status (🟢🟡🔴⚪) and named falsifiers. |
| awesome-llm-reasoning-foundations | Curated, rigorously-verified map of the theoretical foundations of LLM reasoning: transformer expressivity, chain-of-thought error bounds, circuit complexity, logical characterizations, learnability. |
| llm-impossibility-results | Verified, assumption-explicit catalog of published impossibility and lower-bound results for LLMs and AI agents: circuit-complexity ceilings, hallucination bounds, watermarking impossibility, alignment. |
| awesome-llm-theory | Companion list: theory papers for LLM behavior, expressiveness, and learnability. |
| build-your-own-ai | Master modern AI by building it from scratch: curated index of the best build-it-yourself guides for tokenizers, attention, training, RAG, agents, and evals. |
| awesome-research-agents | Opinionated, curated list of agents, skills, MCP servers, and tools ML researchers actually use. |
| ai-engineer-roadmap | Interactive end-to-end roadmap for AI engineers. 12 stages, 122 nodes, 276 link-verified resources from math prerequisites to the research frontier. |
| harness-engineer-roadmap | Interactive roadmap for harness engineering: the agent loop, tool layers, context engineering, memory, retrieval, eval. |
| llm-interview-prep | Interview-prep notebook for LLM and ML-systems roles. |

🏭   Deployment

Translational work. Coverage proofs and scheduling guarantees in production, against real workloads.

  • Stellaris AI. Conformal-coverage pipelines and multi-tenant scheduling for native-safe foundation models, in regulated deployments.
  • Brain Investing. HKU FinTech spin-out. Conformal-bound risk management running against live P&L. The lab's coverage work, in a real trading book.


📬   Availability

Postdoc, Fall 2026

Open to positions where theory and deployment share a research agenda.

Areas. Trustworthy & compliance-grade AI  ·  Multi-agent systems & mechanism design  ·  LLM theory (descriptive complexity, in-context reasoning)  ·  Serving systems for inference.

Reach me at   bettyguo@connect.hku.hk


Dongxin (Betty) Guo  ·  The University of Hong Kong  ·  Department of Computer Science
homepage  ·  scholar  ·  orcid  ·  openreview  ·  linkedin
Last updated May 2026
