Releases: thushan/olla
Olla v0.0.27
What's in this release
We've finally added native support for LMDeploy, along with bugfixes for sticky sessions thanks to @lbatalha.
Quick Start
# Docker
docker pull ghcr.io/thushan/olla:v0.0.27
# Binary (see assets below)
./olla --config config.yaml
Changelog
Other
- 4ed9cf0: Bump github.com/puzpuzpuz/xsync/v4 from 4.4.0 to 4.5.0 (@dependabot[bot])
- b900a20: add aimock test harness for sticky sessions (@thushan)
- f350dcd: add language tags to skill code fences (@thushan)
- 249e915: add lmdeploy backend docs (@thushan)
- e52b561: add lmdeploy model converter and wire into factory and routes (@thushan)
- 364154e: add lmdeploy profile (@thushan)
- a06a3a9: add lmdeploy provider constants (@thushan)
- 252d0ea: add lmdeploy response parser (@thushan)
- 3fa1875: assert backend marker is non-empty in sticky tests (@thushan)
- 45f9865: clarify provider prefix helpers return no trailing slash (@thushan)
- 19b712b: doc updates and lmdeploy TPS is still in the works. (@thushan)
- ab4a786: fix lmdeploy struct alignment (@thushan)
- a57501b: fix prefix_hash fallback for empty messages array (@thushan)
- 8510341: fix sticky diversity check to ignore failed requests (@thushan)
- 3055480: fix sticky sessions for provider-scoped routes (#139) (@thushan)
- 95022d1: link lmdeploy from backend overview docs (@thushan)
- 4ac53f2: pin aimock image digest for reproducible tests (@thushan)
- bbdd5f0: readme update for native support for LMDeploy (@thushan)
- 85495d5: revert anthropic_support default + fix stale comment (@thushan)
- 7985c7c: stabilise sticky harness turn-3 diversity (@thushan)
- ac2386f: use docker compose --wait instead of custom poll (@thushan)
Documentation: thushan.github.io/olla | Issues: github.com/thushan/olla/issues
Olla v0.0.26
What's in this release
This is a bugfix release that addresses SSL connection issues due to mis-handling of the Host header.
Quick Start
# Docker
docker pull ghcr.io/thushan/olla:v0.0.26
# Binary (see assets below)
./olla --config config.yaml
Changelog
Other
- 6896157: Bump actions/upload-pages-artifact from 4 to 5 (@dependabot[bot])
- 0d3ff86: doc updates (@thushan)
- 50287bf: lets you build docker image locally without goreleaser (@thushan)
- 9505fba: make docker builds portable across arm and amd (@thushan)
Olla v0.0.25
What's in this release
Olla is a high-performance proxy and load balancer for LLM infrastructure.
Quick Start
# Docker
docker pull ghcr.io/thushan/olla:v0.0.25
# Binary (see assets below)
./olla --config config.yaml
Release Highlights
Model Aliasing
Thanks to @dnnspaul for contributing the Model Aliasing feature, which lets you alias models easily via the configuration.
Sticky Sessions
Olla now supports sticky sessions, which help keep requests aligned to KV caches across multiple endpoints. The implementation is taken from the working one in TensorFoundry's FoundryOS.
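As a rough illustration of the idea (not Olla's actual implementation, which is not described here), a sticky router can remember which endpoint served a given session key and keep sending that key to the same endpoint while it remains healthy. The class name `StickyRouter`, the notion of a session key, and the round-robin fallback are all assumptions made for this sketch.

```python
from itertools import cycle


class StickyRouter:
    """Hypothetical sticky router: pins each session key to one endpoint."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._next = cycle(self.endpoints)  # round-robin source for new sessions
        self._pinned = {}                   # session key -> pinned endpoint

    def pick(self, session_key, healthy=None):
        # Treat every configured endpoint as healthy unless told otherwise.
        healthy = set(self.endpoints if healthy is None else healthy)

        # Reuse the pinned endpoint so its warm KV cache can be hit again.
        endpoint = self._pinned.get(session_key)
        if endpoint in healthy:
            return endpoint

        # New session, or the pinned endpoint is unhealthy: re-pin to the
        # next healthy endpoint in round-robin order.
        for _ in range(len(self.endpoints)):
            candidate = next(self._next)
            if candidate in healthy:
                self._pinned[session_key] = candidate
                return candidate
        raise RuntimeError("no healthy endpoints available")


router = StickyRouter(["gpu-a", "gpu-b"])
first = router.pick("session-1")
print(first, router.pick("session-1"))  # same endpoint both times
```

Re-pinning on failure trades a cold cache for availability; a real router would also need to expire old session keys.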
Bugfixes and Chores
Lots of bugfixes and chores from March & April.
Changelog
Features
- 8378bff: feat: add model alias validation, test coverage, and byte-preserving JSON rewrite (@dnnspaul)
- pr: sticky-sessions sticky sessions implementation (@thushan)
Bug Fixes
- 33307eb: fix(inspector): copy buffer bytes before pool return to avoid aliasing (@dnnspaul)
- c54eea9: fix(inspector): incremental scan with token-level skipping for field-order independence (@dnnspaul)
- f7222c9: fix(inspector): replace io.NopCloser with readCloser to preserve body Close delegation (@dnnspaul)
- 2bbb9f7: fix(inspector): restore body on error and fix decoder state in extractTopLevelModelField (@dnnspaul)
- ed1609a: fix: address CodegRabbit review issues (@thushan)
- 8a28dbf: fix: extract model name from large requests via streaming JSON prefix scan (@dnnspaul)
- d923536: fix: propagate model alias rewrite map to translation hanlder (@thushan)
- 96b67eb: fix: replace regex model rewrite with json.Decoder token scanner (@dnnspaul)
Other
- f4104ef: + documentation (@dnnspaul)
- df441b3: - OLLA-284: move SetPurgeDeadEndpoints registration from app.go wiring time into applyStickySessions() (@thushan)
- 8b9e939: CI fix for actions etc. (@thushan)
- 7e426e9: Fix latent rarce on purgedead & replace TTL sleep with poll loop (@thushan)
- 5268939: betteralign adjustments (@dnnspaul)
- 3cbef31: bump x/sync to 0.20.0 (@thushan)
- 414e5a9: bump x/time to 0.15.0 (@thushan)
- 83712ca: concise (@thushan)
- d15a63d: doc updates (@thushan)
- afb0f25: fix leaky test (@thushan)
- 047cc92: fix test failures from typed model key and retry string trim (@thushan)
- 44999cf: fix ttlcache == 0 (@thushan)
- ca899c5: guard Cleanup against double-invoke (@thushan)
- 307b9c4: guard StopChecking against concurrent double-invoke (@thushan)
- a99a8bb: implementation(PR-98): model aliases (@dnnspaul)
- efb02fa: initial sticky sessions work (@thushan)
- 17d167f: lint test (@thushan)
- f717468: refactor(inspector): clarify pool-safety asymmetry and add capability regression guard (@dnnspaul)
- 1c209c0: refactor: address PR review feedback (round 2) for model aliases (@dnnspaul)
- 41b544e: refactor: address PR review feedback for model aliases (@dnnspaul)
- 4bbe95b: reference updates (@thushan)
- 6b9e2d0: remove dead proxyToSingleEndpointLegacy (@thushan)
- 29b6bca: remove dead responsePool (@thushan)
- 9c0debc: reorder Service fields for betteralign (@thushan)
- 6de613d: revert x/sync bump, needs go 1.25 (@thushan)
- ec5d03f: revert x/time bump, needs go 1.25 (@thushan)
- da5d8bb: tighten connection error string fallback in retry (@thushan)
- 01500b7: update coderabbit commnts (@thushan)
- 94c1784: update readme (@thushan)
- ce4aaaa: update readme (@thushan)
- 8442085: update version signature to be a bit more robust (@thushan)
- c2d9af4: use typed context key for model (@thushan)
- c9f2223: wrong template . (@thushan)
olla-v0.0.24
This is a bugfix release that fixes some agentic workloads in translator mode, fixes agent tooling, and improves logging.
What's Changed
Changelog
- 3d00f3b coderabbit recommendation
- 37c416b default to Olla proxy engine
- a0d1941 fix duplicate increment
- 857c75d fix anthropic tooling bug
- 5827fb1 flush for sherpa interface
- 7a94b70 mssing outputconfig in anthropic requests
- 7649f80 readme tweak
- f98eeae show translation mode in logs
Full Changelog: v0.0.23...v0.0.24
olla-v0.0.23
This is a major release bringing in some exciting features:
- New Backends: Docker Model Runner and vLLM-MLX
- Support for Anthropic Passthrough on supported backends (vllm etc) so we don't translate in Olla
- Documentation Refinements based on feedback
- Sensible defaults, so most users can keep a lean config file that only overrides what they need
- BUGFIX: Proxy Path (/olla/proxy) resolution issues when certain mixes of backends were present
- Additional tests for integration and pass-through (Python) for internal verification before shipping
- Security & dependency updates
What's Changed
- Bump actions/cache from 4 to 5 by @dependabot[bot] in #91
- chore: february 2026 dependency updates + CI fixes by @thushan in #101
- docs: February 2026 updates by @thushan in #102
- feat: endpoint optional by @thushan in #103
- Bump github.com/expr-lang/expr from 1.17.6 to 1.17.7 by @dependabot[bot] in #93
- feat: Anthropic Pass-through by @thushan in #105
- feat: backend/docker-model-runner by @thushan in #106
- fix: Proxy Path issue & sensible defaults by @thushan in #107
- fix: pass through failure by @thushan in #108
- feature: backend/vllm-mlx by @thushan in #109
- docs: vllm-mlx by @thushan in #110
- feat: python integration tests by @thushan in #112
- Bump github.com/expr-lang/expr from 1.17.7 to 1.17.8 by @dependabot[bot] in #111
Full Changelog: v0.0.22...v0.0.23
olla-v0.0.22
This release was largely for fixing model_url resolution but also contains some maintenance fixes.
What's Changed
- Bump golang.org/x/sync from 0.17.0 to 0.18.0 by @dependabot[bot] in #83
- fix: ensure model_url is used from endpoint config by @thushan in #88
- chore: december 2025 by @thushan in #89
- Bump actions/checkout from 5 to 6 by @dependabot[bot] in #85
- fix: Alternative method of resolving profile paths by @thushan in #90
Full Changelog: v0.0.21...v0.0.22
Changelog
- 6bf7c45 Bump actions/checkout from 5 to 6
- 83fbc91 Bump golang.org/x/sync from 0.17.0 to 0.18.0
- e2b4351 alternative way to join paths for OpenAI compatible profiles
- eebe4f1 copy paste issue
- b47df2f ensure that model_url is used from endpoint config and fallback is the profile.
- a6f8af2 feedback
- 1a3325a format
- 499a154 handle absolute URLs a bit better and expand test cases
- 72ddfcd lib update & validation
- 7428298 small refactor
- fd2733d update doc 4
olla-v0.0.21
This release helps address path translation issues (see #80) with tools like Docker Model Runner, caused by Olla's default path handling, which does not preserve paths.
We added a new setting for endpoints to instruct Olla to preserve paths when proxying requests.
- url: "http://localhost:12434/engines/llama.cpp/"
  name: "local-docker"
  type: "openai-compatible"
  priority: 100
  preserve_path: true # this way, /v1/completions will forward properly to Docker Model Runner
  model_url: "/models"
  health_check_url: "/"
  check_interval: 2s
  check_timeout: 1s
There's also a bugfix for the missing OpenAI routing (for type: openai-compatible) with refreshed profiles for OpenAI.
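To make the effect of preserve_path concrete, here is a minimal Python sketch of the two forwarding behaviours for the endpoint above. It is an illustration only, not Olla's code: the helper name `target_url` is hypothetical, and the non-preserving branch assumes the endpoint's path is lost through standard URL reference resolution, which is one plausible reading of the issue rather than something the release notes state.

```python
from urllib.parse import urljoin


def target_url(endpoint_url: str, request_path: str, preserve_path: bool) -> str:
    """Hypothetical helper: build the upstream URL for a proxied request."""
    if preserve_path:
        # Keep the endpoint's own path prefix and append the request path to it.
        return endpoint_url.rstrip("/") + "/" + request_path.lstrip("/")
    # Assumed default: plain reference resolution, where an absolute request
    # path replaces the endpoint's path entirely.
    return urljoin(endpoint_url, request_path)


endpoint = "http://localhost:12434/engines/llama.cpp/"

print(target_url(endpoint, "/v1/completions", preserve_path=True))
# http://localhost:12434/engines/llama.cpp/v1/completions

print(target_url(endpoint, "/v1/completions", preserve_path=False))
# http://localhost:12434/v1/completions
```

Under these assumptions, only the first URL keeps the /engines/llama.cpp/ prefix that the endpoint's configured url includes.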
What's Changed
Full Changelog: v0.0.20...v0.0.21
Changelog
olla-v0.0.20
This release brings back llamacpp integration and adds experimental Anthropic message support (disabled by default) at /olla/anthropic, so you can easily point Claude Code and other tools at it.
What's Changed
- feat: Backend llamacpp by @thushan in #73
- feat: anthropic / message logger (development only) by @thushan in #77
- feat: Anthropic Message format Support by @thushan in #76
- Bump github.com/pterm/pterm from 0.12.81 to 0.12.82 by @dependabot[bot] in #75
- Bump golang.org/x/time from 0.13.0 to 0.14.0 by @dependabot[bot] in #72
- prepare: v0.0.20 by @thushan in #78
Full Changelog: v0.0.19...v0.0.20
olla-v0.0.19
This release has several performance fixes (with a noticeable uplift on ARM), critical fixes for all architectures, and adds support for sglang and LemonadeSDK.
We encourage everyone to upgrade to this release.
What's Changed
- feat: backend/sglang by @thushan in #69
- feat: backend/lemonade by @thushan in #70
- fixes: October 2025 performance improvements by @thushan in #71
Full Changelog: v0.0.18...v0.0.19
Changelog
- 554b2fa GetHealthyEndpointsForModel could leak targets that no longer exist.
- 4d3e12d adds parser
- dcf3c52 adds the parser and converter
- 267dcd2 atomic catalog store
- 716e57f avoid alloc on response times
- 203ce4a cleanup
- 9cb11c9 constants for linting, will add more later
- c7a7fc9 doc refresh
- 7aeb09f documentation
- 6748a50 documentation updates
- c688fce factory too
- 6ab4a15 fixed warnings and missed sglang reference
- 16fa9d5 handler bits
- ccc8f58 hotpath: reduce allocations
- e2be222 initial SGLang work
- 4d3d3e4 initial configuration based on what's available
- 12a7d14 initial lemonade bits
- 1d65097 note about format
- 1b9ffd6 openai
- c091490 perf: avoid resolvereference call if endpoint URL has no path
- 985d8eb perf: avoid GC pressure and preallocate
- fbaece8 perf: reduce string allocations
- dcb9050 race fix: method instead of module level
- e012a30 reduce hashing and allocations
- 21de3da refactor and slightly different way to infer capabilities
- 3b19336 refactor to use benchmark
- 77a4b8c refeactor test
- 53e83a6 rune fix
- 3fcf132 slightly more complex fix to improve allocations in unified memory registry
- 319f442 update docs and make supported backends a table.
- 35b6cab update readme
- 837dc42 use map rather than MapOf (deprecated)
- f9e8a69 wire up handler too and initial profile
olla-v0.0.18
This is mostly a maintenance release that consolidates the configuration of the Sherpa and Olla proxies internally.
What's Changed
- chore: Consolidate Converters by @thushan in #58
- September 2025 updates by @thushan in #68
- Bump actions/upload-pages-artifact from 3 to 4 by @dependabot[bot] in #60
- refactor: Proxy Configurations by @thushan in #59
- Bump actions/setup-python from 5 to 6 by @dependabot[bot] in #63
- Bump actions/setup-go from 5 to 6 by @dependabot[bot] in #62
- Bump actions/configure-pages from 4 to 5 by @dependabot[bot] in #55
- Bump actions/checkout from 4 to 5 by @dependabot[bot] in #54
Changelog
- b5be024 Bump actions/checkout from 4 to 5
- 8141cb2 Bump actions/configure-pages from 4 to 5
- 720cc7c Bump actions/setup-go from 5 to 6
- 29afbd9 Bump actions/setup-python from 5 to 6
- 11c06d3 Bump actions/upload-pages-artifact from 3 to 4
- 10efb5a September 2025 updates
- 9c03ec9 cache time
- c4bbb98 fix remaining convertors
- dbf6dee initial consolidation of Proxy Configuration
- 0d27e9b introduce a base converter for conversion to avoid duplication
- d7bec85 update the olla service config and fallback too
- 6951956 update workflows.
- a83c209 use the specific settings and fallback if unavailable
Full Changelog: v0.0.17...v0.0.18