Releases · thushan/olla

27 Apr 11:11

github-actions

v0.0.27

6d6ac4d

Olla v0.0.27 Latest

What's in this release

This release adds long-awaited native support for LMDeploy, along with bugfixes for sticky sessions thanks to @lbatalha.

Quick Start

# Docker
docker pull ghcr.io/thushan/olla:v0.0.27

# Binary (see assets below)
./olla --config config.yaml

Changelog

Other

4ed9cf0: Bump github.com/puzpuzpuz/xsync/v4 from 4.4.0 to 4.5.0 (@dependabot[bot])
b900a20: add aimock test harness for sticky sessions (@thushan)
f350dcd: add language tags to skill code fences (@thushan)
249e915: add lmdeploy backend docs (@thushan)
e52b561: add lmdeploy model converter and wire into factory and routes (@thushan)
364154e: add lmdeploy profile (@thushan)
a06a3a9: add lmdeploy provider constants (@thushan)
252d0ea: add lmdeploy response parser (@thushan)
3fa1875: assert backend marker is non-empty in sticky tests (@thushan)
45f9865: clarify provider prefix helpers return no trailing slash (@thushan)
19b712b: doc updates and lmdeploy TPS is still in the works. (@thushan)
ab4a786: fix lmdeploy struct alignment (@thushan)
a57501b: fix prefix_hash fallback for empty messages array (@thushan)
8510341: fix sticky diversity check to ignore failed requests (@thushan)
3055480: fix sticky sessions for provider-scoped routes (#139) (@thushan)
95022d1: link lmdeploy from backend overview docs (@thushan)
4ac53f2: pin aimock image digest for reproducible tests (@thushan)
bbdd5f0: readme update for native support for LMDeploy (@thushan)
85495d5: revert anthropic_support default + fix stale comment (@thushan)
7985c7c: stabilise sticky harness turn-3 diversity (@thushan)
ac2386f: use docker compose --wait instead of custom poll (@thushan)

Documentation: thushan.github.io/olla | Issues: github.com/thushan/olla/issues

Contributors

thushan, lbatalha, and dependabot

Assets 11

20 Apr 22:55

github-actions

v0.0.26

ebed952

Olla v0.0.26

What's in this release

This is a bugfix release that addresses SSL connection issues caused by mishandling of the Host header.

Quick Start

# Docker
docker pull ghcr.io/thushan/olla:v0.0.26

# Binary (see assets below)
./olla --config config.yaml

Changelog

Other

6896157: Bump actions/upload-pages-artifact from 4 to 5 (@dependabot[bot])
0d3ff86: doc updates (@thushan)
50287bf: lets you build docker image locally without goreleaser (@thushan)
9505fba: make docker builds portable across arm and amd (@thushan)

Documentation: thushan.github.io/olla | Issues: github.com/thushan/olla/issues

Contributors

thushan and dependabot

Assets 11

17 Apr 02:13

github-actions

v0.0.25

0c4bfb4

Olla v0.0.25

What's in this release

Olla is a high-performance proxy and load balancer for LLM infrastructure.

Quick Start

# Docker
docker pull ghcr.io/thushan/olla:v0.0.25

# Binary (see assets below)
./olla --config config.yaml

Release Highlights

Model Aliasing

Thanks to @dnnspaul for contributing the Model Aliasing feature, which lets you alias models easily via the configuration.
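
As a rough illustration of the idea (not Olla's actual implementation or configuration schema), model aliasing can be thought of as a lookup that maps a client-facing name to the name a backend knows the model by, rewriting the top-level `model` field of the request body before it is forwarded. The alias table, the model names and the `rewrite_model` helper below are all hypothetical.

```python
import json

# Hypothetical alias table: client-facing name -> backend-specific model name.
# Olla's real configuration format may look quite different.
ALIASES = {
    "my-llama": "llama3.1:8b",
}


def rewrite_model(body: bytes, aliases: dict[str, str]) -> bytes:
    """Return the request body with its top-level "model" swapped for the
    aliased name, or the original bytes untouched if no alias applies."""
    try:
        payload = json.loads(body)
    except ValueError:
        return body  # not valid JSON (or not valid UTF-8): pass through unchanged
    if not isinstance(payload, dict):
        return body
    model = payload.get("model")
    if not isinstance(model, str) or model not in aliases:
        return body
    payload["model"] = aliases[model]
    return json.dumps(payload).encode("utf-8")
```

Note that this sketch re-serialises the whole body, so it does not preserve the original bytes. The changelog below mentions a byte-preserving rewrite built on a token scanner, which is a different and more careful technique than the one shown here.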

Sticky Sessions

Olla now supports sticky sessions, which help keep requests aligned to KV caches across multiple endpoints. The feature is adapted from the working implementation in TensorFoundry's FoundryOS.
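
To make the idea concrete, here is a minimal sketch of one way sticky routing can work: derive a stable key from the start of a conversation and hash it to pick an endpoint, so follow-up turns that share the same opening messages land where the KV cache is already warm. This is an illustration under stated assumptions, not Olla's algorithm; the function names, the choice of rendezvous hashing and the endpoint URLs used to exercise it are hypothetical.

```python
import hashlib
import json


def session_key(messages: list[dict], prefix_len: int = 1) -> str:
    """Hash the first `prefix_len` messages of a conversation.

    Later turns append messages but keep the same opening ones, so the key
    stays stable across the conversation. An empty list hashes to a fixed key.
    """
    prefix = json.dumps(messages[:prefix_len], sort_keys=True)
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


def pick_endpoint(key: str, endpoints: list[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: score each endpoint
    against the key and take the highest score. Removing one endpoint only
    remaps the keys that were pinned to it."""
    if not endpoints:
        raise ValueError("no healthy endpoints")
    return max(
        endpoints,
        key=lambda ep: hashlib.sha256(f"{key}|{ep}".encode("utf-8")).digest(),
    )
```

Calling `pick_endpoint(session_key(messages), endpoints)` on each turn of the same conversation returns the same endpoint for as long as that endpoint stays in the list.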

Bugfixes and Chores

Lots of bugfixes and chores from March & April.

Changelog

Features

8378bff: feat: add model alias validation, test coverage, and byte-preserving JSON rewrite (@dnnspaul)
pr: sticky-sessions sticky sessions implementation (@thushan)

Bug Fixes

33307eb: fix(inspector): copy buffer bytes before pool return to avoid aliasing (@dnnspaul)
c54eea9: fix(inspector): incremental scan with token-level skipping for field-order independence (@dnnspaul)
f7222c9: fix(inspector): replace io.NopCloser with readCloser to preserve body Close delegation (@dnnspaul)
2bbb9f7: fix(inspector): restore body on error and fix decoder state in extractTopLevelModelField (@dnnspaul)
ed1609a: fix: address CodeRabbit review issues (@thushan)
8a28dbf: fix: extract model name from large requests via streaming JSON prefix scan (@dnnspaul)
d923536: fix: propagate model alias rewrite map to translation handler (@thushan)
96b67eb: fix: replace regex model rewrite with json.Decoder token scanner (@dnnspaul)

Other

f4104ef: + documentation (@dnnspaul)
df441b3: - OLLA-284: move SetPurgeDeadEndpoints registration from app.go wiring time into applyStickySessions() (@thushan)
8b9e939: CI fix for actions etc. (@thushan)
7e426e9: Fix latent race on purgedead & replace TTL sleep with poll loop (@thushan)
5268939: betteralign adjustments (@dnnspaul)
3cbef31: bump x/sync to 0.20.0 (@thushan)
414e5a9: bump x/time to 0.15.0 (@thushan)
83712ca: concise (@thushan)
d15a63d: doc updates (@thushan)
afb0f25: fix leaky test (@thushan)
047cc92: fix test failures from typed model key and retry string trim (@thushan)
44999cf: fix ttlcache == 0 (@thushan)
ca899c5: guard Cleanup against double-invoke (@thushan)
307b9c4: guard StopChecking against concurrent double-invoke (@thushan)
a99a8bb: implementation(PR-98): model aliases (@dnnspaul)
efb02fa: initial sticky sessions work (@thushan)
17d167f: lint test (@thushan)
f717468: refactor(inspector): clarify pool-safety asymmetry and add capability regression guard (@dnnspaul)
1c209c0: refactor: address PR review feedback (round 2) for model aliases (@dnnspaul)
41b544e: refactor: address PR review feedback for model aliases (@dnnspaul)
4bbe95b: reference updates (@thushan)
6b9e2d0: remove dead proxyToSingleEndpointLegacy (@thushan)
29b6bca: remove dead responsePool (@thushan)
9c0debc: reorder Service fields for betteralign (@thushan)
6de613d: revert x/sync bump, needs go 1.25 (@thushan)
ec5d03f: revert x/time bump, needs go 1.25 (@thushan)
da5d8bb: tighten connection error string fallback in retry (@thushan)
01500b7: update coderabbit comments (@thushan)
94c1784: update readme (@thushan)
ce4aaaa: update readme (@thushan)
8442085: update version signature to be a bit more robust (@thushan)
c2d9af4: use typed context key for model (@thushan)
c9f2223: wrong template . (@thushan)

Documentation: thushan.github.io/olla | Issues: github.com/thushan/olla/issues

Contributors

thushan and dnnspaul

Assets 11

22 Feb 09:54

github-actions

v0.0.24

eef52ee

olla-v0.0.24

This is a bugfix release that fixes some agentic workloads in translator mode and agent tooling, and improves logging.

What's Changed

feature: Anthropic agent fixes and improvements by @thushan in #113

Changelog

3d00f3b coderabbit recommendation
37c416b default to Olla proxy engine
a0d1941 fix duplicate increment
857c75d fix anthropic tooling bug
5827fb1 flush for sherpa interface
7a94b70 missing outputconfig in anthropic requests
7649f80 readme tweak
f98eeae show translation mode in logs

Full Changelog: v0.0.23...v0.0.24

Contributors

thushan

Assets 11

20 Feb 21:36

github-actions

v0.0.23

46853b9

olla-v0.0.23

This is a major release bringing in some exciting features:

New Backends: Docker Model Runner and vLLM-MLX
Support for Anthropic Passthrough on supported backends (vllm etc) so we don't translate in Olla
Documentation Refinements based on feedback
Sensible defaults, so most users only need a lean config file of overrides
BUGFIX: Proxy path (/olla/proxy) resolution issues when certain mixes of backends were present
Additional tests for integration and pass through (python) for internal verification before shipping
Security & dependency updates

What's Changed

Bump actions/cache from 4 to 5 by @dependabot[bot] in #91
chore: february 2026 dependency updates + CI fixes by @thushan in #101
docs: February 2026 updates by @thushan in #102
feat: endpoint optional by @thushan in #103
Bump github.com/expr-lang/expr from 1.17.6 to 1.17.7 by @dependabot[bot] in #93
feat: Anthropic Pass-through by @thushan in #105
feat: backend/docker-model-runner by @thushan in #106
fix: Proxy Path issue & sensible defaults by @thushan in #107
fix: pass through failure by @thushan in #108
feature: backend/vllm-mlx by @thushan in #109
docs: vllm-mlx by @thushan in #110
feat: python integration tests by @thushan in #112
Bump github.com/expr-lang/expr from 1.17.7 to 1.17.8 by @dependabot[bot] in #111

Full Changelog: v0.0.22...v0.0.23

Contributors

thushan and dependabot

Assets 11

15 Dec 10:48

github-actions

v0.0.22

083ff79

olla-v0.0.22

This release was largely for fixing model_url resolution but also contains some maintenance fixes.

What's Changed

Bump golang.org/x/sync from 0.17.0 to 0.18.0 by @dependabot[bot] in #83
fix: ensure model_url is used from endpoint config by @thushan in #88
chore: december 2025 by @thushan in #89
Bump actions/checkout from 5 to 6 by @dependabot[bot] in #85
fix: Alternative method of resolving profile paths by @thushan in #90

Full Changelog: v0.0.21...v0.0.22

Changelog

6bf7c45 Bump actions/checkout from 5 to 6
83fbc91 Bump golang.org/x/sync from 0.17.0 to 0.18.0
e2b4351 alternative way to join paths for OpenAI compatible profiles
eebe4f1 copy paste issue
b47df2f ensure that model_url is used from endpoint config and fallback is the profile.
a6f8af2 feedback
1a3325a format
499a154 handle absolute URLs a bit better and expand test cases
72ddfcd lib update & validation
7428298 small refactor
fd2733d update doc 4

Contributors

thushan and dependabot

Assets 11

06 Nov 10:30

github-actions

v0.0.21

fd8418d

olla-v0.0.21

This release helps address path translation issues (see #80) between tools like Docker Model Runner and Olla's default path handling, which does not preserve paths.

We added a new setting for endpoints to instruct Olla to preserve paths when proxying requests.

      - url: "http://localhost:12434/engines/llama.cpp/"
        name: "local-docker"
        type: "openai-compatible"
        priority: 100
        preserve_path: true # this way, /v1/completions will forward properly to Docker Model Runner
        model_url: "/models"
        health_check_url: "/"
        check_interval: 2s
        check_timeout: 1s
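
The effect of the setting can be pictured with a small sketch. It assumes, as one plausible reading of the description above, that with `preserve_path: true` the endpoint's own base path is kept as a prefix of the forwarded request path, and that without it only the request path is used. The `build_target` function is hypothetical and is not Olla's `url_builder`.

```python
from urllib.parse import urlsplit, urlunsplit


def build_target(endpoint_url: str, request_path: str, preserve_path: bool) -> str:
    """Join an endpoint URL and an incoming request path.

    preserve_path=True keeps the endpoint's own base path as a prefix;
    preserve_path=False drops it and uses only the request path.
    """
    parts = urlsplit(endpoint_url)
    request_path = "/" + request_path.lstrip("/")
    if preserve_path:
        path = parts.path.rstrip("/") + request_path
    else:
        path = request_path
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Under these assumptions, a request for `/v1/completions` against the endpoint above would be forwarded to `http://localhost:12434/engines/llama.cpp/v1/completions` when the path is preserved, and to `http://localhost:12434/v1/completions` when it is not.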

There's also a bugfix for the missing OpenAI routing (for type: openai-compatible) with refreshed profiles for OpenAI.

What's Changed

feat: path preservation for routing in Olla by @thushan in #81

Full Changelog: v0.0.20...v0.0.21

Changelog

f523393 add preserve_path to ep configuration
0cca5e4 initial profile consolidation
3fc883c introduce url_builder to abstract out the target path building
de17e8a lint issues
971efc6 update doc
2538316 update doc 2
fd8418d update doc 3
43756ee update docs

Contributors

thushan

Assets 11

22 Oct 08:14

github-actions

v0.0.20

df959f0

olla-v0.0.20

This release brings back llamacpp integration and adds experimental Anthropic message support (disabled by default) at /olla/anthropic, so you can easily point Claude Code and other tools at it.

What's Changed

feat: Backend llamacpp by @thushan in #73
feat: anthropic / message logger (development only) by @thushan in #77
feat: Anthropic Message format Support by @thushan in #76
Bump github.com/pterm/pterm from 0.12.81 to 0.12.82 by @dependabot[bot] in #75
Bump golang.org/x/time from 0.13.0 to 0.14.0 by @dependabot[bot] in #72
prepare: v0.0.20 by @thushan in #78

Full Changelog: v0.0.19...v0.0.20

Contributors

thushan and dependabot

Assets 11

09 Oct 23:40

github-actions

v0.0.19

1b9ffd6

olla-v0.0.19

This release has several performance fixes (a noticeable uplift on ARM), critical fixes for all architectures, and adds support for sglang and LemonadeSDK.

We encourage everyone to upgrade to this release.

What's Changed

feat: backend/sglang by @thushan in #69
feat: backend/lemonade by @thushan in #70
fixes: October 2025 performance improvements by @thushan in #71

Full Changelog: v0.0.18...v0.0.19

Changelog

554b2fa GetHealthyEndpointsForModel could leak targets that no longer exist.
4d3e12d adds parser
dcf3c52 adds the parser and converter
267dcd2 atomic catalog store
716e57f avoid alloc on response times
203ce4a cleanup
9cb11c9 constants for linting, will add more later
c7a7fc9 doc refresh
7aeb09f documentation
6748a50 documentation updates
c688fce factory too
6ab4a15 fixed warnings and missed sglang reference
16fa9d5 handler bits
ccc8f58 hotpath: reduce allocations
e2be222 initial SGLang work
4d3d3e4 initial configuration based on what's available
12a7d14 initial lemonade bits
1d65097 note about format
1b9ffd6 openai
c091490 perf: avoid resolvereference call if endpoint URL has no path
985d8eb perf: avoid GC pressure and preallocate
fbaece8 perf: reduce string allocations
dcb9050 race fix: method instead of module level
e012a30 reduce hashing and allocations
21de3da refactor and slightly different way to infer capabilities
3b19336 refactor to use benchmark
77a4b8c refactor test
53e83a6 rune fix
3fcf132 slightly more complex fix to improve allocations in unified memory registry
319f442 update docs and make supported backends a table.
35b6cab update readme
837dc42 use map rather than MapOf (deprecated)
f9e8a69 wire up handler too and initial profile

Contributors

thushan

Assets 11

23 Sep 12:04

github-actions

v0.0.18

d2bc4af

olla-v0.0.18

This is mostly a maintenance release and includes internal consolidation of the configuration for the Sherpa and Olla proxies.

What's Changed

chore: Consolidate Converters by @thushan in #58
September 2025 updates by @thushan in #68
Bump actions/upload-pages-artifact from 3 to 4 by @dependabot[bot] in #60
refactor: Proxy Configurations by @thushan in #59
Bump actions/setup-python from 5 to 6 by @dependabot[bot] in #63
Bump actions/setup-go from 5 to 6 by @dependabot[bot] in #62
Bump actions/configure-pages from 4 to 5 by @dependabot[bot] in #55
Bump actions/checkout from 4 to 5 by @dependabot[bot] in #54

Changelog

b5be024 Bump actions/checkout from 4 to 5
8141cb2 Bump actions/configure-pages from 4 to 5
720cc7c Bump actions/setup-go from 5 to 6
29afbd9 Bump actions/setup-python from 5 to 6
11c06d3 Bump actions/upload-pages-artifact from 3 to 4
10efb5a September 2025 updates
9c03ec9 cache time
c4bbb98 fix remaining convertors
dbf6dee initial consolidation of Proxy Configuration
0d27e9b introduce a base converter for conversion to avoid duplication
d7bec85 update the olla service config and fallback too
6951956 update workflows.
a83c209 use the specific settings and fallback if unavailable

Full Changelog: v0.0.17...v0.0.18

Contributors

thushan and dependabot

Assets 11
