Releases: thushan/olla

Olla v0.0.27

27 Apr 11:11
6d6ac4d

What's in this release

We've finally added native support for LMDeploy, as well as bugfixes for sticky sessions, thanks to @lbatalha.

Quick Start

# Docker
docker pull ghcr.io/thushan/olla:v0.0.27

# Binary (see assets below)
./olla --config config.yaml

Documentation: thushan.github.io/olla | Issues: github.com/thushan/olla/issues

Olla v0.0.26

20 Apr 22:55
ebed952

What's in this release

This is a bugfix release that addresses SSL connection issues caused by mishandling of the Host header.

Quick Start

# Docker
docker pull ghcr.io/thushan/olla:v0.0.26

# Binary (see assets below)
./olla --config config.yaml

Olla v0.0.25

17 Apr 02:13
0c4bfb4

What's in this release

Olla is a high-performance proxy and load balancer for LLM infrastructure.

Quick Start

# Docker
docker pull ghcr.io/thushan/olla:v0.0.25

# Binary (see assets below)
./olla --config config.yaml

Release Highlights

Model Aliasing

Thanks to @dnnspaul for contributing the Model Aliasing feature, which lets you alias models easily via the configuration.

Sticky Sessions

Olla now supports sticky sessions, which help keep requests aligned to KV caches across multiple endpoints. The feature is adapted from the working implementation in TensorFoundry's FoundryOS.
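As a rough illustration of the idea (not Olla's actual implementation), session affinity can be sketched by hashing a session key onto the list of endpoints, so repeated requests with the same key reach the same backend and its warm KV cache. The `pickEndpoint` helper, the session key and the endpoint URLs below are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickEndpoint maps a session key onto one of the endpoints so that
// repeated requests with the same key land on the same backend.
// Hypothetical helper: simple hash-modulo affinity, not Olla's algorithm.
func pickEndpoint(sessionKey string, endpoints []string) string {
	if len(endpoints) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(sessionKey)) // Write on a hash.Hash never returns an error
	return endpoints[int(h.Sum32()%uint32(len(endpoints)))]
}

func main() {
	// Made-up endpoint URLs for illustration only.
	endpoints := []string{"http://gpu-a:8000", "http://gpu-b:8000"}
	first := pickEndpoint("session-123", endpoints)
	second := pickEndpoint("session-123", endpoints)
	fmt.Println(first == second) // true: same key, same endpoint
}
```

Plain hash-modulo remaps many sessions whenever the endpoint list changes; a real implementation would likely also need to handle unhealthy endpoints and session expiry.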

Bugfixes and Chores

Lots of bugfixes and chores from March & April.

Changelog

Bug Fixes

  • 33307eb: fix(inspector): copy buffer bytes before pool return to avoid aliasing (@dnnspaul)
  • c54eea9: fix(inspector): incremental scan with token-level skipping for field-order independence (@dnnspaul)
  • f7222c9: fix(inspector): replace io.NopCloser with readCloser to preserve body Close delegation (@dnnspaul)
  • 2bbb9f7: fix(inspector): restore body on error and fix decoder state in extractTopLevelModelField (@dnnspaul)
  • ed1609a: fix: address CodeRabbit review issues (@thushan)
  • 8a28dbf: fix: extract model name from large requests via streaming JSON prefix scan (@dnnspaul)
  • d923536: fix: propagate model alias rewrite map to translation handler (@thushan)
  • 96b67eb: fix: replace regex model rewrite with json.Decoder token scanner (@dnnspaul)

olla-v0.0.24

22 Feb 09:54
eef52ee

This is a bugfix release that fixes some agentic workloads in translator mode and agent tooling, and improves logging.

What's Changed

  • feature: Anthropic agent fixes and improvements by @thushan in #113

Changelog

  • 3d00f3b coderabbit recommendation
  • 37c416b default to Olla proxy engine
  • a0d1941 fix duplicate increment
  • 857c75d fix anthropic tooling bug
  • 5827fb1 flush for sherpa interface
  • 7a94b70 missing outputconfig in anthropic requests
  • 7649f80 readme tweak
  • f98eeae show translation mode in logs

Full Changelog: v0.0.23...v0.0.24

olla-v0.0.23

20 Feb 21:36
46853b9

This is a major release bringing in some exciting features:

  • New Backends: Docker Model Runner and vLLM-MLX
  • Support for Anthropic Passthrough on supported backends (vLLM etc.), so Olla doesn't need to translate requests
  • Documentation refinements based on feedback
  • Sensible defaults, so most users can keep a lean config file and override only what they need
  • BUGFIX: Proxy path (/olla/proxy) resolution issues when certain mixes of backends were present
  • Additional integration and passthrough tests (Python) for internal verification before shipping
  • Security & dependency updates

What's Changed

Full Changelog: v0.0.22...v0.0.23

olla-v0.0.22

15 Dec 10:48
083ff79

This release was largely for fixing model_url resolution but also contains some maintenance fixes.

What's Changed

  • Bump golang.org/x/sync from 0.17.0 to 0.18.0 by @dependabot[bot] in #83
  • fix: ensure model_url is used from endpoint config by @thushan in #88
  • chore: december 2025 by @thushan in #89
  • Bump actions/checkout from 5 to 6 by @dependabot[bot] in #85
  • fix: Alternative method of resolving profile paths by @thushan in #90

Full Changelog: v0.0.21...v0.0.22

Changelog

  • 6bf7c45 Bump actions/checkout from 5 to 6
  • 83fbc91 Bump golang.org/x/sync from 0.17.0 to 0.18.0
  • e2b4351 alternative way to join paths for OpenAI compatible profiles
  • eebe4f1 copy paste issue
  • b47df2f ensure that model_url is used from endpoint config and fallback is the profile.
  • a6f8af2 feedback
  • 1a3325a format
  • 499a154 handle absolute URLs a bit better and expand test cases
  • 72ddfcd lib update & validation
  • 7428298 small refactor
  • fd2733d update doc 4

olla-v0.0.21

06 Nov 10:30
fd8418d

This release helps address path translation issues (see #80) with tools like Docker Model Runner, which arise because Olla does not preserve endpoint paths by default.

We added a new setting for endpoints to instruct Olla to preserve paths when proxying requests.

      - url: "http://localhost:12434/engines/llama.cpp/"
        name: "local-docker"
        type: "openai-compatible"
        priority: 100
        preserve_path: true # this way, /v1/completions will forward properly to Docker Model Runner
        model_url: "/models"
        health_check_url: "/"
        check_interval: 2s
        check_timeout: 1s
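To make the effect of the setting concrete, here is a minimal sketch of how the forwarded URL might be built with and without path preservation. It assumes that without preserve_path the endpoint's base path is dropped, and that with it the request path is appended to the base path. The `targetURL` helper is hypothetical and not taken from Olla's source.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// targetURL builds the upstream URL for a request path.
// Hypothetical helper; the exact semantics in Olla are an assumption here.
func targetURL(endpoint, requestPath string, preservePath bool) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	reqPath := strings.TrimPrefix(requestPath, "/")
	if preservePath {
		// Keep the endpoint's base path and append the request path to it.
		u.Path = strings.TrimSuffix(u.Path, "/") + "/" + reqPath
	} else {
		// Assumed default: the endpoint's base path is discarded.
		u.Path = "/" + reqPath
	}
	return u.String(), nil
}

func main() {
	endpoint := "http://localhost:12434/engines/llama.cpp/"
	for _, preserve := range []bool{false, true} {
		target, err := targetURL(endpoint, "/v1/completions", preserve)
		if err != nil {
			panic(err)
		}
		fmt.Println(preserve, target)
	}
	// false http://localhost:12434/v1/completions
	// true http://localhost:12434/engines/llama.cpp/v1/completions
}
```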

There's also a bugfix for the missing OpenAI routing (for type: openai-compatible) with refreshed profiles for OpenAI.

What's Changed

  • feat: path preservation for routing in Olla by @thushan in #81

Full Changelog: v0.0.20...v0.0.21

olla-v0.0.20

22 Oct 08:14
df959f0

This release brings back llamacpp integration and adds experimental Anthropic message support (disabled by default) at /olla/anthropic, so you can easily point Claude Code and other tools at Olla.

What's Changed

Full Changelog: v0.0.19...v0.0.20

olla-v0.0.19

09 Oct 23:40
1b9ffd6

This release has several performance fixes (with a noticeable uplift on ARM), critical fixes for all architectures, and adds support for SGLang and LemonadeSDK.

We encourage everyone to upgrade to this release.

What's Changed

Full Changelog: v0.0.18...v0.0.19

Changelog

  • 554b2fa GetHealthyEndpointsForModel could leak targets that no longer exist.
  • 4d3e12d adds parser
  • dcf3c52 adds the parser and converter
  • 267dcd2 atomic catalog store
  • 716e57f avoid alloc on response times
  • 203ce4a cleanup
  • 9cb11c9 constants for linting, will add more later
  • c7a7fc9 doc refresh
  • 7aeb09f documentation
  • 6748a50 documentation updates
  • c688fce factory too
  • 6ab4a15 fixed warnings and missed sglang reference
  • 16fa9d5 handler bits
  • ccc8f58 hotpath: reduce allocations
  • e2be222 initial SGLang work
  • 4d3d3e4 initial configuration based on what's available
  • 12a7d14 initial lemonade bits
  • 1d65097 note about format
  • 1b9ffd6 openai
  • c091490 perf: avoid resolvereference call if endpoint URL has no path
  • 985d8eb perf: avoid GC pressure and preallocate
  • fbaece8 perf: reduce string allocations
  • dcb9050 race fix: method instead of module level
  • e012a30 reduce hashing and allocations
  • 21de3da refactor and slightly different way to infer capabilities
  • 3b19336 refactor to use benchmark
  • 77a4b8c refactor test
  • 53e83a6 rune fix
  • 3fcf132 slightly more complex fix to improve allocations in unified memory registry
  • 319f442 update docs and make supported backends a table.
  • 35b6cab update readme
  • 837dc42 use map rather than MapOf (deprecated)
  • f9e8a69 wire up handler too and initial profile

olla-v0.0.18

23 Sep 12:04
d2bc4af

This is mostly a maintenance release; it consolidates the internal configuration of the Sherpa and Olla proxies.

Changelog

  • b5be024 Bump actions/checkout from 4 to 5
  • 8141cb2 Bump actions/configure-pages from 4 to 5
  • 720cc7c Bump actions/setup-go from 5 to 6
  • 29afbd9 Bump actions/setup-python from 5 to 6
  • 11c06d3 Bump actions/upload-pages-artifact from 3 to 4
  • 10efb5a September 2025 updates
  • 9c03ec9 cache time
  • c4bbb98 fix remaining convertors
  • dbf6dee initial consolidation of Proxy Configuration
  • 0d27e9b introduce a base converter for conversion to avoid duplication
  • d7bec85 update the olla service config and fallback too
  • 6951956 update workflows.
  • a83c209 use the specific settings and fallback if unavailable

Full Changelog: v0.0.17...v0.0.18