docs: document Ollama + record demo against gemma3 #24
Merged
- README: add a dedicated Ollama section under "LLM provider examples"
with config, model-tag note, apiKey quirk (Ollama ignores the header
but the OpenAI client must still send a non-empty Bearer string),
Docker host networking, and a small-model edit caveat.
- demo.gif / demo.png: regenerate against a real local Ollama server
(gemma3:latest) instead of the previous mock. The chat exchange now
shows gemma3's own response ("✅ Smoother phrasing") and the edit
("we'd like … during playtime") attributed to the AI author colour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
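The apiKey quirk above can be demonstrated with a plain HTTP request against Ollama's OpenAI-compatible endpoint. A minimal sketch, assuming Ollama's default port 11434 (from inside a Docker container, `localhost` would need to be the host's address, hence the host-networking note); the command is assembled and printed rather than sent, so it is safe to run without a live server:

```shell
# Assemble (but do not send) a chat request against Ollama's
# OpenAI-compatible endpoint.  "Bearer ollama" is an arbitrary
# non-empty token: Ollama ignores it, but OpenAI-style clients
# refuse to issue a request with an empty key.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
REQ="curl -s $OLLAMA_URL/v1/chat/completions \
  -H 'Authorization: Bearer ollama' \
  -H 'Content-Type: application/json' \
  -d '{\"model\": \"gemma3:latest\", \"messages\": [{\"role\": \"user\", \"content\": \"hi\"}]}'"
echo "$REQ"    # eval "$REQ" would actually send it
```

Any non-empty string works in place of `ollama`; the value is never checked server-side.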
Force-pushed 86d19cd to d65622a
The Ollama+gemma3 take was 49 s end-to-end, but ~28 s of that was
just "Thinking..." while the local model produced a reply — dead
air for a viewer. Re-encode the same recording through ffmpeg's
mpdecimate filter to drop near-duplicate frames (the static
Thinking... segment), repack timestamps with setpts, then append a
2.5 s freeze on the final frame so the loop seam is obvious.
Result: 9.1 s loop, 924 KB GIF (down from 49 s / 2.6 MB). Alice's
typing is still visible at full pace — mpdecimate's default
thresholds drop frames where almost no pixels changed, so the
typing animation survives but the spinner-dwells collapse. Final
still is the post-edit state: surgical attribution clearly visible
("we'd like" and "during" are AI-purple, surrounding words stay in
Alice's colour, Bob's line untouched in pink).
This was a one-shot post-process on the cached webm rather than a
re-record, so it doesn't burn another LLM round-trip. The
build-ep_ai_chat-demo.sh wrapper can fold the same mpdecimate
filter into its ffmpeg invocation as a follow-up.
Replaces the gemma3:4b take with one driven by qwen2.5:1.5b — a
~5× smaller model that responds in ~5 s on CPU vs ~30 s for
gemma3, so subsequent rebuilds will be faster end-to-end. After
the mpdecimate trim the final GIF length is essentially identical
to the gemma3 take (8.7 s vs 9.1 s; mpdecimate eats the dead air
either way), so the speed win is in the developer loop, not the
recorded artefact.
What qwen2.5 chooses to do is different from gemma3, though:
where gemma3 surgically tweaked Alice's body sentence
("we would love" -> "we'd like", "at" -> "during"), qwen2.5
rewrites the title line into a longer formal opener
("Letter to the Football Coach" -> "Dear Coach. Please find below
a proposal for increased football practice at playtime."). The
surgical-attribution story is still visible (Alice's body line and
Bob's sentence stay in their original colours; only the rewritten
title is AI-purple), it's just less of a "look how few words
changed" moment because the title has fewer shared tokens between
FIND and REPLACE.
GIF 874 KB / 8.7 s; PNG 146 KB. Same mpdecimate filter chain as
the previous trim:

```
mpdecimate=hi=2304:lo=512:frac=0.05,setpts=N/FRAME_RATE/TB,
scale=900:-1:flags=lanczos,fps=12,
tpad=stop_duration=2.5:stop_mode=clone
```
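Folded into a single pass, the invocation might look like the sketch below. File names are placeholders, and the command is printed rather than executed so the sketch runs without ffmpeg installed:

```shell
# The chain from above as one ffmpeg pass: mpdecimate drops
# near-duplicate frames, setpts rewrites timestamps so playback
# stays smooth after the drops, scale/fps shrink the GIF, and tpad
# clones the final frame for 2.5 s so the loop seam is obvious.
VF='mpdecimate=hi=2304:lo=512:frac=0.05,setpts=N/FRAME_RATE/TB,scale=900:-1:flags=lanczos,fps=12,tpad=stop_duration=2.5:stop_mode=clone'
CMD="ffmpeg -i demo.webm -vf \"$VF\" demo.gif"
echo "$CMD"    # run manually, or eval "$CMD" with ffmpeg on PATH
```

Dropping this into the build-ep_ai_chat-demo.sh wrapper would just mean adding the `-vf "$VF"` argument to its existing ffmpeg call.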
JohnMcLear added a commit that referenced this pull request on May 3, 2026
The take that landed in PR #24 still showed Bob's "Many of us play..." sentence visibly indented by one column because the steps file typed it with a literal leading space. The cleanup that used to strip it was a property of the ep_ai_chat-specific *mock* LLM, not of the plugin — once the demo switched to a real LLM (gemma3:4b), nothing was removing the space, and the visual implied an asymmetry the plugin isn't actually creating.

Fix is upstream of the AI: drop the leading space from Bob's typed string in steps-ep_ai_chat.mjs (Alice's Enter already gives a clean line break, so the defensive space was never necessary). Re-record on top of that.

Result: Bob's line now sits flush against the line-number gutter as intended, and the rest of the demo's story is unchanged — gemma3 surgically tightens Alice's sentence ("we would love to play more football at playtime" -> "we'd appreciate more football time at playtime"), the changed words are AI-purple, the surrounding words stay in Alice's colour, Bob's contribution is untouched.

GIF 1.1 MB / 9.5 s; PNG 135 KB. Same mpdecimate trim filter chain as PR #24's final commit.
Summary
- README: new "Ollama" section under "LLM provider examples" covering config, example model tags (gemma3:latest, llama3.1:8b, …), the apiKey quirk (Ollama ignores the bearer but the OpenAI client must still send a non-empty string), Docker host networking, and a caveat that 3–4B local models occasionally drift off the JSON edit format.
- demo.gif / demo.png regenerated against a real local Ollama server running gemma3:latest instead of the previous mock LLM. The chat exchange in the gif now shows gemma3's own reply ("✅ Smoother phrasing") and an edit ("we'd like … during playtime") that's actually generated by the model, not a scripted response.
- The internal _demo-gif build pipeline (kept outside this repo) was rewired in the same pass: pre-flight checks Ollama via /api/tags, warms the model, waits for Etherpad's dev-mode /p/:pad route handler to register before recording, and the steps file now waits on a non-placeholder "AI Assistant" chat reply (instead of the mock's literal "Tightened…" string).

Test plan

_demo-gif/build-ep_ai_chat-demo.sh runs clean against a local Ollama (gemma3:latest) and writes both assets (verified locally — 51 s recording, gemma3 ~13 s for the edit call).

🤖 Generated with Claude Code