[OPIK-3358] [SDK] [Docs] Harbor integration #4370

jverre · 2025-12-05T17:58:56Z

Details

This PR adds a comprehensive Opik integration for Harbor, a benchmark evaluation framework for autonomous LLM agents. The integration enables real-time observability for agent benchmark evaluations (SWE-bench, LiveCodeBench, Terminal-Bench, etc.).

Key features:

Python SDK Integration (opik.integrations.harbor):

track_harbor(job) - Wraps Harbor Job instances with Opik tracking
enable_tracking() - Global tracking enablement for Harbor Trial and Verifier classes
Automatic patching of Harbor's Step class for real-time trajectory step tracking
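
The description above mentions patching a class so that steps are tracked as they are created. As a rough sketch of that general technique only (not the code in opik_tracker.py), the snippet below wraps a class constructor with a callback. The `Step` class, `patch_step_init`, and the `_tracking_patched` flag are hypothetical stand-ins, not Harbor or Opik APIs.

```python
import functools


class Step:
    """Hypothetical stand-in for a trajectory step; NOT Harbor's real Step class."""

    def __init__(self, step_id, message):
        self.step_id = step_id
        self.message = message


recorded_steps = []


def patch_step_init(cls, on_step):
    """Wrap cls.__init__ so on_step(instance) runs after every construction."""
    if getattr(cls, "_tracking_patched", False):
        return  # already patched; keep the patch idempotent
    original_init = cls.__init__

    @functools.wraps(original_init)
    def tracked_init(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        on_step(self)  # report the step as soon as it exists

    cls.__init__ = tracked_init
    cls._tracking_patched = True


patch_step_init(Step, recorded_steps.append)
patch_step_init(Step, recorded_steps.append)  # second call is a no-op
Step(1, "ls -la")
Step(2, "pytest")
```

Guarding with a flag keeps a second patch call from wrapping the constructor twice, which would otherwise report every step more than once.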

CLI Integration (opik harbor):

New CLI command that wraps Harbor CLI with automatic Opik tracking
Supports all Harbor subcommands (run, jobs start, trials start, etc.)
Usage: opik harbor run -d terminal-bench@head -a terminus_2 -m gpt-4.1
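
To illustrate what "wrapping" a CLI can mean here, the sketch below enables tracking first and then forwards the untouched arguments to the wrapped command. It is an assumption about the general shape, not the contents of sdks/python/src/opik/cli/harbor.py; `run_wrapped_cli`, `fake_enable`, and `fake_harbor_cli` are hypothetical names.

```python
from typing import Callable, List, Sequence


def run_wrapped_cli(
    argv: Sequence[str],
    enable_tracking: Callable[[], None],
    delegate: Callable[[List[str]], int],
) -> int:
    """Enable tracking, then hand the arguments to the wrapped CLI unchanged."""
    enable_tracking()
    return delegate(list(argv))


calls = []


def fake_enable() -> None:
    # Stand-in for a global tracking switch (hypothetical).
    calls.append("enable_tracking")


def fake_harbor_cli(args: List[str]) -> int:
    # Stand-in for the wrapped CLI entry point (hypothetical).
    calls.append(("harbor", args))
    return 0


exit_code = run_wrapped_cli(
    ["run", "-d", "terminal-bench@head", "-a", "terminus_2", "-m", "gpt-4.1"],
    fake_enable,
    fake_harbor_cli,
)
```

Passing the argument list through unmodified is what lets a wrapper support every subcommand without knowing about each one.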

Data Mapping:

Trial results → Opik traces with timing, metadata, and agent/model info
ATIF trajectory steps → Nested spans with tool calls, observations, token usage, and costs
Verifier rewards → Feedback scores (e.g., pass/fail, tests_passed)
Automatic experiment creation linking trials to datasets per benchmark source
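
The mapping above can be pictured with a small, self-contained sketch. The `Trace` and `Span` dataclasses and every dictionary key below (`trial_name`, `steps`, `rewards`, and so on) are hypothetical and chosen only for illustration; they are not the Opik or Harbor data models, and spans are kept flat rather than nested for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Span:
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Trace:
    name: str
    metadata: Dict[str, Any]
    spans: List[Span] = field(default_factory=list)
    feedback_scores: Dict[str, float] = field(default_factory=dict)


def map_trial(trial: Dict[str, Any]) -> Trace:
    """Turn one trial result into a trace with spans and feedback scores."""
    trace = Trace(
        name=trial["trial_name"],
        metadata={"agent": trial["agent"], "model": trial["model"]},
    )
    # Each trajectory step becomes one span.
    for step in trial.get("steps", []):
        trace.spans.append(
            Span(
                name=f"step_{step['step_id']}",
                metadata={
                    "tool_calls": step.get("tool_calls", []),
                    "total_tokens": step.get("total_tokens"),
                },
            )
        )
    # Each verifier reward becomes one named feedback score.
    for name, value in trial.get("rewards", {}).items():
        trace.feedback_scores[name] = float(value)
    return trace


example_trial = {
    "trial_name": "task-1__abc",
    "agent": "terminus_2",
    "model": "gpt-4.1",
    "steps": [
        {"step_id": 1, "tool_calls": ["bash"], "total_tokens": 120},
        {"step_id": 2, "tool_calls": [], "total_tokens": 80},
    ],
    "rewards": {"pass": 1, "tests_passed": 0.5},
}
trace = map_trial(example_trial)
```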

Change checklist

User facing
Documentation update

Issues

Resolves #
OPIK-3358

Testing

Tested manually and created library integration tests

Documentation

Documentation was updated

Copilot

Pull request overview

This PR adds a comprehensive integration between Opik and Harbor, a benchmark evaluation framework for autonomous LLM agents. The integration enables real-time observability for agent benchmark evaluations (SWE-bench, LiveCodeBench, Terminal-Bench, etc.) by tracking trials as traces, trajectory steps as spans, and verifier rewards as feedback scores.

Key changes:

Added Python SDK integration with track_harbor() and enable_tracking() functions
Added opik harbor CLI command that wraps Harbor CLI with automatic tracking
Added comprehensive documentation and examples

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 7 comments.

File	Description
sdks/python/src/opik/integrations/harbor/opik_tracker.py	Core integration logic - patches Harbor's Trial/Verifier classes and Step class for real-time tracking
sdks/python/src/opik/integrations/harbor/experiment_service.py	Manages datasets and experiments for Harbor jobs, links trial traces to experiments
sdks/python/src/opik/cli/harbor.py	CLI command that wraps Harbor CLI with automatic Opik tracking
sdks/python/tests/e2e_library_integration/harbor/test_harbor_e2e.py	E2E tests for both SDK and CLI integration
sdks/python/examples/harbor_integration_example.py	Usage example showing how to track Harbor jobs
apps/opik-documentation/documentation/fern/docs/tracing/integrations/harbor.mdx	Complete integration documentation with API reference and examples
README.md and localized READMEs	Added Harbor to integrations table

github-actions · 2025-12-05T18:00:48Z

🌿 Preview your docs: https://opik-preview-46bb02b1-87bc-4e70-824c-9ff9228c1bf7.docs.buildwithfern.com/docs/opik

No broken links found

github-actions · 2025-12-05T18:01:30Z

Images automagically compressed by Calibre's image-actions ✨

Compression reduced images by 40.2%, saving 127.85 KB.

Filename	Before	After	Improvement	Visual comparison
`apps/opik-documentation/documentation/fern/img/tracing/harbor_integration.png`	317.85 KB	190.00 KB	-40.2%	View diff

332 images did not require optimisation.

Update required: Update image-actions configuration to the latest version before 1/1/21. See README for instructions.

github-actions · 2025-12-05T21:01:14Z

🌿 Preview your docs: https://opik-preview-dfc28fc9-d106-4976-9100-81bffa000e22.docs.buildwithfern.com/docs/opik

No broken links found

github-actions · 2025-12-05T21:09:00Z

🌿 Preview your docs: https://opik-preview-a17f443f-0127-4a04-8df5-5f903b60ef95.docs.buildwithfern.com/docs/opik

The following broken links were found:

Page:
❌ Broken link: {} ()

Page:
❌ Broken link: ] ()

github-actions · 2025-12-08T11:06:36Z

🌿 Preview your docs: https://opik-preview-25135266-e318-46e2-af7d-2256e68043d6.docs.buildwithfern.com/docs/opik

No broken links found

sdks/python/src/opik/integrations/harbor/experiment_service.py

Lothiraldan · 2025-12-08T15:21:03Z

It would be useful to also save the following information from the dataset:

The version of the dataset used, although I am not sure where to save it.
Additional task data such as git_url, git_commit, and path, which could help detect whether a task definition has changed between two runs.
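
One possible way to act on the second point, sketched under assumptions: hash the three fields named above into a fingerprint that can be stored with each dataset item and compared across runs. `task_fingerprint` is a hypothetical helper and is not part of experiment_service.py; only the field names git_url, git_commit, and path come from the comment.

```python
import hashlib
import json


def task_fingerprint(task: dict) -> str:
    """Hash the fields that identify a task definition (hypothetical helper)."""
    keys = ("git_url", "git_commit", "path")
    payload = {key: task.get(key) for key in keys}
    # sort_keys gives a stable serialization regardless of dict ordering.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_a = {"git_url": "https://example.com/tasks.git", "git_commit": "abc123", "path": "tasks/t1"}
run_b = {"git_url": "https://example.com/tasks.git", "git_commit": "def456", "path": "tasks/t1"}

same = task_fingerprint(run_a) == task_fingerprint(dict(run_a, extra="ignored"))
changed = task_fingerprint(run_a) != task_fingerprint(run_b)
```

Fields outside the three keys are ignored, so unrelated metadata does not make a task look changed.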

github-actions · 2025-12-09T09:33:25Z

SDK Unit Tests Results

0 tests 0 ✅ 0s ⏱️
0 suites 0 💤
0 files 0 ❌

Results for commit e3419e0.

♻️ This comment has been updated with latest results.

github-actions · 2025-12-09T09:34:06Z

🌿 Preview your docs: https://opik-preview-f794d068-78b5-49b2-a52b-a1d643c50ff8.docs.buildwithfern.com/docs/opik

No broken links found

📌 Results for commit 85fc76e

github-actions · 2025-12-09T09:39:20Z

SDK E2E Tests Results

0 tests 0 ✅ 0s ⏱️
0 suites 0 💤
0 files 0 ❌

Results for commit e3419e0.

♻️ This comment has been updated with latest results.

Harbor integration

f06e077

Copilot AI review requested due to automatic review settings December 5, 2025 17:58

jverre requested review from a team as code owners December 5, 2025 17:58

github-actions bot assigned jverre Dec 5, 2025

github-actions bot added the documentation, dependencies, python, Python SDK, and tests labels Dec 5, 2025

Copilot AI reviewed Dec 5, 2025

Optimised images with calibre/image-actions

8edf019

jverre added 2 commits December 5, 2025 20:50

Integration update

508dd8c

t pushMerge branch 'jacques/harbor-integration' of github.com:comet-m…

151b7e4

…l/opik into jacques/harbor-integration

Address tests

faf4bdb

Fix linter

0106fc7

Lothiraldan reviewed Dec 8, 2025

sdks/python/src/opik/integrations/harbor/experiment_service.py (outdated, resolved)

sdks/python/src/opik/integrations/harbor/experiment_service.py (outdated, resolved)

Lothiraldan previously approved these changes Dec 8, 2025

Update Harbor docs

e3419e0

jverre dismissed Lothiraldan’s stale review via e3419e0 December 9, 2025 09:31

alexkuzmik requested changes Dec 9, 2025

jverre added 3 commits December 9, 2025 21:33

Merge branch 'main' into jacques/harbor-integration

b7eaad2

Address comments from review

6fcf5ff

Address comments from review

d8c162d

4 participants
