[OPIK-3358] [SDK] [Docs] Harbor integration #4370
base: main
Pull request overview
This PR adds a comprehensive integration between Opik and Harbor, a benchmark evaluation framework for autonomous LLM agents. The integration enables real-time observability for agent benchmark evaluations (SWE-bench, LiveCodeBench, Terminal-Bench, etc.) by tracking trials as traces, trajectory steps as spans, and verifier rewards as feedback scores.
Key changes:
- Added Python SDK integration with `track_harbor()` and `enable_tracking()` functions
- Added `opik harbor` CLI command that wraps Harbor CLI with automatic tracking
- Added comprehensive documentation and examples
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sdks/python/src/opik/integrations/harbor/opik_tracker.py | Core integration logic - patches Harbor's Trial/Verifier classes and Step class for real-time tracking |
| sdks/python/src/opik/integrations/harbor/experiment_service.py | Manages datasets and experiments for Harbor jobs, links trial traces to experiments |
| sdks/python/src/opik/cli/harbor.py | CLI command that wraps Harbor CLI with automatic Opik tracking |
| sdks/python/tests/e2e_library_integration/harbor/test_harbor_e2e.py | E2E tests for both SDK and CLI integration |
| sdks/python/examples/harbor_integration_example.py | Usage example showing how to track Harbor jobs |
| apps/opik-documentation/documentation/fern/docs/tracing/integrations/harbor.mdx | Complete integration documentation with API reference and examples |
| README.md and localized READMEs | Added Harbor to integrations table |
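The table describes `opik_tracker.py` as patching Harbor's classes for real-time tracking. A minimal sketch of that general class-patching pattern follows. It is an illustration only: the `Trial` class here is a hypothetical stand-in rather than Harbor's real class, `RECORDED` is an in-memory list standing in for a tracing backend, and `enable_tracking_sketch` is an invented name, not the function added in this PR.

```python
import functools
from typing import Any, Callable, Dict, List


class Trial:
    """Hypothetical stand-in for a benchmark trial; not Harbor's real API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self) -> Dict[str, Any]:
        return {"trial": self.name, "reward": 1.0}


# In-memory sink standing in for a tracing backend (assumption).
RECORDED: List[Dict[str, Any]] = []

_PATCHED_FLAG = "_tracking_patched"


def _wrap(method: Callable[..., Any]) -> Callable[..., Any]:
    """Return a wrapper that calls the original method, then records its result."""

    @functools.wraps(method)
    def wrapper(self: Trial, *args: Any, **kwargs: Any) -> Any:
        result = method(self, *args, **kwargs)
        RECORDED.append({"name": self.name, "output": result})
        return result

    # Mark the wrapper so a second patch attempt can be detected.
    setattr(wrapper, _PATCHED_FLAG, True)
    return wrapper


def enable_tracking_sketch() -> None:
    """Patch Trial.run once; repeated calls are no-ops."""
    if getattr(Trial.run, _PATCHED_FLAG, False):
        return
    Trial.run = _wrap(Trial.run)  # type: ignore[method-assign]
```

Guarding the patch with a flag keeps the call idempotent, so enabling tracking twice does not record each trial twice.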
🌿 Preview your docs: https://opik-preview-46bb02b1-87bc-4e70-824c-9ff9228c1bf7.docs.buildwithfern.com/docs/opik
No broken links found

Images automagically compressed by Calibre's image-actions ✨
Compression reduced images by 40.2%, saving 127.85 KB. 332 images did not require optimisation.
Update required: Update image-actions configuration to the latest version before 1/1/21. See README for instructions.

…l/opik into jacques/harbor-integration

🌿 Preview your docs: https://opik-preview-dfc28fc9-d106-4976-9100-81bffa000e22.docs.buildwithfern.com/docs/opik
No broken links found

🌿 Preview your docs: https://opik-preview-a17f443f-0127-4a04-8df5-5f903b60ef95.docs.buildwithfern.com/docs/opik
The following broken links were found: Page: Page:

🌿 Preview your docs: https://opik-preview-25135266-e318-46e2-af7d-2256e68043d6.docs.buildwithfern.com/docs/opik
No broken links found

It would be useful to also save the following information from the dataset:

SDK Unit Tests Results: 0 tests, 0 ✅, 0s ⏱️. Results for commit e3419e0. ♻️ This comment has been updated with latest results.

🌿 Preview your docs: https://opik-preview-f794d068-78b5-49b2-a52b-a1d643c50ff8.docs.buildwithfern.com/docs/opik
No broken links found
📌 Results for commit 85fc76e

SDK E2E Tests Results: 0 tests, 0 ✅, 0s ⏱️. Results for commit e3419e0. ♻️ This comment has been updated with latest results.

Details
This PR adds a comprehensive Opik integration for Harbor, a benchmark evaluation framework for autonomous LLM agents. The integration enables real-time observability for agent benchmark evaluations (SWE-bench, LiveCodeBench, Terminal-Bench, etc.).
Key features:
- Python SDK Integration (`opik.integrations.harbor`):
  - `track_harbor(job)` - Wraps Harbor Job instances with Opik tracking
  - `enable_tracking()` - Global tracking enablement for Harbor Trial and Verifier classes
- CLI Integration (`opik harbor`):
  - Wraps Harbor CLI commands (`run`, `jobs start`, `trials start`, etc.)
  - `opik harbor run -d terminal-bench@head -a terminus_2 -m gpt-4.1`
- Data Mapping:
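
The overview describes the data mapping as trials to traces, trajectory steps to spans, and verifier rewards to feedback scores. The sketch below shows that shape with plain dataclasses. Every name in it (`Span`, `Trace`, `trial_to_trace`, the `step_N` naming) is hypothetical and chosen for illustration; it does not reproduce Opik's or Harbor's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """One trajectory step (hypothetical shape)."""
    name: str
    output: str


@dataclass
class Trace:
    """One trial, with its steps and verifier rewards (hypothetical shape)."""
    name: str
    spans: List[Span] = field(default_factory=list)
    feedback_scores: Dict[str, float] = field(default_factory=dict)


def trial_to_trace(
    trial_name: str, steps: List[str], rewards: Dict[str, float]
) -> Trace:
    """Map a trial to a trace: each step becomes a span, each reward a score."""
    trace = Trace(name=trial_name)
    for index, step in enumerate(steps, start=1):
        trace.spans.append(Span(name=f"step_{index}", output=step))
    trace.feedback_scores.update(rewards)
    return trace
```

For example, `trial_to_trace("hello-world", ["ls", "cat README.md"], {"reward": 1.0})` yields one trace with two spans and a single feedback score named `reward`.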
Change checklist
Issues
Testing
Tested manually and created library integration tests
Documentation
Documentation was updated