Lightweight eval for wizard integrations #32

@edwinyjlim

Description

We need to build a lightweight, automated loop for sanity-checking and evaluating Wizard integrations whenever the underlying model, prompts, or resources change.

model + prompt + context → wizard output → check (by human or model) → saved run

  • Create a stable of boilerplate example apps for the Wizard to integrate with
  • Run nightly or on CI/CD
  • Save snapshots of the Wizard's output and diffs
  • Save a summary of the diffs
  • Save a qualitative evaluation of the diffs, based on benchmarks or on a human or LLM acting as judge
  • Reporting and alerting
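
To make the loop above concrete, here is a rough Python sketch of one pass over the example apps. Everything named here is hypothetical and only for illustration: `run_wizard` stands in for however the Wizard is actually invoked, `judge` stands in for a human or LLM check, and `RunRecord` plus the JSON layout are just one possible shape for a saved run. Apps are modelled as in-memory `{path: contents}` dicts to keep the sketch self-contained.

```python
import difflib
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Files = Dict[str, str]  # relative path -> file contents


@dataclass
class RunRecord:
    """One saved run: inputs, the diff, a summary, and a verdict (hypothetical shape)."""
    app: str
    config: dict   # e.g. model, prompt, and context identifiers
    diff: str
    added: int
    removed: int
    verdict: str


def diff_files(before: Files, after: Files) -> Tuple[str, int, int]:
    """Return a unified diff over all files plus counts of added/removed lines."""
    chunks: List[str] = []
    added = removed = 0
    for path in sorted(set(before) | set(after)):
        old = before.get(path, "").splitlines()
        new = after.get(path, "").splitlines()
        lines = list(difflib.unified_diff(
            old, new, fromfile=f"a/{path}", tofile=f"b/{path}", lineterm=""))
        # The first two lines are the ---/+++ file headers; skip them when counting.
        body = lines[2:]
        added += sum(1 for line in body if line.startswith("+"))
        removed += sum(1 for line in body if line.startswith("-"))
        chunks.extend(lines)
    return "\n".join(chunks), added, removed


def evaluate(
    apps: Dict[str, Files],
    run_wizard: Callable[[Files, dict], Files],  # assumption: wraps the real Wizard
    judge: Callable[[str], str],                 # assumption: human or LLM check
    config: dict,
    out_dir: Path,
) -> List[RunRecord]:
    """Run the Wizard on each example app, then save the diff, summary, and verdict."""
    out_dir.mkdir(parents=True, exist_ok=True)
    records: List[RunRecord] = []
    for name, files in sorted(apps.items()):
        output = run_wizard(dict(files), config)  # pass a copy so the baseline is untouched
        diff, added, removed = diff_files(files, output)
        record = RunRecord(name, config, diff, added, removed, judge(diff))
        (out_dir / f"{name}.json").write_text(json.dumps(asdict(record), indent=2))
        records.append(record)
    return records


if __name__ == "__main__":
    # Stand-in Wizard and judge, purely for demonstration.
    def fake_wizard(files: Files, config: dict) -> Files:
        files["app.py"] = "import analytics\n" + files["app.py"]
        return files

    example_apps = {"demo": {"app.py": "print('hi')\n"}}
    with tempfile.TemporaryDirectory() as tmp:
        for rec in evaluate(example_apps, fake_wizard,
                            lambda d: "pass" if "analytics" in d else "fail",
                            {"model": "example-model"}, Path(tmp)):
            print(rec.app, rec.added, rec.removed, rec.verdict)
```

A nightly or CI job could call `evaluate` with the real Wizard invocation and compare the saved JSON records against the previous run to drive reporting and alerting. That comparison step is left out of the sketch.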
