This repository contains a lightweight environment for training and evaluating creative-writing models using reinforcement learning. It includes components for generating stories, evaluating them using rubric-based scoring, and configuring training runs.
The environment is designed for experimenting with RLHF pipelines for story generation.
A model produces a story based on a prompt, and a judge evaluates the output according to structured rubrics.
The judge returns a numerical reward along with rubric-specific scores.
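The shape of that interaction, in a minimal runnable sketch (the `generate_story` and `judge_story` helpers here are hypothetical placeholders, not the actual entry points):

```python
# Minimal sketch of one generate-then-judge episode. Both helpers are
# placeholders standing in for the real model and judge calls.

def generate_story(prompt: str) -> str:
    """Placeholder for the model call that turns a prompt into a story."""
    return f"Once upon a time... ({prompt})"

def judge_story(story: str, rubrics: list[str]) -> dict:
    """Placeholder judge: returns a scalar reward plus per-rubric scores."""
    rubric_scores = {name: 1.0 for name in rubrics}
    reward = sum(rubric_scores.values()) / len(rubric_scores)
    return {"reward": reward, "rubric_scores": rubric_scores}

task = {"prompt": "A lighthouse keeper finds a message in a bottle.",
        "rubrics": ["pacing", "imagery"]}
result = judge_story(generate_story(task["prompt"]), task["rubrics"])
print(result["reward"], result["rubric_scores"])  # 1.0 {'pacing': 1.0, 'imagery': 1.0}
```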
The `writing_bench.py` script runs the main writing-generation loop; a sketch of the loop's shape follows the list below.
It is responsible for:
- Loading tasks from JSON files
- Sampling prompts and running generation episodes
- Handling batch size, sequence length, rollouts, and sampling settings
- Returning outputs suitable for RL training or evaluation workflows
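A rough sketch of how such a loop might be wired up, assuming tasks are stored as a JSON list and using hypothetical `batch_size`/`rollouts` parameters (the real implementation in `writing_bench.py` may differ):

```python
# Illustrative generation loop: load tasks, sample a batch of prompts,
# and run several rollouts per prompt. Not the actual writing_bench.py code.
import json
import random

def generate_story(prompt: str) -> str:
    return f"story for: {prompt}"  # placeholder model call

def run_generation(task_path: str, batch_size: int = 4, rollouts: int = 2) -> list[dict]:
    with open(task_path) as f:
        tasks = json.load(f)                       # tasks assumed to be a JSON list
    batch = random.sample(tasks, k=min(batch_size, len(tasks)))
    episodes = []
    for task in batch:
        for _ in range(rollouts):                  # multiple rollouts per prompt
            episodes.append({"task": task, "story": generate_story(task["prompt"])})
    return episodes  # outputs suitable for downstream RL training or evaluation

episodes = run_generation("generated_tasks8_train.json")
```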
The `writing_judge.py` script implements the rubric-based evaluation system; a sketch of the reward computation follows the list below.
The judge performs the following:
- Reads task metadata and rubric definitions
- Checks objective requirements (such as word count, POV, tense, and genre)
- Applies subjective scoring using a judge model
- Produces a final numeric reward and a detailed rubric breakdown
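One reasonable way these pieces could combine, sketched with assumed field names and an assumed gating rule (failing an objective check zeroes the reward; the actual weighting in `writing_judge.py` may differ):

```python
# Hedged sketch of reward computation: objective checks gate a subjective
# rubric average. Field names and the gating rule are assumptions.

def check_objective(story: str, task: dict) -> float:
    """Hard requirement check, e.g. minimum word count: pass=1.0, fail=0.0."""
    min_words = task.get("min_word_count", 0)
    return 1.0 if len(story.split()) >= min_words else 0.0

def score_subjective(story: str, rubrics: list[str]) -> dict:
    """Placeholder for the judge-model call scoring each rubric in [0, 1]."""
    return {name: 0.5 for name in rubrics}

def final_reward(story: str, task: dict) -> float:
    objective = check_objective(story, task)
    rubric_scores = score_subjective(story, task["rubrics"])
    subjective = sum(rubric_scores.values()) / len(rubric_scores)
    return objective * subjective  # a failed hard requirement zeroes the reward

task = {"min_word_count": 5, "rubrics": ["pacing", "imagery"]}
print(final_reward("The fog rolled in over the quiet harbor town.", task))  # 0.5
```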
This repository includes two task sets:
- `generated_tasks8_train.json`
- `generated_tasks8_eval.json`
Each task contains:
- A story prompt
- Required stylistic or structural attributes
- Rubric definitions used by the judge for scoring
These files are used during training and evaluation, respectively.
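For illustration, a single task entry might look roughly like this; the field names are assumptions, not the actual schema of the JSON files:

```python
# Illustrative shape of one task; consult the JSON files for the real schema.
example_task = {
    "prompt": "Write a short story about a lighthouse keeper who finds a message in a bottle.",
    "attributes": {
        "pov": "first person",   # required point of view
        "tense": "past",         # required tense
        "genre": "mystery",      # required genre
        "min_word_count": 300,   # objective length requirement
    },
    "rubrics": [
        {"name": "pacing", "description": "Does the story build tension steadily?"},
        {"name": "imagery", "description": "Are descriptions vivid and concrete?"},
    ],
}
```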
The `writing_configs.toml` file defines configuration settings for:
- Model selection
- Checkpoint intervals
- Batch sizes and rollout parameters
- Sequence lengths and sampling settings
- Environment parameters (such as which task file to load)
- Optimizer and LoRA settings
This file serves as the central configuration point for training.
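Since the file is TOML, it can be read with Python's standard-library parser; the section and key names below are illustrative guesses, not the file's actual layout:

```python
# Sketch of loading writing_configs.toml (requires Python 3.11+ for tomllib).
# The section/key names are assumptions; check the file for the real ones.
import tomllib

with open("writing_configs.toml", "rb") as f:
    config = tomllib.load(f)

model_name = config.get("model", {}).get("name")            # model selection
batch_size = config.get("training", {}).get("batch_size")   # rollout batching
task_file = config.get("environment", {}).get("task_file")  # which task JSON to load
```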
Run the generation and judging processes:
python writing_bench.py --config writing_configs.toml
python writing_judge.py --tasks generated_tasks8_eval.json