- Core contract: `ProblemGenerator.generate() -> dict` (in `base_generator.py`) returns `problem_id`, `operation`, a human-readable `problem`, `steps` (a list of delimited op-codes), and `final_answer`. The last step must start with `Z|`.
- Generators: One class per skill in `generators/` (e.g., `long_division_generator.py`). Each is independent, seeded via `random` in `dolphin_math_datagen.py`, and responsible for validating its own outputs before returning.
- Data flow: `dolphin_math_datagen.py` seeds the RNG, selects a generator, calls `generate()`, asserts the required keys and the final `Z|` step, then writes JSONL via `write_jsonl`. `--sample` prints one example per generator; `-n`/`-o`/`-s` build datasets.
- Step encoding: Steps are pipe-delimited strings built with `helpers.step()` and `DELIM="|"`. Opcodes capture atomic reasoning moves (divide, multiply, bring-down, etc.), and each step list ends with a `Z` step holding the formatted answer string.
- Extensibility: To add a skill, create a new generator implementing `ProblemGenerator`, emit well-formed steps (including `Z|`), add it to `ALL_GENERATORS` in `dolphin_math_datagen.py`, and mirror tests in `tests/`.
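
The contract and checks described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `step()` signature, the `ToyAdditionGenerator` class, the `validate()` helper, and the field layout inside each step are assumptions made for the example. Only the key names, `DELIM`, and the `Z|` rule come from the description.

```python
import json
import random
from abc import ABC, abstractmethod

DELIM = "|"  # delimiter named in the description
REQUIRED_KEYS = ("problem_id", "operation", "problem", "steps", "final_answer")


def step(op, *fields):
    """Assumed shape of helpers.step(): join an op-code and its fields with DELIM."""
    return DELIM.join([op, *map(str, fields)])


class ProblemGenerator(ABC):
    """Stand-in for the base class in base_generator.py."""

    @abstractmethod
    def generate(self) -> dict:
        ...


class ToyAdditionGenerator(ProblemGenerator):
    """Hypothetical generator that exists only for this illustration."""

    def generate(self) -> dict:
        a, b = random.randint(1, 9), random.randint(1, 9)
        total = a + b
        return {
            "problem_id": f"toy-{a}-{b}",
            "operation": "toy_addition",
            "problem": f"{a} + {b}",
            # Field order inside each step is an assumption.
            "steps": [step("A", a, b, total), step("Z", total)],
            "final_answer": str(total),
        }


def validate(example: dict) -> None:
    """Check the required keys and the final Z| step, as the data flow describes."""
    missing = [key for key in REQUIRED_KEYS if key not in example]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not example["steps"] or not example["steps"][-1].startswith("Z" + DELIM):
        raise ValueError("last step must start with Z|")


random.seed(0)  # seed once, then generate
example = ToyAdditionGenerator().generate()
validate(example)
print(json.dumps(example))  # one JSONL record
```

Keeping validation in one function means a new generator can be checked the same way before it is added to `ALL_GENERATORS`.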
- Long Division: Integer divisors 2–99; includes bring-down (`B`), divide (`D`), multiply (`M`), subtract (`S`), remainder (`R`), and final `Z`.
- Multi-digit Addition (integers): Column alignment (`INT_ALIGN`), per-column sums with carry (`ADD_COL`), final carry (`CARRY_FINAL`), and final `Z`.
- Multi-digit Subtraction (integers): Column alignment (`INT_ALIGN`), per-column differences with borrow (`SUB_COL`), explicit borrow steps (`BORROW`), final `Z`.
- Multi-digit Multiplication (integers): Multiplication setup (`MUL_SETUP`), digit-by-digit partials (`MUL_PARTIAL`), summing partials (`ADD_PARTIALS`), final `Z`.
- Mixed Number Operations (+, -, *, /): Convert to improper (`MIX_IMPROPER`), align denominators (`L`, `C`), invert for division (`I`), operate on numerators (`A`/`S`/`M`), simplify (`F`), convert back to mixed (`IMPROPER_TO_MIX`), final `Z`.
- Fraction Comparison: LCD (`L`, `C`), compare converted fractions (`CMP`), final `Z`.
- Fraction/Decimal/Percent Conversions: Conversions across forms (`FRAC_TO_DEC`, `DEC_TO_FRAC`, `PERCENT_TO_DEC`, `DEC_TO_PERCENT`), simplify where needed, final `Z`.
- Factors & Multiples: Factor listing via trial division (`FACT_CHECK`, `FACT_PAIR`), prime factorization via repeated division (`PF_STEP`, `PF_PRIME`), GCF via Euclid (`GCD_START`, `GCD_STEP`, `GCD_RESULT`), LCM via product/gcd (`LCM_FROM_GCD`), final `Z`.
- Order of Operations: Precedence steps with arithmetic ops and rewrites (`REWRITE`), final `Z`.
- Geometry (Perimeter/Area/Volume): Compute perimeters (`PERIM`), areas (`AREA`), and volumes (`VOLUME` for rectangular prisms) using explicit arithmetic steps and rewrites where needed, final `Z`.
- Place Value & Rounding / Number Comparison: Digit inspection (`ROUND_CHECK`), rounded result (`ROUND_RESULT`), alignment/comparison (`ALIGN_NUM`, `CMP_NUM`), final `Z`.
- Divisibility & Classification: Divisibility checks (`DIV_CHECK`), prime/composite markers (`PRIME`, `COMPOSITE_FACTOR`), final `Z`.
- Unit Conversions: Factor-label conversion (`CONV_FACTOR`, `CONV_RESULT`), explicit multiply, final `Z`.
- Basic Data/Statistics/Probability: Sort (`SORT`), arithmetic sums/divides for mean (`MEAN_DIV`), median selection (`MEDIAN_PICK`/`MEDIAN_PAIR`), mode counting (`MODE_COUNT`/`MODE`), simple probability setup (`PROB_SETUP`) with division/simplify, final `Z`.
- Decimal Multiplication: Partial products (`MUL_PARTIAL`), decimal placement (`COUNT_DP`, `PLACE_DP`), partial sums (`ADD_PARTIALS`).
- Decimal Addition/Subtraction: Column alignment (`DEC_ALIGN`), column operations (`DEC_ADD_COL`, `DEC_SUB_COL`), carries (`DEC_CARRY_FINAL`), final `Z`.
- Decimal Division: Decimal shifting (`DEC_SHIFT`), setup (`DIV_SETUP`), quotient decimal placement (`PLACE_DP_Q`), reuse of `B`/`D`/`M`/`S`.
- Fraction Operations (+, -, *, /): LCD and conversions (`L`, `C`), simplification (`F`), inversion (`I`), arithmetic (`A`, `M`, `D`, `S` reused contextually), final `Z`.
- Linear Equations (Simple/Complex): Move terms (`MOVE_TERM`), combine like terms (`COMB_X`, `COMB_CONST`), divide coefficients (`DIV_COEFF`), rewrite (`REWRITE`), final `Z`.
- Quadratic Equations: Discriminant (`DISC`), root extraction (`ROOT`), quadratic formula branches (`Q1`, `Q2`), final `Z`.
- Simplify Algebraic Expressions: Distribution (`DIST`), combining terms, rewrites, final `Z`.
- Evaluate Expressions: Substitution (`SUBST`), arithmetic steps as needed, final `Z`.
- Proportional Relationships: Proportion setup (`PROP_SETUP`), solving via algebraic steps, final `Z`.
- Pythagorean Hypotenuse: Exponents (`E`), square root (`ROOT`), final `Z`.
- Percent Problems (find part/percent/whole): Percent-to-decimal (`PERCENT_TO_DEC`), equation setup/rearrange (`SETUP_PERCENT_EQ`, `REARRANGE_EQ`), calculation (`PERCENT_CALC_PART`), convert back (`DEC_TO_PERCENT`), final `Z`.
- Abacus-Style Addition: Initial set (`AB_SET`), informational notes (`AB_INFO`), column adds (`AB_ADD_DGT`), carry propagation (`AB_CARRY`, `AB_CARRY_FINAL`), final `Z`.
- Graph Interpretation (Bar/Line/Pictograph): Graph data recording (`GRAPH_DATA`), value reading (`GRAPH_READ`), comparisons (`CMP`), min/max identification (`GRAPH_MIN`, `GRAPH_MAX`), change tracking (`GRAPH_CHANGE`, `GRAPH_MAX_CHANGE`), pictograph key (`PICTO_KEY`) and symbol counting (`PICTO_COUNT`), arithmetic steps reused (`A`, `S`, `M`), final `Z`.
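
To make the op-code idea concrete, here is a hedged sketch of how a long-division trace might be emitted with the `B`/`D`/`M`/`S`/`R`/`Z` codes listed above. The op-code names come from the list; the fields carried by each step, the `step()` helper signature, the `long_division_steps()` function, and the `"q R r"` answer format are assumptions for illustration and may not match the actual generator.

```python
from typing import List

DELIM = "|"


def step(op, *fields):
    """Assumed helper: join an op-code and its fields with DELIM."""
    return DELIM.join([op, *map(str, fields)])


def long_division_steps(dividend: int, divisor: int) -> List[str]:
    """Hypothetical trace of schoolbook long division, one digit at a time."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    steps = []
    quotient_digits = []
    current = 0
    for digit in str(dividend):
        current = current * 10 + int(digit)
        steps.append(step("B", digit, current))            # bring down the next digit
        q = current // divisor
        steps.append(step("D", current, divisor, q))       # divide
        product = q * divisor
        steps.append(step("M", q, divisor, product))       # multiply
        remainder = current - product
        steps.append(step("S", current, product, remainder))  # subtract
        quotient_digits.append(str(q))
        current = remainder
    quotient = int("".join(quotient_digits))
    steps.append(step("R", current))                       # final remainder
    answer = f"{quotient} R {current}" if current else str(quotient)
    steps.append(step("Z", answer))                        # formatted final answer
    return steps


for line in long_division_steps(157, 12):
    print(line)
# The last line printed is: Z|13 R 1
```

Each digit of the dividend contributes one `B`, `D`, `M`, `S` group, so the trace length grows linearly with the number of digits, and the list always closes with a `Z` step as the core contract requires.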