This model classifies K-4 math learning material with skill concepts from the EduGraph ontology.
The classification is performed by a fine-tuned Qwen3-VL model, which is capable of
processing images to understand and categorize content. It is trained to label content along
the three competence dimensions of EduGraph: Area, Scope and Ability.
For instructions on how to use the model, please visit the model's Huggingface Repository.
The training itself is performed in a Docker container for full reproducibility and easy setup. However, it is still recommended to set up the Python environment locally as well, for proper IDE support and for running some of the optional Python scripts.
We use uv for fast dependency and virtual environment management. Please see the official
uv documentation for installation instructions.
Once uv is installed, sync the project dependencies:
uv sync

The fine-tuning process is designed to be run within a Docker container, ensuring a consistent and reproducible environment locally and across different cloud providers. The training data is automatically loaded from Huggingface by the training scripts.
Please install Docker for your operating system if you haven't already.
The training process on Google Cloud Platform is managed by three main scripts that you can also use as a guideline for training in other environments.
Set up the .env file:
Before running the scripts, create a .env file from .env.example. Fill in the required values for
your GCP project, such as PROJECT_ID, VM_ZONE, GCS_BUCKET_NAME, etc.
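For orientation, a GCP-oriented .env could look like the following. The values are placeholders for illustration only; .env.example remains the authoritative list of required variables.

```
# Placeholder values -- replace with your own project settings
PROJECT_ID=my-gcp-project
VM_ZONE=europe-west4-a
GCS_BUCKET_NAME=my-training-bucket
```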
Step 1: Build and Push Docker Image
This script builds the Docker image containing the training environment and all necessary code, and then pushes it to the Google Artifact Registry.
bash gcp/build_and_push.sh

Step 2: Create VM and Start Training
This script creates a new spot VM instance on GCP with a single GPU, and starts the training process using the Docker image from Step 1. The VM's startup script will automatically pull the image and run the training. The training data will be loaded automatically from Huggingface.
bash gcp/run_on_vm.sh

Step 3: Download Results
After the training is complete, the resulting fine-tuned adapter is saved to a GCS bucket.
This script downloads the adapter from the bucket into the out/ directory.
bash gcp/download_results.sh

For local development, you will need an NVIDIA GPU and the NVIDIA Container Toolkit to allow Docker to access the GPU.
Prerequisites:
Create a .env file from .env.example. For a local run, you only need to set MODEL_SIZE
(e.g., 4b) and RUN_MODE (e.g., train or test).
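A minimal local .env therefore only needs these two lines (using the example values above):

```
MODEL_SIZE=4b
RUN_MODE=train
```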
Step 1: Build the Docker Image
Build the Docker image using the Dockerfile in the project root. This command uses the
MODEL_SIZE argument from your .env file.
export $(grep -v '^#' .env | xargs)
docker build --build-arg MODEL_SIZE=$MODEL_SIZE -t qwen-trainer .

Step 2: Run the Training Container
This command overrides the default container command to run the setup_and_run.local.sh
script, which is simplified for local training. The training data will be loaded
automatically from Huggingface.
- The `--env-file .env` flag passes your local configuration into the container.
- The `-v $(pwd)/out:/app/out` flag mounts your local `out/` directory to save the trained model adapters to your machine.
docker run --gpus all --rm \
--env-file .env \
-v $(pwd)/out:/app/out \
qwen-trainer \
bash setup_and_run.local.sh

To test the trained model for inference on a new image, run the scripts/classify_image.py script
and provide the path to the image file as an argument.
Note: Make sure that the training artifacts have been downloaded/synced successfully and that your environment variables (MODEL_SIZE, RUN_MODE) are set to the same values as during fine-tuning.
For MODEL_SIZE=4b and RUN_MODE=train, the expected model location is out/models/qwen-3vl-4b/train/model.
Also note: Inference at this stage will be slow because the model has not been quantized yet and inference is executed on the CPU to simplify the local setup. Even on high-end machines, it can take 1-2 minutes to get results. Use it only for quick validation; inference will be much faster in a production environment.
Example Usage:
uv run scripts/classify_image.py path/to/your/image.png

Example Output:
{
"Area": [
"Counting",
"SetTheory"
],
"Scope": [
"NumbersSmaller10",
"CountingObjects"
],
"Ability": [
"ConceptualUnderstanding",
"ProcedureExecution"
]
}

The script will load the fine-tuned and merged model, process the image along with the predefined system prompt, and print the predicted classification using terms from the EduGraph ontology.
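If you are curious what such an inference step looks like under the hood, the following is a minimal, hedged sketch. It is not the repository's script: the use of the generic transformers image-text-to-text API, the prompt, and the hard-coded model path are assumptions, and the real script applies its own predefined system prompt.

```python
# Illustrative sketch only -- scripts/classify_image.py is the authoritative implementation.
# Assumes a transformers release with Qwen3-VL support; path and prompt are placeholders.
import sys

from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_DIR = "out/models/qwen-3vl-4b/train/model"  # location for MODEL_SIZE=4b, RUN_MODE=train
PROMPT = "Classify this worksheet with EduGraph Area, Scope and Ability terms."  # placeholder

processor = AutoProcessor.from_pretrained(MODEL_DIR)
model = AutoModelForImageTextToText.from_pretrained(MODEL_DIR)  # loads on CPU by default

image = Image.open(sys.argv[1]).convert("RGB")
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": PROMPT}]}]
chat = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[chat], images=[image], return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)  # expected to be a JSON object with Area / Scope / Ability lists
```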
After a model has been trained and the output artifacts have been downloaded and tested, you can prepare the final model artifacts for publication. This involves creating a quantized and standardized version of the model in the GGUF format that makes it easy to host the model anywhere.
Configure Environment: The script uses the MODEL_SIZE and RUN_MODE variables from your .env file
to automatically identify the correct model files to publish. Ensure these are set correctly.
Publish the model on Huggingface with defaults:
uv run scripts/publish_model.py --publish

After the script runs successfully, you will find:
- A `publish` directory located at `out/models/qwen-3vl-{MODEL_SIZE}/publish/`.
- This directory contains the merged model files, ready for quantization or further tuning.
- The content of that directory can be pushed to Huggingface when using the `--publish` option.
Use the GGUF-my-repo Huggingface space to convert to GGUF format and quantize model weights to smaller sizes.
This project utilizes two main types of data for training the Qwen3-VL classifier. These datasets are generated from their raw sources and then uploaded to Huggingface, from where they are loaded during the fine-tuning process.
The first, the knowledge infusion dataset, is generated from the EduGraph ontology, which defines the domain-specific concepts and their relationships.
- Generation Script: `scripts/generate_dataset_ki.py`
- Hugging Face Dataset: `christian-bick/edugraph-knowledge`
- Raw Source: The ontology is sourced from the releases of the edugraph-ontology GitHub repository.
- Content: The script parses the RDF ontology to create a comprehensive Q&A dataset. This includes pairs asking for definitions of concepts, parent-child relationships, and children of specific concepts within the ontology. This helps infuse the model with structured domain knowledge (see the sketch below).
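As a rough illustration of the idea (not the actual generate_dataset_ki.py, whose question templates and ontology handling differ in detail), such Q&A pairs can be derived from the ontology's labels, comments, and subclass relations:

```python
# Illustrative sketch only -- the real logic lives in scripts/generate_dataset_ki.py.
# The file name and question wording below are assumptions, not taken from the repo.
from rdflib import Graph, RDFS

g = Graph()
g.parse("edugraph.ttl")  # hypothetical local copy of an edugraph-ontology release

qa_pairs = []

# Definition-style questions from rdfs:label / rdfs:comment annotations.
for concept, _, comment in g.triples((None, RDFS.comment, None)):
    label = g.value(concept, RDFS.label) or str(concept).split("#")[-1]
    qa_pairs.append({
        "question": f"What does the concept '{label}' mean in the EduGraph ontology?",
        "answer": str(comment),
    })

# Parent/child questions from rdfs:subClassOf relations.
for child, _, parent in g.triples((None, RDFS.subClassOf, None)):
    qa_pairs.append({
        "question": f"What is the parent concept of '{str(child).split('#')[-1]}'?",
        "answer": str(parent).split("#")[-1],
    })

print(f"Generated {len(qa_pairs)} Q&A pairs")
```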
The second, the multimodal worksheet dataset, consists of images (worksheets) and associated metadata that provide labels based on the EduGraph ontology.
- Generation Script: `scripts/generate_dataset_multimodal.py`
- Hugging Face Dataset: `christian-bick/edugraph-worksheets`
- Raw Source: The content for the training worksheets is sourced from the imagine-content GitHub repository.
- Content: The script scans the downloaded content for `meta.json` files. Each `meta.json` file describes a worksheet image (PNG) and provides its corresponding labels across various EduGraph ontology dimensions (Area, Scope, Ability). This dataset is used for multimodal fine-tuning, enabling the model to directly classify images (see the sketch below).
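As a rough sketch of what this scanning step does (the `meta.json` field names and file layout used here are assumptions; generate_dataset_multimodal.py is the authoritative implementation):

```python
# Illustrative sketch only -- the real logic lives in scripts/generate_dataset_multimodal.py.
# The meta.json field names and directory layout are assumptions for illustration.
import json
from pathlib import Path

def collect_examples(content_root: str):
    """Pair every worksheet PNG with the labels from its meta.json."""
    examples = []
    for meta_path in Path(content_root).rglob("meta.json"):
        meta = json.loads(meta_path.read_text())
        image_path = next(meta_path.parent.glob("*.png"), None)  # worksheet image next to meta.json (assumed)
        if image_path is None:
            continue
        examples.append({
            "image": str(image_path),
            "Area": meta.get("Area", []),
            "Scope": meta.get("Scope", []),
            "Ability": meta.get("Ability", []),
        })
    return examples

if __name__ == "__main__":
    for example in collect_examples("imagine-content")[:3]:
        print(example)
```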
Prerequisites:
Hugging Face Login: Authenticate with Hugging Face using huggingface-cli login.
Steps to Generate and Upload Datasets:
- Run the above scripts from the project root.
- Use the `--version` option to specify the source version.
- Use the `--no-cache` option to force downloading from the source.
- Use the `--publish` option for publishing on Huggingface.
Examples
Generate the knowledge infusion dataset for ontology release 0.4.0, forcing a re-download:
uv run scripts/generate_dataset_ki.py --version 0.4.0 --no-cache

Generate & publish the multimodal dataset for content release 1.0.0:
uv run scripts/generate_dataset_multimodal.py --version 1.0.0 --publish

Contributions are welcome!
Ideally, always open a GitHub issue first to make sure your contribution aligns with the project's scope.
Please also make sure to add tests with your contribution and to only submit PRs with green tests.
Please be aware that you need to sign a contributor agreement that allows us to relicense your contribution under other terms in addition to the AGPL license. We aim for a balanced approach between open source availability and project viability. Being able to redistribute contributions under other licenses helps us accomplish that goal.
This project is licensed under the GNU Affero General Public License. See the LICENSE file for details.
If these license terms are not working for you, then get in touch, and we can discuss your options.