VLM4TS is a two-stage, zero-shot anomaly detection pipeline for time series:
- ViT4TS (screening): converts a time series into short sliding-window plots and scores anomalies via pre-trained vision encoders.
- VLM4TS (verification): prompts a vision-language model with the long time horizons and ViT4TS proposals to refine anomalies.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtexport OPENAI_API_KEY="your_api_key_here"VLM4TS uses the OpenAI client and reads OPENAI_API_KEY from the environment (see src/models/vlm4ts.py).
Input format (Orion-compatible):
- A CSV or
pandas.DataFramewith two columns:timestampandvalue. timestampcan be numeric or datetime; rows are sorted bytimestampinternally.
Output format:
- A
pandas.DataFramewith columnsstart,end,severity, wherestartandendare anomaly interval timestamps.
import pandas as pd
from src.models.vit4ts import ViT4TS
from src.models.vlm4ts import VLM4TS
df = pd.read_csv("your_series.csv") # columns: timestamp, value
df = df[["timestamp", "value"]]
# Stage 1: ViT4TS screening
vit4ts = ViT4TS()
vit_intervals = vit4ts.detect(df)
print(vit_intervals.head()) # columns: start, end, severity
# Stage 2: VLM4TS verification (requires OPENAI_API_KEY)
vlm4ts = VLM4TS()
vlm_intervals = vlm4ts.detect(df)
print(vlm_intervals.head())Notes:
ViT4TS.predict_scores(df)returns(aligned_scores, timestamps)if you want raw anomaly scores.- By default,
ViT4TSusesalpha=0.01andwindow_size=224, withalphabeing the uppper quantile for thresholding andwindow_sizebeing the rolling window size for vision screening. One should tune these hyperparameters for the best performance.
bash run_experiments.shThis script downloads benchmark datasets, runs VLM4TS across multiple alphas, and aggregates results.
├── data
│ └── raw # Raw benchmark data
├── results # Generated detection results and metrics
├── src
│ ├── evaluation # Scoring, visualization, and utilities
│ ├── models # ViT4TS and VLM4TS code
│ └── preprocessing # Data conversion and image rendering
├── run_experiments.sh # Full benchmark sweep
└── README.md # You are here
If this work is beneficial, please kindly cite:
@article{he2025harnessing,
title={Harnessing Vision-Language Models for Time Series Anomaly Detection},
author={He, Zelin and Alnegheimish, Sarah and Reimherr, Matthew},
journal={arXiv preprint arXiv:2506.06836},
year={2025}
}If you find this work useful, please also consider starring the repo.
Follow-up updates are coming soon.
This project is released under the MIT License.

