Dataset-level heterogeneity introduces significant domain biases that fundamentally degrade the generalization of general Time Series Foundation Models (TSFMs), yet this challenge remains underexplored. This paper rethinks the from-scratch training of TSFMs through the paradigm of federated learning. We propose a novel Federated Dataset Learning (FeDaL) approach that tackles heterogeneous time series by learning dataset-agnostic temporal representations. Specifically, the distributed architecture of federated learning offers a natural way to decompose heterogeneous time series datasets into shared generalized knowledge and preserved personalized knowledge. Moreover, building on the TSFM architecture, FeDaL explicitly mitigates both local and global biases through two complementary mechanisms: Domain Bias Elimination (DBE) and Global Bias Elimination (GBE). FeDaL's cross-dataset generalization has been extensively evaluated on real-world datasets spanning eight tasks (including various regression and classification tasks) against 54 baselines. We further analyze federated scaling behavior, showing how data volume, client count, and join rate affect model performance under decentralization.
- 🛠️ 2026-05 · We have refactored and polished the entire codebase, and released a set of tutorials.
- 📦 2026-02 · We have released the full FeDaL training pipeline.
- 🎉 2026-01 · Our paper FeDaL: Federated Dataset Learning for General Time Series Foundation Models has been accepted to ICLR 2026 — we sincerely thank the reviewers and the broader time-series community for their thoughtful feedback throughout the process.
git clone https://github.com/shengchaochen82/FeDaL.git
cd FeDaL
conda create -n fedal python=3.10 -y && conda activate fedal
pip install -r requirements.txt
pip install -e .

Requirements · PyTorch 2.0+ · CUDA 11.8+ · Python 3.10+
FeDaL pretraining is evaluated on three public corpora. All loaders consume the same on-disk format, so adding a new corpus is just a matter of writing a converter.
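| Layout |
|---|
| datasets/crossdomain/UTSD.npz |
| datasets/crossdomain/CTSD.npz |
| datasets/crossdomain/LOTSA/<domain>.npz (one file per sub-dataset) |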
Each .npz is a flat dictionary whose keys are series identifiers and whose values are arrays of shape [T, n_features] (float32). The framework filters series shorter than seq_len, channel-splits them, builds sliding windows, and StandardScaler-normalises each channel.
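The preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the framework's loader: `build_windows`, `stride`, and the `demo` dictionary are hypothetical names, and the z-score is computed over each full channel, which is an assumption (the framework may fit its scaler on a training split only).

```python
import numpy as np

def build_windows(series, seq_len, stride=1):
    """Turn {key: float32[T, C]} into normalised univariate windows of shape [N, seq_len]."""
    windows = []
    for key, arr in series.items():
        arr = np.asarray(arr, dtype=np.float32)
        if arr.shape[0] < seq_len:            # filter series shorter than seq_len
            continue
        for c in range(arr.shape[1]):         # channel-split: one univariate series per column
            chan = arr[:, c]
            std = chan.std()                  # population std (ddof=0), as StandardScaler uses
            chan = (chan - chan.mean()) / (std if std > 0 else 1.0)
            view = np.lib.stride_tricks.sliding_window_view(chan, seq_len)[::stride]
            windows.append(view)
    if not windows:
        return np.empty((0, seq_len), dtype=np.float32)
    return np.concatenate(windows, axis=0).astype(np.float32)

# Hypothetical toy corpus in the on-disk format described above.
demo = {
    "a": np.arange(10, dtype=np.float32).reshape(-1, 1),                    # T=10, C=1
    "b": np.ones((3, 2), dtype=np.float32),                                 # shorter than seq_len
    "c": np.random.default_rng(0).normal(size=(6, 2)).astype(np.float32),   # T=6, C=2
}
w = build_windows(demo, seq_len=4)
print(w.shape)  # (13, 4): 7 windows from "a", none from "b", 2 x 3 from "c"
```

Flattening channels into independent univariate series keeps the window shape identical across corpora with different feature counts, which is what lets all loaders share one format.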
UTSD · ~1 B time points · 7 domains · huggingface.co/datasets/thuml/UTSD
# scripts/data/convert_utsd.py
import numpy as np
from datasets import load_dataset
ds = load_dataset("thuml/UTSD", split="train") # ~1 GB
series = {}
for i, row in enumerate(ds):
    arr = np.asarray(row["target"], dtype=np.float32).reshape(-1, 1)
    series[f"utsd_{i:07d}"] = arr
np.savez_compressed("datasets/crossdomain/UTSD.npz", **series)

UTSD is used with domain-mixed (DM) partitioning. Two heterogeneity levels (UTSD_unbalance_H1, UTSD_unbalance_H2) are pre-registered in FedRepresentationDatasetConfig; control the Dirichlet imbalance via the imbalance_factor field.
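To give a feel for what a Dirichlet-controlled, domain-mixed split looks like, here is a generic sketch. It is not the repository's implementation: `dirichlet_partition` and `alpha` are hypothetical names, and how `imbalance_factor` maps onto a Dirichlet concentration is an assumption that should be checked against FedRepresentationDatasetConfig.

```python
import numpy as np

def dirichlet_partition(domain_labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients so each client holds a Dirichlet-skewed mix of domains."""
    rng = np.random.default_rng(seed)
    domain_labels = np.asarray(domain_labels)
    client_indices = [[] for _ in range(num_clients)]
    for domain in np.unique(domain_labels):
        idx = np.flatnonzero(domain_labels == domain)
        rng.shuffle(idx)
        # Share of this domain that each client receives; small alpha means a more skewed split.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(chunk.tolist())
    return client_indices

# Hypothetical example: 7 domains (as in UTSD) with 100 series each, spread over 5 clients.
labels = np.repeat(np.arange(7), 100)
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
print([len(p) for p in parts])
```

Every client can see every domain, but in unequal proportions; lowering `alpha` increases the imbalance between clients.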
CTSD · ~500 M time points · 6 domains · forecastingdata.org (Monash repository)
# scripts/data/convert_ctsd.py
import numpy as np
from sktime.datasets import load_tsf_to_dataframe
CTSD_TSF_FILES = [ # cherry-picked from Monash to mirror the paper split
    "covid_deaths.tsf", "tourism_monthly.tsf", "traffic_hourly.tsf",
    "fred_md.tsf", "kdd_cup_2018.tsf", "weather_monash.tsf", ...
]
series = {}
for fname in CTSD_TSF_FILES:
    # Monash .tsf files are read with load_tsf_to_dataframe;
    # return_type="default_tsf" keeps one row per series with a series_value column.
    df, _ = load_tsf_to_dataframe(f"datasets/raw/monash/{fname}", return_type="default_tsf")
    for i, row in df.iterrows():
        arr = np.asarray(row["series_value"], dtype=np.float32).reshape(-1, 1)
        series[f"{fname[:-4]}_{i:06d}"] = arr
np.savez_compressed("datasets/crossdomain/CTSD.npz", **series)

CTSD is used with domain-independent (DI) partitioning: each client is tied to a single Monash domain. Sequences within a client are non-overlapping, but the same domain may appear across clients.
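One way to picture a domain-independent split is the sketch below. It is an interpretation under stated assumptions rather than the repository's partitioner: `domain_independent_clients` and `clients_per_domain` are hypothetical, the domain is recovered from the `<domain>_<index>` keys written by the converter above, and clients that share a domain are given disjoint series.

```python
from collections import defaultdict

def domain_independent_clients(series_keys, clients_per_domain=2):
    """Build clients that each hold series from exactly one domain."""
    by_domain = defaultdict(list)
    for key in sorted(series_keys):
        # "traffic_hourly_000012" -> "traffic_hourly"
        by_domain[key.rsplit("_", 1)[0]].append(key)
    clients = []
    for domain, keys in by_domain.items():
        k = min(clients_per_domain, len(keys))
        for j in range(k):
            # Round-robin shards: clients of the same domain never share a series.
            clients.append({"domain": domain, "keys": keys[j::k]})
    return clients

# Hypothetical keys in the format produced by convert_ctsd.py.
keys = [f"covid_deaths_{i:06d}" for i in range(5)] + [f"traffic_hourly_{i:06d}" for i in range(4)]
clients = domain_independent_clients(keys, clients_per_domain=2)
for c in clients:
    print(c["domain"], len(c["keys"]))
```

In contrast to the domain-mixed setting, no client here ever mixes domains, so the heterogeneity lies entirely between clients.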
LOTSA · ~231 B time points · 174 sub-datasets · huggingface.co/datasets/Salesforce/lotsa_data
# scripts/data/convert_lotsa.py
import os, numpy as np
from datasets import load_dataset
OUT = "datasets/crossdomain/LOTSA"
os.makedirs(OUT, exist_ok=True)
ds = load_dataset("Salesforce/lotsa_data", split="train", streaming=True)
buckets = {}
for row in ds:
    domain = row["source_dataset"]
    arr = np.asarray(row["target"], dtype=np.float32)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)
    buckets.setdefault(domain, []).append(arr)
for domain, series_list in buckets.items():
    payload = {f"{domain}_{i:06d}": s for i, s in enumerate(series_list)}
    np.savez_compressed(os.path.join(OUT, f"{domain}.npz"), **payload)

LOTSA naturally aligns with the federated setting: 174 sub-datasets → 174 clients. Datasets that overlap with downstream benchmarks (e.g. Traffic, ECL) are excluded at conversion time.
Adding your own corpus · emit an .npz of {series_key: float32[T, C]} and register it in data_provider/fed_representation_data.py::FedRepresentationDatasetConfig.SINGLE_DATASET_CONFIGS. No other code changes are needed.
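A converter for a custom corpus might look like the sketch below. `save_corpus`, `check_corpus`, and the `sensor_*` keys are hypothetical helpers for illustration only; the registration entry in SINGLE_DATASET_CONFIGS is not shown because its schema is defined by the repository.

```python
import os
import tempfile
import numpy as np

def save_corpus(series, path):
    """Write {series_key: array} as a flat .npz of float32[T, C] arrays."""
    payload = {}
    for key, arr in series.items():
        arr = np.asarray(arr, dtype=np.float32)
        if arr.ndim == 1:                     # promote univariate series to [T, 1]
            arr = arr.reshape(-1, 1)
        if arr.ndim != 2:
            raise ValueError(f"{key}: expected [T, C], got shape {arr.shape}")
        payload[str(key)] = arr
    np.savez_compressed(path, **payload)

def check_corpus(path):
    """Re-open the archive and return {series_key: shape}, verifying the expected format."""
    shapes = {}
    with np.load(path) as data:
        for key in data.files:
            arr = data[key]
            if arr.dtype != np.float32 or arr.ndim != 2:
                raise ValueError(f"{key}: not float32[T, C]")
            shapes[key] = arr.shape
    return shapes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_corpus.npz")
    save_corpus({"sensor_000000": np.zeros(100), "sensor_000001": np.zeros((50, 3))}, path)
    shapes = check_corpus(path)
print(shapes)
```

Validating the archive right after writing it catches dtype and shape problems before they surface inside a long pretraining run.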
# Single-GPU pretraining
python main.py --algorithm FeDaL --config configs/EXP_BASIC.yaml
# Two-GPU DataParallel
python main.py --algorithm FeDaL --config configs/EXP_BASIC.yaml \
--multi_gpus --device_id 0,1
# torchrun launch (single node, 16 processes)
torchrun --standalone --nproc_per_node=16 main.py \
    --algorithm FeDaL --config configs/EXP_BASIC.yaml

Important
FeDaL is highly sensitive to its hyperparameters and dataset configuration. Please make sure your data preparation and YAML config match the paper exactly before launching a run. Minor numerical drift across runs is also expected — it comes from random patch masking, client sampling, and the inherent stochasticity of federated optimisation, not from a bug.
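As a small illustration of one source of this drift, the sketch below draws the participating clients for a round from a join rate. `sample_clients` and its arguments are hypothetical and not part of the FeDaL API; the point is only that the selected subset depends on the random seed.

```python
import numpy as np

def sample_clients(num_clients, join_rate, seed):
    """Pick the clients that join one round; a hypothetical helper, not the FeDaL API."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(join_rate * num_clients)))   # at least one client per round
    return np.sort(rng.choice(num_clients, size=k, replace=False))

# 174 clients as in the LOTSA setting, with an assumed join rate of 0.1.
a = sample_clients(174, 0.1, seed=0)
b = sample_clients(174, 0.1, seed=0)
c = sample_clients(174, 0.1, seed=1)
print(len(a), np.array_equal(a, b))
print(np.array_equal(a, c))
```

Fixing the seed reproduces the same participants, while a different seed generally selects a different subset and therefore a slightly different aggregated model.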
Note
Development Status: We are continuously improving the codebase. Some interfaces may change as we enhance the framework.
If you find FeDaL useful for your research, please cite:
@inproceedings{
chen2026fedal,
title={FeDaL: Federated Dataset Learning for General Time Series Foundation Models},
author={Shengchao Chen and Guodong Long and Michael Blumenstein and Jing Jiang},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=HK6t5x5gJq}
}

This project builds on several excellent open-source efforts:
- A comprehensive PyTorch toolbox covering forecasting, imputation, classification, and anomaly detection on standard time-series benchmarks.
- A channel-independent Transformer for long-term forecasting built on patch tokenisation of univariate series.
- An open family of time-series foundation models pretrained for general analytical tasks across domains.
- A universal forecasting Transformer accompanied by the large-scale LOTSA pretraining corpus.
- A language-modelling-style framework that tokenises numerical time series for probabilistic zero-shot forecasting.
- A billion-scale, sparse Mixture-of-Experts foundation model designed for time-series forecasting at scale.
- An open research and production platform for federated learning across edge, cloud, and cross-silo settings.
This repository is released under the MIT License.

