Relocating TSFM-Specific CSV Files to CouchDB #406

Open
jellyfishing2346 wants to merge 2 commits into IBM:main from jellyfishing2346:main

jellyfishing2346 commented Jun 23, 2026

Contributor

Description

This PR addresses issue #297 by relocating TSFM-specific CSV files from local storage to CouchDB.
Previously, TSFM-related utterances read local CSV files; they now load data from the CouchDB
tsfm collection instead. This change standardizes data access patterns across the benchmark suite
and ensures all data is managed through the CouchDB infrastructure.

Fix Details

CouchDB Infrastructure:

• Added tsfm collection configuration to collections.json with CSV format and Timestamp as
primary key
• Configured proper ID prefix (tsfm) for document identification
• Set up collection with no additional indexes (optimized for time-series data access)

Data Files:

• Added 3 TSFM CSV files to src/couchdb/scenarios_data/shared/tsfm/:
  • chiller9_annotated_small_test.csv (192 records for forecasting inference)
  • chiller9_finetuning_small.csv (192 records for model fine-tuning)
  • chiller9_tsad.csv (192 records for time-series anomaly detection)
• All files contain timestamped sensor data for Chiller 9 Condenser Water Flow
• Data spans from 2020-04-27 to 2020-04-28 with 15-minute intervals
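
As a quick sanity check on the record counts, a 15-minute cadence gives 96 readings per day, so two days give 192. The sketch below assumes each file covers the two full days starting at midnight on 2020-04-27 (the PR states only the dates and the interval); the helper name `expected_timestamps` is illustrative and not part of the repository.

```python
from datetime import datetime, timedelta


def expected_timestamps(start, days, step_minutes):
    """Return every timestamp in [start, start + days) at a fixed step."""
    step = timedelta(minutes=step_minutes)
    end = start + timedelta(days=days)
    stamps = []
    current = start
    while current < end:
        stamps.append(current)
        current += step
    return stamps


# Assumption: the range starts at 2020-04-27 00:00 and covers two full days.
stamps = expected_timestamps(datetime(2020, 4, 27), days=2, step_minutes=15)
print(len(stamps))             # 192
print(stamps[-1].isoformat())  # 2020-04-28T23:45:00
```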

Configuration Updates:

• Updated all scenario manifests (default, scenario_1, scenario_2) to include tsfm data sources
• Updated .allowed_datafiles to include the new tsfm CSV files for security checks
• Ensured proper path resolution through the shared data directory structure
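
For reviewers unfamiliar with the allow-list check, the idea can be sketched roughly as follows. This is not the loader's actual code: the function name `resolve_datafile`, the in-memory `ALLOWED` set, and the assumption that `.allowed_datafiles` stores paths relative to the scenarios data directory are all hypothetical.

```python
from pathlib import PurePosixPath

# Directory named in this PR; treated here as the root for manifest paths.
DATA_ROOT = PurePosixPath("src/couchdb/scenarios_data")

# Hypothetical stand-in for the contents of .allowed_datafiles.
ALLOWED = {
    "shared/tsfm/chiller9_annotated_small_test.csv",
    "shared/tsfm/chiller9_finetuning_small.csv",
    "shared/tsfm/chiller9_tsad.csv",
}


def resolve_datafile(relative):
    """Resolve a manifest-relative path, rejecting anything not allow-listed."""
    rel = PurePosixPath(relative)
    # Refuse absolute paths and parent-directory escapes before the lookup.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe data file path: {relative}")
    if str(rel) not in ALLOWED:
        raise ValueError(f"data file not in allow-list: {relative}")
    return DATA_ROOT / rel


print(resolve_datafile("shared/tsfm/chiller9_tsad.csv"))
```

Rejecting `..` segments before the allow-list lookup keeps the check independent of how the allow-list entries happen to be written.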

Documentation Updates:

• case_study_industrial_asset_management.md: Updated forecasting and anomaly detection examples
to reference CouchDB collection 'tsfm'
• ground_truth_design_guideline.md: Updated ground truth patterns to use CouchDB paths and
corrected agent actions (changed from jsonreader to csvreader, updated file paths)
• data.md: Added tsfm directory to the shared data directory structure documentation

Impact on Benchmarking

  • No change to baselines: This fix only improves stability/performance.
  • Baseline change: This fix corrects a scoring error. (Please provide "Before vs. After"
    results).

This change is a data infrastructure improvement that does not affect benchmark scoring or
evaluation metrics. It standardizes how TSFM-related utterances access data, making the system
more maintainable and consistent with other data sources.

Related Issues

Fixes: #297

Verification Steps

  1. Run the following command: uv run pytest tests/integration
  2. Describe any manual verification performed:
    • CSV Parsing Validation: Tested parsing of all 3 CSV files using the loader's CSV parser
    with tsfm collection config. Successfully parsed 192 documents from each file with proper
    field mapping.
    • Manifest Loading: Tested collection loading from the default manifest. Successfully loaded
    576 total documents (192 from each of the 3 CSV files) through the manifest-based loader.
    • Document Normalization: Verified that document normalization adds the dataset: 'tsfm' field
    and generates correct _id fields from the Timestamp primary key (e.g.,
    tsfm:2020-04-27T00:00:00).
    • Path Resolution: Confirmed that relative paths in manifests (shared/tsfm/*.csv) properly
    resolve against the scenario directory and shared data structure.
    • Schema Validation: Verified that the CSV files have the correct structure, with Timestamp and
    Chiller_9_Condenser_Water_Flow columns matching the expected schema.
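
The schema and normalization checks above can be approximated with a small standalone sketch. It is not the project's loader: `normalize_csv` and its parameters are hypothetical, the sample readings are made-up values, and only the column names, the `tsfm` prefix, the `dataset` field, and the `_id` format come from this PR.

```python
import csv
import io

# Columns named in the verification notes; files may carry additional columns.
REQUIRED_COLUMNS = ["Timestamp", "Chiller_9_Condenser_Water_Flow"]


def normalize_csv(csv_text, id_prefix="tsfm", primary_key="Timestamp", dataset="tsfm"):
    """Parse CSV text into documents with an _id and dataset field added."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    docs = []
    for row in reader:
        doc = dict(row)
        # _id is "<prefix>:<primary key value>", e.g. tsfm:2020-04-27T00:00:00
        doc["_id"] = f"{id_prefix}:{row[primary_key]}"
        doc["dataset"] = dataset
        docs.append(doc)
    return docs


# Illustrative rows only; the flow values are not taken from the real files.
sample = (
    "Timestamp,Chiller_9_Condenser_Water_Flow\n"
    "2020-04-27T00:00:00,1234.5\n"
    "2020-04-27T00:15:00,1240.0\n"
)
docs = normalize_csv(sample)
print(docs[0]["_id"])  # tsfm:2020-04-27T00:00:00
```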

Checklist

  • I have added tests that prove my fix is effective.
  • My code follows the project's Ruff formatting and linting rules.
  • I have signed off my commits (DCO).

…to CouchDB

Issue IBM#297: Remove tsfm specific csv from scenarios, relocate the files to couchdb

Changes:
- Add tsfm collection configuration to collections.json with CSV format and Timestamp primary key
- Add tsfm CSV files to src/couchdb/scenarios_data/shared/tsfm/:
  - chiller9_annotated_small_test.csv
  - chiller9_finetuning_small.csv
  - chiller9_tsad.csv
- Update all scenario manifests (default, scenario_1, scenario_2) to include tsfm data sources
- Update .allowed_datafiles to include the new tsfm CSV files
- Update documentation to reference CouchDB collection 'tsfm' instead of local CSV paths:
  - case_study_industrial_asset_management.md: Update forecasting and anomaly detection examples
  - ground_truth_design_guideline.md: Update ground truth patterns to use CouchDB paths
  - data.md: add tsfm directory to documentation

This enables TSFM utterances to load data from CouchDB instead of local files.

Signed-off-by: Faizan Khan <faizanakhan2003@gmail.com>
jellyfishing2346 changed the title from "Main" to "Relocating TSFM-Specific CSV Files to CouchDB" on Jun 23, 2026
DhavalRepo18 self-requested a review on June 23, 2026 22:33
Resolved merge conflicts in:
- src/couchdb/.allowed_datafiles: Combined local TSFM CSV files with upstream asset_profile_sample.json
- src/couchdb/scenarios_data/default/manifest.json: Merged local tsfm data sources with upstream asset field
- src/couchdb/collections.json: Added upstream asset collection while keeping local tsfm collection

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

remove tsfm specific csv from scenarios, relocate the files to couchdb

2 participants