Relocating TSFM-Specific CSV Files to CouchDB #406
Open
jellyfishing2346 wants to merge 2 commits into
…to CouchDB
Issue IBM#297: Remove tsfm specific csv from scenarios, relocate the files to couchdb
Changes:
- Add tsfm collection configuration to collections.json with CSV format and Timestamp primary key
- Add tsfm CSV files to src/couchdb/scenarios_data/shared/tsfm/:
  - chiller9_annotated_small_test.csv
  - chiller9_finetuning_small.csv
  - chiller9_tsad.csv
- Update all scenario manifests (default, scenario_1, scenario_2) to include tsfm data sources
- Update .allowed_datafiles to include the new tsfm CSV files
- Update documentation to reference CouchDB collection 'tsfm' instead of local CSV paths:
  - case_study_industrial_asset_management.md: Update forecasting and anomaly detection examples
  - ground_truth_design_guideline.md: Update ground truth patterns to use CouchDB paths
  - data.md: add tsfm directory to documentation
This enables TSFM utterances to load data from CouchDB instead of local files.
Signed-off-by: Faizan Khan <faizanakhan2003@gmail.com>
Resolved merge conflicts in:
- src/couchdb/.allowed_datafiles: Combined local TSFM CSV files with upstream asset_profile_sample.json
- src/couchdb/scenarios_data/default/manifest.json: Merged local tsfm data sources with upstream asset field
- src/couchdb/collections.json: Added upstream asset collection while keeping local tsfm collection
Generated with [Devin](https://devin.ai)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Description
This PR addresses issue #297 by relocating TSFM-specific CSV files from local storage to CouchDB.
Previously, TSFM-related utterances read local CSV files; they now load data from the CouchDB
tsfm collection instead. This change standardizes data access patterns across the benchmark
suite and ensures all data is managed through the CouchDB infrastructure.
Fix Details
CouchDB Infrastructure:
• Added tsfm collection configuration to collections.json with CSV format and Timestamp as
primary key
• Configured proper ID prefix (tsfm) for document identification
• Set up collection with no additional indexes (optimized for time-series data access)
Data Files:
• Added 3 TSFM CSV files to src/couchdb/scenarios_data/shared/tsfm/:
  • chiller9_annotated_small_test.csv (192 records for forecasting inference)
  • chiller9_finetuning_small.csv (192 records for model fine-tuning)
  • chiller9_tsad.csv (192 records for time-series anomaly detection)
• All files contain timestamped sensor data for Chiller 9 Condenser Water Flow
• Data spans from 2020-04-27 to 2020-04-28 with 15-minute intervals
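The record count follows from the date range and sampling interval: two days at four samples per hour gives 2 × 24 × 4 = 192 rows. A minimal sketch of that arithmetic, assuming the series starts at midnight on 2020-04-27 and covers both days in full (the helper name `expected_timestamps` is hypothetical and not part of the PR):

```python
from datetime import datetime, timedelta


def expected_timestamps(start, days, step_minutes):
    """Return evenly spaced timestamps covering `days` full days from `start`."""
    step = timedelta(minutes=step_minutes)
    count = days * 24 * 60 // step_minutes  # samples per day times number of days
    return [start + i * step for i in range(count)]


# Assumption: the data begins at 2020-04-27T00:00:00 and spans two full days.
stamps = expected_timestamps(datetime(2020, 4, 27), days=2, step_minutes=15)
print(len(stamps))              # 192
print(stamps[0].isoformat())    # 2020-04-27T00:00:00
print(stamps[-1].isoformat())   # 2020-04-28T23:45:00
```

Comparing a list like this against the Timestamp column of each CSV would be one way to spot missing or duplicated rows.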
Configuration Updates:
• Updated all scenario manifests (default, scenario_1, scenario_2) to include tsfm data sources
• Updated .allowed_datafiles to include the new tsfm CSV files for security checks
• Ensured proper path resolution through the shared data directory structure
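To illustrate how a manifest entry such as shared/tsfm/chiller9_tsad.csv might be resolved and checked, here is a rough sketch. The resolution rule (entries starting with `shared` resolve beside the scenario directories) and the allowlist format (a set of bare file names) are assumptions made for illustration; the function names `resolve_source` and `is_allowed` are hypothetical and do not describe the loader's actual code.

```python
from pathlib import PurePosixPath


def resolve_source(scenarios_root, scenario, relative_path):
    """Resolve a manifest entry to a path under the scenarios data root.

    Assumed rule: entries beginning with 'shared' live next to the scenario
    directories; everything else is relative to the scenario itself.
    """
    rel = PurePosixPath(relative_path)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe manifest path: {relative_path}")
    base = PurePosixPath(scenarios_root)
    if rel.parts and rel.parts[0] == "shared":
        return base / rel
    return base / scenario / rel


def is_allowed(path, allowed_names):
    """Check the file name against an allowlist (assumed: one bare name per entry)."""
    return PurePosixPath(path).name in allowed_names


# Hypothetical allowlist contents, mirroring the three files added in this PR.
allowed = {
    "chiller9_annotated_small_test.csv",
    "chiller9_finetuning_small.csv",
    "chiller9_tsad.csv",
}
resolved = resolve_source("src/couchdb/scenarios_data", "default",
                          "shared/tsfm/chiller9_tsad.csv")
print(resolved)
print(is_allowed(resolved, allowed))
```

Rejecting absolute paths and `..` segments before joining keeps a manifest entry from escaping the data root, which is the kind of check an allowlist is usually paired with.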
Documentation Updates:
• case_study_industrial_asset_management.md: Updated forecasting and anomaly detection examples
to reference CouchDB collection 'tsfm'
• ground_truth_design_guideline.md: Updated ground truth patterns to use CouchDB paths and
corrected agent actions (changed from jsonreader to csvreader, updated file paths)
• data.md: Added tsfm directory to the shared data directory structure documentation
Impact on Benchmarking
This change is a data infrastructure improvement that does not affect benchmark scoring or
evaluation metrics. It standardizes how TSFM-related utterances access data, making the system
more maintainable and consistent with other data sources.
Related Issues
• Fixes: #297
Verification Steps
• CSV Parsing Validation: Tested parsing of all 3 CSV files using the loader's CSV parser
with tsfm collection config. Successfully parsed 192 documents from each file with proper
field mapping.
• Manifest Loading: Tested collection loading from the default manifest. Successfully loaded
576 total documents (192 from each of the 3 CSV files) through the manifest-based loader.
• Document Normalization: Verified document normalization properly adds the dataset: 'tsfm' field
and generates correct _id fields using the Timestamp primary key (e.g.,
tsfm:2020-04-27T00:00:00).
• Path Resolution: Confirmed that relative paths in manifests (shared/tsfm/*.csv) properly
resolve against the scenario directory and shared data structure.
• Schema Validation: Verified that CSV files have correct structure with Timestamp and
Chiller_9_Condenser_Water_Flow columns matching the expected schema.
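The parsing and normalization steps above can be sketched roughly as follows. This is not the loader's actual code: the config keys (`name`, `format`, `primary_key`, `id_prefix`), the function `normalize_rows`, and the flow values in the sample are all assumptions or placeholders. Only the column names, the `dataset` field, and the `tsfm:<Timestamp>` id shape come from the description.

```python
import csv
import io

# Hypothetical shape of the tsfm entry in collections.json; key names are assumed.
TSFM_CONFIG = {
    "name": "tsfm",
    "format": "csv",
    "primary_key": "Timestamp",
    "id_prefix": "tsfm",
}

REQUIRED_COLUMNS = {"Timestamp", "Chiller_9_Condenser_Water_Flow"}


def normalize_rows(csv_text, config):
    """Parse CSV text and attach `dataset` and `_id` fields to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    docs = []
    for row in reader:
        doc = dict(row)
        doc["dataset"] = config["name"]
        # _id is built as "<prefix>:<primary key value>".
        doc["_id"] = f'{config["id_prefix"]}:{row[config["primary_key"]]}'
        docs.append(doc)
    return docs


# Placeholder rows; the flow values are made up for illustration.
sample = (
    "Timestamp,Chiller_9_Condenser_Water_Flow\n"
    "2020-04-27T00:00:00,0.0\n"
    "2020-04-27T00:15:00,0.0\n"
)
docs = normalize_rows(sample, TSFM_CONFIG)
print(docs[0]["_id"])      # tsfm:2020-04-27T00:00:00
print(docs[0]["dataset"])  # tsfm
```

In this sketch, two files sharing a timestamp would yield the same _id. How the real loader keeps the three files' documents distinct is not stated in the description.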
Checklist