feat: Add multi-cloud FOCUS test data generator for FinOps Hub #2006

Open

FallenHoot wants to merge 1 commit into microsoft:dev from FallenHoot:feature/multi-cloud-test-data-generator

Conversation

@FallenHoot

Add multi-cloud FOCUS test data generator for FinOps Hub

Description

Adds Generate-MultiCloudTestData.ps1 — a PowerShell script that generates synthetic, multi-cloud, FOCUS-compliant cost data for testing and validating FinOps Hub deployments end-to-end.

Closes #2005

What's Included

  • Generate-MultiCloudTestData.ps1 (~1,430 lines) — Self-contained script that generates FOCUS 1.0–1.3 synthetic cost data for Azure, AWS, GCP, and DataCenter providers

Why This Script Is Needed

Testing a FinOps Hub deployment today requires real Cost Management export data. This script removes that dependency by generating realistic synthetic data that:

  1. Covers all 4 supported cloud providers with provider-specific conventions (Azure resource IDs, AWS ARNs, GCP resource paths)
  2. Populates every column referenced by FinOps Hub dashboard KQL queries
  3. Simulates real-world patterns: commitment discounts (Reservations + Savings Plans), Azure Hybrid Benefit, spot/dynamic pricing, marketplace purchases, negotiated discounts, and tag coverage variation
  4. Generates data with proper Cost Management manifest.json files for ingestion pipeline compatibility
  5. Optionally uploads to Azure Storage and manages ADF triggers

Key Features

| Feature | Details |
| --- | --- |
| FOCUS compliance | All mandatory + conditional FOCUS columns (v1.0–1.3) |
| Persistent identities | Resources, billing accounts, subscriptions consistent across days |
| Budget scaling | Costs scaled to target budget via Python/pandas |
| Memory-safe | Streams rows daily to CSV, avoids OOM on 500K+ row datasets |
| Output formats | Parquet (pyarrow), CSV, or both |
| Upload support | Uploads to msexports + ingestion containers with proper blob paths |

Testing

Tested with:

  • Default settings (500K rows, 6 months, all providers, $500K budget)
  • Single provider mode (Azure-only, 200K rows); see the example after this list
  • Full pipeline (generate → upload → ADF trigger → ADX ingestion → dashboard validation)
  • FOCUS versions 1.0, 1.2, and 1.3
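
For illustration, a single-provider run along the lines of the second scenario could look like this. Parameter names are taken from the help text quoted later in this thread; exact defaults may differ, and -TotalRowTarget was later renamed to -RowCount.

# Hypothetical invocation, not copied from the script's own examples.
./Generate-MultiCloudTestData.ps1 `
    -OutputPath ./test-data `
    -CloudProvider Azure `
    -TotalRowTarget 200000 `
    -FocusVersion '1.2'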

Prerequisites

  • PowerShell 7+
  • Python 3 with pandas and pyarrow (for Parquet conversion)
  • Azure CLI (for upload functionality)

Checklist

  • Script follows FOCUS specification conventions
  • Microsoft copyright header included
  • Comment-based help with SYNOPSIS, DESCRIPTION, PARAMETERS, EXAMPLES
  • No hardcoded paths or environment-specific references
  • Tested with 498K+ rows successfully ingested into FinOps Hub

@RolandKrummenacher (Collaborator) left a comment

Review: Generate-MultiCloudTestData.ps1

Great concept — this fills a real gap for FinOps Hub end-to-end testing. The FOCUS column coverage and multi-cloud provider modeling are thorough. However, there are several issues to address before merging:

Critical

  • Get-Random overflow with 12-digit AWS account IDs (lines 335, 361) — will throw at runtime (see the sketch after this list)
  • Python dependency should be eliminated — budget scaling and Parquet output can be done in pure PowerShell, removing ~80 lines of fragile cross-language code with path-injection risk and dead code
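
For reference, a minimal sketch of one way to generate 12-digit account IDs without the [int] overflow; this is illustrative and not the script's actual New-AwsAccountId implementation:

function New-AwsAccountId
{
    # Get-Random with an [int] bound cannot cover a 12-digit range, because
    # [int]::MaxValue is roughly 2.1 billion (10 digits); use 64-bit bounds instead.
    $id = Get-Random -Minimum 100000000000 -Maximum 999999999999
    return $id.ToString()

    # Alternative: build the ID digit by digit and avoid large numeric ranges entirely:
    # return -join (1..12 | ForEach-Object { Get-Random -Maximum 10 })
}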

Required by repo conventions

  • Missing changelog entry (v14 section in docs-mslearn/toolkit/changelog.md)
  • Missing README.md in the test directory
  • Missing #Requires statement and .LINK in help
  • No -WhatIf/-Confirm support for destructive operations (file creation, uploads, trigger starts); see the sketch below
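
For the -WhatIf/-Confirm item, the usual pattern is CmdletBinding with SupportsShouldProcess; a minimal sketch of how the destructive steps could be wrapped (illustrative only):

[CmdletBinding(SupportsShouldProcess)]
param([string] $OutputPath = './test-data')

# Honors -WhatIf and -Confirm for anything that creates or changes state.
if ($PSCmdlet.ShouldProcess($OutputPath, 'Write synthetic FOCUS test data'))
{
    # ... write CSV/Parquet files, upload blobs, start ADF triggers ...
}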

Recommended

  • Add Pester tests for helper functions
  • Prefer Azure AD auth over storage account keys
  • Add -Seed parameter for reproducible test data (see the sketch after this list)
  • FOCUS version parameter is metadata-only — either vary the schema or simplify
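
For the -Seed suggestion, seeding PowerShell's random number generator once at startup is enough to make every later Get-Random call, and therefore the whole dataset, reproducible; a minimal sketch:

param([int] $Seed)

# Seed the session-wide generator so all subsequent Get-Random calls are deterministic.
if ($PSBoundParameters.ContainsKey('Seed'))
{
    Get-Random -SetSeed $Seed | Out-Null
}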

Minor

  • Inconsistent cost rounding (10 vs 2 decimal places)
  • ADF trigger names hardcoded — should be parameterized or documented

@microsoft-github-policy-service bot added the Needs: Attention 👋 label and removed the Needs: Review 👀 label on Feb 16, 2026
@RolandKrummenacher (Collaborator)

FOCUS Specification Compliance Analysis

I did a detailed comparison of the script's output against the official FOCUS specification at focus.finops.org for all four claimed versions (1.0, 1.1, 1.2, 1.3). Here's what I found:


Critical Issue: $FocusVersion parameter is cosmetic only

The script accepts $FocusVersion (ValidateSet "1.0", "1.1", "1.2", "1.3") but never uses it to vary the output schema. The same columns are emitted regardless of version. The value is only written to x_FocusVersion. This means the output cannot be properly compliant with any single FOCUS version — it's a superset/subset mix.


Per-Version Column Compliance (Cost and Usage Dataset)

FOCUS v1.0 (43 columns)

  • ✅ All 43 v1.0 columns are present
  • 8 extra columns that don't exist in v1.0: BillingAccountType (v1.2), SubAccountType (v1.2), InvoiceId (v1.2), CommitmentDiscountQuantity (v1.1), CommitmentDiscountUnit (v1.1), ServiceSubcategory (v1.1), HostProviderName (v1.3), ServiceProviderName (v1.3)

FOCUS v1.1 (50 columns)

  • 4 columns missing: CapacityReservationId, CapacityReservationStatus, SkuMeter, SkuPriceDetails
  • 5 extra columns not in v1.1: BillingAccountType, SubAccountType, InvoiceId, HostProviderName, ServiceProviderName

FOCUS v1.2 (57 columns)

  • 8 columns missing: CapacityReservationId, CapacityReservationStatus, SkuMeter, SkuPriceDetails, PricingCurrency, PricingCurrencyContractedUnitPrice, PricingCurrencyEffectiveCost, PricingCurrencyListUnitPrice
  • 2 extra columns not in v1.2: HostProviderName, ServiceProviderName

FOCUS v1.3 (64+ columns)

  • 13 columns missing: AllocatedMethodDetails, AllocatedResourceId, AllocatedResourceName, AllocatedResourceType, CapacityReservationId, CapacityReservationStatus, ContractApplied, PricingCurrency, PricingCurrencyContractedUnitPrice, PricingCurrencyEffectiveCost, PricingCurrencyListUnitPrice, SkuMeter, SkuPriceDetails

Column Naming Issue

The script uses ServiceProviderName as the mandatory provider column for all versions, but the correct Column ID is ProviderName for v1.0–v1.2. ServiceProviderName only replaces ProviderName (deprecated) in v1.3. The script does output ProviderName too, but categorizes it under "FinOps Hub / Dashboard required columns" — it should be the primary mandatory column for v1.0–v1.2.


Missing v1.3 Structural Features

1. Contract Commitment Dataset (entirely absent)

FOCUS v1.3 introduced a second dataset with 13 mandatory columns (ContractId, ContractCommitmentId, ContractCommitmentCategory, ContractCommitmentCost, ContractCommitmentDescription, ContractCommitmentPeriodEnd/Start, ContractCommitmentQuantity, ContractCommitmentType, ContractCommitmentUnit, ContractPeriodEnd/Start, BillingCurrency). The script only generates Cost and Usage data — no Contract Commitment dataset is produced.

2. Data Generator-Calculated Split Cost Allocation (absent)

The 4 Allocated* columns (AllocatedMethodDetails, AllocatedResourceId, AllocatedResourceName, AllocatedResourceType) support shared cost splitting (e.g., K8s clusters, shared storage). Not implemented.

3. ContractApplied column (absent)

The JSON column that bridges Cost and Usage rows to the Contract Commitment dataset is not generated.


Summary Table

| Version | Columns Present | Columns Missing | Extra Columns | Compliant? |
| --- | --- | --- | --- | --- |
| v1.0 | 43/43 | 0 | 8 | No |
| v1.1 | 46/50 | 4 | 5 | No |
| v1.2 | 49/57 | 8 | 2 | No |
| v1.3 | 51/64+ | 13+ | 0 | No |

Recommendation

Either:

  1. Target a single version (e.g., v1.0 or v1.3) and get that version fully correct, or
  2. Use $FocusVersion to dynamically select columns — only output columns valid for the chosen version and include all required columns for that version (see the sketch below).

The closest match today is v1.3, but it's still missing conditional/recommended columns and the entire Contract Commitment dataset. For test data generation purposes, it may be acceptable to scope this to Cost and Usage only with a documented caveat, but the column set should still match the selected version.
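
To illustrate option 2, the emitted column list could be built up from the v1.0 baseline according to the selected version; a sketch using hypothetical per-version column arrays rather than the script's actual structure:

# Assumed variables: $focus10Columns..$focus13Columns hold the column names added by each
# version; $rows and $outputFile are placeholders.
$columns = [System.Collections.Generic.List[string]]::new()
$columns.AddRange([string[]] $focus10Columns)   # v1.0 baseline

if ([version] $FocusVersion -ge [version] '1.1') { $columns.AddRange([string[]] $focus11Columns) }
if ([version] $FocusVersion -ge [version] '1.2') { $columns.AddRange([string[]] $focus12Columns) }
if ([version] $FocusVersion -ge [version] '1.3') { $columns.AddRange([string[]] $focus13Columns) }

# Emit only the columns that are valid for the selected version.
$rows | Select-Object -Property $columns | Export-Csv $outputFile -NoTypeInformation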

@RolandKrummenacher (Collaborator)

Additional FOCUS Spec Compliance Findings

A few more items found during deeper analysis:


1. ServiceSubcategory — Invalid Values Against Spec's Closed Enumeration

The FOCUS spec (v1.1+) defines a closed list of allowed ServiceSubcategory values, each with a mandatory parent ServiceCategory. ~12 out of ~30 service entries use values that are not in the spec's allowed list:

| Line | Service | Category | Subcategory in Script | Issue |
| --- | --- | --- | --- | --- |
| 169 | Storage Accounts | Storage | General Purpose v2 | Should be Object Storage or Block Storage |
| 170 | Azure Cosmos DB | Databases | NoSQL Databases | Should be NoSQL |
| 171 | Azure Data Explorer | Analytics | Data Analytics | Not in spec — closest: Log Analytics or Other (Analytics) |
| 172 | Azure App Service | Compute | App Services | Not in spec — closest: Containers or Other (Compute) |
| 173 | Azure Functions | Compute | Serverless Compute | Should be Functions |
| 174 | Azure Key Vault | Security | Key Management | Not in spec — closest: Other (Security) |
| 175 | Bandwidth | Networking | Data Transfer | Not in spec — closest: Content Delivery or Other (Networking) |
| 176 | Marketplace - 3rd Party | Compute | Marketplace | Not a valid subcategory |
| 220 | Amazon DynamoDB | Databases | NoSQL Databases | Should be NoSQL |
| 247 | Cloud Spanner | Databases | Distributed Databases | Not in spec — closest: Other (Databases) |
| 248 | Cloud Run | Compute | Serverless Containers | Should be Containers |
| 267 | Physical Servers | Compute | Bare Metal | Not in spec — closest: Other (Compute) |

Values that are correct include: Virtual Machines, Containers, Relational Databases, Object Storage, Block Storage, Content Delivery, Network Infrastructure, Data Warehouses.


2. Cost Column Invariants — Math Broken by Anomaly Rows

The FOCUS spec requires: ListCost = ListUnitPrice × PricingQuantity (and similarly for ContractedCost) when unit price and quantity are non-null and ChargeClass ≠ "Correction".

Unit prices are derived on line 741-742:

$listUnitPrice = [math]::Round($listCost / $pricingQuantity, 10)
$contractedUnitPrice = [math]::Round($contractedCost / $pricingQuantity, 10)

But the "data quality anomaly" block on lines 751-756 mutates costs AFTER unit prices were already calculated:

if ($qualityRoll -eq 0) {
    $effectiveCost = [math]::Round($contractedCost * 1.1, 10)   # breaks EffectiveCost invariant
} elseif ($qualityRoll -eq 1) {
    $contractedCost = [math]::Round($listCost * 1.05, 10)       # breaks ContractedCost = ContractedUnitPrice × PricingQuantity
}

~2% of rows will have cost/unit-price mismatches that violate the spec's mathematical constraints. If these are intentionally anomalous test data, they should be documented as such (e.g., via x_SourceChanges), and ChargeClass should be set to "Correction" to exempt them from the spec's invariant rules.
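
One way to keep the invariants intact is to apply any anomaly adjustment before unit prices are derived, and to flag deliberately broken rows as corrections; a sketch of that ordering (not the script's exact code):

# 1. Apply the anomaly first and mark the row so the spec's invariants no longer apply.
if ($qualityRoll -eq 1)
{
    $contractedCost = [math]::Round($listCost * 1.05, 10)
    $chargeClass    = 'Correction'   # Correction rows are exempt from the cost/price equality rules
}

# 2. Derive unit prices only after every cost adjustment is final.
$listUnitPrice       = [math]::Round($listCost / $pricingQuantity, 10)
$contractedUnitPrice = [math]::Round($contractedCost / $pricingQuantity, 10)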


3. InvoiceId — Assigned to All Charge Categories

The script generates an InvoiceId for every row (lines 760-764), including Credit and Adjustment charges. In practice:

  • Some credits and adjustments are not tied to a specific invoice and should have InvoiceId = $null
  • ChargeClass = "Correction" rows reference a previously invoiced billing period and might carry the original invoice's ID, not a new one

This is a lower-severity finding (more about realism than strict spec violation), but worth considering for test data that claims multi-version FOCUS compliance.

FallenHoot pushed a commit to FallenHoot/finops-toolkit that referenced this pull request Feb 16, 2026
Comprehensive rewrite of Generate-MultiCloudTestData.ps1:

Critical fixes:
- Fix Get-Random [int] overflow with 12-digit AWS account IDs (New-AwsAccountId)
- Eliminate Python dependency entirely (inline budget scaling via scale factor)
- Remove dead code from Python/Parquet block

Required by repo conventions:
- Add #Requires -Version 7.0
- Add .LINK to comment-based help
- Add [CmdletBinding(SupportsShouldProcess)] with WhatIf/Confirm support
- Add changelog entry
- Add test directory README.md

FOCUS specification compliance:
- Fix ~12 ServiceSubcategory values to match FOCUS closed enumeration
- Fix cost invariants: unit prices calculated AFTER all cost modifications
- Anomaly rows now set ChargeClass=Correction (exempt from invariant rules)
- Credits/Adjustments get null InvoiceId (per FOCUS spec)
- Version-aware column sets: v1.1+ gets CommitmentDiscountQuantity/Unit,
  v1.2+ gets BillingAccountType/SubAccountType/InvoiceId,
  v1.3+ gets HostProviderName/ServiceProviderName
- Document scope as Cost and Usage dataset only

Recommended improvements:
- Add -Seed parameter for reproducible test data
- Add -UseStorageKey switch, default to Azure AD auth (--auth-mode login)
- Fix Get-RandomDecimal to use [long] instead of [int] for large ranges
@microsoft-github-policy-service bot added the Needs: Review 👀 label and removed the Needs: Attention 👋 label on Feb 16, 2026
@RolandKrummenacher (Collaborator) left a comment

Missing FOCUS Columns

The script is still missing several columns defined in the FOCUS specification across versions. These should either be implemented or explicitly documented as out-of-scope:

v1.1+ (4 columns)

  • CapacityReservationId — Identifier for capacity reservations
  • CapacityReservationStatus — Whether capacity reservation was used/unused
  • SkuMeter — Meter-level SKU details
  • SkuPriceDetails — JSON column with pricing metadata

v1.2+ (4 columns)

  • PricingCurrency — Currency used for pricing columns
  • PricingCurrencyContractedUnitPrice — Contracted unit price in pricing currency
  • PricingCurrencyEffectiveCost — Effective cost in pricing currency
  • PricingCurrencyListUnitPrice — List unit price in pricing currency

v1.3+ (5 columns)

  • AllocatedMethodDetails — Details about cost allocation method
  • AllocatedResourceId — Resource ID for split/allocated costs
  • AllocatedResourceName — Resource name for split/allocated costs
  • AllocatedResourceType — Resource type for split/allocated costs
  • ContractApplied — JSON column bridging Cost and Usage rows to Contract Commitment dataset

Total: 13 columns missing across versions. Without these, the output cannot be fully compliant with any FOCUS version from v1.1 onward. At minimum, please document which columns are intentionally excluded and why (e.g., Contract Commitment dataset is already noted as out of scope in the help text — the same treatment should apply to these).

@FallenHoot (Author)

@RolandKrummenacher — Thank you for the thorough review! Really appreciate the detailed feedback.

Your comments were spot-on and gave us the opportunity to go back and revisit logic that was missing during a live demo. We've addressed all the review feedback in this latest push:

What changed

PR review items — all addressed:

  • AllocatedResourceType — Added as the missing FOCUS v1.3 column
  • ContractApplied — Now populated with JSON contract references for committed-discount rows (v1.3+)
  • Split cost allocation — ~10% of AKS/EKS/GKE rows now populate Allocated* columns with namespace-level allocation simulation
  • ADF trigger names — Extracted to a reusable variable (was hardcoded in 2 places)
  • Column emission documented — FOCUS Column Coverage summary now explicitly lists which columns are emitted per version

Additional improvements (discovered while re-testing):

  • Expanded README with NukeTestData section, output formats, and additional datasets documentation
  • Added NukeTestData Quick Start examples
  • Removed the .duplicate backup file that was accidentally included

Pester tests

We'll look into adding Pester unit tests (for Get-RandomDecimal, New-AwsAccountId, Get-WeightedRandomService, etc.) in a follow-up PR to keep this one focused on the generator itself.

@FallenHoot (Author)

Fix pushed (61a6d0c): Resolve OutputPath to an absolute path before use. Export-Parquet is a .NET cmdlet that uses [IO.Directory]::GetCurrentDirectory(), which can differ from PowerShell's current location (e.g., C:\Users\zaolinsk\finops-toolkit) — this caused 'Could not find a part of the path' errors when running from a different working directory. Fixed by calling System.Management.Automation.EngineIntrinsics.SessionState.Path.GetUnresolvedProviderPathFromPSPath() on OutputPath after parameter binding.
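
A minimal sketch of the described fix, assuming the resolution happens at the top of the script body (the actual commit may differ):

# Resolve the user-supplied path against PowerShell's current location instead of the
# process working directory that [IO.Directory]::GetCurrentDirectory() reports.
$OutputPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)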

FallenHoot pushed a commit to FallenHoot/finops-toolkit that referenced this pull request Feb 17, 2026
@FallenHoot force-pushed the feature/multi-cloud-test-data-generator branch from 61a6d0c to 4fde70e on February 17, 2026 at 08:01
Collaborator

Can we move this to src/powershell/Public/New-FinOpsTestData.ps1 so it can be published in the PS module? I'm fine with another verb name, but we should use an approved verb.

Side question: Should this be a generic script for any purpose or do we want to make it hubs-specific? I'm fine either way, but we'd follow different conventions.

Author

Done ✅ — Moved to src/powershell/Public/New-FinOpsTestData.ps1 as a proper function using the approved verb New. Old script removed. Auto-discovered by FinOpsToolkit.psm1.

Author

Done — moved to src/powershell/Public/New-FinOpsTestData.ps1 (and the cleanup script to src/powershell/Public/Remove-FinOpsTestData.ps1). Both are auto-discovered by FinOpsToolkit.psm1.

Comment on lines 7 to 8
.SYNOPSIS
Generates multi-cloud FOCUS-compliant test data for FinOps Hub validation.
Collaborator

Can you look at the formatting conventions we have and apply them here as well? In this case, we indent the doc values under the property tags.

Suggested change
.SYNOPSIS
Generates multi-cloud FOCUS-compliant test data for FinOps Hub validation.
.SYNOPSIS
    Generates multi-cloud FOCUS-compliant test data for FinOps Hub validation.

Author

Done ✅ — All doc tags (.SYNOPSIS, .DESCRIPTION, .PARAMETER, .EXAMPLE, .LINK, .NOTES) now use 4-space indentation matching repo conventions.

Author

Done — applied 4-space indentation for all .SYNOPSIS, .DESCRIPTION, .PARAMETER, .EXAMPLE, and .LINK entries to match the toolkit convention.

Comment on lines 18 to 21
- Prices (Azure EA/MCA price sheet → Prices_raw → Prices_final_v1_2)
- CommitmentDiscountUsage (Reservation details → CommitmentDiscountUsage_raw)
- Recommendations (Reservation recommendations → Recommendations_raw)
- Transactions (Reservation transactions → Transactions_raw)
Collaborator

These are mentioning hubs-specific tables. Do you want to keep this specific to hubs? If so, I'd probably change the name to New-FinOpsHubTestData. But I also see value in breaking this out to support any number of scenarios:

  • New-FinOpsTestData
  • Set-FinOpsStorageBlobContent
  • New-FinOpsExportManifest
  • Add-FinOpsHubTestData

I see these as just breaking down what you have into smaller chunks. We don't need to do this now. I'm just thinking out loud about a growth path that would be reusable for more scenarios, if/when needed.

Author

Acknowledged — these are hubs-specific ADX table names. That's intentional since the test data generator is designed for hubs validation. As you noted, breaking this into smaller composable commands (e.g., New-FinOpsTestCostData, New-FinOpsTestPriceData) is a great growth-path idea. Deferring that refactor to a follow-up — doesn't need to block this PR.

.PARAMETER OutputPath
Directory to save generated files. Default: ./test-data

.PARAMETER CloudProvider
Collaborator

I need to double check what's in FOCUS 1.3, but I believe the best term here is ServiceProvider to account for SaaS services that we could hypothetically support in the future.

Suggested change
.PARAMETER CloudProvider
.PARAMETER ServiceProvider

Author

Done ✅ — Renamed CloudProvider to ServiceProvider globally (parameter, ValidateSet, all internal references). The x_CloudProvider data column is preserved since it's a data field.

.PARAMETER EndDate
End date for generated data. Default: Today

.PARAMETER TotalRowTarget
Collaborator

nit: MaxRowCount or maybe just RowCount?

Author

Done — renamed to -RowCount.

ServiceProviderName = "Microsoft"
InvoiceIssuerName = "Microsoft"
HostProviderName = "Microsoft"
BillingAccountType = "Billing Profile"
Collaborator

nit: The 'Billing Profile' account type doesn't match an EA agreement.

Author

Done ✅ — Fixed from 'Billing Profile' to 'Billing Account' for EA.

# any that are missing or empty.
# ============================================================================

function Invoke-EnsureUpdatePolicy
Collaborator

Is this needed? Have you ever seen a case where this check failed? I'd love to get to a point where this code isn't necessary.

Author

ADX update policies can get dropped after table clears/recreates during the nuke flow. This is a defensive check to re-create them if missing. Happy to remove if we confirm update policies are always preserved after table operations.

else
{
Write-Host " Starting $trigger..." -ForegroundColor Cyan
az datafactory trigger start --factory-name $AdfName --resource-group $ResourceGroupName --name $trigger --only-show-errors 2>$null
Collaborator

Don't use the Az CLI. You're in PowerShell. Stick with Az PowerShell. Applies to all commands.

Author

Done ✅ — All 15 Az CLI calls replaced with Az PowerShell equivalents: Get-AzAccessToken, New-AzStorageContext, Set-AzStorageBlobContent, Get-AzStorageBlob, Remove-AzStorageBlob, Get-AzStorageAccountKey, Get/Start/Stop-AzDataFactoryV2Trigger, Get-AzDataFactoryV2PipelineRun.
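
For reference, the az datafactory trigger start call quoted above maps to Az PowerShell roughly as follows (sketch; assumes the Az.DataFactory module is installed and an Azure context is already set):

Write-Host " Starting $trigger..." -ForegroundColor Cyan
Start-AzDataFactoryV2Trigger -ResourceGroupName $ResourceGroupName `
    -DataFactoryName $AdfName -Name $trigger -Force | Out-Null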

$blobPath = "$blobFolder/$dataFile"
$manifestBlobPath = "$blobFolder/manifest.json"

$manifest = @{
Collaborator

nit: Creating a manifest is an awesome capability in and of itself. I'd love to see this as a separate New-FinOpsExportManifest command.

Author

Evaluated extracting New-FinOpsExportManifest — the 4 manifest creation sites have fundamentally different schemas: (1) Azure msexports manifest with full Cost Management metadata (exportConfig, deliveryConfig, blobs, runInfo), (2) simple ingestion trigger manifests for non-Azure providers (3-4 fields), (3) per-dataset trigger manifests for Prices/CDU/Recommendations/Transactions, and (4) local per-provider manifests. Each uses context-specific variables from the upload loop. Creating a unified function with a clean interface would require rethinking the upload architecture. Recommend a focused refactoring PR to design this properly. Commit: 0f2e032
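
For context, a heavily simplified shape of the msexports-style manifest with the watermark described later in this thread; the top-level sections (exportConfig, deliveryConfig, blobs, runInfo) come from the comment above, while the nested field names and extra variables ($exportName, $manifestLocalPath) are illustrative only:

# Illustrative only; not the script's exact manifest schema.
$manifest = @{
    exportConfig   = @{ exportName = $exportName; dataVersion = $FocusVersion }
    deliveryConfig = @{ rootFolderPath = $blobFolder; fileFormat = 'Csv' }
    runInfo        = @{ runId = [guid]::NewGuid().ToString() }
    blobs          = @(@{ blobName = $blobPath; byteCount = (Get-Item $dataFile).Length })
    _ftkTestData   = $true   # watermark so Remove-FinOpsTestData can identify test data
}
$manifest | ConvertTo-Json -Depth 5 | Set-Content -Path $manifestLocalPath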

Write-Host " 3. Start ADF triggers to process the data"
}

Write-Host ""
Collaborator

Approaching 3K lines is a bit much. I'd love to see this broken out into multiple files.

Author

✅ Significant progress — New-FinOpsTestData.ps1 is now down to ~2170 lines (from ~2900), with Remove-FinOpsTestData.ps1 extracted as a standalone command (~340 lines). The remaining ~2170 lines are the core data generation logic. For New-FinOpsExportManifest, the 4 manifest creation sites use different schemas (full Cost Management manifest, simple ingestion triggers, local per-provider) and are tightly coupled to upload context variables. Extracting a clean reusable interface would benefit from a focused refactoring PR. Commit: 0f2e032

@microsoft-github-policy-service bot added the Needs: Attention 👋 label and removed the Needs: Review 👀 label on Feb 17, 2026
@flanakin added the Tool: PowerShell label on Feb 18, 2026
@microsoft-github-policy-service bot added the Needs: Review 👀 label and removed the Needs: Attention 👋 label on Feb 21, 2026
@FallenHoot (Author) commented on Feb 21, 2026

PR Summary – New-FinOpsTestData & Remove-FinOpsTestData

New-FinOpsTestData

  • Generates synthetic, multi-cloud FOCUS-compliant cost data (Azure, AWS, GCP, on-premises)
  • Compatible with both Azure Data Explorer and Microsoft Fabric Real-Time Intelligence ingestion paths
  • Supports FOCUS versions 1.0–1.3 with version-specific column sets
  • Generates all 5 FOCUS datasets: Costs, Prices, CommitmentDiscountUsage, Recommendations, Transactions
  • Includes commitment discounts, Azure Hybrid Benefit, tag variation, and inline budget scaling
  • Deterministic output via -Seed parameter; CSV and Parquet formats (via optional PSParquet module)
  • Optional upload to Azure Storage with ADF trigger management
  • All manifests include _ftkTestData watermark for safe cleanup identification

Remove-FinOpsTestData

  • Multi-layer safety features:
    • Local-only cleanup by default (no cloud deletion without explicit params)
    • Targeted storage deletion: scans manifests for _ftkTestData marker, deletes only test-data folders — production data is preserved (see the sketch after this list)
    • ADX cleanup requires -Force because .clear table removes ALL rows (no selective deletion possible)
    • Does not manage Microsoft Fabric data (clean up Fabric resources separately)
    • ShouldProcess support with ConfirmImpact=High
  • Verifies ADX update policies after clearing tables
  • Optional ADF trigger management to prevent re-ingestion during cleanup
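
A sketch of the targeted-deletion idea in Az PowerShell; the blob layout, variable names, and container handling here are assumptions, not the command's actual code:

# Find manifest blobs, keep only the ones carrying the _ftkTestData watermark,
# and delete just the folders those manifests live in.
$manifests = Get-AzStorageBlob -Container $ContainerName -Context $ctx -Blob '*manifest.json'
foreach ($blob in $manifests)
{
    $tmp = Join-Path ([IO.Path]::GetTempPath()) 'ftk-manifest.json'
    Get-AzStorageBlobContent -Container $ContainerName -Blob $blob.Name -Destination $tmp -Context $ctx -Force | Out-Null
    $json  = Get-Content $tmp -Raw | ConvertFrom-Json
    $slash = $blob.Name.LastIndexOf('/')

    if ($json._ftkTestData -and $slash -ge 0)
    {
        $folder = $blob.Name.Substring(0, $slash + 1)
        Get-AzStorageBlob -Container $ContainerName -Context $ctx -Prefix $folder |
            Remove-AzStorageBlob -Force   # only watermarked test-data folders are removed
    }
}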

Included artifacts

  • 14 unit tests for New-FinOpsTestData (all passing)
  • 9 unit tests for Remove-FinOpsTestData (all passing)
  • Zero PSScriptAnalyzer lint errors
  • MS Learn documentation pages for both commands
  • Changelog entries under v14
  • README for src/templates/finops-hub/test/
  • TOC and open-data-commands.md navigation updates
  • .gitignore patterns for test-data output

Review items addressed

All 18 original review items + 4 quality improvements resolved across iterations:

  1. ✅ Mandatory params (ContainerName, StorageAccount) for storage upload
  2. ✅ PSParquet availability check with clear Install-Module guidance
  3. ✅ ShouldProcess with ConfirmImpact=High on Remove command
  4. ✅ Watermark system in all 4 manifest types (_ftkTestData, _generator, _generatedAt)
  5. ✅ Targeted storage deletion (scan manifests → delete only marked folders)
  6. ✅ ADX .clear table documented as all-or-nothing with separate environment recommendation
  7. ✅ Fabric compatibility documented (works for ingestion, not managed by Remove)
  8. ✅ -Force required for ADX cleanup with clear warning
  9. ✅ Comprehensive Pester test coverage
  10. ✅ Single squashed commit for clean history

@FallenHoot force-pushed the feature/multi-cloud-test-data-generator branch from b65f39a to 228a768 on February 21, 2026 at 18:36

Add multi-cloud FOCUS test data generator and cleanup commands for FinOps Hub validation.

New-FinOpsTestData:
- Generates synthetic FOCUS-compliant cost data for Azure, AWS, GCP, and on-premises
- Compatible with both Azure Data Explorer and Microsoft Fabric RTI ingestion paths
- Supports FOCUS versions 1.0-1.3 with version-specific column sets
- Includes commitment discounts, Azure Hybrid Benefit, tag variation, budget scaling
- Deterministic output via -Seed parameter, CSV and Parquet formats
- Optional upload to Azure Storage with ADF trigger management
- Generates all 5 FOCUS datasets: Costs, Prices, CommitmentDiscountUsage, Recommendations, Transactions
- All manifests include _ftkTestData watermark for safe cleanup identification

Remove-FinOpsTestData:
- Multi-layer safety: local-only by default, no cloud deletion without explicit params
- Targeted storage deletion: scans manifests for _ftkTestData marker, deletes only test-data folders (production data preserved)
- ADX requires -Force because .clear table removes ALL rows (no selective deletion)
- Does not manage Microsoft Fabric data (clean up Fabric resources separately)
- Verifies ADX update policies after clearing tables
- Optional ADF trigger management to prevent re-ingestion during cleanup
- ShouldProcess support with ConfirmImpact=High

Includes:
- 14 unit tests for New-FinOpsTestData (parameter validation, CSV generation, seed reproducibility, multi-provider, FOCUS version columns)
- 9 unit tests for Remove-FinOpsTestData (parameter validation, Force safety, WhatIf)
- MS Learn documentation pages for both commands
- Changelog entries under v14
- README for src/templates/finops-hub/test/
- TOC and open-data-commands.md navigation updates

Labels

Needs: Review 👀 · Tool: FinOps hubs · Tool: PowerShell · Type: Feature 💎
