feat: Add multi-cloud FOCUS test data generator for FinOps Hub #2006
FallenHoot wants to merge 1 commit into microsoft:dev
Conversation
RolandKrummenacher left a comment:
Review: Generate-MultiCloudTestData.ps1
Great concept — this fills a real gap for FinOps Hub end-to-end testing. The FOCUS column coverage and multi-cloud provider modeling are thorough. However, there are several issues to address before merging:
Critical
- Get-Random overflow with 12-digit AWS account IDs (lines 335, 361) — will throw at runtime (see the sketch below)
- Python dependency should be eliminated — budget scaling and Parquet output can be done in pure PowerShell, removing ~80 lines of fragile cross-language code with path-injection risk and dead code
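A minimal sketch, not the PR's code, of one way to avoid the overflow: keep the 12-digit value in a [long] and format it back to a zero-padded string. The helper name New-AwsAccountId mirrors the fix described later in the commit notes; the real implementation may differ.

```powershell
function New-AwsAccountId
{
    # [int]::MaxValue is ~2.1 billion, so a 12-digit ID must never pass through [int].
    [long]$id = Get-Random -Minimum 100000000000 -Maximum 1000000000000
    return $id.ToString('D12')
}

New-AwsAccountId   # e.g. 734902185566
```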
Required by repo conventions
- Missing changelog entry (v14 section in docs-mslearn/toolkit/changelog.md)
- Missing README.md in the test directory
- Missing #Requires statement and .LINK in help
- No -WhatIf/-Confirm support for destructive operations (file creation, uploads, trigger starts) — a ShouldProcess sketch follows this list
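A minimal sketch of the ShouldProcess pattern being requested; the function and parameter names here are illustrative, not the script's actual implementation.

```powershell
function New-TestDataFile
{
    [CmdletBinding(SupportsShouldProcess)]
    param
    (
        [string]$OutputPath = './test-data'
    )

    if ($PSCmdlet.ShouldProcess($OutputPath, 'Write generated test data'))
    {
        New-Item -ItemType Directory -Path $OutputPath -Force | Out-Null
        # ...generate rows and write CSV/Parquet files here...
    }
}

# -WhatIf reports the action without creating anything; -Confirm prompts first.
New-TestDataFile -OutputPath ./test-data -WhatIf
```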
Recommended
- Add Pester tests for helper functions
- Prefer Azure AD auth over storage account keys
- Add -Seed parameter for reproducible test data (sketch below)
- FOCUS version parameter is metadata-only — either vary the schema or simplify
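A minimal sketch of how a -Seed parameter could make runs reproducible, assuming the generator relies on Get-Random throughout; the parameter handling is illustrative.

```powershell
param
(
    [int]$Seed
)

if ($PSBoundParameters.ContainsKey('Seed'))
{
    # Seeding the session's random number generator once makes every
    # subsequent Get-Random call in this run deterministic.
    Get-Random -SetSeed $Seed | Out-Null
}

# Two runs with the same -Seed now produce identical "random" values.
1..3 | ForEach-Object { Get-Random -Maximum 100 }
```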
Minor
- Inconsistent cost rounding (10 vs 2 decimal places)
- ADF trigger names hardcoded — should be parameterized or documented
FOCUS Specification Compliance Analysis
I did a detailed comparison of the script's output against the official FOCUS specification at focus.finops.org for all four claimed versions (1.0, 1.1, 1.2, 1.3). Here's what I found:
Critical Issue:
| Version | Columns Present | Columns Missing | Extra Columns | Compliant? |
|---|---|---|---|---|
| v1.0 | 43/43 | 0 | 8 | ❌ |
| v1.1 | 46/50 | 4 | 5 | ❌ |
| v1.2 | 49/57 | 8 | 2 | ❌ |
| v1.3 | 51/64+ | 13+ | 0 | ❌ |
Recommendation
Either:
- Target a single version (e.g., v1.0 or v1.3) and get that version fully correct, or
- Use $FocusVersion to dynamically select columns — only output columns valid for the chosen version and include all required columns for that version (see the sketch after this recommendation)
The closest match today is v1.3, but it's still missing conditional/recommended columns and the entire Contract Commitment dataset. For test data generation purposes, it may be acceptable to scope this to Cost and Usage only with a documented caveat, but the column set should still match the selected version.
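An illustrative sketch of version-aware column selection; the per-version buckets below use columns mentioned elsewhere in this review, the v1.0 set is abbreviated, and the structure is an assumption rather than the script's actual code.

```powershell
$FocusVersion = '1.2'

$columnsByVersion = [ordered]@{
    '1.0' = @('BilledCost', 'EffectiveCost', 'ChargeCategory', 'ServiceName')   # ...full v1.0 set here
    '1.1' = @('CommitmentDiscountQuantity', 'CommitmentDiscountUnit')
    '1.2' = @('BillingAccountType', 'SubAccountType', 'InvoiceId')
    '1.3' = @('HostProviderName', 'ServiceProviderName')
}

# Emit only the columns introduced at or below the requested FOCUS version.
$selectedColumns = $columnsByVersion.Keys |
    Where-Object { [version]$_ -le [version]$FocusVersion } |
    ForEach-Object { $columnsByVersion[$_] }

$selectedColumns   # v1.0 + v1.1 + v1.2 columns when FocusVersion is 1.2
```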
Additional FOCUS Spec Compliance Findings
A few more items found during deeper analysis:
1. ServiceSubcategory — Invalid Values Against Spec's Closed Enumeration
The FOCUS spec (v1.1+) defines a closed enumeration of allowed ServiceSubcategory values, and roughly a dozen of the script's values fall outside that list and should be remapped to valid entries.
2. Cost Column Invariants — Math Broken by Anomaly Rows
The FOCUS spec requires cost columns to stay consistent with their unit prices (e.g., ContractedCost = ContractedUnitPrice × PricingQuantity). Unit prices are derived on lines 741-742:
$listUnitPrice = [math]::Round($listCost / $pricingQuantity, 10)
$contractedUnitPrice = [math]::Round($contractedCost / $pricingQuantity, 10)
But the "data quality anomaly" block on lines 751-756 mutates costs AFTER unit prices were already calculated:
if ($qualityRoll -eq 0) {
    $effectiveCost = [math]::Round($contractedCost * 1.1, 10) # breaks EffectiveCost invariant
} elseif ($qualityRoll -eq 1) {
    $contractedCost = [math]::Round($listCost * 1.05, 10) # breaks ContractedCost = ContractedUnitPrice × PricingQuantity
}
~2% of rows will have cost/unit-price mismatches that violate the spec's mathematical constraints. If these are intentionally anomalous test data, they should be documented as such (e.g., via ChargeClass).
3. InvoiceId — Assigned to All Charge Categories
The script generates an InvoiceId for every row (lines 760-764), including Credit and Adjustment rows.
This is a lower-severity finding (more about realism than strict spec violation), but worth considering for test data that claims multi-version FOCUS compliance.
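For item 2, a sketch (reusing the variable names from the snippet above, so not self-contained) of the ordering fix: apply any anomaly adjustments to costs first, then derive unit prices from the final values so ListCost = ListUnitPrice × PricingQuantity still holds.

```powershell
# Apply anomaly mutations before any unit price is computed.
if ($qualityRoll -eq 0)
{
    $effectiveCost = [math]::Round($contractedCost * 1.1, 10)
}
elseif ($qualityRoll -eq 1)
{
    $contractedCost = [math]::Round($listCost * 1.05, 10)
}

# Derive unit prices last, after every cost mutation, guarding against divide-by-zero.
if ($pricingQuantity -gt 0)
{
    $listUnitPrice       = [math]::Round($listCost / $pricingQuantity, 10)
    $contractedUnitPrice = [math]::Round($contractedCost / $pricingQuantity, 10)
}
```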
Comprehensive rewrite of Generate-MultiCloudTestData.ps1:
Critical fixes:
- Fix Get-Random [int] overflow with 12-digit AWS account IDs (New-AwsAccountId)
- Eliminate Python dependency entirely (inline budget scaling via scale factor)
- Remove dead code from Python/Parquet block
Required by repo conventions:
- Add #Requires -Version 7.0
- Add .LINK to comment-based help
- Add [CmdletBinding(SupportsShouldProcess)] with WhatIf/Confirm support
- Add changelog entry
- Add test directory README.md
FOCUS specification compliance:
- Fix ~12 ServiceSubcategory values to match FOCUS closed enumeration
- Fix cost invariants: unit prices calculated AFTER all cost modifications
- Anomaly rows now set ChargeClass=Correction (exempt from invariant rules)
- Credits/Adjustments get null InvoiceId (per FOCUS spec)
- Version-aware column sets: v1.1+ gets CommitmentDiscountQuantity/Unit, v1.2+ gets BillingAccountType/SubAccountType/InvoiceId, v1.3+ gets HostProviderName/ServiceProviderName
- Document scope as Cost and Usage dataset only
Recommended improvements:
- Add -Seed parameter for reproducible test data
- Add -UseStorageKey switch, default to Azure AD auth (--auth-mode login)
- Fix Get-RandomDecimal to use [long] instead of [int] for large ranges
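A hedged sketch of the pure-PowerShell budget scaling ("scale factor") approach the commit notes describe; $rows, $MonthlyBudget, and the cost column names are illustrative, not the script's actual variables.

```powershell
$rows = @(
    [pscustomobject]@{ BilledCost = 120.0; EffectiveCost = 110.0 }
    [pscustomobject]@{ BilledCost =  80.0; EffectiveCost =  75.0 }
)
$MonthlyBudget = 1000.0

$currentTotal = ($rows | Measure-Object -Property BilledCost -Sum).Sum
if ($currentTotal -gt 0)
{
    # Scale every cost column by the same factor so the total lands on the budget.
    $scaleFactor = $MonthlyBudget / $currentTotal
    foreach ($row in $rows)
    {
        $row.BilledCost    = [math]::Round($row.BilledCost * $scaleFactor, 2)
        $row.EffectiveCost = [math]::Round($row.EffectiveCost * $scaleFactor, 2)
    }
}
```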
RolandKrummenacher left a comment:
Missing FOCUS Columns
The script is still missing several columns defined in the FOCUS specification across versions. These should either be implemented or explicitly documented as out-of-scope:
v1.1+ (4 columns)
- CapacityReservationId — Identifier for capacity reservations
- CapacityReservationStatus — Whether capacity reservation was used/unused
- SkuMeter — Meter-level SKU details
- SkuPriceDetails — JSON column with pricing metadata
v1.2+ (4 columns)
- PricingCurrency — Currency used for pricing columns
- PricingCurrencyContractedUnitPrice — Contracted unit price in pricing currency
- PricingCurrencyEffectiveCost — Effective cost in pricing currency
- PricingCurrencyListUnitPrice — List unit price in pricing currency
v1.3+ (5 columns)
- AllocatedMethodDetails — Details about cost allocation method
- AllocatedResourceId — Resource ID for split/allocated costs
- AllocatedResourceName — Resource name for split/allocated costs
- AllocatedResourceType — Resource type for split/allocated costs
- ContractApplied — JSON column bridging Cost and Usage rows to Contract Commitment dataset
Total: 13 columns missing across versions. Without these, the output cannot be fully compliant with any FOCUS version from v1.1 onward. At minimum, please document which columns are intentionally excluded and why (e.g., Contract Commitment dataset is already noted as out of scope in the help text — the same treatment should apply to these).
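An illustrative sketch of how the v1.2 pricing-currency columns listed above could be populated for synthetic data, assuming the pricing currency equals the billing currency and that each $row is a hashtable from the generator's row-building loop; the assignment logic is an assumption, not the PR's code.

```powershell
foreach ($row in $rows)
{
    # When pricing and billing currencies match, the pricing-currency values
    # mirror the existing unit price and cost columns.
    $row.PricingCurrency                    = $row.BillingCurrency
    $row.PricingCurrencyListUnitPrice       = $row.ListUnitPrice
    $row.PricingCurrencyContractedUnitPrice = $row.ContractedUnitPrice
    $row.PricingCurrencyEffectiveCost       = $row.EffectiveCost
}
```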
@RolandKrummenacher — Thank you for the thorough review! Really appreciate the detailed feedback. Your comments were spot-on and gave us the opportunity to go back and revisit logic that was missing during a live demo. We've addressed all the review feedback in this latest push:
What changed
PR review items — all addressed:
Additional improvements (discovered while re-testing):
Pester tests
We'll look into adding Pester unit tests (for the helper functions).
Fix pushed (61a6d0c): Resolve
Force-pushed 61a6d0c to 4fde70e
Can we move this to src/powershell/Public/New-FinOpsTestData.ps1 so it can be published in the PS module? I'm fine with another verb name, but we should use an approved verb.
Side question: Should this be a generic script for any purpose or do we want to make it hubs-specific? I'm fine either way, but we'd follow different conventions.
Done ✅ — Moved to src/powershell/Public/New-FinOpsTestData.ps1 as a proper function using the approved verb New. Old script removed. Auto-discovered by FinOpsToolkit.psm1.
Done — moved to src/powershell/Public/New-FinOpsTestData.ps1 (and the cleanup script to src/powershell/Public/Remove-FinOpsTestData.ps1). Both are auto-discovered by FinOpsToolkit.psm1.
.SYNOPSIS
Generates multi-cloud FOCUS-compliant test data for FinOps Hub validation.
Can you look at the formatting conventions we have and apply them here as well. In this case, we indent the doc properties alongside the values.
Done ✅ — All doc tags (.SYNOPSIS, .DESCRIPTION, .PARAMETER, .EXAMPLE, .LINK, .NOTES) now use 4-space indentation matching repo conventions.
Done — applied 4-space indentation for all .SYNOPSIS, .DESCRIPTION, .PARAMETER, .EXAMPLE, and .LINK entries to match the toolkit convention.
- Prices (Azure EA/MCA price sheet → Prices_raw → Prices_final_v1_2)
- CommitmentDiscountUsage (Reservation details → CommitmentDiscountUsage_raw)
- Recommendations (Reservation recommendations → Recommendations_raw)
- Transactions (Reservation transactions → Transactions_raw)
These mention hubs-specific tables. Do you want to keep this specific to hubs? If so, I'd probably change the name to New-FinOpsHubTestData. But I also see value in breaking this out to support any number of scenarios:
- New-FinOpsTestData
- Set-FinOpsStorageBlobContent
- New-FinOpsExportManifest
- Add-FinOpsHubTestData
I see these as just breaking down what you have into smaller chunks. We don't need to do this now. I'm just thinking out loud about a growth path that would be reusable for more scenarios, if/when needed.
Acknowledged — these are hubs-specific ADX table names. That's intentional since the test data generator is designed for hubs validation. As you noted, breaking this into smaller composable commands (e.g., New-FinOpsTestCostData, New-FinOpsTestPriceData) is a great growth-path idea. Deferring that refactor to a follow-up — doesn't need to block this PR.
.PARAMETER OutputPath
Directory to save generated files. Default: ./test-data

.PARAMETER CloudProvider
I need to double check what's in FOCUS 1.3, but I believe the best term here is ServiceProvider to account for SaaS services that we could hypothetically support in the future.
Suggested change: .PARAMETER CloudProvider → .PARAMETER ServiceProvider
Done ✅ — Renamed CloudProvider to ServiceProvider globally (parameter, ValidateSet, all internal references). The x_CloudProvider data column is preserved since it's a data field.
.PARAMETER EndDate
End date for generated data. Default: Today

.PARAMETER TotalRowTarget
nit: MaxRowCount or maybe just RowCount?
Done — renamed to -RowCount.
ServiceProviderName = "Microsoft"
InvoiceIssuerName = "Microsoft"
HostProviderName = "Microsoft"
BillingAccountType = "Billing Profile"
nit: Billing Profile type doesn't match EA account agreement.
Done ✅ — Fixed from 'Billing Profile' to 'Billing Account' for EA.
# any that are missing or empty.
# ============================================================================

function Invoke-EnsureUpdatePolicy
Is this needed? Have you ever seen a case where this check failed? I'd love to get to a point where this code isn't necessary.
ADX update policies can get dropped after table clears/recreates during the nuke flow. This is a defensive check to re-create them if missing. Happy to remove if we confirm update policies are always preserved after table operations.
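A hedged sketch of the kind of check described above: read the table's update policy through the ADX management endpoint and re-apply it if nothing is attached. The cluster URI, database, table, transform function, and policy JSON are placeholders, and the PR's Invoke-EnsureUpdatePolicy may be implemented differently.

```powershell
function Invoke-KustoCommand
{
    param ([string]$ClusterUri, [string]$Database, [string]$Command)

    # Acquire an Azure AD token for the cluster and call the management endpoint.
    $token = (Get-AzAccessToken -ResourceUrl $ClusterUri).Token
    if ($token -is [securestring]) { $token = ConvertFrom-SecureString -SecureString $token -AsPlainText }
    $body = @{ db = $Database; csl = $Command } | ConvertTo-Json
    Invoke-RestMethod -Method Post -Uri "$ClusterUri/v1/rest/mgmt" `
        -Headers @{ Authorization = "Bearer $token" } -ContentType 'application/json' -Body $body
}

$clusterUri = 'https://myhub.westus2.kusto.windows.net'
$result = Invoke-KustoCommand -ClusterUri $clusterUri -Database 'Ingestion' `
    -Command '.show table Costs_final_v1_0 policy update'

# Simplified check: if no policy rows came back, restore a known-good policy.
if (-not $result.Tables[0].Rows)
{
    $policy = '[{"IsEnabled": true, "Source": "Costs_raw", "Query": "Costs_transform_v1_0()"}]'
    Invoke-KustoCommand -ClusterUri $clusterUri -Database 'Ingestion' `
        -Command ".alter table Costs_final_v1_0 policy update @'$policy'"
}
```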
else
{
    Write-Host " Starting $trigger..." -ForegroundColor Cyan
    az datafactory trigger start --factory-name $AdfName --resource-group $ResourceGroupName --name $trigger --only-show-errors 2>$null
Don't use the Az CLI. You're in PowerShell. Stick with Az PowerShell. Applies to all commands.
Done ✅ — All 15 Az CLI calls replaced with Az PowerShell equivalents: Get-AzAccessToken, New-AzStorageContext, Set-AzStorageBlobContent, Get-AzStorageBlob, Remove-AzStorageBlob, Get-AzStorageAccountKey, Get/Start/Stop-AzDataFactoryV2Trigger, Get-AzDataFactoryV2PipelineRun.
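A hedged sketch of the Az PowerShell pattern referenced above, using Azure AD auth (-UseConnectedAccount) instead of storage keys; the account, container, blob, resource group, factory, and trigger names are placeholders, not the script's actual values.

```powershell
# Azure AD-authenticated storage context (no account key).
$ctx = New-AzStorageContext -StorageAccountName 'finopshubstorage' -UseConnectedAccount

# Upload a generated file to the ingestion container.
Set-AzStorageBlobContent -Context $ctx -Container 'msexports' `
    -File './test-data/costs_azure.csv' -Blob 'testexport/2025-01/costs_azure.csv' -Force | Out-Null

# Start the ADF trigger that processes new exports.
Start-AzDataFactoryV2Trigger -ResourceGroupName 'finops-hub-rg' `
    -DataFactoryName 'finops-hub-engine' -Name 'msexports_ManifestAdded' -Force
```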
$blobPath = "$blobFolder/$dataFile"
$manifestBlobPath = "$blobFolder/manifest.json"

$manifest = @{
nit: Creating a manifest is an awesome capability in and of itself. I'd love to see this as a separate New-FinOpsExportManifest command.
Evaluated extracting New-FinOpsExportManifest — the 4 manifest creation sites have fundamentally different schemas: (1) Azure msexports manifest with full Cost Management metadata (exportConfig, deliveryConfig, blobs, runInfo), (2) simple ingestion trigger manifests for non-Azure providers (3-4 fields), (3) per-dataset trigger manifests for Prices/CDU/Recommendations/Transactions, and (4) local per-provider manifests. Each uses context-specific variables from the upload loop ($blobFolder, $dataFile, etc.). Creating a unified function with a clean interface would require rethinking the upload architecture. Recommend a focused refactoring PR to design this properly. Commit: 0f2e032
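A hedged sketch of a minimal msexports-style manifest of the shape described above (exportConfig / deliveryConfig / blobs / runInfo); $blobFolder and $dataFile are the upload-loop variables shown in the diff, and all other property values are illustrative, not the real Cost Management schema.

```powershell
$manifest = @{
    exportConfig   = @{ exportName = 'test-export'; type = 'FocusCost'; dataVersion = '1.0' }
    deliveryConfig = @{ rootFolderPath = $blobFolder; fileFormat = 'Csv' }
    blobs          = @(@{ blobName = "$blobFolder/$dataFile" })
    runInfo        = @{ runId = [guid]::NewGuid().ToString(); submittedTime = (Get-Date).ToUniversalTime().ToString('o') }
}

# Serialize with enough depth to keep the nested objects intact.
$manifest | ConvertTo-Json -Depth 5 | Set-Content -Path './manifest.json'
```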
| Write-Host " 3. Start ADF triggers to process the data" | ||
| } | ||
|
|
||
| Write-Host "" |
Approaching 3K lines is a bit much. I'd love to see this broken out into multiple files.
✅ Significant progress — New-FinOpsTestData.ps1 is now down to ~2170 lines (from ~2900), with Remove-FinOpsTestData.ps1 extracted as a standalone command (~340 lines). The remaining ~2170 lines are the core data generation logic. For New-FinOpsExportManifest, the 4 manifest creation sites use different schemas (full Cost Management manifest, simple ingestion triggers, local per-provider) and are tightly coupled to upload context variables. Extracting a clean reusable interface would benefit from a focused refactoring PR. Commit: 0f2e032
PR Summary – New-FinOpsTestData & Remove-FinOpsTestData
New-FinOpsTestData
Remove-FinOpsTestData
Included artifacts
Review items addressed
All 18 original review items + 4 quality improvements resolved across iterations:
Force-pushed b65f39a to 228a768
…mands
Add multi-cloud FOCUS test data generator and cleanup commands for FinOps Hub validation.
New-FinOpsTestData:
- Generates synthetic FOCUS-compliant cost data for Azure, AWS, GCP, and on-premises
- Compatible with both Azure Data Explorer and Microsoft Fabric RTI ingestion paths
- Supports FOCUS versions 1.0-1.3 with version-specific column sets
- Includes commitment discounts, Azure Hybrid Benefit, tag variation, budget scaling
- Deterministic output via -Seed parameter, CSV and Parquet formats
- Optional upload to Azure Storage with ADF trigger management
- Generates all 5 FOCUS datasets: Costs, Prices, CommitmentDiscountUsage, Recommendations, Transactions
- All manifests include _ftkTestData watermark for safe cleanup identification
Remove-FinOpsTestData:
- Multi-layer safety: local-only by default, no cloud deletion without explicit params
- Targeted storage deletion: scans manifests for _ftkTestData marker, deletes only test-data folders (production data preserved)
- ADX requires -Force because .clear table removes ALL rows (no selective deletion)
- Does not manage Microsoft Fabric data (clean up Fabric resources separately)
- Verifies ADX update policies after clearing tables
- Optional ADF trigger management to prevent re-ingestion during cleanup
- ShouldProcess support with ConfirmImpact=High
Includes:
- 14 unit tests for New-FinOpsTestData (parameter validation, CSV generation, seed reproducibility, multi-provider, FOCUS version columns)
- 9 unit tests for Remove-FinOpsTestData (parameter validation, Force safety, WhatIf)
- MS Learn documentation pages for both commands
- Changelog entries under v14
- README for src/templates/finops-hub/test/
- TOC and open-data-commands.md navigation updates
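A hedged usage sketch based on parameters discussed in this PR (-ServiceProvider, -RowCount, -Seed, -EndDate, -OutputPath); exact parameter names, value sets, and defaults may differ in the merged commands.

```powershell
# Generate a deterministic local dataset for one provider.
New-FinOpsTestData `
    -ServiceProvider Azure `
    -RowCount 5000 `
    -Seed 42 `
    -EndDate (Get-Date) `
    -OutputPath ./test-data

# Cleanup is local-only by default; -WhatIf previews what would be removed.
Remove-FinOpsTestData -WhatIf
```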
Force-pushed a1a55d4 to baa8af6
Add multi-cloud FOCUS test data generator for FinOps Hub
Description
Adds Generate-MultiCloudTestData.ps1 — a PowerShell script that generates synthetic, multi-cloud, FOCUS-compliant cost data for testing and validating FinOps Hub deployments end-to-end.
Closes #2005
What's Included
Generate-MultiCloudTestData.ps1 (~1,430 lines) — Self-contained script that generates FOCUS 1.0–1.3 synthetic cost data for Azure, AWS, GCP, and DataCenter providers
Why This Script Is Needed
Testing a FinOps Hub deployment today requires real Cost Management export data. This script fills that gap by generating realistic synthetic data that:
Key Features
Testing
Tested with:
Prerequisites
pandas and pyarrow (for Parquet conversion)
Checklist