feat: Add multi-cloud FOCUS test data generator for FinOps Hub #2006
FallenHoot wants to merge 1 commit into microsoft:dev
Conversation
RolandKrummenacher left a comment:
Review: Generate-MultiCloudTestData.ps1
Great concept — this fills a real gap for FinOps Hub end-to-end testing. The FOCUS column coverage and multi-cloud provider modeling are thorough. However, there are several issues to address before merging:
Critical
- Get-Random overflow with 12-digit AWS account IDs (lines 335, 361) — will throw at runtime (see the sketch below)
- Python dependency should be eliminated — budget scaling and Parquet output can be done in pure PowerShell, removing ~80 lines of fragile cross-language code with path-injection risk and dead code
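A minimal sketch, not the PR's code, of one way to avoid the overflow: keep the 12-digit value in a [long] and format it back to a zero-padded string. The helper name New-AwsAccountId mirrors the fix described later in the commit notes; the real implementation may differ.

```powershell
function New-AwsAccountId
{
    # [int]::MaxValue is ~2.1 billion, so a 12-digit ID must never pass through [int].
    [long]$id = Get-Random -Minimum 100000000000 -Maximum 1000000000000
    return $id.ToString('D12')
}

New-AwsAccountId   # e.g. 734902185566
```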
Required by repo conventions
- Missing changelog entry (v14 section in docs-mslearn/toolkit/changelog.md)
- Missing README.md in the test directory
- Missing #Requires statement and .LINK in help
- No -WhatIf/-Confirm support for destructive operations (file creation, uploads, trigger starts) — a ShouldProcess sketch follows this list
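A minimal sketch of the ShouldProcess pattern being requested; the function and parameter names here are illustrative, not the script's actual implementation.

```powershell
function New-TestDataFile
{
    [CmdletBinding(SupportsShouldProcess)]
    param
    (
        [string]$OutputPath = './test-data'
    )

    if ($PSCmdlet.ShouldProcess($OutputPath, 'Write generated test data'))
    {
        New-Item -ItemType Directory -Path $OutputPath -Force | Out-Null
        # ...generate rows and write CSV/Parquet files here...
    }
}

# -WhatIf reports the action without creating anything; -Confirm prompts first.
New-TestDataFile -OutputPath ./test-data -WhatIf
```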
Recommended
- Add Pester tests for helper functions
- Prefer Azure AD auth over storage account keys
- Add -Seed parameter for reproducible test data (sketch below)
- FOCUS version parameter is metadata-only — either vary the schema or simplify
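A minimal sketch of how a -Seed parameter could make runs reproducible, assuming the generator relies on Get-Random throughout; the parameter handling is illustrative.

```powershell
param
(
    [int]$Seed
)

if ($PSBoundParameters.ContainsKey('Seed'))
{
    # Seeding the session's random number generator once makes every
    # subsequent Get-Random call in this run deterministic.
    Get-Random -SetSeed $Seed | Out-Null
}

# Two runs with the same -Seed now produce identical "random" values.
1..3 | ForEach-Object { Get-Random -Maximum 100 }
```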
Minor
- Inconsistent cost rounding (10 vs 2 decimal places)
- ADF trigger names hardcoded — should be parameterized or documented
FOCUS Specification Compliance Analysis
I did a detailed comparison of the script's output against the official FOCUS specification at focus.finops.org for all four claimed versions (1.0, 1.1, 1.2, 1.3). Here's what I found:
Critical Issue:
| Version | Columns Present | Columns Missing | Extra Columns | Compliant? |
|---|---|---|---|---|
| v1.0 | 43/43 | 0 | 8 | ❌ |
| v1.1 | 46/50 | 4 | 5 | ❌ |
| v1.2 | 49/57 | 8 | 2 | ❌ |
| v1.3 | 51/64+ | 13+ | 0 | ❌ |
Recommendation
Either:
- Target a single version (e.g., v1.0 or v1.3) and get that version fully correct, or
- Use $FocusVersion to dynamically select columns — only output columns valid for the chosen version and include all required columns for that version (see the sketch after this recommendation)
The closest match today is v1.3, but it's still missing conditional/recommended columns and the entire Contract Commitment dataset. For test data generation purposes, it may be acceptable to scope this to Cost and Usage only with a documented caveat, but the column set should still match the selected version.
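An illustrative sketch of version-aware column selection; the per-version buckets below use columns mentioned elsewhere in this review, the v1.0 set is abbreviated, and the structure is an assumption rather than the script's actual code.

```powershell
$FocusVersion = '1.2'

$columnsByVersion = [ordered]@{
    '1.0' = @('BilledCost', 'EffectiveCost', 'ChargeCategory', 'ServiceName')   # ...full v1.0 set here
    '1.1' = @('CommitmentDiscountQuantity', 'CommitmentDiscountUnit')
    '1.2' = @('BillingAccountType', 'SubAccountType', 'InvoiceId')
    '1.3' = @('HostProviderName', 'ServiceProviderName')
}

# Emit only the columns introduced at or below the requested FOCUS version.
$selectedColumns = $columnsByVersion.Keys |
    Where-Object { [version]$_ -le [version]$FocusVersion } |
    ForEach-Object { $columnsByVersion[$_] }

$selectedColumns   # v1.0 + v1.1 + v1.2 columns when FocusVersion is 1.2
```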
Additional FOCUS Spec Compliance Findings
A few more items found during deeper analysis:
1. ServiceSubcategory — Invalid Values Against Spec's Closed Enumeration
The FOCUS spec (v1.1+) defines a closed enumeration of allowed ServiceSubcategory values, and roughly a dozen of the script's values fall outside that list and should be remapped to valid entries.
2. Cost Column Invariants — Math Broken by Anomaly Rows
The FOCUS spec requires cost columns to stay consistent with their unit prices (e.g., ContractedCost = ContractedUnitPrice × PricingQuantity). Unit prices are derived on lines 741-742:
$listUnitPrice = [math]::Round($listCost / $pricingQuantity, 10)
$contractedUnitPrice = [math]::Round($contractedCost / $pricingQuantity, 10)
But the "data quality anomaly" block on lines 751-756 mutates costs AFTER unit prices were already calculated:
if ($qualityRoll -eq 0) {
    $effectiveCost = [math]::Round($contractedCost * 1.1, 10) # breaks EffectiveCost invariant
} elseif ($qualityRoll -eq 1) {
    $contractedCost = [math]::Round($listCost * 1.05, 10) # breaks ContractedCost = ContractedUnitPrice × PricingQuantity
}
~2% of rows will have cost/unit-price mismatches that violate the spec's mathematical constraints. If these are intentionally anomalous test data, they should be documented as such (e.g., via ChargeClass).
3. InvoiceId — Assigned to All Charge Categories
The script generates an InvoiceId for every row (lines 760-764), including Credit and Adjustment rows.
This is a lower-severity finding (more about realism than strict spec violation), but worth considering for test data that claims multi-version FOCUS compliance.
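For item 2, a sketch (reusing the variable names from the snippet above, so not self-contained) of the ordering fix: apply any anomaly adjustments to costs first, then derive unit prices from the final values so ListCost = ListUnitPrice × PricingQuantity still holds.

```powershell
# Apply anomaly mutations before any unit price is computed.
if ($qualityRoll -eq 0)
{
    $effectiveCost = [math]::Round($contractedCost * 1.1, 10)
}
elseif ($qualityRoll -eq 1)
{
    $contractedCost = [math]::Round($listCost * 1.05, 10)
}

# Derive unit prices last, after every cost mutation, guarding against divide-by-zero.
if ($pricingQuantity -gt 0)
{
    $listUnitPrice       = [math]::Round($listCost / $pricingQuantity, 10)
    $contractedUnitPrice = [math]::Round($contractedCost / $pricingQuantity, 10)
}
```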
Comprehensive rewrite of Generate-MultiCloudTestData.ps1:
Critical fixes:
- Fix Get-Random [int] overflow with 12-digit AWS account IDs (New-AwsAccountId)
- Eliminate Python dependency entirely (inline budget scaling via scale factor)
- Remove dead code from Python/Parquet block
Required by repo conventions:
- Add #Requires -Version 7.0
- Add .LINK to comment-based help
- Add [CmdletBinding(SupportsShouldProcess)] with WhatIf/Confirm support
- Add changelog entry
- Add test directory README.md
FOCUS specification compliance:
- Fix ~12 ServiceSubcategory values to match FOCUS closed enumeration
- Fix cost invariants: unit prices calculated AFTER all cost modifications
- Anomaly rows now set ChargeClass=Correction (exempt from invariant rules)
- Credits/Adjustments get null InvoiceId (per FOCUS spec)
- Version-aware column sets: v1.1+ gets CommitmentDiscountQuantity/Unit, v1.2+ gets BillingAccountType/SubAccountType/InvoiceId, v1.3+ gets HostProviderName/ServiceProviderName
- Document scope as Cost and Usage dataset only
Recommended improvements:
- Add -Seed parameter for reproducible test data
- Add -UseStorageKey switch, default to Azure AD auth (--auth-mode login)
- Fix Get-RandomDecimal to use [long] instead of [int] for large ranges
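A hedged sketch of the pure-PowerShell budget scaling ("scale factor") approach the commit notes describe; $rows, $MonthlyBudget, and the cost column names are illustrative, not the script's actual variables.

```powershell
$rows = @(
    [pscustomobject]@{ BilledCost = 120.0; EffectiveCost = 110.0 }
    [pscustomobject]@{ BilledCost =  80.0; EffectiveCost =  75.0 }
)
$MonthlyBudget = 1000.0

$currentTotal = ($rows | Measure-Object -Property BilledCost -Sum).Sum
if ($currentTotal -gt 0)
{
    # Scale every cost column by the same factor so the total lands on the budget.
    $scaleFactor = $MonthlyBudget / $currentTotal
    foreach ($row in $rows)
    {
        $row.BilledCost    = [math]::Round($row.BilledCost * $scaleFactor, 2)
        $row.EffectiveCost = [math]::Round($row.EffectiveCost * $scaleFactor, 2)
    }
}
```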
RolandKrummenacher left a comment:
Missing FOCUS Columns
The script is still missing several columns defined in the FOCUS specification across versions. These should either be implemented or explicitly documented as out-of-scope:
v1.1+ (4 columns)
- CapacityReservationId — Identifier for capacity reservations
- CapacityReservationStatus — Whether capacity reservation was used/unused
- SkuMeter — Meter-level SKU details
- SkuPriceDetails — JSON column with pricing metadata
v1.2+ (4 columns)
- PricingCurrency — Currency used for pricing columns
- PricingCurrencyContractedUnitPrice — Contracted unit price in pricing currency
- PricingCurrencyEffectiveCost — Effective cost in pricing currency
- PricingCurrencyListUnitPrice — List unit price in pricing currency
v1.3+ (5 columns)
- AllocatedMethodDetails — Details about cost allocation method
- AllocatedResourceId — Resource ID for split/allocated costs
- AllocatedResourceName — Resource name for split/allocated costs
- AllocatedResourceType — Resource type for split/allocated costs
- ContractApplied — JSON column bridging Cost and Usage rows to Contract Commitment dataset
Total: 13 columns missing across versions. Without these, the output cannot be fully compliant with any FOCUS version from v1.1 onward. At minimum, please document which columns are intentionally excluded and why (e.g., Contract Commitment dataset is already noted as out of scope in the help text — the same treatment should apply to these).
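An illustrative sketch of how the v1.2 pricing-currency columns listed above could be populated for synthetic data, assuming the pricing currency equals the billing currency and that each $row is a hashtable from the generator's row-building loop; the assignment logic is an assumption, not the PR's code.

```powershell
foreach ($row in $rows)
{
    # When pricing and billing currencies match, the pricing-currency values
    # mirror the existing unit price and cost columns.
    $row.PricingCurrency                    = $row.BillingCurrency
    $row.PricingCurrencyListUnitPrice       = $row.ListUnitPrice
    $row.PricingCurrencyContractedUnitPrice = $row.ContractedUnitPrice
    $row.PricingCurrencyEffectiveCost       = $row.EffectiveCost
}
```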
@RolandKrummenacher — Thank you for the thorough review! Really appreciate the detailed feedback. Your comments were spot-on and gave us the opportunity to go back and revisit logic that was missing during a live demo. We've addressed all the review feedback in this latest push:
What changed
PR review items — all addressed:
Additional improvements (discovered while re-testing):
Pester tests
We'll look into adding Pester unit tests (for the helper functions).
Fix pushed (61a6d0c): Resolve
Force-pushed 61a6d0c to 4fde70e
Can we move this to src/powershell/Public/New-FinOpsTestData.ps1 so it can be published in the PS module? I'm fine with another verb name, but we should use an approved verb.
Side question: Should this be a generic script for any purpose or do we want to make it hubs-specific? I'm fine either way, but we'd follow different conventions.
Done ✅ — Moved to src/powershell/Public/New-FinOpsTestData.ps1 as a proper function using the approved verb New. Old script removed. Auto-discovered by FinOpsToolkit.psm1.
Done — moved to src/powershell/Public/New-FinOpsTestData.ps1 (and the cleanup script to src/powershell/Public/Remove-FinOpsTestData.ps1). Both are auto-discovered by FinOpsToolkit.psm1.
.SYNOPSIS
Generates multi-cloud FOCUS-compliant test data for FinOps Hub validation.
Can you look at the formatting conventions we have and apply them here as well. In this case, we indent the doc properties alongside the values.
Done ✅ — All doc tags (.SYNOPSIS, .DESCRIPTION, .PARAMETER, .EXAMPLE, .LINK, .NOTES) now use 4-space indentation matching repo conventions.
Done — applied 4-space indentation for all .SYNOPSIS, .DESCRIPTION, .PARAMETER, .EXAMPLE, and .LINK entries to match the toolkit convention.
- Prices (Azure EA/MCA price sheet → Prices_raw → Prices_final_v1_2)
- CommitmentDiscountUsage (Reservation details → CommitmentDiscountUsage_raw)
- Recommendations (Reservation recommendations → Recommendations_raw)
- Transactions (Reservation transactions → Transactions_raw)
These mention hubs-specific tables. Do you want to keep this specific to hubs? If so, I'd probably change the name to New-FinOpsHubTestData. But I also see value in breaking this out to support any number of scenarios:
- New-FinOpsTestData
- Set-FinOpsStorageBlobContent
- New-FinOpsExportManifest
- Add-FinOpsHubTestData
I see these as just breaking down what you have into smaller chunks. We don't need to do this now. I'm just thinking out loud about a growth path that would be reusable for more scenarios, if/when needed.
Acknowledged — these are hubs-specific ADX table names. That's intentional since the test data generator is designed for hubs validation. As you noted, breaking this into smaller composable commands (e.g., New-FinOpsTestCostData, New-FinOpsTestPriceData) is a great growth-path idea. Deferring that refactor to a follow-up — doesn't need to block this PR.
.PARAMETER OutputPath
Directory to save generated files. Default: ./test-data

.PARAMETER CloudProvider
I need to double check what's in FOCUS 1.3, but I believe the best term here is ServiceProvider to account for SaaS services that we could hypothetically support in the future.
Suggested change: .PARAMETER CloudProvider → .PARAMETER ServiceProvider
Done ✅ — Renamed CloudProvider to ServiceProvider globally (parameter, ValidateSet, all internal references). The x_CloudProvider data column is preserved since it's a data field.
.PARAMETER EndDate
End date for generated data. Default: Today

.PARAMETER TotalRowTarget
nit: MaxRowCount or maybe just RowCount?
Done — renamed to -RowCount.
ServiceProviderName = "Microsoft"
InvoiceIssuerName = "Microsoft"
HostProviderName = "Microsoft"
BillingAccountType = "Billing Profile"
nit: Billing Profile type doesn't match EA account agreement.
Done ✅ — Fixed from 'Billing Profile' to 'Billing Account' for EA.
# any that are missing or empty.
# ============================================================================

function Invoke-EnsureUpdatePolicy
Is this needed? Have you ever seen a case where this check failed? I'd love to get to a point where this code isn't necessary.
ADX update policies can get dropped after table clears/recreates during the nuke flow. This is a defensive check to re-create them if missing. Happy to remove if we confirm update policies are always preserved after table operations.
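A hedged sketch of the kind of check described above: read the table's update policy through the ADX management endpoint and re-apply it if nothing is attached. The cluster URI, database, table, transform function, and policy JSON are placeholders, and the PR's Invoke-EnsureUpdatePolicy may be implemented differently.

```powershell
function Invoke-KustoCommand
{
    param ([string]$ClusterUri, [string]$Database, [string]$Command)

    # Acquire an Azure AD token for the cluster and call the management endpoint.
    $token = (Get-AzAccessToken -ResourceUrl $ClusterUri).Token
    if ($token -is [securestring]) { $token = ConvertFrom-SecureString -SecureString $token -AsPlainText }
    $body = @{ db = $Database; csl = $Command } | ConvertTo-Json
    Invoke-RestMethod -Method Post -Uri "$ClusterUri/v1/rest/mgmt" `
        -Headers @{ Authorization = "Bearer $token" } -ContentType 'application/json' -Body $body
}

$clusterUri = 'https://myhub.westus2.kusto.windows.net'
$result = Invoke-KustoCommand -ClusterUri $clusterUri -Database 'Ingestion' `
    -Command '.show table Costs_final_v1_0 policy update'

# Simplified check: if no policy rows came back, restore a known-good policy.
if (-not $result.Tables[0].Rows)
{
    $policy = '[{"IsEnabled": true, "Source": "Costs_raw", "Query": "Costs_transform_v1_0()"}]'
    Invoke-KustoCommand -ClusterUri $clusterUri -Database 'Ingestion' `
        -Command ".alter table Costs_final_v1_0 policy update @'$policy'"
}
```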
else
{
    Write-Host " Starting $trigger..." -ForegroundColor Cyan
    az datafactory trigger start --factory-name $AdfName --resource-group $ResourceGroupName --name $trigger --only-show-errors 2>$null
Don't use the Az CLI. You're in PowerShell. Stick with Az PowerShell. Applies to all commands.
Done ✅ — All 15 Az CLI calls replaced with Az PowerShell equivalents: Get-AzAccessToken, New-AzStorageContext, Set-AzStorageBlobContent, Get-AzStorageBlob, Remove-AzStorageBlob, Get-AzStorageAccountKey, Get/Start/Stop-AzDataFactoryV2Trigger, Get-AzDataFactoryV2PipelineRun.
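A hedged sketch of the Az PowerShell pattern referenced above, using Azure AD auth (-UseConnectedAccount) instead of storage keys; the account, container, blob, resource group, factory, and trigger names are placeholders, not the script's actual values.

```powershell
# Azure AD-authenticated storage context (no account key).
$ctx = New-AzStorageContext -StorageAccountName 'finopshubstorage' -UseConnectedAccount

# Upload a generated file to the ingestion container.
Set-AzStorageBlobContent -Context $ctx -Container 'msexports' `
    -File './test-data/costs_azure.csv' -Blob 'testexport/2025-01/costs_azure.csv' -Force | Out-Null

# Start the ADF trigger that processes new exports.
Start-AzDataFactoryV2Trigger -ResourceGroupName 'finops-hub-rg' `
    -DataFactoryName 'finops-hub-engine' -Name 'msexports_ManifestAdded' -Force
```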
$blobPath = "$blobFolder/$dataFile"
$manifestBlobPath = "$blobFolder/manifest.json"

$manifest = @{
nit: Creating a manifest is an awesome capability in and of itself. I'd love to see this as a separate New-FinOpsExportManifest command.
Evaluated extracting New-FinOpsExportManifest — the 4 manifest creation sites have fundamentally different schemas: (1) Azure msexports manifest with full Cost Management metadata (exportConfig, deliveryConfig, blobs, runInfo), (2) simple ingestion trigger manifests for non-Azure providers (3-4 fields), (3) per-dataset trigger manifests for Prices/CDU/Recommendations/Transactions, and (4) local per-provider manifests. Each uses context-specific variables from the upload loop ($blobFolder, $dataFile, etc.). Creating a unified function with a clean interface would require rethinking the upload architecture. Recommend a focused refactoring PR to design this properly. Commit: 0f2e032
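A hedged sketch of a minimal msexports-style manifest of the shape described above (exportConfig / deliveryConfig / blobs / runInfo); $blobFolder and $dataFile are the upload-loop variables shown in the diff, and all other property values are illustrative, not the real Cost Management schema.

```powershell
$manifest = @{
    exportConfig   = @{ exportName = 'test-export'; type = 'FocusCost'; dataVersion = '1.0' }
    deliveryConfig = @{ rootFolderPath = $blobFolder; fileFormat = 'Csv' }
    blobs          = @(@{ blobName = "$blobFolder/$dataFile" })
    runInfo        = @{ runId = [guid]::NewGuid().ToString(); submittedTime = (Get-Date).ToUniversalTime().ToString('o') }
}

# Serialize with enough depth to keep the nested objects intact.
$manifest | ConvertTo-Json -Depth 5 | Set-Content -Path './manifest.json'
```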
| Write-Host " 3. Start ADF triggers to process the data" | ||
| } | ||
|
|
||
| Write-Host "" |
Approaching 3K lines is a bit much. I'd love to see this broken out into multiple files.
✅ Significant progress — New-FinOpsTestData.ps1 is now down to ~2170 lines (from ~2900), with Remove-FinOpsTestData.ps1 extracted as a standalone command (~340 lines). The remaining ~2170 lines are the core data generation logic. For New-FinOpsExportManifest, the 4 manifest creation sites use different schemas (full Cost Management manifest, simple ingestion triggers, local per-provider) and are tightly coupled to upload context variables. Extracting a clean reusable interface would benefit from a focused refactoring PR. Commit: 0f2e032
PR Summary – New-FinOpsTestData & Remove-FinOpsTestData
New-FinOpsTestData
Remove-FinOpsTestData
Included artifacts
Review items addressed
All 18 original review items + 4 quality improvements resolved across iterations:
Force-pushed b65f39a to 228a768
…mands
Add multi-cloud FOCUS test data generator and cleanup commands for FinOps Hub validation.
New-FinOpsTestData:
- Generates synthetic FOCUS-compliant cost data for Azure, AWS, GCP, and on-premises
- Compatible with both Azure Data Explorer and Microsoft Fabric RTI ingestion paths
- Supports FOCUS versions 1.0-1.3 with version-specific column sets
- Includes commitment discounts, Azure Hybrid Benefit, tag variation, budget scaling
- Deterministic output via -Seed parameter, CSV and Parquet formats
- Optional upload to Azure Storage with ADF trigger management
- Generates all 5 FOCUS datasets: Costs, Prices, CommitmentDiscountUsage, Recommendations, Transactions
- All manifests include _ftkTestData watermark for safe cleanup identification
Remove-FinOpsTestData:
- Multi-layer safety: local-only by default, no cloud deletion without explicit params
- Targeted storage deletion: scans manifests for _ftkTestData marker, deletes only test-data folders (production data preserved)
- ADX requires -Force because .clear table removes ALL rows (no selective deletion)
- Does not manage Microsoft Fabric data (clean up Fabric resources separately)
- Verifies ADX update policies after clearing tables
- Optional ADF trigger management to prevent re-ingestion during cleanup
- ShouldProcess support with ConfirmImpact=High
Includes:
- 14 unit tests for New-FinOpsTestData (parameter validation, CSV generation, seed reproducibility, multi-provider, FOCUS version columns)
- 9 unit tests for Remove-FinOpsTestData (parameter validation, Force safety, WhatIf)
- MS Learn documentation pages for both commands
- Changelog entries under v14
- README for src/templates/finops-hub/test/
- TOC and open-data-commands.md navigation updates
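A hedged usage sketch based on parameters discussed in this PR (-ServiceProvider, -RowCount, -Seed, -EndDate, -OutputPath); exact parameter names, value sets, and defaults may differ in the merged commands.

```powershell
# Generate a deterministic local dataset for one provider.
New-FinOpsTestData `
    -ServiceProvider Azure `
    -RowCount 5000 `
    -Seed 42 `
    -EndDate (Get-Date) `
    -OutputPath ./test-data

# Cleanup is local-only by default; -WhatIf previews what would be removed.
Remove-FinOpsTestData -WhatIf
```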
Force-pushed a1a55d4 to baa8af6
Add multi-cloud FOCUS test data generator for FinOps Hub
Description
Adds Generate-MultiCloudTestData.ps1 — a PowerShell script that generates synthetic, multi-cloud, FOCUS-compliant cost data for testing and validating FinOps Hub deployments end-to-end.
Closes #2005
What's Included
Generate-MultiCloudTestData.ps1 (~1,430 lines) — Self-contained script that generates FOCUS 1.0–1.3 synthetic cost data for Azure, AWS, GCP, and DataCenter providers
Why This Script Is Needed
Testing a FinOps Hub deployment today requires real Cost Management export data. This script fills that gap by generating realistic synthetic data that:
Key Features
Testing
Tested with:
Prerequisites
pandas and pyarrow (for Parquet conversion)
Checklist