CloudOps-Runbooks Inventory Module User Guide
Version: 1.1.19 | Last Updated: November 7, 2025
Table of Contents
- Introduction & Architecture
- CLI Command Reference
- Taskfile Workflow Patterns
- Enricher Signal Reference
- Integration Patterns
- Troubleshooting & FAQ
- Performance Optimization
- API Reference
1. Introduction & Architecture
1.1 Overview
CloudOps-Runbooks Inventory Module provides comprehensive AWS resource discovery across 88 resource types with enterprise-grade enrichment capabilities including:
- Multi-Account Discovery: Leverage AWS Resource Explorer for organization-wide resource discovery
- Cost Enrichment: Automated Cost Explorer API integration
- Activity Analysis: CloudTrail, CloudWatch, SSM, and Compute Optimizer integration
- Decommission Scoring: E1-E7 (EC2) and W1-W6 (WorkSpaces) signal frameworks
- MCP Validation: Hybrid intelligence engine for ≥99.5% accuracy
1.2 Architecture
The inventory module follows a layered architecture pattern:
┌───────────────────────────────────────────────────────────────┐
│ Layer 1: Discovery (resource-explorer)                        │
│ → 88 AWS resource types across multi-account Landing Zone     │
├───────────────────────────────────────────────────────────────┤
│ Layer 2: Organizations (enrich-accounts)                      │
│ → Add account metadata (names, OUs, cost groups)              │
├───────────────────────────────────────────────────────────────┤
│ Layer 3: Costs (enrich-costs)                                 │
│ → Add Cost Explorer data (monthly, annual, trends)            │
├───────────────────────────────────────────────────────────────┤
│ Layer 4: Activity (enrich-activity)                           │
│ → Add CloudTrail/CloudWatch/SSM/Compute Optimizer metrics     │
├───────────────────────────────────────────────────────────────┤
│ Layer 5: Scoring (score-decommission)                         │
│ → Calculate E1-E7/W1-W6 decommission scores (0-100)           │
└───────────────────────────────────────────────────────────────┘
Design Principles:
- Unix Philosophy: Each layer does one thing well
- Progressive Enhancement: Layers are independent and optional
- Graceful Degradation: Missing data handled transparently
- Performance Optimized: Parallel execution where possible
1.3 Profile Architecture
The inventory module uses 3 distinct AWS profiles for separation of concerns:
| Profile | Purpose | Required Permissions | Usage |
|---|---|---|---|
| Operations | Resource discovery | ReadOnlyAccess, resource-explorer:* | Layer 1, 4 |
| Management | Organizations metadata | organizations:Describe, organizations:List | Layer 2 |
| Billing | Cost data | ce:GetCostAndUsage | Layer 3 |
Best Practice: Configure profiles in ~/.aws/credentials:
[ops-profile]
aws_access_key_id = AKIAEXAMPLEOPS
aws_secret_access_key = secret123
[management-profile]
aws_access_key_id = AKIAEXAMPLEMGMT
aws_secret_access_key = secret456
[billing-profile]
aws_access_key_id = AKIAEXAMPLEBILL
aws_secret_access_key = secret789
2. CLI Command Reference
The inventory module provides nine commands organized in three categories:
2.1 Multi-Account Discovery Commands
resource-explorer - Universal Resource Discovery
Purpose: Discover AWS resources by friendly alias across multi-account Landing Zone
Supported Resource Types: 88 types across 10 categories (Analytics, Compute, Databases, Developer Tools, Management, Migration, ML & AI, Networking, Security, Storage)
Basic Usage:
runbooks inventory resource-explorer \
--resource-type ec2 \
--profile ops-profile \
--output /tmp/ec2-discovered.csv
Advanced Options:
| Option | Type | Description | Example |
|---|---|---|---|
| --resource-type | string | AWS resource type to discover | ec2, s3, rds, lambda |
| --list-types | flag | Display all 88 supported resource types | - |
| --profile | string | AWS profile for single-account operations | ops-profile |
| --all-profiles | flag | Multi-account discovery via aggregator | - |
| --regions | string | Specific regions (space-separated) | ap-southeast-2 us-east-1 |
| --all-regions | flag | Process all enabled AWS regions | - |
| --tags | string | Filter by tags (key=value format) | Environment=prod |
| --accounts | string | Filter by account IDs (comma-separated) | 123456789012,987654321098 |
| --output | path | Output file path | /tmp/ec2.csv |
| --format | choice | Output format | json, csv, table, pdf, markdown |
| --console-format | flag | AWS Console-compatible 7-column export | - |
| --enrich-costs | flag | Auto-enrich with Cost Explorer data | - |
| --verbose | flag | Show detailed execution logs | - |
Examples:
# Single-account EC2 discovery
runbooks inventory resource-explorer \
--resource-type ec2 \
--profile ops-profile \
--output /tmp/ec2-discovered.csv
# Multi-account Lambda discovery across all regions
runbooks inventory resource-explorer \
--resource-type lambda \
--all-profiles \
--all-regions \
--output /tmp/lambda-all-accounts.csv
# Production resources only (tag filtering)
runbooks inventory resource-explorer \
--resource-type rds \
--profile ops-profile \
--tags Environment=prod \
--output /tmp/rds-prod.csv
# AWS Console-compatible export format
runbooks inventory resource-explorer \
--resource-type s3 \
--profile ops-profile \
--console-format \
--output /tmp/s3-console-format.csv
# List all 88 supported resource types
runbooks inventory resource-explorer --list-types
Output Schema (CSV):
resource_id,resource_type,region,account_id,resource_name,tags,state,created_date
i-0abc123,EC2::Instance,ap-southeast-2,123456789012,web-server-01,{"Environment":"prod"},running,2025-01-15
Performance: ~30 seconds per region for EC2 (typical 137 instances)
collect - Raw Multi-Account Discovery
Purpose: Low-level multi-account resource discovery via Resource Explorer
Usage:
Note: resource-explorer is the recommended interface. Use collect only for programmatic access requiring raw JSON output.
resource-types - List Supported Resource Types
Purpose: Display all 88 supported AWS resource types organized by category
Usage:
Output:
┌─────────────────────┬────────────────────────┬──────────────────────────┐
│ Alias               │ AWS Type               │ Description              │
├─────────────────────┼────────────────────────┼──────────────────────────┤
│ ec2                 │ EC2::Instance          │ Virtual servers          │
│ s3                  │ S3::Bucket             │ Object storage           │
│ rds                 │ RDS::DBInstance        │ Relational databases     │
│ lambda              │ Lambda::Function       │ Serverless compute       │
└─────────────────────┴────────────────────────┴──────────────────────────┘
2.2 Enrichment Layer Commands
enrich-accounts - Organizations Metadata Enrichment
Purpose: Add AWS Organizations account metadata (7 columns)
Added Columns:
- account_name: Friendly account name from Organizations
- account_email: Account email address
- wbs_code: Work breakdown structure code
- cost_group: Cost allocation group
- technical_lead: Technical owner
- account_owner: Business owner
- organizational_unit: OU path
Usage:
runbooks inventory enrich-accounts \
--input /tmp/ec2-discovered.csv \
--profile management-profile \
--output /tmp/ec2-with-accounts.csv
Advanced Options:
# Filter specific accounts before enrichment
runbooks inventory enrich-accounts \
--input /tmp/all-resources.csv \
--profile management-profile \
--accounts 123456789012,987654321098 \
--output /tmp/filtered-accounts.csv
# Multi-format export
runbooks inventory enrich-accounts \
--input /tmp/ec2-discovered.csv \
--profile management-profile \
--all-outputs \
--output-dir ./data/outputs
Performance: ~5 seconds (Organizations API caching)
enrich-costs - Cost Explorer Integration
Purpose: Add Cost Explorer data (3 columns)
Added Columns:
- monthly_cost: Current month cost (USD)
- annual_cost_12mo: Trailing 12-month cost (USD)
- cost_trend_3mo: 3-month trend (increasing/stable/decreasing)
Usage:
runbooks inventory enrich-costs \
--input /tmp/ec2-with-accounts.csv \
--profile billing-profile \
--months 12 \
--output /tmp/ec2-with-costs.csv
Advanced Options:
| Option | Type | Description | Example |
|---|---|---|---|
| --months | integer | Trailing months for analysis | 12 (default) |
| --granularity | choice | MONTHLY or DAILY | MONTHLY |
| --cost-metric | choice | AmortizedCost, UnblendedCost, BlendedCost | AmortizedCost |
| --skip-empty-costs | flag | Exclude $0 resources | - |
| --cost-threshold | float | Minimum monthly cost | 10.0 (>$10/month) |
| --group-by | choice | Grouping dimension | SERVICE, ACCOUNT |
Examples:
# High-cost resources only (>$10/month)
runbooks inventory enrich-costs \
--input /tmp/ec2-with-accounts.csv \
--profile billing-profile \
--cost-threshold 10.0 \
--skip-empty-costs \
--output /tmp/high-cost-ec2.csv
# Daily granularity with amortized costs (RI/SP distributed)
runbooks inventory enrich-costs \
--input /tmp/ec2-with-accounts.csv \
--profile billing-profile \
--granularity DAILY \
--cost-metric AmortizedCost \
--output /tmp/ec2-amortized-costs.csv
# 3-month recent cost analysis
runbooks inventory enrich-costs \
--input /tmp/ec2-with-accounts.csv \
--profile billing-profile \
--months 3 \
--output /tmp/ec2-recent-costs.csv
Cost Metric Comparison:
- AmortizedCost: RI/SP costs distributed across resources (enterprise recommendation)
- UnblendedCost: Actual charges without RI/SP distribution (default)
- BlendedCost: Organization-wide cost averaging
Performance: ~1 minute per Cost Explorer API call
Limitations:
- 24-hour data latency (resources <1 day old have no cost data)
- Account-level granularity (not resource-level)
- Cost Explorer must be enabled (AWS Console → Billing → Cost Explorer)
enrich-activity - Multi-API Activity Enrichment
Purpose: Add CloudTrail/CloudWatch/SSM/Compute Optimizer activity data (11 columns)
Added Columns:
CloudTrail (E3 Signal - 8 points):
- last_activity_date: Most recent CloudTrail event timestamp
- days_since_activity: Days since last event (999 if no events)
- activity_count_90d: Total events in lookback window
CloudWatch (E2 Signal - 10 points):
- p95_cpu_utilization: P95 CPU utilization over period
- p95_network_bytes: P95 network bytes over period
- user_connected_sum: Total user connection minutes (WorkSpaces only)
SSM (E4 Signal - 8 points - EC2 only):
- ssm_ping_status: Online, Offline, ConnectionLost, Not SSM managed
- ssm_last_ping_date: Timestamp of last SSM heartbeat
- ssm_days_since_ping: Days since last heartbeat
Compute Optimizer (E1 Signal - 60 points - EC2 only):
- compute_optimizer_finding: Idle, Underprovisioned, Optimized
- compute_optimizer_cpu_max: Maximum CPU utilization over 14 days
- compute_optimizer_recommendation: Right-sizing recommendation
Usage:
runbooks inventory enrich-activity \
--input /tmp/ec2-with-costs.csv \
--resource-type ec2 \
--profile ops-profile \
--output /tmp/ec2-with-activity.csv
Advanced Options:
| Option | Type | Description | Default |
|---|---|---|---|
| --resource-type | choice | ec2 or workspaces | Required |
| --activity-lookback-days | integer | CloudTrail window | 90 |
| --cloudwatch-period | integer | CloudWatch metrics period | 14 |
| --skip-cloudtrail | flag | Skip E3 signal | False |
| --skip-cloudwatch | flag | Skip E2 signal | False |
| --skip-ssm | flag | Skip E4 signal | False |
| --skip-compute-optimizer | flag | Skip E1 signal | False |
| --ssm-timeout | integer | SSM API timeout (seconds) | 30 |
Examples:
# Standard enrichment with all signals
runbooks inventory enrich-activity \
--input /tmp/ec2-with-costs.csv \
--resource-type ec2 \
--profile ops-profile \
--output /tmp/ec2-fully-enriched.csv
# Fast enrichment (skip CloudTrail and SSM for ~63% faster execution)
runbooks inventory enrich-activity \
--input /tmp/ec2-with-costs.csv \
--resource-type ec2 \
--profile ops-profile \
--skip-cloudtrail \
--skip-ssm \
--output /tmp/ec2-fast-activity.csv
# Custom activity window (30 days for faster API calls)
runbooks inventory enrich-activity \
--input /tmp/ec2-with-costs.csv \
--resource-type ec2 \
--profile ops-profile \
--activity-lookback-days 30 \
--cloudwatch-period 7 \
--output /tmp/ec2-short-window.csv
# WorkSpaces enrichment
runbooks inventory enrich-activity \
--input /tmp/workspaces-with-costs.csv \
--resource-type workspaces \
--profile ops-profile \
--output /tmp/workspaces-with-activity.csv
Performance Tuning:
- Full enrichment: ~2 minutes (all 4 APIs)
- Skip CloudTrail: ~1.2 minutes (40% faster)
- Skip SSM: ~1.5 minutes (25% faster)
- Skip both: ~45 seconds (63% faster)
Requirements:
- Input CSV must have resource_id column (instance_id for EC2, workspace_id for WorkSpaces)
- Profile must have CloudTrail, CloudWatch, SSM, Compute Optimizer read permissions
- Graceful degradation: Missing API permissions result in NULL values (not errors)
score-decommission - Decommission Candidate Scoring
Purpose: Calculate E1-E7 (EC2) or W1-W6 (WorkSpaces) decommission scores (0-100)
Added Columns:
- decommission_score: 0-100 point score
- decommission_tier: MUST (80-100) | SHOULD (50-79) | COULD (25-49) | KEEP (<25)
- signal_breakdown: JSON object showing which signals triggered
EC2 Signal Scoring (E1-E7):
- E1: Compute Optimizer idle (60 points) - BACKBONE SIGNAL
- E2: CloudWatch CPU/Network (10 points)
- E3: CloudTrail activity (8 points)
- E4: SSM heartbeat (8 points)
- E5: Service attachment (6 points)
- E6: Storage I/O (5 points)
- E7: Cost savings (3 points)
WorkSpaces Signal Scoring (W1-W6):
- W1: Connection recency (45 points)
- W2: CloudWatch usage (25 points)
- W3: Billing vs usage (10/5 points)
- W4: Cost Optimizer policy (10 points)
- W5: Admin activity (5 points)
- W6: User status (5 points)
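The EC2 scoring model is a weighted sum of boolean signal triggers. A minimal sketch, using the E1-E7 weights stated in this guide (function and field names here are illustrative assumptions, not the actual runbooks implementation):

```python
# Hypothetical illustration of E1-E7 weighted-sum scoring.
# Weights come from the guide; the scorer's real internals are not shown here.
EC2_WEIGHTS = {
    "E1": 60,  # Compute Optimizer idle (backbone signal)
    "E2": 10,  # CloudWatch CPU/Network
    "E3": 8,   # CloudTrail activity
    "E4": 8,   # SSM heartbeat
    "E5": 6,   # Service attachment
    "E6": 5,   # Storage I/O
    "E7": 3,   # Cost savings
}

def score_ec2(triggered: dict) -> tuple:
    """Return (total score 0-100, per-signal breakdown) for triggered signals."""
    breakdown = {s: w for s, w in EC2_WEIGHTS.items() if triggered.get(s)}
    return sum(breakdown.values()), breakdown

# Every signal except E6 fired -> 95, matching the MUST-tier example row below.
total, breakdown = score_ec2(
    {"E1": True, "E2": True, "E3": True, "E4": True, "E5": True, "E7": True}
)
```

Note that the seven weights sum to exactly 100, so the score needs no normalisation.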
Usage:
runbooks inventory score-decommission \
--input /tmp/ec2-fully-enriched.csv \
--resource-type ec2 \
--output /tmp/ec2-scored.csv
Advanced Options:
| Option | Type | Description | Example |
|---|---|---|---|
| --score-threshold | integer | Minimum score for output | 80 (MUST tier only) |
| --tier-filter | choice | Filter to specific tier | MUST, SHOULD |
| --min-monthly-cost | float | Minimum monthly cost | 10.0 |
| --exclude-signals | string | Exclude signals from scoring | E1,E2 |
| --custom-weights | json | Override default weights | {"E1": 70, "E2": 5} |
Examples:
# High-confidence candidates only (MUST tier: 80-100)
runbooks inventory score-decommission \
--input /tmp/ec2-fully-enriched.csv \
--resource-type ec2 \
--score-threshold 80 \
--output /tmp/ec2-must-decommission.csv
# High-cost idle resources (>$10/month, score ≥70)
runbooks inventory score-decommission \
--input /tmp/ec2-fully-enriched.csv \
--resource-type ec2 \
--score-threshold 70 \
--min-monthly-cost 10.0 \
--output /tmp/ec2-high-value-targets.csv
# Conservative scoring (exclude Compute Optimizer E1 signal)
runbooks inventory score-decommission \
--input /tmp/ec2-fully-enriched.csv \
--resource-type ec2 \
--exclude-signals E1 \
--output /tmp/ec2-conservative-scores.csv
# Custom weights (emphasize cost over activity)
runbooks inventory score-decommission \
--input /tmp/ec2-fully-enriched.csv \
--resource-type ec2 \
--custom-weights '{"E7": 20, "E3": 5}' \
--output /tmp/ec2-cost-weighted.csv
# WorkSpaces decommission scoring
runbooks inventory score-decommission \
--input /tmp/workspaces-fully-enriched.csv \
--resource-type workspaces \
--score-threshold 70 \
--output /tmp/workspaces-candidates.csv
Tier Interpretation:
- MUST (80-100): Very high confidence - idle >90 days, no traffic, dev/test tags
- SHOULD (50-79): High confidence - idle >60 days, low utilization
- COULD (25-49): Medium confidence - review required before action
- KEEP (<25): Active workload - do not decommission
Output Schema (CSV):
resource_id,decommission_score,decommission_tier,signal_breakdown,monthly_cost,recommendation
i-0abc123,95,MUST,{"E1":60,"E2":10,"E3":8,"E4":8,"E5":6,"E7":3},15.50,Terminate
i-0def456,72,SHOULD,{"E1":60,"E2":5,"E3":8,"E4":0},8.25,Stop or Resize
Performance: <10 seconds (local calculation, no API calls)
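The tier boundaries translate directly into a threshold lookup, and the signal_breakdown column is plain JSON, so a scored row can be re-derived from its own breakdown. A sketch (names are illustrative, not the shipped API):

```python
import json

def decommission_tier(score: int) -> str:
    """Map a 0-100 decommission score to its tier, per the cutoffs above."""
    if score >= 80:
        return "MUST"
    if score >= 50:
        return "SHOULD"
    if score >= 25:
        return "COULD"
    return "KEEP"

# signal_breakdown from the example output row for i-0abc123
breakdown = json.loads('{"E1":60,"E2":10,"E3":8,"E4":8,"E5":6,"E7":3}')
score = sum(breakdown.values())
tier = decommission_tier(score)
```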
2.3 Validation Framework Commands
validate-mcp - MCP Cross-Validation
Purpose: Cross-validate cost calculations with MCP (Model Context Protocol) server (≥99.5% accuracy target)
Usage:
Validation Process:
1. Extracts cost data from enriched CSV
2. Queries MCP cost-explorer server for same time period
3. Calculates variance (% difference)
4. Reports accuracy metrics
Output:
┌─────────────────────┬──────────────┬──────────────┬────────────┐
│ Metric              │ CSV Value    │ MCP Value    │ Variance   │
├─────────────────────┼──────────────┼──────────────┼────────────┤
│ Total Cost          │ $15,432.67   │ $15,401.23   │ 0.2%       │
│ Resource Count      │ 137          │ 137          │ 0.0%       │
│ Accuracy            │ -            │ -            │ 99.8%      │
└─────────────────────┴──────────────┴──────────────┴────────────┘
✅ PASS: Accuracy 99.8% exceeds target (≥99.5%)
Requirements:
- MCP awslabs.cost-explorer server configured
- Same AWS profile used for both enrichment and validation
validate-costs - Cost Data Accuracy Validation
Purpose: Validate cost data accuracy against AWS Cost Explorer API
Usage:
Validation Checks:
- Cost data completeness (% resources with cost data)
- Cost range validation (detect outliers)
- Trend consistency (monthly vs annual alignment)
- Zero-cost resource detection
Output:
Cost Validation Report
────────────────────────────────────────────
Completeness: 98.5% (135/137 resources have cost data)
Zero-cost: 2 resources (1.5%)
Outliers: 0 resources (0.0%)
Trend consistency: ✓
PASS
Recommendations:
- 2 resources missing cost data (likely <24h old)
- No action required
3. Taskfile Workflow Patterns
The inventory module provides pre-configured Taskfile workflows for common scenarios.
3.1 Workflow Overview
# Display all available tasks
task -l -t Taskfile.inventory.yaml
# Execute specific task
task -t Taskfile.inventory.yaml <task-name>
3.2 Best Practice Workflows
workflow-single - Single-Account 4-Layer Pipeline
Purpose: Complete inventory enrichment for single AWS account
Layers:
1. Discovery (resource-explorer)
2. Costs (enrich-costs)
3. Activity (enrich-activity)
4. Scoring (score-decommission)
Usage:
Customization:
# Override profile and output directory
task -t Taskfile.inventory.yaml workflow-single \
CENTRALISED_OPS_PROFILE=my-ops-profile \
OUTPUT_DIR=/tmp/inventory
Execution Time: ~10-15 minutes (typical 137 EC2 instances)
Output Files:
- data/outputs/ec2-discovered.csv (Layer 1)
- data/outputs/ec2-cost.csv (Layer 2)
- data/outputs/ec2-activity.csv (Layer 3)
- data/outputs/ec2-scored.csv (Layer 4)
workflow-multi-lz - Multi-Account Landing Zone 5-Layer Pipeline
Purpose: Complete inventory enrichment for multi-account Landing Zone
Layers:
1. Discovery (resource-explorer with --all-profiles)
2. Organizations (enrich-accounts)
3. Costs (enrich-costs)
4. Activity (enrich-activity)
5. Scoring (score-decommission)
Usage:
Customization:
# Override profiles
task -t Taskfile.inventory.yaml workflow-multi-lz \
MANAGEMENT_PROFILE=my-mgmt-profile \
BILLING_PROFILE=my-billing-profile \
CENTRALISED_OPS_PROFILE=my-ops-profile
Execution Time: ~20-30 minutes (67+ accounts, typical 500+ resources)
Output Files:
- data/outputs/ec2-discovered.csv (Layer 1)
- data/outputs/ec2-org.csv (Layer 2)
- data/outputs/ec2-cost.csv (Layer 3)
- data/outputs/ec2-activity.csv (Layer 4)
- data/outputs/ec2-scored.csv (Layer 5)
3.3 Individual Task Workflows
Resource Discovery Tasks
# EC2 discovery
task -t Taskfile.inventory.yaml discover-ec2
# RDS discovery
task -t Taskfile.inventory.yaml discover-rds
# S3 discovery
task -t Taskfile.inventory.yaml discover-s3
# Lambda discovery
task -t Taskfile.inventory.yaml discover-lambda
# WorkSpaces discovery
task -t Taskfile.inventory.yaml discover-workspaces
# RAM shares discovery
task -t Taskfile.inventory.yaml discover-ram-shares
# List all resource types
task -t Taskfile.inventory.yaml list-resource-types
Organizations Tasks
# List all accounts
task -t Taskfile.inventory.yaml list-accounts
# Visualize organization hierarchy
task -t Taskfile.inventory.yaml draw-org
# Check Landing Zone configuration
task -t Taskfile.inventory.yaml check-landing-zone
# Check Control Tower setup
task -t Taskfile.inventory.yaml check-control-tower
# List organization users
task -t Taskfile.inventory.yaml list-org-users
# Find Landing Zone versions
task -t Taskfile.inventory.yaml find-lz-versions
Enrichment Tasks
# Enrich with Organizations metadata
task -t Taskfile.inventory.yaml enrich-accounts \
INPUT=/tmp/ec2-discovered.csv \
OUTPUT=/tmp/ec2-org.csv
# Enrich with cost data
task -t Taskfile.inventory.yaml enrich-costs \
INPUT=/tmp/ec2-org.csv \
OUTPUT=/tmp/ec2-cost.csv \
MONTHS=12
# Enrich with activity metrics
task -t Taskfile.inventory.yaml enrich-activity \
INPUT=/tmp/ec2-cost.csv \
RESOURCE_TYPE=ec2 \
OUTPUT=/tmp/ec2-activity.csv
# Score decommission candidates
task -t Taskfile.inventory.yaml score-decommission \
INPUT=/tmp/ec2-activity.csv \
RESOURCE_TYPE=ec2 \
OUTPUT=/tmp/ec2-scored.csv
Pipeline Tasks
# Execute complete 5-layer pipeline for EC2
task -t Taskfile.inventory.yaml pipeline-5-layer
# Execute complete 5-layer pipeline for WorkSpaces
task -t Taskfile.inventory.yaml pipeline-5-layer-workspaces
# Display pipeline summary
task -t Taskfile.inventory.yaml pipeline-summary RESOURCE_TYPE=ec2
Validation Tasks
# MCP cross-validation
task -t Taskfile.inventory.yaml validate-mcp RESOURCE_TYPE=ec2
# Cost data validation
task -t Taskfile.inventory.yaml validate-costs \
INPUT=/tmp/ec2-cost.csv
Utility Tasks
# Clean output directory
task -t Taskfile.inventory.yaml clean-outputs
# Show configured profiles
task -t Taskfile.inventory.yaml show-profiles
# List generated output files
task -t Taskfile.inventory.yaml list-outputs
4. Enricher Signal Reference
4.1 EC2 Decommission Signals (E1-E7)
E1: Compute Optimizer Idle Detection (60 points)
Source: AWS Compute Optimizer GetEC2InstanceRecommendations API
Trigger Conditions:
- Compute Optimizer finding = "Idle"
- Maximum CPU utilization <5% over 14 days
- Recommendation = "Terminate" or "Stop"
Data Collected:
- compute_optimizer_finding: Idle, Underprovisioned, Optimized
- compute_optimizer_cpu_max: Maximum CPU utilization (%)
- compute_optimizer_recommendation: Termination or right-sizing recommendation
Points: 60 (backbone signal - highest weight)
Example:
{
"compute_optimizer_finding": "Idle",
"compute_optimizer_cpu_max": 2.3,
"compute_optimizer_recommendation": "Terminate instance"
}
E2: CloudWatch Low Utilization (10 points)
Source: AWS CloudWatch GetMetricStatistics API
Trigger Conditions:
- P95 CPU utilization <5% over 14 days
- P95 network bytes <1MB/day over 14 days
Data Collected:
- p95_cpu_utilization: P95 CPU utilization (%)
- p95_network_bytes: P95 network bytes transferred
Points: 10
Example:
E3: CloudTrail Inactivity (8 points)
Source: AWS CloudTrail LookupEvents API
Trigger Conditions:
- Zero CloudTrail events in 90-day lookback window
- Days since last activity ≥90
Data Collected:
- last_activity_date: Most recent CloudTrail event timestamp
- days_since_activity: Days since last event (999 if no events)
- activity_count_90d: Total events in lookback window
Points: 8
Example:
E4: SSM Heartbeat Loss (8 points)
Source: AWS SSM DescribeInstanceInformation API
Trigger Conditions:
- SSM ping status = "Offline" or "ConnectionLost"
- Days since last SSM heartbeat ≥30
Data Collected:
- ssm_ping_status: Online, Offline, ConnectionLost, Not SSM managed
- ssm_last_ping_date: Timestamp of last SSM heartbeat
- ssm_days_since_ping: Days since last heartbeat
Points: 8
Example:
E5: Service Attachment (6 points)
Source: EC2 DescribeInstances API (enrichment metadata)
Trigger Conditions:
- Not attached to Elastic Load Balancer
- Not member of Auto Scaling Group
- Not Target Group member
Points: 6
E6: Storage I/O (5 points)
Source: AWS CloudWatch GetMetricStatistics API
Trigger Conditions:
- Disk read IOPS <100/day over 14 days
- Disk write IOPS <100/day over 14 days
Points: 5
E7: Cost Savings (3 points)
Source: Cost enrichment layer
Trigger Conditions:
- Monthly cost >$0
- Age >365 days (long-running resource)
Points: 3
4.2 WorkSpaces Decommission Signals (W1-W6)
W1: Connection Recency (45 points)
Source: AWS WorkSpaces DescribeWorkspaces + CloudWatch
Trigger Conditions:
- Last user connection >90 days
- Zero user connection minutes in 30-day window
Points: 45 (backbone signal for WorkSpaces)
W2: CloudWatch Usage (25 points)
Source: AWS CloudWatch GetMetricStatistics API
Trigger Conditions:
- User connection time <5 minutes/day average over 30 days
- Zero active sessions in 14-day window
Data Collected:
- user_connected_sum: Total user connection minutes
Points: 25
W3: Billing vs Usage (10/5 points)
Source: Cost enrichment + CloudWatch correlation
Trigger Conditions:
- Monthly cost >$0 (10 points if true)
- Usage <5% of billing period (5 points if true)
Points: 10 (billing active) + 5 (low usage) = 15 max
W4: Cost Optimizer Policy (10 points)
Source: AWS WorkSpaces DescribeWorkspaces API
Trigger Conditions:
- Running mode = ALWAYS_ON with low usage
- No AutoStop configuration
Points: 10
W5: Admin Activity (5 points)
Source: AWS CloudTrail LookupEvents API
Trigger Conditions:
- Zero admin actions (ModifyWorkspaceProperties, etc.) in 90 days
Points: 5
W6: User Status (5 points)
Source: AWS WorkSpaces DescribeWorkspaces API
Trigger Conditions:
- User state = ADMIN_MAINTENANCE or ERROR
- WorkSpace state = STOPPED or SUSPENDED
Points: 5
4.3 S3 Cost Optimization Signals (S1-S7)
S1: No Lifecycle Policy (20 points)
Source: S3 GetBucketLifecycleConfiguration API
Trigger: Bucket has no lifecycle policy configured
S2: Intelligent Tiering Not Enabled (15 points)
Source: S3 GetBucketIntelligentTieringConfiguration API
Trigger: Bucket has >1TB data without Intelligent Tiering
S3: Glacier Migration Candidates (15 points)
Source: S3 Storage Lens API
Trigger: Objects not accessed in 90+ days, not in Glacier
S4: Deep Archive Candidates (10 points)
Source: S3 Storage Lens API
Trigger: Objects not accessed in 180+ days, not in Deep Archive
S5: Versioning Without Expiration (10 points)
Source: S3 GetBucketVersioning API
Trigger: Versioning enabled without NoncurrentVersionExpiration policy
S6: Encryption Not Enabled (10 points)
Source: S3 GetBucketEncryption API
Trigger: Bucket without default encryption (SSE-S3/KMS)
S7: High Storage Cost (10 points)
Source: Cost enrichment layer
Trigger: Monthly storage cost >$100 without optimization policies
4.4 RDS Activity Signals (R1-R7)
R1: Zero Connections (30 points)
Source: CloudWatch DatabaseConnections metric
Trigger: Zero database connections in 90-day window
R2: Low Connection Activity (20 points)
Source: CloudWatch DatabaseConnections metric
Trigger: Average connections <5 per day over 30 days
R3: No Read/Write Operations (15 points)
Source: CloudWatch ReadIOPS + WriteIOPS metrics
Trigger: Zero read/write operations in 30-day window
R4: Snapshot Age (10 points)
Source: RDS DescribeDBSnapshots API
Trigger: Most recent snapshot >30 days old (indicates inactive database)
R5: No CloudWatch Alarms (10 points)
Source: CloudWatch DescribeAlarms API
Trigger: No alarms configured for database instance
R6: Development/Test Tags (8 points)
Source: RDS DescribeDBInstances API tags
Trigger: Environment=dev/test/sandbox tags present
R7: High Cost, Low Utilization (7 points)
Source: Cost enrichment + CloudWatch correlation
Trigger: Monthly cost >$100 with <10% CPU utilization
4.5 NAT Gateway Traffic Signals (D1-D6)
D1: S3 Traffic (30 points)
Source: VPC Flow Logs analysis
Trigger: >50% traffic to S3 prefix lists (VPC Endpoint candidate)
D2: DynamoDB Traffic (25 points)
Source: VPC Flow Logs analysis
Trigger: >50% traffic to DynamoDB prefix lists (VPC Endpoint candidate)
D3: Low Traffic Volume (15 points)
Source: VPC Flow Logs aggregation
Trigger: <1GB/month traffic through NAT Gateway
D4: High Cost per GB (10 points)
Source: Cost enrichment + Flow Logs correlation
Trigger: Cost per GB >$0.10 (typical NAT Gateway cost)
D5: Multiple NAT Gateways (10 points)
Source: EC2 DescribeNatGateways API
Trigger: >1 NAT Gateway per AZ (consolidation opportunity)
D6: Idle NAT Gateway (10 points)
Source: VPC Flow Logs analysis
Trigger: Zero traffic in 30-day window
5. Integration Patterns
5.1 MCP Validation Integration
Purpose: Cross-validate inventory cost calculations with hybrid intelligence engine
Architecture:
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ Inventory       │      │ MCP Server      │      │ AWS Cost        │
│ Cost Enricher   │─────▶│ awslabs.cost-   │─────▶│ Explorer API    │
│                 │      │ explorer        │      │                 │
└─────────────────┘      └─────────────────┘      └─────────────────┘
        │                        │
        ▼                        ▼
   CSV Output              JSON Response
  ($15,432.67)            ($15,401.23)
        │                        │
        └───────────┬────────────┘
                    ▼
           Variance Analysis
        (0.2% = 99.8% accuracy)
Configuration:
1. Install MCP server
2. Configure ~/.config/mcp/config.json
3. Validate integration
Accuracy Targets:
- ≥99.5%: Production-ready (manager approval likely)
- 95-99.5%: Acceptable with documented variance
- <95%: Investigation required (cost data quality issue)
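The variance check in the sample output above reduces to simple arithmetic. A minimal sketch (the shipped validator may normalise differently; the function name is an assumption):

```python
def mcp_accuracy(csv_total: float, mcp_total: float) -> float:
    """Percent agreement between the enriched-CSV total and the MCP total."""
    if mcp_total == 0:
        raise ValueError("MCP total must be non-zero")
    variance_pct = abs(csv_total - mcp_total) / mcp_total * 100
    return 100.0 - variance_pct

# Figures from the sample validation output above
accuracy = mcp_accuracy(15432.67, 15401.23)
passed = accuracy >= 99.5
```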
5.2 Cost Explorer API Integration
API: AWS Cost Explorer GetCostAndUsage
Endpoint: ce:GetCostAndUsage
Request Pattern:
import boto3

# Billing profile with ce:GetCostAndUsage permission
ce_client = boto3.Session(profile_name='billing-profile').client('ce')

response = ce_client.get_cost_and_usage(
    TimePeriod={
        'Start': '2024-11-01',
        'End': '2025-11-01'
    },
    Granularity='MONTHLY',
    Metrics=['AmortizedCost'],
    GroupBy=[
        {'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'}
    ]
)
Rate Limits:
- 5 requests per second
- 100 requests per day (free tier)
- 1,000 requests per day (paid tier)
Best Practices:
- Use MONTHLY granularity for trends (fewer API calls)
- Use AmortizedCost for enterprise RI/SP tracking
- Cache responses for 24 hours (data latency)
- Implement exponential backoff for throttling
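Exponential backoff for a throttled call can be sketched as below; this is a generic retry wrapper, not the module's own retry logic. In real use you would inspect botocore ClientError codes rather than catching every exception:

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry a throttled API call with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # In practice, re-raise anything that is not a throttling error
            # before sleeping (check the botocore ClientError error code).
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt: base, 2x, 4x, ... plus random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Usage: wrap the API call in a closure, e.g. `with_backoff(lambda: ce_client.get_cost_and_usage(...))`.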
5.3 CloudTrail Integration
API: AWS CloudTrail LookupEvents
Endpoint: cloudtrail:LookupEvents
Request Pattern:
from datetime import datetime, timedelta

import boto3

# Operations profile with cloudtrail:LookupEvents permission
cloudtrail_client = boto3.Session(profile_name='ops-profile').client('cloudtrail')

response = cloudtrail_client.lookup_events(
    LookupAttributes=[
        {
            'AttributeKey': 'ResourceName',
            'AttributeValue': 'i-0abc123def456789'
        }
    ],
    StartTime=datetime.now() - timedelta(days=90),
    EndTime=datetime.now(),
    MaxResults=50
)
Rate Limits:
- 2 transactions per second (TPS)
- Throttling: 429 TooManyRequestsException
Best Practices:
- Use 90-day lookback (balance between coverage and API calls)
- Implement exponential backoff for throttling
- Cache results per resource (avoid duplicate queries)
- Filter by event type to reduce noise
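Per-resource caching keeps each instance from being looked up twice within a run. A minimal in-memory memoisation sketch (`lookup_fn` stands in for the real LookupEvents call; the names are assumptions):

```python
def make_cached_lookup(lookup_fn):
    """Memoise per-resource lookups so each resource ID is queried only once."""
    cache = {}

    def cached(resource_id):
        if resource_id not in cache:
            cache[resource_id] = lookup_fn(resource_id)
        return cache[resource_id]

    return cached
```

At 2 TPS, avoiding repeat lookups is often the difference between a run that finishes and one that throttles.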
5.4 Organizations API Integration
API: AWS Organizations ListAccounts, DescribeOrganization
Endpoint: organizations:ListAccounts
Request Pattern:
import boto3

# Management profile with organizations:ListAccounts permission
org_client = boto3.Session(profile_name='management-profile').client('organizations')

# ListAccounts is paginated; a paginator handles >100 accounts transparently
accounts = []
for page in org_client.get_paginator('list_accounts').paginate():
    accounts.extend(page['Accounts'])

for account in accounts:
    account_id = account['Id']
    account_name = account['Name']
    account_email = account['Email']
Caching: Organizations data cached for 1 hour (low change frequency)
Best Practices:
- Query once per session (data rarely changes)
- Use pagination for >100 accounts
- Store metadata in local cache file
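A local cache file with the 1-hour TTL mentioned above can be sketched as follows (file layout and function name are assumptions, not the module's actual cache format):

```python
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 3600  # 1-hour cache, matching the note above

def load_or_fetch_accounts(cache_path: Path, fetch):
    """Serve account metadata from a local JSON cache, refetching after TTL."""
    if cache_path.exists():
        payload = json.loads(cache_path.read_text())
        if time.time() - payload["fetched_at"] < CACHE_TTL_SECONDS:
            return payload["accounts"]
    accounts = fetch()  # e.g. the paginated ListAccounts call shown above
    cache_path.write_text(
        json.dumps({"fetched_at": time.time(), "accounts": accounts})
    )
    return accounts
```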
5.5 Jupyter Notebook Integration
Purpose: Interactive analysis of enriched inventory data
Example Notebook:
import pandas as pd
import matplotlib.pyplot as plt
# Load enriched data
df = pd.read_csv('/tmp/ec2-scored.csv')
# Filter high-confidence candidates (MUST tier)
must_candidates = df[df['decommission_tier'] == 'MUST']
# Calculate potential savings
total_savings = must_candidates['monthly_cost'].sum()
print(f"Potential monthly savings: ${total_savings:,.2f}")
# Visualize decommission score distribution
df['decommission_score'].hist(bins=20)
plt.xlabel('Decommission Score')
plt.ylabel('Frequency')
plt.title('EC2 Decommission Score Distribution')
plt.show()
# Top 10 decommission candidates by cost
top_candidates = must_candidates.nlargest(10, 'monthly_cost')
print(top_candidates[['resource_id', 'decommission_score', 'monthly_cost', 'signal_breakdown']])
Interactive Widgets:
import ipywidgets as widgets
from IPython.display import display
# Interactive score threshold slider
threshold_slider = widgets.IntSlider(
value=80,
min=0,
max=100,
step=5,
description='Threshold:',
continuous_update=False
)
def update_candidates(threshold):
filtered = df[df['decommission_score'] >= threshold]
savings = filtered['monthly_cost'].sum()
print(f"Candidates: {len(filtered)} | Savings: ${savings:,.2f}/month")
widgets.interact(update_candidates, threshold=threshold_slider)
6. Troubleshooting & FAQ
6.1 Common Issues
Issue: "Access Denied" errors
Symptoms:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the DescribeInstances operation
Solution:
# Verify IAM permissions
aws iam get-user --profile ops-profile
aws iam list-attached-user-policies --user-name your-username
# Required permissions:
# - ec2:Describe*
# - rds:Describe*
# - s3:List*, s3:Get*
# - ce:GetCostAndUsage
# - cloudtrail:LookupEvents
# - organizations:List*, organizations:Describe*
Issue: "No Cost Data" in enrichment outputΒΆ
Symptoms:
All resources show monthly_cost = 0.00
Solution:
# 1. Verify Cost Explorer is enabled
aws ce get-cost-and-usage \
--time-period Start=2025-10-01,End=2025-11-01 \
--granularity MONTHLY \
--metrics AmortizedCost \
--profile billing-profile
# 2. Check resource age (must be >24 hours old)
# 3. Verify billing profile has ce:GetCostAndUsage permission
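The >24-hour age check in step 2 can be automated before querying Cost Explorer. This helper is a sketch (the cutoff and function name are assumptions, not part of the CLI):

```python
from datetime import datetime, timedelta, timezone

def old_enough_for_cost_data(launch_time, min_age_hours=24):
    """Return True if the resource is old enough for Cost Explorer to have data.

    Cost Explorer data typically lags resource creation, so resources younger
    than ~24 hours are expected to show $0 and can be skipped.
    """
    return datetime.now(timezone.utc) - launch_time >= timedelta(hours=min_age_hours)
```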
Issue: CloudTrail API throttlingΒΆ
Symptoms:
botocore.exceptions.ClientError: An error occurred (ThrottlingException) when calling the LookupEvents operation
Solution:
# Option 1: Reduce lookback window (30 days instead of 90)
runbooks inventory enrich-activity \
  --input /tmp/ec2.csv \
  --resource-type ec2 \
  --profile ops-profile \
  --activity-lookback-days 30 \
  --output /tmp/ec2-activity.csv
# Option 2: Skip CloudTrail enrichment
runbooks inventory enrich-activity \
--input /tmp/ec2.csv \
--resource-type ec2 \
--profile ops-profile \
--skip-cloudtrail \
--output /tmp/ec2-activity.csv
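Beyond these CLI options, the exponential-backoff best practice from Section 5 can be applied around any throttled call. The helper below is an illustrative sketch, not the tool's internal logic; production code would catch botocore's `ClientError` and inspect the error code rather than match on the message:

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter when it is throttled.

    Sketch only: real code would catch botocore.exceptions.ClientError and
    check response['Error']['Code'] == 'ThrottlingException'.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if "Throttling" not in str(exc) or attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```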
Issue: Timeout errors with large inventoriesΒΆ
Symptoms:
Discovery or enrichment runs for a long time and eventually stalls or exits with a timeout when scanning many regions or accounts.
Solution:
# Option 1: Limit regions (here: only 2 regions)
runbooks inventory resource-explorer \
  --resource-type ec2 \
  --profile ops-profile \
  --regions ap-southeast-2 us-east-1 \
  --output /tmp/ec2.csv
# Option 2: Skip pagination for a preview (first page only)
runbooks inventory resource-explorer \
  --resource-type ec2 \
  --profile ops-profile \
  --skip-pagination \
  --output /tmp/ec2-preview.csv
# Option 3: Filter by specific accounts
runbooks inventory resource-explorer \
  --resource-type ec2 \
  --profile ops-profile \
  --accounts 123456789012,987654321098 \
  --output /tmp/ec2-filtered.csv
Issue: Profile not foundΒΆ
Symptoms:
botocore.exceptions.ProfileNotFound: The config profile (ops-profile) could not be found
Solution:
# 1. List available profiles
aws configure list-profiles
# 2. Verify credentials file
cat ~/.aws/credentials
# 3. Add missing profile
aws configure --profile ops-profile
Issue: MCP validation fails with 0% accuracyΒΆ
Symptoms:
Validation reports 0% accuracy because every MCP cross-check fails to match the enriched values.
Solution:
# 1. Verify MCP server is running
mcp list-servers
# 2. Check MCP server configuration
cat ~/.config/mcp/config.json
# 3. Test MCP server directly
mcp query awslabs-cost-explorer "Get cost for October 2025"
# 4. Ensure same AWS profile used
# - Inventory enrichment: --profile billing-profile
# - MCP validation: AWS_PROFILE=billing-profile in MCP config
6.2 Frequently Asked QuestionsΒΆ
Q: How long does a complete 5-layer pipeline take?
A: Typical execution times: - Single account (137 EC2): 10-15 minutes - Multi-account LZ (67 accounts, 500+ resources): 20-30 minutes
Performance factors: - Number of regions (3-5 regions typical) - Number of resources (100-500 per account) - API rate limits (CloudTrail 2 TPS, Cost Explorer 5 TPS)
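The per-API rate limits above (CloudTrail 2 TPS, Cost Explorer 5 TPS) can also be respected proactively with a simple client-side limiter. A minimal sketch, not the module's implementation:

```python
import time

class RateLimiter:
    """Space successive calls so they never exceed `tps` calls per second.

    Example: wrap CloudTrail LookupEvents calls with RateLimiter(tps=2).
    """
    def __init__(self, tps):
        self.min_interval = 1.0 / tps
        self.last_call = 0.0

    def wait(self):
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()
```

Call `limiter.wait()` immediately before each API request; the first call proceeds at once and later calls are delayed as needed.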
Q: Can I run inventory enrichment without Cost Explorer?
A: Yes. All enrichment layers are optional:
# Discovery only (no enrichment)
runbooks inventory resource-explorer \
--resource-type ec2 \
--profile ops-profile \
--output /tmp/ec2-discovered.csv
# Activity enrichment only (skip costs)
runbooks inventory enrich-activity \
--input /tmp/ec2-discovered.csv \
--resource-type ec2 \
--profile ops-profile \
--output /tmp/ec2-activity.csv
Q: What's the difference between --profile and --all-profiles?
A: - --profile: Single-account operations (developer/operator use case) - --all-profiles: Multi-account discovery via Resource Explorer aggregator (platform team use case)
# Single account
runbooks inventory resource-explorer \
--profile dev-account \
--resource-type ec2 \
--output /tmp/ec2.csv
# Multi-account (requires aggregator index)
runbooks inventory resource-explorer \
--all-profiles \
--resource-type ec2 \
--output /tmp/ec2-all-accounts.csv
Q: How do I export results in multiple formats?
A:
# Option 1: --all-outputs flag with --output-dir
runbooks inventory resource-explorer \
--resource-type ec2 \
--profile ops-profile \
--all-outputs \
--output-dir /tmp/outputs
# Option 2: Specify format explicitly
runbooks inventory resource-explorer \
--resource-type ec2 \
--profile ops-profile \
--format pdf \
--output /tmp/ec2.pdf
Q: Can I customize decommission signal weights?
A: Yes, use --custom-weights:
runbooks inventory score-decommission \
--input /tmp/ec2-activity.csv \
--resource-type ec2 \
--custom-weights '{"E1": 70, "E2": 5, "E7": 20}' \
--output /tmp/ec2-custom-scores.csv
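The scoring internals are not exposed, but a weighted combination of the kind `--custom-weights` implies can be sketched as follows. The normalization and 0-100 signal values here are assumptions, not the CLI's actual algorithm:

```python
def weighted_score(signals, weights):
    """Combine per-signal values (0-100) into a single 0-100 score.

    `signals` maps signal IDs (e.g. "E1") to 0-100 values and `weights` to
    relative weights; weights are normalized, so they need not sum to 100.
    Hypothetical sketch only.
    """
    total_weight = sum(weights.get(k, 0) for k in signals)
    if total_weight == 0:
        return 0
    return round(sum(v * weights.get(k, 0) for k, v in signals.items()) / total_weight)
```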
Q: What AWS regions are supported?
A: All enabled AWS regions. Use --all-regions for complete coverage:
runbooks inventory resource-explorer \
--resource-type ec2 \
--profile ops-profile \
--all-regions \
--output /tmp/ec2-all-regions.csv
To list enabled regions:
aws ec2 describe-regions \
--query 'Regions[?OptInStatus==`opt-in-not-required` || OptInStatus==`opted-in`].RegionName' \
--output table
Q: How do I handle terminated/stopped resources?
A: Inventory module implements graceful degradation:
- Terminated resources: Skipped in activity enrichment (no API calls)
- Stopped resources: Included with state="stopped" in enrichment
- Cost data: Preserved for terminated resources (historical costs available)
# Filter to running instances only
runbooks inventory resource-explorer \
--resource-type ec2 \
--profile ops-profile \
--query-filter 'state:running' \
--output /tmp/ec2-running-only.csv
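Programmatically, the same graceful-degradation split can be sketched as a pre-enrichment partition. Field names follow the data model in Section 8.3; the helper itself is hypothetical:

```python
def partition_by_state(resources):
    """Split discovered rows per the graceful-degradation rules above:
    terminated resources are skipped by activity enrichment, everything
    else (running, stopped, ...) proceeds.
    """
    to_enrich, skipped = [], []
    for resource in resources:
        target = skipped if resource.get("state") == "terminated" else to_enrich
        target.append(resource)
    return to_enrich, skipped
```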
Q: Can I schedule inventory collection via cron?
A: Yes. Example cron job:
# Daily EC2 inventory at 2 AM
0 2 * * * cd /opt/cloudops && task -t Taskfile.inventory.yaml workflow-single >> /var/log/inventory.log 2>&1
# Weekly multi-account inventory on Sundays at 3 AM
0 3 * * 0 cd /opt/cloudops && task -t Taskfile.inventory.yaml workflow-multi-lz >> /var/log/inventory-lz.log 2>&1
7. Performance OptimizationΒΆ
7.1 Discovery OptimizationΒΆ
Problem: Slow discovery across many regions
Solutions:
- Limit regions
- Skip pagination (preview mode)
- Filter by tags (reduce result set)
7.2 Enrichment OptimizationΒΆ
Problem: Slow activity enrichment (CloudTrail/SSM)
Solutions:
- Skip expensive APIs
- Reduce lookback window
- Parallel execution (Taskfile)
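At the SDK level, the parallel-execution idea can be sketched with a thread pool over regions. `collect_fn` stands in for whatever per-region collection call the pipeline makes and is an assumption, not a module API:

```python
from concurrent.futures import ThreadPoolExecutor

def collect_regions_parallel(regions, collect_fn, max_workers=4):
    """Run a per-region collection function concurrently and merge results.

    collect_fn(region) must return a list of resource records.
    """
    merged = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for region_rows in pool.map(collect_fn, regions):
            merged.extend(region_rows)
    return merged
```

`pool.map` preserves input order, so results come back grouped by region even though the calls overlap in time.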
7.3 Cost Explorer OptimizationΒΆ
Problem: Cost Explorer API throttling
Solutions:
- Use MONTHLY granularity
- Cache responses (24-hour TTL)
- Filter resources before enrichment
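Pre-filtering before the cost layer can be sketched as a simple predicate over discovered rows. Field names follow the data model in Section 8.3; the helper and its defaults are illustrative:

```python
def filter_for_enrichment(resources, accounts=None, states=("running", "stopped")):
    """Keep only rows worth sending to the Cost Explorer layer.

    Restricting by account and state before enrichment cuts the number of
    per-resource API calls.
    """
    kept = []
    for r in resources:
        if accounts is not None and r.get("account_id") not in accounts:
            continue
        if r.get("state") not in states:
            continue
        kept.append(r)
    return kept
```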
7.4 Benchmark ResultsΒΆ
Test Environment: 137 EC2 instances, 3 regions (ap-southeast-2, us-east-1, ap-southeast-1)
| Operation | Time | Optimization |
|---|---|---|
| Discovery (all regions) | 90s | Limit to 2 regions: 30s (67% faster) |
| Cost enrichment (DAILY) | 120s | Use MONTHLY: 60s (50% faster) |
| Activity enrichment (full) | 180s | Skip CloudTrail+SSM: 90s (50% faster) |
| Decommission scoring | 8s | No optimization needed |
| Total (full) | 398s | Optimized: 188s (53% faster) |
8. API ReferenceΒΆ
8.1 Python SDK UsageΒΆ
Installation:
Basic Usage:
from runbooks.inventory.core.collector import InventoryCollector
# Initialize collector
collector = InventoryCollector(profile='ops-profile')

# Collect EC2 instances (run inside an async function or a Jupyter cell,
# since collect_service is awaited)
result = await collector.collect_service(
    service='ec2',
    regions=['ap-southeast-2', 'us-east-1'],
    include_costs=True
)
print(f"Discovered {result['resource_count']} EC2 instances")
Advanced Usage:
from runbooks.inventory.enrichers.cost_enricher import CostEnricher
from runbooks.inventory.enrichers.activity_enricher import ActivityEnricher
# Cost enrichment
cost_enricher = CostEnricher(profile='billing-profile')
enriched_data = await cost_enricher.enrich(
resources=result['resources'],
months=12,
granularity='MONTHLY'
)
# Activity enrichment
activity_enricher = ActivityEnricher(profile='ops-profile')
fully_enriched = await activity_enricher.enrich(
resources=enriched_data,
resource_type='ec2',
lookback_days=90
)
8.2 REST API Integration (Future)ΒΆ
Status: Planned for v1.2.0
Endpoint Examples:
GET /api/v1/inventory/resources?type=ec2&profile=ops
POST /api/v1/inventory/enrich/costs
POST /api/v1/inventory/score/decommission
8.3 Data ModelsΒΆ
InventoryResource (Pydantic model):
from pydantic import BaseModel
from typing import Optional

class InventoryResource(BaseModel):
    resource_id: str
    resource_type: str
    region: str
    account_id: str
    resource_name: Optional[str]
    tags: dict
    state: str
    created_date: Optional[str]

    # Organizations enrichment
    account_name: Optional[str]
    account_email: Optional[str]
    organizational_unit: Optional[str]

    # Cost enrichment
    monthly_cost: Optional[float]
    annual_cost_12mo: Optional[float]
    cost_trend_3mo: Optional[str]

    # Activity enrichment
    last_activity_date: Optional[str]
    days_since_activity: Optional[int]
    p95_cpu_utilization: Optional[float]

    # Decommission scoring
    decommission_score: Optional[int]
    decommission_tier: Optional[str]
    signal_breakdown: Optional[dict]
Appendix A: AWS IAM Policy TemplatesΒΆ
A.1 Discovery Profile PolicyΒΆ
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:Describe*",
"rds:Describe*",
"s3:List*",
"s3:Get*",
"lambda:List*",
"lambda:Get*",
"resource-explorer-2:*",
"cloudwatch:GetMetricStatistics",
"compute-optimizer:GetEC2InstanceRecommendations"
],
"Resource": "*"
}
]
}
A.2 Management Profile PolicyΒΆ
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"organizations:ListAccounts",
"organizations:DescribeOrganization",
"organizations:DescribeAccount",
"organizations:ListOrganizationalUnitsForParent",
"organizations:ListAccountsForParent"
],
"Resource": "*"
}
]
}
A.3 Billing Profile PolicyΒΆ
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ce:GetCostAndUsage",
"ce:GetCostForecast",
"ce:GetDimensionValues"
],
"Resource": "*"
}
]
}
Appendix B: Resource Type ReferenceΒΆ
See the CLI Command Reference (Section 2) for the complete list of supported resource types.
10 Categories: 1. Analytics (9 types) 2. Compute (12 types) 3. Databases (8 types) 4. Developer Tools (5 types) 5. Management (11 types) 6. Migration (4 types) 7. ML & AI (6 types) 8. Networking (14 types) 9. Security (10 types) 10. Storage (9 types)
Total: 88 AWS resource types
Appendix C: ChangelogΒΆ
v1.1.19 (November 7, 2025): - Added EC2 decommission signals (E1-E7) - Added CloudTrail activity enricher (I1-I6) - Added S3 cost analyzer (S1-S7) - Added RDS activity enricher (R1-R7) - Added NAT Gateway traffic analyzer - 110+ collectors & enrichers operational - MCP validation integration (β₯99.5% accuracy)
v1.1.18 (November 6, 2025): - 7 inventory commands with OutputController - Cross-module UX standardization - Best practice workflows (single/multi-account)
v1.1.17 (November 6, 2025): - Emergency hotfix: cost_enricher logging - OutputController profile handling fix
Document Version: 1.1.19 Last Updated: November 7, 2025 Maintainer: CloudOps Team ([email protected]) License: Apache 2.0