Inventory Module: Architecture Flow
Executive Summary
Purpose: Visual guide to the inventory module's complementary two-layer architecture and 5-layer enrichment pipeline
Key Concepts:
- Layer 1 (Discovery): Fast metadata indexing via resource-explorer (AWS Resource Explorer API)
- Layer 2 (Operational): Deep configuration retrieval via standalone scripts (service-specific APIs)
- 5-Layer Pipeline: Discovery → Organizations → Costs → Activity → Scoring
Flow Diagram: Shows data flow from initial discovery through final decommission scoring
1. Introduction & Architecture
1.1 Overview
The Runbooks Inventory module discovers AWS resources across 88 resource types and enriches each record with organization, cost, and activity data to support decommission decisions.
Core Capabilities:
- Multi-Account Discovery: AWS Resource Explorer for organization-wide visibility
- Cost Enrichment: Automated Cost Explorer API integration with 12-month trends
- Activity Analysis: CloudTrail, CloudWatch, SSM, Compute Optimizer multi-signal validation
- Decommission Scoring: E1-E7 (EC2) and W1-W6 (WorkSpaces) evidence-based frameworks
- MCP Validation: Hybrid intelligence engine for ≥99.5% accuracy assurance
Business Benefits:
- Financial Intelligence: Cost-based prioritization for ROI-driven decisions
- Risk-Adjusted Scoring: 4-tier confidence system (MUST/SHOULD/COULD/KEEP)
- Audit Readiness: Complete activity trails for compliance frameworks
- Executive Reporting: Board-ready visualizations and recommendations
1.2 5-Layer Architecture
┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Discovery (resource-explorer)                      │
│   → 88 AWS resource types across multi-account Landing Zone │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: Organizations (enrich-accounts)                    │
│   → Add account metadata (names, OUs, cost groups)          │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Costs (enrich-costs)                               │
│   → Add Cost Explorer data (monthly, annual, trends)        │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Activity (enrich-activity)                         │
│   → Add CloudTrail/CloudWatch/SSM/Compute Optimizer metrics │
├─────────────────────────────────────────────────────────────┤
│ Layer 5: Scoring (score-decommission)                       │
│   → Calculate E1-E7/W1-W6 decommission scores (0-100)       │
└─────────────────────────────────────────────────────────────┘
Design Principles:
- Unix Philosophy: Each layer does one thing well
- Progressive Enhancement: Layers are independent and optional
- Graceful Degradation: Missing data handled transparently
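The three principles above can be sketched in a few lines, assuming each layer is a function that takes and returns a list of row dictionaries. The names `run_pipeline`, `LayerUnavailable`, and the two stand-in layer functions are hypothetical and chosen for illustration; they are not the module's actual API.

```python
from typing import Callable

Row = dict[str, object]
Layer = Callable[[list[Row]], list[Row]]


class LayerUnavailable(Exception):
    """Hypothetical signal that a layer cannot run (e.g. missing permissions)."""


def run_pipeline(
    rows: list[Row], layers: list[tuple[str, Layer]]
) -> tuple[list[Row], list[str]]:
    """Apply each layer in order and skip layers that report they are unavailable."""
    skipped: list[str] = []
    for name, layer in layers:
        try:
            rows = layer(rows)
        except LayerUnavailable:
            # Graceful degradation: keep the rows produced so far and move on.
            skipped.append(name)
    return rows, skipped


# Illustrative stand-ins for two of the five layers.
def enrich_accounts(rows: list[Row]) -> list[Row]:
    raise LayerUnavailable("no management-account access")


def enrich_costs(rows: list[Row]) -> list[Row]:
    return [{**row, "cost_total": 0.0} for row in rows]


discovered = [{"resource_id": "i-abc123", "account_id": "123456789012"}]
result, skipped = run_pipeline(
    discovered,
    [("enrich-accounts", enrich_accounts), ("enrich-costs", enrich_costs)],
)
```

Here the Organizations layer is skipped and the cost layer still runs, so the rows keep their discovery columns and gain `cost_total`, while `skipped` records which layer was left out.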
1.3 Profile Architecture
| Profile | Purpose | Permissions | Usage |
|---|---|---|---|
| Operations | Resource discovery | ReadOnlyAccess, `resource-explorer-2:*` | Layer 1, 4 |
| Management | Organizations metadata | `organizations:Describe*`, `organizations:List*` | Layer 2 |
| Billing | Cost data | ce:GetCostAndUsage | Layer 3 |
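The layer-to-profile mapping in the table can be expressed as a small lookup. This is a sketch under two assumptions: that the profile identifiers used elsewhere in this document (`CENTRALISED_OPS_PROFILE`, `MANAGEMENT_PROFILE`, `BILLING_PROFILE`) are environment variables holding AWS CLI profile names, and that Layer 5 scores local data and needs no profile. The helper `profile_for_layer` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Environment variable (assumed) that holds the profile name for each layer.
LAYER_PROFILE_ENV = {
    1: "CENTRALISED_OPS_PROFILE",  # Operations: discovery
    2: "MANAGEMENT_PROFILE",       # Management: Organizations metadata
    3: "BILLING_PROFILE",          # Billing: Cost Explorer
    4: "CENTRALISED_OPS_PROFILE",  # Operations: activity enrichment
}


def profile_for_layer(
    layer: int, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the profile name for a layer, or None if the layer needs no profile."""
    env = os.environ if env is None else env
    variable = LAYER_PROFILE_ENV.get(layer)
    if variable is None:
        return None
    if variable not in env:
        raise RuntimeError(f"Layer {layer} needs {variable} to be set")
    return env[variable]
```

The returned name could then be passed to `boto3.Session(profile_name=...)` to build the clients for that layer.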
2.0 Command Architecture & Patterns
Layer 1 vs Layer 2 Discovery Comparison
| Feature | Layer 1 (Resource Explorer) | Layer 2 (Service-Specific APIs) |
|---|---|---|
| Speed | <10 seconds | 4-5 minutes |
| AWS API | resource-explorer-2:Search | Service-specific (`ec2:Describe*`, `rds:Describe*`, etc.) |
| Data Returned | Metadata (ARN, tags, CloudFormation) | Full configuration (state, networking, IPs, volumes) |
| Use Case | "What exists?" (inventory enumeration) | "What's configured?" (operational details) |
| Coverage | 88 resource types cross-service | Service-specific depth (100+ attributes/resource) |
| Profile | CENTRALISED_OPS_PROFILE (aggregator) | Service-specific profiles (granular permissions) |
| Regions | Multi-region via aggregator | Per-region API calls |
| Best For | Initial discovery, compliance audits | Detailed analysis, operational workflows |
Recommendation: Use Layer 1 for discovery (fast enumeration), then Layer 2 for enrichment (detailed configuration).
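Because Layer 2 calls are made per account and per region, the ARNs returned by Layer 1 need to be grouped before they can drive service-specific calls. A minimal sketch of that hand-off follows; `group_instance_ids` is a hypothetical helper, and only the standard ARN layout is assumed.

```python
from collections import defaultdict


def group_instance_ids(arns: list[str]) -> dict[tuple[str, str], list[str]]:
    """Group EC2 instance IDs by (account_id, region), one group per regional client."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for arn in arns:
        # ARN layout: arn:partition:service:region:account-id:resource
        _, _, service, region, account_id, resource = arn.split(":", 5)
        if service != "ec2" or not resource.startswith("instance/"):
            continue  # ignore anything that is not an EC2 instance
        groups[(account_id, region)].append(resource.split("/", 1)[1])
    return dict(groups)


arns = [
    "arn:aws:ec2:ap-southeast-2:123456789012:instance/i-abc123",
    "arn:aws:ec2:ap-southeast-2:123456789012:instance/i-def456",
    "arn:aws:ec2:us-east-1:123456789012:instance/i-0a1b2c",
    "arn:aws:ec2:us-east-1:123456789012:volume/vol-123",
]
batches = group_instance_ids(arns)
```

Each key in `batches` identifies one account and region, so a single regional client can describe all instance IDs in that group.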
Command Pattern: find vs list
| Pattern | Definition | Examples | Use Case |
|---|---|---|---|
| `find-*` | Analytical queries with filtering/scoring/detection | find-cfn-drift, find-vpc-flow-logs, find-lz-versions | Problem detection, compliance validation, analytical queries |
| `list-*` | Enumeration queries returning all resources of a type | list-org-accounts, list-cfn-stacks, list-elbs | Inventory creation, resource enumeration, bulk operations |
Design Principle: `list-*` returns every resource of a type; `find-*` returns only the subset that matches the given criteria.
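The distinction between the two command families can be illustrated with a pair of functions over in-memory records. The function names and the sample stack records are hypothetical; `DRIFTED` and `IN_SYNC` are CloudFormation drift status values.

```python
from typing import Callable, Iterable

Stack = dict[str, str]


def list_stacks(stacks: Iterable[Stack]) -> list[Stack]:
    """list-* style: return every resource of the type, unfiltered."""
    return list(stacks)


def find_stacks(
    stacks: Iterable[Stack], predicate: Callable[[Stack], bool]
) -> list[Stack]:
    """find-* style: return only the resources that match the criteria."""
    return [stack for stack in stacks if predicate(stack)]


stacks = [
    {"name": "network", "drift_status": "IN_SYNC"},
    {"name": "web-app", "drift_status": "DRIFTED"},
]
everything = list_stacks(stacks)
drifted = find_stacks(stacks, lambda s: s["drift_status"] == "DRIFTED")
```

In this sketch a `find-*` command is simply a `list-*` result with a predicate applied, which is one way to read the design principle.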
Complementary Two-Layer Architecture Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│                        INVENTORY MODULE ARCHITECTURE                        │
│                       Complementary Two-Layer Pattern                       │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌──────────────────┐
                              │  User Initiates  │
                              │  Discovery via:  │
                              │  • CLI Command   │
                              │  • Taskfile      │
                              │  • Notebook      │
                              └────────┬─────────┘
                                       │
                                       ▼
          ┌────────────────────────────────────────────────────────┐
          │             LAYER 1: DISCOVERY & INDEXING              │
          │             (Fast Metadata - <10 seconds)              │
          └────────────────────────────┬───────────────────────────┘
                                       │
                    ┌──────────────────┴──────────────────┐
                    │  resource-explorer-2:Search API     │
                    │  (AWS Resource Explorer Service)    │
                    └──────────────────┬──────────────────┘
                                       │
                    ┌──────────────────▼──────────────────┐
                    │  METADATA RETURNED (88 types)       │
                    │  • Resource ARN                     │
                    │  • Account ID, Region               │
                    │  • Resource Type (ec2, rds, etc.)   │
                    │  • Tags (key-value pairs)           │
                    │  • CloudFormation metadata          │
                    │  • Last reported timestamp          │
                    └──────────────────┬──────────────────┘
                                       │
                                       │  CSV Output:
                                       │    ec2-discovered.csv
                                       │    rds-discovered.csv
                                       │    lambda-discovered.csv
                                       │
                                       ▼
          ┌────────────────────────────────────────────────────────┐
          │             LAYER 2: OPERATIONAL RETRIEVAL             │
          │           (Deep Configuration - 4-5 minutes)           │
          └────────────────────────────┬───────────────────────────┘
                                       │
              ┌────────────────────────┼─────────────────────────┐
              │                        │                         │
              ▼                        ▼                         ▼
      ┌───────────────┐      ┌───────────────────┐      ┌──────────────────┐
      │ EC2 Scripts   │      │ RDS Scripts       │      │ Lambda Scripts   │
      │ • list_ec2_   │      │ • list_rds_db_    │      │ • list_lambda_   │
      │   instances   │      │   instances       │      │   functions      │
      │ • list_ec2_   │      │ • list_rds_       │      │ • Runtime        │
      │   ebs_volumes │      │   snapshots       │      │   updates        │
      └───────┬───────┘      └─────────┬─────────┘      └────────┬─────────┘
              │                        │                         │
              │ ec2:DescribeInstances  │ rds:DescribeDBInstances │ lambda:ListFunctions
              │ ec2:DescribeVolumes    │ rds:DescribeDBSnapshots │ lambda:GetFunction
              │                        │                         │
              ▼                        ▼                         ▼
  ┌──────────────────────────────────────────────────────────────────────────┐
  │                    FULL OPERATIONAL DETAILS RETURNED                     │
  │  • Instance/DB state (running, stopped, terminated)                      │
  │  • Instance/resource type (t3.medium, db.t3.small)                       │
  │  • Networking (IPs, DNS, VPC, subnets, security groups)                  │
  │  • Storage allocation (EBS volumes, RDS storage)                         │
  │  • Runtime versions (Lambda python3.11, RDS MySQL 8.0)                   │
  │  • Performance metrics (CPU, memory, connections)                        │
  │  • Backup and recovery information                                       │
  └────────────────────────────────────┬─────────────────────────────────────┘
                                       │
                                       ▼
                          ┌────────────────────────┐
                          │    Combined Dataset    │
                          │ Metadata + Operational │
                          │  Ready for Enrichment  │
                          └────────────────────────┘
5-Layer Enrichment Pipeline Flow
┌─────────────────────────────────────────────────────────────────────────────┐
│                         5-LAYER ENRICHMENT PIPELINE                         │
│              Best Practice Multi-Account Landing Zone Pattern               │
└─────────────────────────────────────────────────────────────────────────────┘

            ┌─────────────────────────────────────────────────────┐
            │                   Start Pipeline                    │
            │  task -t Taskfile.inventory.yaml pipeline-5-layer   │
            └──────────────────────────┬──────────────────────────┘
                                       │
                                       ▼
          ┌────────────────────────────────────────────────────────┐
          │         LAYER 1: DISCOVERY (resource-explorer)         │
          │  Time: <10 seconds | API: resource-explorer-2:Search   │
          └────────────────────────────┬───────────────────────────┘
                                       │
    Input:   None                      │  Output:  ec2-discovered.csv
    Profile: CENTRALISED_OPS_PROFILE   │  Records: 137 EC2 instances
                                       │  Columns: 15 (ARN, account_id,
                                       │           region, resource_type,
                                       │           tags, cf_metadata)
                                       ▼
          ┌────────────────────────────────────────────────────────┐
          │ LAYER 2: ORGANIZATIONS ENRICHMENT (multi-account only) │
          │ Time: ~5 seconds | API: organizations:DescribeAccount  │
          └────────────────────────────┬───────────────────────────┘
                                       │
    Input:   ec2-discovered.csv        │  Output:  ec2-org.csv
    Profile: MANAGEMENT_PROFILE        │  Records: 137/137 enriched
                                       │  Columns: +3 (account_name,
                                       │           ou_name, ou_path)
                                       ▼
          ┌────────────────────────────────────────────────────────┐
          │                LAYER 3: COST ENRICHMENT                │
          │      Time: ~15 seconds | API: ce:GetCostAndUsage       │
          └────────────────────────────┬───────────────────────────┘
                                       │
    Input:   ec2-org.csv               │  Output:  ec2-cost.csv
    Profile: BILLING_PROFILE           │  Records: 137/137 enriched
                                       │  Columns: +14 (12-month cost
                                       │           history: M01-M12,
                                       │           cost_total,
                                       │           cost_monthly_avg)
                                       │  Total:   $15.4M analyzed
                                       ▼
          ┌────────────────────────────────────────────────────────┐
          │              LAYER 4: ACTIVITY ENRICHMENT              │
          │  Time: ~4-5 minutes | APIs: Multiple service-specific  │
          └────────────────────────────┬───────────────────────────┘
                                       │
    Input:   ec2-cost.csv              │  Output:  ec2-activity.csv
    Profile: CENTRALISED_OPS_PROFILE   │  Records: 137/137 enriched
                                       │  Columns: +20+ (Layer 2
    APIs Used:                         │           operational data:
    • ec2:DescribeInstances            │           State, InstanceType,
    • cloudtrail:LookupEvents          │           VpcId, SubnetId,
    • cloudwatch:GetMetricStatistics   │           SecurityGroups,
    • ssm:DescribeInstanceInformation  │           CloudTrail events,
    • compute-optimizer:               │           CloudWatch metrics,
      GetEC2InstanceRecommendations    │           SSM compliance,
                                       │           Compute Optimizer)
                                       ▼
          ┌────────────────────────────────────────────────────────┐
          │              LAYER 5: SCORING & ANALYSIS               │
          │  Time: ~5 seconds | Logic: E1-E7 decommission signals  │
          └────────────────────────────┬───────────────────────────┘
                                       │
    Input:   ec2-activity.csv          │  Output:  ec2-scored.csv
                                       │  Records: 137/137 scored
    Signals:                           │  Columns: +8 (E1-E7 signals,
    • E1: Terminated                   │           decommission_tier:
    • E2: Stopped >90 days             │           MUST/SHOULD/COULD/KEEP)
    • E3: Zero cost                    │
    • E4: CloudTrail idle              │  Tiers:
    • E5: CloudWatch idle              │  • MUST:   23 instances
    • E6: SSM disconnected             │  • SHOULD: 47 instances
    • E7: Optimizer downsize           │  • COULD:  35 instances
                                       │  • KEEP:   32 instances
                                       ▼
                        ┌─────────────────────────────┐
                        │  Pipeline Complete          │
                        │  Total Time: ~5-6 minutes   │
                        │                             │
                        │  Ready for:                 │
                        │  • Cost analysis            │
                        │  • Decommission planning    │
                        │  • Executive reporting      │
                        └─────────────────────────────┘
Data Flow: Column Evolution Across Layers
Layer 1 Output: ec2-discovered.csv (15 columns)
resource_arn | account_id | region | resource_type | resource_id | application |
tags | cf_stack_name | cf_logical_id | cf_stack_id | last_reported_at |
[additional resource-explorer metadata]
Example Row:
arn:aws:ec2:ap-southeast-2:123456789012:instance/i-abc123,123456789012,ap-southeast-2,ec2:instance,i-abc123,WebApp,"{\"Environment\": \"prod\"}",my-app-stack,WebServer,arn:aws:cloudformation:...,2025-11-06T10:00:00Z
Missing: Instance state, instance type, networking, operational details
Layer 2 Output: ec2-org.csv (18 columns)
[Layer 1 columns] + account_name | ou_name | ou_path
Example Row:
Value: Organizational context for multi-account governance
Layer 3 Output: ec2-cost.csv (32 columns)
[Layer 2 columns] + M01 | M02 | M03 | M04 | M05 | M06 | M07 | M08 | M09 |
M10 | M11 | M12 | cost_total | cost_monthly_avg
Example Row:
[Layer 2 data...],45.23,47.18,44.92,46.15,45.87,47.23,46.45,45.78,46.92,47.15,45.67,46.23,554.78,46.23
Value: 12-month cost history for financial analysis
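The two summary columns can be derived from the twelve monthly columns. The sketch below assumes `cost_total` is the sum of M01-M12 and `cost_monthly_avg` is their plain mean, which is consistent with the example row above; `add_cost_summary` is a hypothetical helper name.

```python
MONTH_COLUMNS = [f"M{month:02d}" for month in range(1, 13)]  # M01 .. M12


def add_cost_summary(row: dict) -> dict:
    """Return a copy of the row with cost_total and cost_monthly_avg added."""
    monthly = [float(row[column]) for column in MONTH_COLUMNS]
    total = round(sum(monthly), 2)
    average = round(total / len(monthly), 2)
    return {**row, "cost_total": total, "cost_monthly_avg": average}


# Monthly values taken from the example row in this section.
values = [45.23, 47.18, 44.92, 46.15, 45.87, 47.23,
          46.45, 45.78, 46.92, 47.15, 45.67, 46.23]
summary = add_cost_summary(dict(zip(MONTH_COLUMNS, values)))
print(summary["cost_total"], summary["cost_monthly_avg"])  # 554.78 46.23
```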
Layer 4 Output: ec2-activity.csv (50+ columns)
[Layer 3 columns] + State | InstanceType | PublicDNSName | PrivateIpAddress |
VpcId | SubnetId | SecurityGroups | LaunchTime | Platform | Architecture |
[CloudTrail columns] | [CloudWatch columns] | [SSM columns] |
[Compute Optimizer columns]
Example Row:
[Layer 3 data...],running,t3.medium,ec2-54-123-456-78.compute.amazonaws.com,10.0.1.100,vpc-12345,subnet-67890,"sg-web,sg-monitoring",2025-10-15T08:30:00Z,Linux,x86_64,[activity data...]
Value: Full operational details for technical analysis
Layer 5 Output: ec2-scored.csv (58+ columns)
[Layer 4 columns] + E1_terminated | E2_stopped_90d | E3_zero_cost |
E4_cloudtrail_idle | E5_cloudwatch_idle | E6_ssm_disconnected |
E7_optimizer_downsize | decommission_tier
Example Row:
Value: Actionable decommission recommendations with business justification
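To make the step from signals to tiers concrete, here is a minimal sketch of a weighted score mapped onto the four tiers. The signal column names come from this section, but the weights and tier thresholds are hypothetical values chosen only for illustration; the module's actual scoring rules are not given in this document.

```python
# Hypothetical weights (summing to 100); not the module's real values.
SIGNAL_WEIGHTS = {
    "E1_terminated": 30,
    "E2_stopped_90d": 20,
    "E3_zero_cost": 15,
    "E4_cloudtrail_idle": 10,
    "E5_cloudwatch_idle": 10,
    "E6_ssm_disconnected": 10,
    "E7_optimizer_downsize": 5,
}
# Hypothetical thresholds, checked from highest to lowest; below all of them is KEEP.
TIER_THRESHOLDS = [(80, "MUST"), (50, "SHOULD"), (25, "COULD")]


def score_row(row: dict) -> tuple[int, str]:
    """Return (score 0-100, decommission_tier) for one enriched row."""
    score = sum(weight for name, weight in SIGNAL_WEIGHTS.items() if row.get(name))
    for threshold, tier in TIER_THRESHOLDS:
        if score >= threshold:
            return score, tier
    return score, "KEEP"


stopped_and_idle = {
    "E2_stopped_90d": True,
    "E3_zero_cost": True,
    "E4_cloudtrail_idle": True,
    "E5_cloudwatch_idle": True,
}
print(score_row(stopped_and_idle))  # (55, 'SHOULD')
```

With these illustrative numbers, an instance with no signals scores 0 and lands in KEEP, and one with every signal scores 100 and lands in MUST.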
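API Call Patterns: Layer 1 vs Layer 2
Layer 1: Resource Explorer (Single Aggregated API)
import boto3

# Create the client in the Region that hosts the aggregator index
resource_explorer_client = boto3.client('resource-explorer-2')

# One Search call queries the aggregated index across accounts and Regions;
# the QueryString narrows the results to a single resource type
response = resource_explorer_client.search(
    QueryString='resourcetype:ec2:instance',
    MaxResults=1000
)
# Returns metadata for 137 EC2 instances in <10 seconds
# Data: ARN, account_id, region, tags, CloudFormation metadata
# Missing: State, InstanceType, networking details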
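When a query matches more resources than fit in one page, the Search response carries a `NextToken` that is passed back on the next call. A minimal sketch of that loop follows; `search_all` is a hypothetical helper, and the client is passed in so the function can be exercised with a stub.

```python
def search_all(client, query: str, page_size: int = 1000) -> list[dict]:
    """Collect every page of a Resource Explorer search by following NextToken."""
    resources: list[dict] = []
    kwargs = {"QueryString": query, "MaxResults": page_size}
    while True:
        page = client.search(**kwargs)
        resources.extend(page.get("Resources", []))
        token = page.get("NextToken")
        if not token:
            return resources
        kwargs["NextToken"] = token  # request the following page
```

A typical call would be `search_all(boto3.client("resource-explorer-2"), "resourcetype:ec2:instance")`.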
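Layer 2: Service-Specific APIs (Detailed per-service calls)
import boto3
# One client per service; each call is scoped to a single account and Region
ec2_client = boto3.client('ec2')
cloudtrail_client = boto3.client('cloudtrail')
cloudwatch_client = boto3.client('cloudwatch')
ssm_client = boto3.client('ssm')
compute_optimizer_client = boto3.client('compute-optimizer')
# Separate API calls for operational details ("..." marks arguments omitted here)
ec2_response = ec2_client.describe_instances() # Full instance config
cloudtrail_response = cloudtrail_client.lookup_events(...) # Activity logs
cloudwatch_response = cloudwatch_client.get_metric_statistics(...) # Metrics
ssm_response = ssm_client.describe_instance_information(...) # SSM agent status
optimizer_response = compute_optimizer_client.get_ec2_instance_recommendations(...)
# Returns operational details in 4-5 minutes
# Data: Everything missing from Layer 1 + activity signals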
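The `describe_instances` response nests instances under reservations, so it has to be flattened before it can be joined to the Layer 1 rows. The sketch below assumes the output column names shown for Layer 4 in this document; `flatten_instances` is a hypothetical helper.

```python
def flatten_instances(response: dict) -> list[dict]:
    """Turn a describe_instances response into one flat row per instance."""
    rows = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            rows.append({
                "resource_id": instance["InstanceId"],
                "State": instance["State"]["Name"],
                "InstanceType": instance["InstanceType"],
                # Networking fields may be absent (for example on terminated
                # instances), so fall back to empty strings.
                "VpcId": instance.get("VpcId", ""),
                "SubnetId": instance.get("SubnetId", ""),
                "SecurityGroups": ",".join(
                    group["GroupId"] for group in instance.get("SecurityGroups", [])
                ),
            })
    return rows


sample = {
    "Reservations": [{
        "Instances": [{
            "InstanceId": "i-abc123",
            "State": {"Name": "running"},
            "InstanceType": "t3.medium",
            "VpcId": "vpc-12345",
            "SubnetId": "subnet-67890",
            "SecurityGroups": [{"GroupId": "sg-web"}, {"GroupId": "sg-monitoring"}],
        }]
    }]
}
rows = flatten_instances(sample)
```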
When to Use Each Layer
Use Layer 1 (Discovery) When:
- Initial exploration: "What resources exist across my organization?"
- Tag-based queries: "Find all resources tagged Environment=prod"
- CloudFormation tracking: "Which resources were created by IaC?"
- Cross-service search: "Show me ALL compute resources (EC2 + Lambda + WorkSpaces)"
- Speed is critical: Need results in <10 seconds
Use Layer 2 (Operational) When:
- State analysis: "Which EC2 instances are stopped vs running?"
- Configuration details: "What database engines are we using?"
- Runtime compliance: "Find deprecated Lambda runtimes"
- Network topology: "Which instances are in which VPCs?"
- Write operations: "Upgrade Lambda functions to latest runtime"
- Cost optimization: Need instance types for rightsizing analysis
Use Combined 5-Layer Pipeline When:
- Comprehensive analysis: Discovery + context + costs + activity + scoring
- Decommission planning: Complete dataset for business decision-making
- Executive reporting: Full narrative from discovery to recommendations
- Cost optimization: 12-month cost history + operational signals + savings calculations
- Compliance audit: Complete resource inventory + security + activity + policy compliance
Performance Characteristics
Single-Account Mode (4 layers - skips Organizations)
Layer 1: Discovery         <10 seconds
Layer 3: Cost Enrichment   ~15 seconds
Layer 4: Activity          ~4-5 minutes
Layer 5: Scoring           ~5 seconds
────────────────────────────────────────
Total:                     ~5-6 minutes
Multi-Account Landing Zone (5 layers - includes Organizations)
Layer 1: Discovery         <10 seconds
Layer 2: Organizations     ~5 seconds
Layer 3: Cost Enrichment   ~15 seconds
Layer 4: Activity          ~4-5 minutes
Layer 5: Scoring           ~5 seconds
────────────────────────────────────────
Total:                     ~5-6 minutes (Organizations adds only ~5 seconds)
Performance Optimization Tips
- Parallel execution: Run multiple resource type pipelines concurrently (EC2 + RDS + Lambda)
- Incremental enrichment: Skip layers if data already exists (resume from Layer 4)
- Selective enrichment: Filter resources before Layer 4 (e.g., only prod environment)
- Caching: Layer 2 (Organizations) rarely changes - cache for 24 hours
- Batch processing: Layer 3 (Cost) supports batch queries - group by account
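Two of the tips above, incremental enrichment and parallel execution, can be sketched together. The file suffixes follow the CSV names used in this document; `next_layer` and `run_concurrently` are hypothetical helpers, and the sketch assumes multi-account mode, where every layer writes its own file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# Output suffix per layer, e.g. ec2-discovered.csv ... ec2-scored.csv
LAYER_SUFFIXES = ["discovered", "org", "cost", "activity", "scored"]


def next_layer(resource: str, out_dir: Path) -> int:
    """Return the 1-based layer to resume from, or 6 if every output exists."""
    for number, suffix in enumerate(LAYER_SUFFIXES, start=1):
        if not (out_dir / f"{resource}-{suffix}.csv").exists():
            return number
    return len(LAYER_SUFFIXES) + 1


def run_concurrently(
    resources: list[str], run_one: Callable[[str], object], max_workers: int = 3
) -> dict[str, object]:
    """Run one pipeline per resource type in parallel threads (the work is I/O-bound)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(resources, pool.map(run_one, resources)))
```

For example, `run_concurrently(["ec2", "rds", "lambda"], lambda r: next_layer(r, Path("out")))` reports, for each resource type, which layer still needs to run; `run_one` could equally be a function that executes the remaining layers.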
Error Handling & Graceful Degradation
Terminated Resources
- Layer 1: Returns terminated EC2 instances (resource-explorer shows last state)
- Layer 4: API calls fail for terminated resources that EC2 no longer reports (error code InvalidInstanceID.NotFound)
- Handling: Skip API enrichment, preserve cost history, mark as E1_terminated in Layer 5
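The handling described above can be sketched as follows. EC2 reports an unknown instance ID with the error code `InvalidInstanceID.NotFound`, which botocore exposes on `ClientError` as `exc.response["Error"]["Code"]`. The `describe` callable, the two function names, and the use of a `State` column as the terminated marker are assumptions made for illustration.

```python
def enrich_activity(row: dict, describe) -> dict:
    """Add operational columns, or mark the row terminated if the instance is gone."""
    try:
        details = describe(row["resource_id"])
    except Exception as exc:
        # Duck-typed check so the sketch works with botocore's ClientError
        # without importing botocore here.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code != "InvalidInstanceID.NotFound":
            raise
        # Skip API enrichment; existing columns (including cost history) are kept.
        return {**row, "State": "terminated"}
    return {**row, **details}


def score_e1(row: dict) -> dict:
    """Layer 5 step: set E1_terminated from the State column."""
    return {**row, "E1_terminated": row.get("State") == "terminated"}
```

Any other error is re-raised, so only the not-found case is treated as a terminated instance.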
Missing Permissions
- Layer 2: Organizations API requires management account access
- Fallback: Skip Organizations enrichment, continue with Layers 3-5
- Impact: Missing account_name/ou_name columns (can still analyze by account_id)
API Rate Limits
- Layer 4: CloudTrail/CloudWatch APIs have rate limits
- Handling: Exponential backoff with retry logic
- Fallback: Skip activity enrichment for specific signals, continue pipeline
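Exponential backoff with retry can be sketched as below. The function name, the `is_throttle` predicate, and the default attempt count and base delay are hypothetical; the random "full jitter" delay is one common variant, not necessarily the one the module uses.

```python
import random
import time


def call_with_backoff(call, is_throttle, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on throttling errors, doubling the maximum wait each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-throttling errors or when attempts are exhausted.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

A caller that wants the fallback behaviour described above can catch the final exception and leave that signal's columns empty. boto3 clients also accept a `botocore.config.Config(retries={...})` setting, which may make a hand-written loop unnecessary.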
References
- Complementary Architecture: COMPLEMENTARY-ARCHITECTURE.md (two-layer pattern details)
- CLI Integration Priorities: CLI-INTEGRATION-PRIORITIES.md (12 scripts roadmap)
- Resource Explorer: src/runbooks/inventory/collectors/resource_explorer.py
- Enrichers: src/runbooks/inventory/enrichers/ (4 enrichers)
- Scorers: src/runbooks/inventory/scorers/ (1 scorer)
- Taskfile: Taskfile.inventory.yaml (28 operations + 2 best-practice workflows)
Version: v1.1.17 (documentation baseline)
Last Updated: November 6, 2025
Status: Architecture documentation complete