Inventory Module: Architecture Flow¶

Executive Summary¶

Purpose: Visual guide to the inventory module's complementary two-layer architecture and 5-layer enrichment pipeline

Key Concepts:

Layer 1 (Discovery): Fast metadata indexing via resource-explorer (AWS Resource Explorer API)
Layer 2 (Operational): Deep configuration retrieval via standalone scripts (service-specific APIs)
5-Layer Pipeline: Discovery → Organizations → Costs → Activity → Scoring

Flow Diagram: Shows data flow from initial discovery through final decommission scoring

1. Introduction & Architecture¶

1.1 Overview¶

The Runbooks Inventory Module discovers AWS resources across 88 resource types, then enriches each record with organizational, cost, and activity data to support decommission decisions.

Core Capabilities:

Multi-Account Discovery: AWS Resource Explorer for organization-wide visibility
Cost Enrichment: Automated Cost Explorer API integration with 12-month trends
Activity Analysis: CloudTrail, CloudWatch, SSM, Compute Optimizer multi-signal validation
Decommission Scoring: E1-E7 (EC2) and W1-W6 (WorkSpaces) evidence-based frameworks
MCP Validation: Hybrid intelligence engine for ≥99.5% accuracy assurance

Business Benefits:

Financial Intelligence: Cost-based prioritization for ROI-driven decisions
Risk-Adjusted Scoring: 4-tier confidence system (MUST/SHOULD/COULD/KEEP)
Audit Readiness: Complete activity trails for compliance frameworks
Executive Reporting: Board-ready visualizations and recommendations

1.2 5-Layer Architecture¶

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: Discovery (resource-explorer)                     │
│ ↓ 88 AWS resource types across multi-account Landing Zone  │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: Organizations (enrich-accounts)                   │
│ ↓ Add account metadata (names, OUs, cost groups)           │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Costs (enrich-costs)                              │
│ ↓ Add Cost Explorer data (monthly, annual, trends)         │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Activity (enrich-activity)                        │
│ ↓ Add CloudTrail/CloudWatch/SSM/Compute Optimizer metrics  │
├─────────────────────────────────────────────────────────────┤
│ Layer 5: Scoring (score-decommission)                      │
│ ↓ Calculate E1-E7/W1-W6 decommission scores (0-100)        │
└─────────────────────────────────────────────────────────────┘

Design Principles:

Unix Philosophy: Each layer does one thing well
Progressive Enhancement: Each layer is optional and adds columns to the output of the layer before it
Graceful Degradation: Missing data is handled transparently, and the pipeline continues
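
These principles can be illustrated with a minimal sketch, assuming each layer is a plain function from rows to rows. The names `add_org`, `add_costs`, and `run_pipeline` are hypothetical stand-ins, not the module's actual API.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Layer = Callable[[List[Row]], List[Row]]


def add_org(rows: List[Row]) -> List[Row]:
    """Hypothetical Layer 2 stand-in: attach an account name to each row."""
    names = {"123456789012": "Production Account"}  # illustrative lookup table
    # Graceful degradation: unknown accounts get an empty value, not an error.
    return [{**r, "account_name": names.get(r["account_id"], "")} for r in rows]


def add_costs(rows: List[Row]) -> List[Row]:
    """Hypothetical Layer 3 stand-in: attach a placeholder cost column."""
    return [{**r, "cost_total": r.get("cost_total", "0.00")} for r in rows]


def run_pipeline(rows: List[Row], layers: Iterable[Layer]) -> List[Row]:
    """Apply each layer in order; any layer can be left out of the list."""
    for layer in layers:
        rows = layer(rows)
    return rows


discovered = [{"resource_id": "i-abc123", "account_id": "123456789012"}]
multi_account = run_pipeline(discovered, [add_org, add_costs])
single_account = run_pipeline(discovered, [add_costs])  # Organizations skipped
```

Because every layer only adds columns, dropping one from the list changes which columns appear but not whether later layers can run.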

1.3 Profile Architecture¶

Profile	Purpose	Permissions	Usage
Operations	Resource discovery	ReadOnlyAccess, resource-explorer-2:*	Layers 1, 4
Management	Organizations metadata	organizations:Describe*, organizations:List*	Layer 2
Billing	Cost data	ce:GetCostAndUsage	Layer 3
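
One way to wire the profiles to layers is a small lookup, sketched below. It assumes the profile names are supplied through environment variables named after the labels used in this document (`CENTRALISED_OPS_PROFILE`, `MANAGEMENT_PROFILE`, `BILLING_PROFILE`); both that mechanism and the function name `profile_for_layer` are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Layer number -> environment variable holding the AWS CLI profile name.
# Reading these names from the environment is an assumption of this sketch.
LAYER_PROFILE_VARS = {
    1: "CENTRALISED_OPS_PROFILE",  # Operations: discovery
    2: "MANAGEMENT_PROFILE",       # Management: Organizations metadata
    3: "BILLING_PROFILE",          # Billing: Cost Explorer
    4: "CENTRALISED_OPS_PROFILE",  # Operations: activity enrichment
}


def profile_for_layer(layer: int, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the profile name for a layer, or None if none is listed or set."""
    var = LAYER_PROFILE_VARS.get(layer)
    return env.get(var) if var else None
```

For example, `profile_for_layer(3, {"BILLING_PROFILE": "billing-ro"})` returns `"billing-ro"`, while Layer 5 returns `None` because the table lists no profile for scoring.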

2.0 Command Architecture & Patterns¶

Layer 1 vs Layer 2 Discovery Comparison¶

Feature	Layer 1 (Resource Explorer)	Layer 2 (Service-Specific APIs)
Speed	<10 seconds	4-5 minutes
AWS API	`resource-explorer-2:Search`	Service-specific (ec2:Describe*, rds:Describe*, etc.)
Data Returned	Metadata (ARN, tags, CloudFormation)	Full configuration (state, networking, IPs, volumes)
Use Case	"What exists?" (inventory enumeration)	"What's configured?" (operational details)
Coverage	88 resource types cross-service	Service-specific depth (100+ attributes/resource)
Profile	`CENTRALISED_OPS_PROFILE` (aggregator)	Service-specific profiles (granular permissions)
Regions	Multi-region via aggregator	Per-region API calls
Best For	Initial discovery, compliance audits	Detailed analysis, operational workflows

Recommendation: Use Layer 1 for discovery (fast enumeration), then Layer 2 for enrichment (detailed configuration).
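
The discover-then-enrich recommendation amounts to a left join on the resource identifier. The sketch below assumes both layers expose a `resource_id` key; `combine_layers` is a hypothetical helper, not part of the module.

```python
from typing import Dict, List

Row = Dict[str, str]


def combine_layers(metadata: List[Row], operational: Dict[str, Row]) -> List[Row]:
    """Left-join Layer 1 metadata with Layer 2 details on resource_id.

    Resources without operational details keep their metadata unchanged.
    """
    return [{**row, **operational.get(row["resource_id"], {})} for row in metadata]


layer1 = [
    {"resource_id": "i-abc123", "region": "ap-southeast-2"},
    {"resource_id": "i-def456", "region": "ap-southeast-2"},
]
layer2 = {"i-abc123": {"State": "running", "InstanceType": "t3.medium"}}
combined = combine_layers(layer1, layer2)
```

Keeping every Layer 1 row, even when Layer 2 has nothing to add, preserves the complete inventory for later layers.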

Command Pattern: `find` vs `list`¶

Pattern	Definition	Examples	Use Case
`find-*`	Analytical queries with filtering/scoring/detection	`find-cfn-drift`, `find-vpc-flow-logs`, `find-lz-versions`	Problem detection, compliance validation, analytical queries
`list-*`	Enumeration queries returning all resources of a type	`list-org-accounts`, `list-cfn-stacks`, `list-elbs`	Inventory creation, resource enumeration, bulk operations

Design Principle: `list-*` commands return every resource of a type; `find-*` commands return only the subset that matches the given criteria.
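
The distinction can be shown with two small functions over the same data. This is a sketch with invented sample data and hypothetical function names; it does not reproduce the actual `list-cfn-stacks` or `find-cfn-drift` commands.

```python
from typing import Callable, Dict, List

Stack = Dict[str, str]

# Illustrative sample data only.
STACKS: List[Stack] = [
    {"name": "network", "drift": "IN_SYNC"},
    {"name": "web-app", "drift": "DRIFTED"},
]


def list_stacks(stacks: List[Stack]) -> List[Stack]:
    """list-* style: enumerate every resource of the type."""
    return list(stacks)


def find_stacks(stacks: List[Stack], criterion: Callable[[Stack], bool]) -> List[Stack]:
    """find-* style: return only the resources that match a criterion."""
    return [s for s in stacks if criterion(s)]


everything = list_stacks(STACKS)
drifted = find_stacks(STACKS, lambda s: s["drift"] == "DRIFTED")
```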

Quick Comparison: Single vs Multi-Account

Feature	Single-Account	Multi-Account
Discovery Scope	✅ One AWS account	✅ All org accounts
Cost Enrichment	✅ Account-level costs	✅ Org-wide + chargeback
Organizations Data	❌ Not available	✅ Account hierarchy, OU structure
Workflows	✅ workflow-single-account	✅ workflow-multi-account
Setup Complexity	🟢 Simple (1 profile)	🟡 Moderate (3 profiles)
Execution Time	🟢 Fast (1-2 min)	🟡 Moderate (5-15 min)
Primary Persona	💻 Developer, 🔧 SRE	💼 Executive, 💰 FinOps, 🏗️ Architect

Complementary Two-Layer Architecture Flow¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                          INVENTORY MODULE ARCHITECTURE                       │
│                          Complementary Two-Layer Pattern                     │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌──────────────────┐
                              │  User Initiates  │
                              │  Discovery via:  │
                              │  • CLI Command   │
                              │  • Taskfile      │
                              │  • Notebook      │
                              └────────┬─────────┘
                                       │
                                       ▼
        ┌──────────────────────────────────────────────────────────┐
        │            LAYER 1: DISCOVERY & INDEXING                 │
        │          (Fast Metadata - <10 seconds)                   │
        └──────────────────────────────────────────────────────────┘
                                       │
                    ┌──────────────────┴──────────────────┐
                    │   resource-explorer-2:Search API    │
                    │   (AWS Resource Explorer Service)   │
                    └──────────────────┬──────────────────┘
                                       │
                    ┌──────────────────▼──────────────────┐
                    │     METADATA RETURNED (88 types)     │
                    │  • Resource ARN                      │
                    │  • Account ID, Region                │
                    │  • Resource Type (ec2, rds, etc.)    │
                    │  • Tags (key-value pairs)            │
                    │  • CloudFormation metadata           │
                    │  • Last reported timestamp           │
                    └──────────────────┬──────────────────┘
                                       │
                                       │ CSV Output:
                                       │ ec2-discovered.csv
                                       │ rds-discovered.csv
                                       │ lambda-discovered.csv
                                       │
                                       ▼
        ┌──────────────────────────────────────────────────────────┐
        │         LAYER 2: OPERATIONAL RETRIEVAL                   │
        │       (Deep Configuration - 4-5 minutes)                 │
        └──────────────────────────────────────────────────────────┘
                                       │
        ┌──────────────────────────────┴───────────────────────────────┐
        │                                                               │
        ▼                              ▼                               ▼
┌───────────────┐         ┌───────────────────┐          ┌──────────────────┐
│ EC2 Scripts   │         │  RDS Scripts      │          │ Lambda Scripts   │
│ • list_ec2_   │         │  • list_rds_db_   │          │ • list_lambda_   │
│   instances   │         │    instances      │          │   functions      │
│ • list_ec2_   │         │  • list_rds_      │          │ • Runtime        │
│   ebs_volumes │         │    snapshots      │          │   updates        │
└───────┬───────┘         └────────┬──────────┘          └────────┬─────────┘
        │                          │                              │
        │ ec2:DescribeInstances    │ rds:DescribeDBInstances      │ lambda:ListFunctions
        │ ec2:DescribeVolumes      │ rds:DescribeDBSnapshots      │ lambda:GetFunction
        │                          │                              │
        ▼                          ▼                              ▼
┌────────────────────────────────────────────────────────────────────────────┐
│                     FULL OPERATIONAL DETAILS RETURNED                      │
│  • Instance/DB state (running, stopped, terminated)                        │
│  • Instance/resource type (t3.medium, db.t3.small)                         │
│  • Networking (IPs, DNS, VPC, subnets, security groups)                    │
│  • Storage allocation (EBS volumes, RDS storage)                           │
│  • Runtime versions (Lambda python3.11, RDS MySQL 8.0)                     │
│  • Performance metrics (CPU, memory, connections)                          │
│  • Backup and recovery information                                         │
└────────────────────────────────┬───────────────────────────────────────────┘
                                 │
                                 ▼
                    ┌────────────────────────┐
                    │   Combined Dataset     │
                    │   Metadata + Operational│
                    │   Ready for Enrichment │
                    └────────────────────────┘

5-Layer Enrichment Pipeline Flow¶

┌─────────────────────────────────────────────────────────────────────────────┐
│                      5-LAYER ENRICHMENT PIPELINE                             │
│              Best Practice Multi-Account Landing Zone Pattern                │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌──────────────────┐
                              │   Start Pipeline │
                              │   task -t        │
                              │   Taskfile.      │
                              │   inventory.yaml │
                              │   pipeline-5-    │
                              │   layer          │
                              └────────┬─────────┘
                                       │
                                       ▼
        ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
        ┃  LAYER 1: DISCOVERY (resource-explorer)                ┃
        ┃  Time: <10 seconds | API: resource-explorer-2:Search   ┃
        ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━┛
                                         │
                  Input: None            │  Output: ec2-discovered.csv
                  Profile: CENTRALISED_  │  Records: 137 EC2 instances
                           OPS_PROFILE   │  Columns: 15 (ARN, account_id,
                                         │           region, resource_type,
                                         │           tags, cf_metadata)
                                         ▼
        ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
        ┃  LAYER 2: ORGANIZATIONS ENRICHMENT (multi-account only)┃
        ┃  Time: ~5 seconds | API: organizations:DescribeAccount ┃
        ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━┛
                                         │
                  Input: ec2-discovered. │  Output: ec2-org.csv
                         csv             │  Records: 137/137 enriched
                  Profile: MANAGEMENT_   │  Columns: +3 (account_name,
                           PROFILE       │           ou_name, ou_path)
                                         │
                                         ▼
        ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
        ┃  LAYER 3: COST ENRICHMENT                              ┃
        ┃  Time: ~15 seconds | API: ce:GetCostAndUsage           ┃
        ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━┛
                                         │
                  Input: ec2-org.csv     │  Output: ec2-cost.csv
                  Profile: BILLING_      │  Records: 137/137 enriched
                           PROFILE       │  Columns: +14 (12-month cost
                                         │           history: M01-M12,
                                         │           cost_total,
                                         │           cost_monthly_avg)
                                         │  Total: $15.4M analyzed
                                         ▼
        ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
        ┃  LAYER 4: ACTIVITY ENRICHMENT                          ┃
        ┃  Time: ~4-5 minutes | APIs: Multiple service-specific ┃
        ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━┛
                                         │
                  Input: ec2-cost.csv    │  Output: ec2-activity.csv
                  Profile: CENTRALISED_  │  Records: 137/137 enriched
                           OPS_PROFILE   │  Columns: +20+ (Layer 2
                                         │           operational data:
                                         │           State, InstanceType,
                  APIs Used:             │           VpcId, SubnetId,
                  • ec2:DescribeInstances│           SecurityGroups,
                  • cloudtrail:          │           CloudTrail events,
                    LookupEvents         │           CloudWatch metrics,
                  • cloudwatch:          │           SSM compliance,
                    GetMetricStatistics  │           Compute Optimizer)
                  • ssm:                 │
                    DescribeInstanceInfo │
                  • compute-optimizer:   │
                    GetEC2              │
                    Recommendations      │
                                         ▼
        ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
        ┃  LAYER 5: SCORING & ANALYSIS                           ┃
        ┃  Time: ~5 seconds | Logic: E1-E7 decommission signals ┃
        ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━┛
                                         │
                  Input: ec2-activity.   │  Output: ec2-scored.csv
                         csv             │  Records: 137/137 scored
                                         │  Columns: +8 (E1-E7 signals,
                  Signals:               │           decommission_tier:
                  • E1: Terminated       │           MUST/SHOULD/COULD/
                  • E2: Stopped >90 days │           KEEP)
                  • E3: Zero cost        │
                  • E4: CloudTrail idle  │  Tiers:
                  • E5: CloudWatch idle  │  • MUST: 23 instances
                  • E6: SSM disconnected │  • SHOULD: 47 instances
                  • E7: Optimizer        │  • COULD: 35 instances
                    downsize             │  • KEEP: 32 instances
                                         │
                                         ▼
                              ┌──────────────────┐
                              │  Pipeline        │
                              │  Complete        │
                              │  Total Time:     │
                              │  ~5-6 minutes    │
                              │                  │
                              │  Ready for:      │
                              │  • Cost analysis │
                              │  • Decommission  │
                              │    planning      │
                              │  • Executive     │
                              │    reporting     │
                              └──────────────────┘

Data Flow: Column Evolution Across Layers¶

Layer 1 Output: ec2-discovered.csv (15 columns)¶

resource_arn | account_id | region | resource_type | resource_id | application |
tags | cf_stack_name | cf_logical_id | cf_stack_id | last_reported_at |
[additional resource-explorer metadata]

Example Row:

arn:aws:ec2:ap-southeast-2:123456789012:instance/i-abc123,123456789012,ap-southeast-2,ec2:instance,i-abc123,WebApp,"{\"Environment\": \"prod\"}",my-app-stack,WebServer,arn:aws:cloudformation:...,2025-11-06T10:00:00Z

Missing: Instance state, instance type, networking, operational details

Layer 2 Output: ec2-org.csv (18 columns)¶

[Layer 1 columns] + account_name | ou_name | ou_path

Example Row:

[Layer 1 data...],Production Account,Production-OU,/root/Production/

Value: Organizational context for multi-account governance
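
A minimal sketch of how the three columns might be derived, assuming the OU hierarchy has already been fetched into memory. The `OU_TREE` and `ACCOUNTS` structures and both function names are hypothetical, and the sample names are illustrative rather than a reproduction of the example row.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory view of the OU tree: id -> (name, parent id or None).
OU_TREE: Dict[str, Tuple[str, Optional[str]]] = {
    "r-root": ("root", None),
    "ou-prod": ("Production", "r-root"),
}
# Hypothetical account table: account id -> (account name, parent OU id).
ACCOUNTS: Dict[str, Tuple[str, str]] = {
    "123456789012": ("Production Account", "ou-prod"),
}


def ou_path(ou_id: str) -> str:
    """Walk parent links up to the root and join the names into a path."""
    names: List[str] = []
    current: Optional[str] = ou_id
    while current is not None:
        name, current = OU_TREE[current]
        names.append(name)
    return "/" + "/".join(reversed(names)) + "/"


def enrich_account(row: Dict[str, str]) -> Dict[str, str]:
    """Add account_name, ou_name and ou_path; leave them blank if unknown."""
    account = ACCOUNTS.get(row["account_id"])
    if account is None:
        return {**row, "account_name": "", "ou_name": "", "ou_path": ""}
    account_name, ou_id = account
    return {**row, "account_name": account_name,
            "ou_name": OU_TREE[ou_id][0], "ou_path": ou_path(ou_id)}


enriched = enrich_account({"account_id": "123456789012"})
```

Blank values for unknown accounts keep the column count stable, so later layers can still group by `account_id`.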

Layer 3 Output: ec2-cost.csv (32 columns)¶

[Layer 2 columns] + M01 | M02 | M03 | M04 | M05 | M06 | M07 | M08 | M09 |
M10 | M11 | M12 | cost_total | cost_monthly_avg

Example Row:

[Layer 2 data...],45.23,47.18,44.92,46.15,45.87,47.23,46.45,45.78,46.92,47.15,45.67,46.23,554.78,46.23

Value: 12-month cost history for financial analysis
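
The two summary columns follow from the twelve monthly values. The sketch below recomputes them for the example row; rounding to two decimals is an assumption that happens to match the figures shown, and `add_cost_summary` is a hypothetical helper.

```python
from typing import Dict

MONTH_COLUMNS = [f"M{m:02d}" for m in range(1, 13)]  # M01 .. M12


def add_cost_summary(row: Dict[str, float]) -> Dict[str, float]:
    """Append cost_total and cost_monthly_avg, both rounded to cents."""
    monthly = [row[col] for col in MONTH_COLUMNS]
    total = sum(monthly)
    return {**row,
            "cost_total": round(total, 2),
            "cost_monthly_avg": round(total / len(monthly), 2)}


# Monthly values taken from the example row above.
values = [45.23, 47.18, 44.92, 46.15, 45.87, 47.23,
          46.45, 45.78, 46.92, 47.15, 45.67, 46.23]
summary = add_cost_summary(dict(zip(MONTH_COLUMNS, values)))
# summary["cost_total"] is 554.78 and summary["cost_monthly_avg"] is 46.23
```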

Layer 4 Output: ec2-activity.csv (50+ columns)¶

[Layer 3 columns] + State | InstanceType | PublicDNSName | PrivateIpAddress |
VpcId | SubnetId | SecurityGroups | LaunchTime | Platform | Architecture |
[CloudTrail columns] | [CloudWatch columns] | [SSM columns] |
[Compute Optimizer columns]

Example Row:

[Layer 3 data...],running,t3.medium,ec2-54-123-456-78.compute.amazonaws.com,10.0.1.100,vpc-12345,subnet-67890,"sg-web,sg-monitoring",2025-10-15T08:30:00Z,Linux,x86_64,[activity data...]

Value: Full operational details for technical analysis

Layer 5 Output: ec2-scored.csv (58+ columns)¶

[Layer 4 columns] + E1_terminated | E2_stopped_90d | E3_zero_cost |
E4_cloudtrail_idle | E5_cloudwatch_idle | E6_ssm_disconnected |
E7_optimizer_downsize | decommission_tier

Example Row:

[Layer 4 data...],0,1,0,1,0,0,1,SHOULD

Value: Actionable decommission recommendations with business justification
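
This document does not specify how the seven signals are weighted or where the tier boundaries lie. The sketch below therefore uses invented weights and cut-offs, chosen only so that the example row lands in SHOULD; treat every number in it as a placeholder, not as the module's scoring rules.

```python
from typing import Dict

# HYPOTHETICAL weights (they sum to 100) and tier cut-offs, for illustration only.
WEIGHTS = {
    "E1_terminated": 25, "E2_stopped_90d": 20, "E3_zero_cost": 10,
    "E4_cloudtrail_idle": 15, "E5_cloudwatch_idle": 10,
    "E6_ssm_disconnected": 5, "E7_optimizer_downsize": 15,
}
TIER_THRESHOLDS = [(80, "MUST"), (50, "SHOULD"), (25, "COULD")]  # otherwise KEEP


def score(signals: Dict[str, int]) -> int:
    """Sum the weights of the signals that fired, giving a 0-100 score."""
    return sum(weight for name, weight in WEIGHTS.items() if signals.get(name))


def tier(points: int) -> str:
    """Map a 0-100 score to a decommission tier."""
    for minimum, label in TIER_THRESHOLDS:
        if points >= minimum:
            return label
    return "KEEP"


# Signal values from the example row: E2, E4 and E7 fired.
example = dict(zip(WEIGHTS, [0, 1, 0, 1, 0, 0, 1]))
points = score(example)
```

Keeping the weights in one table makes the justification for each tier traceable to individual signals.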

API Call Patterns: Layer 1 vs Layer 2¶

Layer 1: Resource Explorer (Single Aggregated API)¶

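import boto3

# One Search call against the aggregator index returns metadata for every
# matching resource across accounts and regions (here: EC2 instances only)
resource_explorer_client = boto3.client('resource-explorer-2')
response = resource_explorer_client.search(
    QueryString='resourcetype:ec2:instance',
    MaxResults=1000  # largest page size the API accepts
)
resources = response['Resources']

# Example run: metadata for 137 EC2 instances in <10 seconds
# Data: ARN, account_id, region, tags, CloudFormation metadata
# Missing: State, InstanceType, networking details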
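
When a query matches more resources than fit in one page, the Search response carries a `NextToken`. A minimal pagination sketch follows; it takes the client as a parameter (anything that behaves like `boto3.client("resource-explorer-2")`), and `search_all` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterator


def search_all(client: Any, query: str, page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every resource for a query, following NextToken between pages.

    `client` is expected to behave like boto3.client("resource-explorer-2").
    """
    kwargs: Dict[str, Any] = {"QueryString": query, "MaxResults": page_size}
    while True:
        response = client.search(**kwargs)
        yield from response.get("Resources", [])
        token = response.get("NextToken")
        if not token:
            break  # last page reached
        kwargs["NextToken"] = token
```

Passing the client in keeps the helper easy to exercise with a stub. Resource Explorer also documents an upper bound on the total results a single query returns, so very large inventories may need narrower queries.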

Layer 2: Service-Specific APIs (Detailed per-service calls)¶

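import boto3

# One client per service; "..." marks arguments omitted in this overview
ec2_client = boto3.client('ec2')
cloudtrail_client = boto3.client('cloudtrail')
cloudwatch_client = boto3.client('cloudwatch')
ssm_client = boto3.client('ssm')
compute_optimizer_client = boto3.client('compute-optimizer')

# Separate API calls for operational details
ec2_response = ec2_client.describe_instances()  # Full instance config
cloudtrail_response = cloudtrail_client.lookup_events(...)  # Activity logs
cloudwatch_response = cloudwatch_client.get_metric_statistics(...)  # Metrics
ssm_response = ssm_client.describe_instance_information(...)  # Compliance
optimizer_response = compute_optimizer_client.get_ec2_instance_recommendations(...)  # Rightsizing

# Returns operational details in 4-5 minutes
# Data: Everything missing from Layer 1 + activity signals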
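
The `describe_instances` response nests instances inside reservations, so it has to be flattened before it can be joined to the discovery rows. The sketch below assumes a client that behaves like `boto3.client("ec2")`; `instance_rows` is a hypothetical helper, the column subset is illustrative, and pagination is omitted for brevity.

```python
from typing import Any, Dict, List


def instance_rows(client: Any, instance_ids: List[str]) -> List[Dict[str, str]]:
    """Flatten describe_instances output into one row per instance.

    `client` is expected to behave like boto3.client("ec2").
    """
    response = client.describe_instances(InstanceIds=instance_ids)
    rows = []
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            rows.append({
                "resource_id": inst["InstanceId"],
                "State": inst["State"]["Name"],
                "InstanceType": inst["InstanceType"],
                # Networking keys can be absent, for example after termination.
                "VpcId": inst.get("VpcId", ""),
                "SubnetId": inst.get("SubnetId", ""),
            })
    return rows
```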

When to Use Each Layer¶

Use Layer 1 (Discovery) When:¶

Initial exploration: "What resources exist across my organization?"
Tag-based queries: "Find all resources tagged Environment=prod"
CloudFormation tracking: "Which resources were created by IaC?"
Cross-service search: "Show me ALL compute resources (EC2 + Lambda + WorkSpaces)"
Speed is critical: Need results in <10 seconds

Use Layer 2 (Operational) When:¶

State analysis: "Which EC2 instances are stopped vs running?"
Configuration details: "What database engines are we using?"
Runtime compliance: "Find deprecated Lambda runtimes"
Network topology: "Which instances are in which VPCs?"
Write operations: "Upgrade Lambda functions to latest runtime"
Cost optimization: Need instance types for rightsizing analysis

Use Combined 5-Layer Pipeline When:¶

Comprehensive analysis: Discovery + context + costs + activity + scoring
Decommission planning: Complete dataset for business decision-making
Executive reporting: Full narrative from discovery to recommendations
Cost optimization: 12-month cost history + operational signals + savings calculations
Compliance audit: Complete resource inventory + security + activity + policy compliance

Performance Characteristics¶

Single-Account Mode (4 layers - skips Organizations)¶

Layer 1: Discovery          <10 seconds
Layer 3: Cost Enrichment    ~15 seconds
Layer 4: Activity           ~4-5 minutes
Layer 5: Scoring            ~5 seconds
────────────────────────────────────────
Total:                      ~5-6 minutes

Multi-Account Landing Zone (5 layers - includes Organizations)¶

Layer 1: Discovery          <10 seconds
Layer 2: Organizations      ~5 seconds
Layer 3: Cost Enrichment    ~15 seconds
Layer 4: Activity           ~4-5 minutes
Layer 5: Scoring            ~5 seconds
────────────────────────────────────────
Total:                      ~5-6 minutes

Performance Optimization Tips¶

Parallel execution: Run multiple resource type pipelines concurrently (EC2 + RDS + Lambda)
Incremental enrichment: Skip layers if data already exists (resume from Layer 4)
Selective enrichment: Filter resources before Layer 4 (e.g., only prod environment)
Caching: Layer 2 (Organizations) rarely changes - cache for 24 hours
Batch processing: Layer 3 (Cost) supports batch queries - group by account
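
Two of these tips, parallel execution and 24-hour caching, can be sketched with the standard library alone. `TTLCache` and `run_in_parallel` are hypothetical helpers; the callables passed in stand for whatever actually runs a pipeline or loads Organizations data.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, as suggested for Organizations data


class TTLCache:
    """Cache a single computed value until it is older than ttl seconds."""

    def __init__(self, loader: Callable[[], dict], ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader, self._ttl, self._clock = loader, ttl, clock
        self._entry: Optional[Tuple[float, dict]] = None

    def get(self) -> dict:
        now = self._clock()
        if self._entry is None or now - self._entry[0] >= self._ttl:
            self._entry = (now, self._loader())  # refresh on miss or expiry
        return self._entry[1]


def run_in_parallel(pipelines: Dict[str, Callable[[], int]]) -> Dict[str, int]:
    """Run one pipeline per resource type concurrently and collect results."""
    with ThreadPoolExecutor(max_workers=max(1, len(pipelines))) as pool:
        futures = {name: pool.submit(fn) for name, fn in pipelines.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this workload because the pipelines spend most of their time waiting on API responses rather than computing.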

Error Handling & Graceful Degradation¶

Terminated Resources¶

Layer 1: Returns terminated EC2 instances (resource-explorer shows last state)
Layer 4: API calls fail for terminated resources that are no longer retrievable (EC2 returns the InvalidInstanceID.NotFound error code)
Handling: Skip API enrichment, preserve cost history, mark as E1_terminated in Layer 5
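
The handling rule can be sketched as a wrapper around the Layer 4 lookup. The EC2 API reports an unknown instance ID with the error code `InvalidInstanceID.NotFound`. The `fetch` callable and the function name are hypothetical, and errors are assumed to carry a botocore-style `response["Error"]["Code"]`. Recording `State` as `terminated` so that Layer 5 can set the E1 signal is likewise an assumption of this sketch.

```python
from typing import Any, Callable, Dict

NOT_FOUND_CODE = "InvalidInstanceID.NotFound"  # EC2 error code for unknown instance IDs


def enrich_or_mark_terminated(row: Dict[str, Any],
                              fetch: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Add operational details, or flag the row when the instance is gone.

    `fetch` is a hypothetical callable wrapping the Layer 4 API lookups.
    """
    try:
        return {**row, **fetch(row["resource_id"])}
    except Exception as exc:
        response = getattr(exc, "response", None)
        code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
        if code != NOT_FOUND_CODE:
            raise  # unrelated failures still surface
        # Keep existing columns (including cost history); skip API enrichment.
        return {**row, "State": "terminated"}
```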

Missing Permissions¶

Layer 2: Organizations API requires management account access
Fallback: Skip Organizations enrichment, continue with Layers 3-5
Impact: Missing account_name/ou_name columns (can still analyze by account_id)

API Rate Limits¶

Layer 4: CloudTrail/CloudWatch APIs have rate limits
Handling: Exponential backoff with retry logic
Fallback: Skip activity enrichment for specific signals, continue pipeline
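
A minimal sketch of exponential backoff with jitter, written so that exhausting the retries returns `None` and the caller can leave that signal blank. The function name, default delays, and retry count are illustrative assumptions, not the module's settings.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], *, retries: int = 5,
                      base_delay: float = 0.5, max_delay: float = 30.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call fn, retrying with exponential backoff and jitter.

    Returns None when every attempt fails so the caller can skip the signal
    and continue the pipeline.
    """
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                return None  # give up: caller leaves this signal blank
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries
    return None
```

In practice `retry_on` would be narrowed to throttling errors, so that permission or validation failures are not retried.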

References¶

Complementary Architecture: COMPLEMENTARY-ARCHITECTURE.md (two-layer pattern details)
CLI Integration Priorities: CLI-INTEGRATION-PRIORITIES.md (12 scripts roadmap)
Resource Explorer: src/runbooks/inventory/collectors/resource_explorer.py
Enrichers: src/runbooks/inventory/enrichers/ (4 enrichers)
Scorers: src/runbooks/inventory/scorers/ (1 scorer)
Taskfile: Taskfile.inventory.yaml (28 operations + 2 best-practice workflows)

Version: v1.1.17 (documentation baseline)
Last Updated: November 6, 2025
Status: Architecture documentation complete
