Activity Health Tree - Signal Architecture Overview
Version: 1.1.20 | Status: Production | Last Updated: 2025-11-20
Executive Summary
The Activity Health Tree uses a 100-point signal scoring system, aligned with the Cost Optimization pillar of the AWS Well-Architected Framework, to identify decommissionable resources across 11 AWS resource types.
Key Metrics:
- 55 signals across 11 resource types (S3, DynamoDB, RDS, EC2, WorkSpaces, AppStream, ALB, NLB, DirectConnect, Route53, ECS)
- 23 AWS documentation URLs providing authoritative best-practice citations
- 4-tier decommission scoring: MUST (70-100), SHOULD (40-69), COULD (20-39), KEEP (0-19)
- Confidence ratings: 0.45-0.95 per signal, set according to data-source reliability
Signal Design Philosophy
AWS Well-Architected Framework Alignment
All signals map to AWS Cost Optimization pillar recommendations:
- Tier 1 (30-60 pts): Direct cost impact signals (AWS native metrics like Compute Optimizer, Storage Lens)
- Tier 2 (10-25 pts): Indirect cost signals (utilization, access patterns, configuration gaps)
- Tier 3 (1-15 pts): Hygiene signals (tagging, environment classification, age-based heuristics)
AWS Reference: https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html
Resource Type Coverage
| Resource Type | Signals | Total Points | AWS Docs | Primary Signal |
|---|---|---|---|---|
| S3 Buckets | S1-S7 (7) | 100 | 7 URLs | S1: Storage Lens Score (40 pts) |
| DynamoDB | D1-D5 (5) | 100 | 5 URLs | D1: Low Capacity <5% (45 pts) |
| RDS | R1-R7 (7) | 100 | 7 URLs | R1: Zero Connections 90d (60 pts) |
| EC2 | E1-E7 (7) | 100 | 7 URLs | E1: Compute Optimizer Idle (40 pts) |
| WorkSpaces | W1-W6 (6) | 100 | 3 URLs | W1: No Connection 90d (45 pts) |
| AppStream | A1-A7 (7) | 135→100* | 7 URLs | A1: Session Activity (30 pts) |
| ALB | L1-L5 (5) | 100 | 2 URLs | L1: No Active Targets (60 pts) |
| NLB | L1-L5 (5) | 100 | 2 URLs | L1: No Active Targets (60 pts) |
| DirectConnect | DX1-DX4 (4) | 100 | 2 URLs | DX1: Connection Down (60 pts) |
| Route53 | R53-1 to R53-4 (4) | 100 | 3 URLs | R53-1: Zero DNS Queries (40 pts) |
| ECS | C1-C5 (5) | 100 | 5 URLs | C1: CPU/Memory <5% (45 pts) |
*AppStream: 135 raw points normalized to a 100-point scale
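The footnote does not say how AppStream's 135 raw points are mapped onto the 100-point scale. The sketch below assumes a simple linear rescale capped at 100, because that preserves the ordering of resources and lets the standard tier thresholds apply unchanged. The function name and the cap are illustrative assumptions, not the scorer's actual code.

```python
# Hypothetical helper: linear scaling is an assumption; the document only states
# that AppStream's 135 raw points are normalized to a 100-point scale.
APPSTREAM_RAW_MAX = 135


def normalize_score(raw_points: float, raw_max: float = APPSTREAM_RAW_MAX) -> float:
    """Linearly rescale a raw signal total onto 0-100, capped at 100."""
    if raw_max <= 0:
        raise ValueError("raw_max must be positive")
    return min(100.0, raw_points * 100.0 / raw_max)


print(normalize_score(135))  # a fully triggered AppStream resource
print(normalize_score(54))   # 54 raw points on the 100-point scale
```

Under this assumption, an AppStream resource needs 94.5 raw points to reach the 70-point MUST threshold.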
Detailed Documentation:
- AppStream: appstream-decommission-signals.md
- S3: s3-optimization-signals.md
- DynamoDB: dynamodb-rightsizing-signals.md
- EC2: ec2-decommission-signals.md
- WorkSpaces: workspaces-decommission-signals.md
- RDS: rds-decommission-signals.md
Decommission Tier Classification
Tier Thresholds (Configurable)
```python
DEFAULT_TIER_THRESHOLDS = {
    'MUST': 70,    # 70-100 points: High-confidence decommission candidates
    'SHOULD': 40,  # 40-69 points: Medium-confidence (investigate further)
    'COULD': 20,   # 20-39 points: Low-confidence (optimization opportunity)
    'KEEP': 0      # 0-19 points: Active resources (retain)
}
```
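Each threshold is the lower bound of its tier, so a score belongs to the highest tier whose bound it meets. The sketch below applies that rule to the default thresholds. The function name `classify_tier` is hypothetical and this is not the code in decommission_scorer.py. Because the tiers are checked from the highest bound down, the rule still works when the thresholds are reconfigured.

```python
from collections.abc import Mapping

DEFAULT_TIER_THRESHOLDS = {'MUST': 70, 'SHOULD': 40, 'COULD': 20, 'KEEP': 0}


def classify_tier(score: float,
                  thresholds: Mapping[str, float] = DEFAULT_TIER_THRESHOLDS) -> str:
    """Return the highest tier whose lower bound the score meets (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    # Walk tiers from the highest lower bound down, so the first match wins.
    for tier, lower_bound in sorted(thresholds.items(), key=lambda kv: kv[1], reverse=True):
        if score >= lower_bound:
            return tier
    # Only reachable if a custom threshold table has no tier starting at 0.
    return 'KEEP'


print(classify_tier(85))  # the EC2 example score from "Usage Examples"
```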
Tier Interpretation
MUST Decommission (70-100 pts):
- Multiple Tier 1 signals triggered (e.g., AWS native recommendations plus zero usage)
- High confidence (≥90%) that the resource is idle or unused
- Immediate cost savings potential
- Action: Decommission within 30 days after validation

SHOULD Investigate (40-69 pts):
- Mixed Tier 1/2 signals or a single strong Tier 1 signal
- Medium confidence (70-89%) that the resource is underutilized
- Potential for rightsizing or optimization
- Action: Business owner review + 60-day observation period

COULD Optimize (20-39 pts):
- Primarily Tier 2/3 signals (hygiene, configuration)
- Lower confidence (50-69%) that an optimization opportunity exists
- Minor cost savings or efficiency gains
- Action: Tag cleanup, lifecycle policies, non-urgent optimization

KEEP Active (0-19 pts):
- Few or no signals triggered
- Resource shows active usage patterns
- Production workload indicators present
- Action: No immediate action required
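Reports and runbooks need the action attached to each tier, and a lookup table keeps that wording in one place. The sketch below transcribes the Action lines above into such a table, including the 30-day and 60-day windows. The `TierAction` structure and `recommended_action` function are hypothetical and are not part of the documented scorer.

```python
from typing import NamedTuple, Optional


class TierAction(NamedTuple):
    action: str
    window_days: Optional[int]  # None when the text attaches no deadline


# Hypothetical lookup table transcribing the tier descriptions above.
TIER_ACTIONS = {
    'MUST': TierAction('Decommission after validation', 30),
    'SHOULD': TierAction('Business owner review and observation period', 60),
    'COULD': TierAction('Tag cleanup, lifecycle policies, non-urgent optimization', None),
    'KEEP': TierAction('No immediate action required', None),
}


def recommended_action(tier: str) -> TierAction:
    """Look up the follow-up action for a tier, rejecting unknown tier names."""
    try:
        return TIER_ACTIONS[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


print(recommended_action('SHOULD'))
```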
Signal Confidence Methodology
Confidence Score Calculation
Each signal has an explicit confidence rating (0.00-1.00):
High Confidence (0.85-0.95):
- AWS native signals (Compute Optimizer, Storage Lens, Cost Explorer)
- Direct API metrics (connections, CPU utilization, network traffic)
- Binary configuration checks (lifecycle policy exists, PITR enabled)

Medium Confidence (0.60-0.84):
- Composite signals (multiple metrics combined)
- Threshold-based heuristics (>90 days, <5% utilization)
- Access pattern analysis (Storage Lens access analytics)

Low Confidence (0.45-0.59):
- Age-based heuristics (resource age >180 days)
- Tagging-based classification (test/dev tags)
- Inferred usage patterns (non-business hours only)
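A small helper can assign each signal's confidence rating to one of the three bands above, for example when auditing a weight table. This is a sketch under stated assumptions. The band labels and the function name are hypothetical. The sketch treats the bands as contiguous, so a value such as 0.845 falls into MEDIUM and anything above 0.95 into HIGH. It labels values below 0.45 as outside the documented range.

```python
def confidence_band(confidence: float) -> str:
    """Map a 0.00-1.00 confidence rating to the bands described above (hypothetical)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within 0.00-1.00, got {confidence}")
    if confidence >= 0.85:
        return 'HIGH'     # documented band: 0.85-0.95
    if confidence >= 0.60:
        return 'MEDIUM'   # documented band: 0.60-0.84
    if confidence >= 0.45:
        return 'LOW'      # documented band: 0.45-0.59
    return 'UNRATED'      # below the lowest documented band


print(confidence_band(0.45))  # e.g. RDS R7
```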
Confidence Impact on Scoring
Confidence ratings inform tier classification but do NOT modify point values. Low-confidence signals receive lower point allocations in the design phase to prevent over-weighting unreliable indicators.
Example: RDS R7 (storage <20% utilized) has 0.45 confidence → assigned only 5 points (Tier 3)
AWS Best Practices Integration
Cost Optimization Pillar Mapping
Practice: Right-sizing and Elasticity
- EC2 E1 (Compute Optimizer), E2 (CPU utilization)
- RDS R2 (low connections), R3 (CPU <5%)
- DynamoDB D1 (low capacity utilization)

Practice: Storage Optimization
- S3 S1 (Storage Lens score), S2 (storage class mismatch), S4 (lifecycle policy gap)
- RDS R7 (storage underutilization)

Practice: Resource Lifecycle Management
- S3 S4 (no lifecycle policy), S6 (versioning risk)
- EC2 E4 (instance age >180d)
- WorkSpaces W1 (no connection 90d)

Practice: Cost-Aware Architecture
- DynamoDB D2 (idle GSIs)
- ALB/NLB L1 (no active targets)
- DirectConnect DX1 (connection down)
Validation & Quality Assurance
4-Mode Validation Framework
All signal implementations undergo 4-mode validation:
- CLI Mode: runbooks finops dashboard --activity-analysis
- Notebook Mode: Jupyter notebook subprocess execution
- MCP Mode: Cost Explorer cross-validation (target: ≥99.5% accuracy)
- Direct Import: Python module import for programmatic access
Target: 100% pass rate across all modes
MCP Cross-Validation
The Cost Explorer API provides ground truth for financial accuracy:
- Per-resource cost validation (when the API supports that granularity)
- Account-level cost validation (fallback for APIs with limited granularity)
- ≥99.5% accuracy target for production deployment
Known Limitation: v1.1.20 shows 2.94% MCP accuracy, far below the ≥99.5% target (CRITICAL ISSUE requiring investigation).
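The document does not define how "MCP accuracy" is computed. The sketch below assumes it is one minus the relative error between a computed cost and the matching Cost Explorer figure, floored at 0. Under the fallback described above, the caller decides whether to pass per-resource or account-level totals. `cost_accuracy` and `passes_mcp_validation` are hypothetical names, and the sketch does not reproduce the project's MCP integration.

```python
# Target taken from the text above; the accuracy formula itself is an assumption.
ACCURACY_TARGET = 0.995


def cost_accuracy(computed_cost: float, cost_explorer_cost: float) -> float:
    """Assumed metric: 1 - relative error against Cost Explorer, floored at 0.0."""
    if cost_explorer_cost == 0:
        # With no billed cost, only an exact zero counts as agreement.
        return 1.0 if computed_cost == 0 else 0.0
    relative_error = abs(computed_cost - cost_explorer_cost) / abs(cost_explorer_cost)
    return max(0.0, 1.0 - relative_error)


def passes_mcp_validation(computed_cost: float, cost_explorer_cost: float,
                          target: float = ACCURACY_TARGET) -> bool:
    """True when the computed figure meets the accuracy target."""
    return cost_accuracy(computed_cost, cost_explorer_cost) >= target


print(passes_mcp_validation(100.40, 100.00))  # True: accuracy is about 0.996
print(passes_mcp_validation(105.00, 100.00))  # False: accuracy is 0.95
```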
Signal Weight Optimization (v1.1.20)
Recent Changes
DynamoDB D1 Adjustment (60→45 pts):
- Rationale: ON-DEMAND (PAY_PER_REQUEST) tables have no provisioned capacity, so the capacity-utilization metric behind D1 does not apply (D1 N/A)
- Impact: Prevents false positives for tables in ON-DEMAND billing mode
- Compensation: D2-D5 weights increased to maintain the 100-point total
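The source does not show how the enricher gates D1 for on-demand tables. In the sketch below, D1 is treated as "not applicable, award 0 points" whenever a table's billing mode is PAY_PER_REQUEST, the value DynamoDB reports for on-demand tables. The consumed and provisioned unit figures are assumed to be averages that were already collected for the lookback window. Both function names are hypothetical. The 45-point weight and the 5% threshold come from the coverage table.

```python
from typing import Optional


def d1_capacity_utilization(billing_mode: str, consumed_units: float,
                            provisioned_units: float) -> Optional[float]:
    """Percent of provisioned capacity consumed, or None when D1 does not apply."""
    # On-demand tables report BillingMode 'PAY_PER_REQUEST' and have no provisioned
    # capacity, so there is no utilization ratio to evaluate.
    if billing_mode == 'PAY_PER_REQUEST' or provisioned_units <= 0:
        return None
    return 100.0 * consumed_units / provisioned_units


def d1_points(utilization_pct: Optional[float], weight: int = 45,
              threshold_pct: float = 5.0) -> int:
    """Award the D1 weight only when utilization is known and below the threshold."""
    if utilization_pct is None:
        return 0  # N/A is treated as "not triggered", never as idle
    return weight if utilization_pct < threshold_pct else 0


print(d1_points(d1_capacity_utilization('PAY_PER_REQUEST', 0.0, 0.0)))
print(d1_points(d1_capacity_utilization('PROVISIONED', 2.0, 100.0)))
```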
RDS R4-R7 Rounding (1-8 pts → 5-10 pts):
- Rationale: Eliminate 1-4 point noise in tier classification
- Impact: Clearer SHOULD vs COULD boundaries
- Result: Improved decision clarity for low-confidence signals

S3 Weights: KEEP AS-IS (product-owner strategic decision)
- Current distribution proven in production
- S1 (40 pts) appropriate for AWS native composite score
Usage Examples
CLI - Activity Health Tree Display
```bash
runbooks finops dashboard \
  --profile my-aws-profile \
  --mode architect \
  --activity-analysis \
  --export html \
  --output-file dashboard.html
```
Output: Tree view with signal breakdowns per resource type
Python - Programmatic Scoring
```python
from runbooks.finops.decommission_scorer import DecommissionScorer

scorer = DecommissionScorer()

# EC2 instance scoring
ec2_signals = {
    'E1': 40,  # Compute Optimizer: Idle recommendation
    'E2': 20,  # CloudWatch: CPU <5% avg 30d
    'E3': 10,  # CloudTrail: Zero API activity
    'E4': 5,   # Instance age >180 days
    'E5': 0,   # No service attachments
    'E6': 10,  # Low storage I/O
    'E7': 0    # No Cost Explorer rightsizing
}
score, tier = scorer.calculate_ec2_score(ec2_signals)
# score = 85, tier = 'MUST' (decommission candidate)
```
Notebook - Batch Analysis
See: notebooks/decommission-analysis.ipynb for batch scoring examples
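The sketch below does not reproduce the notebook. It is a self-contained illustration of batch scoring under stated assumptions. It sums the awarded points per resource, as the EC2 example in the Python section does, classifies each total against the default thresholds, and groups the resource IDs by tier. The resource IDs are placeholders, and the cap at 100 is an assumed safeguard. The function names are hypothetical and are not part of the runbooks package.

```python
from collections import defaultdict

# Tier lower bounds from DEFAULT_TIER_THRESHOLDS, highest first.
TIER_BOUNDS = [('MUST', 70), ('SHOULD', 40), ('COULD', 20), ('KEEP', 0)]


def score_and_tier(signals: dict[str, int]) -> tuple[int, str]:
    """Sum awarded points (capped at 100 as a safeguard) and classify the total."""
    score = min(100, sum(signals.values()))
    for tier, lower_bound in TIER_BOUNDS:
        if score >= lower_bound:
            return score, tier
    return score, 'KEEP'


def batch_score(resources: dict[str, dict[str, int]]) -> dict[str, list[tuple[str, int]]]:
    """Group resource IDs by tier, highest score first within each tier."""
    by_tier: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for resource_id, signals in resources.items():
        score, tier = score_and_tier(signals)
        by_tier[tier].append((resource_id, score))
    for entries in by_tier.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(by_tier)


# Placeholder IDs; the first entry reuses the EC2 signal values from the Python example.
resources = {
    'i-example-1': {'E1': 40, 'E2': 20, 'E3': 10, 'E4': 5, 'E5': 0, 'E6': 10, 'E7': 0},
    'i-example-2': {'E2': 20, 'E3': 10, 'E4': 5},
    'i-example-3': {'E4': 5},
}
for tier, entries in batch_score(resources).items():
    print(tier, entries)
```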
Related Documentation
Scoring Framework: ../inventory/activity-health-scoring.md
Implementation: src/runbooks/finops/decommission_scorer.py
Enrichers: src/runbooks/finops/*_analyzer.py and *_activity_enricher.py
Maintenance & Updates
Signal Addition Process:
1. Define signal with AWS documentation citation
2. Assign a point weight within the signal's tier range (Tier 1: 30-60, Tier 2: 10-25, Tier 3: 1-15)
3. Set confidence rating based on data source reliability
4. Implement collection logic in analyzer/enricher
5. Add to DEFAULT_*_WEIGHTS in decommission_scorer.py
6. Document in resource-specific signal guide
7. Validate via 4-mode framework
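Steps 2 and 5 of the process above can be checked mechanically before a weight table is merged. The sketch below flags a table whose weights do not sum to the expected total. It also flags any signal whose points fall outside its design tier's range from "AWS Well-Architected Framework Alignment". It is a hypothetical pre-merge check, not existing project code. The signal IDs X1-X4 are illustrative only. The `expected_total` parameter allows for AppStream's 135 raw points. Because the tier ranges are design guidance, a flagged entry should prompt review rather than automatic rejection.

```python
# Point ranges per design tier, from "AWS Well-Architected Framework Alignment".
TIER_POINT_RANGES = {1: (30, 60), 2: (10, 25), 3: (1, 15)}


def validate_weights(weights: dict[str, int], signal_tiers: dict[str, int],
                     expected_total: int = 100) -> list[str]:
    """Return a list of problems; an empty list means the weight table passes."""
    problems: list[str] = []
    total = sum(weights.values())
    if total != expected_total:
        problems.append(f"weights sum to {total}, expected {expected_total}")
    for signal_id, points in weights.items():
        tier = signal_tiers.get(signal_id)
        if tier not in TIER_POINT_RANGES:
            problems.append(f"{signal_id}: unknown tier {tier!r}")
            continue
        low, high = TIER_POINT_RANGES[tier]
        if not low <= points <= high:
            problems.append(f"{signal_id}: {points} pts outside Tier {tier} range {low}-{high}")
    return problems


# Hypothetical signal IDs for illustration only.
print(validate_weights({'X1': 60, 'X2': 25, 'X3': 15}, {'X1': 1, 'X2': 2, 'X3': 3}))
print(validate_weights({'X1': 60, 'X2': 25, 'X3': 15, 'X4': 5},
                       {'X1': 1, 'X2': 2, 'X3': 3, 'X4': 1}))
```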
Weight Adjustment Protocol:
1. Identify issue (false positives, tier classification errors)
2. Cloud architect provides recommendation with rationale
3. Product owner approves strategic impact
4. Update decommission_scorer.py and sync enricher files
5. Document change in version notes
6. Re-run validation across test accounts
Framework Status: Production-ready with continuous improvement
Next Review: Quarterly (aligned with AWS Well-Architected Framework updates)