★ Sample Report
B2B SaaS Platform on AWS
Mid-market, ~$4,200/month infrastructure
Mid-market B2B SaaS platform running a conventional three-tier architecture on AWS us-east-1 with significant gaps in high availability, security posture, and cost optimization. The infrastructure relies on manually provisioned EC2 instances behind an ALB with a single-AZ RDS MySQL database, no auto-scaling, and no disaster recovery plan — leaving the business exposed to both outages and unnecessary spend at approximately $4,200/month.
Architecture Scores
Overall Assessment
40
Overall
Needs Work
45
Cost Efficiency
Improvable
32
Security
Critical Gaps
38
Reliability
High Risk
Infrastructure Inventory
10 Resources Detected
This sample analyzes a simplified 10-resource configuration. The analyzer handles production environments with hundreds of resources across multiple services.
api-node-1 (m5.2xlarge)
Amazon EC2
api-node-2 (m5.2xlarge)
Amazon EC2
api-node-3 (m5.2xlarge)
Amazon EC2
prod-alb
Elastic Load Balancing (ALB)
prod-mysql (db.r5.2xlarge)
Amazon RDS MySQL
sessions-redis (t3.medium)
Amazon ElastiCache Redis
app-uploads (~2TB)
Amazon S3
d1a2b3c4.cloudfront.net
Amazon CloudFront
app.example.com
Amazon Route 53 (inferred)
Basic CloudWatch Metrics
Amazon CloudWatch
⚡ Senior Architect Insights
Cross-Resource Analysis — The View a Generalist Misses
These findings connect two or more resources and surface systemic implications. Generic AI tools analyze resources in isolation — these require understanding how your stack actually fits together.
🔗 sessions-redis (t3.medium) → prod-mysql (db.r5.2xlarge)
Capacity Mismatch
Cache Is 16× Undersized Relative to Database — You're Paying for Queries That Should Be Cache Hits
Your ElastiCache Redis instance (t3.medium, 4GB RAM) is heavily undersized relative to your RDS database (db.r5.2xlarge, 64GB RAM). At a 16:1 RAM ratio, your working dataset won't fit in cache, so frequently accessed records constantly fall through to the database. Every cache miss is a full round trip to prod-mysql at r5.2xlarge pricing.
Implication: Cache hit ratios below 80% are common when the cache is this small relative to the database. With an r5.2xlarge at ~$0.53/hr, misses that a correctly sized cache would have served are costing an estimated $180–320/month in unnecessary DB compute. Fixing the cache sizing would likely reduce RDS load enough to right-size the instance class.
Recommendation: Upgrade sessions-redis to at minimum a cache.r6g.large (13GB, $0.15/hr) or cache.r6g.xlarge (26GB). Monitor cache hit ratio — target >95%. Once hit ratio improves, evaluate downgrading prod-mysql from db.r5.2xlarge to db.r5.large.
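To make the ">95%" target concrete, here is a minimal sketch of the hit-ratio calculation. It assumes you read the `keyspace_hits` and `keyspace_misses` counters from the Redis `INFO stats` output; the sample counter values and the helper names are hypothetical, not taken from this environment.

```python
def cache_hit_ratio(keyspace_hits: int, keyspace_misses: int) -> float:
    """Return the hit ratio in [0, 1]; 0.0 when there were no lookups."""
    total = keyspace_hits + keyspace_misses
    if total == 0:
        return 0.0
    return keyspace_hits / total


def meets_target(keyspace_hits: int, keyspace_misses: int, target: float = 0.95) -> bool:
    """True when the hit ratio is strictly above the target (default 95%)."""
    return cache_hit_ratio(keyspace_hits, keyspace_misses) > target


# Hypothetical counters, chosen only to illustrate a sub-80% hit ratio.
sample = {"keyspace_hits": 780_000, "keyspace_misses": 220_000}
ratio = cache_hit_ratio(**sample)
print(f"hit ratio: {ratio:.1%}, meets target: {meets_target(**sample)}")
```

Because the counters are cumulative since the last restart, comparing two readings taken some time apart gives a more representative ratio than a single reading.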
🔗 api-node-1/2/3 (3× EC2) → prod-mysql (Single-AZ RDS)
Redundancy Gap
3-Node Compute Redundancy Creates False Sense of HA — Database Is the True Single Point of Failure
You've deployed three EC2 instances (api-node-1, api-node-2, api-node-3) behind prod-alb, which suggests an intent for high availability. But all three nodes depend on a single-AZ RDS MySQL instance. If the us-east-1a AZ has a prolonged issue, all compute nodes become useless simultaneously — the three-instance redundancy provides zero availability benefit at the database layer.
Implication: The architecture passes a surface-level HA review but fails under AZ-level failure. For a B2B SaaS charging enterprise customers, a database-level outage during an AZ event (these occur several times per year across AWS regions) means full downtime with no failover path. SLA exposure is 100% during the incident.
Recommendation: Enable Multi-AZ on prod-mysql (adds a synchronous standby replica, automatic failover in ~60s). Additionally, ensure api-node-2 and api-node-3 are in different AZs than api-node-1. Multi-AZ RDS adds ~$530/mo but eliminates the gap between your compute and database HA tiers.
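The gap between the compute and database tiers can be checked mechanically. The sketch below assumes inventory records shaped like the EC2 `DescribeInstances` and RDS `DescribeDBInstances` responses (`Placement.AvailabilityZone` and `MultiAZ`); the AZ placements and the use of node names as instance IDs are illustrative assumptions.

```python
from collections import Counter


def az_spread(instances):
    """Count instances per Availability Zone."""
    return Counter(i["Placement"]["AvailabilityZone"] for i in instances)


def ha_gaps(instances, db_instance):
    """List the tiers that would not survive the loss of a single AZ."""
    gaps = []
    if len(az_spread(instances)) < 2:
        gaps.append("compute: all nodes share one AZ")
    if not db_instance.get("MultiAZ", False):
        gaps.append("database: single-AZ, no standby")
    return gaps


# Hypothetical placements; real values come from your own inventory.
nodes = [
    {"InstanceId": "api-node-1", "Placement": {"AvailabilityZone": "us-east-1a"}},
    {"InstanceId": "api-node-2", "Placement": {"AvailabilityZone": "us-east-1b"}},
    {"InstanceId": "api-node-3", "Placement": {"AvailabilityZone": "us-east-1c"}},
]
prod_mysql = {"DBInstanceIdentifier": "prod-mysql", "MultiAZ": False}

print(ha_gaps(nodes, prod_mysql))
```

With well-spread compute and a single-AZ database, only the database gap is reported, which is the situation this finding describes.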
🔗 api-node-1/2/3 → app-uploads (S3) via public internet
Cost Leak
EC2–S3 Traffic Routes via Public Internet — NAT Gateway Charges Accruing on Every Upload/Download
Your EC2 instances (api-node-1, api-node-2, api-node-3) are communicating with app-uploads S3 (~2TB) via the public internet, which means all EC2-to-S3 traffic is passing through NAT Gateway at $0.045/GB. With 2TB of stored data and typical SaaS read/write patterns, EC2↔S3 data transfer likely exceeds 500GB/month — generating NAT costs that don't appear on the S3 line item but show up as a growing networking bill.
Implication: Estimated $22–45/month in avoidable NAT Gateway data charges. Beyond cost, public-internet routing exposes inter-service traffic unnecessarily — data between your app tier and storage layer should never leave the AWS network.
Recommendation: Create an S3 VPC Gateway Endpoint in your VPC (free) and update route tables to direct S3 traffic through it. EC2-to-S3 transfers within the same region via a Gateway Endpoint are free. 30-minute setup, immediate savings.
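The $22–45/month estimate follows directly from the quoted NAT Gateway rate. A small sketch of the arithmetic, assuming the $0.045/GB figure above and monthly EC2-to-S3 volumes between 500GB and 1,000GB (the upper volume is an assumption inferred from the top of the estimate):

```python
NAT_PROCESSING_USD_PER_GB = 0.045  # rate quoted in the finding above


def nat_data_cost(gb_per_month: float, rate: float = NAT_PROCESSING_USD_PER_GB) -> float:
    """Monthly NAT Gateway data-processing charge for the given volume."""
    return gb_per_month * rate


for gb in (500, 1000):
    print(f"{gb} GB/month -> ${nat_data_cost(gb):.2f}")
```

Routing the same traffic through an S3 Gateway Endpoint removes this line item entirely, since the endpoint carries no data-processing charge.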
Security Analysis
4 Security Gaps: 1 Critical, 2 High, 1 Medium
Critical
Direct SSH Access to Production Instances
What We Found
Manual SSH deployments imply port 22 is open, likely to a broad IP range. Compromised SSH keys would give direct access to all production servers and database credentials.
Recommended Fix
Migrate to AWS Systems Manager Session Manager (IAM-based, audited, no open ports). Remove port 22 from all production security groups immediately.
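Before removing port 22, it helps to list exactly which rules expose it. This is a rough sketch that scans a security group record shaped like the EC2 `DescribeSecurityGroups` response (`IpPermissions`, `FromPort`, `ToPort`, `IpRanges`, `Ipv6Ranges`); the group ID and rules shown are hypothetical.

```python
def open_ssh_rules(security_group):
    """Return world-open CIDRs on any ingress rule that covers TCP port 22."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            covers_ssh = True
        elif protocol == "tcp":
            covers_ssh = perm.get("FromPort", 0) <= 22 <= perm.get("ToPort", 65535)
        else:
            covers_ssh = False
        if not covers_ssh:
            continue
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                findings.append("0.0.0.0/0")
        for ip_range in perm.get("Ipv6Ranges", []):
            if ip_range.get("CidrIpv6") == "::/0":
                findings.append("::/0")
    return findings


# Hypothetical security group, for illustration only.
example_group = {
    "GroupId": "sg-EXAMPLE",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
    ],
}
print(open_ssh_rules(example_group))
```

The check only flags fully open ranges; a broad but not world-open CIDR would need a stricter comparison than the one sketched here.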
High
No CloudTrail or VPC Flow Logs
What We Found
Without CloudTrail and VPC Flow Logs, there is no visibility into API calls, network traffic, or ability to perform forensic analysis during incidents.
Recommended Fix
Enable CloudTrail in all regions with encrypted S3 delivery. Enable VPC Flow Logs to CloudWatch Logs. Both are low cost and take under 30 minutes to configure.
High
No Encryption at Rest for RDS and S3
What We Found
Customer data in MySQL and S3 may be unencrypted at rest. If so, the environment would fall short of the encryption-at-rest controls typically expected under SOC 2, GDPR, and HIPAA.
Recommended Fix
Enable S3 default encryption (SSE-S3, immediate). For RDS, create an encrypted snapshot, restore to a new encrypted instance during a maintenance window, then swap the endpoint.
Medium
ElastiCache Redis Without AUTH Token
What We Found
Redis session store likely has no AUTH token. Any compromised workload in the VPC could read or modify session tokens, enabling session hijacking across all active users.
Recommended Fix
Enable Redis AUTH with a token stored in Secrets Manager. Enable encryption in transit (TLS). Restrict security group to only EC2 instances that need access.
Investigate
Lambda Function With No Detectable Triggers
What We Found
A legacy Lambda function in the account has no event source mappings, no API Gateway triggers, and no CloudWatch event rules visible in the configuration. Configuration analysis cannot determine whether this is an active function or orphaned code from a deprecated feature.
Recommended Action
Before removing, confirm with your team whether this function is still in use. Check CloudWatch logs for recent invocations, review any team documentation, and run a 30-day log search. If confirmed unused, deletion saves ~$0.40/month and removes a potential security surface.
Cost Analysis
$1,200–1,600/month in Identified Savings
High
EC2 Instances on On-Demand Pricing
💰 $620–830/mo savings
What We Found
Three m5.2xlarge instances at on-demand pricing cost ~$2,088/month. These are predictable, baseline workloads — ideal for Reserved Instances or Savings Plans.
Recommended Fix
Purchase 1-year Compute Savings Plans for 2 instances. Right-size to m5.xlarge first (after utilization analysis) for additional savings on top of the plan discount.
High
Over-Provisioned RDS Instance
💰 $550–700/mo savings
What We Found
The db.r5.2xlarge (8 vCPU, 64 GiB RAM) costs ~$876/month. For 5,000 DAU, a db.r6g.xlarge (Graviton) would provide better price-performance at half the cost.
Recommended Fix
Enable RDS Performance Insights now. Analyze utilization for 2 weeks, then downsize to db.r6g.xlarge with a 1-year Reserved Instance. Expected to save 60–65% on database compute.
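The headline savings figures are simple sums of the per-finding ranges. A quick sketch of that arithmetic, using the EC2 and RDS ranges stated above plus the $25–35/mo S3 lifecycle range from the same section (the dictionary keys are just labels):

```python
# (low, high) monthly savings in USD, as quoted in the cost findings.
savings = {
    "ec2_savings_plans": (620, 830),
    "rds_right_sizing": (550, 700),
    "s3_lifecycle": (25, 35),
}


def total_range(ranges):
    """Sum the low ends and the high ends of a collection of (low, high) pairs."""
    ranges = list(ranges)
    return sum(low for low, _ in ranges), sum(high for _, high in ranges)


compute_and_db = total_range([savings["ec2_savings_plans"], savings["rds_right_sizing"]])
everything = total_range(savings.values())
print(f"EC2 + RDS: ${compute_and_db[0]:,}-{compute_and_db[1]:,}/month")
print(f"All findings: ${everything[0]:,}-{everything[1]:,}/month")
```

The EC2 and RDS items alone come to $1,170–1,530/month, and adding the S3 item brings the total to $1,195–1,565/month, which the report rounds to $1,200–1,600.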
Medium
S3 Storage Without Lifecycle Policies (2TB)
💰 $25–35/mo savings
What We Found
All 2TB of objects remain in S3 Standard indefinitely. For file uploads, 70–80% are rarely accessed after 30 days and should be tiered to lower-cost storage classes.
Recommended Fix
Implement S3 Intelligent-Tiering or lifecycle rules: S3-IA after 30 days, Glacier Instant Retrieval after 90 days. Takes 20 minutes to configure via the console.
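For teams that prefer to keep the lifecycle rule in code rather than click through the console, the rule above can be expressed as a configuration document. This sketch builds a dictionary in the shape accepted by the S3 `PutBucketLifecycleConfiguration` API; the rule ID is a hypothetical name, and the empty prefix (apply to every object) is an assumption.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-app-uploads",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},   # assumption: rule applies to the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # S3 Standard-IA
                {"Days": 90, "StorageClass": "GLACIER_IR"},   # Glacier Instant Retrieval
            ],
        }
    ]
}


def transitions_are_ordered(rule):
    """True when each transition happens strictly later than the previous one."""
    days = [t["Days"] for t in rule["Transitions"]]
    return all(earlier < later for earlier, later in zip(days, days[1:]))


assert all(transitions_are_ordered(r) for r in lifecycle_configuration["Rules"])
print(json.dumps(lifecycle_configuration, indent=2))
```

The printed JSON can be saved to a file and applied with your tooling of choice. S3 Intelligent-Tiering is the alternative when access patterns are too unpredictable for fixed day thresholds.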
Low
Basic CloudWatch Monitoring Only
What We Found
No detailed monitoring, custom metrics, or alarms. Missing early performance degradation indicators means problems surface only when customers complain — not before. Enabling detailed monitoring adds ~$15/month but prevents costly undetected outages.
Impact: Prevents outage-level incidents that cost far more than monitoring. Set up alarms for CPU >80%, ALB 5xx rate >1%, and RDS connections at 80% of max.
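The three alarm thresholds can be written down as a small evaluation function, which is useful for agreeing on the exact conditions before creating the CloudWatch alarms. This is only a sketch of the logic; the parameter names and the sample readings are hypothetical.

```python
def breached_alarms(cpu_percent, alb_requests, alb_5xx, db_connections, db_max_connections):
    """Return the names of the thresholds that the given readings breach."""
    breaches = []
    if cpu_percent > 80:
        breaches.append("cpu_above_80_percent")
    # Guard against division by zero when the ALB served no requests.
    error_rate = alb_5xx / alb_requests if alb_requests else 0.0
    if error_rate > 0.01:
        breaches.append("alb_5xx_rate_above_1_percent")
    if db_connections >= 0.8 * db_max_connections:
        breaches.append("rds_connections_at_80_percent_of_max")
    return breaches


# Hypothetical readings for one evaluation period.
print(breached_alarms(cpu_percent=86.0, alb_requests=12_000, alb_5xx=60,
                      db_connections=410, db_max_connections=500))
```

In this sample, CPU and database connections breach their thresholds while the 5xx rate (0.5%) does not.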
Architecture Risks
4 Anti-Patterns Detected
Critical
Single-AZ RDS with No Backup Verification
Risk
The RDS MySQL instance is deployed in a single Availability Zone with no read replicas. If the underlying host fails or the AZ experiences an outage, the database will be completely unavailable with no tested recovery procedure.
Fix
Enable Multi-AZ deployment immediately (minimal downtime, AWS handles failover). Enable automated backups with 7-day retention and schedule a monthly point-in-time restore test.
Impact: A single AZ failure causes complete application downtime. Without verified backups, data loss could be permanent. This is the highest-risk item in the entire stack.
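Backup verification is easy to let slip, so it is worth encoding as a check. The sketch below assumes a database record shaped like the RDS `DescribeDBInstances` response, where `BackupRetentionPeriod` is 0 when automated backups are disabled; the restore-test date is something your team would have to record itself, and the 31-day limit is an assumed reading of "monthly".

```python
from datetime import date


def backup_findings(db_instance, last_restore_test, today,
                    min_retention_days=7, max_test_age_days=31):
    """Return a list of backup-related gaps for one database instance."""
    findings = []
    retention = db_instance.get("BackupRetentionPeriod", 0)
    if retention == 0:
        findings.append("automated backups are disabled")
    elif retention < min_retention_days:
        findings.append(f"retention is {retention} days, below {min_retention_days}")
    if last_restore_test is None:
        findings.append("no restore test on record")
    elif (today - last_restore_test).days > max_test_age_days:
        findings.append("last restore test is older than the allowed window")
    return findings


# Hypothetical state: one-day retention and no restore test ever performed.
print(backup_findings({"BackupRetentionPeriod": 1}, None, date(2024, 6, 1)))
```

An empty list means both the retention setting and the restore-test cadence meet the targets in the fix above.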
High
No Auto Scaling with Over-Provisioned Instances
Risk
Three m5.2xlarge instances (8 vCPU, 32 GiB each) run 24/7 regardless of traffic. With 5,000 DAU, this is over-provisioned 2–3x during off-peak and has no mechanism to handle traffic spikes.
Fix
Implement an Auto Scaling Group with min:2 / max:6 instances. Right-size to m5.xlarge after analyzing actual peak utilization. This both saves money and improves resilience.
Impact: Paying for peak capacity 24/7 while simultaneously unable to handle unexpected surges. ~$600–900/month wasted. Any viral moment could also cause downtime.
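To see how a min:2 / max:6 group would behave, here is a simplified proportional-scaling sketch. It is not the exact algorithm AWS target tracking uses; it assumes a hypothetical 50% average CPU target and simply scales the fleet in proportion to the observed metric, clamped to the group limits.

```python
import math


def desired_capacity(current, metric_value, target_value=50.0, min_size=2, max_size=6):
    """Scale the fleet proportionally to metric/target, clamped to [min_size, max_size]."""
    raw = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, raw))


# Hypothetical average CPU readings for a three-instance fleet.
for cpu in (20.0, 50.0, 90.0):
    print(f"avg CPU {cpu:.0f}% -> {desired_capacity(3, cpu)} instances")
```

At 20% CPU the fleet shrinks to the minimum of two instances, and at 90% it grows to six, which illustrates both halves of the finding: lower off-peak spend and headroom for spikes.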
High
Manual SSH Deployments Without CI/CD
Risk
Deployments via SSH + git pull create inconsistent states across instances, provide no rollback capability, leave no audit trail, and require direct production access.
Fix
Implement CI/CD with GitHub Actions or CodePipeline. Use blue/green deployments via CodeDeploy. Replace all SSH access with Systems Manager Session Manager for auditability.
Impact: Manual deployments are a common cause of self-inflicted outages. A bad deploy with no rollback path means extended downtime until the fix is found and manually applied.
Medium
Single Region with No Disaster Recovery Plan
Risk
Everything runs in us-east-1 with no cross-region replication or documented DR plan. us-east-1 has historically experienced the most frequent AWS service disruptions of any region.
Fix
Enable S3 Cross-Region Replication to us-west-2. Copy RDS snapshots to a secondary region on a schedule. Document a recovery runbook with RTO/RPO targets so the team can execute under pressure.
Impact: A regional outage results in complete downtime with no recovery path, directly violating any SLA commitments made to enterprise customers.
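When setting the RPO target for the runbook, note that a scheduled cross-region snapshot copy bounds how much data can be lost. A minimal sketch of that reasoning, assuming snapshots are the only cross-region copy of the database; the intervals, durations, and 4-hour target are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(snapshot_interval, copy_duration):
    """Oldest possible age of the newest usable snapshot in the secondary region.

    Worst case: the region fails just before the next copy completes, so the
    latest usable snapshot is one full interval plus one copy duration old.
    """
    return snapshot_interval + copy_duration


def meets_rpo(snapshot_interval, copy_duration, rpo_target):
    """True when the worst-case data-loss window fits within the RPO target."""
    return worst_case_rpo(snapshot_interval, copy_duration) <= rpo_target


# Hypothetical schedules checked against a hypothetical 4-hour RPO target.
target = timedelta(hours=4)
print(meets_rpo(timedelta(hours=24), timedelta(hours=1), target))
print(meets_rpo(timedelta(hours=3), timedelta(minutes=45), target))
```

A daily copy cannot satisfy a 4-hour RPO, so the schedule and the documented target need to be chosen together.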
Migration Opportunities
3 Architecture Upgrades Identified
EC2 Fleet → ECS Fargate
3x manually managed EC2 m5.2xlarge with SSH deployments
Amazon ECS on Fargate with containerized services
Eliminates server management overhead, enables auto-scaling to zero, provides blue/green deploys natively. Typical 30–50% cost reduction through right-sized tasks and Fargate Spot pricing.
RDS MySQL → Aurora Serverless v2
Single-AZ RDS MySQL db.r5.2xlarge (fixed provisioning)
Aurora MySQL Serverless v2 with auto-scaling ACUs
Built-in Multi-AZ with 6-way replication and auto-scaling compute. MySQL wire-compatible, so no application code changes are required. Expected 30–40% cost reduction with a dramatically better HA story.
Basic Metrics → Full Observability
CloudWatch basic metrics with no alerting configured
CloudWatch detailed + X-Ray tracing + SNS alerting
Application-level observability with distributed request tracing, error rate tracking, and proactive alerting before customers notice issues. Typically a 2–3 day implementation effort.
Prioritized Action Plan
6 Recommendations, Ordered by Impact
01
Enable RDS Multi-AZ and Verify Backups
Highest-risk gap in the stack. Enable Multi-AZ on RDS (AWS handles failover, minimal downtime). Verify 7-day automated backups and test a point-in-time restore this week.
02
Eliminate SSH Access and Enable CloudTrail
Install SSM Agent, attach the required IAM role, remove port 22 from all production security groups. Enable CloudTrail in all regions and VPC Flow Logs to CloudWatch.
03
Right-Size EC2 and RDS, Buy Savings Plans
Monitor utilization for 2 weeks via Performance Insights and CloudWatch, then right-size EC2 to m5.xlarge and RDS to db.r6g.xlarge. Purchase 1-year Compute Savings Plans. Saves ~$1,200–1,500/month.
04
Implement CI/CD with Blue/Green Deploys
GitHub Actions + CodeDeploy for automated, auditable deployments with instant rollback capability. Eliminates the manual deploy risk that is a common cause of self-inflicted outages.
05
Enable Encryption at Rest Everywhere
S3 default encryption (SSE-S3, immediate — no downtime). RDS encrypted snapshot + restore during a maintenance window. Redis TLS + AUTH token stored in Secrets Manager.
06
Add Auto Scaling and S3 Lifecycle Policies
Auto Scaling Group behind ALB with min:2, max:6. S3 Intelligent-Tiering or lifecycle rules for the 2TB of uploads — moves rarely-accessed objects to cheaper storage tiers automatically.
Now run it on your infrastructure
This analysis took seconds. Paste your own Terraform, CloudFormation, Kubernetes manifests, or describe your stack — and get the same depth of analysis on your actual architecture. Free, no account required.
Most teams see their first finding within 10 seconds of pasting.
✓ Free to try ✓ No account required ✓ Results in 30 seconds