Crawl-Walk-Run Deployment Ladder
The fastest way to ship AI products is to start slow. This framework provides a three-phase ladder that lets you move fast without breaking things.
The Three Phases
Crawl: Shadow Mode (Weeks 1-2)
Run the AI system in parallel with the existing system, but don't show its results to users; a shadow-mode sketch follows this phase's criteria below.
Goals:
- Validate system works in production environment
- Collect baseline metrics
- Identify edge cases
- Build confidence
Metrics to track:
- Prediction accuracy vs. ground truth
- Latency and error rates
- Coverage (% of requests handled)
- Cost per request
Success criteria:
- Accuracy > 90%
- P95 latency < 2x target
- Error rate < 1%
- Cost within budget
Duration: 1-2 weeks, minimum 10K requests
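The shadow-mode pattern is small enough to sketch directly. The code below is illustrative, not a specific framework's API: it assumes the existing system and the AI system are available as plain callables, and it logs each prediction next to the served response so accuracy, latency, coverage, and cost can be computed offline.

```python
import logging
import time

logger = logging.getLogger("shadow_mode")

def serve_with_shadow(request, legacy_handler, ai_predictor):
    """Serve the user from the existing system; run the AI in shadow."""
    # The user-facing path is unchanged.
    response = legacy_handler(request)

    # Shadow call: measure and log the AI's answer, never return it.
    try:
        start = time.monotonic()
        prediction = ai_predictor(request)
        latency_ms = (time.monotonic() - start) * 1000
        logger.info(
            "shadow prediction latency_ms=%.1f prediction=%r served=%r",
            latency_ms, prediction, response,
        )
    except Exception:
        # A shadow failure must never affect the user-facing path.
        logger.exception("shadow prediction failed")

    return response
```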
Walk: Limited Rollout (Weeks 3-6)
Show AI results to a small percentage of users, with an easy fallback to the old system; a bucketing sketch follows the rollout schedule below.
Goals:
- Validate user acceptance
- Measure engagement impact
- Catch production issues early
- Build operational muscle
Rollout schedule:
- Week 3: 1% of traffic
- Week 4: 5% of traffic
- Week 5: 10% of traffic
- Week 6: 25% of traffic
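One way to implement this ramp is deterministic hash bucketing behind a feature flag, so each user stays in the same arm while the percentage is raised week by week. The sketch below is illustrative: it assumes the request carries a `user_id` and takes the two handlers as plain callables rather than a specific feature-flag system's API.

```python
import hashlib

def in_ai_rollout(user_id: str, percent: int) -> bool:
    """Deterministic bucketing: the same user always lands in the same arm."""
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent

def handle(request, legacy_handler, ai_handler, percent: int = 1):
    """Route a slice of traffic to the AI path, falling back on any error."""
    if in_ai_rollout(request.user_id, percent):
        try:
            return ai_handler(request)
        except Exception:
            # Easy fallback: any AI failure falls through to the old system.
            return legacy_handler(request)
    return legacy_handler(request)
```

Raising `percent` from 1 to 5 to 10 to 25 follows the schedule above without re-bucketing existing users.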
Metrics to track:
- User engagement vs. control
- Task success rate
- User satisfaction (surveys, NPS)
- Support ticket volume
Success criteria:
- Engagement lift > 5%
- Success rate > baseline
- Satisfaction neutral or positive
- Support tickets not increasing
Auto-rollback triggers:
- Error rate > 2%
- Latency > 3x baseline
- Engagement drop > 10%
- Support tickets spike > 50%
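These triggers are cheap to automate as a periodic check against baseline metrics. A minimal sketch, assuming current and baseline metrics arrive as plain dicts from your monitoring system; the key names are illustrative.

```python
def should_auto_rollback(current: dict, baseline: dict) -> bool:
    """Evaluate the auto-rollback triggers listed above."""
    return (
        current["error_rate"] > 0.02                                       # error rate > 2%
        or current["p95_latency_ms"] > 3 * baseline["p95_latency_ms"]      # latency > 3x baseline
        or current["engagement"] < 0.90 * baseline["engagement"]           # engagement drop > 10%
        or current["support_tickets"] > 1.5 * baseline["support_tickets"]  # ticket spike > 50%
    )

def evaluate_rollout(current, baseline, set_rollout_percent):
    """If any trigger fires, send all traffic back to the old system."""
    if should_auto_rollback(current, baseline):
        set_rollout_percent(0)
```

Wiring this to the same flag that controls the rollout percentage means the rollback happens before anyone has to be paged.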
Run: Full Rollout (Weeks 7+)
Ramp to 100% of traffic with monitoring and gradual expansion.
Goals:
- Reach full scale
- Optimize for cost and latency
- Build continuous improvement loop
- Expand to new use cases
Rollout schedule:
- Week 7: 50% of traffic
- Week 8: 75% of traffic
- Week 9: 90% of traffic
- Week 10: 100% of traffic
Metrics to track:
- All walk metrics, plus:
- Cost efficiency trends
- Model drift detection (see the sketch after this list)
- Feature expansion opportunities
- Competitive benchmarks
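Drift detection can start simply: compare the current score or feature distribution to a snapshot taken at launch. Below is a sketch using the population stability index (PSI); the bins, the launch snapshot, and the 0.25 alert threshold (a common rule of thumb) are illustrative additions, not part of the original rollout data.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions (per-bin fractions summing to ~1)."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

# Illustrative numbers: score distribution at launch vs. this week in production.
launch_bins = [0.25, 0.25, 0.25, 0.25]
current_bins = [0.20, 0.24, 0.26, 0.30]

if population_stability_index(launch_bins, current_bins) > 0.25:
    print("drift alert: score distribution has shifted; investigate before expanding")
```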
Success criteria:
- All walk criteria maintained at scale
- Cost per request decreasing
- Model quality stable or improving
- New use cases identified
Real-World Example
At Google, we deployed ML-powered deployment automation:
Crawl (2 weeks):
- Ran automation in shadow mode
- Collected 50K deployment predictions
- Accuracy: 94%, latency: 60s, cost: $0.02/deployment
Walk (4 weeks):
- Week 1: 1% rollout (500 deployments)
- Week 2: 5% rollout (2,500 deployments)
- Week 3: 10% rollout (5,000 deployments)
- Week 4: 25% rollout (12,500 deployments)
- Engagement lift: 15%, success rate: 96%, satisfaction: +8 NPS
Run (4 weeks):
- Ramped to 100% over 4 weeks
- Reduced deployment time from 6 hours to 80 minutes
- Maintained 99.99% uptime
- Expanded to 30+ product teams
Decision Framework
When to move from Crawl to Walk
✅ All success criteria met
✅ Team confident in system
✅ Rollback plan tested
✅ Monitoring in place
✅ On-call rotation ready
❌ Don't move if:
- Accuracy < 90%
- Error rate > 1%
- Cost exceeds budget
- Team not confident
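These gates are easy to encode as a pre-flight check that runs before the first 1% of traffic is enabled. A minimal sketch; the metric keys and readiness booleans are illustrative, and subjective items like team confidence still need a human sign-off.

```python
def ready_for_walk(m: dict) -> bool:
    """Crawl-to-walk gate, mirroring the criteria above."""
    return (
        m["accuracy"] >= 0.90
        and m["error_rate"] <= 0.01
        and m["p95_latency_ms"] <= 2 * m["target_latency_ms"]
        and m["cost_per_request"] <= m["budget_per_request"]
        and m["rollback_plan_tested"]
        and m["monitoring_in_place"]
        and m["on_call_ready"]
    )
```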
When to move from Walk to Run
✅ All success criteria met at 25% traffic
✅ No auto-rollbacks in past week
✅ User feedback positive
✅ Support tickets stable
✅ Cost model validated
❌ Don't move if:
- Engagement lift < 5%
- Support tickets increasing
- Any auto-rollbacks triggered
- Cost per request increasing
When to pause or rollback
⚠️ Pause if:
- Error rate spikes > 2x
- Latency spikes > 3x
- Support tickets spike > 50%
- User satisfaction drops
🚨 Rollback if:
- Critical bug discovered
- Data privacy issue
- Compliance violation
- User safety concern
Implementation Checklist
Crawl Checklist
- Shadow pipeline deployed alongside the existing system
- Baseline metrics collected (accuracy, latency, error rate, coverage, cost)
- At least 10K shadow requests logged
- Results reviewed against the crawl success criteria
Walk Checklist
- Rollback plan tested; monitoring and alerting in place
- On-call rotation ready
- Ramp configured for 1% → 5% → 10% → 25%
- Auto-rollback triggers wired to live metrics
- Engagement, satisfaction, and support-ticket tracking running against a control
Run Checklist
- All walk criteria holding at 25% traffic
- Ramp scheduled for 50% → 75% → 90% → 100%
- Cost and latency optimization work planned
- Drift detection and a continuous-improvement loop in place
- Candidate use cases for expansion identified
Measuring Success
Track these metrics by phase:
Crawl:
- Prediction accuracy
- Latency and error rates
- Coverage and cost
Walk:
- User engagement lift
- Task success rate
- Satisfaction scores
- Support ticket volume
Run:
- All walk metrics at scale
- Cost efficiency trends
- Model quality stability
- Feature expansion velocity
Target: Meet criteria for each phase before advancing
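A small registry of required metrics per phase keeps dashboards and phase reviews honest: before advancing, confirm every series for the current phase is actually being collected. The structure below is a sketch; the metric names mirror the lists above, and run-phase reviews also keep all walk metrics at scale.

```python
PHASE_METRICS = {
    "crawl": ["prediction_accuracy", "p95_latency_ms", "error_rate", "coverage", "cost_per_request"],
    "walk": ["engagement_lift", "task_success_rate", "satisfaction", "support_ticket_volume"],
    "run": ["cost_efficiency_trend", "drift_score", "feature_expansion_count"],
}

def missing_metrics(phase: str, collected: set) -> list:
    """Return required series the metrics pipeline is not yet emitting."""
    return [m for m in PHASE_METRICS[phase] if m not in collected]

# Example: check walk-phase coverage before raising the rollout percentage.
print(missing_metrics("walk", {"engagement_lift", "task_success_rate"}))
# -> ['satisfaction', 'support_ticket_volume']
```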