Crawl-Walk-Run Deployment Ladder
The fastest way to ship AI products is to start slow. This framework provides a three-phase ladder that lets you move fast without breaking things.
The Three Phases
Crawl: Shadow Mode (Weeks 1-2)
Run the AI system in parallel with the existing system, but don't show its results to users; a shadow-mode sketch follows this phase's criteria below.
Goals:
- Validate system works in production environment
- Collect baseline metrics
- Identify edge cases
- Build confidence
Metrics to track:
- Prediction accuracy vs. ground truth
- Latency and error rates
- Coverage (% of requests handled)
- Cost per request
Success criteria:
- Accuracy > 90%
- P95 latency < 2x target
- Error rate < 1%
- Cost within budget
Duration: 1-2 weeks, minimum 10K requests
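The shadow-mode pattern is small enough to sketch directly. The code below is illustrative, not a specific framework's API: it assumes the existing system and the AI system are available as plain callables, and it logs each prediction next to the served response so accuracy, latency, coverage, and cost can be computed offline.

```python
import logging
import time

logger = logging.getLogger("shadow_mode")

def serve_with_shadow(request, legacy_handler, ai_predictor):
    """Serve the user from the existing system; run the AI in shadow."""
    # The user-facing path is unchanged.
    response = legacy_handler(request)

    # Shadow call: measure and log the AI's answer, never return it.
    try:
        start = time.monotonic()
        prediction = ai_predictor(request)
        latency_ms = (time.monotonic() - start) * 1000
        logger.info(
            "shadow prediction latency_ms=%.1f prediction=%r served=%r",
            latency_ms, prediction, response,
        )
    except Exception:
        # A shadow failure must never affect the user-facing path.
        logger.exception("shadow prediction failed")

    return response
```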
Walk: Limited Rollout (Weeks 3-6)
Show AI results to a small percentage of users, with an easy fallback to the old system; a bucketing sketch follows the rollout schedule below.
Goals:
- Validate user acceptance
- Measure engagement impact
- Catch production issues early
- Build operational muscle
Rollout schedule:
- Week 3: 1% of traffic
- Week 4: 5% of traffic
- Week 5: 10% of traffic
- Week 6: 25% of traffic
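One way to implement this ramp is deterministic hash bucketing behind a feature flag, so each user stays in the same arm while the percentage is raised week by week. The sketch below is illustrative: it assumes the request carries a `user_id` and takes the two handlers as plain callables rather than a specific feature-flag system's API.

```python
import hashlib

def in_ai_rollout(user_id: str, percent: int) -> bool:
    """Deterministic bucketing: the same user always lands in the same arm."""
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent

def handle(request, legacy_handler, ai_handler, percent: int = 1):
    """Route a slice of traffic to the AI path, falling back on any error."""
    if in_ai_rollout(request.user_id, percent):
        try:
            return ai_handler(request)
        except Exception:
            # Easy fallback: any AI failure falls through to the old system.
            return legacy_handler(request)
    return legacy_handler(request)
```

Raising `percent` from 1 to 5 to 10 to 25 follows the schedule above without re-bucketing existing users.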
Metrics to track:
- User engagement vs. control
- Task success rate
- User satisfaction (surveys, NPS)
- Support ticket volume
Success criteria:
- Engagement lift > 5%
- Success rate > baseline
- Satisfaction neutral or positive
- Support tickets not increasing
Auto-rollback triggers:
- Error rate > 2%
- Latency > 3x baseline
- Engagement drop > 10%
- Support tickets spike > 50%
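These triggers are cheap to automate as a periodic check against baseline metrics. A minimal sketch, assuming current and baseline metrics arrive as plain dicts from your monitoring system; the key names are illustrative.

```python
def should_auto_rollback(current: dict, baseline: dict) -> bool:
    """Evaluate the auto-rollback triggers listed above."""
    return (
        current["error_rate"] > 0.02                                       # error rate > 2%
        or current["p95_latency_ms"] > 3 * baseline["p95_latency_ms"]      # latency > 3x baseline
        or current["engagement"] < 0.90 * baseline["engagement"]           # engagement drop > 10%
        or current["support_tickets"] > 1.5 * baseline["support_tickets"]  # ticket spike > 50%
    )

def evaluate_rollout(current, baseline, set_rollout_percent):
    """If any trigger fires, send all traffic back to the old system."""
    if should_auto_rollback(current, baseline):
        set_rollout_percent(0)
```

Wiring this to the same flag that controls the rollout percentage means the rollback happens before anyone has to be paged.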
Run: Full Rollout (Weeks 7+)
Ramp to 100% of traffic with monitoring and gradual expansion.
Goals:
- Reach full scale
- Optimize for cost and latency
- Build continuous improvement loop
- Expand to new use cases
Rollout schedule:
- Week 7: 50% of traffic
- Week 8: 75% of traffic
- Week 9: 90% of traffic
- Week 10: 100% of traffic
Metrics to track:
- All walk metrics, plus:
- Cost efficiency trends
- Model drift detection (see the sketch after this list)
- Feature expansion opportunities
- Competitive benchmarks
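Drift detection can start simply: compare the current score or feature distribution to a snapshot taken at launch. Below is a sketch using the population stability index (PSI); the bins, the launch snapshot, and the 0.25 alert threshold (a common rule of thumb) are illustrative additions, not part of the original rollout data.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions (per-bin fractions summing to ~1)."""
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

# Illustrative numbers: score distribution at launch vs. this week in production.
launch_bins = [0.25, 0.25, 0.25, 0.25]
current_bins = [0.20, 0.24, 0.26, 0.30]

if population_stability_index(launch_bins, current_bins) > 0.25:
    print("drift alert: score distribution has shifted; investigate before expanding")
```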
Success criteria:
- All walk criteria maintained at scale
- Cost per request decreasing
- Model quality stable or improving
- New use cases identified
Real-World Example
At Google, we deployed ML-powered deployment automation:
Crawl (2 weeks):
- Ran automation in shadow mode
- Collected 50K deployment predictions
- Accuracy: 94%, latency: 60s, cost: $0.02/deployment
Walk (4 weeks):
- Week 1: 1% rollout (500 deployments)
- Week 2: 5% rollout (2,500 deployments)
- Week 3: 10% rollout (5,000 deployments)
- Week 4: 25% rollout (12,500 deployments)
- Engagement lift: 15%, success rate: 96%, satisfaction: +8 NPS
Run (4 weeks):
- Ramped to 100% over 4 weeks
- Reduced deployment time from 6 hours to 80 minutes
- Maintained 99.99% uptime
- Expanded to 30+ product teams
Decision Framework
When to move from Crawl to Walk
✅ All success criteria met
✅ Team confident in system
✅ Rollback plan tested
✅ Monitoring in place
✅ On-call rotation ready
❌ Don't move if:
- Accuracy < 90%
- Error rate > 1%
- Cost exceeds budget
- Team not confident
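These gates are easy to encode as a pre-flight check that runs before the first 1% of traffic is enabled. A minimal sketch; the metric keys and readiness booleans are illustrative, and subjective items like team confidence still need a human sign-off.

```python
def ready_for_walk(m: dict) -> bool:
    """Crawl-to-walk gate, mirroring the criteria above."""
    return (
        m["accuracy"] >= 0.90
        and m["error_rate"] <= 0.01
        and m["p95_latency_ms"] <= 2 * m["target_latency_ms"]
        and m["cost_per_request"] <= m["budget_per_request"]
        and m["rollback_plan_tested"]
        and m["monitoring_in_place"]
        and m["on_call_ready"]
    )
```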
When to move from Walk to Run
✅ All success criteria met at 25% traffic
✅ No auto-rollbacks in past week
✅ User feedback positive
✅ Support tickets stable
✅ Cost model validated
❌ Don't move if:
- Engagement lift < 5%
- Support tickets increasing
- Any auto-rollbacks triggered
- Cost per request increasing
When to pause or rollback
⚠️ Pause if:
- Error rate spikes > 2x
- Latency spikes > 3x
- Support tickets spike > 50%
- User satisfaction drops
🚨 Rollback if:
- Critical bug discovered
- Data privacy issue
- Compliance violation
- User safety concern
Implementation Checklist
Crawl Checklist
- Shadow pipeline deployed alongside the existing system
- Baseline metrics collected (accuracy, latency, error rate, coverage, cost)
- At least 10K shadow requests logged
- Results reviewed against the crawl success criteria
Walk Checklist
- Rollback plan tested; monitoring and alerting in place
- On-call rotation ready
- Ramp configured for 1% → 5% → 10% → 25%
- Auto-rollback triggers wired to live metrics
- Engagement, satisfaction, and support-ticket tracking running against a control
Run Checklist
- All walk criteria holding at 25% traffic
- Ramp scheduled for 50% → 75% → 90% → 100%
- Cost and latency optimization work planned
- Drift detection and a continuous-improvement loop in place
- Candidate use cases for expansion identified
Measuring Success
Track these metrics by phase:
Crawl:
- Prediction accuracy
- Latency and error rates
- Coverage and cost
Walk:
- User engagement lift
- Task success rate
- Satisfaction scores
- Support ticket volume
Run:
- All walk metrics at scale
- Cost efficiency trends
- Model quality stability
- Feature expansion velocity
Target: Meet criteria for each phase before advancing
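A small registry of required metrics per phase keeps dashboards and phase reviews honest: before advancing, confirm every series for the current phase is actually being collected. The structure below is a sketch; the metric names mirror the lists above, and run-phase reviews also keep all walk metrics at scale.

```python
PHASE_METRICS = {
    "crawl": ["prediction_accuracy", "p95_latency_ms", "error_rate", "coverage", "cost_per_request"],
    "walk": ["engagement_lift", "task_success_rate", "satisfaction", "support_ticket_volume"],
    "run": ["cost_efficiency_trend", "drift_score", "feature_expansion_count"],
}

def missing_metrics(phase: str, collected: set) -> list:
    """Return required series the metrics pipeline is not yet emitting."""
    return [m for m in PHASE_METRICS[phase] if m not in collected]

# Example: check walk-phase coverage before raising the rollout percentage.
print(missing_metrics("walk", {"engagement_lift", "task_success_rate"}))
# -> ['satisfaction', 'support_ticket_volume']
```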