Blue/green, canary, rolling deploy: real decision criteria

In modern web development, a deploy strategy is no longer as simple as "hit deploy". Blue/green, canary, rolling: three popular approaches. Each one offers a different trade-off between risk, cost, and complexity.

Drawing on 19 years of experience, I'll walk through real production deploy scenarios.

Rolling deploy

The most common and the simplest. You upgrade the instances in your server cluster to the new version a few at a time.

Process:

Server 1: old version → new version (offline for ~30 seconds)
Server 2: old → new (offline for ~30 seconds)
Server 3, 4, 5… the same

Thanks to the load balancer's health check, only healthy servers receive traffic.
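A minimal Python sketch of the process above, under stated assumptions: `Server` and `health_check` are hypothetical stand-ins, not the API of any real orchestrator. Each server is drained, upgraded, health-checked, and returned to rotation. On the first failed check the deploy stops and reverts every server it already touched, which is the "fail fast" behaviour discussed under common failure modes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Server:
    # Hypothetical in-memory stand-in for one instance behind the load balancer.
    name: str
    version: str
    in_rotation: bool = True


def rolling_deploy(
    servers: List[Server],
    new_version: str,
    health_check: Callable[[Server], bool],
) -> bool:
    """Upgrade servers one at a time.

    Returns True if every server ends up healthy on new_version; on the first
    failed health check, reverts all touched servers and returns False.
    """
    upgraded: List[Tuple[Server, str]] = []   # (server, version it had before)
    for server in servers:
        previous = server.version
        server.in_rotation = False    # drain: load balancer stops routing to it
        server.version = new_version  # placeholder for the real upgrade/restart
        if not health_check(server):
            # Fail fast: restore this server and every already-upgraded one.
            server.version = previous
            server.in_rotation = True
            for done, old in upgraded:
                done.version = old
            return False
        server.in_rotation = True     # healthy: back into rotation
        upgraded.append((server, previous))
    return True
```

In practice `health_check` would call the instance's health endpoint, and Kubernetes' RollingUpdate upgrades pods in batches governed by `maxSurge` and `maxUnavailable` rather than strictly one at a time.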

Advantages:
– Simple setup
– No extra infrastructure
– Low memory overhead
– The default strategy for Kubernetes Deployments (RollingUpdate)

Disadvantages:
– Rollback is gradual (every updated server has to be rolled back)
– Brief version mismatch (some users hit v1, others v2)
– Database schema migrations are tricky
– A partial failure mid-deploy leaves the cluster in a messy mixed state

When to use it:
– Small to medium teams
– Standard web apps
– No breaking changes
– Cost-conscious projects

Blue/green deploy

Two identical production environments. Blue = current, Green = new. The traffic switch is atomic.

Process:

The Blue environment is live (current version)
The new version is deployed to Green (which receives no traffic)
Green is tested (smoke tests, performance checks)
The load balancer switches traffic to Green
Blue stays idle (ready for a rollback)
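To make the "atomic switch" and "instant rollback" concrete, here is a minimal sketch that models the load balancer as a single pointer to the live environment. `BlueGreenRouter`, `smoke_test`, and the method names are hypothetical illustrations, not a real load balancer API.

```python
from typing import Callable, Dict


class BlueGreenRouter:
    """Hypothetical load-balancer model: all traffic goes to exactly one environment."""

    def __init__(self, blue_version: str) -> None:
        self.envs: Dict[str, str] = {"blue": blue_version, "green": ""}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def serving_version(self) -> str:
        return self.envs[self.live]

    def deploy_to_idle(self, version: str) -> None:
        # The new version lands on the idle environment, which takes no traffic.
        self.envs[self.idle] = version

    def switch(self, smoke_test: Callable[[str], bool]) -> bool:
        # Flip traffic only if the idle environment passes its smoke test.
        if not smoke_test(self.envs[self.idle]):
            return False
        self.live = self.idle  # one pointer change moves all traffic at once
        return True

    def rollback(self) -> None:
        # The old environment is still running, so rollback is the same flip.
        self.live = self.idle
```

The design point is that the old environment is never torn down at switch time: rollback costs the same as the switch itself, which is where the doubled infrastructure bill comes from.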

Advantages:
– Zero-downtime deploys
– Instant rollback (switch traffic back)
– Pre-release testing on a production-identical environment
– Database migrations can be prepared before the switch (a shared database still needs a schema compatible with both versions)
– No version mismatch (all traffic is on one version at a time)

Disadvantages:
– Double infrastructure cost
– Migrating stateful apps is complex
– Session handling needs care
– Extra operational complexity

When to use it:
– High-traffic production
– Zero-downtime requirement
– Critical systems (payments, healthcare)
– Rollback speed is critical

Canary deploy

Progressive rollout. Release to a small group of users first, and widen the rollout if it succeeds.

Process:

Deploy v2 to one server (5-10% of the cluster)
Monitor for 15-30 minutes (error rate, latency, business metrics)
OK → widen to 25%, monitor
OK → 50%
OK → 100%

If an issue shows up: roll back immediately (only the ~5% of traffic on the canary was affected).
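The staged rollout can be sketched as a loop over traffic percentages with a health gate after each step. The stage list mirrors the text; `is_healthy_at` is a hypothetical callback standing in for the 15-30 minutes of monitoring, and a trailing `0` in the returned history marks a rollback.

```python
from typing import Callable, List, Sequence

# Stage percentages from the text: 5% → 25% → 50% → 100%.
DEFAULT_STAGES = (5, 25, 50, 100)


def run_canary(
    is_healthy_at: Callable[[int], bool],
    stages: Sequence[int] = DEFAULT_STAGES,
) -> List[int]:
    """Walk through the rollout stages; return the traffic share (%) after each step.

    A final entry of 0 means the canary was rolled back.
    """
    history: List[int] = []
    for percent in stages:
        history.append(percent)         # send `percent`% of traffic to v2
        if not is_healthy_at(percent):  # error rate / latency / business metrics gate
            history.append(0)           # immediate rollback: all traffic back on v1
            break
    return history
```

For example, `run_canary(lambda p: p < 50)` fails the gate at the 50% stage and returns `[5, 25, 50, 0]`.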

Advanced version: route by a hash of the user ID, so the same user always gets the same version.
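A minimal sketch of hash-based routing, assuming string user IDs and 100 buckets (both illustrative choices). It uses `hashlib.sha256` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would send the same user to different versions on different servers.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    # Stable across processes and machines, unlike the built-in hash().
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def version_for(user_id: str, canary_percent: int) -> str:
    # Users in buckets [0, canary_percent) get v2; everyone else stays on v1.
    return "v2" if bucket(user_id) < canary_percent else "v1"
```

A useful property of this layout: widening from 5% to 25% only adds buckets, so users already on v2 stay on v2 instead of flipping back and forth between versions.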

Advantages:
– Lowest-risk deployment
– Testing with real users and real data
– Confidence builds gradually
– Easy rollback

Disadvantages:
– Traffic splitting is complex
– Complexity comparable to managing feature flags
– Database schema changes need care (must stay backward compatible with both versions)
– Longer deploy cycle (hours rather than minutes)

When to use it:
– High-stakes changes (core algorithms, payment logic)
– Large user base
– You suspect a change could break something but aren't sure
– You want to run A/B tests at the same time

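Comparison matrix

– Rollback: Rolling is gradual (server by server); Blue/green is instant (load balancer switch); Canary is easy (disable the routing rule)
– Version mismatch: Rolling has a brief window; Blue/green has none; Canary runs v1 and v2 in parallel the longest
– Main cost: Rolling needs no extra infrastructure; Blue/green doubles infrastructure; Canary adds traffic-splitting complexity
– Deploy duration: Rolling and blue/green take minutes; Canary takes hours
– Best fit: Rolling for standard web apps; Blue/green for zero-downtime and critical systems; Canary for high-stakes changes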

Database schema migration

The sneakiest part of any deploy strategy. Do the code and the schema change at the same time?

Rolling deploy: v1 and v2 run side by side for a few minutes. The schema must be compatible with both v1 and v2. This is the "expand-contract" pattern.

Phases:
1. Expand: add the new column (nullable, or with a default value)
2. Deploy new code: write to both the old and the new column; read the new one, falling back to the old one
3. Backfill: populate the new column for existing rows
4. Contract: drop the old column (in a later deploy)

This spreads over 3-4 deploy cycles. It takes patience.
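The four phases can be walked through on an in-memory SQLite database using Python's standard `sqlite3` module. The `users` table and the `fullname` → `display_name` rename are hypothetical examples. Step 4 is left as a comment, because it belongs to a later deploy and would break the fallback read that still references the old column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Starting point: v1 stores the name in `fullname`.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada Lovelace')")  # written by v1

# 1. Expand: add the new column as nullable; v1 ignores it and keeps working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# 2. v2 writes both columns, so v1 instances still running can read the row.
conn.execute(
    "INSERT INTO users (fullname, display_name) VALUES (?, ?)",
    ("Alan Turing", "Alan Turing"),
)


def read_name(user_id: int) -> str:
    # v2 reads the new column, falling back to the old one until backfill is done.
    row = conn.execute(
        "SELECT COALESCE(display_name, fullname) FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


# 3. Backfill: copy existing data into the new column.
conn.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")
conn.commit()

# 4. Contract, in a later deploy once no v1 instance is left (SQLite >= 3.35.0):
#    conn.execute("ALTER TABLE users DROP COLUMN fullname")
#    ...after which reads use display_name only.
```

Each step is safe to run while the other version is live, which is exactly why the pattern needs several deploy cycles.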

Blue/green: Green can have its own database, but replication gets complicated. Usually the database is shared and the schema follows expand-contract.

Canary: v1 and v2 run in parallel for longer. Schema compatibility is most critical here.

Tool support

Kubernetes: Rolling is the default. For canary: Flagger, Argo Rollouts (both also support blue/green). For blue/green: Spinnaker.

AWS ECS: Rolling is the default. Blue/green via CodeDeploy.

Vercel, Netlify: Atomic deploys (instant switch). Basic blue/green.

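Heroku: By default, web dynos are restarted on deploy; the Preboot feature starts the new dynos before the old ones are stopped. Canary via add-ons.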

Custom: Build your own routing with Nginx config changes, HAProxy, or Traefik.

Real project scenarios

Scenario 1: SaaS e-commerce, 100K monthly users

Kubernetes cluster, 10 pods
Regular database migrations
Critical checkout flow

Decision: Rolling by default, canary for checkout changes. The infrastructure cost of blue/green isn't justified.

Scenario 2: Mobile app backend, 1M+ users

High traffic, low-latency requirement
Payment processing at the core

Decision: Blue/green. Zero downtime is mandatory. Instant rollback is critical for payment issues.

Scenario 3: Analytics platform, internal users

500 internal users
Complex data pipelines
Latency is tolerable

Decision: Rolling. Canary or blue/green would be overkill. Feedback from internal users arrives quickly.

Scenario 4: Fintech, critical ML model update

Trading algorithms
Millions depend on its decisions

Decision: A very cautious canary. 1% rollout, monitor for 2 weeks, then widen. Minimizing risk is paramount.
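The decisions above can be condensed into a rough heuristic. This is only a sketch: the `Project` fields and the order of the checks are hypothetical simplifications of the "When to use it" lists, not a rule stated by any tool.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical inputs distilled from the "When to use it" lists.
    high_stakes_change: bool       # core algorithm, payment logic, ML model...
    zero_downtime_required: bool
    rollback_speed_critical: bool
    can_afford_double_infra: bool


def choose_strategy(p: Project) -> str:
    # Risky changes first: exposure is limited to a small slice of users.
    if p.high_stakes_change:
        return "canary"
    # Blue/green only pays off when the second environment is affordable.
    if (p.zero_downtime_required or p.rollback_speed_critical) and p.can_afford_double_infra:
        return "blue/green"
    # Everything else: the cheap, simple default.
    return "rolling"
```

Mapped onto the scenarios: routine changes in Scenario 1 and everything in Scenario 3 come out as rolling, Scenario 2 as blue/green, and Scenario 1's checkout changes and Scenario 4 as canary.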

Monitoring during deploy

What to watch during a deploy:

1. Error rate. Has it risen above the baseline? A tolerance of 0.5% is typical.

2. Latency p50/p95/p99. Any regression?

3. Business metrics. Checkout conversion, signups, etc. Any anomaly after the deploy?

4. Infrastructure health. CPU, memory, disk I/O. Is the hardware under stress?

5. Synthetic checks. Smoke tests every minute.

A failed deploy state triggers an automatic rollback. The thresholds are defined in the CI/CD pipeline.
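A minimal sketch of such a threshold gate, comparing the new version against the baseline over the same time window. Two assumptions are labeled in the code: the "0.5%" tolerance is read as 0.5 percentage points of error rate, and a latency percentile more than 20% above the baseline counts as a regression. `Window` and `failed_checks` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for integer p in 1..100."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass
class Window:
    # Metrics for one version over the same time window (hypothetical shape).
    requests: int
    errors: int
    latencies_ms: List[float]

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def failed_checks(
    baseline: Window,
    candidate: Window,
    max_error_increase: float = 0.005,  # assumption: 0.5 percentage points
    max_latency_ratio: float = 1.2,     # assumption: >20% slower is a regression
) -> List[str]:
    """Return the checks the candidate fails; an empty list means keep going."""
    failures: List[str] = []
    if candidate.error_rate - baseline.error_rate > max_error_increase:
        failures.append("error rate")
    for p in (50, 95, 99):
        if percentile(candidate.latencies_ms, p) > max_latency_ratio * percentile(baseline.latencies_ms, p):
            failures.append(f"p{p} latency")
    return failures
```

A pipeline would call this after each monitoring interval and trigger the rollback whenever the returned list is non-empty; business metrics and infrastructure health would need their own checks in the same style.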

Common failure modes

Rolling deploy failure: 50% of the servers are on the new version, and the new version is buggy. The rollback has to revert the whole cluster. Downtime risk.

Fix: test extensively in staging before deploying. Make the rolling deploy fail fast: smoke-test the first pod before continuing.

Blue/green switchover disaster: Green passed its tests, but behaves unexpectedly once it takes production traffic (real data, real load). Rollback is instant, but some users are already affected.

Fix: send production traffic to Green before the switch (mirror real requests and compare the responses).
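A minimal sketch of request mirroring, with handlers modeled as plain functions from request to response (a hypothetical simplification of HTTP). Users always receive Blue's response; Green's response, or an exception from Green, is only compared and recorded.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Handler = Callable[[str], str]


def mirror_and_compare(
    requests: Iterable[str], blue: Handler, green: Handler
) -> Tuple[List[str], List[str]]:
    """Serve every request from Blue, replay a copy against Green, collect mismatches.

    Returns (responses actually sent to users, requests where Green disagreed).
    """
    served: List[str] = []
    mismatches: List[str] = []
    for req in requests:
        live = blue(req)  # users only ever see Blue's answer
        served.append(live)
        shadow: Optional[str]
        try:
            shadow = green(req)  # Green's answer is compared, never returned
        except Exception:
            shadow = None        # a crash in Green counts as a mismatch
        if shadow != live:
            mismatches.append(req)
    return served, mismatches
```

Mirroring is only safe as-is for read-only or idempotent requests; for writes (payments, emails), Green's side effects have to be stubbed out or pointed at non-production dependencies.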

Canary silent failure: the new version causes problems for 5% of users, but the metrics don't reflect it, so the rollout slowly widens to 100%.

Fix: comprehensive metric coverage, a user feedback channel, and error tracking on every code path.

Rollback strategy

An important part of any deploy strategy: rollback.

Rolling: gradual rollback to the previous version. Each server is downgraded to the old version.

Blue/green: switch the load balancer back. Instant.

Canary: disable the routing rule. The new version stops receiving traffic.

Target: roll back in under 5 minutes. During a production incident, every minute means lost revenue.

Testing deploys

How do we test the deploy automation itself?

1. A staging environment that mirrors production. Test the deploy there.

2. Chaos engineering. Kill random pods and inject network blips during a deploy. Test recovery.

3. Game days. Quarterly. Simulate a deploy plus a failure scenario. Test how the team reacts.

4. Disaster recovery drills. "Production deploy failed → rollback drill", against a timer.

Without this discipline, a real deploy turns into panic mode.
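For the timed rollback drill, a small harness can measure the rollback against the 5-minute target from the rollback section. `run_rollback_drill` is a hypothetical helper; the `clock` parameter is injectable so the drill logic can itself be tested without waiting minutes.

```python
import time
from typing import Callable, Tuple

ROLLBACK_TARGET_SECONDS = 5 * 60  # the "< 5 minutes" rollback target


def run_rollback_drill(
    rollback: Callable[[], None],
    target_seconds: float = ROLLBACK_TARGET_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, bool]:
    """Time a rollback procedure; return (elapsed seconds, whether it met the target)."""
    start = clock()
    rollback()  # e.g. trigger the pipeline's rollback job and wait for it to finish
    elapsed = clock() - start
    return elapsed, elapsed < target_seconds
```

`time.monotonic` is used instead of `time.time` so that wall-clock adjustments during the drill cannot distort the measurement.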

Conclusion

A deploy strategy is a function of risk tolerance, infrastructure budget, and operational capability. Rolling is enough for most projects. Blue/green is for high-stakes systems. Canary is for critical changes.

For database schema migrations, the expand-contract pattern is the foundation. A rollback strategy is the most critical piece of preparation.

Monitoring during a deploy matters as much as monitoring after it. Detecting a failed deploy early and rolling back instantly limits the damage in production.
