Canary Carousel: Zero-Downtime Deployments with Bash Scripts
Zero-downtime deployments are crucial for maintaining service availability in production environments. In this article, we’ll explore the canary-carousel
system, a robust blue-green deployment solution built with two Bash scripts that work together to ensure seamless application updates.
Understanding the Canary Carousel System
The Canary Carousel deployment system implements a blue-green deployment strategy using two complementary Bash scripts. This approach ensures zero-downtime deployments by maintaining two identical production environments and switching between them atomically.
The system consists of:
- canary-carousel-ci.sh - The entry point that handles environment setup and service routing
- canary-carousel.sh - The core deployment engine that manages the blue-green swap
Script 1: canary-carousel-ci.sh - Environment Orchestration
This script serves as the entry point for deployments, handling environment setup and service-specific configurations. It’s designed to be called via SSH or directly in CI/CD pipelines.
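Since the script reads SSH_ORIGINAL_COMMAND, one natural way to expose it is as an SSH forced command in the deploy user's authorized_keys file. A minimal sketch, where the key material and restriction options are illustrative:

command="/usr/local/bin/canary-carousel-ci.sh",no-pty,no-port-forwarding ssh-ed25519 AAAA... gitlab-ci

With this entry in place, a call like ssh gitlab-runner-deployer@deploy-host "project_foo demo" always runs the script instead of the requested command, and sshd hands the original string to it via SSH_ORIGINAL_COMMAND.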
canary-carousel-ci.sh
#!/bin/bash
set -euo pipefail

# Use SSH_ORIGINAL_COMMAND if provided; otherwise fall back to
# positional arguments so the script can also be run directly in CI/CD jobs
if [[ -n "${SSH_ORIGINAL_COMMAND:-}" ]]; then
  # Split into arguments while preserving spaces in quotes
  eval "args=($SSH_ORIGINAL_COMMAND)"
  SERVICE="${args[0]:-}"
  ENVIRONMENT="${args[1]:-dev}"
else
  SERVICE="${1:-}"
  ENVIRONMENT="${2:-dev}"
fi

# Validate input
if [[ -z "$SERVICE" ]]; then
  echo "Error: No service provided. Usage: ssh ... <project_foo|project_bar> [dev|demo]" >&2
  exit 1
fi

case "$SERVICE" in
  project_foo)
    set -a # Auto-export variables
    # Source global environment
    [ -f /home/gitlab-runner-deployer/.env ] && source /home/gitlab-runner-deployer/.env
    # Switch to service directory and source local environment
    pushd "/home/apps/$SERVICE" >/dev/null
    [ -f .env ] && source .env
    # Execute deployment sequence
    if [[ "$ENVIRONMENT" == "demo" ]]; then
      /usr/local/bin/canary-carousel project_foo demo -t 8081 -g 8086
    else
      /usr/local/bin/canary-carousel project_foo dev -t 8181 -g 8186
    fi
    # Cleanup
    popd >/dev/null
    set +a
    ;;
  project_bar)
    set -a # Auto-export variables
    # Source global environment
    [ -f /home/gitlab-runner-deployer/.env ] && source /home/gitlab-runner-deployer/.env
    # Switch to service directory and source local environment
    pushd "/home/apps/$SERVICE" >/dev/null
    [ -f .env ] && source .env
    # Execute deployment sequence
    if [[ "$ENVIRONMENT" == "demo" ]]; then
      /usr/local/bin/canary-carousel project_bar demo -t 8080 -g 8085
    else
      /usr/local/bin/canary-carousel project_bar dev -t 8180 -g 8185
    fi
    # Cleanup
    popd >/dev/null
    set +a
    ;;
  *)
    echo "Error: Invalid service '$SERVICE'. Must be 'project_foo' or 'project_bar'" >&2
    exit 1
    ;;
esac
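With the positional-argument fallback, the script accepts the same inputs whether it arrives over SSH or runs on the host directly (the user and hostname below are placeholders):

# Via SSH: sshd places the quoted string into SSH_ORIGINAL_COMMAND
ssh gitlab-runner-deployer@deploy-host "project_foo demo"

# Directly on the deployment host, e.g. from a local shell or CI job
/usr/local/bin/canary-carousel-ci.sh project_bar dev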
Script 2: canary-carousel.sh - Core Deployment Engine
The core deployment engine implements the blue-green swap logic with comprehensive health checking and rollback capabilities.
canary-carousel.sh
#!/usr/bin/env bash
##
# Blue-Green Deployment Script for Dockerized Applications
#
# TODO:
# - Check UFW rules for connections to the Docker bridge network, e.g.
# `ufw allow in on br-1283b5742d30 to any port 8088`
#
# Features:
# - Zero-downtime deployments using blue-green strategy
# - Automatic version detection and switching
# - Comprehensive health checks with timeout
# - Atomic Nginx configuration switching with rollback
# - Docker container lifecycle management
# - Parameterized for multiple applications/environments
#
# Usage: canary-carousel.sh <app> <env> -t <theta_port> -g <gamma_port>
# Example: ./canary-carousel.sh manhatan dev -t 8180 -g 8185
# Example: ./canary-carousel.sh kalipso demo -t 8180 -g 8185 -i 3000
#
# Version: 0.1.0
# Last Updated: 2025-06-24
##
set -euo pipefail
# Color definitions
BLUE='\033[1;34m'
RED='\033[1;31m'
GREEN='\033[1;32m'
YELLOW='\033[1;33m'
CYAN='\033[1;36m'
MAGENTA='\033[1;35m'
NC='\033[0m' # No color (reset)
show_help() {
  cat <<EOF
Blue-Green Deployment Script

Usage: $0 <app> <env> -t <theta_port> -g <gamma_port>

Parameters:
  <app>                Application name (e.g., ulysa, ulacy)
  <env>                Environment (dev or demo)
  -t, --theta_port     Port for theta version
  -g, --gamma_port     Port for gamma version
  -i, --internal_port  Internal container port (default: 8080)

Features:
  - Zero-downtime deployments
  - Automatic version detection
  - Health checks with progress monitoring
  - Atomic configuration switching
  - Automatic rollback on failures
  - Container lifecycle management

Error Handling:
  - Strict error checking throughout
  - Automatic cleanup on failure
  - Nginx configuration rollback on reload failure
EOF
  exit 1
}
# Validate input parameters
if [[ $# -eq 0 ]]; then
  show_help
fi

# Parse arguments
APP="$1"
ENV="$2"
shift 2

THETA_PORT=""
GAMMA_PORT=""
INTERNAL_PORT=8080

while [[ $# -gt 0 ]]; do
  case "$1" in
  -t | --theta_port)
    THETA_PORT="$2"
    shift 2
    ;;
  -g | --gamma_port)
    GAMMA_PORT="$2"
    shift 2
    ;;
  -h | --help)
    show_help
    ;;
  -i | --internal_port) # Optional internal container port
    INTERNAL_PORT="$2"
    shift 2
    ;;
  *)
    echo "Unknown option: $1"
    show_help
    ;;
  esac
done

# Validate ports
if [[ -z "$THETA_PORT" || -z "$GAMMA_PORT" ]]; then
  echo "Error: Both theta_port and gamma_port must be specified"
  show_help
fi
deploy_app() {
  # Configuration variables
  local SYMLINK_PATH="/etc/nginx/includes/${APP}_${ENV}_active.map"
  local THETA_MAP="/etc/nginx/includes/${APP}_${ENV}_theta.map"
  local GAMMA_MAP="/etc/nginx/includes/${APP}_${ENV}_gamma.map"
  local NETWORK_NAME="agency-net"
  local HEALTH_ENDPOINT="/health"
  local IMAGE_NAME="${APP}_${ENV}"
  local CONTAINER_NAME="${APP}_${ENV}"
  local MAX_WAIT=90
  local INTERVAL=15
  local current_active new_version new_port container_name log_pid

  # Main deployment workflow (the locals above are visible to the
  # helper functions below via Bash's dynamic scoping)
  detect_active_version || exit 1
  build_new_container || exit 1
  run_new_container "$new_port" "$new_version" || exit 1
  health_check_new_container_background || exit 1
  switch_active_version
  reload_nginx_with_rollback || exit 1
  cleanup_previous_container

  # Kill log stream on successful deployment
  kill_docker_log

  echo -e "${GREEN}Deployment successful! ${new_version} is now active for $APP in $ENV environment${NC}"
  exit 0
}
# Function to determine current active version
detect_active_version() {
  if [ ! -L "$SYMLINK_PATH" ]; then
    handle_error "Active version symlink not found at $SYMLINK_PATH"
  fi
  local target
  target=$(readlink -f "$SYMLINK_PATH")
  if [[ "$target" == *"theta"* ]]; then
    current_active="theta"
    new_version="gamma"
    new_port="$GAMMA_PORT"
  elif [[ "$target" == *"gamma"* ]]; then
    current_active="gamma"
    new_version="theta"
    new_port="$THETA_PORT"
  else
    handle_error "Unknown active version in symlink"
  fi
  echo -e "${CYAN}Application${NC}: ${MAGENTA}$APP${NC}"
  echo -e "${CYAN}Environment${NC}: ${MAGENTA}$ENV${NC}"
  echo -e "${CYAN}Current active version${NC}: ${YELLOW}${current_active}${NC}"
  echo -e "${CYAN}Deploying new${NC} ${YELLOW}${new_version}${NC} ${CYAN}version on port${NC} ${MAGENTA}${new_port}${NC}"
}
# Function to build new container image
build_new_container() {
  # Dynamically set environment file if it exists
  local envfile="${ENV}.${new_version}.env"
  if [[ -f "$envfile" ]]; then
    echo -e "${CYAN}Setting environment file${NC}: ${MAGENTA}$envfile${NC} → .env"
    cp "$envfile" .env
  fi
  echo -e "${CYAN}Building${NC} ${MAGENTA}${APP}_${ENV}:${new_version}${NC}..."
  if ! docker build -t "${APP}_${ENV}:${new_version}" -f Dockerfile .; then
    handle_error "Docker build failed"
  fi
}
# Function to run new container
run_new_container() {
  local new_port=$1
  local new_version=$2
  echo -e "${BLUE}Starting new container ${new_version} on port ${new_port}...${NC}"
  # Check docker run directly; the old check of $? after log_pid=$! always saw 0
  if ! docker run -d \
    --name "${CONTAINER_NAME}_${new_version}" \
    --network "$NETWORK_NAME" \
    -p "${new_port}:${INTERNAL_PORT}" \
    -e "NODE_ENV=production" \
    "${IMAGE_NAME}:${new_version}"; then
    handle_error "Failed to start container ${new_version}"
  fi
  container_name="${CONTAINER_NAME}_${new_version}"

  echo -e "${CYAN}Starting log stream for ${container_name}...${NC}"
  docker logs -f "$container_name" &
  log_pid=$!
}
# Function to perform health check in background
health_check_new_container_background() {
  # Run health check in background
  health_check_process &
  local health_pid=$!
  # Wait for health check to complete
  if ! wait $health_pid; then
    handle_error "Health checks failed for ${new_version} after $MAX_WAIT seconds"
  fi
  # Log stream will be killed in deploy_app after successful deployment
}

# Health check process (runs in background)
health_check_process() {
  local healthy=false
  local start_time
  start_time=$(date +%s)
  while [ "$healthy" = false ]; do
    local current_time elapsed
    current_time=$(date +%s)
    elapsed=$((current_time - start_time))
    if [ "$elapsed" -ge "$MAX_WAIT" ]; then
      echo -e "${RED}Health check timeout after ${MAX_WAIT} seconds${NC}"
      return 1
    fi
    echo -e "${BLUE}Checking health on port ${new_port}...${NC}"
    # Simple single-port check against the app's health endpoint
    if curl -fs "http://localhost:${new_port}${HEALTH_ENDPOINT}" >/dev/null 2>&1; then
      echo -e "${GREEN}Health check passed for port ${new_port}${NC}"
      healthy=true
      break
    else
      echo -e "${YELLOW}Health check failed for port ${new_port}, retrying in ${INTERVAL}s...${NC}"
      sleep "$INTERVAL"
    fi
  done
  return 0
}
# Function to switch active version (build the new link, then rename it
# into place so the swap is a single atomic operation)
switch_active_version() {
  echo -e "${CYAN}Switching active version to${NC} ${YELLOW}${new_version}${NC}..."
  local target
  target=$([[ "${new_version}" == "theta" ]] && echo "$THETA_MAP" || echo "$GAMMA_MAP")
  ln -sfn "$target" "${SYMLINK_PATH}.tmp"
  mv -T "${SYMLINK_PATH}.tmp" "$SYMLINK_PATH"
}
# Function to reload Nginx with rollback
reload_nginx_with_rollback() {
  echo -e "${CYAN}Reloading Nginx configuration...${NC}"

  # Test configuration before reload
  if ! sudo nginx -t; then
    echo -e "${RED}CRITICAL: New configuration test failed! Attempting rollback...${NC}"
    rollback_symlink
    if sudo nginx -t; then
      handle_error "New configuration test failed. Rolled back to ${current_active}"
    else
      handle_error "CRITICAL: Configuration test failed AND rollback failed! Manual intervention required"
    fi
  fi

  # Attempt reload with tested configuration
  if ! sudo nginx -s reload; then
    echo -e "${RED}CRITICAL: Nginx reload failed! Attempting rollback...${NC}"
    rollback_symlink
    # Test rollback configuration
    if ! sudo nginx -t; then
      handle_error "CRITICAL: Rollback configuration test failed after reload failure!"
    fi
    # Attempt reload with previous config
    if sudo nginx -s reload; then
      handle_error "Nginx reload failed. Rolled back to ${current_active}"
    else
      handle_error "CRITICAL: Nginx reload failed AND rollback reload failed! Manual intervention required"
    fi
  fi
}
# Function to rollback symlink (mirrors the atomic switch above)
rollback_symlink() {
  local target
  target=$([[ "${current_active}" == "theta" ]] && echo "$THETA_MAP" || echo "$GAMMA_MAP")
  ln -sfn "$target" "${SYMLINK_PATH}.tmp"
  mv -T "${SYMLINK_PATH}.tmp" "$SYMLINK_PATH"
}
# Function to cleanup previous container
cleanup_previous_container() {
  local old_container="${APP}_${ENV}_${current_active}"
  echo -e "${CYAN}Stopping previous version${NC} (${YELLOW}${current_active}${NC})..."
  if ! docker rm -f "$old_container" >/dev/null 2>&1; then
    echo -e "${YELLOW}WARNING: Failed to remove old container $old_container (may not exist)${NC}"
  fi
}
# Error handling function
handle_error() {
  echo -e "${RED}ERROR: $1${NC}" >&2
  if [ -n "${container_name:-}" ]; then
    docker rm -f "${container_name}" >/dev/null 2>&1 || true
  fi
  kill_docker_log
  exit 1
}

kill_docker_log() {
  if [[ -n "${log_pid:-}" ]]; then
    kill $log_pid 2>/dev/null || true
  fi
}
# Execute the deployment
deploy_app
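Both map files and the initial active symlink must exist before the first run; the script only flips the link. A one-time setup sketch for project_foo's dev environment, assuming the site's Nginx config includes the active map at http level (the $project_foo_dev_upstream variable name is an assumption; the ports come from canary-carousel-ci.sh):

# Theta map routes to the theta port
sudo tee /etc/nginx/includes/project_foo_dev_theta.map >/dev/null <<'EOF'
map $host $project_foo_dev_upstream {
    default 127.0.0.1:8181;
}
EOF
# Gamma map routes to the gamma port
sudo tee /etc/nginx/includes/project_foo_dev_gamma.map >/dev/null <<'EOF'
map $host $project_foo_dev_upstream {
    default 127.0.0.1:8186;
}
EOF
# Start with theta as the active version
sudo ln -s /etc/nginx/includes/project_foo_dev_theta.map /etc/nginx/includes/project_foo_dev_active.map

The server block would then proxy with something like proxy_pass http://$project_foo_dev_upstream; so that flipping the symlink changes which port receives traffic on the next reload.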
How the Scripts Work Together
The deployment process follows this sequence:
- canary-carousel-ci.sh receives the deployment request and sets up the environment
- It calls canary-carousel.sh with the appropriate service-specific parameters
- The core script detects the currently active version and prepares the new version
- Docker builds and runs the new container with comprehensive health checking
- Nginx configuration is atomically switched to route traffic to the new version
- The previous version is cleaned up, completing the zero-downtime deployment
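For example, a single demo deployment of project_foo looks like this end to end (user and hostname are placeholders):

ssh gitlab-runner-deployer@deploy-host "project_foo demo"
# After a successful run, the active symlink points at the other map:
readlink /etc/nginx/includes/project_foo_demo_active.map
# e.g. /etc/nginx/includes/project_foo_demo_gamma.map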
Integration with GitLab CI/CD
The scripts can be triggered via SSH commands in your pipeline:
.gitlab-ci.yml
stages:
  - build
  - lint
  - test
  - publish
  - deploy

workflow:
  rules:
    - if: $CI_MERGE_REQUEST_ID
    - if: $CI_COMMIT_BRANCH == "main"
    - if: $CI_COMMIT_TAG

build:
  stage: build
  script:
    - git fetch --tags --unshallow || git fetch --tags # Necessary to get dynamic version and pyproject info
    - make build
  artifacts:
    paths:
      - dist/

lint:
  stage: lint
  script:
    - make lint
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

test:
  stage: test
  script:
    - make test
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

publish:
  stage: publish
  variables:
    TWINE_USERNAME: gitlab-ci-token
    TWINE_PASSWORD: $CI_JOB_TOKEN
  script:
    - git fetch --tags --unshallow || git fetch --tags # Necessary to get dynamic version and pyproject info
    - make publish
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

deploy-dev:
  stage: deploy
  script:
    - ssh canary-carousel-ci.sh "project_bar"
  environment:
    name: dev
    url: https://dev.host.com
  rules:
    - if: '$CI_COMMIT_BRANCH == "main" && $CI_COMMIT_TAG == null'

deploy-demo:
  stage: deploy
  script:
    - ssh canary-carousel-ci.sh "project_bar"
  environment:
    name: demo
    url: https://demo.host.com
  rules:
    - if: $CI_COMMIT_TAG

review-app:
  stage: deploy
  script:
    - export LINK="https://${CI_COMMIT_SHORT_SHA}.review.ultrai.com"
    - echo "to be visible at ${LINK} soon"
    - ssh canary-carousel-ci.sh "project_bar"
  environment:
    name: review/${CI_COMMIT_SHORT_SHA}
    url: https://${CI_COMMIT_SHORT_SHA}.review.ultrai.com
  rules:
    - if: $CI_MERGE_REQUEST_ID
    # - if: $CI_PIPELINE_SOURCE == "merge_request_event"
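These jobs assume the runner already holds an SSH identity that the deploy host trusts. One common pattern is loading a private key from a masked CI/CD variable in a before_script; a sketch, where DEPLOY_SSH_KEY and deploy-host are assumed names:

deploy-dev:
  before_script:
    - eval "$(ssh-agent -s)"
    - echo "$DEPLOY_SSH_KEY" | tr -d '\r' | ssh-add - # key stored as a masked CI/CD variable (assumption)
    - mkdir -p ~/.ssh && ssh-keyscan deploy-host >> ~/.ssh/known_hosts

The same before_script would be repeated (or shared via a YAML anchor or extends) in deploy-demo and review-app.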
Docker and Environment Considerations
Proper container configuration is essential. The scripts automatically handle:
- Environment file selection based on version (dev.theta.env, demo.gamma.env, etc.; see the layout sketch below)
- Container lifecycle management with automatic cleanup
- Health checks using the /health endpoint
- Port mapping from the host's theta/gamma ports to the container's internal port
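A plausible layout for a service directory under this naming convention (the exact files are assumptions; each app just needs to answer GET /health once it is up):

/home/apps/project_foo/
├── Dockerfile
├── .env              # written by the deploy script from the files below
├── dev.theta.env
├── dev.gamma.env
├── demo.theta.env
└── demo.gamma.env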
Advanced Features and Best Practices
The Canary Carousel system includes several advanced features:
- Automatic Rollback: Failed deployments automatically roll back to the previous version
- Configurable Internal Port: The -i flag maps the host port to a non-default container port for specialized services
- Comprehensive Health Checking: Repeated probes of the /health endpoint with timeout and progress monitoring
- Atomic Configuration Switching: Nginx configuration changes are applied atomically
- Color-Coded Logging: Visual feedback through colored terminal output
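If a broken release somehow survives the health check, the same symlink-and-reload mechanism works for a manual rollback; a sketch for project_foo's dev environment:

# Re-point the active map at the previous version using an atomic rename
sudo ln -sfn /etc/nginx/includes/project_foo_dev_theta.map /etc/nginx/includes/project_foo_dev_active.map.tmp
sudo mv -T /etc/nginx/includes/project_foo_dev_active.map.tmp /etc/nginx/includes/project_foo_dev_active.map
sudo nginx -t && sudo nginx -s reload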
Conclusion
The Canary Carousel deployment system provides a robust solution for zero-downtime deployments. The two-script architecture separates environment concerns from core deployment logic, making it both flexible and maintainable.