Terraform Module: Cloud Run Worker Pools For GitHub Runners

by Alex Johnson

Introduction

Efficient, scalable CI infrastructure matters to any team that relies on GitHub Actions. This article walks through a Terraform module that deploys Cloud Run Worker Pools to host self-hosted GitHub Actions runners. It covers the technical requirements, the module's three files, the acceptance criteria, and the dependencies, with the goal of a robust and cost-effective setup.

Understanding Cloud Run Worker Pools

Cloud Run Worker Pools offer long-running container instances capable of processing tasks from external queues. This makes them ideal for running GitHub Actions runners, which scale according to workflow demands. By leveraging Cloud Run Worker Pools, we can ensure that our CI/CD pipelines are both responsive and resource-efficient. This is especially important for organizations that require flexible scaling options and cost optimization.

Objective: Crafting a Terraform Module

Our primary objective is to develop a Terraform module that automates the deployment of Cloud Run Worker Pools tailored for GitHub Actions runners. This module will encapsulate the necessary configurations and dependencies, making it reusable and easy to manage. The module's design prioritizes cost optimization, scalability, and seamless integration with GitHub Actions.

Technical Requirements: Setting the Stage

Before diving into the implementation, let's outline the technical specifications for our Cloud Run Worker Pool. These requirements are designed to balance performance and cost-effectiveness. This involves carefully configuring scaling options, CPU and memory allocations, timeout settings, and concurrency limits.

Cloud Run Worker Pool Configuration

To optimize performance and cost, the following configurations are crucial:

Setting     | Value           | Rationale
----------- | --------------- | ------------------------------------
Scaling     | 0-10 instances  | Cost optimization with room to scale
CPU         | 2 vCPU          | Standard GitHub runner spec
Memory      | 4Gi             | Sufficient for most workflows
Timeout     | 3600s (1 hour)  | Max job duration
Concurrency | 1               | One job per instance

These settings ensure that our worker pool can handle varying workloads while minimizing unnecessary costs. The scaling range of 0-10 instances allows the pool to scale down to zero when idle, further reducing expenses.

Implementation Location

The Terraform module will reside in the following directory structure:

  • Directory: terraform/modules/worker-pool/
  • Files: main.tf, variables.tf, outputs.tf

This structure promotes modularity and maintainability, making it easier to manage and update the module in the future. The main.tf file will contain the primary resource definitions, variables.tf will define the input variables, and outputs.tf will declare the output values.

Terraform Implementation: The Core Logic

The heart of our solution lies in the Terraform implementation. We'll use the google-beta provider to interact with the Cloud Run Worker Pool API. The following code block demonstrates the main.tf file, which defines the worker pool resource and its configurations.

# terraform/modules/worker-pool/main.tf

terraform {
  required_providers {
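    # The worker pool resource is not part of the 5.x provider line; it was
    # added during 6.x. Check the provider changelog for the exact minimum
    # release before pinning.
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 6.0"
    }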
  }
}

# Input variables are declared in variables.tf. Declaring them here as well
# would fail with a "Duplicate variable declaration" error.

# Cloud Run Worker Pool (beta)
resource "google_cloud_run_v2_worker_pool" "runners" {
  provider = google-beta

  name     = var.name
  location = var.region
  project  = var.project_id

  template {
    containers {
      image = var.image

      resources {
        limits = {
          cpu    = var.cpu
          memory = var.memory
        }
      }

      # Environment variables
      env {
        name  = "GITHUB_ORG"
        value = var.github_org
      }

      env {
        name  = "RUNNER_LABELS"
        value = var.runner_labels
      }

      # Secrets from Secret Manager
      env {
        name = "GITHUB_APP_ID"
        value_source {
          secret_key_ref {
            secret  = var.secrets.app_id
            version = "latest"
          }
        }
      }

      env {
        name = "GITHUB_APP_INSTALLATION_ID"
        value_source {
          secret_key_ref {
            secret  = var.secrets.installation_id
            version = "latest"
          }
        }
      }

      env {
        name = "GITHUB_APP_PRIVATE_KEY"
        value_source {
          secret_key_ref {
            secret  = var.secrets.private_key
            version = "latest"
          }
        }
      }
    }

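    # Service account
    service_account = var.service_account_email
  }

  # Scaling configuration. In the worker pool schema, scaling is set on the
  # resource itself rather than inside template. Verify the block against the
  # provider documentation for the version you pin.
  scaling {
    min_instance_count = var.min_instances
    max_instance_count = var.max_instances
  }

  # timeout and max_retries are Cloud Run job settings and are not part of the
  # worker pool template, so they are omitted here. Enforce the 1-hour job
  # limit on the GitHub side instead, for example with timeout-minutes in the
  # workflow.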

  labels = {
    component  = "github-runner"
    managed-by = "terraform"
  }
}

This code snippet defines the Cloud Run Worker Pool resource, configuring its scaling, resource limits, and environment variables. It also integrates with Secret Manager to securely inject GitHub App credentials. This approach ensures that sensitive information is not hardcoded in the configuration.

Variables File: Defining Inputs

The variables.tf file defines the input variables for our module. These variables allow users to customize the deployment according to their specific needs. Each variable includes a description and a default value (where applicable). This makes the module more user-friendly and self-documenting.

# terraform/modules/worker-pool/variables.tf

variable "project_id" {
  description = "GCP Project ID"
  type        = string
}

variable "region" {
  description = "GCP Region for the worker pool"
  type        = string
  default     = "us-central1"
}

variable "name" {
  description = "Name of the worker pool"
  type        = string
  default     = "github-runners"
}

variable "image" {
  description = "Container image URL (from Artifact Registry)"
  type        = string
}

variable "service_account_email" {
  description = "Service account email for the worker pool"
  type        = string
}

variable "min_instances" {
  description = "Minimum number of instances (0 for scale-to-zero)"
  type        = number
  default     = 0
}

variable "max_instances" {
  description = "Maximum number of instances"
  type        = number
  default     = 10
}

variable "cpu" {
  description = "CPU allocation per instance"
  type        = string
  default     = "2"
}

variable "memory" {
  description = "Memory allocation per instance"
  type        = string
  default     = "4Gi"
}

variable "github_org" {
  description = "GitHub organization name"
  type        = string
  default     = "Matchpoint-AI"
}

variable "runner_labels" {
  description = "Comma-separated labels for the runner"
  type        = string
  default     = "self-hosted,cloud-run,linux,x64"
}

variable "secrets" {
  description = "Secret Manager secret IDs for GitHub App credentials"
  type = object({
    app_id          = string
    installation_id = string
    private_key     = string
  })
}

The variables defined here cover a wide range of configurations, from the GCP project ID to the GitHub organization name. This flexibility ensures that the module can be adapted to various environments and use cases.
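
Validation blocks can reject bad inputs at plan time instead of at deploy time. The sketch below shows one possible way to declare max_instances and memory with validation, in place of the plain declarations above. The accepted range (1 to 100) and the memory pattern are illustrative assumptions, not limits taken from Cloud Run.

```hcl
# Hypothetical variants of two declarations from variables.tf.
# The bounds and the regex are assumptions chosen for illustration.
variable "max_instances" {
  description = "Maximum number of instances"
  type        = number
  default     = 10

  validation {
    condition = (
      var.max_instances >= 1 &&
      var.max_instances <= 100 &&
      floor(var.max_instances) == var.max_instances
    )
    error_message = "max_instances must be a whole number between 1 and 100."
  }
}

variable "memory" {
  description = "Memory allocation per instance"
  type        = string
  default     = "4Gi"

  validation {
    # Accept values such as "512Mi" or "4Gi".
    condition     = can(regex("^[0-9]+(Mi|Gi)$", var.memory))
    error_message = "memory must be a number followed by Mi or Gi, for example 4Gi."
  }
}
```

With these in place, a call such as max_instances = 0 fails during terraform plan with the error message shown, before any API request is made.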

Outputs File: Exposing Key Identifiers

The outputs.tf file declares the module's output values: identifiers that other Terraform configurations or tooling can consume. A worker pool pulls work from an external source rather than serving requests, so it exposes no URI; the module therefore outputs only the pool's ID and name.

# terraform/modules/worker-pool/outputs.tf

output "worker_pool_id" {
  description = "The ID of the worker pool"
  value       = google_cloud_run_v2_worker_pool.runners.id
}

output "worker_pool_name" {
  description = "The name of the worker pool"
  value       = google_cloud_run_v2_worker_pool.runners.name
}

The outputs include the worker pool ID and name, which are useful for monitoring and managing the deployed infrastructure.
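
To show how the pieces fit together, here is a minimal sketch of a root configuration that calls the module. The project ID, image path, service account, and secret IDs are all placeholder assumptions; substitute the values from your own environment.

```hcl
# Root module sketch. Every literal value below is a placeholder.
provider "google-beta" {
  project = "my-gcp-project"
  region  = "us-central1"
}

module "worker_pool" {
  source = "./terraform/modules/worker-pool"

  project_id            = "my-gcp-project"
  image                 = "us-central1-docker.pkg.dev/my-gcp-project/runners/github-runner:latest"
  service_account_email = "github-runner@my-gcp-project.iam.gserviceaccount.com"

  # Secret Manager secret IDs, not the secret values themselves.
  secrets = {
    app_id          = "github-app-id"
    installation_id = "github-app-installation-id"
    private_key     = "github-app-private-key"
  }
}

# Re-export the pool ID so other tooling can read it from the root module.
output "runner_pool_id" {
  value = module.worker_pool.worker_pool_id
}
```

Variables with defaults (region, name, scaling bounds, CPU, memory, organization, and labels) are omitted here and take the values defined in variables.tf.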

Acceptance Criteria: Ensuring Quality

To ensure the quality and reliability of our Terraform module, we've established a set of acceptance criteria. These criteria cover functional requirements, code quality, and verification steps. Meeting these criteria is essential for a successful deployment.

Functional Requirements

The module must meet the following functional requirements:

  • [ ] Worker pool Terraform resource created using google-beta provider
  • [ ] Scaling configured: min=0, max=10
  • [ ] Resources: 2 vCPU, 4Gi memory
  • [ ] Timeout set to 1 hour
  • [ ] GitHub App secrets injected via Secret Manager references
  • [ ] Service account attached
  • [ ] Environment variables set for GitHub org and runner labels

These requirements ensure that the worker pool is correctly configured and integrated with the necessary services.

Code Quality Requirements

The code must adhere to the following quality standards:

  • [ ] Uses google-beta provider for worker pool resource
  • [ ] All variables have descriptions and sensible defaults
  • [ ] Outputs expose key identifiers
  • [ ] Code passes terraform fmt and terraform validate

These standards promote readability, maintainability, and adherence to best practices.

Verification Steps

To verify the deployment, we'll use the following steps:

# After terraform apply
gcloud beta run worker-pools describe github-runners \
  --region=us-central1 \
  --project="${PROJECT_ID}"

# Check scaling config
gcloud beta run worker-pools describe github-runners \
  --region=us-central1 \
  --project="${PROJECT_ID}" \
  --format="value(template.scaling)"

These commands allow us to inspect the deployed worker pool and verify its configuration. By checking the scaling configuration, we can ensure that the pool is scaling as expected.

Dependencies: Mapping the Landscape

Our Terraform module has several dependencies that must be addressed before deployment. These dependencies include IAM configurations, Secret Manager secrets, Artifact Registry images, and the runner image itself. Understanding these dependencies is crucial for a smooth deployment process.

Dependencies List

  • Blocked By: #3 (IAM), #4 (Secrets), #5 (Artifact Registry), #7 (Runner Image)
  • Blocks: #9 (Deploy worker pool)

Addressing these dependencies ensures that all necessary resources are in place before deploying the worker pool. This reduces the risk of deployment failures and ensures that the worker pool functions correctly.
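
The IAM and Secrets dependencies are tracked separately and are not shown in this article. As a rough sketch of what they need to provide, the runner's service account must be able to read each GitHub App secret. The variable names below are hypothetical, and where this configuration lives is an assumption.

```hcl
# Hypothetical IAM wiring for the secrets the worker pool reads.
# Variable names are illustrative and not part of the worker-pool module.
variable "project_id" {
  type = string
}

variable "service_account_email" {
  type = string
}

variable "secret_ids" {
  description = "Secret Manager secret IDs the runner must read"
  type        = set(string)
}

# Grant read access on each secret to the runner's service account.
resource "google_secret_manager_secret_iam_member" "runner_access" {
  for_each = var.secret_ids

  project   = var.project_id
  secret_id = each.value
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.service_account_email}"
}
```

Granting the role per secret, rather than at the project level, keeps the service account limited to the three credentials it actually needs.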

Estimated Complexity: Gauging the Effort

The implementation of this Terraform module involves a moderate level of complexity. This is due to the beta nature of the Cloud Run Worker Pools API and the need to integrate with multiple Google Cloud services. However, the benefits of a scalable and cost-effective CI/CD pipeline make the effort worthwhile.

Complexity Assessment

  • Effort: Medium
  • Risk: Medium (beta API may have quirks)
  • Files Changed: 3 files

Being aware of these complexities allows us to plan accordingly and allocate the necessary resources for the implementation.

Notes: Important Considerations

Keep the following points in mind during implementation to avoid common pitfalls.

Key Considerations

  • Cloud Run Worker Pools is a beta feature
  • May need to enable the beta API
  • The google-beta provider is required
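
If the required APIs are not yet enabled in the project, Terraform can enable them. The sketch below assumes worker pools are served through the Cloud Run Admin API (run.googleapis.com) and that Secret Manager is also needed for the credential references; the resource names are illustrative.

```hcl
# Sketch: enable the APIs the worker pool depends on.
variable "project_id" {
  type = string
}

resource "google_project_service" "run" {
  project = var.project_id
  service = "run.googleapis.com"

  # Leave the API enabled if this resource is later destroyed.
  disable_on_destroy = false
}

resource "google_project_service" "secretmanager" {
  project = var.project_id
  service = "secretmanager.googleapis.com"

  disable_on_destroy = false
}
```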

Definition of Done: Setting the Goal

To mark the successful completion of this project, we've defined a clear set of criteria. These criteria encompass the creation of the Terraform module, adherence to coding standards, and successful validation. Meeting these criteria signifies that the module is ready for production use.

Completion Criteria

  • [ ] Terraform module created at terraform/modules/worker-pool/
  • [ ] Module uses google-beta provider correctly
  • [ ] All variables documented
  • [ ] Code passes validation
  • [ ] PR merged to main branch

Conclusion

Implementing a Terraform module for Cloud Run Worker Pools is a significant step towards building a scalable and cost-effective CI/CD pipeline. By following the guidelines and best practices outlined in this article, you can create a robust solution that meets your organization's needs. The module's modular design and comprehensive documentation make it easy to manage and maintain, ensuring long-term value. For further reading on Terraform and Cloud Run, consider exploring the official Terraform documentation and Google Cloud Run documentation.