6 ways to speed up your CI

Waiting for CI to finish slows down development and can be extremely annoying, especially when CI fails and you have to run it again. Let's look at ways to speed up your CI and cut the time developers spend waiting for it to finish.

We'll go through 6 different methods to speed up CI:

  • Makisu as your Docker build tool
  • Caching between Stages
  • Running Tests across all CPUs
  • Concurrent Jobs for each Stage
  • Parallel Tasks
  • Autoscaling CI Runners

Building Docker Images with Makisu + Redis KV storage

We've tried three approaches: Docker's default image building with --cache-from, Google's Kaniko with --cache=true, and Uber's Makisu with Redis KV storage.

Kaniko is Google's image build tool. It builds images without a Docker daemon, which makes it a good fit for building images within a Kubernetes cluster, as it is not possible to run a Docker daemon in a standard Kubernetes cluster.

Makisu is also an image build tool that does not rely on the Docker daemon, so it works well within a Kubernetes cluster. It also adds a couple of new things, e.g. distributed cache support and customizable layer caching, which lets you choose which layers to cache.

Docker provides its default image building with options such as --cache-from to choose which images to cache from. Building images with Docker is not an option if you are running CI within a standard Kubernetes cluster.

We have had the fastest CI with Makisu, even though the setup is more complex because it requires deploying Redis.

With Makisu we got the following perks:

  • Distributed cache support: builds that share Dockerfile directives can share cache. The cache hit rate across branches is better than with Docker's --cache-from, which works best as a single-branch cache.
  • Great image compression for large images.
  • Customizable layer generation and caching: you choose which layers to generate and cache with the #!COMMIT annotation.
  • Overall, we experienced faster build times than with the other build tools.

You can read more on why Makisu is great here. Now let's get going by deploying a Redis node for Makisu.

Deploying Redis (AWS Elasticache) with Terraform

Redis is an in-memory key-value store. We deploy it alongside Makisu because Makisu uses it as a key-value store that maps Dockerfile directives to the hashes of the layers those directives produce. Elasticache is a fully managed Redis solution by AWS.
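
To make the role of the key-value store more concrete, here is a minimal conceptual sketch in Python. It is not Makisu's implementation: the key derivation (a SHA-256 chained over the parent key and the directive text), the plain dict standing in for Redis, and all function names are assumptions made for illustration. It also ignores that a real builder has to take the content of copied files into account.

```python
import hashlib


def cache_key(parent_key: str, directive: str) -> str:
    """Chain the parent key with the directive text, so a change early in
    the Dockerfile changes every key that follows it."""
    return hashlib.sha256(f"{parent_key}\n{directive}".encode()).hexdigest()


def record_build(directives, layer_digests, kv):
    """After a build, store cache key -> layer digest for every directive."""
    key = ""
    for directive, digest in zip(directives, layer_digests):
        key = cache_key(key, directive)
        kv[key] = digest


def plan_build(directives, kv):
    """Return (directive, cached layer digest or None) for each directive."""
    plan = []
    key = ""
    for directive in directives:
        key = cache_key(key, directive)
        plan.append((directive, kv.get(key)))
    return plan


# A dict stands in for the shared Redis node (assumption for the sketch).
kv = {}
main_branch = [
    "FROM python:3.8",
    "COPY requirements.txt .",
    "RUN pip install -r requirements.txt",
]
# The layer digests are made-up placeholders.
record_build(main_branch, ["sha256:aaa", "sha256:bbb", "sha256:ccc"], kv)

# A build on another branch shares the first three directives.
feature_branch = main_branch + ["COPY . /app"]
plan = plan_build(feature_branch, kv)
```

Because the store is shared, the feature branch build finds the first three layers even though it has never been built before; only the last directive is a cache miss.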

Below is an example of how to deploy an Elasticache Redis service with Terraform. If you intend to deploy Redis in a different way, or already have, skip to the next section.

We'll create an Elasticache subnet group in your chosen VPC for the Redis node with Terraform:

resource "aws_elasticache_subnet_group" "gitlab_runner" {
  name       = "gitlab-runner"
  # Replace gitlab_runner_vpc with the name of your own VPC module.
  subnet_ids = module.gitlab_runner_vpc.private_subnets
}

Then we'll create the Elasticache Redis cluster within that subnet group:

resource "aws_elasticache_cluster" "gitlab_runner" {
  cluster_id           = "gitlab-runner"
  engine               = "redis"
  node_type            = "cache.t3.micro"
  num_cache_nodes      = 1
  subnet_group_name    = aws_elasticache_subnet_group.gitlab_runner.name
  parameter_group_name = "default.redis5.0"
  engine_version       = "5.0.5"
  port                 = 6379
}

Note that the Elasticache node needs to be in the same VPC (Virtual Private Cloud, used to share a network among AWS resources) as the CI runners; otherwise the runners won't be able to access the Redis service.

Creating a VPC shared by the Elasticache node and your CI runners can be simplified with the Terraform VPC module:

module "gitlab_runner_vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "gitlab_runner_vpc"
  cidr = "10.1.0.0/16"

  azs             = ["eu-west-1a"]
  private_subnets = ["10.1.1.0/24"]
  public_subnets  = ["10.1.101.0/24"]

  map_public_ip_on_launch = "false"

  tags = {
    Environment = var.environment
    Terraform   = true
  }
  ...
}
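
Before applying a network layout like this, it can help to sanity-check the address ranges. The following is a small optional sketch in Python using the standard ipaddress module; the CIDR values are copied from the module block above, while the validate helper and its messages are hypothetical and not part of Terraform or the VPC module.

```python
import ipaddress


def validate(vpc, subnets):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for net in subnets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is not inside {vpc}")
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


# Values taken from the gitlab_runner_vpc module block.
vpc = ipaddress.ip_network("10.1.0.0/16")
private_subnets = [ipaddress.ip_network("10.1.1.0/24")]
public_subnets = [ipaddress.ip_network("10.1.101.0/24")]

problems = validate(vpc, private_subnets + public_subnets)
print(problems or "subnets fit inside the VPC and do not overlap")
```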

Now you can specify the VPC created for both the CI VMs and the Elasticache subnet group. For example, we use the terraform-aws-gitlab-runner module to deploy Gitlab runners on cheap spot instances. As shown below, you set vpc_id = module.gitlab_runner_vpc.vpc_id to place the CI VMs into the VPC created above.

module "gitlab_runner" {
  source = "npalm/gitlab-runner/aws"

  aws_region  = var.aws_region
  environment = var.environment

  vpc_id                   = module.gitlab_runner_vpc.vpc_id
  subnet_ids_gitlab_runner = module.gitlab_runner_vpc.private_subnets
  runners_name             = "gitlab_runner_honeylogic"
  runners_gitlab_url       = "https://gitlab.com"
  runners_concurrent       = "5"
  runners_limit            = "5"
  runners_idle_time        = "1800"
  runners_idle_count       = "0"

  instance_type = "t3.large"
  ...
}

Then, when you create the Elasticache subnet group, specify the subnets of the VPC you created previously:

resource "aws_elasticache_subnet_group" "gitlab_runner" {
  name       = "gitlab-runner"
  subnet_ids = module.gitlab_runner_vpc.private_subnets
}

Now you have Gitlab runners with access to a Redis (Elasticache) node, which Makisu uses as a key-value store for caching.

Using Makisu

Now that Redis is set up, we'll set up the Gitlab-CI stages (we use Gitlab-CI, but tailor the setup to your own CI). We create templates for common CI builds so they become reusable across projects. The template below builds a Docker image and tags it with latest and the short git commit SHA.

gitlab-ci-makisu-build.yml

.build:
  extends:
    - .default_vars
  stage: build
  image:
    name: gcr.io/makisu-project/makisu-alpine:v0.1.12
    entrypoint: [""]
  before_script:
    - echo "{\"$CI_REGISTRY\":{\".*\":{\"security\":{\"basic\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}}}" > $REGISTRY_CONFIG
  variables:
    REDIS_CACHE_ADDRESS: my-redis-address:6379
    REGISTRY_CONFIG: /registry-config.yaml

build:default:
  extends:
    - .default_vars
    - .build
  script:
    - /makisu-internal/makisu --log-fmt=console build --push=$CI_REGISTRY --modifyfs -t=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA --replica $CI_REGISTRY_IMAGE --redis-cache-addr=$REDIS_CACHE_ADDRESS --registry-config=$REGISTRY_CONFIG $CI_PROJECT_DIR

To help you grasp the YAML above and the predefined environment variables it uses, here is how the job operates:

  • Gitlab-CI clones the repository and populates a number of predefined environment variables.
  • We echo the credentials into a registry config file, which Makisu uses for authorization, using predefined variables such as $CI_REGISTRY (the CI registry for the project) and $CI_REGISTRY_USER/$CI_REGISTRY_PASSWORD.
  • The .build definition is only a template for all other Makisu build jobs, so common tasks are stored in it. Templating jobs is a Gitlab-CI feature.
  • We create two variables, REGISTRY_CONFIG (the config file location) and REDIS_CACHE_ADDRESS, which are accessible by any job that extends .build.
  • Now we can create the build:default job, which extends .build and adds the script where we use Makisu to build the image.
  • We pass a couple of flags: -t (required) sets the tag for the image, and --replica specifies additional tags to push the image as. As mentioned previously, we use image:latest and image:git_commit_short_sha as tags, as you can see in the command.
  • Remember to specify --redis-cache-addr so Makisu uses Redis as its KV store. You also always need to specify the build context, which is $CI_PROJECT_DIR in our case.
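
The hand-escaped echo in before_script is easy to break if a password contains quotes or backslashes. As an alternative, here is a minimal Python sketch that produces the same nested structure with a JSON serializer. The registry_config function name and the sample values are assumptions; in a CI job the values would come from the predefined environment variables. The sketch also assumes a Python interpreter is available wherever the file is generated, which may not be the case inside the Makisu image.

```python
import json


def registry_config(registry: str, username: str, password: str) -> str:
    """Serialize the registry credentials in the same shape as the echo
    command in the .build template; json.dumps handles the escaping."""
    config = {
        registry: {
            ".*": {
                "security": {
                    "basic": {"username": username, "password": password}
                }
            }
        }
    }
    return json.dumps(config)


# Made-up sample values; the password contains characters that would
# break naive shell quoting.
rendered = registry_config("registry.example.com", "ci-user", 'p"a\\ss')
print(rendered)
```

JSON is also valid YAML, which is why the resulting text can be written to a file with a .yaml extension, as the template does.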

Then we include the Makisu build stage in any repository where we need to build docker images:

include:
  - project: 'honeylogic/gitlab-ci-templates'
    ref: master
    file: 'gitlab-ci-base.yml'
  - project: 'honeylogic/gitlab-ci-templates'
    ref: master
    file: 'gitlab-ci-makisu-build.yml'

I've written a blog post on Gitlab-CI templates, which will help you grasp how the pieces work together and how you extend and reuse stages from templates.

Uber's Makisu speeds up Docker builds considerably (30-40% on average for us; for Uber itself builds are from 40% on average up to 90% faster), and it has many more options. You do not have to use Redis; you can use an HTTP cache instead. You can also avoid caching every Dockerfile directive, since caching and layering some steps can be excessive. An example is shown below, used with the --commit=explicit flag: add #!COMMIT and Makisu will only cache the layers of directives carrying the #!COMMIT comment.

Dockerfile

RUN pip install -r requirements.txt #!COMMIT
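
To check which directives in a Dockerfile carry the annotation, a small helper can list them. This is a hypothetical Python sketch, not part of Makisu: it only looks for lines ending in #!COMMIT, skips blank and comment-only lines, and does not handle multi-line directives joined with backslashes.

```python
COMMIT_MARKER = "#!COMMIT"


def committed_directives(dockerfile_text: str) -> list:
    """Return the directives that end with the #!COMMIT annotation."""
    committed = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and ordinary comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith(COMMIT_MARKER):
            committed.append(stripped[: -len(COMMIT_MARKER)].rstrip())
    return committed


# Made-up Dockerfile for illustration.
sample = """\
FROM python:3.8
# install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt #!COMMIT
COPY . /app
"""

annotated = committed_directives(sample)
print(annotated)
```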

Caching between Stages

This section is unrelated to the Docker image build caching above.

When running CI you'll most likely have several stages, and sometimes you run similar commands across stages. Dependencies are a typical example, e.g. the NPM modules of a React app. You do not want to install all NPM modules in every stage; instead you want to cache them and reuse them so repetitive work does not slow down the CI. With Gitlab, caching between stages is achieved by defining the cache keyword in your CI YAML file.

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
  - node_modules/

Now the node modules do not need to be recreated for each stage. Many other CIs, such as Travis-CI and Circle-CI, support this as well.
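
The cache key above uses CI_COMMIT_REF_SLUG, so each branch gets its own cache. Gitlab documents the slug as the ref name lowercased, with everything except 0-9 and a-z replaced by "-", shortened to 63 bytes, and without leading or trailing "-". The Python sketch below approximates that rule so you can see which branches end up sharing a key; the ref_slug function is an illustration, not Gitlab's code.

```python
import re


def ref_slug(ref_name: str) -> str:
    """Approximate how CI_COMMIT_REF_SLUG is derived from a ref name."""
    slug = re.sub(r"[^a-z0-9]", "-", ref_name.lower())
    return slug[:63].strip("-")


# Made-up branch names.
for branch in ["master", "Feature/Add-Login", "release/1.2.x/"]:
    print(branch, "->", ref_slug(branch))
```

One consequence of a per-branch key is that the first pipeline on a new branch starts with an empty cache.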

Running Tests across all CPUs

Spreading your tests across all CPUs will speed up CI; make sure you run VMs that are compute-optimized. As I am most familiar with Python, I'll show a Python example.

Python with Pytest-xdist

Achieving this is simple when running Python tests with pytest and the pytest-xdist plugin. Pytest-xdist is a plugin that, among other things, allows tests to run in parallel.

Install pytest-xdist with pip:

pip install pytest-xdist

Add the -n argument with the number of cores to use, or auto to automatically detect the number of cores the machine has.

pytest -n [auto/number of cores]

The speedup won't be very noticeable for small test suites, but for anything larger it is great. Remember to use compute-optimized VMs with several vCPUs to maximize the effect of parallelization.
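
To get a feel for why small suites gain little, here is a rough Python model. It is an idealized estimate, not how pytest-xdist schedules tests: it assigns tests greedily, longest first, to the least-loaded worker, and the per-worker startup_overhead parameter is an assumed cost for spinning up each worker process. All durations are made-up numbers.

```python
import heapq


def estimated_wall_time(durations, workers, startup_overhead=0):
    """Estimate the wall-clock time of a test run split across workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Every worker starts with the assumed startup cost already spent.
    loads = [startup_overhead] * workers
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


large_suite = [4, 3, 2, 1]
print(estimated_wall_time(large_suite, workers=1))  # serial run
print(estimated_wall_time(large_suite, workers=2))  # split over two workers

small_suite = [1, 1]
print(estimated_wall_time(small_suite, workers=1))
print(estimated_wall_time(small_suite, workers=2, startup_overhead=2))
```

In this model the larger suite halves its run time on two workers, while the small suite gets slower once the assumed startup cost is included.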

Concurrent Jobs for each Stage

Spread the jobs of each stage across several VMs and run them concurrently. Do not stack jobs sequentially; e.g. run linting and testing as separate jobs in the same stage, as they do not depend on each other. Gitlab supports this by default: all jobs in a stage run concurrently on different Gitlab-Runner VMs, provided enough runners are available. For Gitlab-CI you only need to set the runner limit and the concurrency limit higher than 1. With the Terraform module mentioned previously you can do this by:

module "gitlab_runner" {
  source = "npalm/gitlab-runner/aws"
  runners_concurrent = "5"
  runners_limit = "5"
}

You can read more in Gitlab's docs to grasp how to fully customize it. Other CIs, e.g. Circle-CI, provide the same functionality.
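
The effect on total pipeline time can be sketched with a little arithmetic. The Python snippet below assumes stages run one after another, jobs within a stage run concurrently, and there are always enough runners for every job in a stage. The stage names, job names and durations (in minutes) are made up.

```python
def sequential_duration(stages):
    """Every job runs one after another."""
    return sum(sum(jobs.values()) for jobs in stages.values())


def concurrent_duration(stages):
    """Jobs in a stage run concurrently, so a stage takes as long as its
    slowest job; stages still run in order."""
    return sum(max(jobs.values()) for jobs in stages.values())


pipeline = {
    "check": {"lint": 2, "test": 8},
    "build": {"image": 5},
}

print(sequential_duration(pipeline))
print(concurrent_duration(pipeline))
```

The gain comes entirely from stages that hold more than one job, which is why independent jobs such as linting and testing belong in the same stage.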

Parallel Tasks

We use Task as a task runner for CI. Task is a task runner / build tool which I prefer due to its syntax. In Task, all dependencies of a task run in parallel, which makes tasks complete faster.

With Task you can specify dependencies with deps:

test:
  deps:
    - setup xyz
    - setup abc
    - setup efg
  desc: Run tests
  cmds:
    - pytest -n auto --cov

All dependencies run in parallel for better and faster performance.
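
The same pattern, run the dependencies in parallel, wait for all of them, then run the main command, can be sketched in Python with a thread pool. This only illustrates the idea and is not how Task is implemented; run_task and step are hypothetical helpers, and the step names mirror the example above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_task(deps, cmd):
    """Run all deps in parallel, wait for them to finish, then run cmd."""
    if deps:
        with ThreadPoolExecutor(max_workers=len(deps)) as pool:
            # list() waits for every dep and re-raises a failure, if any.
            list(pool.map(lambda dep: dep(), deps))
    return cmd()


log = []
log_lock = threading.Lock()


def step(name):
    """Build a callable that records its name when it runs."""
    def run():
        with log_lock:
            log.append(name)
        return name
    return run


result = run_task(
    [step("setup xyz"), step("setup abc"), step("setup efg")],
    step("test"),
)
print(log)
```

The order of the three setup entries in the log can vary between runs, but the test step always comes last because it only starts once every dependency has finished.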

Autoscaling your CI Runners

Create and manage your own runners, use AWS/GCP spot instances, and autoscale your runners so that jobs do not wait around. This is easily achievable with the Terraform module terraform-aws-gitlab-runner: just increase the runner limit and the number of concurrent runners. If you are worried about costs, lower the idle time (runners_idle_time) so instances scale down when unused. Make sure to use spot instances, as they are around 70% cheaper.

Summary

We have shown several approaches to speeding up your CI, some trivial and some more advanced. Even though the focus was on Gitlab-CI and Terraform, the same functionality should be achievable without Terraform and on a different CI platform. Long CI runs block and annoy developers, so we try to minimize CI completion time as much as possible.