Monitoring Kubernetes InitContainers with Prometheus

Adin Hodovic Nov. 30, 2019 1 min read
Responsive image

Kubernetes InitContainers are a neat way to run arbitrary code before your container starts. It ensures that certain pre-conditions are met before your app is up and running. For example it allows you to:

  • run database migrations with Django or Rails before your app starts
  • ensure a microservice or API you depend on to is running

Unfortunately InitContainers can fail and when that happens you probably want to be notified because your app will never start. Kube-state-metrics exposes plenty of Kubernetes cluster metrics for Prometheus. Combining the two we can monitor and alert whenever we discover container problems. Recently, a pull-request was merged that provides InitContainer data.

The metric kube_pod_init_container_status_last_terminated_reason tells us why a specific InitContainer failed to run; whether it's because it timed out or ran into errors.

To use the InitContainer metrics deploy Prometheus and kube-state-metrics. Then target the metrics server in your Prometheus scrape_configs to ensure we're pulling all the cluster metrics into Prometheus:

- job_name: 'kube-state-metrics'
    - targets: ['kube-state-metrics:8080']

kube_pod_init_container_status_last_terminated_reason contains the metric label reason that can be in five different states:

  • Completed
  • OOMKilled
  • Error
  • ContainerCannotRun
  • DeadlineExceeded

We want to be alerted whenever a metric that is not 'Completed' is scraped because that means an InitContainer has failed to run. Here is an example alerting rule.

  - name: Init container failure
      - alert: InitContainersFailed
        expr: kube_pod_init_container_status_last_terminated_reason{reason!="Completed"} == 1
          summary: '{{ $labels.container }} init failed'
          description: '{{ $labels.container }} has not completed init containers with the reason {{ $labels.reason }}'

Happy monitoring!