Continuous integration infrastructure
This page consolidates resources about our CI infrastructure, namely our fleet of Buildkite agents. This infrastructure is maintained by the DevInfra team.
Related resources:
Buildkite agents
We maintain a shared fleet of Buildkite agents for continuous integration across all repositories.
- Active agents
- Terraform and Kubernetes manifests
- Images:
- Specific resources:
Buildkite agent queues
We have several types of agents available. We recommend explicitly declaring which type of agent your jobs should run on with the agents: { queue: "standard" } field in your pipeline configuration.
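As a minimal sketch, the following Python builds a pipeline in which every step names its queue explicitly, then prints it as JSON. buildkite-agent pipeline upload accepts JSON as well as YAML. The command_step helper, the step labels and the commands are hypothetical. Only the agents: { queue: ... } field comes from this page.

```python
import json


def command_step(label: str, command: str, queue: str = "standard") -> dict:
    """Build a Buildkite command step that explicitly targets an agent queue."""
    return {
        "label": label,
        "command": command,
        "agents": {"queue": queue},
    }


# Hypothetical steps, for illustration only.
pipeline = {
    "steps": [
        command_step("Lint", "./dev/lint.sh"),
        command_step("Bazel tests", "bazel test //...", queue="bazel"),
    ]
}

# This output can be piped into `buildkite-agent pipeline upload`.
print(json.dumps(pipeline, indent=2))
```

Defaulting the helper to "standard" mirrors the recommendation above. Steps still get an explicit queue in the generated pipeline, even when the author does not think about it.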
The currently available queues:
- standard: our default Buildkite agents, which are stateless (currently Docker-in-Docker agents running in Kubernetes). Use them for any non-Bazel task: because they are stateless by design, no state left over from one build can affect later builds.
- bazel: our Bazel Buildkite agents, which are stateful (currently Docker-in-Docker agents running in Kubernetes). Use them for any Bazel task: Bazel guarantees hermeticity, so a given build won’t affect subsequent builds on the same agent.
- macos: a stateful agent, currently backed by a single host running macOS. GCP does not provide instances that run macOS, which is why the host for this agent can be found in the AWS us-ohio-2 region.
- vagrant: special Buildkite agents designed to run resource-intensive tests on Docker deployments.
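The queue guidance above can be summarized as a small decision helper. This is a hypothetical sketch, not an existing tool. The flag names and the precedence between them (macOS first, then deployment tests, then Bazel) are our own assumptions.

```python
from enum import Enum


class Queue(str, Enum):
    STANDARD = "standard"  # stateless default: no state leaks between builds
    BAZEL = "bazel"        # stateful: relies on Bazel's hermeticity
    MACOS = "macos"        # stateful: single macOS host in AWS
    VAGRANT = "vagrant"    # resource-intensive tests on Docker deployments


def choose_queue(*, uses_bazel: bool = False, needs_macos: bool = False,
                 deployment_test: bool = False) -> Queue:
    """Pick an agent queue for a step (precedence between flags is an assumption)."""
    if needs_macos:
        return Queue.MACOS
    if deployment_test:
        return Queue.VAGRANT
    if uses_bazel:
        return Queue.BAZEL
    # Anything else runs on the stateless default agents.
    return Queue.STANDARD


print({"agents": {"queue": choose_queue(uses_bazel=True).value}})
```

Because Queue subclasses str, its members compare equal to the plain queue names used in pipeline configuration.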
buildkite-job-dispatcher
Our stateless Buildkite agents are deployed in batches as Kubernetes Jobs, with each batch sized according to the Buildkite backlog. Each agent runs its workload and then exits.
This is managed by the buildkite-job-dispatcher:
Another potentially fragile component of this system is buildkite-git-references, a cron job and set of GCP disks that speed up pipelines by reducing the amount of cloning required.
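The following sketch illustrates how a reference clone reduces cloning. It builds a git clone command using git's --reference flag, which lets the new clone borrow objects from a local repository instead of downloading them. It assumes, hypothetically, that the buildkite-git-references disks are mounted on the agent with one repository per directory under a common root. clone_args and that layout are illustrative, not the actual setup.

```python
import os
from typing import List


def clone_args(repo_url: str, dest: str, reference_root: str) -> List[str]:
    """Build a `git clone` command that reuses a local reference copy when one exists."""
    # Derive the repository name, e.g. ".../org/repo.git" -> "repo".
    name = repo_url.rstrip("/")
    if name.endswith(".git"):
        name = name[: -len(".git")]
    name = name.rsplit("/", 1)[-1]
    # Hypothetical layout: <reference_root>/<repo name> holds a recent clone.
    reference = os.path.join(reference_root, name)
    args = ["git", "clone"]
    if os.path.isdir(reference):
        # Borrow objects from the local copy; only objects it lacks are fetched.
        args += ["--reference", reference]
    return args + [repo_url, dest]
```

The resulting list can be passed to subprocess.run(..., check=True). If the cron job has not populated the disk, the sketch falls back to a plain clone, so builds keep working, only more slowly. Because --reference leaves the clone relying on the reference repository's objects, this approach suits short-lived agents that exit after their job.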
Relevant runbooks:
An overview of how the buildkite-job-dispatcher works (diagram adapted from here):
sequenceDiagram
  participant ba as buildkite-job-dispatcher
  participant k8s as CI Kubernetes cluster
  participant bk as Buildkite.com
  participant gh as GitHub.com
  loop
    gh->>bk: enqueue jobs
    activate bk
    ba->>bk: list queued jobs and total agents
    bk-->>ba: queued jobs, total agents
    activate ba
    ba->>ba: determine required agents
    alt queue needs agents
      ba->>k8s: get template Job
      activate k8s
      k8s-->>ba: template Job
      deactivate k8s
      ba->>k8s: get buildkite-git-references volume
      activate k8s
      k8s-->>ba: volume
      deactivate k8s
      ba->>ba: modify Job template
      ba->>k8s: dispatch new Job
      activate k8s
      k8s->>bk: register agents
      bk-->>k8s: assign jobs to agents
      loop while % of Pods not online or completed
        par deployed agents process jobs
          k8s-->>bk: report completed jobs
          bk-->>gh: report pipeline status
          deactivate bk
        and check previous dispatch
          ba->>k8s: list Pods from dispatched Job
          k8s-->>ba: Pod states
        end
      end
    end
    deactivate ba
    k8s->>k8s: Clean up completed Jobs
    deactivate k8s
  end
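The following is a minimal Python sketch of one reconciliation pass from the diagram: determine required agents, modify the Job template, and check whether the previous dispatch has settled. It makes several assumptions for illustration, and none of them describes the dispatcher's actual logic:

- the scaling rule (one agent per queued job not already covered, capped by a hypothetical max_batch)
- the Job naming scheme
- the persistentVolumeClaim volume and its claim name
- treating a Running Pod as "online"
- the readiness threshold

Manifests are plain dicts using real Kubernetes Job fields (parallelism, completions, volumes). The actual Kubernetes and Buildkite API calls are left out.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueueState:
    queued_jobs: int   # jobs waiting in the Buildkite queue
    total_agents: int  # agents already registered or being started


def required_agents(state: QueueState, max_batch: int) -> int:
    """Hypothetical rule: one new agent per queued job not yet covered, capped per dispatch."""
    missing = state.queued_jobs - state.total_agents
    return max(0, min(missing, max_batch))


def build_job(template: dict, agents: int, dispatch_id: str, git_refs_claim: str) -> dict:
    """Copy the template Job and adapt it for one dispatch (the template is not modified)."""
    job = copy.deepcopy(template)
    job["metadata"]["name"] = f"{template['metadata']['name']}-{dispatch_id}"
    # Run one Pod per agent, all in parallel; each agent exits after its work.
    job["spec"]["parallelism"] = agents
    job["spec"]["completions"] = agents
    # Attach the git references disk (hypothetical claim name); the matching
    # volumeMount on the agent container is omitted for brevity.
    pod_spec = job["spec"]["template"]["spec"]
    pod_spec.setdefault("volumes", []).append({
        "name": "buildkite-git-references",
        "persistentVolumeClaim": {"claimName": git_refs_claim, "readOnly": True},
    })
    return job


def plan_dispatch(state: QueueState, template: dict, dispatch_id: str,
                  git_refs_claim: str, max_batch: int = 50) -> Optional[dict]:
    """Return a Job manifest to create, or None if the queue has enough agents."""
    agents = required_agents(state, max_batch)
    if agents == 0:
        return None
    return build_job(template, agents, dispatch_id, git_refs_claim)


def dispatch_settled(pod_phases: List[str], threshold: float) -> bool:
    """True once `threshold` of the previous dispatch's Pods are online or finished."""
    if not pod_phases:
        return False
    # "Running" approximates "online"; Succeeded/Failed count as completed.
    ready = sum(phase in ("Running", "Succeeded", "Failed") for phase in pod_phases)
    return ready / len(pod_phases) >= threshold
```

Keeping the decision logic as pure functions over plain data makes it testable without a cluster. A real dispatcher would wrap these functions in a loop that fetches queue state from Buildkite and Pod phases from Kubernetes.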