Cloud Relay infrastructure operations
This document describes operational guidance for Cloud Relay infrastructure. This service is operated on the Managed Services Platform (MSP).
If you need assistance with MSP infrastructure, reach out to the Core Services team in #discuss-core-services.
Service overview
PROPERTY | DETAILS |
---|---|
Service ID | cloud-relay (specification) |
Owners | cloud |
Service kind | Cloud Run service |
Environments | prod |
Docker image | us-central1-docker.pkg.dev/control-plane-5e9ee072/docker/cloud-relay |
Source code | https://github.com/sourcegraph/cloud-relay - . |
Environments
prod
PROPERTY | DETAILS |
---|---|
Project ID | cloud-relay-prod-bd4c |
Category | internal |
Deployment type | manual |
Resources | |
Slack notifications | #alerts-cloud-relay-prod |
Alert policies | GCP Monitoring alert policies list, Dashboard |
Errors | Sentry cloud-relay-prod |
Domain | cloud-relay.sgdev.org |
Cloudflare WAF | ✅ |
MSP infrastructure access needs to be requested using Entitle for time-bound privileges.
ACCESS | ENTITLE REQUEST TEMPLATE |
---|---|
GCP project read access | Read-only Entitle request for the ‘Internal Services’ folder |
GCP project write access | Write access Entitle request for the ‘Internal Services’ folder |
For Terraform Cloud access, see prod Terraform Cloud.
prod Cloud Run
The Cloud Relay prod service implementation is deployed on Google Cloud Run.
PROPERTY | DETAILS |
---|---|
Console | Cloud Run service |
Service logs | GCP logging |
Service traces | Cloud Trace |
Service errors | Sentry cloud-relay-prod |
You can also use sg msp to quickly open a link to your service logs:
sg msp logs cloud-relay prod
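If a terminal view is more convenient than the console, the same logs can usually be read with gcloud logging read. The sketch below only builds the log filter. It assumes the Cloud Run service is named after the service ID (cloud-relay), which this document does not confirm, and cloud_run_log_filter is a hypothetical helper name rather than part of sg msp.

```shell
# Build a Cloud Logging filter for a Cloud Run service.
# Assumption: the Cloud Run service name matches the MSP service ID.
cloud_run_log_filter() {
  service="$1"
  min_severity="${2:-DEFAULT}"   # e.g. WARNING, ERROR; DEFAULT matches everything
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s" AND severity>=%s' \
    "$service" "$min_severity"
}

# Example (requires read access to the project, see the Entitle table above):
#   gcloud logging read "$(cloud_run_log_filter cloud-relay ERROR)" \
#     --project=cloud-relay-prod-bd4c --freshness=1h --limit=20
```

The project ID in the example is the prod project listed under Environments. Adjust the severity and freshness window to the incident you are investigating.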
prod Architecture Diagram
prod Terraform Cloud
This service’s configuration is defined in sourcegraph/managed-services/services/cloud-relay/service.yaml, and sg msp generate cloud-relay prod generates the required infrastructure configuration for this environment in Terraform.
Terraform Cloud (TFC) workspaces specific to each service then provision the required infrastructure from this configuration.
You may want to check your service environment’s TFC workspaces if a Terraform apply fails (reported via GitHub commit status checks in the sourcegraph/managed-services repository, or in #alerts-msp-tfc).
To access this environment’s Terraform Cloud workspaces, you will need to log in to Terraform Cloud and then request Entitle access to membership in the “Managed Services Platform Operator” TFC team. The “Managed Services Platform Operator” team has access to all MSP TFC workspaces.
The Terraform Cloud workspaces for this service environment are grouped under the msp-cloud-relay-prod tag, or you can use:
sg msp tfc view cloud-relay prod
Alert Policies
The following alert policies are defined for each of this service’s environments.
High Container CPU Utilization
High CPU Usage - it may be necessary to reduce load or increase CPU allocation
Severity: WARNING
High Container Memory Utilization
High Memory Usage - it may be necessary to reduce load or increase memory allocation
Severity: WARNING
Container Startup Latency
Service containers are taking longer than configured timeouts to start up.
Severity: WARNING
Cloud Run Pending Requests
There are requests pending - we may need to increase Cloud Run instance count, request concurrency, or investigate further.
Severity: WARNING
Cloud Run Instance Precondition Failed
Cloud Run instance failed to start due to a precondition failure.
This is unlikely to cause immediate downtime, and may auto-resolve if no new instances are created and/or we return to a healthy state, but you should follow up to ensure the latest Cloud Run revision is healthy.
Severity: WARNING
External Uptime Check
Service is failing to respond on https://cloud-relay.sgdev.org - this may be expected if the service was recently provisioned or if its external domain has changed.
Severity: CRITICAL
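When this alert fires, a quick manual probe can help confirm whether the domain is reachable from your machine. The following is a minimal sketch, not the actual uptime check: probe_status and is_healthy_status are hypothetical helper names, and treating any 2xx or 3xx response as healthy is an assumption, since the real check's success criteria are not described in this document.

```shell
# Treat 2xx and 3xx responses as healthy.
# Assumption: the real uptime check may use different criteria.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the HTTP status code returned by a URL. curl prints 000 when no
# response is received at all (DNS failure, TLS error, timeout).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time "${2:-10}" "$1"
}

# Example:
#   code="$(probe_status https://cloud-relay.sgdev.org)"
#   if is_healthy_status "$code"; then echo "up ($code)"; else echo "down ($code)"; fi
```

Because the domain sits behind the Cloudflare WAF, a probe from an unusual network may be rejected even when the service itself is healthy, so treat a failing result as a prompt to check the Cloud Run console and service logs rather than as proof of an outage.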