Sourcegraph Accounts infrastructure operations
This document describes operational guidance for Sourcegraph Accounts infrastructure. This service is operated on the Managed Services Platform (MSP).
If you need assistance with MSP infrastructure, reach out to the Core Services team in #discuss-core-services.
Service overview
PROPERTY | DETAILS |
---|---|
Service ID | sourcegraph-accounts (specification) |
Owners | core-services |
Service kind | Cloud Run service |
Environments | dev, prod |
Docker image | us-central1-docker.pkg.dev/sourcegraph-dev/sourcegraph-accounts/accounts-server |
Source code | github.com/sourcegraph/sourcegraph-accounts - cmd/accounts-server |
Operators cheat sheet
Get email domain stats
For Google sign-in abuse protection.
$ curl -s \
-H "Authorization: Bearer $MANAGEMENT_SECRET" \
https://accounts.sourcegraph.com/api/management/v1/email-domain-stats | jq
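The management API calls in this cheat sheet all send the same bearer token. The following is a minimal sketch of a wrapper that refuses to run when the token is missing, assuming a POSIX shell. The function name `accounts_mgmt_get` is hypothetical and is not part of `sg` or the service itself.

```shell
# Hypothetical helper (an assumption, not part of any Sourcegraph tooling):
# GET a management API path, failing early if the secret is unset or empty.
accounts_mgmt_get() {
  path="$1"
  if [ -z "${MANAGEMENT_SECRET:-}" ]; then
    echo "MANAGEMENT_SECRET is not set" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
  # instead of printing the error body as if it were a result.
  curl -sS --fail \
    -H "Authorization: Bearer $MANAGEMENT_SECRET" \
    "https://accounts.sourcegraph.com/api/management/v1/$path"
}
```

With the secret exported, `accounts_mgmt_get email-domain-stats | jq` would issue the same request as the command above.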
Create a new IdP client
$ curl -s -X POST \
-H "Authorization: Bearer $MANAGEMENT_SECRET" \
https://accounts.sourcegraph.com/api/management/v1/identity-provider/clients \
--data '{"name": "<SERVICE NAME>", "scopes": ["<SCOPE>"], "redirect_uris": ["<REDIRECT_URI>"]}' | jq
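Hand-editing the JSON body makes quoting mistakes easy. As a sketch, the body can instead be built with `jq`, which escapes the values for you. The helper name `idp_client_payload` is hypothetical, and it assumes a single scope and a single redirect URI.

```shell
# Hypothetical helper (an assumption for illustration):
# build the request body for a new IdP client from three arguments.
#   $1 = service name, $2 = scope, $3 = redirect URI
idp_client_payload() {
  jq -cn \
    --arg name "$1" \
    --arg scope "$2" \
    --arg redirect_uri "$3" \
    '{name: $name, scopes: [$scope], redirect_uris: [$redirect_uri]}'
}
```

The output can be passed to the request above with `--data "$(idp_client_payload "<SERVICE NAME>" "<SCOPE>" "<REDIRECT_URI>")"`.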
Add new scope to an IdP client
Connect to the “accounts” database and run:
UPDATE idp_clients
SET scopes = scopes || '["<SCOPE>"]'::jsonb
WHERE id = '<CLIENT_ID>';
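The `||` operator appends unconditionally, so running the statement twice would store the scope twice. The sketch below prints a guarded variant that only appends when the scope is absent, using the PostgreSQL `jsonb` containment operator `@>`. It assumes `scopes` is a `jsonb` array of strings, as the cast above implies. The helper name `add_scope_sql` is hypothetical, and it interpolates its arguments without escaping, so it is only suitable for trusted input.

```shell
# Hypothetical helper (an assumption for illustration):
# print an UPDATE that appends a scope only if the client lacks it.
#   $1 = client ID, $2 = scope
add_scope_sql() {
  client_id="$1"
  scope="$2"
  # Unquoted heredoc delimiter so the shell variables are expanded.
  cat <<EOF
UPDATE idp_clients
SET scopes = scopes || '["${scope}"]'::jsonb
WHERE id = '${client_id}'
  AND NOT scopes @> '["${scope}"]'::jsonb;
EOF
}
```

The printed statement can be pasted into a database session opened with write access.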
Assign SSC admin role
- Connect to the “accounts” database.
- Get the user ID via email:
SELECT user_id FROM emails WHERE email = '<EMAIL>';
- Insert metadata for ssc:
INSERT INTO user_metadata (created_at, updated_at, user_id, scope, metadata) VALUES (now(), now(), <USER_ID>, 'ssc', '{ "roles": ["admin"] }');
Rollouts
PROPERTY | DETAILS |
---|---|
Delivery pipeline | sourcegraph-accounts-us-central1-rollout |
Stages | dev -> prod |
Changes to Sourcegraph Accounts are continuously delivered to the first stage (dev) of the delivery pipeline.
Promotion of a release to the next stage in the pipeline must be done manually using the GCP Delivery pipeline UI.
Environments
dev
PROPERTY | DETAILS |
---|---|
Project ID | sourcegraph-accounts-dev-csvc |
Category | test |
Deployment type | rollout |
Resources | dev Redis, dev PostgreSQL instance, dev BigQuery dataset |
Slack notifications | #alerts-sourcegraph-accounts-dev |
Alert policies | GCP Monitoring alert policies list, Dashboard |
Errors | Sentry sourcegraph-accounts-dev |
Domain | accounts.sgdev.org |
Cloudflare WAF | ✅ |
MSP infrastructure access needs to be requested using Entitle for time-bound privileges. Test environments may have less stringent requirements.
ACCESS | ENTITLE REQUEST TEMPLATE |
---|---|
GCP project read access | Read-only Entitle request for the ‘Engineering Projects’ folder |
GCP project write access | Write access Entitle request for the ‘Engineering Projects’ folder |
For Terraform Cloud access, see dev Terraform Cloud.
dev Cloud Run
The Sourcegraph Accounts dev service implementation is deployed on Google Cloud Run.
PROPERTY | DETAILS |
---|---|
Console | Cloud Run service |
Service logs | GCP logging |
Service traces | Cloud Trace |
Service errors | Sentry sourcegraph-accounts-dev |
You can also use sg msp to quickly open a link to your service logs:
sg msp logs sourcegraph-accounts dev
dev Redis
PROPERTY | DETAILS |
---|---|
Console | Memorystore Redis instances |
dev PostgreSQL instance
PROPERTY | DETAILS |
---|---|
Console | Cloud SQL instances |
Databases | accounts |
To connect to the PostgreSQL instance in this environment, use sg msp in the sourcegraph/managed-services repository:
# For read-only access
sg msp pg connect sourcegraph-accounts dev
# For write access - use with caution!
sg msp pg connect -write-access sourcegraph-accounts dev
dev BigQuery dataset
PROPERTY | DETAILS |
---|---|
Dataset Project | sourcegraph-accounts-dev-csvc |
Dataset ID | sourcegraph_accounts |
Tables | user_emails, events |
dev Architecture Diagram
dev Terraform Cloud
This service’s configuration is defined in sourcegraph/managed-services/services/sourcegraph-accounts/service.yaml, and sg msp generate sourcegraph-accounts dev generates the required infrastructure configuration for this environment in Terraform.
Terraform Cloud (TFC) workspaces specific to each service then provision the required infrastructure from this configuration.
You may want to check your service environment’s TFC workspaces if a Terraform apply fails (reported via GitHub commit status checks in the sourcegraph/managed-services repository, or in #alerts-msp-tfc).
To access this environment’s Terraform Cloud workspaces, you will need to log in to Terraform Cloud and then request Entitle access to membership in the “Managed Services Platform Operator” TFC team. The “Managed Services Platform Operator” team has access to all MSP TFC workspaces.
The Terraform Cloud workspaces for this service environment are grouped under the msp-sourcegraph-accounts-dev tag, or you can use:
sg msp tfc view sourcegraph-accounts dev
prod
PROPERTY | DETAILS |
---|---|
Project ID | sourcegraph-accounts-prod-csvc |
Category | external |
Deployment type | rollout |
Resources | prod Redis, prod PostgreSQL instance, prod BigQuery dataset |
Slack notifications | #alerts-sourcegraph-accounts-prod |
Alert policies | GCP Monitoring alert policies list, Dashboard |
Errors | Sentry sourcegraph-accounts-prod |
Domain | accounts.sourcegraph.com |
Cloudflare WAF | ✅ |
MSP infrastructure access needs to be requested using Entitle for time-bound privileges.
ACCESS | ENTITLE REQUEST TEMPLATE |
---|---|
GCP project read access | Read-only Entitle request for the ‘Managed Services’ folder |
GCP project write access | Write access Entitle request for the ‘Managed Services’ folder |
For Terraform Cloud access, see prod Terraform Cloud.
prod Cloud Run
The Sourcegraph Accounts prod service implementation is deployed on Google Cloud Run.
PROPERTY | DETAILS |
---|---|
Console | Cloud Run service |
Service logs | GCP logging |
Service traces | Cloud Trace |
Service errors | Sentry sourcegraph-accounts-prod |
You can also use sg msp to quickly open a link to your service logs:
sg msp logs sourcegraph-accounts prod
prod Redis
PROPERTY | DETAILS |
---|---|
Console | Memorystore Redis instances |
prod PostgreSQL instance
PROPERTY | DETAILS |
---|---|
Console | Cloud SQL instances |
Databases | accounts |
To connect to the PostgreSQL instance in this environment, use sg msp in the sourcegraph/managed-services repository:
# For read-only access
sg msp pg connect sourcegraph-accounts prod
# For write access - use with caution!
sg msp pg connect -write-access sourcegraph-accounts prod
prod BigQuery dataset
PROPERTY | DETAILS |
---|---|
Dataset Project | sourcegraph-accounts-prod-csvc |
Dataset ID | sourcegraph_accounts |
Tables | user_emails, events |
prod Architecture Diagram
prod Terraform Cloud
This service’s configuration is defined in sourcegraph/managed-services/services/sourcegraph-accounts/service.yaml, and sg msp generate sourcegraph-accounts prod generates the required infrastructure configuration for this environment in Terraform.
Terraform Cloud (TFC) workspaces specific to each service then provision the required infrastructure from this configuration.
You may want to check your service environment’s TFC workspaces if a Terraform apply fails (reported via GitHub commit status checks in the sourcegraph/managed-services repository, or in #alerts-msp-tfc).
To access this environment’s Terraform Cloud workspaces, you will need to log in to Terraform Cloud and then request Entitle access to membership in the “Managed Services Platform Operator” TFC team. The “Managed Services Platform Operator” team has access to all MSP TFC workspaces.
The Terraform Cloud workspaces for this service environment are grouped under the msp-sourcegraph-accounts-prod tag, or you can use:
sg msp tfc view sourcegraph-accounts prod
Alert Policies
The following alert policies are defined for each of this service’s environments.
Cloud SQL - Connections
The number of Cloud SQL connections is approaching the maximum number of connections.
This can be caused by an increase in the number of active service instances.
Try increasing the 'resource.postgreSQL.maxConnections' configuration parameter.
Severity: WARNING
Cloud SQL - CPU Utilization
Cloud SQL instance CPU utilization is above the acceptable threshold.
Severity: WARNING
Cloud SQL - Disk Utilization
Cloud SQL instance disk utilization is above the acceptable threshold.
Severity: WARNING
Cloud SQL - Memory Utilization
Cloud SQL instance memory utilization is above the acceptable threshold.
Severity: WARNING
Cloud SQL - Server Availability
Cloud SQL instance is down.
Severity: WARNING
Cloud SQL - Spike in Per-Query Lock Time
Cloud SQL database queries encountered lock times well above acceptable thresholds.
Severity: WARNING
Cloud SQL - Sustained Per-Query Lock Times
Cloud SQL database queries are encountering lock times above acceptable thresholds over a window.
Severity: WARNING
High Container CPU Utilization
High CPU usage - it may be necessary to reduce load or increase CPU allocation.
Severity: WARNING
High Container Memory Utilization
High memory usage - it may be necessary to reduce load or increase memory allocation.
Severity: WARNING
Container Startup Latency
Service containers are taking longer than configured timeouts to start up.
Severity: WARNING
Cloud Redis - System CPU Utilization
Redis engine CPU utilization is above the set threshold. The utilization is measured on a scale of 0 to 1.
Severity: WARNING
Cloud Redis - Standard Instance Failover
Instance failover occurred for a standard tier Redis instance.
Severity: WARNING
Cloud Redis - System Memory Utilization
Redis System memory utilization is above the set threshold. The utilization is measured on a scale of 0 to 1.
Severity: WARNING
High Ratio of 400 Responses
400 responses coming from the application. Does NOT include requests that did not reach the instance, e.g. if no host is available to receive a request - check GCP logs and error reports instead.
Severity: WARNING
High Ratio of 401 Responses
401 responses coming from the application. Does NOT include requests that did not reach the instance, e.g. if no host is available to receive a request - check GCP logs and error reports instead.
Severity: WARNING
High Ratio of 403 Responses
403 (forbidden) responses coming from the application. Does NOT include requests that did not reach the instance, e.g. if no host is available to receive a request - check GCP logs and error reports instead.
Severity: WARNING
High Ratio of 500 Responses
500 responses coming from the application. Does NOT include requests that did not reach the instance, e.g. if no host is available to receive a request - check GCP logs and error reports instead.
Severity: WARNING
Cloud Run Pending Requests
There are requests pending - we may need to increase Cloud Run instance count, request concurrency, or investigate further.
Severity: WARNING
Cloud Run Instance Precondition Failed
Cloud Run instance failed to start due to a precondition failure.
This is unlikely to cause immediate downtime, and may auto-resolve if no new instances are created and/or we return to a healthy state, but you should follow up to ensure the latest Cloud Run revision is healthy.
Severity: WARNING
External Uptime Check
Service is failing to respond on https://accounts.sourcegraph.com - this may be expected if the service was recently provisioned or if its external domain has changed.
Severity: CRITICAL
Container Instance Count
There are a lot of Cloud Run instances running - we may need to increase per-instance requests to make sure we won't hit the configured max instance count.
Severity: WARNING