Architecture
Product and Sales
Visit the live Miro board for links.
Data Sources
This diagram provides a high-level overview of the data sources we have:
Pings
Pings are aggregated event telemetry we receive from on-prem and cloud instances.
User Events
User events are event telemetry that we receive from cloud instances.
For more info on our data stack tools, see the tools page.
Transcript Events from DotCom Users
For DotCom users, we are permitted to store transcript data. To ensure that this sensitive data is handled safely and that access to it is restricted, the following event pipeline has been built on top of the telemetry-v2 architecture; it routes flagged transcript events separately.
Considerations:
- Transcript data can only be collected through v2 telemetry and stored within the privateMetadata argument of the event
- Transcript data should be stored as top-level fields within privateMetadata, using the keys promptText or responseText
- Transcript data can only be collected for DotCom (Free) users
- Transcript data must include recordsPrivateMetadataTranscript:1 in the metadata argument of the event
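The considerations above can be sketched as a small check on a single event. This is an illustration only, not the pipeline's actual code: the overall event shape, the feature and action values, and the function names are hypothetical. Only metadata, privateMetadata, recordsPrivateMetadataTranscript, promptText, and responseText come from the list above. Whether a user is a DotCom (Free) user cannot be determined from the event itself, so that rule is not checked here.

```javascript
// Hypothetical example event. Only `metadata`, `privateMetadata`,
// `recordsPrivateMetadataTranscript`, `promptText`, and `responseText`
// are taken from the considerations; the other fields are illustrative.
const exampleEvent = {
  feature: "exampleFeature", // hypothetical
  action: "exampleAction", // hypothetical
  metadata: { recordsPrivateMetadataTranscript: 1 },
  privateMetadata: {
    promptText: "How do I reverse a list?",
    responseText: "Call reverse() on it.",
  },
};

const TRANSCRIPT_KEYS = ["promptText", "responseText"];

// True when the event carries the transcript flag in `metadata`.
function isFlaggedTranscriptEvent(event) {
  const metadata = event.metadata || {};
  return metadata.recordsPrivateMetadataTranscript === 1;
}

// Returns a list of problems. An empty list means the event is consistent
// with the rules that can be checked from the event alone.
function checkTranscriptEvent(event) {
  const problems = [];
  const privateMetadata = event.privateMetadata || {};
  const hasTranscript = TRANSCRIPT_KEYS.some(
    (key) => Object.prototype.hasOwnProperty.call(privateMetadata, key)
  );
  const flagged = isFlaggedTranscriptEvent(event);

  if (hasTranscript && !flagged) {
    problems.push("transcript keys present but recordsPrivateMetadataTranscript is not 1");
  }
  if (flagged && !hasTranscript) {
    problems.push("event is flagged but privateMetadata has no promptText or responseText");
  }
  return problems;
}
```

Calling checkTranscriptEvent(exampleEvent) returns an empty list, while an event that has promptText but no flag returns one problem.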
Internal-only links to where the backend GCP changes live:
Pub/Sub Topic Subscriptions
DataFlow
- DataFlow Job that runs on the topic subscription event-telemtry-transcript-to-bq to redact transcripts (responseText, promptText)
- DataFlow UDF that the DataFlow Job references (a custom JavaScript function that runs on each event)
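To make the UDF step more concrete, here is a minimal sketch of what a redaction function could look like. It is not the UDF linked above. It assumes that the function receives one event as a JSON string and returns a JSON string, that redaction means overwriting promptText and responseText with a fixed placeholder, and that both keys sit at the top level of privateMetadata. The function name and the placeholder value are hypothetical, and the real UDF may redact differently (for example, by dropping the fields).

```javascript
// Hypothetical placeholder written in place of transcript text.
var REDACTED = "[REDACTED]";

// Keys named in the considerations for transcript data.
var TRANSCRIPT_KEYS = ["promptText", "responseText"];

// Takes one event as a JSON string and returns it as a JSON string with
// transcript fields overwritten. All other fields are left unchanged.
function redactTranscript(inJson) {
  var event = JSON.parse(inJson);
  var privateMetadata = event.privateMetadata;

  if (privateMetadata && typeof privateMetadata === "object") {
    for (var i = 0; i < TRANSCRIPT_KEYS.length; i++) {
      var key = TRANSCRIPT_KEYS[i];
      if (Object.prototype.hasOwnProperty.call(privateMetadata, key)) {
        privateMetadata[key] = REDACTED;
      }
    }
  }
  return JSON.stringify(event);
}
```

The sketch sticks to older-style JavaScript (var, plain loops) in case the runtime that executes the UDF does not support newer syntax.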
Below is a system diagram to illustrate the flow of transcript data further: