From 54b72c88bd8f8054620e271206e12e3f0481cbd5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=90=B4=E6=99=9F=20Wu=20Sheng?=
Date: Mon, 1 Nov 2021 16:30:31 +0800
Subject: [PATCH] Enhance documents about the data report and query protocols. (#8041)
---
 CHANGES.md                                  |   2 +
 docs/en/protocols/README.md                 |  60 +++----
 docs/en/protocols/Trace-Data-Protocol-v3.md | 180 ++++++++++++++++++++
 docs/en/protocols/query-protocol.md         |  61 +++++--
 docs/menu.yml                               |   6 +-
 5 files changed, 260 insertions(+), 49 deletions(-)
diff --git a/CHANGES.md b/CHANGES.md
index b0d4f85147..3d984f41e5 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -61,6 +61,8 @@ Release Notes.
 #### Documentation
+* Enhance documents about the data report and query protocols.
+
 All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/101?closed=1)
 ------------------
diff --git a/docs/en/protocols/README.md b/docs/en/protocols/README.md
index 49d9757e23..31ea4e7559 100644
--- a/docs/en/protocols/README.md
+++ b/docs/en/protocols/README.md
@@ -1,51 +1,46 @@
-# Protocols
-There are two different types of protocols.
-
-- [**Probe Protocol**](#probe-protocols). It includes descriptions and definitions on how agents send collected metrics data and traces, as well as the format of each entity.
-
-- [**Query Protocol**](#query-protocol). The backend enables the query function in SkyWalking's own UI and other UIs. These queries are based on GraphQL.
-
+# Probe Protocol
+It includes descriptions and definitions on how agents send collected metrics, logs, traces and events, as well as the format of each entity.
 ## Probe Protocols
 They also related to the probe group. For more information, see [Concepts and Designs](../concepts-and-designs/overview.md). These groups are **language-based native agent protocol**, **service mesh protocol** and **3rd-party instrument protocol**.
-### Language-based native agent protocol
-There are two types of protocols that help language agents work in distributed environments.
-1. **Cross Process Propagation Headers Protocol** and **Cross Process Correlation Headers Protocol** come in in-wire data format. Agent/SDK usually uses HTTP/MQ/HTTP2 headers
-to carry the data with the RPC request. The remote agent will receive this in the request handler, and bind the context with this specific request.
-1. **Trace Data Protocol** is in out-of-wire data format. Agent/SDK uses this to send traces and metrics to SkyWalking or other compatible backends.
+### Tracing
+There are two types of protocols that help language agents work in distributed tracing.
+
+- **Cross Process Propagation Headers Protocol** and **Cross Process Correlation Headers Protocol** come in in-wire data format. Agent/SDK usually uses HTTP/MQ/HTTP2 headers
+to carry the data with the RPC request. The remote agent will receive this in the request handler, and bind the context with this specific request.
 [Cross Process Propagation Headers Protocol v3](Skywalking-Cross-Process-Propagation-Headers-Protocol-v3.md) has been the new protocol for in-wire context propagation since the version 8.0.0 release.
 [Cross Process Correlation Headers Protocol v1](Skywalking-Cross-Process-Correlation-Headers-Protocol-v1.md) is a new in-wire context propagation protocol which is additional and optional.
-Please read SkyWalking language agents documentation to see whether it is supported.
-This protocol defines the data format of transporting custom data with `Cross Process Propagation Headers Protocol`.
-It has been supported by the SkyWalking javaagent since 8.0.0,
+Please read SkyWalking language agents documentation to see whether it is supported.
+
+- **Trace Data Protocol** is an out-of-wire data format. Agent/SDK uses this to send traces to SkyWalking OAP server.
 [SkyWalking Trace Data Protocol v3](Trace-Data-Protocol-v3.md) defines the communication method and format between the agent and backend.
+### Logging
+- **Log Data Protocol** is an out-of-wire data format.
Agent/SDK and collector use this to send logs to the SkyWalking OAP server.
 [SkyWalking Log Data Protocol](Log-Data-Protocol.md) defines the communication method and format between the agent and backend.
+### Metrics
+
+SkyWalking has a native metrics format and supports widely used metric formats such as Prometheus, OpenCensus, and Zabbix.
+
+The native metrics format definition can be found [here](https://github.com/apache/skywalking-data-collect-protocol/blob/master/language-agent/Meter.proto).
+Typically, an agent meter plugin (e.g. [Java Meter Plugin](https://skywalking.apache.org/docs/skywalking-java/latest/en/setup/service-agent/java-agent/java-plugin-development-guide/#meter-plugin)) and
+the Satellite [Prometheus fetcher](https://skywalking.apache.org/docs/skywalking-satellite/latest/en/setup/plugins/fetcher_prometheus-metrics-fetcher/)
+convert metrics into the native format and forward them to the SkyWalking OAP server.
+
+To receive metrics in 3rd-party formats, read the [Meter receiver](../setup/backend/backend-meter.md) and [OpenTelemetry receiver](../setup/backend/backend-receivers.md#opentelemetry-receiver) docs for more details.
+
 ### Browser probe protocol
 The browser probe, such as [skywalking-client-js](https://github.com/apache/skywalking-client-js), could use this protocol to send data to the backend. This service is provided by gRPC.
 [SkyWalking Browser Protocol](Browser-Protocol.md) defines the communication method and format between `skywalking-client-js` and backend.
-### Service Mesh probe protocol
-The probe in sidecar or proxy could use this protocol to send data to the backend. This service provided by gRPC requires
-the following key information:
-
-1. Service Name or ID on both sides.
-1. Service Instance Name or ID on both sides.
-1. Endpoint. URI in HTTP, service method full signature in gRPC.
-1. Latency. In milliseconds.
-1. Response code in HTTP
-1. Status. Success or fail.
-1. Protocol. HTTP, gRPC
-1. DetectPoint.
In Service Mesh sidecar, `client` or `server`. In normal L7 proxy, value is `proxy`.
-
 ### Events Report Protocol
 The protocol is used to report events to the backend. The [doc](../concepts-and-designs/event.md) introduces the definition of an event, and [the protocol repository](https://github.com/apache/skywalking-data-collect-protocol/blob/master/event) defines gRPC services and message formats of events.
@@ -69,12 +64,3 @@ JSON event record example:
     }
 ]
 ```
-
-### 3rd-party instrument protocol
-3rd-party instrument protocols are not defined by SkyWalking. They are just protocols/formats with which SkyWalking is compatible, and SkyWalking could receive them from their existing libraries. SkyWalking starts with supporting Zipkin v1, v2 data formats.
-
-The backend has a modular design, so it is very easy to extend a new receiver to support a new protocol/format.
-
-## Query Protocol
-The query protocol follows GraphQL grammar, and provides data query capabilities, which depends on your analysis metrics.
-Read [query protocol doc](query-protocol.md) for more details.
diff --git a/docs/en/protocols/Trace-Data-Protocol-v3.md b/docs/en/protocols/Trace-Data-Protocol-v3.md
index 3fe886f2cc..355b0c60d4 100644
--- a/docs/en/protocols/Trace-Data-Protocol-v3.md
+++ b/docs/en/protocols/Trace-Data-Protocol-v3.md
@@ -42,3 +42,183 @@ See [Cross Process Propagation Headers Protocol v3](Skywalking-Cross-Process-Pro
 4. `Span#skipAnalysis` may be TRUE, if this span doesn't require backend analysis.
+### Protocol Definition
+```protobuf
+// The segment is a collection of spans. It includes all collected spans in a single request context, such as an HTTP request process.
+//
+// We recommend that the agent/SDK report all tracked data of one request at once.
+// Typically, such as in Java, one segment represents all tracked operations (spans) of one request context in the same thread.
+// At the same time, in languages without a clear thread concept, such as Golang, it could represent all tracked operations of one request context.
+message SegmentObject {
+    // A string id represents the whole trace.
+    string traceId = 1;
+    // A unique id represents this segment. Other segments could use this id to reference as a child segment.
+    string traceSegmentId = 2;
+    // Span collections included in this segment.
+    repeated SpanObject spans = 3;
+    // **Service**. Represents a set/group of workloads which provide the same behaviours for incoming requests.
+    //
+    // The logic name represents the service. This would show as a separate node in the topology.
+    // The metrics analyzed from the spans would be aggregated for this entity as the service level.
+    string service = 4;
+    // **Service Instance**. Each individual workload in the Service group is known as an instance. Like `pods` in Kubernetes, it
+    // doesn't need to be a single OS process, however, if you are using instrument agents, an instance is actually a real OS process.
+    //
+    // The logic name represents the service instance. This would show as a separate node in the instance relationship.
+    // The metrics analyzed from the spans would be aggregated for this entity as the service instance level.
+    string serviceInstance = 5;
+    // Whether the segment includes all tracked spans.
+    // In a tracked production environment, some tasks could include too many spans for one request context, such as a batch update for a cache, or an async job.
+    // The agent/SDK could optimize or ignore some tracked spans for better performance.
+    // In this case, the value should be flagged as TRUE.
+    bool isSizeLimited = 6;
+}
+
+// Segment reference represents the link between two existing segments.
+message SegmentReference {
+    // Represent the reference type. It could be across thread or across process.
+    // Across process means there is a downstream RPC call for this.
+    // Typically, refType == CrossProcess means SpanObject#spanType = entry.
+    RefType refType = 1;
+    // A string id represents the whole trace.
+    string traceId = 2;
+    // Another segment id as the parent.
+    string parentTraceSegmentId = 3;
+    // The span id in the parent trace segment.
+    int32 parentSpanId = 4;
+    // The service logic name of the parent segment.
+    // If refType == CrossThread, this name is the same as that of the trace segment.
+    string parentService = 5;
+    // The service instance logic name of the parent segment.
+    // If refType == CrossThread, this name is the same as that of the trace segment.
+    string parentServiceInstance = 6;
+    // The endpoint name of the parent segment.
+    // **Endpoint**. A path in a service for incoming requests, such as an HTTP URI path or a gRPC service class + method signature.
+    // In a trace segment, the endpoint name is the name of the first entry span.
+    string parentEndpoint = 7;
+    // The network address, including ip/hostname and port, which is used on the client side.
+    // Such as Client --> use 127.0.11.8:913 -> Server
+    // then, in the reference of entry span reported by Server, the value of this field is 127.0.11.8:913.
+    // This plays an important role in the SkyWalking STAM (Streaming Topology Analysis Method).
+    // For more details, read https://wu-sheng.github.io/STAM/
+    string networkAddressUsedAtPeer = 8;
+}
+
+// Span represents an execution unit in the system, with duration and many other attributes.
+// Span could be a method, an RPC, or an MQ message production or consumption.
+// In practice, a span should be added only when it is really necessary, to avoid payload overhead.
+// We recommend creating spans only in cross-process (client/server of RPC/MQ) and cross-thread cases.
+message SpanObject {
+    // The number id of the span. Should be unique in the whole segment.
+    // Starting at 0.
+    int32 spanId = 1;
+    // The number id of the parent span in the whole segment.
+    // -1 represents no parent span.
+    // Such a span is also known as the root/first span of the segment.
+    int32 parentSpanId = 2;
+    // Start timestamp in milliseconds of this span,
+    // measured between the current time and midnight, January 1, 1970 UTC.
+    int64 startTime = 3;
+    // End timestamp in milliseconds of this span,
+    // measured between the current time and midnight, January 1, 1970 UTC.
+    int64 endTime = 4;
+    //
+    // In cross-thread and cross-process cases, these references target the parent segments.
+    // The references usually have only one element, but in batch consumer cases, such as MQ or async batch processing, there could be multiple.
+    repeated SegmentReference refs = 5;
+    // A logic name represents this span.
+    //
+    // We don't recommend including parameters, such as HTTP request parameters, as a part of the operation name, especially when this is the name of the entry span.
+    // All statistics for the endpoints are aggregated based on this name. Those parameters should be added to the tags if necessary.
+    // If, in some cases, they have to be a part of the operation name,
+    // users should use the Group Parameterized Endpoints capability at the backend to get meaningful metrics.
+    // Read https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/endpoint-grouping-rules.md
+    string operationName = 6;
+    // Remote address of the peer in RPC/MQ case.
+    // This is required when spanType = Exit, as it is a part of the SkyWalking STAM (Streaming Topology Analysis Method).
+    // For more details, read https://wu-sheng.github.io/STAM/
+    string peer = 7;
+    // Span type represents the role in the RPC context.
+    SpanType spanType = 8;
+    // Span layer represents the component tech stack, related to the network tech.
+    SpanLayer spanLayer = 9;
+    // Component id is a predefined numeric id in SkyWalking.
+    // It represents the framework or tech stack used by this tracked span, such as Spring.
+    // All IDs are defined in https://github.com/apache/skywalking/blob/master/oap-server/server-bootstrap/src/main/resources/component-libraries.yml
+    // Send a pull request if you want to add languages, components or mapping definitions,
+    // all public components could be accepted.
+    // Follow this doc for more details, https://github.com/apache/skywalking/blob/master/docs/en/guides/Component-library-settings.md
+    int32 componentId = 10;
+    // The status of the span. True means the tracked execution ended in an unexpected status.
+    // This affects the success rate statistic in the backend.
+    // An exception or error code in the tracked process doesn't necessarily mean isError == true; the agent plugin and tracing SDK implementations make the final decision.
+    bool isError = 11;
+    // String key, String value pair.
+    // Tags provide more information, including parameters.
+    //
+    // In the OAP backend analysis, some special tags or tag combinations could provide other advanced features.
+    // https://github.com/apache/skywalking/blob/master/docs/en/guides/Java-Plugin-Development-Guide.md#special-span-tags
+    repeated KeyStringValuePair tags = 12;
+    // String key, String value pair with an accurate timestamp.
+    // Logging some events happening in the context of the span duration.
+    repeated Log logs = 13;
+    // Force the backend to skip analysis if the value is TRUE.
+    // The backend has its own configurations to follow or override this.
+    //
+    // Use this mostly because the agent/SDK could know more context of the service role.
+    bool skipAnalysis = 14;
+}
+
+message Log {
+    // The timestamp in milliseconds of this event,
+    // measured between the current time and midnight, January 1, 1970 UTC.
+    int64 time = 1;
+    // String key, String value pair.
+    repeated KeyStringValuePair data = 2;
+}
+
+// Map to the type of span
+enum SpanType {
+    // Server side of RPC. Consumer side of MQ.
+    Entry = 0;
+    // Client side of RPC. Producer side of MQ.
+    Exit = 1;
+    // A common local code execution.
+    Local = 2;
+}
+
+// An ID could be represented by multiple string sections.
+message ID {
+    repeated string id = 1;
+}
+
+// Type of the reference
+enum RefType {
+    // Map to the reference targeting the segment in another OS process.
+    CrossProcess = 0;
+    // Map to the reference targeting the segment in the same process as the current one, just across threads.
+    // This is only used when the coding language has the thread concept.
+    CrossThread = 1;
+}
+
+// Map to the layer of span
+enum SpanLayer {
+    // Unknown layer. Could be anything.
+    Unknown = 0;
+    // A database layer, used in tracing the database client component.
+    Database = 1;
+    // An RPC layer, used on both the client and server sides of an RPC component.
+    RPCFramework = 2;
+    // HTTP is a more specific RPCFramework.
+    Http = 3;
+    // An MQ layer, used on both the producer and consumer sides of the MQ component.
+    MQ = 4;
+    // A cache layer, used in tracing the cache client component.
+    Cache = 5;
+}
+
+// The segment collections for trace report in batch and sync mode.
+message SegmentCollection {
+    repeated SegmentObject segments = 1;
+}
+```
diff --git a/docs/en/protocols/query-protocol.md b/docs/en/protocols/query-protocol.md
index 558ce4d160..ba188f5a7a 100644
--- a/docs/en/protocols/query-protocol.md
+++ b/docs/en/protocols/query-protocol.md
@@ -9,17 +9,17 @@ Metadata contains concise information on all services and their instances, endpo
 You may query the metadata in different ways.
 ```graphql
 extend type Query {
-    getGlobalBrief(duration: Duration!): ClusterBrief
-
-    # Normal service related metainfo
-    getAllServices(duration: Duration!): [Service!]!
+    # Normal service related meta info
+    getAllServices(duration: Duration!, group: String): [Service!]!
     searchServices(duration: Duration!, keyword: String!): [Service!]!
     searchService(serviceCode: String!): Service
-
+
     # Fetch all services of Browser type
     getAllBrowserServices(duration: Duration!): [Service!]!
+    searchBrowserServices(duration: Duration!, keyword: String!): [Service!]!
+    searchBrowserService(serviceCode: String!): Service
-    # Service intance query
+    # Service instance query
     getServiceInstances(duration: Duration!, serviceId: ID!): [ServiceInstance!]!
     # Endpoint query
@@ -127,12 +127,51 @@ extend type Query {
 }
 ```
-### Others
-The following queries are for specific features, including trace, alarm, and profile.
-1. Trace. Query distributed traces by this.
-1. Alarm. Through alarm query, you can find alarm trends and their details.
+### Logs
+```graphql
+extend type Query {
+    # Return true if the current storage implementation supports fuzzy query for logs.
+    supportQueryLogsByKeywords: Boolean!
+    queryLogs(condition: LogQueryCondition): Logs
+
+    # Test the logs and get the results of the LAL output.
+    test(requests: LogTestRequest!): LogTestResponse!
+}
+```
+
+Log query implementations differ slightly across database options. Search engines, e.g. ElasticSearch and OpenSearch, can support
+full-text fuzzy queries on logs. Other storage options do not support them, considering the performance impact and end user experience.
+
+The `test` API is provided for the debugging tool of native LAL parsing.
+
+### Trace
+```graphql
+extend type Query {
+    queryBasicTraces(condition: TraceQueryCondition): TraceBrief
+    queryTrace(traceId: ID!): Trace
+}
+```
+
+Trace query fetches the trace segment list and the spans of a given trace ID.
+
+### Alarm
+```graphql
+extend type Query {
+    getAlarmTrend(duration: Duration!): AlarmTrend!
+    getAlarm(duration: Duration!, scope: Scope, keyword: String, paging: Pagination!, tags: [AlarmTag]): Alarms
+}
+```
+
+Alarm query returns detected alerting messages with related events.
+
+### Event
+```graphql
+extend type Query {
+    queryEvents(condition: EventQueryCondition): Events
+}
+```
-The actual query GraphQL scripts can be found in the `query-protocol` folder [here](../../../oap-server/server-query-plugin/query-graphql-plugin/src/main/resources).
+Event query fetches the event list according to the given sources and time range conditions.
 ## Condition
 ### Duration
diff --git a/docs/menu.yml b/docs/menu.yml
index f47a2ab5d8..4af86ec60b 100644
--- a/docs/menu.yml
+++ b/docs/menu.yml
@@ -172,7 +172,11 @@ catalog:
       - name: "Compiling Guide"
         path: "/en/guides/How-to-build"
   - name: "Protocols"
-    path: "/en/protocols/readme"
+    catalog:
+      - name: "Data Report(Probe/Agent) Protocol"
+        path: "/en/protocols/readme"
+      - name: "Query Protocol (GraphQL)"
+        path: "/en/protocols/query-protocol"
   - name: "FAQs"
     path: "/en/FAQ/readme"
   - name: "Changelog"
-- 
GitLab