# Protocols
There are two types of protocols listed here.

- [**Probe Protocol**](#probe-protocols). Includes the descriptions and definitions of how agents send collected metrics data and traces, as well as the format of each entity.

- [**Query Protocol**](#query-protocol). The backend provides query capability to SkyWalking's own UI and to other clients. These queries are based on GraphQL.


## Probe Protocols
They are also related to the probe groups. To understand that, read the [Concepts and Designs](../concepts-and-designs/README.md) document.
These groups are **Language based native agent protocol**, **Service Mesh protocol** and **3rd-party instrument protocol**.

## Register Protocol
Includes service, service instance, network address and endpoint metadata registration.
The purposes of registration are:
1. For service, network address and endpoint, register returns the unique ID of the registered object, usually an integer. The probe
can use that ID in place of the literal String for data compression. Further, some protocols accept IDs only.
1. For service instance, register returns a new unique ID for every new instance. Every service instance register request must contain the
service ID.
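
The two purposes above can be illustrated with a toy in-memory register. This is only a sketch under stated assumptions: the `Registry` class, its method names and the shared ID sequence are hypothetical and do not describe the real backend API.

```python
from itertools import count


class Registry:
    """Toy in-memory register (hypothetical, not the real backend)."""

    def __init__(self):
        self._ids = {}            # (kind, name) -> integer ID
        self._next_id = count(1)  # one shared ID sequence, for simplicity
        self._instances = {}      # instance ID -> owning service ID

    def register(self, kind, name):
        """Return the unique ID of a service, network address or endpoint.

        Registering the same (kind, name) again returns the same ID, so a
        probe can send the integer instead of the literal string.
        """
        key = (kind, name)
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]

    def register_instance(self, service_id):
        """Return a new unique ID for every new instance of a known service."""
        known = {i for (kind, _), i in self._ids.items() if kind == "service"}
        if service_id not in known:
            raise ValueError("unknown service ID: %r" % (service_id,))
        instance_id = next(self._next_id)
        self._instances[instance_id] = service_id
        return instance_id
```

For example, `Registry().register("service", "order-service")` returns the same integer on every call, while each `register_instance` call with that service ID returns a fresh one.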
 


### Language based native agent protocol
There are two types of protocols that make language agents work in distributed environments.
1. **Cross Process Propagation Headers Protocol** is an in-wire data format. The agent/SDK usually uses HTTP/MQ/HTTP2 headers
to carry the data with the RPC request. The remote agent receives this in the request handler, and binds the context
with this specific request.
1. **Trace Data Protocol** is an out-of-wire data format. The agent/SDK uses this to send traces and metrics to SkyWalking or another
compatible backend.
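
As a rough illustration of in-wire propagation, the sketch below injects a trace context into request headers on the caller side and extracts it again in the request handler. The header name `x-demo-trace-context` and the field layout are assumptions made for this example only; the real header names and encoding are defined in the Cross Process Propagation Headers Protocol documents.

```python
import base64

# Hypothetical header name, used for illustration only.
HEADER_NAME = "x-demo-trace-context"


def inject(headers, trace_id, parent_segment_id, parent_span_id):
    """Caller side: encode the context and attach it to the outgoing headers."""
    fields = [trace_id, parent_segment_id, str(parent_span_id)]
    # Base64 output never contains "-", so it is safe as a field separator.
    encoded = [base64.b64encode(f.encode("utf-8")).decode("ascii") for f in fields]
    headers[HEADER_NAME] = "-".join(encoded)
    return headers


def extract(headers):
    """Callee side: read the context back so it can be bound to this request."""
    value = headers.get(HEADER_NAME)
    if value is None:
        return None  # request was not sent by an instrumented caller
    trace_id, segment_id, span_id = (
        base64.b64decode(part).decode("utf-8") for part in value.split("-")
    )
    return {
        "trace_id": trace_id,
        "parent_segment_id": segment_id,
        "parent_span_id": int(span_id),
    }
```

The callee can use the extracted parent segment and span to link its own trace segment to the caller's.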

The header protocol has two formats for compatibility. v2 is used by default.
* [Cross Process Propagation Headers Protocol v2](Skywalking-Cross-Process-Propagation-Headers-Protocol-v2.md) is the new protocol for
in-wire context propagation, introduced in the 6.0.0-beta release. It will replace the old **SW3** protocol in the future; for now, both are supported.
* [Cross Process Propagation Headers Protocol v1](Skywalking-Cross-Process-Propagation-Headers-Protocol-v1.md) is for in-wire propagation.
By following this protocol, the trace segments in different processes can be linked.

Since SkyWalking v6.0.0-beta, the SkyWalking agent and backend use Trace Data Protocol v2; v1 is still supported in the backend.
* [SkyWalking Trace Data Protocol v2](Trace-Data-Protocol-v2.md) defines the communication method and format between agent and backend.
* [SkyWalking Trace Data Protocol v1](Trace-Data-Protocol-v1.md). This protocol is used in old versions and is still supported.


### Service Mesh probe protocol
The probe in a sidecar or proxy can use this protocol to send data to the backend. This service is provided over gRPC and requires
the following key info:

1. Service Name or ID at both sides.
1. Service Instance Name or ID at both sides.
1. Endpoint. URI in HTTP, full service method signature in gRPC.
1. Latency. In milliseconds.
1. Response code in HTTP.
1. Status. Success or fail.
1. Protocol. HTTP or gRPC.
1. DetectPoint. In a Service Mesh sidecar, `client` or `server`. In a normal L7 proxy, the value is `proxy`.
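
The key info above could be modelled as a single record per observed call. The following is a minimal sketch: the class name `MeshCall` and all field names are hypothetical, and the real message is defined by the gRPC service, not by this code.

```python
from dataclasses import dataclass
from typing import Optional

DETECT_POINTS = {"client", "server", "proxy"}
PROTOCOLS = {"HTTP", "gRPC"}


@dataclass
class MeshCall:
    """One call observed by a sidecar or proxy (hypothetical field names)."""

    source_service: str      # service name or ID, caller side
    source_instance: str     # service instance name or ID, caller side
    dest_service: str        # service name or ID, callee side
    dest_instance: str       # service instance name or ID, callee side
    endpoint: str            # URI in HTTP, full method signature in gRPC
    latency_ms: int          # latency in milliseconds
    status: bool             # True for success, False for fail
    protocol: str            # "HTTP" or "gRPC"
    detect_point: str        # "client", "server" or "proxy"
    response_code: Optional[int] = None  # only meaningful for HTTP

    def __post_init__(self):
        # Reject values outside the sets listed in the key info.
        if self.protocol not in PROTOCOLS:
            raise ValueError("unsupported protocol: %r" % (self.protocol,))
        if self.detect_point not in DETECT_POINTS:
            raise ValueError("unsupported detect point: %r" % (self.detect_point,))
        if self.latency_ms < 0:
            raise ValueError("latency must not be negative")
```

Validating in `__post_init__` keeps malformed records from being built at all, which is a design choice of this sketch rather than something the protocol mandates.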


### 3rd-party instrument protocol
3rd-party instrument protocols are not defined by SkyWalking. They are just protocols/formats which SkyWalking is compatible with and
can receive from their existing libraries. SkyWalking starts by supporting the Zipkin v1 and v2 data formats.

The backend is based on a modularization principle, so it is very easy to extend it with a new receiver to support a new protocol/format.
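
The idea of plugging in a receiver per format can be sketched with a small registry. This is not how the backend's module system is implemented; the `receiver` decorator, the `dispatch` function and the `demo-json` format name are all hypothetical.

```python
import json

# Maps a format name to the function that decodes payloads in that format.
RECEIVERS = {}


def receiver(fmt):
    """Register the decorated function as the receiver for one format."""
    def decorator(func):
        RECEIVERS[fmt] = func
        return func
    return decorator


@receiver("demo-json")  # hypothetical format name
def receive_demo_json(payload):
    """Decode a JSON payload into a Python object."""
    return json.loads(payload)


def dispatch(fmt, payload):
    """Route a payload to the receiver registered for its format."""
    try:
        handler = RECEIVERS[fmt]
    except KeyError:
        raise ValueError(f"no receiver registered for format {fmt!r}") from None
    return handler(payload)
```

Supporting another format then only means adding one more decorated function, without touching `dispatch`.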

## Query Protocol
The query protocol follows GraphQL grammar and provides data query capabilities, which depend on your analysis metrics.

Data is provided in six dimensions.
1. Metadata. Metadata includes brief info on all monitored services and their instances, endpoints, etc.
There are multiple ways to query this metadata.
1. Topology. Shows the topology and dependency graph of services or endpoints, including direct relationships or the global map.
1. Metrics. Metrics queries target all the objects defined in the [OAL script](../concepts-and-designs/oal.md). You can get the
metrics data in linear or thermodynamic matrix formats, based on the aggregation functions in the script.
1. Aggregation. An aggregation query means the metrics data needs a secondary aggregation at the query stage, which makes the query
interfaces take some different arguments. For example, a `TopN` list of services is a very typical aggregation query:
metrics stream aggregation just calculates the metrics values of each service, but the expected list requires ordering the metrics data
by those values.
1. Trace. Query distributed traces with this.
1. Alarm. Through the alarm query, you can get alarm trends and details.
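
The difference between stream aggregation and the secondary aggregation of a `TopN` query can be shown in a few lines. This is a sketch with hypothetical function names and made-up sample data; it does not reflect how the backend stores or queries metrics.

```python
from collections import defaultdict


def average_by_service(samples):
    """Stream-style aggregation: one average value per service."""
    totals = defaultdict(lambda: [0, 0])  # service -> [sum, count]
    for service, latency in samples:
        totals[service][0] += latency
        totals[service][1] += 1
    return {service: total / n for service, (total, n) in totals.items()}


def top_n(values_by_service, n):
    """Query-stage aggregation: order services by value and keep the first n."""
    ranked = sorted(values_by_service.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up (service, latency in ms) samples.
samples = [("a", 100), ("a", 300), ("b", 50), ("c", 500)]
averages = average_by_service(samples)
slowest_two = top_n(averages, 2)
```

The first step never compares services with each other; only the second step needs all per-service values at once, which is why it happens at query time.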

The actual GraphQL query scripts can be found inside the `query-protocol` folder [here](../../../oap-server/server-query-plugin/query-graphql-plugin/src/main/resources).
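
A GraphQL query is sent as a JSON body with a `query` string and a `variables` object. The sketch below only builds such a body. The operation name `getLinearIntValues`, the `MetricCondition` and `Duration` shapes, and the time format are assumptions made for illustration and should be checked against the files in the `query-protocol` folder.

```python
import json

# Assumed query text; verify names and types against the query-protocol files.
QUERY = """
query ($metric: MetricCondition!, $duration: Duration!) {
  getLinearIntValues(metric: $metric, duration: $duration) {
    values { id value }
  }
}
""".strip()


def build_request_body(metric_name, entity_id, start, end, step="MINUTE"):
    """Return the JSON request body for one metrics query."""
    variables = {
        "metric": {"name": metric_name, "id": entity_id},
        "duration": {"start": start, "end": end, "step": step},
    }
    return json.dumps({"query": QUERY, "variables": variables})
```

For instance, `build_request_body("service_cpm", "2", "2019-01-01 0000", "2019-01-01 0030")` yields a body that asks for the `service_cpm` metric of one entity over a 30 minute window, assuming that time format is accepted.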

Here is the list of all existing metrics names, based on [official_analysis.oal](../../../oap-server/server-bootstrap/src/main/resources/official_analysis.oal):

**Global metrics**
- all_p99, p99 response time of all services
- all_p95
- all_p90
- all_p75
- all_p50
- all_heatmap, the response time heatmap of all services 

**Service metrics**
- service_resp_time, avg response time of service
- service_sla, successful rate of service
- service_cpm, calls per minute of service
- service_p99, p99 response time of service
- service_p95
- service_p90
- service_p75
- service_p50

**Service instance metrics**
- service_instance_sla, successful rate of service instance
- service_instance_resp_time, avg response time of service instance
- service_instance_cpm, calls per minute of service instance

**Endpoint metrics**
- endpoint_cpm, calls per minute of endpoint
- endpoint_avg, avg response time of endpoint
- endpoint_sla, successful rate of endpoint
- endpoint_p99, p99 response time of endpoint
- endpoint_p95
- endpoint_p90
- endpoint_p75
- endpoint_p50

**JVM metrics**, JVM-related metrics, which only work when the Java agent is active
- instance_jvm_cpu
- instance_jvm_memory_heap
- instance_jvm_memory_noheap
- instance_jvm_memory_heap_max
- instance_jvm_memory_noheap_max
- instance_jvm_young_gc_time
- instance_jvm_old_gc_time
- instance_jvm_young_gc_count
- instance_jvm_old_gc_count

**Service relation metrics**, represent the metrics of calls between services.
The metrics ID can only be obtained from the topology query.
- service_relation_client_cpm, calls per minute detected at client side
- service_relation_server_cpm, calls per minute detected at server side
- service_relation_client_call_sla, successful rate detected at client side
- service_relation_server_call_sla, successful rate detected at server side
- service_relation_client_resp_time, avg response time detected at client side
- service_relation_server_resp_time, avg response time detected at server side

**Endpoint relation metrics**, represent the metrics between dependent endpoints. They only work with a tracing agent.
The metrics ID can only be obtained from the topology query.
- endpoint_relation_cpm
- endpoint_relation_resp_time