diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/database-long-duration.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/database-long-duration.png new file mode 100644 index 0000000000000000000000000000000000000000..7227decbee2e6a7088ccc3e981a0232d51441519 Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/database-long-duration.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/deep-trace-1.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/deep-trace-1.png new file mode 100644 index 0000000000000000000000000000000000000000..3fb9d4c65a37da967ff93fb59123f1ea6f77cf8a Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/deep-trace-1.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/deep-trace.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/deep-trace.png new file mode 100644 index 0000000000000000000000000000000000000000..430a1cd9d2d3fcaf653d59600e9ad4b93e7c98a0 Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/deep-trace.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/demo-spring.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/demo-spring.png new file mode 100644 index 0000000000000000000000000000000000000000..eb4fad224f161348ecc08d58ca9381861bb5ebf2 Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/demo-spring.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/span-error-2.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/span-error-2.png new file mode 100644 index 0000000000000000000000000000000000000000..5afd233d8474d75fda8ffa04c91d06a53d1a5397 Binary files /dev/null and 
b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/span-error-2.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/span-error.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/span-error.png new file mode 100644 index 0000000000000000000000000000000000000000..c5d68f90c23bed9fe5d89a2bb6688df2f3f40c7a Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/span-error.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/too-many-child.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/too-many-child.png new file mode 100644 index 0000000000000000000000000000000000000000..ad31d9b7859cb8f39212fde886322fb8b6e1a6c6 Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/too-many-child.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/top5-not-clear.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/top5-not-clear.png new file mode 100644 index 0000000000000000000000000000000000000000..0908eec671b76bf0b06cec907432ee74b7d4313c Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/top5-not-clear.png differ diff --git a/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/top5-span.png b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/top5-span.png new file mode 100644 index 0000000000000000000000000000000000000000..faeb6fa60c53966e1e174742605adea425153f04 Binary files /dev/null and b/docs/.vuepress/public/static/blog/2018-01-01-understand-trace/top5-span.png differ diff --git a/docs/blog/2019-01-01-Understand-Trace.md b/docs/blog/2019-01-01-Understand-Trace.md new file mode 100644 index 0000000000000000000000000000000000000000..c50dede9b4b07cda96f1ce2b55b0ced91467d050 --- /dev/null +++ b/docs/blog/2019-01-01-Understand-Trace.md @@ -0,0 +1,79 @@ +# Understand distributed trace 
easier in the incoming 6-GA + +- Author: Wu Sheng, Tetrate, SkyWalking original creator +- [GitHub](https://github.com/wu-sheng), [Twitter](https://twitter.com/wusheng1108), [LinkedIn](https://www.linkedin.com/in/wusheng1108) + +Jan. 1st, 2019 + +# Background +Distributed tracing is a necessary part of modern microservices architecture, but how to understand or use distributed tracing data is unclear to some end users. This blog gives an overview of typical distributed tracing use cases, using the new visualization features in SkyWalking v6. We hope new users will understand more through these examples. + +# Metric and topology +Trace data underpins two well-known analysis features: **metric** and **topology**. + +**Metrics** for each service, service instance and endpoint are derived from the entry spans in traces. Metrics represent response time performance, so you can get the average response time, 99th percentile response time, success rate, etc. These are broken down by service, service instance and endpoint. + +**Topology** represents links between services and is distributed tracing's most attractive feature. Topologies allow all users to understand distributed service relationships and dependencies even when they are varied or complex. This is important as it brings a single view to all interested parties, regardless of whether they are developers, designers or operators. + +Here's an example topology of 4 projects, including Kafka and two outside dependencies. + +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/demo-spring.png) +

-Topology in SkyWalking optional UI, RocketBot-

+ +# Trace +In a distributed tracing system, we spend a lot of resources (CPU, memory, disk and network) to generate, transport and persist trace data. So why do we do this? What are the typical diagnosis and system performance questions we can answer with trace data? + +SkyWalking v6 includes two trace views: +1. TreeMode: Provided for the first time. It helps you identify issues more easily. +1. ListMode: The traditional timeline view, also commonly seen in other tracing systems, such as Zipkin. + +## Error occurred +In the trace view, the easiest task is locating an error, possibly caused by a code exception or network fault. Both ListMode and TreeMode can identify errors, while the span detail screen provides details. + +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/span-error.png) +

-ListMode error span-

+ +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/span-error-2.png) +

-TreeMode error span-

+ +## Slow span +A high priority feature is identifying the slowest spans in a trace. This uses the execution duration captured by application agents. In the old ListMode trace view, a parent span's duration almost always includes its child spans' durations, due to nesting. In other words, a slow span usually makes its parent slow as well. In SkyWalking 6, we provide the `Top 5 of slow span` filter to help you locate these spans directly. + +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/top5-span.png) +

-Top 5 slow span-

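The idea behind the `Top 5 of slow span` filter can be illustrated with a short sketch. This is not SkyWalking's implementation: the span tuples and the `self_durations` and `top_slow_spans` helpers are hypothetical, and the sketch assumes child spans run one after another, so their durations can simply be subtracted from the parent's.

```python
from collections import defaultdict

# Hypothetical span records: (span_id, parent_id, name, duration_ms).
SPANS = [
    (0, None, "/checkout", 2000),
    (1, 0, "OrderService.create", 1500),
    (2, 1, "MySQL/select", 900),
    (3, 1, "Kafka/send", 100),
    (4, 0, "render", 300),
]


def self_durations(spans):
    """Return each span's own duration, with direct children subtracted."""
    child_total = defaultdict(int)
    for _span_id, parent_id, _name, duration in spans:
        if parent_id is not None:
            child_total[parent_id] += duration
    # Clamp at zero in case children overlap or run in parallel.
    return {
        span_id: max(duration - child_total[span_id], 0)
        for span_id, _parent_id, _name, duration in spans
    }


def top_slow_spans(spans, limit=5):
    """Rank spans by own duration, slowest first."""
    own = self_durations(spans)
    names = {span_id: name for span_id, _parent_id, name, _duration in spans}
    ranked = sorted(own.items(), key=lambda item: item[1], reverse=True)
    return [(names[span_id], ms) for span_id, ms in ranked[:limit]]


print(top_slow_spans(SPANS))
```

With this sample data, the root span `/checkout` lasts 2000ms but only 200ms of that is its own time, so it ranks below the database span.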
+ +The above screenshot highlights the top 5 slow spans, ranked by their own duration with child span durations excluded. It also shows every span's execution time, which helps identify the slowest ones. + +## Too many child spans +In some cases, every individual span is quick, but the trace is still slow, like this one: + +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/top5-not-clear.png) +

-Trace with no slow span-

+ +To understand whether the root problem is related to too many operations, use `Top 5 of children span number`. This filter shows the number of children each span has, highlighting the top 5. + +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/too-many-child.png) +

-13 database accesses of a span-

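Counting children per span, as the `Top 5 of children span number` filter does, might look like the sketch below. The span tuples, the span names and the `top_child_counts` helper are hypothetical and only illustrate the counting step; they are not taken from SkyWalking.

```python
from collections import Counter

# Hypothetical span records: (span_id, parent_id, name).
# Span 1 issues 13 database calls, similar to the screenshot above.
SPANS = [(0, None, "/orders"), (1, 0, "OrderDao.load")] + [
    (span_id, 1, "MySQL/select") for span_id in range(2, 15)
]


def top_child_counts(spans, limit=5):
    """Return (span name, number of direct children), largest first."""
    counts = Counter(
        parent_id for _span_id, parent_id, _name in spans if parent_id is not None
    )
    names = {span_id: name for span_id, _parent_id, name in spans}
    return [(names[parent_id], n) for parent_id, n in counts.most_common(limit)]


print(top_child_counts(SPANS))
```

Only direct children are counted here; spans without children do not appear in the result.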
+ +In this screenshot, there is a span with 13 children, all of which are database accesses. Also, the trace overview shows that database access costs 1380ms of this 2000ms trace. + +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/database-long-duration.png) +

-1380ms database accesses-

+ +In this example, the root cause is too many database accesses. This is also typical in other scenarios like too many RPCs or cache accesses. + +## Trace depth +Trace depth is also related to latency. Like the [too many child spans](#too-many-child-spans) scenario, each span's latency looks good, but the whole trace is slow. + +![](../.vuepress/public/static/blog/2018-01-01-understand-trace/deep-trace-1.png) +

-Trace depth-

+ +Here, the slowest spans each take less than 500ms, which is not too slow for a 2000ms trace. The first line shows four different colors, representing the four services involved in this distributed trace. Each of them costs 100~400ms, and together they add up to nearly 2000ms. From this, we know the slow trace is caused by 3 RPCs in a serial sequence. + +# At the end +Distributed tracing and APM tools help users identify root causes, allowing development and operation teams to optimize accordingly. We hope you enjoyed this, and love Apache SkyWalking and our new trace visualization. If so, [give us a star on GitHub](https://github.com/apache/incubator-skywalking) to encourage us. + +SkyWalking 6 is scheduled for release at the end of January 2019. You can contact the project team through the following channels: +- Follow [SkyWalking twitter](https://twitter.com/ASFSkyWalking) +- Subscribe to the mailing list: dev@skywalking.apache.org. Send mail to dev-subscribe@skywalking.apache.org to subscribe. +- Join the [Gitter](https://gitter.im/OpenSkywalking/Lobby) room. diff --git a/docs/blog/README.md b/docs/blog/README.md index ad2007b6416f2c03d8521c06ca46ca9c29c4862e..3d668072bee3c04e4936214335a46b94f1b0f220 100755 --- a/docs/blog/README.md +++ b/docs/blog/README.md @@ -3,6 +3,11 @@ layout: LayoutBlog blog: +- title: Understand distributed trace easier in the incoming 6-GA + name: 2019-01-01-Understand-Trace + time: Sheng Wu. Jan. 1st, 2019 + short: + - title: SkyWalking v6 is Service Mesh ready name: 2018-12-12-skywalking-service-mesh-ready time: Sheng Wu. Dec. 5th, 2018