Commit 364bc96f by wu-sheng

Remove out-of-date docs. All docs are released on the WIKI now.

Parent: befb8f30
### Build SkyWalking Protocol
- Build the project:
```shell
$cd github/sky-walking/skywalking-protocol
$mvn clean install -Dmaven.test.skip=true
```
### Build SkyWalking
```shell
$cd github/sky-walking
$mvn clean install -Dmaven.test.skip=true
```
- Collect the packages from each module's build output:
    - SkyWalking Agent: github/sky-walking/skywalking-collector/skywalking-agent/target/skywalking-agent-2.0-2016.jar
    - SkyWalking Server: github/sky-walking/skywalking-server/target/installer
    - SkyWalking Alarm: github/sky-walking/skywalking-alarm/target/installer
    - SkyWalking WebUI: github/sky-walking/skywalking-webui/skywalking.war
    - SkyWalking Analysis: github/sky-walking/skywalking-analysis/skywalking-analysis-2.0-2016.jar
- Upload skywalking-analysis-2.0-2016.jar to the analysis server
- Upload start-analysis.sh to the same directory
- Create a crontab entry for start-analysis.sh that runs every 30 minutes, e.g. as sketched below
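A minimal crontab sketch; the script path is a placeholder for wherever you uploaded start-analysis.sh:
```shell
# run start-analysis.sh every 30 minutes (path is illustrative)
*/30 * * * * /path/to/start-analysis.sh
```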
### Initialize the MySQL Database
- Initialize the management database using the [database script](https://github.com/wu-sheng/sky-walking/blob/master/skywalking-webui/src/main/sql/table.mysql). The following SQL fragments in the script need to be modified for your environment:
```sql
-- set the sender and SMTP information for alarm e-mails
INSERT INTO `system_config` (`config_id`,`conf_key`,`conf_value`,`val_type`,`val_desc`,`create_time`,`sts`,`modify_time`) VALUES (1000,'mail_info','{\"mail.host\":\"mail.asiainfo.com\",\"mail.transport.protocol\":\"smtp\",\"mail.smtp.auth\":\"true\",\"mail.smtp.starttls.enable\":\"false\",\"mail.username\":\"testA\",\"mail.password\":\"******\",\"mail.sender\":\"mailSender@asiainfo.com\"}','json','默认邮件发送人信息','2015-12-10 11:54:06','A','2015-12-10 11:54:06');
-- set the public web UI address, used for links inside alarm e-mails
INSERT INTO `system_config` (`config_id`,`conf_key`,`conf_value`,`val_type`,`val_desc`,`create_time`,`sts`,`modify_time`) VALUES (1001,'portal_addr','http://10.1.235.197:48080/skywalking/','string','默认门户地址','2015-12-10 15:23:53','A','2015-12-10 15:23:53');
-- set the LAN addresses of the SkyWalking Server cluster
INSERT INTO `system_config` (`config_id`,`conf_key`,`conf_value`,`val_type`,`val_desc`,`create_time`,`sts`,`modify_time`) VALUES (1002,'servers_addr','10.1.235.197:34000;10.1.235.197:35000;','string','日志采集地址','2015-12-10 15:23:53','A','2015-12-10 15:23:53');
-- set the Internet addresses of the SkyWalking Server cluster
INSERT INTO `system_config` (`config_id`,`conf_key`,`conf_value`,`val_type`,`val_desc`,`create_time`,`sts`,`modify_time`) VALUES (1003,'servers_addr_1','60.194.3.183:34000;60.194.3.183:35000;60.194.3.184:34000;60.194.3.184:35000;','string','日志采集地址-外网','2015-12-10 15:23:53','A','2015-12-10 15:23:53');
-- configure the alarm types
INSERT INTO `system_config` (`config_id`,`conf_key`,`conf_value`,`val_type`,`val_desc`,`create_time`,`sts`,`modify_time`) VALUES ('1004', 'alarm_type_info', '[{"type":"default","label":"exception","desc":"System Exception"},{"type":"ExecuteTime-PossibleError","label":"remark","desc":"Execution Time > 5s"},{"type":"ExecuteTime-Warning","label":"remark","desc":"Execution Time > 500ms"}]', 'json', '告警类型', '2016-04-18 16:04:51', 'A', '2016-04-18 16:04:53');
```
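After running the script, a quick sanity check can confirm the configuration rows, for example:
```sql
-- verify the inserted configuration rows
SELECT config_id, conf_key, conf_value FROM system_config ORDER BY config_id;
```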
- Note: when upgrading from a version released before 2016-4-21, the SQL statement for config_id=1000 in the system_config table has changed, and config_id=1004 is a new entry; update accordingly
- Note: when upgrading from a version released before 2016-4-8, execute the corresponding update scripts:
```sql
# alter table since 2016-4-8
...
```
- Note: when upgrading from a version released before 2016-5-26, execute the corresponding update scripts:
```sql
# alter table since 2016-5-26
...
```
### Configure SkyWalking Server
- Modify /config/config.properties according to your server environment:
```properties
#server listening port for collected data
server.port=34000
#directory for data buffer files; make sure it has sufficient storage capacity
buffer.data_buffer_file_parent_directory=D:/test-data/data/buffer
#directory for the offset register file (absolute path)
registerpersistence.register_file_parent_directory=d:/test-data/data/offset
#HBase zookeeper quorum
hbaseconfig.zk_hostname=10.1.235.197,10.1.235.198,10.1.235.199
#HBase zookeeper client port
hbaseconfig.client_port=29181
#Redis server (ip:port) for staging alarm data
alarm.redis_server=10.1.241.18:16379
```
- Start the server:
```shell
$cd installer/bin
$./swserver.sh
```
- Multiple instances can be deployed, according to the needs of processing capacity.
- Before starting the server, note that the HBase client connects by hostname rather than IP. Configure the hosts file correctly on the machine running the server, otherwise data cannot be written to HBase; see the sketch below.
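A minimal /etc/hosts sketch; the hostnames and addresses are illustrative and must match your actual HBase nodes:
```
# map HBase node hostnames to their IPs (values are examples)
10.1.235.197  host-10-1-235-197
10.1.235.198  host-10-1-235-198
10.1.235.199  host-10-1-235-199
```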
### Configure SkyWalking Alarm
- Modify /config/config.properties according to your server environment:
```properties
#zookeeper connection string, used to coordinate the cluster; may share the zookeeper of HBase
zkpath.connect_str=10.1.241.18:29181,10.1.241.19:29181,10.1.241.20:29181
#JDBC connection information of the management database
#database URL
db.url=jdbc:mysql://10.1.241.20:31306/sw_db
#database user name
db.user_name=sw_dbusr01
#database password
db.password=sw_dbusr01
#Redis server holding alarm data; must match the alarm.redis_server setting of skywalking-server
alarm.redis_server=127.0.0.1:6379
```
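To verify that the configured Redis is reachable, you can, for example, ping it with redis-cli (host and port taken from the sample value above):
```shell
$redis-cli -h 127.0.0.1 -p 6379 ping
PONG
```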
- Start the server:
```shell
$cd installer/bin
$./sw-alarm-server.sh
```
- Multiple instances can be deployed, according to the needs of processing capacity; instances load-balance automatically.
### Configure SkyWalking WebUI
- Modify WEB-INF/classes/config.properties:
```properties
#HBase connection (zookeeper quorum and client port)
hbaseconfig.quorum=10.1.235.197,10.1.235.198,10.1.235.199
hbaseconfig.client_port=29181
```
- Modify jdbc.properties:
```properties
#JDBC connection information of the management database
jdbc.url=jdbc:mysql://10.1.228.202:31316/test
jdbc.username=devrdbusr21
jdbc.password=devrdbusr21
```
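To confirm the JDBC target is reachable before deploying the WAR, you can, for example, connect with the mysql client (host, port, and credentials from the sample values above):
```shell
$mysql -h 10.1.228.202 -P 31316 -u devrdbusr21 -p test
```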
### Configure SkyWalking Analysis
- Copy the HBase installation package to the Hadoop installation directory.
- Overwrite the configuration in the copied HBase package with the configuration from the HBase master node.
- Add the following to .bash_profile (adjust to your actual environment):
```
export HBASE_HOME=/aifs01/users/hdpusr01/hbase-1.1.2
export PATH=$HBASE_HOME/bin:$PATH
```
- Run the following commands:
```
source .bash_profile
echo ${HBASE_HOME}
```
- The configuration above is needed to run the HBase MR job and is for reference only.
- Modify analysis.conf:
```
#HBase connection information
hbase.zk_quorum=10.1.235.197,10.1.235.198,10.1.235.199
hbase.zk_client_port=29181
#MySQL connection information
mysql.url=jdbc:mysql://10.1.228.202:31316/test
mysql.username=devrdbusr21
mysql.password=devrdbusr21
```
- Make start-analysis.sh executable:
```
> chmod +x start-analysis.sh
```
- Create a crontab entry and run the script. For crontab expression syntax, see [more](http://tool.lu/crontab/).
```
> crontab -e
# execute the start-analysis.sh script every 20 minutes
*/20 * * * * start-analysis.sh
```
- Tail the log:
```
skywalking-analysis/log> tail -f map-reduce.log
```
- Use the [log4j or log4j2](../HOW_TO_FIND_TID.md) plugin to print the tid, which reflects the running state of the SDK:
```
#tid:N/A means the environment is not set up correctly, or tracing is switched off
#tid: (empty) means the current request is outside the tracing scope
#tid:1.0a2.1453065000002.c3f8779.27878.30.184 is the tid of this request; a sample log line follows
[DEBUG] Returning handler method [public org.springframework.web.servlet.ModelAndView com.ai.cloud.skywalking.example.controller.OrderSaveController.save(javax.servlet.http.HttpServletRequest)] TID:1.0a2.1453192613272.2e0c63e.11144.58.1 2016-01-19 16:36:53.288 org.springframework.beans.factory.support.DefaultListableBeanFactory
```
- For a web application's HTTP entry point, the traceid of a call can be found in the response headers, provided the application's URLs are already monitored by skywalking-web-plugin. For example:
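A minimal sketch for inspecting the response headers of a monitored URL; the URL is illustrative, and the exact header name carrying the traceid depends on the skywalking-web-plugin configuration:
```shell
#dump only the response headers of a GET request and look for the traceid header
$curl -s -D - -o /dev/null http://your-app-host:8080/your/monitored/url
```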
### Analyzing Runtime Behavior via the Server Health Collector Report
- The Health Report appears periodically in the server log and covers the interval between two consecutive reports
1. ServerReceiver reflects how the server receives data
1. DataBufferThread reflects how received data is written asynchronously to log files
1. PersistenceThread reflects how the log file contents are read back
1. PersistenceThread extra:hbase reflects how the log file contents are persisted to HBase
1. RedisInspectorThread reflects the health checks of the Redis connection pool
- A sample Health Report:
```
---------Server Health Collector Report---------
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_0(27)>
[INFO]DataBuffer flush data to local file:1460445849010-a4ea1e20927b40d290ebd4f5f3e08705(t:1460598787106)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_1(28)>
[INFO]DataBuffer flush data to local file:1460445849012-0ec5a44474bc4969a9194fa1b1cadf73(t:1460598848608)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_2(29)>
[INFO]DataBuffer flush data to local file:1460445849012-4182e3d0205e48968b59fe9fbb0923b4(t:1460598668464)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_4(31)>
[INFO]DataBuffer flush data to local file:1460445849013-54216553363343fe8b39221a0691cf40(t:1460598788488)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_5(32)>
[INFO]DataBuffer flush data to local file:1460445849013-531a955690764a56ac0490db531bec10(t:1460598727104)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_6(33)>
[INFO]DataBuffer flush data to local file:1460445849014-e0163d8c8b8446d6812a11e89cd48b0e(t:1460598787722)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_7(34)>
[INFO]DataBuffer flush data to local file:1460445849014-340435a1dd9e49f088af039c2e9e85af(t:1460598787510)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:DataBufferThread_8(35)>
[INFO]DataBuffer flush data to local file:1460445849015-5dacc5f6fb38442eb484b50a240568df(t:1460598727508)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_0(15)>
[INFO]read 224 chars from local file:1460445849014-340435a1dd9e49f088af039c2e9e85af(t:1460598787534)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_0(15),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1460598787549)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_1(16)>
[INFO]read 268 chars from local file:1460445849013-531a955690764a56ac0490db531bec10(t:1460598727121)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_1(16),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1460598727129)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_4(19)>
[INFO]read 238 chars from local file:1460445849013-54216553363343fe8b39221a0691cf40(t:1460598788522)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_4(19),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1460598788532)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_5(20)>
[INFO]read 526 chars from local file:1460445849014-e0163d8c8b8446d6812a11e89cd48b0e(t:1460598787770)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_5(20),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1460598787775)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_6(21)>
[INFO]read 238 chars from local file:1460445849010-a4ea1e20927b40d290ebd4f5f3e08705(t:1460598787126)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_6(21),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1460598787135)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_7(22)>
[INFO]read 583 chars from local file:1460445849012-4182e3d0205e48968b59fe9fbb0923b4(t:1460598668466)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_7(22),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1460598668473)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_8(23)>
[INFO]read 819 chars from local file:1460445849015-5dacc5f6fb38442eb484b50a240568df(t:1460598727531)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_8(23),extra:hbase>
[INFO]save 2 BuriedPointEntries.(t:1460598727542)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_9(24)>
[INFO]read 462 chars from local file:1460445849012-0ec5a44474bc4969a9194fa1b1cadf73(t:1460598848648)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:PersistenceThread_9(24),extra:hbase>
[INFO]save 3 BuriedPointEntries.(t:1460598848757)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:RedisInspectorThread(65)>
[INFO]alarm redis connectted.(t:1460598845054)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:RegisterPersistenceThread(14)>
[INFO]flush memory register to file.(t:1460598848748)
id<SkyWalkingServer,M:host-10-1-241-17/61.50.248.117,P:18737,T:ServerReceiver(39)>
[INFO]DataBuffer reveiving data.(t:1460598845300)
------------------------------------------------
```
Troubleshooting steps:
1. First, check whether the server-side buffer files contain the TID
2. If they do not, check whether the server address in the client's authorization file is correct, and whether the server can be reached (e.g. via ping)
3. If they do, check the server log to see whether the persistence threads are stuck writing to HBase; normal log output looks like this:
```
id<SkyWalkingServer,M:host-10-1-241-16/10.1.241.16,P:1982,T:PersistenceThread_2(17)>
[INFO]read 217 chars from local file:1456395555088-c8c670ceb8db4e2186a544e589db2d3e(t:1459900864671)
id<SkyWalkingServer,M:host-10-1-241-16/10.1.241.16,P:1982,T:PersistenceThread_2(17),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1459900864676)
id<SkyWalkingServer,M:host-10-1-241-16/10.1.241.16,P:1982,T:PersistenceThread_3(18)>
[INFO]read 270 chars from local file:1456395555087-0436594d077747279cebeeece70a0ce6(t:1459900864673)
id<SkyWalkingServer,M:host-10-1-241-16/10.1.241.16,P:1982,T:PersistenceThread_3(18),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1459900864678)
id<SkyWalkingServer,M:host-10-1-241-16/10.1.241.16,P:1982,T:PersistenceThread_4(19)>
[INFO]read 217 chars from local file:1456395555014-93b0879718ee4156926090de3037ac27(t:1459900744687)
id<SkyWalkingServer,M:host-10-1-241-16/10.1.241.16,P:1982,T:PersistenceThread_4(19),extra:hbase>
[INFO]save 1 BuriedPointEntries.(t:1459900744692)
```
Why persistence threads get stuck on HBase:
In most cases, if the network is fine, check whether /etc/hosts on the machine running the Server contains the hostname mappings for HBase
# FAQ
## Common Commands
```
# upload all jar files to HDFS
./hdfs dfs -put /aifs01/users/devhdp01/hadoop-2.6.0/share/hadoop/common/lib/*.jar /aifs01/users/devhdp01/hadoop-2.6.0/share/hadoop/common/lib/
# create a directory in HDFS
./hdfs dfs -mkdir -p /aifs01/users/devhdp01/hbase-1.1.2/lib/
```
## Jar file cannot be found
### Log
```
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://host-10-1-241-18:9000/aifs01/users/devhdp01/hbase-1.1.2/lib/hbase-hadoop-compat-1.1.2.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
```
### Resolution
1. Create the directory in HDFS:
```
${HADOOP_HOME}/bin>./hdfs dfs -mkdir -p /aifs01/users/devhdp01/hbase-1.1.2/lib/
```
2. Upload the jar file to HDFS:
```
${HADOOP_HOME}/bin>./hdfs dfs -put /aifs01/users/devhdp01/hbase-1.1.2/lib/hbase-hadoop-compat-1.1.2.jar /aifs01/users/devhdp01/hbase-1.1.2/lib/
```
## Analysis failed: how to re-analyze earlier data
1. Delete the directory referenced by ${SW_ANALYSIS_HOME} in the start-analysis.sh script; by default, data from the last three months is analyzed
## network-protocol
* Describes the packet structure used for collection and transport
<table>
<tr align="center">
<td rowspan="3">Packet length (4 bytes)</td>
<td colspan="7">Body</td>
<td rowspan="3">Checksum (4 bytes)</td>
</tr>
<tr align="center">
<td colspan="3">Sub-packet 1</td>
<td colspan="3">Sub-packet 2</td>
<td>… (n)</td>
</tr>
<tr align="center">
<td>Sub-packet length (4 bytes)</td>
<td>Sub-packet type (4 bytes)</td>
<td>Sub-packet body</td>
<td>Sub-packet length (4 bytes)</td>
<td>Sub-packet type (4 bytes)</td>
<td>Sub-packet body</td>
<td>… (n)</td>
</tr>
</table>
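The framing above can be read with a short routine. The sketch below is illustrative only, not the project's actual implementation: it assumes big-endian 4-byte integers, assumes the packet length counts only the body, and uses a plain byte-sum checksum because the real checksum algorithm is not specified here.
```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PacketReader {
    /** Reads one framed packet and returns the payload of each sub-packet. */
    public static List<byte[]> readPacket(DataInputStream in) throws IOException {
        int bodyLength = in.readInt();   // packet length (4 bytes); assumed to cover the body only
        byte[] body = new byte[bodyLength];
        in.readFully(body);
        int checksum = in.readInt();     // checksum (4 bytes)

        int sum = 0;                     // hypothetical checksum: plain sum of body bytes
        for (byte b : body) {
            sum += b;
        }
        if (sum != checksum) {
            throw new IOException("checksum mismatch");
        }

        // walk the body: each sub-packet is length (4 bytes), type (4 bytes), payload
        List<byte[]> payloads = new ArrayList<>();
        int offset = 0;
        while (offset < body.length) {
            int subLength = readInt(body, offset);    // assumed to cover the payload only
            int subType = readInt(body, offset + 4);  // would select the deserializer for the payload
            byte[] payload = new byte[subLength];
            System.arraycopy(body, offset + 8, payload, 0, subLength);
            payloads.add(payload);
            offset += 8 + subLength;
        }
        return payloads;
    }

    private static int readInt(byte[] buf, int pos) {
        return ((buf[pos] & 0xFF) << 24) | ((buf[pos + 1] & 0xFF) << 16)
                | ((buf[pos + 2] & 0xFF) << 8) | (buf[pos + 3] & 0xFF);
    }
}
```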
## buffer-file-protocol
* Describes the structure of the local buffer files used by collector-server
### Standard file structure
<table>
<tr align="center">
<td rowspan="3">Packet length (4 bytes)</td>
<td colspan="7">Body</td>
<td rowspan="3">Delimiter (4 bytes): 127,127,127,127</td>
</tr>
<tr align="center">
<td colspan="3">Sub-packet 1</td>
<td colspan="3">Sub-packet 2</td>
<td>… (n)</td>
</tr>
<tr align="center">
<td>Sub-packet length (4 bytes)</td>
<td>Sub-packet type (4 bytes)</td>
<td>Sub-packet body</td>
<td>Sub-packet length (4 bytes)</td>
<td>Sub-packet type (4 bytes)</td>
<td>Sub-packet body</td>
<td>… (n)</td>
</tr>
</table>
### End-of-file marker packet
<table>
<tr align="center">
<td rowspan="3">Packet length (4 bytes)</td>
<td colspan="3">Body</td>
<td rowspan="3">Delimiter (4 bytes): 127,127,127,127</td>
</tr>
<tr align="center">
<td colspan="3">Sub-packet 1</td>
</tr>
<tr align="center">
<td>Sub-packet length (4 bytes)</td>
<td>Sub-packet type (4 bytes)</td>
<td>EOFSpan</td>
</tr>
</table>
* For a more detailed structure, refer to protocol.xlsx
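As a small illustrative sketch (an assumption, not the project's code), a reader can walk a buffer file by checking for the 4-byte delimiter after each packet, and treat a packet whose only sub-packet is the EOFSpan marker as the end of the file:
```java
public final class BufferFileDelimiter {
    /** Returns true if the delimiter 127,127,127,127 starts at position pos in buf. */
    public static boolean isDelimiterAt(byte[] buf, int pos) {
        return pos + 4 <= buf.length
                && buf[pos] == 127 && buf[pos + 1] == 127
                && buf[pos + 2] == 127 && buf[pos + 3] == 127;
    }
}
```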