Unverified commit d02f31e4, authored by: Y Yuting, committed by: GitHub

docs(English Docs): Update the remaining English documents (#284)

Performance Tuning, Data Migration to StoneDB, Troubleshooting, FAQ

#276
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Parent fb5fd260
......@@ -3,4 +3,62 @@ id: cpu-monitor
sidebar_position: 7.22
---
# Commands for CPU Monitoring
\ No newline at end of file
# Commands for CPU Monitoring
This topic describes two commands that are commonly used for CPU monitoring.
# vmstat
The `vmstat` command is used to monitor processes, virtual memory, I/O, and CPUs of the OS. `vmstat` is short for Virtual Memory Statistics.
Command output example:
```shell
# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 3340 1934580 656188 214762784 0 0 0 20 1 0 0 0 100 0 0
1 0 3340 1934548 656188 214762832 0 0 0 0 854 592 2 0 98 0 0
1 0 3340 1934548 656188 214762832 0 0 0 0 865 605 2 0 98 0 0
1 0 3340 1934548 656196 214762832 0 0 0 32 925 634 2 0 98 0 0
1 0 3340 1934548 656196 214762832 0 0 0 0 844 710 2 0 98 0 0
```
Parameter description:
| **Parameter** | **Description** |
| --- | --- |
| r | The number of runnable processes. |
| b | The number of processes that are waiting for I/O. |
| swpd | The amount of swap space used. Unit: KB. |
| free | The amount of idle memory. |
| buff | The amount of memory that is used as buffers. |
| cache | The amount of memory that is used as cache. A larger cache indicates more cached files. If all frequently accessed files are cached, the value of **bi** will be small. |
| si | The amount of memory that is swapped in from disks per second. |
| so | The amount of memory that is swapped to disks per second. |
| bi | The number of blocks received from block devices per second (blocks in). |
| bo | The number of blocks sent to block devices per second (blocks out). |
| in | The number of interrupts per second, including clock interrupts. |
| cs | The number of context switches per second. |
| us | The percentage of the CPU time spent in running user space processes. |
| sy | The percentage of the CPU time spent in running system processes. |
| id | The percentage of the CPU time spent idle. |
| wa | The percentage of the CPU time spent waiting for I/O. |
| st | The percentage of the CPU time stolen by the hypervisor. |
:::info
- If the values of **si** and **so** are large, the kernel is swapping memory to disks.
- Larger values of **bi** and **bo** indicate higher consumption of I/O.
- Larger values of **in** and **cs** indicate higher frequency of communication between the system and interface devices.
:::
# perf top
The `perf top` command can be used to monitor the CPU time consumed by the functions that processes call.
Command output example:
![image.png](./command-output-example.png)
Parameter description:
| **Parameter** | **Description** |
| --- | --- |
| Overhead | The CPU utilization. |
| Shared Object | The name of the object that consumes CPU time, such as an application, the kernel, or a dynamic link library. |
| Symbol | The name of the function, in most cases. |
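For example, you can pass a process ID to focus on the functions called by a single process (the PID below is illustrative):
```shell
# Monitor only the functions called by the process with PID 908076.
perf top -p 908076
```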
......@@ -3,4 +3,92 @@ id: top-commands
sidebar_position: 7.21
---
# The top command
\ No newline at end of file
# The top command
The `top` command can be used to monitor the usage of CPUs, memory, and swap space of the OS. It can also be used to monitor processes. The command output is sorted based on the CPU time.
Command output example:
```shell
top - 10:12:21 up 5 days, 22:31, 4 users, load average: 1.00, 1.00, 0.78
Tasks: 731 total, 1 running, 730 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.7 us, 0.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 257841.3 total, 1887.5 free, 45581.6 used, 210372.2 buff/cache
MiB Swap: 8192.0 total, 8188.7 free, 3.3 used. 210450.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
908076 mysql 20 0 193.0g 42.4g 44088 S 100.3 16.8 228:10.34 mysqld
823137 root 20 0 6187564 83772 51636 S 6.6 0.0 6:36.12 dockerd
822938 root 20 0 3278696 58500 35420 S 0.7 0.0 38:37.69 containerd
1483 root 20 0 239280 9260 8136 S 0.3 0.0 0:19.16 accounts-daemon
928343 root 20 0 9936 4576 3240 R 0.3 0.0 0:00.04 top
......
```
Parameters in line 1:
| **Parameter** | **Description** |
| --- | --- |
| 10:12:21 | The current system time. |
| up 5 days | The duration for which the system has been running since the last startup. |
| 4 users | The number of online users. |
| load average | The average system workloads in the past 1 minute, 5 minutes, and 15 minutes. |
Parameters in line 2:
| **Parameter** | **Description** |
| --- | --- |
| total | The number of processes. |
| running | The number of processes that are in the running state. |
| sleeping | The number of processes that are in the sleeping state. |
| stopped | The number of processes that are in the stopped state. |
| zombie | The number of processes that are in the zombie state. |
Parameters in line 3:
| **Parameter** | **Description** |
| --- | --- |
| us | The percentage of CPU time spent in running user space processes. |
| sy | The percentage of CPU time spent in running system processes. |
| ni | The percentage of CPU time spent in running the processes of which priorities are changed. |
| id | The percentage of CPU time spent idle. |
| wa | The percentage of CPU time spent in wait. |
| hi | The percentage of CPU time spent in handling hardware interrupts. |
| si | The percentage of CPU time spent in handling software interrupts. |
| st | The percentage of CPU time spent on the hypervisor. |
Pay attention to values in this line. If the value of **us** is large, user space processes consume much CPU time. If the value of **us** is larger than 50% for a long time, applications must be tuned in time. If the value of **sy** is large, system processes consume much CPU time. This may be caused by improper OS configuration or OS bugs. If the value of **wa** is large, I/O waits are high. This may be caused by high random I/O access or an I/O bottleneck.
Parameters in line 4:
| **Parameter** | **Description** |
| --- | --- |
| total | The amount of memory. |
| free | The amount of free memory. |
| used | The amount of used memory. |
| buff/cache | The amount of memory that is used as buffers and cache. |
Parameters in line 5:
| **Parameter** | **Description** |
| --- | --- |
| total | The size of swap space. |
| free | The size of free swap space. |
| used | The size of used swap space. |
| avail Mem | The estimated amount of memory available for starting new applications without swapping. |
Parameters in the process list:
| **Parameter** | **Description** |
| --- | --- |
| PID | The process ID. |
| USER | The owner of the process. |
| PR | The priority of the process. A smaller value indicates a higher priority. |
| NI | The nice value of the priority. A positive integer indicates that the priority of the process is being downgraded. A negative integer indicates that the priority of the process is being upgraded. The value range is -20 to 19 and the default value is 0. |
| VIRT | The amount of virtual memory occupied by the process. |
| RES | The amount of physical memory occupied by the process. |
| SHR | The amount of shared memory occupied by the process. |
| S | The status of the process. The value can be:<br />- **S**: sleeping<br />- **R**: running<br />- **Z**: zombie<br />- **N**: The nice value of the process is a negative value.<br /> |
| %CPU | The percentage of the CPU time used by the process. |
| %MEM | The percentage of the memory occupied by the process. |
| TIME+ | The total CPU time used by the process. |
| COMMAND | The command that the process is running. |
\ No newline at end of file
......@@ -3,4 +3,40 @@ id: parameter-tuning
sidebar_position: 7.43
---
# Database parameter tuning
\ No newline at end of file
# Database parameter tuning
## stonedb_insert_buffer_size
- Description: This parameter specifies the insert buffer size, expressed in MB.
- Default value: 512
- Value range: 512 to 10000
- Recommended value: If operations of inserting bulk data exist, we recommend that you set the parameter to 2048.
## stonedb_ini_servermainheapsize
- Description: This parameter specifies the size of heap memory on the server, expressed in MB.
- Default value: 0, which indicates half the size of the physical memory.
- Value range: 0 to 1000000
- Recommended value: 0
## stonedb_distinct_cache_size
- Description: This parameter specifies the size of the Group Distinct Cache, expressed in MB.
- Default value: 64
- Value range: 64 to 256
- Recommended value: 128
## stonedb_bg_load_threads
- Description: This parameter specifies the number of worker threads in the background thread pool that load data from the insert buffer.
- Default value: 0
- Value range: 0 to 100
- Recommended value: We recommend that you set the parameter to half the number of CPU cores.
## stonedb_load_threads
- Description: This parameter specifies the number of worker threads in the StoneDB Load thread pool.
- Default value: 0
- Value range: 0 to 100
- Recommended value: We recommend that you set the parameter to the number of CPU cores.
## stonedb_query_threads
- Description: This parameter specifies the number of worker threads in the StoneDB query thread pool.
- Default value: 0
- Value range: 0 to 100
- Recommended value: We recommend that you set the parameter to the number of CPU cores.
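The following is a minimal sketch of how to check these parameters at runtime, assuming they are exposed as MySQL system variables with the names above:
```sql
-- Check the current values of the StoneDB tuning parameters.
show variables like 'stonedb%';
```
To persist a change, set the parameter (for example, `stonedb_insert_buffer_size = 2048`) under the `[mysqld]` section of the MySQL configuration file and restart the instance.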
......@@ -3,4 +3,244 @@ id: sql-best-practices
sidebar_position: 7.41
---
# Best Practices for SQL Coding
\ No newline at end of file
# Best Practices for SQL Coding
## **Best practices for designing tables**
- Define primary keys for your StoneDB tables. We recommend that you use primary keys to uniquely identify each record in your StoneDB tables, though StoneDB does not require you to create indexes.
- Use auto-increment primary keys and do not use Universally Unique Identifiers (UUIDs) as primary keys. Auto-increment primary keys improve the performance of insert operations and prevent data page splits and fragmentation, which improves space utilization. UUIDs are unordered and space-consuming.
- Do not use foreign key constraints. Each time after an insert, update, or delete operation on a table that is defined with foreign keys, an integrity check is performed on the table. This reduces query performance.
- Use data type CHAR to define fixed-length character fields and data type VARCHAR to define variable-length character fields.
- Properly define the length of each field. If the defined length is much longer than that of the stored record, a large amount of space will be wasted and the access efficiency is reduced.
- Define each field as NOT NULL and provide a default value for each field, if possible.
- Define a timestamp field in each table. Timestamps can be used for obtaining incremental data to estimate the number of rows generated in a specified time range, and to facilitate data cleaning and archiving.
- Do not use big object field types. If big object fields are retrieved for a query, a large amount of network and I/O resources will be consumed. We recommend that you store big objects in external storage.
- Do not use a reserved keyword such as **desc**, **order**, **group**, or **distinct** as a table or field name.
- Ensure that the fields in a table use the same character set as the table.
## **Best practices for writing SQL queries**
### **Avoid the use of SELECT ***
When you use a `SELECT` statement, specify the names of columns, instead of using a wildcard (*). This is because using `SELECT *` has the following negative impacts:
- It transmits irrelevant fields from the server to the client, incurring additional network overhead.
- It affects the execution plan of the statement. A `SELECT *` statement is much slower than a `SELECT _Column names_` statement because the latter can return data by using only column indexes.
Following are statement examples:
Avoid:
```sql
select * from test;
```
Prefer:
```sql
select id,name from test;
```
### **Avoid using OR in a WHERE clause**
Use `UNION ALL` instead of `OR` when combining multiple fields in a `WHERE` clause, to split the query into multiple queries. Otherwise, the indexes may become invalid.
Following are statement examples:
Avoid:
```sql
select * from test where group_id='40' or user_id='uOrzp9ojhfgqcwRCiume';
```
Prefer:
```sql
select * from test where group_id='40'
union all
select * from test where user_id='uOrzp9ojhfgqcwRCiume';
```
### **Do not compute on indexed columns**
If an indexed column is used for computation, the index will become invalid.
Following are statement examples:
Avoid:
```sql
select * from test where id-1=99;
```
Prefer:
```sql
select * from test where id=100;
```
### **Avoid enclosing indexed columns in functions**
If a function is used on an indexed column, the index will become invalid.
Following are statement examples:
Avoid:
```sql
select * from test where date_add(create_time,interval 10 minute)=now();
```
Prefer:
```sql
select * from test where create_time=date_add(now(),interval - 10 minute);
```
### **Use a pair of apostrophes to quote the value for an indexed column whose data type is string**
If the data type of an indexed column is string and a number that is not quoted with a pair of apostrophes is specified as the value for the indexed column, an implicit type conversion is performed on the comparison. As a result, the index will become invalid.
Following are statement examples:
Avoid:
```sql
select * from test where group_id=40;
```
Prefer:
```sql
select * from test where group_id='40';
```
### **Avoid using NOT or <> on indexed columns**
If `NOT` or `<>` is used on an indexed column, the index will become invalid.
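Following are illustrative statement examples (**status** is a hypothetical indexed column whose only valid values are 0, 1, and 2):
Avoid:
```sql
select * from test where status <> 1;
```
Prefer:
```sql
select * from test where status in (0,2);
```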
### **Avoid using IS NOT NULL on an indexed column**
If `IS NOT NULL` is used on an indexed column, the index will become invalid.
### **Do not use leading wildcards unless necessary**
If leading wildcards are used, relevant indexes will become invalid.
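Following are illustrative statement examples (**name** is a hypothetical indexed column):
Avoid:
```sql
select * from test where name like '%john%';
```
Prefer:
```sql
select * from test where name like 'john%';
```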
### **Use TRUNCATE instead of DELETE to delete a large table if no WHERE clause is used**
`TRUNCATE` is a DDL operation. It is faster than `DELETE` and releases the space occupied by the table.
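Following are illustrative statement examples (both empty the **test** table):
Avoid:
```sql
delete from test;
```
Prefer:
```sql
truncate table test;
```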
### **Use batch operations when deleting or updating a large amount of data**
We recommend that you split a large transaction into small transactions, since the locking scope for each small transaction is much smaller and the locking duration is much shorter. By doing this, the efficiency of system resources is improved.
### **Use batch operations when inserting a large amount of data**
We recommend that you use batch operations when inserting a large amount of data, for example the multi-row insert shown below. This greatly reduces the number of commits and improves performance.
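Following are illustrative statement examples:
Avoid:
```sql
insert into test(id,name) values (1,'a');
insert into test(id,name) values (2,'b');
insert into test(id,name) values (3,'c');
```
Prefer:
```sql
insert into test(id,name) values (1,'a'),(2,'b'),(3,'c');
```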
### **Commit transactions as soon as possible**
We recommend that you commit transactions as soon as possible to reduce the lock time.
### **Avoid using HAVING to filter data**
`HAVING` filters the result set only after it has been grouped and aggregated, which is the last step. Therefore, use `WHERE` instead of `HAVING` if possible.
Following are statement examples:
Avoid:
```sql
select job,avg(salary) from test group by job having job = 'managent';
```
Prefer:
```sql
select job,avg(salary) from test where job = 'managent' group by job;
```
### **Exercise caution when using user-defined functions**
When a function is called by an SQL statement, the function is called once for each record in the returned result set. If the result set of the query is large, the query performance will deteriorate.
### **Exercise caution when using scalar subqueries**
A scalar subquery is executed once for each record returned by its main query. If the result set of the main query is large, the query performance will deteriorate.
Following are statement examples:
Avoid:
```sql
select e.empno, e.ename, e.sal,e.deptno,
(select d.dname from dept d where d.deptno = e.deptno) as dname
from emp e;
```
Prefer:
```sql
select e.empno, e.ename, e.sal, e.deptno, d.dname
from emp e
left join dept d
on e.deptno = d.deptno;
```
### **Try to use the same sorting sequence if fields need to be sorted**
If the fields in the same SQL statement need to be sorted and use the same sorting sequence, indexes can be used to eliminate the CPU overhead caused by sorting. Otherwise, excessive CPU time will be consumed. In the first example provided below, field **a** is sorted in ascending order while field **b** is sorted in descending order. As a result, the optimizer cannot use indexes to avoid the sorting process.
Following are statement examples:
Avoid:
```sql
select a,b from test order by a,b desc;
```
Prefer:
```sql
select a,b from test order by a,b;
select a,b from test order by a desc,b desc;
```
### **Use as few joins as possible**
The more tables an SQL statement joins, the longer and more costly the statement is to compile. In addition, the optimizer is more likely to fail to choose the best execution plan.
### **Keep levels of nesting as few as possible**
If too many nesting levels exist in an SQL statement, temporary tables will be generated and the execution plan generated for the SQL statement may have poor performance.
Following are statement examples:
Avoid:
```sql
select * from t1 a where a.proj_no in
(select b.proj_no from t2 b where b.proj_name = 'xxx'
and not exists
(select 1 from t3 c where c.mess_id = b.t_project_id))
and a.oper_type <> 'D';
```
### **Specify the join condition when joining two tables**
If no join condition is specified when two tables are joined, a Cartesian product will be generated. In that case, if both tables store a large amount of data, the SQL statement will consume a lot of CPU and memory resources.
Following are statement examples:
Avoid:
```sql
select * from a,b;
```
Prefer:
```sql
select * from a,b where a.id=b.id;
```
### **Use a comparatively small offset for pagination with LIMIT**
When a pagination query with `LIMIT` is processed, the rows before the offset are read first and then discarded, and only the requested page is returned. Therefore, if the offset is large, the performance of the SQL statement will be poor.
Following are statement examples:
Avoid:
```sql
select id,name from test limit 10000,10;
```
Prefer:
```sql
select id,name from test where id>10000 limit 10;
```
### **In a LEFT JOIN operation, ensure the table on the left has a smaller result set**
In most cases, the table on the left in a LEFT JOIN functions as the driving table. The number of records in the result set of the driving table is equal to the number of times that the driven table is executed. Therefore, if the result set of the driving table is large, the performance will be poor.
### **Use EXISTS and IN appropriately**
Whether to use `EXISTS` or `IN` is determined by the result set sizes of the outer query and the inner query. If the result set of the outer query is larger than that of the inner query, `IN` is superior to `EXISTS`. Otherwise, `EXISTS` is preferred.
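Following are illustrative statements (tables **a** and **b** and column **a_id** are hypothetical; **a** is the outer table):
```sql
-- Prefer IN when the outer result set is larger than the inner one.
select * from a where a.id in (select b.a_id from b);
-- Prefer EXISTS when the inner result set is larger than the outer one.
select * from a where exists (select 1 from b where b.a_id = a.id);
```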
### **Use UNION ALL and UNION appropriately**
A `UNION ALL` operation simply combines the two result sets and returns the collection. A `UNION` operation combines the two result sets, sorts and deduplicates the records in the collection, and then returns the collection. We recommend that you use `UNION ALL` if possible, because `UNION` consumes more resources.
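Following are illustrative statement examples:
```sql
-- UNION ALL: simply concatenates the two result sets without sorting or deduplication.
select id from t1 union all select id from t2;
-- UNION: deduplicates (and may sort) the combined result set, which costs more.
select id from t1 union select id from t2;
```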
### **Use LEFT JOIN and INNER JOIN appropriately**
In a `LEFT JOIN` operation, the rows that match in both tables and the remaining rows in the table on the left are returned. In an `INNER JOIN` operation, only the rows that match in both tables are returned.
### **In a LEFT JOIN, use ON … AND and ON … WHERE appropriately**
The following information describes the main differences between `ON … AND` and `ON … WHERE`:
- `ON … AND` does not provide the filtering capability. Rows that have no match in the table on the right are filled with null.
- `ON … WHERE` provides the filtering capability. No matter whether the predicate condition is placed after `ON` or `WHERE`, rows in the table on the right are filtered first. However, if the predicate condition is placed after `WHERE`, the `LEFT JOIN` operation will be converted into an `INNER JOIN` operation.
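The following illustrative statements (tables **a** and **b** are hypothetical) show the difference:
```sql
-- ON ... AND: b is filtered before the join; rows of a without a match are kept and filled with NULL.
select * from a left join b on a.id = b.a_id and b.status = 1;
-- ON ... WHERE: the filter removes the NULL-filled rows, so the result is effectively an INNER JOIN.
select * from a left join b on a.id = b.a_id where b.status = 1;
```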
### **In an INNER JOIN, use ON … AND and ON … WHERE appropriately**
In an `INNER JOIN` operation, `ON … AND` is equivalent to `ON … WHERE`. Both provide the filtering capability.
### **Avoid unnecessary sorting**
For count operations, sorting is unnecessary.
Following are statement examples:
Avoid:
```sql
select count(*) as totalCount from
(select * from enquiry e where 1 = 1
AND status = 'closed'
AND is_show = 1
order by id desc, expire_date asc) _xx;
```
Prefer:
```sql
select count(*) from enquiry e where 1 = 1
AND status = 'closed'
AND is_show = 1;
```
### **Avoid unnecessary nesting**
For queries that can be implemented by a single `SELECT`, do not use nested `SELECT`.
Following are statement examples:
Avoid:
```sql
select count(*) as totalCount from
(select * from enquiry e where 1 = 1
AND status = 'closed'
AND is_show = 1
order by id desc, expire_date asc) _xx;
```
Prefer:
```sql
select count(*) from enquiry e where 1 = 1
AND status = 'closed'
AND is_show = 1;
```
### Each time after an SQL statement is written, execute an EXPLAIN statement to query its execution plan
Each time after you write an SQL statement, we recommend that you execute `EXPLAIN` to check the execution plan of the SQL statement and pay special attention to the **type**, **rows**, and **Extra** columns.
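For example (an illustrative statement against the **test** table used in the previous examples):
```sql
explain select id,name from test where id=100;
```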
......@@ -3,4 +3,387 @@ id: olap-performance-test-method
sidebar_position: 7.611
---
# OLAP Performance Test Method
\ No newline at end of file
# OLAP Performance Test Method
## **TPC-H introduction**
The TPC Benchmark-H (TPC-H) is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance.<br />In the TPC-H benchmark, 22 complex SQL queries are performed on 8 tables. Most queries contain joins on several tables, subqueries, and GROUP BY clauses. For more information, visit [https://www.tpc.org/tpch/](https://www.tpc.org/tpch/).
## **Test environment introduction**
- OS: CentOS 7.9
- CPU: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 16 cores, and 64 threads
- Memory: 125 GB
- Deployment mode of StoneDB: standalone
```shell
bin/mysqld Ver 5.6.24-StoneDB for Linux on x86_64 (build-)
build information as follow:
Repository address: https://github.com/stoneatom/stonedb.git:stonedb-5.6
Branch name: stonedb-5.6
Last commit ID: 90583b2
Last commit time: Date: Wed Jul 6 23:31:30 2022 +0800
Build time: Date: Thu Jul 7 05:39:39 UTC 2022
```
## **Test scheme**
### **1. Set up the test environment**
For information about how to set up the test environment, see [Quick Deployment](..../../../../../02-getting-started/quick-deployment.md).
### 2. Compile and deploy TPC-H
1. Download the [TPC-H](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp) installation package and upload it to the test machine.
For example, upload it to the **/data** folder.
```shell
unzip tpc-h-tool.zip
mv TPC-H_Tools_v3.0.0/ tpc-h/
cd /data/tpc-h/dbgen/
# Install GCC and MAKE.
yum install gcc make -y
```
2. Modify file **makefile** as shown in the following code.
```shell
cp makefile.suite makefile
vim makefile
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= MYSQL
MACHINE = LINUX
WORKLOAD = TPCH
```
This modification is mandatory, because TPC-H originally does not support MySQL.
3. Modify file **tpcd.h** to add the database type MySQL to TPC-H.
```shell
vim tpcd.h
#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif
# Run the "make" command.
make
```
### **3. Use TPC-H to generate 100 GB test data**
For example, run the following command to use TPC-H to generate **.tbl** data files.
```shell
./dbgen -s 100
# Copy the data files to the "stonedb" folder.
mkdir /data/tpc-h/stonedb/
mv *.tbl /data/tpc-h/stonedb/
```
**-s** _n_ in the command indicates the size of data generated. Unit: GB.<br />After the data files are generated, you can run the **head** command to check that the fields in each row of the **.tbl** data files are separated with vertical bars (|).
:::info
If this is not the first time you use TPC-H to generate data in the environment, we recommend that you run **make clean** and then **make** to clear data first and then run the command with **-f** specified to overwrite the data previously generated.
:::
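For example (the file path follows the directory layout used above):
```shell
# Print the first row of the generated lineitem data to verify the field delimiter.
head -n 1 /data/tpc-h/stonedb/lineitem.tbl
```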
### **4. Modify the dss.ddl and dss.ri files**
The `dss.ddl` file is used to create tables. The `dss.ri` file is used to create indexes and foreign key constraints.
Because the syntax for creating table schemas varies with the storage engine, the statements for creating table schemas and indexes must be modified. The following code provides an example.
To modify the statements, copy the file that stores schemas and indexes to the **stonedb** folder.
```shell
[root@htap2 dbgen]# cp dss.ddl /data/tpc-h/stonedb/dss_stonedb.ddl # Schemas used for StoneDB tables
```
```sql
-- sccsid: @(#)dss.ddl 2.1.8.1
create table nation ( n_nationkey integer not null,
n_name char(25) not null,
n_regionkey integer not null,
n_comment varchar(152),primary key (n_nationkey))engine=StoneDB;
create table region ( r_regionkey integer not null,
r_name char(25) not null,
r_comment varchar(152),primary key (r_regionkey))engine=StoneDB;
create table part ( p_partkey integer not null,
p_name varchar(55) not null,
p_mfgr char(25) not null,
p_brand char(10) not null,
p_type varchar(25) not null,
p_size integer not null,
p_container char(10) not null,
p_retailprice decimal(15,2) not null,
p_comment varchar(23) not null,primary key (p_partkey) )engine=StoneDB;
create table supplier ( s_suppkey integer not null,
s_name char(25) not null,
s_address varchar(40) not null,
s_nationkey integer not null,
s_phone char(15) not null,
s_acctbal decimal(15,2) not null,
s_comment varchar(101) not null,primary key (s_suppkey))engine=StoneDB;
create table partsupp ( ps_partkey integer not null,
ps_suppkey integer not null,
ps_availqty integer not null,
ps_supplycost decimal(15,2) not null,
ps_comment varchar(199) not null,primary key (ps_partkey,ps_suppkey) )engine=StoneDB;
create table customer ( c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey integer not null,
c_phone char(15) not null,
c_acctbal decimal(15,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null,primary key (c_custkey))engine=StoneDB;
create table orders ( o_orderkey integer not null,
o_custkey integer not null,
o_orderstatus char(1) not null,
o_totalprice decimal(15,2) not null,
o_orderdate date not null,
o_orderpriority char(15) not null,
o_clerk char(15) not null,
o_shippriority integer not null,
o_comment varchar(79) not null,primary key (o_orderkey))engine=StoneDB;
create table lineitem ( l_orderkey integer not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity decimal(15,2) not null,
l_extendedprice decimal(15,2) not null,
l_discount decimal(15,2) not null,
l_tax decimal(15,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null,primary key (l_orderkey,l_linenumber))engine=StoneDB;
```
### **5. Import table schemas and data**
1. Import table schemas.
```sql
mysql> create database tpch;
mysql> source /data/tpc-h/stonedb/dss_stonedb.ddl # Modify the value of the PATH parameter to be the path to the TPC-H tool.
```
2. Import data.
You can directly import tables **part**, **region**, **nation**, **customer**, and **supplier**. For tables **lineitem**, **orders**, and **partsupp**, we recommend that you use a script such as split_file2db.sh to split them before the import.
```shell
# Import data to StoneDB.
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpcd -e "load data local infile '/data/tpc-h/stonedb/part.tbl' into table part fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpcd -e "load data local infile '/data/tpc-h/stonedb/region.tbl' into table region fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpcd -e "load data local infile '/data/tpc-h/stonedb/nation.tbl' into table nation fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpcd -e "load data local infile '/data/tpc-h/stonedb/customer.tbl' into table customer fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpcd -e "load data local infile '/data/tpc-h/stonedb/supplier.tbl' into table supplier fields terminated by '|';"
```
```bash
#! /bin/bash
shopt -s expand_aliases
source ~/.bash_profile
# Obtain the .tbl files and the corresponding table names.
sql_path=/data/tpc-h/stonedb/
# split_tb=$(ls ${sql_path}/*.ddl)
# Files to split.
split_tb=(lineitem orders partsupp)
# split_tb=(customer nation supplier part region)
# split_tb=(part nation)
# Split settings.
# The interval (number of rows) for splitting.
line=1000000
# Database configuration.
db_host=192.168.30.102
db_port=3306
db_user=ztpch
db_pwd=******
db=ztpch
# Split a large SQL file.
function split_file()
{
for tb_name in ${split_tb[@]}
do
echo "$tb_name"
# Obtain the number of rows in the file before it is split.
totalline=$(cat $sql_path/$tb_name.tbl | wc -l)
# echo totalline=$totalline
a=`expr $totalline / $line`
b=`expr $totalline % $line`
if [[ $b -eq 0 ]] ;then
filenum=$a
else
filenum=`expr $a + 1`
fi
# echo filenum=$filenum
echo "File $tb_name has $totalline rows of data and needs to be split into $filenum files."
# Split the file.
i=1 # Change 38 to 1.
while(( i<=$filenum ))
do
echo "File to split: $tb_name.tbl.$i"
# The interval for splitting must fall within [min, max] of the original file.
p=`expr $i - 1`
min=`expr $p \* $line + 1`
max=`expr $i \* $line`
sed -n "$min,$max"p $sql_path/$tb_name.tbl > $sql_path/$tb_name.tbl.$i
#echo "This operation does not split the file."
# Specify the name of the file to split.
filename=$sql_path/$tb_name.tbl.$i
echo "$tb_name.tbl.$i is split. File name: $filename"
# Import data to StoneDB.
mysql -u$db_user -p$db_pwd -h$db_host -P$db_port --local-infile -D$db -e "load data local infile '$filename' into table $tb_name fields terminated by '|';" > /dev/null 2>&1
i=`expr $i + 1`
done
done
}
split_file
```
### **6. Generate test SQL statements**
Copy the **qgen** and **dists.dss** files to the **queries** directory.
```shell
cp /data/tpc-h/dbgen/qgen /data/tpc-h/dbgen/queries
cp /data/tpc-h/dbgen/dists.dss /data/tpc-h/dbgen/qgen/queries
# Copy the files to path "data/tpc-h/stonedb".
cp -a /data/tpc-h/dbgen/qgen/queries /data/tpc-h/stonedb/queries
```
```bash
#!/usr/bin/bash
# Run "chmod +x tpch_querys.sh".
#./tpch_querys.sh stonedb
db_type=$1
for i in {1..22}
do
./qgen -d $i -s 100 > $db_type"$i".sql
done
```
```shell
# Execute the script to generate statements.
mkdir query
./tpch_querys.sh query
mv query*.sql /data/tpc-h/stonedb/queries
```
### **7. Modify the SQL statements and start the test**
Test script:
```bash
#!/bin/bash
# stone
host=192.168.30.120
port=3306
user=root
password=********
database=ztpch
# The absolute path. The following is for reference only.
banchdir=/data/tpc-h/stonedb/queries
db_type=stonedb # ck, stone, mysql
resfile=$banchdir/"TPCH_${db_type}_`date "+%Y%m%d%H%M%S"`"
echo "start test run at"`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
echo "Path to log: ${resfile}"
for (( i=1; i<=22;i=i+1 ))
do
queryfile=${db_type}${i}".sql"
echo "run query ${i}"|tee -a ${resfile}.out
echo " $database $banchdir/query$i.sql " #|tee -a ${resfile}.out
start_time=`date "+%s.%N"`
#clickhouse
#clickhouse-client --port $port --user $user --password $password --host $host -d $database < $banchdir/query$i.sql |tee -a ${resfile}.out
#stonedb and mysql
mysql -u$user -p$password -h$host -P$port $database -e "source $banchdir/query$i.sql" 2>&1 |tee -a ${resfile}.out
end_time=`date "+%s.%N"`
start_s=${start_time%.*}
start_nanos=${start_time#*.}
end_s=${end_time%.*}
end_nanos=${end_time#*.}
if [ "$end_nanos" -lt "$start_nanos" ];then
end_s=$(( 10#$end_s -1 ))
end_nanos=$(( 10#$end_nanos + 10 ** 9))
fi
time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
echo ${queryfile} "the $i run cost "${time}" second start at "`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at "`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
# systemctl stop clickhouse-server
done
```
#### **Statements before the modification**
[q1.sql](https://static.stoneatom.com/custom/sql/q1.sql)
[q2.sql](https://static.stoneatom.com/custom/sql/q2.sql)
[q3.sql](https://static.stoneatom.com/custom/sql/q3.sql)
[q4.sql](https://static.stoneatom.com/custom/sql/q4.sql)
[q5.sql](https://static.stoneatom.com/custom/sql/q5.sql)
[q6.sql](https://static.stoneatom.com/custom/sql/q6.sql)
[q7.sql](https://static.stoneatom.com/custom/sql/q7.sql)
[q8.sql](https://static.stoneatom.com/custom/sql/q8.sql)
[q9.sql](https://static.stoneatom.com/custom/sql/q9.sql)
[q10.sql](https://static.stoneatom.com/custom/sql/q10.sql)
[q11.sql](https://static.stoneatom.com/custom/sql/q11.sql)
[q12.sql](https://static.stoneatom.com/custom/sql/q12.sql)
[q13.sql](https://static.stoneatom.com/custom/sql/q13.sql)
[q14.sql](https://static.stoneatom.com/custom/sql/q14.sql)
[q15.sql](https://static.stoneatom.com/custom/sql/q15.sql)
[q16.sql](https://static.stoneatom.com/custom/sql/q16.sql)
[q17.sql](https://static.stoneatom.com/custom/sql/q17.sql)
[q18.sql](https://static.stoneatom.com/custom/sql/q18.sql)
[q19.sql](https://static.stoneatom.com/custom/sql/q19.sql)
[q20.sql](https://static.stoneatom.com/custom/sql/q20.sql)
[q21.sql](https://static.stoneatom.com/custom/sql/q21.sql)
[q22.sql](https://static.stoneatom.com/custom/sql/q22.sql)
#### **Statements after the modification**
To ensure the repeatability of this test, we recommend that you use the statements after the modification.
[q.zip](https://static.stoneatom.com/custom/sql/q.zip)
### **8. Execute the TPC-H script to obtain the test result**
The **.out** file stores the test results. The **.time** file records the execution time of each query.
```shell
ll /data/tpc-h/stonedb/queries/stonedb
-rw-r--r-- 1 root root 15019 Jun 1 00:55 TPCH_stone_20220531233024.out
-rw-r--r-- 1 root root 2179 Jun 1 00:57 TPCH_stone_20220531233024.time
```
\ No newline at end of file
......@@ -3,4 +3,98 @@ id: oltp-performance-test-method
sidebar_position: 7.621
---
# OLTP Performance Test Method
\ No newline at end of file
# OLTP Performance Test Method
## SysBench introduction
SysBench is a modular, cross-platform, and multithreaded benchmark tool for evaluating parameters that are important for a system that runs a database under heavy load. The idea of this benchmark suite is to quickly get an impression about system performance without setting up complex database benchmarks or even without installing a database at all.
## Test description
```sql
CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=StoneDB AUTO_INCREMENT=800001 DEFAULT CHARSET=utf8
```
Proportion of each type of SQL statement:
| **SELECT Type** | **Percentage (%)** | **SQL Statement Example** |
| --- | --- | --- |
| point_selects | 10 | SELECT c FROM sbtest%u WHERE id=? |
| simple_ranges | 1 | SELECT c FROM sbtest%u WHERE id BETWEEN ? AND ? |
| sum_ranges | 1 | SELECT SUM(k) FROM sbtest%u WHERE id BETWEEN ? AND ? |
| order_ranges | 1 | SELECT c FROM sbtest%u WHERE id BETWEEN ? AND ? ORDER BY c |
| distinct_ranges | 1 | SELECT DISTINCT c FROM sbtest%u WHERE id BETWEEN ? AND ? ORDER BY c |
| index_updates | 1 | UPDATE sbtest%u SET k=k+1 WHERE id=? |
| non_index_updates | 1 | UPDATE sbtest%u SET c=? WHERE id=? |
:::info
- In this test, operations involved in all SQL statements are read operations.
- StoneDB does not require secondary indexes, so **index_updates** is equivalent to **non_index_updates**.
:::
### Performance metrics
- Transactions Per Second (TPS): the number of transactions committed per second.
- Queries Per Second (QPS): the number of SQL statements executed per second, including INSERT, SELECT, UPDATE, and DELETE statements.
### Additional information
- In the standard OLTP read/write scenario provided by SysBench, a transaction consists of 18 read/write SQL statements. (Because StoneDB does not support DELETE operations, the DELETE statement is removed in this test.)
- In the standard OLTP read-only scenario provided by SysBench, a transaction consists of 14 read SQL statements: 10 primary key point queries and 4 range queries.
- In the standard OLTP write-only scenario provided by SysBench, a transaction consists of 4 write SQL statements: 2 UPDATE statements, 1 DELETE statement, and 1 INSERT statement. (Because StoneDB does not support DELETE operations, a DELETE statement and an INSERT statement that is associated with the DELETE statement are removed.)
## Install SysBench
```shell
yum install gcc gcc-c++ autoconf automake make libtool bzr mysql-devel git mysql
git clone https://github.com/akopytov/sysbench.git
## Download SysBench from Git.
cd sysbench
## Open the directory that saves SysBench.
git checkout 1.0.18
## Switch the SysBench version to 1.0.18.
./autogen.sh
## Run autogen.sh.
./configure --prefix=$WORKSPACE/sysbench/ --mandir=/usr/share/man
make
## Compile
make install
```
Statement example for testing:
```shell
cd $WORKSPACE/sysbench/
# Prepare data.
bin/sysbench --db-driver=mysql --mysql-host=xx.xx.xx.xx --mysql-port=3306 --mysql-user=xxx --mysql-password=xxxxxx --mysql-db=sbtest --table_size=800000 --tables=230 --time=600 --mysql_storage_engine=StoneDB --create_secondary=false --test=src/lua/oltp_read_only.lua prepare
# Run workloads.
bin/sysbench --db-driver=mysql --mysql-host=xx.xx.xx.xx --mysql-port=3306 --mysql-user=xxx --mysql-password=xxxxxx --mysql-db=sbtest --table_size=800000 --tables=230 --events=0 --time=600 --mysql_storage_engine=StoneDB --threads=8 --percentile=95 --range_selects=0 --skip-trx=1 --report-interval=1 --test=src/lua/oltp_read_only.lua run
# Clear test data.
bin/sysbench --db-driver=mysql --mysql-host=xx.xx.xx.xx --mysql-port=3306 --mysql-user=xxx --mysql-password=xxxxxx --mysql-db=sbtest --table_size=800000 --tables=230 --events=0 --time=600 --mysql_storage_engine=StoneDB --threads=8 --percentile=95 --range_selects=0 --skip-trx=1 --report-interval=1 --test=src/lua/oltp_read_only.lua cleanup
```
## SysBench parameter description
| **Parameter** | **Description** |
| --- | --- |
| db-driver | The database driver. |
| mysql-host | The address of the test instance. |
| mysql-port | The port used to connect to the test instance. |
| mysql-user | The username of the test account. |
| mysql-password | The password of the test account. |
| mysql-db | The name of the test database. |
| table_size | The number of rows in each table. |
| tables | The number of tables. |
| events | The maximum number of events (requests) to execute. Value 0 indicates no limit. |
| time | The time that the test lasts. |
| threads | The number of threads. |
| percentile | The percentile to calculate in latency statistics. The default value is 95. |
| report-interval | The interval for generating reports about the test progress, expressed in seconds. Value 0 indicates that no such report will be generated, and only the final report will be generated. |
| skip-trx | Whether to skip transactions.<br />- **1**: yes<br />- **0**: no<br /> |
| mysql-socket | The **.sock** file specified for the instance. This parameter is valid if the instance is a local instance. |
| create_secondary | Whether to create secondary indexes. The default value is **true**. |
......@@ -3,4 +3,41 @@ id: os-tuning
sidebar_position: 7.3
---
# OS Tuning
\ No newline at end of file
# OS Tuning
This topic describes how to tune a Linux OS. Methods to tune other types of OSs are currently not provided. The commands used in the following examples apply only to CentOS 7._x_.
## **Disable SELinux and the firewall**
We recommend that you disable SELinux and the firewall so that access to required services is not blocked.
```shell
systemctl stop firewalld
systemctl disable firewalld
vi /etc/selinux/config
# Modify the value of SELINUX.
SELINUX=disabled
```
## **Change the I/O scheduling mode**
If your disks are hard disk drives (HDDs), change the mode to **Deadline** to improve throughput. If your disks are solid-state drives (SSDs), change the mode to **noop**.
```shell
dmesg | grep -i scheduler
grubby --update-kernel=ALL --args="elevator=noop"
```
## **Do not use swap space unless necessary**
Even if memory is insufficient, we recommend that you do not use swap space as a buffer, because the OS will suffer from severe performance problems once swap space is used. For this reason, set **vm.swappiness** to the smallest value.
```shell
vi /etc/sysctl.conf
# Add parameter setting vm.swappiness = 0
vm.swappiness = 0
```
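After editing the file, the change can be applied without a reboot:
```shell
sysctl -p
```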
## **Disable NUMA**
If the memory allocated to non-uniform memory access (NUMA) nodes in a zone is exhausted, the OS will reclaim memory even though the free memory in total is sufficient. This has certain impacts on the OS. We recommend that you disable NUMA to allocate and use memory more efficiently.
```shell
grubby --update-kernel=ALL --args="numa=off"
```
## **Disable transparent hugepages**
Transparent hugepages are dynamically allocated. Databases use sparse memory access. If the system contains a large number of memory fragments, dynamic allocation of transparent hugepages will suffer from high latency and CPU bursts will occur. For this reason, we recommend that you disable transparent hugepages.
```shell
cat /sys/kernel/mm/transparent_hugepage/enabled
vi /etc/default/grub
GRUB_CMDLINE_LINUX="xxx transparent_hugepage=never"
grub2-mkconfig -o /boot/grub2/grub.cfg
```
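Transparent hugepages can also be disabled at runtime, which takes effect immediately but does not persist across reboots:
```shell
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```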
\ No newline at end of file
---
id: overview
id: performance-tuning-overview
sidebar_position: 7.1
---
# Overview
\ No newline at end of file
# Overview
As the concurrency and data volume of a database keep increasing, the database performance decreases. If the performance is not tuned in time, the database may break down. This may cause catastrophic losses.
Database performance can be tuned in terms of system architecture, application architecture, hardware configuration, OS, and database itself. The system architecture, application architecture, and hardware configuration should be tuned before the database rollout and further optimized during the database running.
This guide aims to help you tune the performance of your StoneDB from the following aspects:
- Common commands used for performance monitoring
- OS performance tuning
- Database performance tuning
- Architecture optimization
- Performance testing
......@@ -13,7 +13,7 @@ It is designed to be a customizable data migration tool that:
- Supports multiple data sources and destinations.
- Supports Kubernetes-based replication clusters.
*TODO*
![Gravity](./Gravity.png)
For more information about Gravity on GitHub, visit [https://github.com/moiot/gravity](https://github.com/moiot/gravity).
......@@ -143,8 +143,8 @@ The following are two screenshot examples of the Grafana monitoring dashboard. F
**Example 1:**
*image todo*
![example-1](./example-1.png)
**Example 2:**
*image todo*
\ No newline at end of file
![example-2](./example-2.png)
\ No newline at end of file
......@@ -3,4 +3,20 @@ id: failed-to-connect
sidebar_position: 9.7
---
# Failed to Connect to StoneDB
\ No newline at end of file
# Failed to Connect to StoneDB
# **Too many connections**
If the following error is returned when you connect to StoneDB, the maximum number of connections specified by parameter **max_connections** has been reached. You must log in as user admin, release idle connections, and determine whether to set **max_connections** to a larger value based on your service requirements.
```
ERROR 1040 (HY000): Too many connections
```
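A minimal sketch of how to check the limit, release idle connections, and raise the value at runtime (the connection ID and value shown are illustrative):
```sql
show variables like 'max_connections';
show processlist;                  -- identify idle connections
kill 123;                          -- terminate an idle connection (illustrative ID)
set global max_connections = 500;  -- takes effect until restart; persist it in the configuration file
```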
# **Metadata lock waits**
If your request for connecting to StoneDB is suspended, a metadata lock wait may have occurred. Log in as user admin, execute the following statement, and check whether the output contains a large number of "Waiting for table metadata lock" messages. If yes, locate and terminate the thread that causes the congestion.
```sql
show processlist;
```
# **Incorrect username or password, or insufficient permissions**
If the following error is returned when you try to connect to StoneDB, the username or password is incorrect, or you have not been granted the permissions to access StoneDB.
```
ERROR 1045 (28000): Access denied for user 'sjj'@'localhost' (using password: YES)
```
......@@ -3,4 +3,26 @@ id: failed-to-operate-table
sidebar_position: 9.5
---
# Failed to Operate on Data in StoneDB Tables
\ No newline at end of file
# Failed to Operate on Data in StoneDB Tables
StoneDB has limits on some DML operations. For example, the following error is returned if a DELETE operation is performed, because StoneDB does not support DELETE operations.
```
ERROR 1031 (HY000): Table storage engine for 'xxx' doesn't have this option
```
In addition, StoneDB does not support the following operations:
- Executing a REPLACE … INTO statement.
- Using subqueries in an UPDATE statement.
- Executing an UPDATE … JOIN statement to update multiple tables.
If you perform any of them, the system output indicates that the operation is successful. However, if you query relevant data, the data is not updated. You can execute `SHOW WARNINGS`, and the following warning information will be displayed.
```sql
mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1592 | Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave. |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```
StoneDB supports only statement-based binlogs. For this reason, the three DML operations are identified as unsafe. If StoneDB is deployed in an active/standby architecture, these operations may also cause data inconsistency.
......@@ -3,4 +3,29 @@ id: failed-to-start-in-kvm
sidebar_position: 9.6
---
# Failed to Start StoneDB in a KVM
\ No newline at end of file
# Failed to Start StoneDB in a KVM
An error is returned when StoneDB is started in a kernel-based virtual machine (KVM).
The following code provides an example.
```shell
# /stonedb/install/bin/mysql_server start
Starting stonedbbasedir::: /stonedb/install/
bindir::: /stonedb/install//bin
datadir::: /stonedb/install/data
mysqld_pid::: /stonedb/install/data/mysqld.pid
...220307 02:14:15 mysqld_safe Logging to '/stonedb/install/log/mysqld.log'.
.220307 02:14:15 mysqld_safe Starting mysqld daemon with databases from /stonedb/install/data
/stonedb/install//bin/mysqld_safe: line 166: 22159 Illegal instruction nohup /stonedb/install/bin/mysqld --basedir=/stonedb/install/ --datadir=/stonedb/install/data --plugin-dir=/stonedb/install/lib/plugin --user=mysql --log-error=/stonedb/install/log/mysqld.log --open-files-limit=65535 --pid-file=/stonedb/install/data/mysqld.pid --socket=/stonedb/install//tmp/mysql.sock --port=3306 < /dev/null >> /stonedb/install/log/mysqld.log 2>&1:q
220307 02:14:15 mysqld_safe mysqld from pid file /stonedb/install/data/mysqld.pid ended
./mysql_server: line 264: kill: (20941) - No such process
ERROR!
```
The status code and error message are **22159 Illegal instruction**.
This error occurs when the system cannot identify the instruction set. After GDB was used to analyze the core dump files, the cause was identified: Advanced Vector Extensions (AVX) is disabled. Enable AVX, and then you can start StoneDB.
To check whether AVX is enabled, run the following command:
```shell
cat /proc/cpuinfo | grep avx
```
......@@ -3,4 +3,59 @@ id: mdl-wait
sidebar_position: 9.3
---
# Metadata Lock Waits
\ No newline at end of file
# Metadata Lock Waits
Although StoneDB does not support transactions, the metadata locks that a session acquires are not released if transactions are enabled but are not committed or rolled back in time. In this case, if another thread executes a DDL operation on the table, that thread is blocked, and then all threads that query or update the table are blocked as well, because the write lock requested by the DDL operation conflicts with the read lock that is held. You need to locate and terminate the thread that causes the congestion in time. Otherwise, the maximum number of connections will be reached in a short period of time.
If **performance_schema** is enabled, you can execute the following statement to locate the thread that causes the congestion.
```sql
select locked_schema,
locked_table,
locked_type,
waiting_processlist_id,
waiting_age,
waiting_query,
waiting_state,
blocking_processlist_id,
blocking_age,
substring_index(sql_text, "transaction_begin;", -1) AS blocking_query,
sql_kill_blocking_connection
from (select b.owner_thread_id as granted_thread_id,
a.object_schema as locked_schema,
a.object_name as locked_table,
"Metadata Lock" AS locked_type,
c.processlist_id as waiting_processlist_id,
c.processlist_time as waiting_age,
c.processlist_info as waiting_query,
c.processlist_state as waiting_state,
d.processlist_id as blocking_processlist_id,
d.processlist_time as blocking_age,
d.processlist_info as blocking_query,
concat('kill ', d.processlist_id) as sql_kill_blocking_connection
from performance_schema.metadata_locks a
join performance_schema.metadata_locks b
on a.object_schema = b.object_schema
and a.object_name = b.object_name
and a.lock_status = 'PENDING'
and b.lock_status = 'GRANTED'
and a.owner_thread_id <> b.owner_thread_id
and a.lock_type = 'EXCLUSIVE'
join performance_schema.threads c
on a.owner_thread_id = c.thread_id
join performance_schema.threads d
on b.owner_thread_id = d.thread_id) t1,
(select thread_id,
group_concat(case
when event_name = 'statement/sql/begin' then
"transaction_begin"
else
sql_text
end order by event_id separator ";") as sql_text
from performance_schema.events_statements_history
group by thread_id) t2
where t1.granted_thread_id = t2.thread_id;
```
If **performance_schema** is disabled, execute the following statement to locate the thread.
```sql
select * from sys.schema_table_lock_waits where blocking_lock_type <> 'SHARED_UPGRADABLE'\G
```
......@@ -3,4 +3,199 @@ id: resource-bottleneck
sidebar_position: 9.9
---
# Diagnose System Resource Bottlenecks
\ No newline at end of file
# Diagnose System Resource Bottlenecks
If an OS resource bottleneck occurs, applications running on the OS are affected and the OS may even fail to respond to simple instructions. Before the OS stops providing services, you can run commands to collect usage information about CPU, memory, I/O, and network resources and then analyze whether these resources are properly used and whether any resource bottlenecks exist.
## CPU
The `top` and `vmstat` commands can be used to check the CPU utilization. The information returned by the `top` command is more comprehensive, consisting of statistics about system performance and information about processes. The information returned is sorted by CPU utilization.
Example of `top` command output:
```shell
top - 10:12:21 up 5 days, 22:31, 4 users, load average: 1.00, 1.00, 0.78
Tasks: 731 total, 1 running, 730 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.7 us, 0.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 257841.3 total, 1887.5 free, 45581.6 used, 210372.2 buff/cache
MiB Swap: 8192.0 total, 8188.7 free, 3.3 used. 210450.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
908076 mysql 20 0 193.0g 42.4g 44088 S 100.3 16.8 228:10.34 mysqld
823137 root 20 0 6187564 83772 51636 S 6.6 0.0 6:36.12 dockerd
822938 root 20 0 3278696 58500 35420 S 0.7 0.0 38:37.69 containerd
1483 root 20 0 239280 9260 8136 S 0.3 0.0 0:19.16 accounts-daemon
928343 root 20 0 9936 4576 3240 R 0.3 0.0 0:00.04 top
......
```
### Parameter description
Parameters in line 1:
| **Parameter** | **Description** |
| --- | --- |
| 10:12:21 | The current system time. |
| up 5 days | The duration for which the system has been running since the last startup. |
| 4 users | The number of online users. |
| load average | The average system workloads in the past 1 minute, 5 minutes, and 15 minutes. |
Parameters in line 2:
| **Parameter** | **Description** |
| --- | --- |
| total | The number of processes. |
| running | The number of processes that are in the running state. |
| sleeping | The number of processes that are in the sleeping state. |
| stopped | The number of processes that are in the stopped state. |
| zombie | The number of processes that are in the zombie state. |
Parameters in line 3:
| **Parameter** | **Description** |
| --- | --- |
| us | The percentage of CPU time spent in running user space processes. |
| sy | The percentage of CPU time spent in running system processes. |
| ni | The percentage of CPU time spent in running the processes of which priorities are changed. |
| id | The percentage of CPU time spent idle. |
| wa | The percentage of CPU time spent in wait. |
| hi | The percentage of CPU time spent in handling hardware interrupts. |
| si | The percentage of CPU time spent in handling software interrupts. |
| st | The percentage of CPU time spent on the hypervisor. |
Pay attention to values in line 3. If the value of **us** is large, user space processes consume much CPU time. If the value of **us** is larger than 50% for a long time, applications must be tuned in time. If the value of **sy** is large, system processes consume much CPU time. This may be caused by improper OS configuration or OS bugs. If the value of **wa** is large, I/O waits are high. This may be caused by high random I/O access or an I/O bottleneck.
Parameters in line 4:
| **Parameter** | **Description** |
| --- | --- |
| total | The amount of memory. |
| free | The amount of free memory. |
| used | The amount of used memory. |
| buff/cache | The amount of memory that is used as buffers and cache. |
Parameters in line 5:
| **Parameter** | **Description** |
| --- | --- |
| total | The size of swap space. |
| free | The size of free swap space. |
| used | The size of used swap space. |
| avail Mem | The estimated amount of memory available for starting new applications without swapping. |
Parameters in the process list:
| **Parameter** | **Description** |
| --- | --- |
| PID | The process ID. |
| USER | The owner of the process. |
| PR | The priority of the process. A smaller value indicates a higher priority. |
| NI | The nice value of the priority. A positive integer indicates that the priority of the process is being downgraded. A negative integer indicates that the priority of the process is being upgraded. The value range is -20 to 19 and the default value is 0. |
| VIRT | The amount of virtual memory occupied by the process. |
| RES | The amount of physical memory occupied by the process. |
| SHR | The amount of shared memory occupied by the process. |
| S | The status of the process. The value can be:<br />- **S**: sleeping<br />- **R**: running<br />- **Z**: zombie<br />- **N**: The nice value of the process is a negative value.<br /> |
| %CPU | The percentage of the CPU time used by the process. |
| %MEM | The percentage of the memory occupied by the process. |
| TIME+ | The total CPU time used by the process. |
| COMMAND | The command that the process is running. |
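When collecting evidence for later analysis, top can also be run non-interactively or sorted by a different column. The following commands are a sketch and assume the procps-ng version of top:
```shell
# Snapshot mode: one iteration, no interactive screen.
top -b -n 1 | head -n 15
# Interactive view sorted by memory usage instead of CPU.
top -o %MEM
```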
### Diagnose the cause of high CPU utilization
1. Find the function that is called by the process that consumes the most CPU time.
```shell
top -H
perf top -p xxx
```
_xxx_ indicates the ID of the process that consumes the most CPU time.
2. Find the SQL statement that consumes the most CPU time.
```shell
pidstat -t -p <mysqld_pid> 1 5
select * from performance_schema.threads where thread_os_id = xxx\G
select * from information_schema.processlist where id = processlist_id\G
```
_xxx_ indicates the OS thread ID (TID) of the thread that consumes the most CPU time, as reported by `pidstat`. Replace _processlist_id_ with the **processlist_id** value returned by the first query. A combined version of the two queries is sketched below.
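Assuming, for illustration, that pidstat reports OS thread 12345 of the mysqld process as the top CPU consumer, the two lookups above can be combined into a single query; the thread ID is a placeholder:
```sql
-- 12345 is a placeholder for the OS thread ID reported by pidstat.
select p.id, p.user, p.host, p.db, p.time, p.state, p.info
from performance_schema.threads t
join information_schema.processlist p on p.id = t.processlist_id
where t.thread_os_id = 12345\G
```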
## Memory
The `top`, `vmstat`, and `free` commands can be used to monitor memory usage.
Example of `free` command output:
```shell
# free -g
total used free shared buff/cache available
Mem: 251 44 1 0 205 205
Swap: 7 0 7
```
### Parameter description
| **Parameter** | **Description** |
| --- | --- |
| total | The amount of memory. <br />**total** = **used** + **free** + **buff/cache** |
| used | The amount of used memory. |
| free | The amount of free memory. |
| shared | The amount of shared memory. |
| buff/cache | The amount of memory used as buffers and cache. |
| available | The estimated amount of memory available for starting new applications without swapping.<br />**available** ≈ **free** + reclaimable **buff/cache** |
### Diagnose the cause of high memory usage
1. Check whether memory is properly configured. For example, if the physical memory of the OS is 128 GB and 110 GB is allocated to the database instance, there is a high probability of memory exhaustion, because other OS processes and applications are consuming memory.
2. Check whether too many concurrent connections exist. **read_buffer_size**, **read_rnd_buffer_size**, **sort_buffer_size**, **thread_stack**, **join_buffer_size**, and **binlog_cache_size** are all session-level parameters. More connections indicate more memory consumed. Therefore, we recommend that you set these parameters to small values.
3. Check whether improper JOIN queries exist. Suppose a query joins multiple tables and its driving table produces a large result set. During execution, the driven table is accessed a large number of times, which may consume an excessive amount of memory.
4. Check whether too many files are open and whether **table_open_cache** is properly set. When you access a table, the table is loaded into the cache specified by **table_open_cache** so that the next access to this table is faster. However, if **table_open_cache** is set too large and too many tables are open, a large amount of memory is consumed. Checks 2 and 4 can be done with the statements shown after this list.
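The following statements are a minimal sketch, using standard MySQL status and system variables, for checks 2 and 4 in the list above:
```sql
-- Check 2: current connections versus the configured limit.
show global status like 'Threads_connected';
show variables like 'max_connections';
-- Check 4: table cache size versus the number of tables actually opened.
show variables like 'table_open_cache';
show global status like 'Open_tables';
show global status like 'Opened_tables';
```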
## I/O
The `iostat`, `dstat`, and `pidstat` commands are used to monitor the I/O usage.
Example of `iostat` command output:
```shell
# iostat -x 1 1
Linux 3.10.0-957.el7.x86_64 (htap2) 06/13/2022 _x86_64_ (64 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 0.03 0.01 0.00 99.90
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.04 0.00 85.75 0.00 0.25 0.25 0.00 0.15 0.00
sdb 0.06 0.11 7.61 1.10 1849.41 50.81 436.48 0.36 40.93 46.75 0.48 1.56 1.35
dm-0 0.00 0.00 0.28 0.19 8.25 12.05 87.01 0.00 4.81 7.37 0.94 1.61 0.08
```
### Parameter description
| **Parameter** | **Description** |
| --- | --- |
| rrqm/s | The number of read requests merged per second. |
| wrqm/s | The number of write requests merged per second. |
| r/s | The number (after merges) of read requests completed per second. |
| w/s | The number (after merges) of write requests completed per second. |
| rkB/s | The number of kilobytes read per second. |
| wkB/s | The number of kilobytes written per second. |
| avgrq-sz | The average size of the requests, expressed in sectors (512 bytes). |
| avgqu-sz | The average queue length of the I/O requests. |
| await | The average time, in milliseconds, for I/O requests to be served, including the time spent waiting in the queue. |
| r_await | The average time for read requests to be served. |
| w_await | The average time for write requests to be served. |
| svctm | The average service time for I/O requests. |
| %util | The percentage of elapsed time during which the device was busy processing I/O requests (device utilization). |
:::info
The sum of **r/s** and **w/s** is the system input/output operations per second (IOPS).
:::
### Diagnose the cause of high I/O usage
1. Find the disk with the highest usage.
```shell
iostat -x -m 1
iostat -d /dev/sda -x -m 1
```
2. Find the application with the highest I/O usage.
```shell
pidstat -d 1
```
3. Find the thread with the highest I/O usage.
```shell
pidstat -dt -p <mysqld_pid> 1
```
4. Find the SQL statement with the highest I/O usage.
```sql
select * from performance_schema.threads where thread_os_id = xxx\G
select * from information_schema.processlist where id = processlist_id\G
```
_xxx_ indicates the OS thread ID of the thread with the highest I/O usage, as reported by `pidstat`. Replace _processlist_id_ with the **processlist_id** value returned by the first query.
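If performance_schema is enabled, the files with the highest I/O latency can also be listed from inside the database; the following query is a sketch against performance_schema.file_summary_by_instance (timer values are in picoseconds):
```sql
select file_name,
       count_read, count_write,
       round(sum_timer_read / 1e12, 2)  as read_seconds,
       round(sum_timer_write / 1e12, 2) as write_seconds
from performance_schema.file_summary_by_instance
order by sum_timer_wait desc
limit 10;
```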
......@@ -3,4 +3,229 @@ id: slow-query
sidebar_position: 9.8
---
# Diagnose Slow SQL Queries
\ No newline at end of file
# Diagnose Slow SQL Queries
The slow query log is used to record SQL queries whose execution time exceeds the threshold specified by the **long_query_time** parameter. The slow query log can be written to a file or to a table. It helps identify SQL statements that may affect database performance.
The following table describes parameters for configuring slow query log.
| **Parameter** | **Description** |
| --- | --- |
| slow_query_log | Whether to enable the slow query log. |
| slow_query_log_file | The file that stores slow query log records. |
| long_query_time | The execution time threshold. If the execution time of an SQL query exceeds this threshold, the SQL query will be recorded in the slow query log.<br>:::info **long_query_time** is used to limit the actual execution time of each SQL query, excluding the lock wait time. Therefore, if an SQL query has a long total execution time but its actual execution time does not exceed this threshold, the SQL query will not be recorded in the slow query log. :::|
| log_queries_not_using_indexes | Whether to record the queries that do not use indexes. |
| log_slow_admin_statements | Whether to record slow administrative statements, such as ALTER TABLE, ANALYZE TABLE, and OPTIMIZE TABLE. |
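For example, the slow query log can be enabled at runtime as follows; the file path and the 1-second threshold are illustrative values, not recommendations:
```sql
set global slow_query_log = ON;
set global slow_query_log_file = '/var/lib/mysql/mysql-slow.log';
set global long_query_time = 1;
set global log_queries_not_using_indexes = ON;
```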
The following table describes the parameters in the slow query log:
| **Parameter** | **Description** |
| --- | --- |
| Query_time | The total execution time of an SQL query, including the lock wait time. |
| Lock_time | The lock wait time. |
| Rows_sent | The number of rows sent to the client. |
| Rows_examined | The number of rows that have been scanned during the execution. This number is accumulatively counted each time the storage engine is called by the executor to obtain data records. This value is obtained at the server level. |
# mysqldumpslow
The mysqldumpslow tool is used to summarize and classify slow query log records. The following are some examples.
- Check the top 10 SQL statements sorted by average execution time in descending order.
```shell
mysqldumpslow -t 10 /var/lib/mysql/mysql-slow.log | more
```
- Check the top 10 SQL statements sorted by the number of returned records in descending order.
```shell
mysqldumpslow -s r -t 10 /var/lib/mysql/mysql-slow.log | more
```
- Check the top 10 SQL statements sorted by count in descending order.
```shell
mysqldumpslow -s c -t 10 /var/lib/mysql/mysql-slow.log | more
```
- Check the top 10 SQL statements that include LEFT JOIN sorted by total execution time in descending order.
```shell
mysqldumpslow -s t -t 10 -g "left join" /var/lib/mysql/mysql-slow.log | more
```
The following table describes the relevant parameters.
| **Parameter** | **Description** |
| --- | --- |
| -s | The sorting type. The value can be:<br />- **al**: Sort by average lock time<br />- **ar**: Sort by average number of returned records<br />- **at**: Sort by average execution time<br />- **c**: Sort by count<br />- **l**: Sort by total lock time<br />- **r**: Sort by total number of returned records<br />- **t**: Sort by total execution time<br />If no value is specified, **at** is used by default. |
| -t NUM | Specifies a number _n_. Only the first _n_ statements will be returned in the output. |
| -g | Specifies a string. Only statements that contain the string are considered. |
| -h | The host name. |
| -i | The name of the instance. |
| -l | Do not subtract lock wait time from total execution time. |
# profiling
The profiling variable can be used to record detailed information about an SQL statement in each state during its execution. The recorded information includes CPU utilization, I/O usage, swap space usage, and the name, source file, and position of each function that is called.<br />The following example shows how to use profiling:
1. Enable profiling for the current thread.
```sql
set profiling=on;
```
2. Query the statements executed in the current session.
```sql
show profiles;
```
3. Query the CPU and I/O overhead of an SQL statement in each state during the execution.
```sql
show profile cpu,block io for query query_id;
```
4. Check the total overhead of a SQL statement in each state during the execution.
```sql
show profile all for query query_id;
```
Output example:
```sql
+----------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| Status | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file | Source_line |
+----------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| starting | 0.000363 | 0.000239 | 0.000025 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL | NULL | NULL |
| checking permissions | 0.000055 | 0.000040 | 0.000004 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | check_access | sql_authorization.cc | 809 |
| checking permissions | 0.000045 | 0.000000 | 0.000047 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | check_access | sql_authorization.cc | 809 |
| Opening tables | 0.000315 | 0.000000 | 0.000307 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | open_tables | sql_base.cc | 5815 |
| System lock | 0.000057 | 0.000000 | 0.000056 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | mysql_lock_tables | lock.cc | 330 |
| init | 0.000047 | 0.000000 | 0.000048 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize_select | Engine_execute.cpp | 330 |
| optimizing | 0.000195 | 0.000000 | 0.000203 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | optimize | sql_optimizer.cc | 175 |
| update multi-index | 0.000488 | 0.000216 | 0.000272 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0 | UpdateMultiIndex | ParameterizedFilter. | 981 |
| join | 8.127252 | 6.948224 | 1.168786 | 236 | 18 | 0 | 168 | 0 | 0 | 0 | 304466 | 0 | UpdateJoinCondition | ParameterizedFilter. | 603 |
| aggregation | 0.021095 | 0.000957 | 0.000102 | 3 | 0 | 72 | 8 | 0 | 0 | 0 | 18 | 0 | Aggregate | AggregationAlgorithm | 26 |
| query end | 0.000185 | 0.000092 | 0.000009 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | mysql_execute_command | sql_parse.cc | 4972 |
| closing tables | 0.000156 | 0.000141 | 0.000015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | mysql_execute_command | sql_parse.cc | 5031 |
| freeing items | 0.000152 | 0.000136 | 0.000015 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | mysql_parse | sql_parse.cc | 5659 |
| logging slow query | 0.006152 | 0.000401 | 0.000000 | 1 | 0 | 8 | 8 | 0 | 0 | 0 | 6 | 0 | log_slow_do | log.cc | 1718 |
| cleaning up | 0.000245 | 0.000154 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | dispatch_command | sql_parse.cc | 1935 |
+----------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
```
The following table describes each status that an SQL statement may pass through:
| **Status** | **Description** |
| --- | --- |
| starting | Performing lexical analysis and syntax analysis on the SQL statement to generate a parse tree. |
| checking permissions | Checking whether the server has the required privileges to execute the statement. |
| opening tables | Opening the tables and adding metadata locks. |
| System lock | Identifying which type of locks causes the system lock. |
| optimizing | Optimizing the statement to generate an execution plan. |
| statistics | Calculating statistics that the optimizer uses to generate the execution plan. |
| preparing | Preparing the execution plan chosen in the optimizing phase. |
| Sending data | Transmitting data between the server and the storage engine. |
| Update | If the SQL statement is an INSERT statement, it will be in this state when a row lock occurs. |
| Updating | If the SQL statement is a DELETE or UPDATE statement, it will be in this state when a row lock occurs. |
| query end | Submitting the SQL statement. If a large transaction occurs, the statement will be in this state. |
| closing tables | Flushing the changed table data to disk and closing the used tables or rolling the statement back. |
| freeing items | Releasing the parse tree. |
| Creating tmp table | Creating a temporary table to save copied data. The temporary table is deleted after use. If the statement enters this state, it normally includes GROUP BY, DISTINCT, or subqueries. |
| Copying to tmp table on disk | Copying temporary files from memory to disks. |
| Creating sort index | Sorting data. If the statement enters this state, it normally includes ORDER BY. |
| locked | The statement is blocked, waiting for a lock to be released. |
| Waiting for table metadata lock | Waiting for a metadata lock. |
:::info
If the statement remains in any of the following states for a long time, the SQL statement has performance issues or is waiting for resources or locks:
- **Sending data**
- **Creating tmp table**
- **Copying to tmp table on disk**
- **Creating sort index**
- **locked**
- **Waiting for table metadata lock**
:::
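To find statements that are currently stuck in one of these states, a query such as the following can be used; it is a sketch against information_schema.processlist with idle sessions filtered out:
```sql
select id, user, db, time, state, left(info, 100) as query
from information_schema.processlist
where command <> 'Sleep' and state <> ''
order by time desc;
```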
# Optimizer trace
The optimizer trace feature helps you understand the process of generating an execution plan. This feature is controlled by system variable **optimizer_trace**.
```sql
mysql> show variables like 'optimizer_trace';
+-----------------+--------------------------+
| Variable_name | Value |
+-----------------+--------------------------+
| optimizer_trace | enabled=off,one_line=off |
+-----------------+--------------------------+
```
Variable **enabled** specifies whether the feature is enabled. **enabled=off** indicates that the optimizer trace feature is disabled by default. To enable the optimizer trace feature, set **enabled** to **on**. Variable **one_line** specifies whether the output is displayed in one line. **one_line=off** indicates that line breaks are used in the output by default.
To use the optimizer trace feature, perform the following steps:
1. Configure the cache size for optimizer trace to ensure that traces are not truncated.
```sql
set optimizer_trace_max_mem_size = 100*1024;
```
2. Enable the optimizer trace feature.
```sql
set optimizer_trace='enabled=on';
```
3. Execute an SQL statement.
:::info
If the statement takes too long to execute and you only need the trace of how the execution plan is generated, you can run an EXPLAIN statement on it instead.
:::
4. Check the optimization process of the previous SQL statement in the **optimizer_trace** table.
```sql
select * from information_schema.optimizer_trace;
```
5. Disable the optimizer trace feature.
```sql
set optimizer_trace='enabled=off';
```
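By default, only the trace of the most recent statement is kept. If the traces of the last few statements are needed, the retention can be adjusted before step 3, for example:
```sql
-- Keep the traces of the last two traced statements (values are examples).
set optimizer_trace_offset = -2, optimizer_trace_limit = 2;
```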
The output of optimizer trace is generated through three steps, namely:
1. join_preparation
1. join_optimization
1. join_execution
In the join_optimization step, the statement is optimized based on cost. For a single-table query, the rows_estimation sub-step is the most important; it analyzes the cost of full table scans and index scans. For a multi-table query, the considered_execution_plans sub-step is the most important; it analyzes the cost of every possible join order.
Key sub-steps in the join_optimization step:
- condition_processing: The optimizer processes the query conditions, for example, propagating the value of an argument to its equivalent argument and removing unnecessary conditions. For example, the source condition "a=1 and b=a" is converted to "a=1 and b=1".
- rows_estimation: Estimates the relevant rows for a single-table query.
  - table_scan: Full table scan statistics.
    - rows: the number of rows estimated to be scanned
    - cost: the cost of a full table scan
  - potential_range_indexes: The indexes that may be used. For example, "index": "PRIMARY" together with "usable": false and "cause": "not_applicable" means the primary key index cannot be used, whereas "index": "idx_xxx" together with "usable": true and a "key_parts" list (for example, "column1", "column2", "id") means the idx_xxx index may be used.
  - analyzing_range_alternatives: Analyzes the cost of each index that may be used. For example, "index": "idx_xxx" with "ranges": ["2 <= column1 <= 2 AND 0 <= column2 <= 0"] and the following fields:
    - "index_dives_for_eq_ranges": true # Whether index dives are used.
    - "rowid_ordered": true # Whether the records obtained by using the index are sorted by primary key.
    - "using_mrr": false # Whether Multi-Range Read (MRR) is used.
    - "index_only": false # Whether a covering index is used.
    - "rows": xxx # The number of relevant rows estimated by using the index.
    - "cost": xxx # The cost of using the index.
    - "chosen": true # Whether the index is chosen.
  - chosen_range_access_summary: The access path that is finally selected.
- considered_execution_plans: Analyzes every possible execution plan. This sub-step is important for JOIN queries.
- attaching_conditions_to_tables: Attaches the remaining query conditions to the relevant tables.
- reconsidering_access_paths_for_index_ordering: Whether the execution plan is changed because of sorting (for example, ORDER BY).
......@@ -3,4 +3,13 @@ id: stonedb-crashed
sidebar_position: 9.4
---
# StoneDB Crashed
\ No newline at end of file
# StoneDB Crashed
This topic describes common causes of StoneDB crashes.
# High system workloads
When system workloads are high, requests for system resources may fail and StoneDB may crash as a result. In this case, address the issue by referring to [Diagnose system resource bottlenecks](./resource-bottleneck.md).
# Corrupted data pages
When the hardware is faulty or the disk space is exhausted, a data file is likely to be corrupted while data is being written to it. If a data file is corrupted, StoneDB crashes to keep data consistent.
# Bugs
If StoneDB hits a bug, such as a deadlock, it will crash. To address this issue, collect the system log, error log, and trace log, and enable core dumps to locate the fault.
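For example, core dumps can be enabled roughly as follows; the core file directory is an example path that must exist and be writable by the mysqld user, and the commands must be run as root:
```shell
# Remove the core file size limit for the current shell (persist it in limits.conf if needed).
ulimit -c unlimited
# Name core files with the program name, PID, and timestamp, and write them to a dedicated directory.
echo '/data/corefiles/core.%e.%p.%t' > /proc/sys/kernel/core_pattern
# In addition, the core-file option can be added under [mysqld] in my.cnf so that mysqld writes a core dump when it crashes.
```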
......@@ -3,4 +3,17 @@ id: stonedb-faq
sidebar_position: 10.2
---
# StoneDB FAQ
\ No newline at end of file
# StoneDB FAQ
## Is StoneDB compatible with MySQL?
Yes. StoneDB is compatible with the MySQL 5.6 and 5.7 protocols, and the ecosystem, common features, and common syntaxes of MySQL. However, due to characteristics of column-based storage, StoneDB is incompatible with certain MySQL operations and features.
## Does StoneDB have its own optimizer, other than the MySQL optimizer?
Yes. StoneDB provides its own optimizer, though it still uses the MySQL optimizer to implement query parsing and rewriting.
## Why does StoneDB not support unique constraints?
Column-based storage provides the data compression feature. The compression efficiency is determined by the compression algorithm, the data types of columns, and the degree of repetition in the data. If a column has a unique constraint, every value in the column is unique, so the compression ratio is low. Suppose the same 6,000 records are inserted into a table with a unique constraint on InnoDB and on StoneDB. After compression, the data volume on InnoDB is more than 16 GB and that on StoneDB is around 5 GB. In this case, the compression efficiency of StoneDB is only about 3 times that of InnoDB; normally, the factor is 10 or even higher.
## Do I need to create indexes on StoneDB?
No, you do not need to create indexes. StoneDB uses the knowledge grid technique to locate and decompress only relevant data packs based on metadata, greatly improving query performance. You can still use indexes, but the performance is low if the result sets of queries are large.
## Does StoneDB support transactions?
No. Transactions can be classified into secure transactions and non-secure transactions. Because StoneDB does not provide redo or undo logs, its transactions cannot strictly meet the atomicity, consistency, isolation, durability (ACID) attributes and are therefore identified as non-secure transactions.
## Can I join a StoneDB table with a table from another storage engine?
By default, StoneDB does not allow JOIN queries of a StoneDB table with a table from another storage engine. You can set **stonedb_ini_allowmysqlquerypath** to **1** to enable this feature.
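Judging from the `_ini_` prefix, the parameter is presumably set in the server configuration file; the following my.cnf fragment is a sketch based on that assumption, and a restart is required for configuration-file parameters to take effect:
```shell
# Example fragment of /etc/my.cnf; the path and placement are assumptions.
[mysqld]
stonedb_ini_allowmysqlquerypath = 1
```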