Unverified commit 6e8d64e5 authored by Yuting, committed by GitHub

docs(stonedb): fix the deploy problem and update the latest docs(#414) (#416)

* fix(docs): fix the deploy problem

* docs(SQL Tuning): update the latest docs

* fix(docs): fix the deploy problem

* docs(stonedb): update the docs of Backup and Recovery
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Parent 50388b75
@@ -113,4 +113,5 @@ CMakeFiles
CMakeFiles/*
CTestTestfile.cmake
COPYING
.vs/
\ No newline at end of file
.vs/
node_modules
@@ -67,7 +67,7 @@ docker run -d --restart=always --name=prometheus -p 9090:9090 \
--storage.tsdb.retention.time=30d
```
If "http://<IP address of machine A>:9090" appears, Prometheus is successfully deployed. If the deployment fails, run the `docker logs <Container ID>` command to view logs and rectify the fault.
If `http://<IP address of machine A>:9090` appears, Prometheus is successfully deployed. If the deployment fails, run the `docker logs <Container ID>` command to view logs and rectify the fault.
![image.png](./Prometheus.png)
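For troubleshooting, a quick check from machine A might look like the following (a sketch, assuming the container was started with `--name=prometheus` as in the command above):
```bash
# Confirm the container is running, then tail its recent logs.
docker ps -f name=prometheus
docker logs --tail 50 prometheus
```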
## Step 2. **Deploy Prometheus Exporter**
@@ -251,7 +251,7 @@ docker run -d --restart=always --name=grafana -p 13000:3000 \
-v /home/zsp/grafana/data/grafana/:/var/lib/grafana/ grafana/grafana
```
3. Visit http://<IP address of machine A>:13000 and log in to Grafana. The default username and password are **admin** and **admin**.
3. Visit `http://<IP address of machine A>:13000` and log in to Grafana. The default username and password are **admin** and **admin**.
![image.png](./Grafana.png)
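If the page does not load, a quick reachability check from the command line might look like this (a sketch; Grafana exposes a `/api/health` endpoint, and port 13000 matches the mapping used above):
```bash
# Expect an HTTP 200 response if Grafana is up.
curl -s -o /dev/null -w "%{http_code}\n" http://<IP address of machine A>:13000/api/health
```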
## **Step 5. Configure Grafana to display monitoring data from Prometheus**
......
---
id: use-mydumper-full-backup
sidebar_position: 4.32
---
# Use Mydumper for Full Backup
## Mydumper introduction
Mydumper is a logical backup tool for MySQL. It consists of two tools:
- mydumper: exports consistent backups of MySQL databases.
- myloader: reads backups produced by mydumper, connects to the destination database, and imports them.
Both tools are multithreaded.
### Benefits
- Parallelism and performance: The tool delivers a high backup rate, avoids expensive character set conversion routines, and keeps the code efficient overall.
- Simplified output management: Each table is written to its own files and metadata is dumped separately, making the output easy to view and parse.
- Consistency: The tool maintains a consistent snapshot across all threads and provides accurate primary and secondary log positions.
- Manageability: Perl Compatible Regular Expressions (PCRE) can be used to specify which databases and tables to include or exclude.
### Features
- Multi-threaded backup, which generates multiple backup files
- Consistent snapshots for transactional and non-transactional tables
:::info
This feature is supported by versions later than 0.2.2.
:::
- Fast file compression
- Export of binlogs
- Multi-threaded recovery
:::info
This feature is supported by versions later than 0.2.1.
:::
- Ability to run as a daemon that periodically takes snapshots and continuously records binlogs
:::info
This feature is supported by versions later than 0.5.0.
:::
- Open source (license: GNU GPLv3)
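As an illustration of several of these features combined, a compressed, multi-threaded backup that also dumps triggers, events, and routines might look like the following (a sketch only; host, credentials, and paths are placeholders, and the options are described in the parameter list below):
```bash
mydumper -u root -a -h 127.0.0.1 -P 3306 \
  -B zz \
  -o /home/dumper/ \
  -c -t 8 -F 256 \
  -G -E -R
# -a prompts for the password, -c compresses output files, -t sets the thread count,
# -F splits tables into 256 MB chunks, and -G/-E/-R also dump triggers, events, and routines.
```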
## Use Mydumper
### Parameters for mydumper
```bash
mydumper --help
Usage:
mydumper [OPTION…] multi-threaded MySQL dumping
Help Options:
-?, --help Show help options
Application Options:
-B, --database Database to dump
-o, --outputdir Directory to output files to
-s, --statement-size Attempted size of INSERT statement in bytes, default 1000000
-r, --rows Try to split tables into chunks of this many rows. This option turns off --chunk-filesize
-F, --chunk-filesize Split tables into chunks of this output file size. This value is in MB
--max-rows Limit the number of rows per block after the table is estimated, default 1000000
-c, --compress Compress output files
-e, --build-empty-files Build dump files even if no data available from table
-i, --ignore-engines Comma delimited list of storage engines to ignore
-N, --insert-ignore Dump rows with INSERT IGNORE
-m, --no-schemas Do not dump table schemas with the data and triggers
-M, --table-checksums Dump table checksums with the data
-d, --no-data Do not dump table data
--order-by-primary Sort the data by Primary Key or Unique key if no primary key exists
-G, --triggers Dump triggers. By default, it do not dump triggers
-E, --events Dump events. By default, it do not dump events
-R, --routines Dump stored procedures and functions. By default, it do not dump stored procedures nor functions
-W, --no-views Do not dump VIEWs
-k, --no-locks Do not execute the temporary shared read lock. WARNING: This will cause inconsistent backups
--no-backup-locks Do not use Percona backup locks
--less-locking Minimize locking time on InnoDB tables.
--long-query-retries Retry checking for long queries, default 0 (do not retry)
--long-query-retry-interval Time to wait before retrying the long query check in seconds, default 60
-l, --long-query-guard Set long query timer in seconds, default 60
-K, --kill-long-queries Kill long running queries (instead of aborting)
-D, --daemon Enable daemon mode
-X, --snapshot-count number of snapshots, default 2
-I, --snapshot-interval Interval between each dump snapshot (in minutes), requires --daemon, default 60
-L, --logfile Log file name to use, by default stdout is used
--tz-utc SET TIME_ZONE='+00:00' at top of dump to allow dumping of TIMESTAMP data when a server has data in different time zones or data is being moved between servers with different time zones, defaults to on use --skip-tz-utc to disable.
--skip-tz-utc
--use-savepoints Use savepoints to reduce metadata locking issues, needs SUPER privilege
--success-on-1146 Not increment error count and Warning instead of Critical in case of table doesn't exist
--lock-all-tables Use LOCK TABLE for all, instead of FTWRL
-U, --updated-since Use Update_time to dump only tables updated in the last U days
--trx-consistency-only Transactional consistency only
--complete-insert Use complete INSERT statements that include column names
--split-partitions Dump partitions into separate files. This options overrides the --rows option for partitioned tables.
--set-names Sets the names, use it at your own risk, default binary
-z, --tidb-snapshot Snapshot to use for TiDB
--load-data
--fields-terminated-by
--fields-enclosed-by
--fields-escaped-by Single character that is going to be used to escape characters in the LOAD DATA stament, default: '\'
--lines-starting-by Adds the string at the begining of each row. When --load-data is usedit is added to the LOAD DATA statement. Its affects INSERT INTO statementsalso when it is used.
--lines-terminated-by Adds the string at the end of each row. When --load-data is used it isadded to the LOAD DATA statement. Its affects INSERT INTO statementsalso when it is used.
--statement-terminated-by This might never be used, unless you know what are you doing
--sync-wait WSREP_SYNC_WAIT value to set at SESSION level
--where Dump only selected records.
--no-check-generated-fields Queries related to generated fields are not going to be executed.It will lead to restoration issues if you have generated columns
--disk-limits Set the limit to pause and resume if determines there is no enough disk space.Accepts values like: '<resume>:<pause>' in MB.For instance: 100:500 will pause when there is only 100MB free and willresume if 500MB are available
--csv Automatically enables --load-data and set variables to export in CSV format.
-t, --threads Number of threads to use, default 4
-C, --compress-protocol Use compression on the MySQL connection
-V, --version Show the program version and exit
-v, --verbose Verbosity of output, 0 = silent, 1 = errors, 2 = warnings, 3 = info, default 2
--defaults-file Use a specific defaults file
--stream It will stream over STDOUT once the files has been written
--no-delete It will not delete the files after stream has been completed
-O, --omit-from-file File containing a list of database.table entries to skip, one per line (skips before applying regex option)
-T, --tables-list Comma delimited table list to dump (does not exclude regex option)
-h, --host The host to connect to
-u, --user Username with the necessary privileges
-p, --password User password
-a, --ask-password Prompt For User password
-P, --port TCP/IP port to connect to
-S, --socket UNIX domain socket file to use for connection
-x, --regex Regular expression for 'db.table' matching
```
### Parameters for myloader
```bash
myloader --help
Usage:
myloader [OPTION…] multi-threaded MySQL loader
Help Options:
-?, --help Show help options
Application Options:
-d, --directory Directory of the dump to import
-q, --queries-per-transaction Number of queries per transaction, default 1000
-o, --overwrite-tables Drop tables if they already exist
-B, --database An alternative database to restore into
-s, --source-db Database to restore
-e, --enable-binlog Enable binary logging of the restore data
--innodb-optimize-keys Creates the table without the indexes and it adds them at the end
--set-names Sets the names, use it at your own risk, default binary
-L, --logfile Log file name to use, by default stdout is used
--purge-mode This specify the truncate mode which can be: NONE, DROP, TRUNCATE and DELETE
--disable-redo-log Disables the REDO_LOG and enables it after, doesn't check initial status
-r, --rows Split the INSERT statement into this many rows.
--max-threads-per-table Maximum number of threads per table to use, default 4
--skip-triggers Do not import triggers. By default, it imports triggers
--skip-post Do not import events, stored procedures and functions. By default, it imports events, stored procedures nor functions
--no-data Do not dump or import table data
--serialized-table-creation Table recreation will be executed in serie, one thread at a time
--resume Expect to find resume file in backup dir and will only process those files
-t, --threads Number of threads to use, default 4
-C, --compress-protocol Use compression on the MySQL connection
-V, --version Show the program version and exit
-v, --verbose Verbosity of output, 0 = silent, 1 = errors, 2 = warnings, 3 = info, default 2
--defaults-file Use a specific defaults file
--stream It will stream over STDOUT once the files has been written
--no-delete It will not delete the files after stream has been completed
-O, --omit-from-file File containing a list of database.table entries to skip, one per line (skips before applying regex option)
-T, --tables-list Comma delimited table list to dump (does not exclude regex option)
-h, --host The host to connect to
-u, --user Username with the necessary privileges
-p, --password User password
-a, --ask-password Prompt For User password
-P, --port TCP/IP port to connect to
-S, --socket UNIX domain socket file to use for connection
-x, --regex Regular expression for 'db.table' matching
--skip-definer Removes DEFINER from the CREATE statement. By default, statements are not modified
```
### Install and use Mydumper
```bash
# On GitHub, download the RPM package or source code package that corresponds to the machine that you use. We recommend you download the RPM package because it can be directly used while the source code package requires compilation. The OS used in the following example is CentOS 7. Therefore, download an el7 version.
[root@dev tmp]# wget https://github.com/mydumper/mydumper/releases/download/v0.12.1/mydumper-0.12.1-1-zstd.el7.x86_64.rpm
# Because the downloaded package is a ZSTD file, dependency 'libzstd' is required.
[root@dev tmp]# yum install libzstd.x86_64 -y
[root@dev tmp]# rpm -ivh mydumper-0.12.1-1-zstd.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:mydumper-0.12.1-1 ################################# [100%]
# Back up the database
[root@dev home]# mydumper -u root -p ******** -P 3306 -h 127.0.0.1 -B zz -o /home/dumper/
# Restore the database
[root@dev home]# myloader -u root -p ******** -P 3306 -h 127.0.0.1 -S /stonedb/install/tmp/mysql.sock -B zz -d /home/dumper
```
#### Generated backup files
```bash
[root@dev home]# ll dumper/
total 112
-rw-r--r--. 1 root root 139 Mar 23 14:24 metadata
-rw-r--r--. 1 root root 88 Mar 23 14:24 zz-schema-create.sql
-rw-r--r--. 1 root root 97819 Mar 23 14:24 zz.t_user.00000.sql
-rw-r--r--. 1 root root 4 Mar 23 14:24 zz.t_user-metadata
-rw-r--r--. 1 root root 477 Mar 23 14:24 zz.t_user-schema.sql
[root@dev dumper]# cat metadata
Started dump at: 2022-03-23 15:51:40
SHOW MASTER STATUS:
Log: mysql-bin.000002
Pos: 4737113
GTID:
Finished dump at: 2022-03-23 15:51:40
[root@dev-myos dumper]# cat zz-schema-create.sql
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `zz` /*!40100 DEFAULT CHARACTER SET utf8 */;
[root@dev dumper]# more zz.t_user.00000.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
INSERT INTO `t_user` VALUES(1,"e1195afd-aa7d-11ec-936e-00155d840103","kAMXjvtFJym1S7PAlMJ7",102,62,"2022-03-23 15:50:16")
,(2,"e11a7719-aa7d-11ec-936e-00155d840103","0ufCd3sXffjFdVPbjOWa",698,44,"2022-03-23 15:50:16")
.....# The content is not fully displayed because it is too long.
[root@dev dumper]# cat zz.t_user-metadata
10000
[root@dev-myos dumper]# cat zz.t_user-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
CREATE TABLE `t_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c_user_id` varchar(36) NOT NULL DEFAULT '',
`c_name` varchar(22) NOT NULL DEFAULT '',
`c_province_id` int(11) NOT NULL,
`c_city_id` int(11) NOT NULL,
`create_time` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_user_id` (`c_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8;
```
The directory contains the following files:
**metadata**: records the name and position of the binlog file of the backup database at the backup point in time.
:::info
If the backup is performed on the standby database, this file also records the name and position of the binlog file that had been synchronized from the primary database at the time of the backup.
:::
The remaining files are the database creation script and, for each table, its schema, data, and metadata files:
- **database-schema-create.sql**: records the statement for creating the database.
- **database.table-schema.sql**: records the table schema.
- **database.table.00000.sql**: records the table data.
- **database.table-metadata**: records the table metadata, such as the row count.
***Extensions***
If you want to import data to StoneDB, you must replace **engine=innodb** with **engine=stonedb** in the table schema file **database.table-schema.sql** and check whether the syntax of the table schema is compatible with StoneDB. For example, if the schema contains the keyword **unsigned**, it is incompatible. The following is an example of the modified schema:
```
[root@dev-myos dumper]# cat zz.t_user-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
CREATE TABLE `t_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c_user_id` varchar(36) NOT NULL DEFAULT '',
`c_name` varchar(22) NOT NULL DEFAULT '',
`c_province_id` int(11) NOT NULL,
`c_city_id` int(11) NOT NULL,
`create_time` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=STONEDB AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8;
```
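To apply this engine change across all table schema files in a dump directory, a `sed` one-liner such as the following can be used (a sketch; the path matches the example above, and the files should still be reviewed for other incompatible syntax such as **unsigned**):
```bash
# Case-insensitively replace the storage engine in every table schema file.
sed -i 's/ENGINE=InnoDB/ENGINE=STONEDB/gI' /home/dumper/*-schema.sql
```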
### Backup principles
1. The main thread executes **FLUSH TABLES WITH READ LOCK** to add a global read-only lock to ensure data consistency.
2. The name and position of the binlog file at the current point in time are obtained and recorded in the **metadata** file, so that binlogs can be applied after a full restore.
3. Multiple dump threads (4 by default, configurable) set the transaction isolation level to REPEATABLE READ and start consistent-read transactions.
4. Non-InnoDB tables are exported.
5. After the non-transactional tables are backed up, the main thread executes **UNLOCK TABLES** to release the global read-only lock.
6. InnoDB tables are exported.
\ No newline at end of file
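To relate these steps to concrete statements, the sequence roughly corresponds to the following SQL (a simplified sketch of what the main thread and dump threads issue, not a literal client trace):
```sql
FLUSH TABLES WITH READ LOCK;                              -- step 1: main thread takes the global read lock
SHOW MASTER STATUS;                                       -- step 2: binlog file and position written to metadata
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- step 3: each dump thread
START TRANSACTION WITH CONSISTENT SNAPSHOT;               -- step 3: open a consistent snapshot
-- step 4: non-transactional tables are dumped while the lock is held
UNLOCK TABLES;                                            -- step 5: main thread releases the global read lock
-- step 6: InnoDB tables are dumped inside the consistent-read transactions
```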
@@ -3,8 +3,7 @@ id: sql-tuning
sidebar_position: 7.42
---
# Optimize SQL Statements
# SQL Tuning
When an SQL statement passes through the optimizer, the optimizer analyzes the SQL statement to generate an execution plan. Then, the executor calls API operations to read data from the relevant tables and returns the query result. An SQL statement can have multiple execution plans, each of which describes a sequence of steps to access data for the SQL statement.
Although the query results returned by these execution plans are the same, their performance varies. The performance of an execution plan depends on many factors, such as statistics, whether temporary tables are used, the offset of a pagination query, and optimizer parameter settings. This topic describes how to optimize SQL statements from the following aspects:
@@ -26,8 +25,10 @@ The following table describes the columns included in a StoneDB execution plan.
| **Column** | **Description** |
| --- | --- |
| id | The sequence number of an operation. <br />info - A higher `_id_` indicates higher priority for execution. <br /> - If two or more operations have the same `_id_`, first come, first served.|
| select_type | The query type, used to categorize simple queries, JOIN queries, and subqueries. The value can be:<br />- **SIMPLE**: simple query.<br />- **PRIMARY**: outmost `SELECT`. If a query contains subqueries, the outermost `SELECT` is marked with **PRIMARY**.<br />- **SUBQUERY**: subquery, normally placed after a `SELECT` or `WHERE` clause.<br />- **DEPENDENT SUBQUERY**: first `SELECT` statement in subquery, dependent on the outer query. The number of times that the subquery is executed is equal to the number of records contained in the result set of the outer query.<br />- **DERIVED**: derived query, normally placed after a `FROM` clause.<br />- **UNION**: second or later `SELECT` statement in a `UNION` operation.<br />- **UNION RESULT**: result of a `UNION` operation.<br />|
| id | The sequence number of an operation. :::info A higher _id_ indicates higher priority for execution.<br/> If two or more operations have the same _id_, first come, first served. ::: |
| select_type | The query type, used to categorize simple queries, JOIN queries, and subqueries. The value can be:<br />- **SIMPLE**: simple query.<br />- **PRIMARY**: outermost `SELECT`. If a query contains subqueries, the outermost `SELECT` is marked with **PRIMARY**.<br />- **SUBQUERY**: subquery, normally placed after a `SELECT` or `WHERE` clause.<br />- **DEPENDENT SUBQUERY**: first `SELECT` statement in a subquery, dependent on the outer query. The number of times that the subquery is executed is equal to the number of records contained in the result set of the outer query.<br />- **DERIVED**: derived query, normally placed after a `FROM` clause.<br />- **UNION**: second or later `SELECT` statement in a `UNION` operation.<br />- **UNION RESULT**: result of a `UNION` operation.<br /> |
| table | The name of the table to access in the current step. |
| partitions | The partitions from which records would be matched by the query. |
| type | The join type. The value can be:<br />- **eq_ref**: One record is read from the table for each combination of rows from the previous tables. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index.<br />- **ref**: All rows with matching index values are read from this table for each combination of rows from the previous tables. **ref** is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index.<br />- **range**: **range** can be used when a key column is compared to a constant using any of the `=`, `<>`, `>`, `>=`, `<`, `<=`, `IS NULL`, `<=>`, `BETWEEN`, `LIKE`, or `IN()` operators.<br />- **index_merge**: Indexes are merged.<br />- **index_subquery**: The outer query is associated with the subquery. Some of the join fields of the subquery are indexed.<br />- **all**: The full table is scanned.<br /> |
@@ -39,8 +40,8 @@ The following table describes the columns included in a StoneDB execution plan.
| filtered | An estimated percentage of records that are filtered to read. The maximum value is 100, which means no filtering of rows occurred. For MySQL 5.7 and later, this column is returned by default. For MySQL versions earlier than 5.7, this column is not returned unless you execute `EXPLAIN EXTENDED`. |
| Extra | The additional information about the execution. The value can be:<br />- **Using where with pushed condition**: The data returned by the storage engine is filtered on the server, regardless of whether indexes are used.<br />- **Using filesort**: Sorting is required. <br />- **Using temporary**: A temporary table needs to be created to store the result set. In most cases, this happens if the query contains `UNION`, `DISTINCT`, `GROUP BY`, or `ORDER BY` clauses that list columns differently.<br />- **Using union**: The result set is obtained by using at least two indexes and the indexed fields are joined by using `OR`.<br />- **Using join buffer (Block Nested Loop)**: The Block Nested-Loop algorithm is used, which indicates that the join fields of the driven table are not indexed.<br /> |
## **Common StoneDB execution plans**
### **Execution plans for index scans**
# **Common StoneDB execution plans**
## **Execution plans for index scans**
In an execution plan, the index scan type can be **eq_ref**, **ref**, **range**, **index_merge**, or **index_subquery**.
```sql
> explain select * from t_atomstore where id=1;
@@ -65,7 +66,7 @@ Note: In this execution plan, the value in the "Extra" column is "NULL" instead
| 1 | SIMPLE | t_atomstore | NULL | range | idx_firstname | idx_firstname | 32 | NULL | 20 | 100.00 | Using where with pushed condition (`test`.`t_atomstore`.`first_name` in ('zhou','liu'))(t0) Pckrows: 2, susp. 2 (0 empty 0 full). Conditions: 1 |
+----+-------------+-------------+------------+-------+---------------+---------------+---------+------+------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------+
```
### **Execution plans for full table scans**
## **Execution plans for full table scans**
In an execution plan, the type of a full table scan can be only **ALL**. This is because StoneDB is a column-based storage engine and its data is highly compressed. Most queries involve full table scans.
```sql
> explain select first_name,count(*) from t_atomstore group by first_name;
@@ -79,7 +80,7 @@ In this execution plan, the type of a full table scan can be only **ALL**. This is
In this execution plan, even though field **first_name** is indexed (so that sorting is eliminated and no temporary table needs to be created), the optimizer still chooses a full table scan instead of a full index scan.
A warning message is displayed here because StoneDB rewrites the SQL statement by including `ORDER BY NULL` in the statement. This rewrite eliminates sorting on the returned grouping field. On InnoDB, if the returned grouping field is not indexed, sorting is performed.
### **Execution plans for aggregate operations**
## **Execution plans for aggregate operations**
Data in StoneDB is highly compressed. StoneDB uses the knowledge grid technique to record metadata of data packs in data pack nodes. When processing a statistical or aggregate query, StoneDB can quickly obtain the result set based on the metadata, ensuring optimal performance.
```sql
> explain select first_name,sum(score) from t_test1 group by first_name;
@@ -89,7 +90,7 @@ Data in StoneDB is highly compressed. StoneDB uses the knowledge grid technique
| 1 | SIMPLE | t_test1 | NULL | ALL | NULL | NULL | NULL | NULL | 1000000 | 100.00 | Using temporary; Using filesort |
+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------+
```
### **Execution plans for JOIN queries**
## **Execution plans for JOIN queries**
```sql
> explain select t1.id,t1.first_name,t2.first_name from t_test1 t1,t_test2 t2 where t1.id=t2.id and t1.first_name='zhou';
+----+-------------+-------+------------+--------+---------------+---------+---------+----------+---------+----------+------------------------------------------------------------------------------------------------------------------------------+
@@ -109,7 +110,7 @@ mysql> explain select t1.id,t1.first_name,t2.first_name from t_test1 t1,t_test2
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+------------------------------------------------------------------------------------------------------------------------------------+
Note: If no join field of the driven table is indexed, the Block Nested-Loop algorithm is used to join the two tables. In this case, the performance is poor.
```
### **Execution plans for subqueries**
## **Execution plans for subqueries**
```sql
> explain select t1.first_name from t_test1 t1 where t1.id in (select t2.id from t_test2 t2 where t2.first_name='zhou');
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -127,10 +128,10 @@ Note: If no join field of the driven table is indexed, the Block Nested-Loop alg
| 2 | DEPENDENT SUBQUERY | t2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | xx.t1.id | 1 | 10.00 | Using where with pushed condition (`xx`.`t2`.`first_name` = 'zhou')(t0) Pckrows: 16, susp. 16 (0 empty 0 full). Conditions: 1 |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
## **Common optimization methods**
### GROUP BY
# **Common optimization methods**
## GROUP BY
In MySQL, the GROUP BY operation sorts data first and then groups the data. If you want to avoid creating temporary tables and sorting, you must ensure that the field used for grouping is indexed. However, StoneDB uses the knowledge grid technique so that it can quickly filter the needed data for statistical and aggregate queries based on the metadata recorded in data pack nodes. In this way, you do not need to create indexes.
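As an illustration, the aggregate query from the execution plan examples above behaves differently on the two engines (a sketch; `t_atomstore` is the table used in those examples):
```sql
-- On StoneDB, this aggregate is resolved from data pack metadata, so no index on first_name is required.
-- On InnoDB, the same statement typically needs an index on first_name to avoid a temporary table and filesort.
SELECT first_name, COUNT(*) FROM t_atomstore GROUP BY first_name;
```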
### IN/EXISTS
## IN/EXISTS
In a join query, ensure that the table with the smaller result set is used to drive the table with the larger result set. For example:
```sql
select * from A where id in (select id from B);
@@ -205,9 +206,11 @@ If the result set of table A is smaller than that of table B, `EXISTS` is superi
1 row in set (0.03 sec)
```
:::info
If `IN` is used in this example, table B is used to drive table A and the execution time is 0.55s.
:::
### **Use IN and EXISTS interchangeably**
## **Use IN and EXISTS interchangeably**
`IN` can be converted to or from `EXISTS` only when the join fields of the subquery are not null. If you want the join fields of your subquery to use indexes, you can convert `IN` to `EXISTS`, as shown in the following example:
```sql
mysql> explain select * from t_test1 t1 where t1.id in (select t2.id from t_test2 t2 where t2.first_name='zhou');
@@ -228,7 +231,7 @@ mysql> explain select * from t_test1 t1 where exists (select 1 from t_test2 t2 w
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------+---------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set, 2 warnings (0.00 sec)
```
### **Pagination**
## **Pagination**
Rewrite part of the pagination query as a subquery that returns only primary keys, and then join the subquery result set back to the main query. This method ensures high performance. The following code provides an example:
```sql
> select * from t_test1 order by id asc limit 949420,10;
@@ -265,16 +268,20 @@ Convert parts of pagination queries to subqueries and specify that only the prim
+--------+------------+-----------+-----+-------+---------+
10 rows in set (0.13 sec)
```
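For reference, a sketch of the rewrite that the paragraph above describes is shown below (illustrative only; it is not necessarily the exact statement used in the collapsed part of this example):
```sql
SELECT t.*
FROM t_test1 t
JOIN (SELECT id FROM t_test1 ORDER BY id ASC LIMIT 949420, 10) tmp
  ON t.id = tmp.id;
-- The subquery touches only the primary key, so the deep offset is cheap;
-- only the final 10 rows are fetched in full from t_test1.
```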
## Table joins
### Nested loop joins
# Table joins
## Nested loop joins
The execution process is as follows:
1. The optimizer determines which table (table T1 or T2) is the driving table and which is the driven table based on certain rules. The driving table is used for the outer loop, and the driven table is used for the inner loop. In this example, the driving table is T1 and the driven table is T2.
2. Access driving table T1 based on the predicate condition specified in the SQL statement and record the result as result set 1.
3. Traverse result set 1 and driven table T2: Read the records in result set 1 one by one. After reading each record, use the record to traverse driven table T2, checking whether a matching record exists in T2 based on the join condition.
If the join fields are indexed, use indexes to obtain rows that match the condition. For example, T1 has 100 rows and T2 has 1000 rows. T2 will be run for 100 times, and each time one row is scanned. The total number of rows scanned during the whole process is 200 rows.<br />If the fields that are joined are not indexed, the full table is scanned to obtain rows that match the condition. For example, T1 has 100 rows and T2 has 1000 rows. T2 will be run 100 times, and each time 1000 rows are scanned. The total number of rows scanned during the whole process is 100100.<br />Nested loop joins are suitable for join queries with small result sets.
### Hash joins
If the join fields are indexed, indexes are used to obtain the rows that match the condition. For example, if T1 has 100 rows and T2 has 1000 rows, T2 is probed 100 times and each probe scans one row, so about 200 rows are scanned in total.
If the join fields are not indexed, the full table is scanned to obtain the rows that match the condition. For example, if T1 has 100 rows and T2 has 1000 rows, T2 is scanned 100 times and each scan reads 1000 rows, so 100100 rows are scanned in total.
Nested loop joins are suitable for join queries with small result sets.
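As a concrete illustration with the tables used earlier in this topic, the following join is a typical nested-loop case: the filtered `t_test1` rows drive the probes into `t_test2`, and the primary key on `t_test2.id` keeps each probe cheap (a sketch, assuming the schemas shown in the earlier EXPLAIN examples):
```sql
SELECT t1.id, t1.first_name, t2.first_name
FROM t_test1 t1
JOIN t_test2 t2 ON t1.id = t2.id
WHERE t1.first_name = 'zhou';
-- t1 (after the WHERE filter) is the driving table; each qualifying row probes t2 by primary key.
```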
## Hash joins
Suppose two tables A and B exist. Table A contains 100,000 records of data and table B contains 1,000,000 records of data. The range of IDs of table A is 1 to 100000 and that of table B is 1 to 1000000. The tables are joined based on IDs.
SQL statement example:
@@ -282,12 +289,13 @@ SQL statement example:
SELECT * FROM A,B WHERE A.ID=B.ID
```
1. Perform a full table scan on table A and use the hash function to hash the values of the join fields to buckets.
2. Perform a full table scan on table B and use the hash function to hash the values of the join fields to buckets.
3. Compare the hash values in each bucket and return only the hash values that are duplicate. Because the values are evenly hashed to each bucket, this method is efficient and effective.
1. The optimizer selects table A as the driving table and table B as the driven table, and then creates a hash table in memory.
2. The optimizer scans table A, uses the hash function to calculate the hash value of the join field for each row, and then saves the hash values to the hash table.
3. The optimizer scans table B and uses the hash function to calculate the hash value of the join field for each row.
4. The optimizer compares each hash value from table B with the hash values stored in the hash table. If a match is found, the corresponding rows are returned; otherwise, the row is discarded.
Hash joins are suitable for join queries with large result sets.
### Sort-merge joins
## Sort-merge joins
The execution process is as follows:
1. Access table T1 based on the predicate condition specified in the SQL statement, sort the result set based on the column in table T1 used for join, and then mark the result set as result set 1.
......
---
id: use-mydumper-full-backup
sidebar_position: 4.32
---
# MySQL Full Backup with Mydumper
Mydumper project repository: [https://github.com/mydumper/mydumper](https://github.com/mydumper/mydumper)
## Mydumper introduction
### What is Mydumper?
Mydumper is a logical backup tool for MySQL. It consists of two tools:
- mydumper: exports consistent backups of MySQL databases.
- myloader: reads backups produced by mydumper, connects to the destination database, and imports them. Both tools are multithreaded.
### Benefits of Mydumper
- Parallelism (and therefore speed) and performance (expensive character set conversion routines are avoided, and the code is efficient overall)
- Easier output management (separate files per table, dumped metadata, and so on, making data easy to view and parse)
- Consistency: maintains snapshots across all threads and provides accurate primary and secondary log positions
- Manageability: supports PCRE for specifying which databases and tables to include or exclude
### Key features of Mydumper
- Multi-threaded backup, which generates multiple backup files
- Consistent snapshots for transactional and non-transactional tables (supported by versions later than 0.2.2)
- Fast file compression
- Export of binlogs
- Multi-threaded recovery (supported by versions later than 0.2.1)
- Ability to run as a daemon that periodically takes snapshots and continuously records binlogs (supported by versions later than 0.5.0)
- Open source (GNU GPLv3)
## Use Mydumper
### Parameters for mydumper
```bash
mydumper --help
Usage:
mydumper [OPTION…] multi-threaded MySQL dumping
Help Options:
-?, --help Show help options
Application Options:
-B, --database Database to dump
-o, --outputdir Directory to output files to
-s, --statement-size Attempted size of INSERT statement in bytes, default 1000000
-r, --rows Try to split tables into chunks of this many rows. This option turns off --chunk-filesize
-F, --chunk-filesize Split tables into chunks of this output file size. This value is in MB
--max-rows Limit the number of rows per block after the table is estimated, default 1000000
-c, --compress Compress output files
-e, --build-empty-files Build dump files even if no data available from table
-i, --ignore-engines Comma delimited list of storage engines to ignore
-N, --insert-ignore Dump rows with INSERT IGNORE
-m, --no-schemas Do not dump table schemas with the data and triggers
-M, --table-checksums Dump table checksums with the data
-d, --no-data Do not dump table data
--order-by-primary Sort the data by Primary Key or Unique key if no primary key exists
-G, --triggers Dump triggers. By default, it do not dump triggers
-E, --events Dump events. By default, it do not dump events
-R, --routines Dump stored procedures and functions. By default, it do not dump stored procedures nor functions
-W, --no-views Do not dump VIEWs
-k, --no-locks Do not execute the temporary shared read lock. WARNING: This will cause inconsistent backups
--no-backup-locks Do not use Percona backup locks
--less-locking Minimize locking time on InnoDB tables.
--long-query-retries Retry checking for long queries, default 0 (do not retry)
--long-query-retry-interval Time to wait before retrying the long query check in seconds, default 60
-l, --long-query-guard Set long query timer in seconds, default 60
-K, --kill-long-queries Kill long running queries (instead of aborting)
-D, --daemon Enable daemon mode
-X, --snapshot-count number of snapshots, default 2
-I, --snapshot-interval Interval between each dump snapshot (in minutes), requires --daemon, default 60
-L, --logfile Log file name to use, by default stdout is used
--tz-utc SET TIME_ZONE='+00:00' at top of dump to allow dumping of TIMESTAMP data when a server has data in different time zones or data is being moved between servers with different time zones, defaults to on use --skip-tz-utc to disable.
--skip-tz-utc
--use-savepoints Use savepoints to reduce metadata locking issues, needs SUPER privilege
--success-on-1146 Not increment error count and Warning instead of Critical in case of table doesn't exist
--lock-all-tables Use LOCK TABLE for all, instead of FTWRL
-U, --updated-since Use Update_time to dump only tables updated in the last U days
--trx-consistency-only Transactional consistency only
--complete-insert Use complete INSERT statements that include column names
--split-partitions Dump partitions into separate files. This options overrides the --rows option for partitioned tables.
--set-names Sets the names, use it at your own risk, default binary
-z, --tidb-snapshot Snapshot to use for TiDB
--load-data
--fields-terminated-by
--fields-enclosed-by
--fields-escaped-by Single character that is going to be used to escape characters in theLOAD DATA stament, default: '\'
--lines-starting-by Adds the string at the begining of each row. When --load-data is usedit is added to the LOAD DATA statement. Its affects INSERT INTO statementsalso when it is used.
--lines-terminated-by Adds the string at the end of each row. When --load-data is used it isadded to the LOAD DATA statement. Its affects INSERT INTO statementsalso when it is used.
--statement-terminated-by This might never be used, unless you know what are you doing
--sync-wait WSREP_SYNC_WAIT value to set at SESSION level
--where Dump only selected records.
--no-check-generated-fields Queries related to generated fields are not going to be executed.It will lead to restoration issues if you have generated columns
--disk-limits Set the limit to pause and resume if determines there is no enough disk space.Accepts values like: '<resume>:<pause>' in MB.For instance: 100:500 will pause when there is only 100MB free and willresume if 500MB are available
--csv Automatically enables --load-data and set variables to export in CSV format.
-t, --threads Number of threads to use, default 4
-C, --compress-protocol Use compression on the MySQL connection
-V, --version Show the program version and exit
-v, --verbose Verbosity of output, 0 = silent, 1 = errors, 2 = warnings, 3 = info, default 2
--defaults-file Use a specific defaults file
--stream It will stream over STDOUT once the files has been written
--no-delete It will not delete the files after stream has been completed
-O, --omit-from-file File containing a list of database.table entries to skip, one per line (skips before applying regex option)
-T, --tables-list Comma delimited table list to dump (does not exclude regex option)
-h, --host The host to connect to
-u, --user Username with the necessary privileges
-p, --password User password
-a, --ask-password Prompt For User password
-P, --port TCP/IP port to connect to
-S, --socket UNIX domain socket file to use for connection
-x, --regex Regular expression for 'db.table' matching
```
### Parameters for myloader
```bash
myloader --help
Usage:
myloader [OPTION…] multi-threaded MySQL loader
Help Options:
-?, --help Show help options
Application Options:
-d, --directory Directory of the dump to import
-q, --queries-per-transaction Number of queries per transaction, default 1000
-o, --overwrite-tables Drop tables if they already exist
-B, --database An alternative database to restore into
-s, --source-db Database to restore
-e, --enable-binlog Enable binary logging of the restore data
--innodb-optimize-keys Creates the table without the indexes and it adds them at the end
--set-names Sets the names, use it at your own risk, default binary
-L, --logfile Log file name to use, by default stdout is used
--purge-mode This specify the truncate mode which can be: NONE, DROP, TRUNCATE and DELETE
--disable-redo-log Disables the REDO_LOG and enables it after, doesn't check initial status
-r, --rows Split the INSERT statement into this many rows.
--max-threads-per-table Maximum number of threads per table to use, default 4
--skip-triggers Do not import triggers. By default, it imports triggers
--skip-post Do not import events, stored procedures and functions. By default, it imports events, stored procedures nor functions
--no-data Do not dump or import table data
--serialized-table-creation Table recreation will be executed in serie, one thread at a time
--resume Expect to find resume file in backup dir and will only process those files
-t, --threads Number of threads to use, default 4
-C, --compress-protocol Use compression on the MySQL connection
-V, --version Show the program version and exit
-v, --verbose Verbosity of output, 0 = silent, 1 = errors, 2 = warnings, 3 = info, default 2
--defaults-file Use a specific defaults file
--stream It will stream over STDOUT once the files has been written
--no-delete It will not delete the files after stream has been completed
-O, --omit-from-file File containing a list of database.table entries to skip, one per line (skips before applying regex option)
-T, --tables-list Comma delimited table list to dump (does not exclude regex option)
-h, --host The host to connect to
-u, --user Username with the necessary privileges
-p, --password User password
-a, --ask-password Prompt For User password
-P, --port TCP/IP port to connect to
-S, --socket UNIX domain socket file to use for connection
-x, --regex Regular expression for 'db.table' matching
--skip-definer Removes DEFINER from the CREATE statement. By default, statements are not modified
```
### Install and use
```bash
# On GitHub, download the RPM package or source code package that matches your machine. The source package must be compiled, while the RPM package can be installed directly and is recommended. This example uses CentOS 7, so the el7 version is downloaded.
[root@dev tmp]# wget https://github.com/mydumper/mydumper/releases/download/v0.12.1/mydumper-0.12.1-1-zstd.el7.x86_64.rpm
# Because the downloaded mydumper package is the zstd build, the libzstd dependency is required.
[root@dev tmp]# yum install libzstd.x86_64 -y
[root@dev tmp]# rpm -ivh mydumper-0.12.1-1-zstd.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:mydumper-0.12.1-1 ################################# [100%]
# Back up the database
[root@dev home]# mydumper -u root -p xxx -P 3306 -h 127.0.0.1 -B zz -o /home/dumper/
# Restore the database
[root@dev home]# myloader -u root -p xxx -P 3306 -h 127.0.0.1 -S /stonedb/install/tmp/mysql.sock -B zz -d /home/dumper
```
**Generated backup files**
```bash
[root@dev home]# ll dumper/
total 112
-rw-r--r--. 1 root root 139 Mar 23 14:24 metadata
-rw-r--r--. 1 root root 88 Mar 23 14:24 zz-schema-create.sql
-rw-r--r--. 1 root root 97819 Mar 23 14:24 zz.t_user.00000.sql
-rw-r--r--. 1 root root 4 Mar 23 14:24 zz.t_user-metadata
-rw-r--r--. 1 root root 477 Mar 23 14:24 zz.t_user-schema.sql
[root@dev dumper]# cat metadata
Started dump at: 2022-03-23 15:51:40
SHOW MASTER STATUS:
Log: mysql-bin.000002
Pos: 4737113
GTID:
Finished dump at: 2022-03-23 15:51:40
[root@dev-myos dumper]# cat zz-schema-create.sql
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `zz` /*!40100 DEFAULT CHARACTER SET utf8 */;
[root@dev dumper]# more zz.t_user.00000.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
INSERT INTO `t_user` VALUES(1,"e1195afd-aa7d-11ec-936e-00155d840103","kAMXjvtFJym1S7PAlMJ7",102,62,"2022-03-23 15:50:16")
,(2,"e11a7719-aa7d-11ec-936e-00155d840103","0ufCd3sXffjFdVPbjOWa",698,44,"2022-03-23 15:50:16")
.....# The content is not fully displayed because it is too long.
[root@dev dumper]# cat zz.t_user-metadata
10000
[root@dev-myos dumper]# cat zz.t_user-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
CREATE TABLE `t_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c_user_id` varchar(36) NOT NULL DEFAULT '',
`c_name` varchar(22) NOT NULL DEFAULT '',
`c_province_id` int(11) NOT NULL,
`c_city_id` int(11) NOT NULL,
`create_time` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_user_id` (`c_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8;
```
The directory contains the following files:
**metadata** file:
- Records the binlog file name and write position of the backed-up database at the backup point in time.
- If the backup is performed on the standby database, it also records the binlog file name and position that had been synchronized from the primary database at the time of the backup.
The remaining files are the database creation script and, for each table, its schema, data, and metadata files:
database-schema-create.sql: the database creation statement file
database.table-schema.sql: the table schema file
database.table.00000.sql: the table data file
database.table-metadata: the table metadata file
***Extensions***
To import data into StoneDB, change engine=innodb to engine=stonedb in the CREATE TABLE statement of the table schema file database.table-schema.sql, and check the schema for syntax that StoneDB does not support, such as the unsigned keyword. The following is an example of the modified schema:
```
[root@dev-myos dumper]# cat zz.t_user-schema.sql
/*!40101 SET NAMES binary*/;
/*!40014 SET FOREIGN_KEY_CHECKS=0*/;
/*!40103 SET TIME_ZONE='+00:00' */;
CREATE TABLE `t_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`c_user_id` varchar(36) NOT NULL DEFAULT '',
`c_name` varchar(22) NOT NULL DEFAULT '',
`c_province_id` int(11) NOT NULL,
`c_city_id` int(11) NOT NULL,
`create_time` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=STONEDB AUTO_INCREMENT=10001 DEFAULT CHARSET=utf8;
```
### Backup principles
- The main thread executes FLUSH TABLES WITH READ LOCK to acquire a global read-only lock and ensure data consistency.
- The binlog file name and write position at the current point in time are read and recorded in the metadata file, so that binlogs can be applied after a full restore.
- N dump threads (4 by default, configurable) set the transaction isolation level to REPEATABLE READ and start consistent-read transactions.
- dump non-InnoDB tables: tables of non-transactional engines are exported first.
- The main thread executes UNLOCK TABLES to release the global read-only lock after the non-transactional tables are backed up.
- dump InnoDB tables: InnoDB tables are exported within the transactions.
- The transactions end.
\ No newline at end of file