From 4659869553578d20031ceeedfe0cc4bd91e90199 Mon Sep 17 00:00:00 2001
From: Liang Zhang
Date: Sat, 30 May 2020 19:08:37 +0800
Subject: [PATCH] update encrypt doc (#5856)

---
 README_ZH.md | 2 +-
 docs/document/content/faq/_index.cn.md | 8 +-
 docs/document/content/faq/_index.en.md | 4 +-
 .../content/features/encrypt/_index.cn.md | 32 +-
 .../content/features/encrypt/_index.en.md | 29 +-
 .../content/features/encrypt/concept.cn.md | 200 +-----------
 .../content/features/encrypt/concept.en.md | 205 +-----------
 .../content/features/encrypt/principle.cn.md | 233 ++++++++++++++
 .../content/features/encrypt/principle.en.md | 294 ++++++++++++++++++
 .../content/features/encrypt/use-norms.cn.md | 18 ++
 .../content/features/encrypt/use-norms.en.md | 18 ++
 .../content/features/spi/_index.cn.md | 6 +-
 .../content/features/spi/_index.en.md | 6 +-
 .../test-engine/performance-test.cn.md | 6 +-
 .../configuration/config-java.cn.md | 10 +-
 .../configuration/config-java.en.md | 6 +-
 .../configuration/config-spring-boot.cn.md | 8 +-
 .../configuration/config-spring-boot.en.md | 8 +-
 .../config-spring-namespace.cn.md | 10 +-
 .../config-spring-namespace.en.md | 8 +-
 .../configuration/config-yaml.cn.md | 10 +-
 .../configuration/config-yaml.en.md | 6 +-
 .../shardingsphere-jdbc/usage/encrypt.cn.md | 8 +-
 .../shardingsphere-jdbc/usage/encrypt.en.md | 6 +-
 .../shardingsphere-proxy/configuration.cn.md | 6 +-
 .../shardingsphere-proxy/configuration.en.md | 6 +-
 docs/document/content/overview/_index.cn.md | 2 +-
 examples/README_ZH.md | 2 +-
 28 files changed, 660 insertions(+), 497 deletions(-)
 create mode 100644 docs/document/content/features/encrypt/principle.cn.md
 create mode 100644 docs/document/content/features/encrypt/principle.en.md
 create mode 100644 docs/document/content/features/encrypt/use-norms.cn.md
 create mode 100644 docs/document/content/features/encrypt/use-norms.en.md

diff --git a/README_ZH.md b/README_ZH.md
index 09e047bd9e..782d9ca91e 100644
--- a/README_ZH.md
+++ b/README_ZH.md
@@ 
-109,7 +109,7 @@ Apache ShardingSphere 是多接入端共同组成的生态圈。 * 分布式治理 * 弹性伸缩 * 可视化链路追踪 -* 数据脱敏 +* 数据加密 ## 如何构建 diff --git a/docs/document/content/faq/_index.cn.md b/docs/document/content/faq/_index.cn.md index f86674233d..bed2f55594 100644 --- a/docs/document/content/faq/_index.cn.md +++ b/docs/document/content/faq/_index.cn.md @@ -218,17 +218,17 @@ ShardingSphere中很多功能实现类的加载方式是通过[SPI](https://shar 与分布式主键`ShardingKeyGenerator`接口相同,其他ShardingSphere的[扩展功能](https://shardingsphere.apache.org/document/current/cn/features/spi/)也需要用相同的方式注入才能生效。 -## 17. JPA 和 数据脱敏无法一起使用,如何解决? +## 17. JPA 和 数据加密无法一起使用,如何解决? 回答: -由于数据脱敏的DDL尚未开发完成,因此对于自动生成DDL语句的JPA与数据脱敏一起使用时,会导致JPA的实体类(Entity)无法同时满足DDL和DML的情况。 +由于数据加密的DDL尚未开发完成,因此对于自动生成DDL语句的JPA与数据加密一起使用时,会导致JPA的实体类(Entity)无法同时满足DDL和DML的情况。 解决方案如下: -1. 以需要脱敏的逻辑列名编写JPA的实体类(Entity)。 +1. 以需要加密的逻辑列名编写JPA的实体类(Entity)。 2. 关闭JPA的auto-ddl,如 auto-ddl=none。 -3. 手动建表,建表时应使用数据脱敏配置的`cipherColumn`,`plainColumn`和`assistedQueryColumn`代替逻辑列。 +3. 手动建表,建表时应使用数据加密配置的`cipherColumn`,`plainColumn`和`assistedQueryColumn`代替逻辑列。 ## 18. 服务启动时如何加快`metadata`加载速度? diff --git a/docs/document/content/faq/_index.en.md b/docs/document/content/faq/_index.en.md index 9dd3062a21..b6e29ad3c5 100644 --- a/docs/document/content/faq/_index.en.md +++ b/docs/document/content/faq/_index.en.md @@ -215,11 +215,11 @@ More detail for SPI usage, please search by yourself. Other ShardingSphere [functionality implementation](https://shardingsphere.apache.org/document/current/en/features/spi/) will take effect in the same way. -## 17. How to solve that `DATA MASKING` can't work with JPA? +## 17. How to solve that `data encryption` can't work with JPA? Answer: -Because DDL for data masking has not yet finished, JPA Entity cannot meet the DDL and DML at the same time, when JPA that automatically generates DDL is used with data masking. 
+Because DDL for data encryption has not yet finished, JPA Entity cannot meet the DDL and DML at the same time, when JPA that automatically generates DDL is used with data encryption. The solutions are as follows: diff --git a/docs/document/content/features/encrypt/_index.cn.md b/docs/document/content/features/encrypt/_index.cn.md index 24fed0925a..5b1495276e 100644 --- a/docs/document/content/features/encrypt/_index.cn.md +++ b/docs/document/content/features/encrypt/_index.cn.md @@ -7,26 +7,26 @@ chapter = true ## 背景 -安全控制一直是治理的重要环节,数据脱敏属于安全控制的范畴。对互联网公司、传统行业来说,数据安全一直是极为重视和敏感的话题。数据脱敏是指对某些敏感信息通过脱敏规则进行数据的变形,实现敏感隐私数据的可靠保护。涉及客户安全数据或者一些商业性敏感数据,如身份证号、手机号、卡号、客户号等个人信息按照相关部门规定,都需要进行数据脱敏。 +安全控制一直是治理的重要环节,数据加密属于安全控制的范畴。无论对互联网公司还是传统行业来说,数据安全一直是极为重视和敏感的话题。 +数据加密是指对某些敏感信息通过加密规则进行数据的变形,实现敏感隐私数据的可靠保护。 +涉及客户安全数据或者一些商业性敏感数据,如身份证号、手机号、卡号、客户号等个人信息按照相关部门规定,都需要进行数据加密。 -在真实业务场景中,相关业务开发团队则往往需要针对公司安全部门需求,自行实行并维护一套加解密系统,而当脱敏场景发生改变时,自行维护的脱敏系统往往又面临着重构或修改风险。此外,对于已经上线的业务,如何在不修改业务逻辑、业务SQL的情况下,透明化、安全低风险地实现无缝进行脱敏改造呢? - -Apache ShardingSphere根据业界对脱敏的需求及业务改造痛点,提供了一套完整、安全、透明化、低改造成本的数据脱敏整合解决方案。 - -## 前序 - -数据脱敏模块属于ShardingSphere分布式治理这一核心功能下的子功能模块。它通过对用户输入的SQL进行解析,并依据用户提供的脱敏配置对SQL进行改写,从而实现对原文数据进行加密,并将原文数据(可选)及密文数据同时存储到底层数据库。在用户查询数据时,它又从数据库中取出密文数据,并对其解密,最终将解密后的原始数据返回给用户。Apache ShardingSphere分布式数据库中间件自动化&透明化了数据脱敏过程,让用户无需关注数据脱敏的实现细节,像使用普通数据那样使用脱敏数据。此外,无论是已在线业务进行脱敏改造,还是新上线业务使用脱敏功能,ShardingSphere都可以提供一套相对完善的解决方案。 - -## 需求场景分析 - -对于数据脱敏的需求,在现实的业务场景中一般分为两种情况: +对于数据加密的需求,在现实的业务场景中一般分为两种情况: 1. 新业务上线,安全部门规定需将涉及用户敏感信息,例如银行、手机号码等进行加密后存储到数据库,在使用的时候再进行解密处理。因为是全新系统,因而没有存量数据清洗问题,所以实现相对简单。 -2. 已上线业务,之前一直将明文存储在数据库中。相关部门突然需要对已上线业务进行脱敏整改。这种场景一般需要处理三个问题: +2. 
已上线业务,之前一直将明文存储在数据库中。相关部门突然需要对已上线业务进行加密整改。这种场景一般需要处理 3 个问题: + +* 历史数据需要如何进行加密处理,即洗数。 +* 如何能在不改动业务SQL和逻辑情况下,将新增数据进行加密处理,并存储到数据库;在使用时,再进行解密取出。 +* 如何较为安全、无缝、透明化地实现业务系统在明文与密文数据间的迁移。 + +## 挑战 - a) 历史数据需要如何进行脱敏处理,即洗数。 +在真实业务场景中,相关业务开发团队则往往需要针对公司安全部门需求,自行实行并维护一套加解密系统。 +而当加密场景发生改变时,自行维护的加密系统往往又面临着重构或修改风险。 +此外,对于已经上线的业务,在不修改业务逻辑和 SQL 的情况下,透明化、安全低风险地实现无缝进行加密改造也相对复杂。 - b) 如何能在不改动业务SQL和逻辑情况下,将新增数据进行脱敏处理,并存储到数据库;在使用时,再进行解密取出。 +## 目标 - c) 如何较为安全、无缝、透明化地实现业务系统在明文与密文数据间的迁移。 +**根据业界对加密的需求及业务改造痛点,提供了一套完整、安全、透明化、低改造成本的数据加密整合解决方案,是Apache ShardingSphere 数据加密模块的主要设计目标。** diff --git a/docs/document/content/features/encrypt/_index.en.md b/docs/document/content/features/encrypt/_index.en.md index f062565d23..2b536c6e72 100644 --- a/docs/document/content/features/encrypt/_index.en.md +++ b/docs/document/content/features/encrypt/_index.en.md @@ -7,17 +7,11 @@ chapter = true ## Background -Security control has always been a crucial link of orchestration; data masking falls into this category. For both Internet enterprises and traditional sectors, data security has always been a highly valued and sensitive topic. Data masking refers to transforming some sensitive information through masking rules to safely protect the private data. Data involves client's security or business sensibility, such as ID number, phone number, card number, client number and other personal information, requires data masking according to relevant regulations. - -Because of that, ShardingSphere has provided data masking, which stores users' sensitive information in the database after encryption. When users search for them, the information will be decrypted and returned to users in the original form. - -ShardingSphere has made the encryption and decryption processes totally transparent to users, who can store desensitized data and acquire original data without any awareness. In addition, ShardingSphere has provided internal masking algorithms, which can be directly used by users. 
At the same time, we have also provided masking algorithm related interfaces, which can be implemented by users themselves. After simple configurations, ShardingSphere can use algorithms provided by users to perform encryption, decryption and masking.
-
-## Preface
-
-The data encryption module is a sub-function module under ShardingSphere's core distributed governance function. It parses the SQL input by the user and rewrites the SQL according to the encryption configuration provided by the user, thereby encrypting the original data and storing the original data (optional) and the cipher data in the database at the same time. When the user queries the data, it takes the cipher data from the database, decrypts it, and finally returns the decrypted original data to the user. The Apache ShardingSphere distributed database middleware automates the data encryption process and makes it transparent, so that users do not need to pay attention to the details of data encryption and can use encrypted data like ordinary data. In addition, ShardingSphere can provide a relatively complete solution both for encrypting services that are already online and for new services that use the encryption function.
-
-## Demand Analysis
+Security control has always been a crucial link of data governance, and data encryption falls into this category.
+For both Internet enterprises and traditional sectors, data security has always been a highly valued and sensitive topic.
+Data encryption refers to transforming some sensitive information through encryption rules to safely protect the private data.
+Data involving clients' security or business sensitivity,
+such as ID number, phone number, card number, client number and other personal information, requires data encryption according to relevant regulations.
 The demand for data encryption is generally divided into two situations in real business scenarios:

@@ -25,9 +19,16 @@ The demand for data encryption is generally divided into two situations in real

 2. The service has already been launched, and plaintext has been stored in the database until now. The relevant department suddenly requires the data of the online business to be encrypted. This scenario generally involves three issues:

- a) How to encrypt the historical data, a.k.a. data cleaning.
+* How to encrypt the historical data, a.k.a. data cleaning.
+* How to encrypt newly added data and store it in the database without changing the business SQL and logic, and then decrypt the data when it is read.
+* How to securely, seamlessly and transparently migrate business systems between plaintext and ciphertext data.
+
+## Challenges

- b) How to encrypt newly added data and store it in the database without changing the business SQL and logic, and then decrypt the data when it is read.
+In real business scenarios, the business development team often needs to implement and maintain its own encryption and decryption system to meet the requirements of the company's security department.
+When the encryption scenario changes, this self-maintained encryption system often faces the risk of being rebuilt or modified.
+In addition, for a business system that is already online, it is relatively complex to retrofit encryption seamlessly, transparently, securely and with low risk without modifying the business logic and SQL.
- c) How to securely, seamlessly and transparently migrate business systems between plaintext and ciphertext data.
+## Goal

+**Providing a secure and transparent data encryption solution is the main design goal of the Apache ShardingSphere data encryption module.**
diff --git a/docs/document/content/features/encrypt/concept.cn.md b/docs/document/content/features/encrypt/concept.cn.md
index 8709ddb50e..b79c0502b5 100644
--- a/docs/document/content/features/encrypt/concept.cn.md
+++ b/docs/document/content/features/encrypt/concept.cn.md
@@ -4,202 +4,4 @@ title = "核心概念"
 weight = 1
 +++

-## 处理流程详解
-
-### 整体架构
-
-ShardingSphere提供的Encrypt-JDBC和业务代码部署在一起。业务方需面向Encrypt-JDBC进行JDBC编程。由于Encrypt-JDBC实现所有JDBC标准接口,业务代码无需做额外改造即可兼容使用。此时,业务代码所有与数据库的交互行为交由Encrypt-JDBC负责。业务只需提供脱敏规则即可。**作为业务代码与底层数据库中间的桥梁,Encrypt-JDBC便可拦截用户行为,并在改造行为后与数据库交互。**
-
-![1](https://shardingsphere.apache.org/document/current/img/encrypt/1.png)
-
-Encrypt-JDBC将用户发起的SQL进行拦截,并通过SQL语法解析器进行解析、理解SQL行为,再依据用户传入的脱敏规则,找出需要脱敏的字段和所使用的加解密器对目标字段进行加解密处理后,再与底层数据库进行交互。ShardingSphere会将用户请求的明文进行加密后存储到底层数据库;并在用户查询时,将密文从数据库中取出进行解密后返回给终端用户。ShardingSphere通过屏蔽对数据的脱敏处理,使用户无需感知解析SQL、数据加密、数据解密的处理过程,就像在使用普通数据一样使用脱敏数据。
-
-### 脱敏规则
-
-在详解整套流程之前,我们需要先了解下脱敏规则与配置,这是认识整套流程的基础。脱敏配置主要分为四部分:数据源配置,加密器配置,脱敏表配置以及查询属性配置,其详情如下图所示:
-
-![2](https://shardingsphere.apache.org/document/current/img/encrypt/2.png)
-
-**数据源配置**:是指DataSource的配置。
-
-**加密器配置**:是指使用什么加密策略进行加解密。目前ShardingSphere内置了两种加解密策略:AES/MD5。用户还可以通过实现ShardingSphere提供的接口,自行实现一套加解密算法。
-
-**脱敏表配置**:用于告诉ShardingSphere数据表里哪个列用于存储密文数据(cipherColumn)、哪个列用于存储明文数据(plainColumn)以及用户想使用哪个列进行SQL编写(logicColumn)。
-
-> 如何理解`用户想使用哪个列进行SQL编写(logicColumn)`?
-> -> 我们可以从Encrypt-JDBC存在的意义来理解。Encrypt-JDBC最终目的是希望屏蔽底层对数据的脱敏处理,也就是说我们不希望用户知道数据是如何被加解密的、如何将明文数据存储到plainColumn,将密文数据存储到cipherColumn。换句话说,我们不希望用户知道plainColumn和cipherColumn的存在和使用。所以,我们需要给用户提供一个概念意义上的列,这个列可以脱离底层数据库的真实列,它可以是数据库表里的一个真实列,也可以不是,从而使得用户可以随意改变底层数据库的plainColumn和cipherColumn的列名。或者删除plainColumn,选择永远不再存储明文,只存储密文。只要用户的SQL面向这个逻辑列进行编写,并在脱敏规则里给出logicColumn和plainColumn、cipherColumn之间正确的映射关系即可。 -> -> 为什么要这么做呢?答案在文章后面,即为了让已上线的业务能无缝、透明、安全地进行数据脱敏迁移。 - -**查询属性的配置**:当底层数据库表里同时存储了明文数据、密文数据后,该属性开关用于决定是直接查询数据库表里的明文数据进行返回,还是查询密文数据通过Encrypt-JDBC解密后返回。 - -### 脱敏处理过程 - -举例说明,假如数据库里有一张表叫做t_user,这张表里实际有两个字段pwd_plain,用于存放明文数据、pwd_cipher,用于存放密文数据,同时定义logicColumn为pwd。那么,用户在编写SQL时应该面向logicColumn进行编写,即INSERT INTO t_user SET pwd = '123'。ShardingSphere接收到该SQL,通过用户提供的脱敏配置,发现pwd是logicColumn,于是便对逻辑列及其对应的明文数据进行脱敏处理。可以看出**ShardingSphere将面向用户的逻辑列与面向底层数据库的明文列和密文列进行了列名以及数据的脱敏映射转换。**如下图所示: - -![3](https://shardingsphere.apache.org/document/current/img/encrypt/3.png) - -**这也正是Encrypt-JDBC核心意义所在,即依据用户提供的脱敏规则,将用户SQL与底层数据表结构割裂开来,使得用户的SQL编写不再依赖于真实的数据库表结构。而用户与底层数据库之间的衔接、映射、转换交由ShardingSphere进行处理。**为什么我们要这么做?还是那句话:为了让已上线的业务能无缝、透明、安全地进行数据脱敏迁移。 - -为了让读者更清晰了解到Encrypt-JDBC的核心处理流程,下方图片展示了使用Encrypt-JDBC进行增删改查时,其中的处理流程和转换逻辑,如下图所示。 - -![4](https://shardingsphere.apache.org/document/current/img/encrypt/4.png) - -## 解决方案详解 - -在了解了ShardingSphere脱敏处理流程后,即可将脱敏配置、脱敏处理流程与实际场景进行结合。所有的设计开发都是为了解决业务场景遇到的痛点。那么面对之前提到的业务场景需求,又应该如何使用ShardingSphere这把利器来满足业务需求呢? 
- -### 新上线业务 - -业务场景分析:新上线业务由于一切从零开始,不存在历史数据清洗问题,所以相对简单。 - -解决方案说明:选择合适的加密器,如AES后,只需配置逻辑列(面向用户编写SQL)和密文列(数据表存密文数据)即可,**逻辑列和密文列可以相同也可以不同**。建议配置如下(Yaml格式展示): - -```yaml -encryptRule: - encryptors: - aes_encryptor: - type: aes - props: - aes.key.value: 123456abc - tables: - t_user: - columns: - pwd: - cipherColumn: pwd - encryptor: aes_encryptor -``` - -使用这套配置,Encrypt-JDBC只需将logicColumn和cipherColumn进行转换,底层数据表不存储明文,只存储了密文,这也是安全审计部分的要求所在。如果用户希望将明文、密文一同存储到数据库,只需添加plainColumn配置即可。整体处理流程如下图所示: - -![5](https://shardingsphere.apache.org/document/current/img/encrypt/5.png) - -### 已上线业务改造 - -业务场景分析:由于业务已经在线上运行,数据库里必然存有大量明文历史数据。现在的问题是如何让历史数据得以加密清洗、如何让增量数据得以加密处理、如何让业务在新旧两套数据系统之间进行无缝、透明化迁移。 - -解决方案说明:在提供解决方案之前,我们先来头脑风暴一下:首先,既然是旧业务需要进行脱敏改造,那一定存储了非常重要且敏感的信息。这些信息含金量高且业务相对基础重要。如果搞错了,整个团队KPI就再见了。所以不可能一上来就停业务,禁止新数据写入,再找个加密器把历史数据全部加密清洗,再把之前重构的代码部署上线,使其能把存量和增量数据进行在线加密解密。如此简单粗暴的方式,按照历史经验来谈,一定凉凉。 - -那么另一种相对安全的做法是:重新搭建一套和生产环境一模一样的预发环境,然后通过相关迁移洗数工具把生产环境的**存量原文数据**加密后存储到预发环境,而**新增数据**则通过例如MySQL主从复制及业务方自行开发的工具加密后存储到预发环境的数据库里,再把重构后可以进行加解密的代码部署到预发环境。这样生产环境是一套**以明文为核心的查询修改**的环境;预发环境是一套**以密文为核心加解密查询修改**的环境。在对比一段时间无误后,可以夜间操作将生产流量切到预发环境中。此方案相对安全可靠,只是时间、人力、资金、成本较高,主要包括:预发环境搭建、生产代码整改、相关辅助工具开发等。除非无路可走,否则业务开发人员一般是从入门到放弃。 - -业务开发人员最希望的做法是:减少资金费用的承担、最好不要修改业务代码、能够安全平滑迁移系统。于是,ShardingSphere的脱敏功能模块便应用而生。可分为三步进行: - -1. 
系统迁移前 - - 假设系统需要对t_user的pwd字段进行脱敏处理,业务方使用Encrypt-JDBC来代替标准化的JDBC接口,此举基本不需要额外改造(我们还提供了SpringBoot,SpringNameSpace,Yaml等接入方式,满足不同业务方需求)。另外,提供一套脱敏配置规则,如下所示: - - ```yaml - encryptRule: - encryptors: - aes_encryptor: - type: aes - props: - aes.key.value: 123456abc - tables: - t_user: - columns: - pwd: - plainColumn: pwd - cipherColumn: pwd_cipher - encryptor: aes_encryptor - props: - query.with.cipher.column: false - ``` - - 依据上述脱敏规则可知,首先需要在数据库表t_user里新增一个字段叫做pwd_cipher,即cipherColumn,用于存放密文数据,同时我们把plainColumn设置为pwd,用于存放明文数据,而把logicColumn也设置为pwd。由于之前的代码SQL就是使用pwd进行编写,即面向逻辑列进行SQL编写,所以业务代码无需改动。通过Encrypt-JDBC,针对新增的数据,会把明文写到pwd列,并同时把明文进行加密存储到pwd_cipher列。此时,由于query.with.cipher.column设置为false,对业务应用来说,依旧使用pwd这一明文列进行查询存储,却在底层数据库表pwd_cipher上额外存储了新增数据的密文数据,其处理流程如下图所示: - - ![6](https://shardingsphere.apache.org/document/current/img/encrypt/6.png) - - 新增数据在插入时,就通过Encrypt-JDBC加密为密文数据,并被存储到了cipherColumn。而现在就需要处理历史明文存量数据。**由于Apache ShardingSphere目前并未提供相关迁移洗数工具,此时需要业务方自行将pwd中的明文数据进行加密处理存储到pwd_cipher。** - -2. 系统迁移中 - - 新增的数据已被Encrypt-JDBC将密文存储到密文列,明文存储到明文列;历史数据被业务方自行加密清洗后,将密文也存储到密文列。也就是说现在的数据库里即存放着明文也存放着密文,只是由于配置项中的query.with.cipher.column=false,所以密文一直没有被使用过。现在我们为了让系统能切到密文数据进行查询,需要将脱敏配置中的query.with.cipher.column设置为true。在重启系统后,我们发现系统业务一切正常,但是Encrypt-JDBC已经开始从数据库里取出密文列的数据,解密后返回给用户;而对于用户的增删改需求,则依旧会把原文数据存储到明文列,加密后密文数据存储到密文列。 - - 虽然现在业务系统通过将密文列的数据取出,解密后返回;但是,在存储的时候仍旧会存一份原文数据到明文列,这是为什么呢?答案是:为了能够进行系统回滚。**因为只要密文和明文永远同时存在,我们就可以通过开关项配置自由将业务查询切换到cipherColumn或plainColumn。**也就是说,如果将系统切到密文列进行查询时,发现系统报错,需要回滚。那么只需将query.with.cipher.column=false,Encrypt-JDBC将会还原,即又重新开始使用plainColumn进行查询。处理流程如下图所示: - - ![7](https://shardingsphere.apache.org/document/current/img/encrypt/7.png) - -3. 系统迁移后 - - 由于安全审计部门要求,业务系统一般不可能让数据库的明文列和密文列永久同步保留,我们需要在系统稳定后将明文列数据删除。即我们需要在系统迁移后将plainColumn,即pwd进行删除。那问题来了,现在业务代码都是面向pwd进行编写SQL的,把底层数据表中的存放明文的pwd删除了,换用pwd_cipher进行解密得到原文数据,那岂不是意味着业务方需要整改所有SQL,从而不使用即将要被删除的pwd列?还记得我们Encrypt-JDBC的核心意义所在吗? 
- - > 这也正是Encrypt-JDBC核心意义所在,即依据用户提供的脱敏规则,将用户SQL与底层数据库表结构割裂开来,使得用户的SQL编写不再依赖于真实的数据库表结构。而用户与底层数据库之间的衔接、映射、转换交由ShardingSphere进行处理。 - - 是的,因为有logicColumn存在,用户的编写SQL都面向这个虚拟列,Encrypt-JDBC就可以把这个逻辑列和底层数据表中的密文列进行映射转换。于是迁移后的脱敏配置即为: - - ```yaml - encryptRule: - encryptors: - aes_encryptor: - type: aes - props: - aes.key.value: 123456abc - tables: - t_user: - columns: - pwd: # pwd与pwd_cipher的转换映射 - cipherColumn: pwd_cipher - encryptor: aes_encryptor - props: - query.with.cipher.column: true - ``` - -其处理流程如下: - -![8](https://shardingsphere.apache.org/document/current/img/encrypt/8.png) - -至此,已在线业务脱敏整改解决方案全部叙述完毕。我们提供了Java、Yaml、SpringBoot、SpringNameSpace多种方式供用户选择接入,力求满足业务不同的接入需求。该解决方案目前已在京东数科不断落地上线,提供对内基础服务支撑。 - -## 中间件脱敏服务优势 - -1. 自动化&透明化数据脱敏过程,用户无需关注脱敏中间实现细节。 -2. 提供多种内置、第三方(AKS)的脱敏策略,用户仅需简单配置即可使用。 -3. 提供脱敏策略API接口,用户可实现接口,从而使用自定义脱敏策略进行数据脱敏。 -4. 支持切换不同的脱敏策略。 -5. 针对已上线业务,可实现明文数据与密文数据同步存储,并通过配置决定使用明文列还是密文列进行查询。可实现在不改变业务查询SQL前提下,已上线系统对加密前后数据进行安全、透明化迁移。 - -## 适用场景说明 - -1. 用户项目使用Java语言进行编程。 -2. 后端数据库为MySQL、Oracle、PostgreSQL、SQLServer。 -3. 用户需要对数据库表中某个或多个列进行脱敏(数据加密&解密)。 -4. 兼容所有常用SQL。 - -## 限制条件 - -1. 用户需要自行处理数据库中原始的存量数据、洗数。 -2. 使用脱敏功能+分库分表功能,部分特殊SQL不支持,请参考[SQL使用规范]( https://shardingsphere.apache.org/document/current/cn/features/sharding/use-norms/sql/)。 -3. 脱敏字段无法支持比较操作,如:大于小于、ORDER BY、BETWEEN、LIKE等。 -4. 
脱敏字段无法支持计算操作,如:AVG、SUM以及计算表达式 。 - -## 加密策略解析 - -ShardingSphere提供了两种加密策略用于数据脱敏,该两种策略分别对应ShardingSphere的两种加解密的接口,即ShardingEncryptor和ShardingQueryAssistedEncryptor。 - -一方面,ShardingSphere为用户提供了内置的加解密实现类,用户只需进行配置即可使用;另一方面,为了满足用户不同场景的需求,我们还开放了相关加解密接口,用户可依据该两种类型的接口提供具体实现类。再进行简单配置,即可让ShardingSphere调用用户自定义的加解密方案进行数据脱敏。 - -### ShardingEncryptor -该解决方案通过提供`encrypt()`, `decrypt()`两种方法对需要脱敏的数据进行加解密。在用户进行`INSERT`, `DELETE`, `UPDATE`时,ShardingSphere会按照用户配置,对SQL进行解析、改写、路由,并会调用`encrypt()`将数据加密后存储到数据库, 而在`SELECT`时,则调用`decrypt()`方法将从数据库中取出的脱敏数据进行逆向解密,最终将原始数据返回给用户。 - -当前,ShardingSphere针对这种类型的脱敏解决方案提供了两种具体实现类,分别是MD5(不可逆),AES(可逆),用户只需配置即可使用这两种内置的方案。 - -### ShardingQueryAssistedEncryptor -相比较于第一种脱敏方案,该方案更为安全和复杂。它的理念是:即使是相同的数据,如两个用户的密码相同,它们在数据库里存储的脱敏数据也应当是不一样的。这种理念更有利于保护用户信息,防止撞库成功。 - -它提供三种函数进行实现,分别是`encrypt()`, `decrypt()`, `queryAssistedEncrypt()`。在`encrypt()`阶段,用户通过设置某个变动种子,例如时间戳。针对原始数据+变动种子组合的内容进行加密,就能保证即使原始数据相同,也因为有变动种子的存在,致使加密后的脱敏数据是不一样的。在`decrypt()`可依据之前规定的加密算法,利用种子数据进行解密。 - -虽然这种方式确实可以增加数据的保密性,但是另一个问题却随之出现:相同的数据在数据库里存储的内容是不一样的,那么当用户按照这个加密列进行等值查询(`SELECT FROM table WHERE encryptedColumnn = ?`)时会发现无法将所有相同的原始数据查询出来。为此,我们提出了辅助查询列的概念。该辅助查询列通过`queryAssistedEncrypt()`生成,与`decrypt()`不同的是,该方法通过对原始数据进行另一种方式的加密,但是针对原始数据相同的数据,这种加密方式产生的加密数据是一致的。将`queryAssistedEncrypt()`后的数据存储到数据中用于辅助查询真实数据。因此,数据库表中多出这一个辅助查询列。 - -由于`queryAssistedEncrypt()`和`encrypt()`产生不同加密数据进行存储,而`decrypt()`可逆,`queryAssistedEncrypt()`不可逆。 在查询原始数据的时候,我们会自动对SQL进行解析、改写、路由,利用辅助查询列进行 -`WHERE`条件的查询,却利用 `decrypt()`对`encrypt()`加密后的数据进行解密,并将原始数据返回给用户。这一切都是对用户透明化的。 - -当前,ShardingSphere针对这种类型的脱敏解决方案并没有提供具体实现类,却将该理念抽象成接口,提供给用户自行实现。ShardingSphere将调用用户提供的该方案的具体实现类进行数据脱敏。 - - -## 后续 - -本篇文章介绍了如何使用ShardingSphere产品之一的Encrypt-JDBC进行接入,接入形式还可以选择使用SpringBoot、SpringNameSpace等,这种形态的接入端主要面向JAVA同构,并与业务代码共同部署在生产环境中。面向异构语言,ShardingSphere还提供Encrypt-Proxy客户端。Encrypt-Proxy是一款实现MySQL、PostgreSQL的二进制协议的服务器端产品,用户可独立部署Encrypt-Proxy服务,并且像使用普通MySQL、PostgreSQL数据库一样,使用例如Navicat第三方数据库管理工具、JAVA连接池、命令行的方式访问这台具有脱敏功能的`虚拟数据库服务器`。 - -脱敏功能属于Apache 
ShardingSphere分布式治理的功能范畴。事实上,Apache ShardingSphere这个生态还拥有其他更强大的能力,例如数据分片、读写分离、分布式事务、监控治理等。您甚至可以选择任意多种功能模块进行叠加使用,例如同时使用数据脱敏+数据分片,或是数据分片+读写分离,再或者是监控治理+数据分片等。除了在功能层面的叠加选择,ShardingSphere还提供了各种接入端形式,例如ShardingSphere-JDBC或ShardingSphere-Proxy等以满足大家不同场景需求。
+TODO
\ No newline at end of file
diff --git a/docs/document/content/features/encrypt/concept.en.md b/docs/document/content/features/encrypt/concept.en.md
index d2a438ec24..18d1ffc62d 100644
--- a/docs/document/content/features/encrypt/concept.en.md
+++ b/docs/document/content/features/encrypt/concept.en.md
@@ -4,207 +4,4 @@ title = "Core Concept"
 weight = 1
 +++

-## Detailed Process
-
-### Overall Architecture
-
-Encrypt-JDBC, provided by ShardingSphere, is deployed together with the business code. The business side does its JDBC programming against Encrypt-JDBC. Since Encrypt-JDBC implements all JDBC standard interfaces, business code works with it without additional modification. From then on, Encrypt-JDBC is responsible for all interactions between the business code and the database. The business side only needs to provide encryption rules. **As a bridge between the business code and the underlying database, Encrypt-JDBC can intercept user behavior and interact with the database after transforming the user behavior.
**
-
-![1](https://shardingsphere.apache.org/document/current/img/encrypt/1_en.png)
-
-Encrypt-JDBC intercepts the SQL initiated by the user and analyzes and understands the SQL behavior through the SQL syntax parser. According to the encryption rules passed in by the user, it finds the fields that need to be encrypted or decrypted and the encryptor used for them, encrypts or decrypts the target fields, and then interacts with the underlying database. ShardingSphere encrypts the plaintext submitted by the user and stores it in the underlying database; when the user queries, the ciphertext is taken out of the database, decrypted and returned to the end user. ShardingSphere shields users from the encryption handling, so that they do not need to be aware of SQL parsing, data encryption, and data decryption, and can use encrypted data just like ordinary data.
-
-### Encryption Rule
-
-Before explaining the whole process in detail, we need to understand the encryption rules and configuration, which is the basis of understanding the whole process. The encryption configuration is mainly divided into four parts: data source configuration, encryptor configuration, encryption table configuration, and query attribute configuration. The details are shown in the following figure:
-
-![2](https://shardingsphere.apache.org/document/current/img/encrypt/2_en.png)
-
-**Datasource Configuration**: The configuration of the DataSource.
-
-**Encryptor Configuration**: Which encryption strategy to use for encryption and decryption. Currently ShardingSphere has two built-in encryption/decryption strategies: AES and MD5. Users can also implement their own encryption/decryption algorithms by implementing the interface provided by ShardingSphere.
-
-**Encryption Table Configuration**: Tells ShardingSphere which column in the data table stores ciphertext data (cipherColumn), which column stores plaintext data (plainColumn), and which column users want to write SQL against (logicColumn).
-
-> How to understand `which column users want to write SQL against (logicColumn)`?
->
-> We can understand it from the purpose of Encrypt-JDBC. The ultimate goal of Encrypt-JDBC is to shield users from the underlying encryption handling; that is, we do not want users to know how the data is encrypted and decrypted, or how plaintext data is stored in plainColumn and ciphertext data in cipherColumn. In other words, we do not even want users to know that plainColumn and cipherColumn exist and are used. Therefore, we need to provide users with a conceptual column. This column can be decoupled from the real columns of the underlying database: it may or may not be a real column in the database table, so that users can freely change the column names of plainColumn and cipherColumn in the underlying database, or delete plainColumn and choose to never store plaintext and only store ciphertext. All that is required is that the user's SQL is written against this logical column and that the encryption rule gives the correct mapping between logicColumn and plainColumn, cipherColumn.
->
-> Why do this? The answer comes later in the article: to enable services that are already online to carry out data encryption migration seamlessly, transparently, and safely.
-
-**Query Attribute Configuration**: When both plaintext data and ciphertext data are stored in the underlying database table, this switch decides whether to query the plaintext data in the table and return it directly, or to query the ciphertext data and return it after decryption by Encrypt-JDBC.
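The parts of the rule described above can be sketched as one YAML fragment. This is an illustrative sketch, not the project's reference configuration: the key names are the ones used by the YAML examples elsewhere in this document, the column names `pwd_plain` and `pwd_cipher` and the key value are placeholders, the data source part is left out because its keys are not shown here, and the nesting (in particular the top-level position of `props`) is an assumption.

```yaml
# Illustrative sketch only; nesting and placeholder values are assumptions.
encryptRule:
  encryptors:
    aes_encryptor:                  # encryptor configuration
      type: aes
      props:
        aes.key.value: 123456abc    # placeholder key
  tables:
    t_user:                         # encryption table configuration
      columns:
        pwd:                        # logicColumn: the name used in user SQL
          plainColumn: pwd_plain    # stores plaintext (optional)
          cipherColumn: pwd_cipher  # stores ciphertext
          encryptor: aes_encryptor
props:
  query.with.cipher.column: true    # query attribute: read cipherColumn and decrypt
```

With a rule of this shape, SQL is written against `pwd` only, so renaming or dropping `pwd_plain` later would change the rule rather than the SQL.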
-
-### Encryption Process
-
-For example, suppose there is a table called t_user in the database. The table actually has two columns: pwd_plain, which stores plaintext data, and pwd_cipher, which stores ciphertext data, and the logicColumn is defined as pwd. When writing SQL, users should then write against the logicColumn, that is, INSERT INTO t_user SET pwd = '123'. ShardingSphere receives the SQL and, through the encryption configuration provided by the user, finds that pwd is the logicColumn, so it encrypts the logical column and its corresponding plaintext data. It can be seen that **ShardingSphere converts both the column names and the data between the logical column facing the user and the plaintext and ciphertext columns facing the underlying database.** As shown below:
-
-![3](https://shardingsphere.apache.org/document/current/img/encrypt/3_en.png)
-
-**This is the core meaning of Encrypt-JDBC: according to the encryption rules provided by the user, it separates user SQL from the underlying table structure, so that the SQL written by the user no longer depends on the actual database table structure. The connection, mapping, and conversion between the user and the underlying database are handled by ShardingSphere.** Why do we do this? The answer is the same as before: to enable services that are already online to carry out data encryption migration seamlessly, transparently and safely.
-
-To help the reader understand the core processing flow of Encrypt-JDBC more clearly, the following figure shows the processing flow and conversion logic when Encrypt-JDBC is used to insert, delete, update and query data.
-
-![4](https://shardingsphere.apache.org/document/current/img/encrypt/4_en.png)
-
-## Detailed Solution
-
-After understanding the ShardingSphere encryption process, you can combine the encryption configuration and encryption process with the actual scenario.
All design and development is done to solve the pain points encountered in business scenarios. So for the business scenario requirements mentioned earlier, how should ShardingSphere be used to meet them?
-
-### New Business
-
-Business scenario analysis: The newly launched business is relatively simple because everything starts from scratch and there is no historical data cleaning problem.
-
-Solution description: After selecting an appropriate encryptor, such as AES, you only need to configure the logical column (which users write SQL against) and the ciphertext column (where the data table stores the ciphertext data). **The logical column and the ciphertext column can be the same or different.** The recommended configuration is as follows (shown in Yaml format):
-
-```yaml
-encryptRule:
-  encryptors:
-    aes_encryptor:
-      type: aes
-      props:
-        aes.key.value: 123456abc
-  tables:
-    t_user:
-      columns:
-        pwd:
-          cipherColumn: pwd
-          encryptor: aes_encryptor
-```
-
-With this configuration, Encrypt-JDBC only needs to convert between logicColumn and cipherColumn. The underlying table stores only ciphertext and no plaintext, which is also what security audits require. If users want to store plaintext and ciphertext together in the database, they just need to add the plainColumn configuration. The overall processing flow is shown below:
-
-![5](https://shardingsphere.apache.org/document/current/img/encrypt/5_en.png)
-
-### Online Business Transformation
-
-Business scenario analysis: As the business is already running online, there must be a large amount of plaintext historical data stored in the database. The challenges are how to encrypt and clean the historical data, how to encrypt the incremental data, and how to let the business migrate seamlessly and transparently between the old and new data systems.
-
-Solution description: Before providing a solution, let's brainstorm. First, if an old business needs to be encrypted, it must be storing very important and sensitive information. This information is highly valuable and the business is fundamental and important. If something goes wrong, the whole team's KPI is gone. Therefore, it is impossible to suspend the business right away, prohibit the writing of new data, encrypt and clean all historical data with an encryptor, and then deploy the refactored code so that it can encrypt and decrypt the existing and incremental data online. Judging from past experience, such a simple and crude approach will definitely not work.
-
-Another relatively safe approach is to build a pre-release environment exactly like the production environment, and then use migration and data-cleaning tools to encrypt the **existing plaintext data** of the production environment and store it in the pre-release environment. The **incremental data** is encrypted by tools such as MySQL master-slave replication together with tools developed by the business side and stored in the database of the pre-release environment, and then the refactored code that can encrypt and decrypt is deployed to the pre-release environment. In this way, the production environment is an environment for **queries and modifications centered on plaintext**, and the pre-release environment is an environment for **encrypted and decrypted queries and modifications centered on ciphertext**. After the two have been compared for a period of time without errors, production traffic can be switched to the pre-release environment at night. This solution is relatively safe and reliable, but it costs more time, manpower and money, mainly for building the pre-release environment, modifying the production code, and developing the auxiliary tools. Unless there is no other way, business developers generally go from getting started to giving up.
-
-What business developers want most is to reduce the cost, avoid modifying the business code, and migrate the system safely and smoothly. This is why the encryption module of ShardingSphere was created. The migration can be divided into three steps:
-
-1. Before system migration
-
-    Assuming that the system needs to encrypt the pwd field of t_user, the business side uses Encrypt-JDBC in place of the standard JDBC interface, which requires basically no additional modification (we also provide SpringBoot, SpringNameSpace, Yaml and other access methods to meet different business requirements). In addition, it provides a set of encryption rules, as follows:
-
-    ```yaml
-    encryptRule:
-      encryptors:
-        aes_encryptor:
-          type: aes
-          props:
-            aes.key.value: 123456abc
-      tables:
-        t_user:
-          columns:
-            pwd:
-              plainColumn: pwd
-              cipherColumn: pwd_cipher
-              encryptor: aes_encryptor
-    props:
-      query.with.cipher.column: false
-    ```
-
-    According to the above encryption rules, we first need to add a column called pwd_cipher to the t_user table as the cipherColumn, which stores ciphertext data. At the same time, we set plainColumn to pwd, which stores plaintext data, and logicColumn is also pwd. Because the existing SQL was already written using pwd, that is, against the logical column, the business code does not need to be changed. Through Encrypt-JDBC, for newly added data, the plaintext is written to the pwd column and is also encrypted and stored in the pwd_cipher column. At this point, because query.with.cipher.column is set to false, business applications still use the plaintext column pwd for queries and storage, while the ciphertext of the new data is additionally stored in the pwd_cipher column of the underlying table.
The processing flow is shown below: - - - ![6](https://shardingsphere.apache.org/document/current/img/encrypt/6_en.png) - - When the newly added data is inserted, it is encrypted as ciphertext data through Encrypt-JDBC and stored in the cipherColumn. Now it is necessary to process historical plaintext inventory data. ** As Apache ShardingSphere currently does not provide the corresponding migration and washing tools, the business party needs to encrypt and store the plain text data in pwd to pwd_cipher. ** - -2. During system migration - - The incremental data has been stored by Encrypt-JDBC in the ciphertext column and the plaintext is stored in the plaintext column; after the historical data is encrypted and cleaned by the business party itself, the ciphertext is also stored in the ciphertext column. That is to say, the plaintext and the ciphertext are stored in the current database. Since the query.with.cipher.column = false in the configuration item, the ciphertext has never been used. Now we need to set the query.with.cipher.column in the encryption configuration to true in order for the system to cut the ciphertext data for query. After restarting the system, we found that the system business is normal, but Encrypt-JDBC has started to extract the ciphertext data from the database, decrypt it and return it to the user; and for the user's insert, delete and update requirements, the original data will still be stored The plaintext column, the encrypted ciphertext data is stored in the ciphertext column. - - Although the business system extracts the data in the ciphertext column and returns it after decryption; however, it will still save a copy of the original data to the plaintext column during storage. Why? The answer is: in order to be able to roll back the system. 
** Because as long as the ciphertext and plaintext always exist at the same time, we can freely switch the business query to cipherColumn or plainColumn through the configuration of the switch item. ** In other words, if the system is switched to the ciphertext column for query, the system reports an error and needs to be rolled back. Then just set query.with.cipher.column = false, Encrypt-JDBC will restore, that is, start using plainColumn to query again. The processing flow is shown in the following figure: - - ![7](https://shardingsphere.apache.org/document/current/img/encrypt/7_en.png) - - - -3. After system migration - - Due to the requirements of the security audit department, it is generally impossible for the business system to keep the plaintext and ciphertext columns of the database permanently synchronized. We need to delete the plaintext data after the system is stable. That is, we need to delete plainColumn (ie pwd) after system migration. The problem is that now the business code is written for pwd SQL, delete the pwd in the underlying data table stored in plain text, and use pwd_cipher to decrypt to get the original data, does that mean that the business side needs to rectify all SQL, thus Do not use the pwd column that is about to be deleted? Remember the core meaning of our Encrypt-JDBC? - - > This is also the core meaning of Encrypt-JDBC. According to the encryption rules provided by the user, the user SQL is separated from the underlying database table structure, so that the user's SQL writing no longer depends on the actual database table structure. The connection, mapping, and conversion between the user and the underlying database are handled by ShardingSphere. - - Yes, because of the existence of logicColumn, users write SQL for this virtual column. Encrypt-JDBC can map this logical column and the ciphertext column in the underlying data table. 
So the encryption configuration after migration is: - - ```yaml - encryptRule: - encryptors: - aes_encryptor: - type: aes - props: - aes.key.value: 123456abc - tables: - t_user: - columns: - pwd: # pwd与pwd_cipher的转换映射 - cipherColumn: pwd_cipher - encryptor: aes_encryptor - props: - query.with.cipher.column: true - ``` - -The processing flow is as follows: - -![8](https://shardingsphere.apache.org/document/current/img/encrypt/8_en.png) - -So far, the online service encryption and rectification solutions have all been demonstrated. We provide Java, Yaml, SpringBoot, SpringNameSpace multiple ways for users to choose to use, and strive to fulfil business requirements. The solution has been continuously launched on JD Digits, providing internal basic service support. - -## The advantages of Middleware encryption service - -1. Automated & transparent data encryption process, users do not need to pay attention to the implementation details of encryption. -2. Provide a variety of built-in, third-party (AKS) encryption strategies, users only need to modify the configuration to use. -3. Provides a encryption strategy API interface, users can implement the interface to use a custom encryption strategy for data encryption. -4. Support switching different encryption strategies. -5. For online services, it is possible to store plaintext data and ciphertext data synchronously, and decide whether to use plaintext or ciphertext columns for query through configuration. Without changing the business query SQL, the on-line system can safely and transparently migrate data before and after encryption. - -## Description of applicable scenarios - -1. User projects are developed in Java. -2. The back-end databases are MySQL, Oracle, PostgreSQL, and SQLServer. -3. The user needs to encrypt one or more columns in the database table (data encryption & decryption). -4. Compatible with all commonly used SQL. - -## Limitation - -1. 
Users need to deal with the original inventory data and wash numbers in the database. -2. Use encryption function + sub-library sub-table function, some special SQL is not supported, please refer to [SQL specification]( https://shardingsphere.apache.org/document/current/en/features/sharding/use-norms/sql/)。 -3. Encryption fields cannot support comparison operations, such as: greater than less than, ORDER BY, BETWEEN, LIKE, etc. -4. Encryption fields cannot support calculation operations, such as AVG, SUM, and calculation expressions. - -## Solution - -ShardingSphere has provided two data masking solutions, corresponding to two ShardingSphere encryption and decryption interfaces, i.e., `ShardingEncryptor` and `ShardingQueryAssistedEncryptor`. - -On the one hand, ShardingSphere has provided internal encryption and decryption implementations for users, which can be used by them only after configuration. On the other hand, to satisfy users' requirements for different scenarios, we have also opened relevant encryption and decryption interfaces, according to which, users can provide specific implementation types. Then, after simple configurations, ShardingSphere can use encryption and decryption solutions defined by users themselves to desensitize data. - -### ShardingEncryptor - -The solution has provided two methods `encrypt()` and `decrypt()` to encrypt/decrypt data for encryption. - -When users `INSERT`, `DELETE` and `UPDATE`, ShardingSphere will parse, rewrite and route SQL according to the configuration. It will also use `encrypt()` to encrypt data and store them in the database. When using `SELECT`, they will decrypt sensitive data from the database with `decrypt()` reversely and return them to users at last. - -Currently, ShardingSphere has provided two types of implementations for this kind of masking solution, MD5 (irreversible) and AES (reversible), which can be used after configuration. 
- -### ShardingQueryAssistedEncryptor - -Compared with the first masking scheme, this one is more secure and complex. Its concept is: even the same data, two same user passwords for example, should not be stored as the same desensitized form in the database. It can help to protect user information and avoid credential stuffing. - -This scheme provides three functions to implement, `encrypt()`, `decrypt()` and `queryAssistedEncrypt()`. In `encrypt()` phase, users can set some variable, timestamp for example, and encrypt a combination of original data + variable. This method can make sure the encrypted masking data of the same original data are different, due to the existence of variables. In `decrypt()` phase, users can use variable data to decrypt according to the encryption algorithms set formerly. - -Though this method can indeed increase data security, another problem can appear with it: as the same data is stored in the database in different content, users may not be able to find out all the same original data with equivalent query (`SELECT FROM table WHERE encryptedColumnn = ?`) according to this encryption column.Because of it, we have brought out assistant query column, which is generated by `queryAssistedEncrypt()`. Different from `decrypt()`, this method uses another way to encrypt the original data; but for the same original data, it can generate consistent encryption data. Users can store data processed by `queryAssistedEncrypt()` to assist the query of original data. So there may be one more assistant query column in the table. - -`queryAssistedEncrypt()` and `encrypt()` can generate and store different encryption data; `decrypt()` is reversible and `queryAssistedEncrypt()` is irreversible. So when querying the original data, we will parse, rewrite and route SQL automatically. We will also use assistant query column to do `WHERE` queries and use `decrypt()` to decrypt `encrypt()` data and return them to users. All these can not be felt by users. 
- -For now, ShardingSphere has abstracted the concept to be an interface for users to develop rather than providing accurate implementation for this kind of masking solution. ShardingSphere will use the accurate implementation of this solution provided by users to desensitize data. - -## Continuance - -This article describes how to use Encrypt-JDBC, one of the ShardingSphere products, SpringBoot, SpringNameSpace are also could be the access form , etc. This form of access mainly focus to Java homogeneous, and is deployed together with business code In a production environment. For heterogeneous languages, ShardingSphere also provides Encrypt-Proxy client. Encrypt-Proxy is a server-side product that implements the binary protocol of MySQL and PostgreSQL. Users can independently deploy the Encrypt-Proxy service, User can access this `virtual database server` with encryption through third-party database management tools(e.g. Navicat), JAVA connection pool or the command line, just like access ordinary MySQL and PostgreSQL databases. - -The encryption function belongs to distributed governance of Apache ShardingSphere. In fact, the Apache ShardingSphere ecosystem also has other more powerful capabilities, such as data sharding, read-write separation, distributed transactions, and monitoring governance. You can even choose any combination of these functions, such as encryption + data sharding, or data sharding + read-write separation, or monitoring governance + data sharding. In addition to the combination of these functions, ShardingSphere also provides various access forms, such as ShardingSphere-JDBC and ShardingSphere-Proxy for different situations. +TODO \ No newline at end of file diff --git a/docs/document/content/features/encrypt/principle.cn.md b/docs/document/content/features/encrypt/principle.cn.md new file mode 100644 index 0000000000..ab0c6ac0f8 --- /dev/null +++ b/docs/document/content/features/encrypt/principle.cn.md @@ -0,0 +1,233 @@ ++++ +pre = "3.6.2. 
" +title = "实现原理" +weight = 2 ++++ + +## 处理流程详解 + +Apache ShardingSphere 通过对用户输入的 SQL 进行解析,并依据用户提供的加密规则对 SQL 进行改写,从而实现对原文数据进行加密,并将原文数据(可选)及密文数据同时存储到底层数据库。 +在用户查询数据时,它仅从数据库中取出密文数据,并对其解密,最终将解密后的原始数据返回给用户。 +Apache ShardingSphere 自动化 & 透明化了数据加密过程,让用户无需关注数据加密的实现细节,像使用普通数据那样使用加密数据。 +此外,无论是已在线业务进行加密改造,还是新上线业务使用加密功能,Apache ShardingSphere 都可以提供一套相对完善的解决方案。 + +### 整体架构 + +![1](https://shardingsphere.apache.org/document/current/img/encrypt/1.png) + +加密模块将用户发起的 SQL 进行拦截,并通过 SQL 语法解析器进行解析、理解 SQL 行为,再依据用户传入的加密规则,找出需要加密的字段和所使用的加解密器对目标字段进行加解密处理后,再与底层数据库进行交互。 +Apache ShardingSphere 会将用户请求的明文进行加密后存储到底层数据库;并在用户查询时,将密文从数据库中取出进行解密后返回给终端用户。 +通过屏蔽对数据的加密处理,使用户无需感知解析 SQL、数据加密、数据解密的处理过程,就像在使用普通数据一样使用加密数据。 + +### 加密规则 + +在详解整套流程之前,我们需要先了解下加密规则与配置,这是认识整套流程的基础。加密配置主要分为四部分:数据源配置,加密器配置,加密表配置以及查询属性配置,其详情如下图所示: + +![2](https://shardingsphere.apache.org/document/current/img/encrypt/2.png) + +**数据源配置**:指数据源配置。 + +**加密器配置**:指使用什么加密策略进行加解密。目前ShardingSphere内置了两种加解密策略:AES/MD5。用户还可以通过实现ShardingSphere提供的接口,自行实现一套加解密算法。 + +**加密表配置**:用于告诉ShardingSphere数据表里哪个列用于存储密文数据(cipherColumn)、哪个列用于存储明文数据(plainColumn)以及用户想使用哪个列进行SQL编写(logicColumn)。 + +> 如何理解`用户想使用哪个列进行SQL编写(logicColumn)`? 
+> +> 我们可以从加密模块存在的意义来理解。加密模块最终目的是希望屏蔽底层对数据的加密处理,也就是说我们不希望用户知道数据是如何被加解密的、如何将明文数据存储到 plainColumn,将密文数据存储到 cipherColumn。 +换句话说,我们不希望用户知道 plainColumn 和 cipherColumn 的存在和使用。 +所以,我们需要给用户提供一个概念意义上的列,这个列可以脱离底层数据库的真实列,它可以是数据库表里的一个真实列,也可以不是,从而使得用户可以随意改变底层数据库的 plainColumn 和 cipherColumn 的列名。 +或者删除 plainColumn,选择永远不再存储明文,只存储密文。 +只要用户的 SQL 面向这个逻辑列进行编写,并在加密规则里给出 logicColumn 和 plainColumn、cipherColumn 之间正确的映射关系即可。 +> +> 为什么要这么做呢?答案在文章后面,即为了让已上线的业务能无缝、透明、安全地进行数据加密迁移。 + +**查询属性的配置**:当底层数据库表里同时存储了明文数据、密文数据后,该属性开关用于决定是直接查询数据库表里的明文数据进行返回,还是查询密文数据通过 Apache ShardingSphere 解密后返回。 + +### 加密处理过程 + +举例说明,假如数据库里有一张表叫做 `t_user`,这张表里实际有两个字段 `pwd_plain`,用于存放明文数据、`pwd_cipher`,用于存放密文数据,同时定义 logicColumn 为 `pwd`。 +那么,用户在编写 SQL 时应该面向 logicColumn 进行编写,即 `INSERT INTO t_user SET pwd = '123'`。 +Apache ShardingSphere 接收到该 SQL,通过用户提供的加密配置,发现 `pwd` 是 logicColumn,于是便对逻辑列及其对应的明文数据进行加密处理。 +**Apache ShardingSphere 将面向用户的逻辑列与面向底层数据库的明文列和密文列进行了列名以及数据的加密映射转换。** +如下图所示: + +![3](https://shardingsphere.apache.org/document/current/img/encrypt/3.png) + +即依据用户提供的加密规则,将用户 SQL 与底层数据表结构割裂开来,使得用户的 SQL 编写不再依赖于真实的数据库表结构。 +而用户与底层数据库之间的衔接、映射、转换交由 Apache ShardingSphere 进行处理。 + +下方图片展示了使用加密模块进行增删改查时,其中的处理流程和转换逻辑,如下图所示。 + +![4](https://shardingsphere.apache.org/document/current/img/encrypt/4.png) + +## 解决方案详解 + +在了解了 Apache ShardingSphere 加密处理流程后,即可将加密配置、加密处理流程与实际场景进行结合。 +所有的设计开发都是为了解决业务场景遇到的痛点。那么面对之前提到的业务场景需求,又应该如何使用 Apache ShardingSphere 这把利器来满足业务需求呢? 
+ +### 新上线业务 + +业务场景分析:新上线业务由于一切从零开始,不存在历史数据清洗问题,所以相对简单。 + +解决方案说明:选择合适的加密器,如 AES 后,只需配置逻辑列(面向用户编写 SQL )和密文列(数据表存密文数据)即可,**逻辑列和密文列可以相同也可以不同**。建议配置如下(YAML 格式展示): + +```yaml +-!ENCRYPT + encryptors: + aes_encryptor: + type: aes + props: + aes.key.value: 123456abc + tables: + t_user: + columns: + pwd: + cipherColumn: pwd + encryptor: aes_encryptor +``` + +使用这套配置, Apache ShardingSphere 只需将 logicColumn 和 cipherColumn 进行转换,底层数据表不存储明文,只存储了密文,这也是安全审计部分的要求所在。 +如果用户希望将明文、密文一同存储到数据库,只需添加 plainColumn 配置即可。整体处理流程如下图所示: + +![5](https://shardingsphere.apache.org/document/current/img/encrypt/5.png) + +### 已上线业务改造 + +业务场景分析:由于业务已经在线上运行,数据库里必然存有大量明文历史数据。现在的问题是如何让历史数据得以加密清洗、如何让增量数据得以加密处理、如何让业务在新旧两套数据系统之间进行无缝、透明化迁移。 + +解决方案说明:在提供解决方案之前,我们先来头脑风暴一下:首先,既然是旧业务需要进行加密改造,那一定存储了非常重要且敏感的信息。这些信息含金量高且业务相对基础重要。 +不应该采用停止业务禁止新数据写入,再找个加密器把历史数据全部加密清洗,再把之前重构的代码部署上线,使其能把存量和增量数据进行在线加密解密。 + +那么另一种相对安全的做法是:重新搭建一套和生产环境一模一样的预发环境,然后通过相关迁移洗数工具把生产环境的**存量原文数据**加密后存储到预发环境, +而**新增数据**则通过例如 MySQL 主从复制及业务方自行开发的工具加密后存储到预发环境的数据库里,再把重构后可以进行加解密的代码部署到预发环境。 +这样生产环境是一套**以明文为核心的查询修改**的环境;预发环境是一套**以密文为核心加解密查询修改**的环境。 +在对比一段时间无误后,可以夜间操作将生产流量切到预发环境中。 +此方案相对安全可靠,只是时间、人力、资金、成本较高,主要包括:预发环境搭建、生产代码整改、相关辅助工具开发等。 + +业务开发人员最希望的做法是:减少资金费用的承担、最好不要修改业务代码、能够安全平滑迁移系统。于是,ShardingSphere的加密功能模块便应用而生。可分为 3 步进行: + +1. 
系统迁移前 + +假设系统需要对 `t_user` 的 `pwd`。字段进行加密处理,业务方使用 Apache ShardingSphere 来代替标准化的 JDBC 接口,此举基本不需要额外改造(我们还提供了 Spring Boot Starter,Spring 命名空间,YAML 等接入方式,满足不同业务方需求)。 +另外,提供一套加密配置规则,如下所示: + +```yaml +-!ENCRYPT + encryptors: + aes_encryptor: + type: aes + props: + aes.key.value: 123456abc + tables: + t_user: + columns: + pwd: + plainColumn: pwd + cipherColumn: pwd_cipher + encryptor: aes_encryptor +props: + query.with.cipher.column: false +``` + +依据上述加密规则可知,首先需要在数据库表 `t_user` 里新增一个字段叫做 `pwd_cipher`,即 cipherColumn,用于存放密文数据,同时我们把 plainColumn 设置为 `pwd`,用于存放明文数据,而把 logicColumn 也设置为 `pwd`。 +由于之前的代码 SQL 就是使用 `pwd` 进行编写,即面向逻辑列进行 SQL 编写,所以业务代码无需改动。 +通过 Apache ShardingSphere,针对新增的数据,会把明文写到pwd列,并同时把明文进行加密存储到 `pwd_cipher` 列。 +此时,由于 `query.with.cipher.column` 设置为 false,对业务应用来说,依旧使用 `pwd` 这一明文列进行查询存储,却在底层数据库表 `pwd_cipher` 上额外存储了新增数据的密文数据,其处理流程如下图所示: + +![6](https://shardingsphere.apache.org/document/current/img/encrypt/6.png) + +新增数据在插入时,就通过 Apache ShardingSphere 加密为密文数据,并被存储到了 cipherColumn。而现在就需要处理历史明文存量数据。 +**由于Apache ShardingSphere 目前并未提供相关迁移洗数工具,此时需要业务方自行将 `pwd` 中的明文数据进行加密处理存储到 `pwd_cipher`。** + +2. 系统迁移中 + +新增的数据已被 Apache ShardingSphere 将密文存储到密文列,明文存储到明文列;历史数据被业务方自行加密清洗后,将密文也存储到密文列。 +也就是说现在的数据库里即存放着明文也存放着密文,只是由于配置项中的 query.with.cipher.column=false,所以密文一直没有被使用过。 +现在我们为了让系统能切到密文数据进行查询,需要将加密配置中的query.with.cipher.column设置为true。 +在重启系统后,我们发现系统业务一切正常,但是 Apache ShardingSphere 已经开始从数据库里取出密文列的数据,解密后返回给用户; +而对于用户的增删改需求,则依旧会把原文数据存储到明文列,加密后密文数据存储到密文列。 + +虽然现在业务系统通过将密文列的数据取出,解密后返回;但是,在存储的时候仍旧会存一份原文数据到明文列,这是为什么呢? +答案是:为了能够进行系统回滚。 +**因为只要密文和明文永远同时存在,我们就可以通过开关项配置自由将业务查询切换到 cipherColumn 或 plainColumn。** +也就是说,如果将系统切到密文列进行查询时,发现系统报错,需要回滚。那么只需将 query.with.cipher.column=false,Apache ShardingSphere 将会还原,即又重新开始使用 plainColumn 进行查询。 +处理流程如下图所示: + +![7](https://shardingsphere.apache.org/document/current/img/encrypt/7.png) + +3. 
系统迁移后 + +由于安全审计部门要求,业务系统一般不可能让数据库的明文列和密文列永久同步保留,我们需要在系统稳定后将明文列数据删除。 +即我们需要在系统迁移后将 plainColumn,即pwd进行删除。那问题来了,现在业务代码都是面向pwd进行编写 SQL 的,把底层数据表中的存放明文的 pwd 删除了, +换用 pwd_cipher 进行解密得到原文数据,那岂不是意味着业务方需要整改所有 SQL,从而不使用即将要被删除的 pwd 列?还记得我们 Apache ShardingSphere 的核心意义所在吗? + +> 这也正是 Apache ShardingSphere 核心意义所在,即依据用户提供的加密规则,将用户 SQL 与底层数据库表结构割裂开来,使得用户的SQL编写不再依赖于真实的数据库表结构。 +而用户与底层数据库之间的衔接、映射、转换交由 Apache ShardingSphere 进行处理。 + +是的,因为有 logicColumn 存在,用户的编写 SQL 都面向这个虚拟列,Apache ShardingSphere 就可以把这个逻辑列和底层数据表中的密文列进行映射转换。于是迁移后的加密配置即为: + +```yaml +-!ENCRYPT + encryptors: + aes_encryptor: + type: aes + props: + aes.key.value: 123456abc + tables: + t_user: + columns: + pwd: # pwd与pwd_cipher的转换映射 + cipherColumn: pwd_cipher + encryptor: aes_encryptor +props: + query.with.cipher.column: true +``` + +其处理流程如下: + +![8](https://shardingsphere.apache.org/document/current/img/encrypt/8.png) + +至此,已在线业务加密整改解决方案全部叙述完毕。我们提供了 Java、YAML、Spring Boot Starter、Spring 命名空间多种方式供用户选择接入,力求满足业务不同的接入需求。 +该解决方案目前已在京东数科不断落地上线,提供对内基础服务支撑。 + +## 中间件加密服务优势 + +1. 自动化&透明化数据加密过程,用户无需关注加密中间实现细节。 +2. 提供多种内置、第三方(AKS)的加密策略,用户仅需简单配置即可使用。 +3. 提供加密策略API接口,用户可实现接口,从而使用自定义加密策略进行数据加密。 +4. 支持切换不同的加密策略。 +5. 
针对已上线业务,可实现明文数据与密文数据同步存储,并通过配置决定使用明文列还是密文列进行查询。可实现在不改变业务查询SQL前提下,已上线系统对加密前后数据进行安全、透明化迁移。 + +## 加密策略解析 + +Apache ShardingSphere 提供了两种加密策略用于数据加密,该两种策略分别对应 Apache ShardingSphere 的两种加解密的接口,即 `ShardingEncryptor` 和 `ShardingQueryAssistedEncryptor`。 + +一方面,Apache ShardingSphere 为用户提供了内置的加解密实现类,用户只需进行配置即可使用; +另一方面,为了满足用户不同场景的需求,我们还开放了相关加解密接口,用户可依据该两种类型的接口提供具体实现类。 +再进行简单配置,即可让 Apache ShardingSphere 调用用户自定义的加解密方案进行数据加密。 + +### ShardingEncryptor + +该解决方案通过提供`encrypt()`, `decrypt()`两种方法对需要加密的数据进行加解密。 +在用户进行`INSERT`, `DELETE`, `UPDATE`时,ShardingSphere会按照用户配置,对SQL进行解析、改写、路由,并会调用`encrypt()`将数据加密后存储到数据库, +而在`SELECT`时,则调用`decrypt()`方法将从数据库中取出的加密数据进行逆向解密,最终将原始数据返回给用户。 + +当前,Apache ShardingSphere 针对这种类型的加密解决方案提供了两种具体实现类,分别是 MD5(不可逆),AES(可逆),用户只需配置即可使用这两种内置的方案。 + +### ShardingQueryAssistedEncryptor + +相比较于第一种加密方案,该方案更为安全和复杂。它的理念是:即使是相同的数据,如两个用户的密码相同,它们在数据库里存储的加密数据也应当是不一样的。这种理念更有利于保护用户信息,防止撞库成功。 + +它提供三种函数进行实现,分别是`encrypt()`, `decrypt()`, `queryAssistedEncrypt()`。在`encrypt()`阶段,用户通过设置某个变动种子,例如时间戳。 +针对原始数据+变动种子组合的内容进行加密,就能保证即使原始数据相同,也因为有变动种子的存在,致使加密后的加密数据是不一样的。在`decrypt()`可依据之前规定的加密算法,利用种子数据进行解密。 + +虽然这种方式确实可以增加数据的保密性,但是另一个问题却随之出现:相同的数据在数据库里存储的内容是不一样的,那么当用户按照这个加密列进行等值查询(`SELECT FROM table WHERE encryptedColumnn = ?`)时会发现无法将所有相同的原始数据查询出来。 +为此,我们提出了辅助查询列的概念。 +该辅助查询列通过`queryAssistedEncrypt()`生成,与`decrypt()`不同的是,该方法通过对原始数据进行另一种方式的加密,但是针对原始数据相同的数据,这种加密方式产生的加密数据是一致的。 +将`queryAssistedEncrypt()`后的数据存储到数据中用于辅助查询真实数据。因此,数据库表中多出这一个辅助查询列。 + +由于`queryAssistedEncrypt()`和`encrypt()`产生不同加密数据进行存储,而`decrypt()`可逆,`queryAssistedEncrypt()`不可逆。 +在查询原始数据的时候,我们会自动对SQL进行解析、改写、路由,利用辅助查询列进行`WHERE`条件的查询,却利用 `decrypt()`对`encrypt()`加密后的数据进行解密,并将原始数据返回给用户。 +这一切都是对用户透明化的。 + +当前,Apache ShardingSphere 针对这种类型的加密解决方案并没有提供具体实现类,却将该理念抽象成接口,提供给用户自行实现。ShardingSphere将调用用户提供的该方案的具体实现类进行数据加密。 diff --git a/docs/document/content/features/encrypt/principle.en.md b/docs/document/content/features/encrypt/principle.en.md new file mode 100644 index 0000000000..cfe8657068 --- /dev/null +++ 
b/docs/document/content/features/encrypt/principle.en.md @@ -0,0 +1,294 @@ ++++ +pre = "3.6.2. " +title = "Principle" +weight = 2 ++++ + +## Process Details + +Apache ShardingSphere encrypts plaintext by parsing and rewriting SQL according to the encryption rule, +and stores the plaintext (optional) and the ciphertext in the database at the same time. +When a user queries data, it extracts only the ciphertext from the database, decrypts it, and returns the plaintext to the user. +Apache ShardingSphere makes data encryption automatic and transparent, so that users do not need to know its implementation details and can use encrypted data just like regular data. +In addition, Apache ShardingSphere provides a relatively complete solution both for online business systems that need to be retrofitted with encryption and for newly launched business systems that use the encryption function. + +### Overall Architecture + +![1](https://shardingsphere.apache.org/document/current/img/encrypt/1_en.png) + +The encrypt module intercepts the SQL initiated by the user, then parses it with the SQL syntax parser to understand its behavior. +According to the encryption rules passed by the user, it finds the fields that need to be encrypted/decrypted and the encryptor/decryptor to apply to them, +and then interacts with the underlying database. +ShardingSphere encrypts the plaintext in the user's request and stores it in the underlying database; +when the user queries, the ciphertext is taken out of the database, decrypted, and returned to the end user. +ShardingSphere shields the encryption of data, so that users do not need to be aware of SQL parsing, data encryption, or data decryption, +and can use encrypted data just like ordinary data. + +### Encryption Rule + +Before explaining the whole process in detail, we need to understand the encryption rules and configuration, which are the basis for understanding the whole process.
+The encryption configuration is mainly divided into four parts: data source configuration, encryptor configuration, encryption table configuration, and query attribute configuration. + The details are shown in the following figure: + +![2](https://shardingsphere.apache.org/document/current/img/encrypt/2_en.png) + +**Datasource Configuration**: The configuration of the data source. + +**Encryptor Configuration**: Which encryption strategy to use for encryption and decryption. +Currently ShardingSphere has two built-in encryption/decryption strategies: AES and MD5. +Users can also implement their own encryption/decryption algorithms by implementing the interface provided by Apache ShardingSphere. + +**Encryption Table Configuration**: Tells ShardingSphere which column in the data table stores the ciphertext (cipherColumn), +which column stores the plaintext (plainColumn), and which column users want to write SQL against (logicColumn). + +> How to understand `Which column do users want to use to write SQL (logicColumn)`? +> +> We can understand it from the purpose of the encrypt module. +The ultimate goal of the encrypt module is to shield the encryption of the underlying data, that is, we do not want users to know how the data is encrypted/decrypted, +or how plaintext is stored in plainColumn and ciphertext in cipherColumn. +In other words, we do not even want users to know that plainColumn and cipherColumn exist or how they are used. +Therefore, we need to provide users with a conceptual column. This column is decoupled from the real columns of the underlying database. +It may or may not be a real column in the database table, so the user can freely change the column names of plainColumn and cipherColumn. +Users can also delete plainColumn and choose to store only ciphertext, never plaintext.
+All of this works as long as the user's SQL is written against this logical column and the encryption rule gives the correct mapping between logicColumn and plainColumn / cipherColumn. +> +> Why do this? The answer comes later in the article: to enable online services to migrate to encrypted data seamlessly, transparently, and safely. + +**Query Attribute configuration**: When both plaintext and ciphertext are stored in the underlying database table, +this switch decides whether to query the plaintext in the database table and return it directly, +or to query the ciphertext and return it after Apache ShardingSphere decrypts it. + +### Encryption Process + +For example, suppose the database has a table called t_user with two actual fields: pwd_plain, used to store plaintext, and pwd_cipher, used to store ciphertext; logicColumn is defined as pwd. +Then, when writing SQL, users should write against logicColumn, that is, `INSERT INTO t_user SET pwd = '123'`. +Apache ShardingSphere receives the SQL and, through the encryption configuration provided by the user, finds that pwd is a logicColumn, so it encrypts the plaintext data that corresponds to the logical column. +**Apache ShardingSphere maps and converts both column names and data between the user-facing logical column and the plaintext and ciphertext columns of the underlying database.** +As shown below: + +![3](https://shardingsphere.apache.org/document/current/img/encrypt/3_en.png) + +This is also the core purpose of Apache ShardingSphere: to decouple user SQL from the underlying data table structure according to the encryption rules provided by the user, +so that the SQL written by the user no longer depends on the actual database table structure.
+The connection, mapping, and conversion between the user and the underlying database are handled by Apache ShardingSphere. +Why should we do this? +The answer is still the same: to enable the online business to migrate to encrypted data seamlessly, transparently, and safely. + +To help the reader understand the core processing flow of Apache ShardingSphere more clearly, +the following figure shows the processing flow and conversion logic when using Apache ShardingSphere to insert, delete, update, and query data. + +![4](https://shardingsphere.apache.org/document/current/img/encrypt/4_en.png) + +## Detailed Solution + +After understanding the Apache ShardingSphere encryption process, you can combine the encryption configuration and encryption process with the actual scenario. +All design and development exist to solve the problems encountered in business scenarios. So, for the business scenario requirements mentioned earlier, +how should ShardingSphere be used to meet them? + +### New Business + +Business scenario analysis: The newly launched business is relatively simple because everything starts from scratch and there is no historical data cleaning problem. + +Solution description: After selecting the appropriate encryptor, such as AES, +you only need to configure the logical column (which users write SQL against) and the ciphertext column (where the data table stores the ciphertext). +**The logical column and the ciphertext column can be the same or different.** The recommended configuration is as follows (shown in YAML format): + +```yaml +-!ENCRYPT + encryptors: + aes_encryptor: + type: aes + props: + aes.key.value: 123456abc + tables: + t_user: + columns: + pwd: + cipherColumn: pwd + encryptor: aes_encryptor +``` + +With this configuration, Apache ShardingSphere only needs to convert between logicColumn and cipherColumn. +The underlying data table does not store plaintext, only ciphertext. +This is also what the security audit department requires.
If users want to store plaintext and ciphertext together in the database, +they just need to add the plainColumn configuration. The overall processing flow is shown below: + +![5](https://shardingsphere.apache.org/document/current/img/encrypt/5_en.png) + +### Online Business Transformation + +Business scenario analysis: As the business is already running online, there must be a large amount of historical plaintext data stored in the database. +The current challenges are how to encrypt and clean the historical data, how to encrypt the incremental data, +and how to let the business migrate seamlessly and transparently between the old and new data systems. + +Solution description: Before providing a solution, let's brainstorm. +First, if an old business needs to be retrofitted with encryption, it must have stored very important and sensitive information. +This information is highly valuable and the business is relatively important. +If anything breaks, the whole team's KPIs are at stake. +Therefore, it is not feasible to suspend the business, prohibit writing of new data, encrypt and clean all historical data with an encryptor, +and then deploy the previously refactored code online so that it can encrypt and decrypt the existing and incremental data. +Historical experience shows that such a simple and crude approach will not work. + +Another, relatively safe approach is to build a pre-release environment exactly like the production environment, +and then encrypt the **existing plaintext data** of the production environment with the relevant migration and data-cleaning tools and store it in the pre-release environment. +The **incremental data** is encrypted by means such as MySQL master-slave replication together with tools developed by the business party itself, +and stored in the database of the pre-release environment; the refactored code, which can encrypt and decrypt, is then deployed to the pre-release environment.
+In this way, the production environment is an environment for **plaintext-centric queries and modifications**; +the pre-release environment is an environment for **ciphertext-centric queries and modifications with encryption/decryption**. +After the two have been compared for a period of time without errors, production traffic can be switched to the pre-release environment at night. +This solution is relatively safe and reliable, but it costs more time, manpower, and money. +The costs mainly include: pre-release environment construction, production code rectification, and development of the related auxiliary tools. +Unless there is no other option, business developers usually give up soon after getting started. + +What business developers really want is to reduce the cost burden, avoid modifying the business code, and migrate the system safely and smoothly. +This is why the encryption function module of ShardingSphere was created. The migration can be divided into three steps: + +1. Before system migration + +Assuming that the system needs to encrypt the pwd field of t_user, the business side uses Apache ShardingSphere in place of the standard JDBC interface, +which basically requires no additional modification (we also provide Spring Boot Starter, Spring Namespace, YAML and other access methods to meet the needs of different business teams). +In addition, provide a set of encryption configuration rules, as follows: + +```yaml +-!ENCRYPT + encryptors: + aes_encryptor: + type: aes + props: + aes.key.value: 123456abc + tables: + t_user: + columns: + pwd: + plainColumn: pwd + cipherColumn: pwd_cipher + encryptor: aes_encryptor +props: + query.with.cipher.column: false +``` + +According to the above encryption rules, we need to add a column called pwd_cipher to the t_user table, that is, the cipherColumn, which is used to store ciphertext data. +At the same time, we set plainColumn to pwd, which is used to store plaintext data, and logicColumn is also set to pwd.
+Because the existing SQL was already written using pwd, that is, against the logical column, the business code does not need to be changed. +Through Apache ShardingSphere, for incremental data, the plaintext is written to the pwd column, and the same plaintext is also encrypted and stored in the pwd_cipher column. +At this time, because query.with.cipher.column is set to false, business applications still use the plaintext column pwd for queries and storage, +while the ciphertext of the new data is additionally stored in the pwd_cipher column of the underlying database table. The processing flow is shown below: + +![6](https://shardingsphere.apache.org/document/current/img/encrypt/6_en.png) + +When newly added data is inserted, it is encrypted into ciphertext by Apache ShardingSphere and stored in the cipherColumn. +Now the existing historical plaintext data must be processed. +**As Apache ShardingSphere currently does not provide the corresponding migration and data-cleaning tools, the business party needs to encrypt the plaintext data in pwd and store it in pwd_cipher itself.** + +2. During system migration + +For incremental data, Apache ShardingSphere has stored the ciphertext in the ciphertext column and the plaintext in the plaintext column; after the historical data is encrypted and cleaned by the business party itself, +its ciphertext is also stored in the ciphertext column. That is to say, both the plaintext and the ciphertext are now stored in the database. +Since `query.with.cipher.column = false` in the configuration, the ciphertext has never been used. +Now we need to set `query.with.cipher.column` in the encryption configuration to true so that the system switches to the ciphertext data for queries.
+After the system is restarted, business runs normally, but Apache ShardingSphere now reads the ciphertext data from the database,
+decrypts it and returns it to the user; for the user's insert, delete and update requests,
+the original data is still stored in the plaintext column, and the encrypted data is stored in the ciphertext column.
+
+Although the business system now reads the ciphertext column and returns the decrypted data,
+it still saves a copy of the original data to the plaintext column on every write.
+Why? So that the system can be rolled back.
+**As long as the ciphertext and plaintext always exist at the same time, we can freely switch business queries between cipherColumn and plainColumn through the configuration switch.**
+In other words, suppose the system reports errors after queries are switched to the ciphertext column and needs to be rolled back.
+Then simply set query.with.cipher.column = false, and Apache ShardingSphere will go back to using plainColumn for queries.
+The processing flow is shown in the following figure:
+
+![7](https://shardingsphere.apache.org/document/current/img/encrypt/7_en.png)
+
+3. After system migration
+
+Due to the requirements of the security audit department,
+a business system generally cannot keep the plaintext and ciphertext columns of the database synchronized permanently.
+We need to delete the plaintext data once the system is stable. That is, we need to delete plainColumn (i.e. pwd) after system migration.
+The problem is that the business SQL is written against pwd.
+If we delete the plaintext pwd column from the underlying table and obtain the original data by decrypting pwd_cipher,
+does that mean the business side must rewrite all SQL so that it no longer uses the pwd column that is about to be deleted?
+Remember the core meaning of our encrypt module?
+
+> This is also the core meaning of the encrypt module. According to the encryption rules provided by the user, user SQL is decoupled from the underlying database table structure, so that the SQL users write no longer depends on the actual table structure. The connection, mapping, and conversion between the user and the underlying database are handled by ShardingSphere.
+
+Yes: because of logicColumn, users write SQL against this virtual column.
+Apache ShardingSphere maps this logical column to the ciphertext column in the underlying table.
+So the encryption configuration after migration is:
+
+```yaml
+-!ENCRYPT
+  encryptors:
+    aes_encryptor:
+      type: aes
+      props:
+        aes.key.value: 123456abc
+  tables:
+    t_user:
+      columns:
+        pwd: # mapping between pwd and pwd_cipher
+          cipherColumn: pwd_cipher
+          encryptor: aes_encryptor
+props:
+  query.with.cipher.column: true
+```
+
+The processing flow is as follows:
+
+![8](https://shardingsphere.apache.org/document/current/img/encrypt/8_en.png)
+
+This completes the walkthrough of the encryption migration solution for online services.
+Java, YAML, Spring Boot Starter and Spring Namespace configuration are all provided, so users can choose whichever fits their business requirements.
+The solution has been deployed in production at JD Digits, where it provides basic internal service support.
+
+## The advantages of Middleware encryption service
+
+1. The data encryption process is transparent; users do not need to care about the implementation details of encryption.
+2. A variety of built-in and third-party (AKS) encryption strategies are provided; users only need to modify the configuration to use them.
+3. An encryption strategy API is provided; users can implement the interface to encrypt data with a custom strategy.
+4. Switching between different encryption strategies is supported.
+5. 
For online services, plaintext and ciphertext data can be stored synchronously, and configuration decides whether plaintext or ciphertext columns are used for queries.
+Without changing the business query SQL, the online system can migrate data safely and transparently before and after encryption.
+
+## Solution
+
+Apache ShardingSphere provides two data encryption solutions, corresponding to two ShardingSphere encryption and decryption interfaces: `ShardingEncryptor` and `ShardingQueryAssistedEncryptor`.
+
+On the one hand, Apache ShardingSphere provides built-in encryption and decryption implementations, which users can use after simple configuration.
+On the other hand, to satisfy requirements in different scenarios, the encryption and decryption interfaces are also open, so users can provide their own implementation classes.
+Then, after simple configuration, Apache ShardingSphere can use the user-defined encryption and decryption solutions to encrypt data.
+
+### ShardingEncryptor
+
+This solution provides two methods, `encrypt()` and `decrypt()`, to encrypt and decrypt data.
+
+When users execute `INSERT`, `DELETE` and `UPDATE`, ShardingSphere parses, rewrites and routes the SQL according to the configuration, and uses `encrypt()` to encrypt data before storing it in the database. When users execute `SELECT`,
+ShardingSphere uses `decrypt()` to decrypt the sensitive data read from the database and returns the original data to users.
+
+Currently, Apache ShardingSphere provides two implementations of this kind of encrypt solution, MD5 (irreversible) and AES (reversible), which can be used after configuration.
+
+### ShardingQueryAssistedEncryptor
+
+Compared with the first encrypt scheme, this one is more secure and more complex.
+Its idea is that even identical data, two identical user passwords for example, should not be stored in the same encrypted form in the database.
+This helps protect user information and guards against credential stuffing.
+
+This scheme requires three methods to be implemented: `encrypt()`, `decrypt()` and `queryAssistedEncrypt()`.
+In the `encrypt()` phase, users choose some variable, a timestamp for example, and encrypt the combination of original data + variable.
+Because of the variable, the same original data is encrypted to different values each time.
+In the `decrypt()` phase, users use the variable to decrypt the data according to the encryption algorithm chosen earlier.
+
+Though this method does increase data security, it introduces another problem: as the same data is stored in the database in different forms,
+users can no longer find all rows with the same original data through an equality query (`SELECT * FROM table WHERE encryptedColumn = ?`) on the encrypted column.
+For this reason, we introduce the assistant query column, which is generated by `queryAssistedEncrypt()`.
+Unlike `encrypt()`, this method encrypts the original data in a different way,
+one that generates the same encrypted value for the same original data. Users can store the data produced by `queryAssistedEncrypt()` to assist queries on the original data.
+So there may be one more column, the assistant query column, in the table.
+
+`queryAssistedEncrypt()` and `encrypt()` generate and store different encrypted data; `decrypt()` is reversible and `queryAssistedEncrypt()` is irreversible.
+So when the original data is queried, ShardingSphere parses, rewrites and routes the SQL automatically.
+It uses the assistant query column for `WHERE` conditions, and uses `decrypt()` to decrypt the `encrypt()` data before returning it to users.
+All of this is transparent to users.
+
+For now, ShardingSphere only abstracts this concept as an interface for users to implement, rather than providing a concrete implementation of this kind of encrypt solution.
+ShardingSphere will use the concrete implementation provided by users to encrypt data.
diff --git a/docs/document/content/features/encrypt/use-norms.cn.md b/docs/document/content/features/encrypt/use-norms.cn.md
new file mode 100644
index 0000000000..78e5b17ac3
--- /dev/null
+++ b/docs/document/content/features/encrypt/use-norms.cn.md
@@ -0,0 +1,18 @@
++++
+pre = "3.6.3. "
+title = "使用规范"
+weight = 3
++++
+
+## 支持项
+
+* 后端数据库为 MySQL、Oracle、PostgreSQL、SQLServer;
+* 用户需要对数据库表中某个或多个列进行加密(数据加密 & 解密);
+* 兼容所有常用SQL。
+
+## 不支持项
+
+* 用户需要自行处理数据库中原始的存量数据、洗数;
+* 使用加密功能+分库分表功能,部分特殊SQL不支持,请参考[SQL使用规范]( https://shardingsphere.apache.org/document/current/cn/features/sharding/use-norms/sql/);
+* 加密字段无法支持比较操作,如:大于小于、ORDER BY、BETWEEN、LIKE等;
+* 加密字段无法支持计算操作,如:AVG、SUM以及计算表达式。
diff --git a/docs/document/content/features/encrypt/use-norms.en.md b/docs/document/content/features/encrypt/use-norms.en.md
new file mode 100644
index 0000000000..4f3331e638
--- /dev/null
+++ b/docs/document/content/features/encrypt/use-norms.en.md
@@ -0,0 +1,18 @@
++++
+pre = "3.6.3. "
+title = "Use Norms"
+weight = 3
++++
+
+## Supported Items
+
+* The back-end databases are MySQL, Oracle, PostgreSQL, and SQLServer;
+* The user needs to encrypt one or more columns in a database table (data encryption & decryption);
+* Compatible with all commonly used SQL.
+
+## Unsupported Items
+
+* Users need to encrypt and clean the existing historical data in the database themselves;
+* When encryption is used together with sharding, some special SQL is not supported; please refer to the [SQL specification](https://shardingsphere.apache.org/document/current/en/features/sharding/use-norms/sql/);
+* Encrypted fields do not support comparison operations, such as greater than, less than, ORDER BY, BETWEEN, LIKE, etc.;
+* Encrypted fields do not support calculation operations, such as AVG, SUM, and calculation expressions.
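The query-assisted scheme described in principle.en.md, together with the restriction above that encrypted fields only support equality lookups, can be sketched outside ShardingSphere as follows. This is a minimal illustration using only JDK classes. The class name `QueryAssistedSketch`, its constructor, and the choice of AES-GCM with a random IV (standing in for the "variable") plus HMAC-SHA256 (standing in for the assistant query value) are assumptions made for this example; the class does not implement the actual `ShardingQueryAssistedEncryptor` interface.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical sketch: NOT ShardingSphere's ShardingQueryAssistedEncryptor interface.
final class QueryAssistedSketch {

    private static final int IV_BYTES = 12;   // random "variable" mixed into every encryption
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    private final SecretKeySpec aesKey;
    private final SecretKeySpec macKey;
    private final SecureRandom random = new SecureRandom();

    QueryAssistedSketch(byte[] aesKeyBytes, byte[] macKeyBytes) {
        this.aesKey = new SecretKeySpec(aesKeyBytes, "AES"); // must be 16, 24 or 32 bytes
        this.macKey = new SecretKeySpec(macKeyBytes, "HmacSHA256");
    }

    // Reversible and randomized: the same plaintext yields a different ciphertext on each call.
    String encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            // Store the IV in front of the ciphertext so decrypt() can recover it.
            byte[] out = ByteBuffer.allocate(iv.length + body.length).put(iv).put(body).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Reads the IV back from the stored value and reverses encrypt().
    String decrypt(String ciphertext) {
        try {
            byte[] in = Base64.getDecoder().decode(ciphertext);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Irreversible and deterministic: the same plaintext always yields the same value,
    // so it can back an equality condition in WHERE, but not ORDER BY, BETWEEN or LIKE.
    String queryAssistedEncrypt(String plaintext) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(macKey);
            return Base64.getEncoder().encodeToString(mac.doFinal(plaintext.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] aes = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // demo key only
        byte[] mac = "assist-key".getBytes(StandardCharsets.UTF_8);       // demo key only
        QueryAssistedSketch sketch = new QueryAssistedSketch(aes, mac);
        String first = sketch.encrypt("secret");
        String second = sketch.encrypt("secret");
        System.out.println("ciphertexts equal: " + first.equals(second));
        System.out.println("decrypted: " + sketch.decrypt(first));
        System.out.println("assist values equal: "
                + sketch.queryAssistedEncrypt("secret").equals(sketch.queryAssistedEncrypt("secret")));
    }
}
```

In this sketch the value of `encrypt()` would go to the cipher column and the value of `queryAssistedEncrypt()` to the assistant query column; an equality query is then rewritten to compare against the assistant value, while the returned rows are passed through `decrypt()`.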
diff --git a/docs/document/content/features/spi/_index.cn.md b/docs/document/content/features/spi/_index.cn.md index 46f6e1adee..680a6e8ee8 100644 --- a/docs/document/content/features/spi/_index.cn.md +++ b/docs/document/content/features/spi/_index.cn.md @@ -31,13 +31,13 @@ SQL解析的接口用于规定用于解析SQL的ANTLR语法文件。 主要接口是`DatabaseProtocolFrontendEngine`,其内置实现类有`MySQLProtocolFrontendEngine`和`PostgreSQLProtocolFrontendEngine`。 -### 数据脱敏 +### 数据加密 -数据脱敏的接口用于规定加解密器的加密、解密、类型获取、属性设置等方式。 +数据加密的接口用于规定加解密器的加密、解密、类型获取、属性设置等方式。 主要接口有两个:`ShardingEncryptor`和`ShardingQueryAssistedEncryptor`,其中`ShardingEncryptor`的内置实现类有`AESShardingEncryptor`和`MD5ShardingEncryptor`。 -有关加解密介绍,请参考[数据脱敏](/cn/features/orchestration/encrypt/)。 +有关加解密介绍,请参考[数据加密](/cn/features/orchestration/encrypt/)。 ### 分布式主键 diff --git a/docs/document/content/features/spi/_index.en.md b/docs/document/content/features/spi/_index.en.md index 8a8ee7c8cf..f2091e55df 100644 --- a/docs/document/content/features/spi/_index.en.md +++ b/docs/document/content/features/spi/_index.en.md @@ -27,13 +27,13 @@ The database protocol interface is used to regulate parse and adapter protocol o Its main interface is `DatabaseProtocolFrontendEngine` and built-in implementation types are `MySQLProtocolFrontendEngine` and `PostgreSQLProtocolFrontendEngine`. -### Data Masking +### data encryption -The Data masking interface is used to regulate the encryption, decryption, access type, property configuration and other methods of the encryptor. +The data encryption interface is used to regulate the encryption, decryption, access type, property configuration and other methods of the encryptor. There are mainly two interfaces, `ShardingEncryptor` and `ShardingQueryAssistedEncryptor` and built-in implementation types are `AESShardingEncryptor` and `MD5ShardingEncryptor`. -Please refer to [Data Masking](/en/features/orchestration/encrypt/) for the introduction. +Please refer to [data encryption](/en/features/orchestration/encrypt/) for the introduction. 
### Distributed Primary Key diff --git a/docs/document/content/features/test-engine/performance-test.cn.md b/docs/document/content/features/test-engine/performance-test.cn.md index b1a1b8c645..660098eac5 100644 --- a/docs/document/content/features/test-engine/performance-test.cn.md +++ b/docs/document/content/features/test-engine/performance-test.cn.md @@ -6,7 +6,7 @@ weight = 5 ## 目标 -对ShardingSphere-JDBC,ShardingSphere-Proxy及MySQL进行性能对比。从业务角度考虑,在基本应用场景(单路由,主从+脱敏+分库分表,全路由)下,INSERT+UPDATE+DELETE通常用作一个完整的关联操作,用于性能评估,而SELECT关注分片优化可用作性能评估的另一个操作;而主从模式下,可将INSERT+SELECT+DELETE作为一组评估性能的关联操作。 +对ShardingSphere-JDBC,ShardingSphere-Proxy及MySQL进行性能对比。从业务角度考虑,在基本应用场景(单路由,主从+加密+分库分表,全路由)下,INSERT+UPDATE+DELETE通常用作一个完整的关联操作,用于性能评估,而SELECT关注分片优化可用作性能评估的另一个操作;而主从模式下,可将INSERT+SELECT+DELETE作为一组评估性能的关联操作。 为了更好的观察效果,设计在一定数据量的基础上,使用jmeter 20并发线程持续压测半小时,进行增删改查性能测试,且每台机器部署一个MySQL实例,而对比MySQL场景为单机单实例部署。 ## 测试场景 @@ -21,7 +21,7 @@ weight = 5 基本主从场景,设置一主库一从库,部署在两台不同的机器上,在10000数据量的基础上,观察读写性能; 作为对比,MySQL运行在10000数据量的基础上,使用INSERT+SELECT+DELETE语句。 -### 主从+脱敏+分库分表 +### 主从+加密+分库分表 在1000数据量的基础上,根据`id`分为4个库,部署在四台不同的机器上,根据`k`分为1024个表,`c`使用aes加密,`pad`使用md5加密,查询操作路由到单库单表; 作为对比,MySQL运行在1000数据量的基础上,使用INSERT+UPDATE+DELETE和单路由查询语句。 @@ -137,7 +137,7 @@ masterSlaveRule: - slave_ds_0 ``` -#### 主从+脱敏+分库分表配置 +#### 主从+加密+分库分表配置 ```yaml schemaName: sharding_db diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.cn.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.cn.md index c3f9aaa43f..f558508dab 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.cn.md +++ b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.cn.md @@ -60,7 +60,7 @@ weight = 1 } ``` -### 数据脱敏 +### 数据加密 ```java DataSource getEncryptDataSource() throws SQLException { @@ -129,7 +129,7 @@ weight = 1 } ``` -### 数据分片 + 数据脱敏 +### 数据分片 + 数据加密 ```java public DataSource getDataSource() throws SQLException { @@ -384,14 +384,14 @@ 
ShardingStrategyConfiguration的实现类,用于配置不分片的策略。 | max.connections.size.per.query (?) | int | 每个物理数据库为每次查询分配的最大连接数量。默认值: 1 | | check.table.metadata.enabled (?) | boolean | 是否在启动时检查分表元数据一致性,默认值: false | -### 数据脱敏 +### 数据加密 #### EncryptDataSourceFactory | *名称* | *数据类型* | *说明* | | --------------------- | ---------------------------- | ------------------ | | dataSource | DataSource | 数据源,任意连接池 | -| encryptRuleConfig | EncryptRuleConfiguration | 数据脱敏规则 | +| encryptRuleConfig | EncryptRuleConfiguration | 数据加密规则 | | props (?) | Properties | 属性配置 | #### EncryptRuleConfiguration @@ -436,7 +436,7 @@ ShardingStrategyConfiguration的实现类,用于配置不分片的策略。 #### OrchestrationEncryptDataSourceFactory -数据脱敏 + 治理的数据源工厂。 +数据加密 + 治理的数据源工厂。 | *名称* | *数据类型* | *说明* | | --------------------- | ---------------------------- | ------------------------------ | diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.en.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.en.md index 148a8f5dc7..7f12ce5945 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.en.md +++ b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-java.en.md @@ -60,7 +60,7 @@ weight = 1 } ``` -### Data Masking +### data encryption ```java DataSource getEncryptDataSource() throws SQLException { @@ -128,7 +128,7 @@ weight = 1 return result; } ``` -### Data Sharding + Data Masking +### Data Sharding + data encryption ```java public DataSource getDataSource() throws SQLException { @@ -374,7 +374,7 @@ Property configuration items, can be of the following properties. | max.connections.size.per.query (?) | int | The maximum connection number allocated by each query of each physical database, default value: 1 | | check.table.metadata.enabled (?) 
| boolean | Check meta-data consistency or not in initialization, default value: false | -### Data Masking +### data encryption #### EncryptDataSourceFactory diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.cn.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.cn.md index 57d102a23e..a12ee14db9 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.cn.md +++ b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.cn.md @@ -74,7 +74,7 @@ spring.shardingsphere.masterslave.slave-data-source-names=slave0,slave1 spring.shardingsphere.props.sql.show=true ``` -### 数据脱敏 +### 数据加密 ```properties spring.shardingsphere.datasource.name=ds @@ -158,7 +158,7 @@ spring.shardingsphere.sharding.master-slave-rules.ds1.master-data-source-name=ma spring.shardingsphere.sharding.master-slave-rules.ds1.slave-data-source-names=master1slave0, master1slave1 ``` -### 数据分片 + 数据脱敏 +### 数据分片 + 数据加密 ```properties spring.shardingsphere.datasource.names=ds_0,ds_1 @@ -348,7 +348,7 @@ spring.shardingsphere.props.executor.size= #工作线程数量,默认值: CPU spring.shardingsphere.props.check.table.metadata.enabled= #是否在启动时检查分表元数据一致性,默认值: false ``` -### 数据脱敏 +### 数据加密 ```properties #省略数据源配置,与数据分片一致 @@ -362,7 +362,7 @@ spring.shardingsphere.encrypt.tables..columns..en ### 治理 ```properties -#省略数据源、数据分片、读写分离和数据脱敏配置 +#省略数据源、数据分片、读写分离和数据加密配置 spring.shardingsphere.orchestration.spring_boot_ds_sharding.orchestration-type= #治理类型,例如config_center/registry_center/metadata_center spring.shardingsphere.orchestration.spring_boot_ds_sharding.instance-type= #配置/注册/元数据中心实例类型。如:zookeeper diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.en.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.en.md index 4bbd2c3265..086fbce202 100644 --- 
a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.en.md +++ b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-boot.en.md @@ -73,7 +73,7 @@ spring.shardingsphere.masterslave.slave-data-source-names=slave0,slave1 spring.shardingsphere.props.sql.show=true ``` -### Data Masking +### data encryption ```properties spring.shardingsphere.datasource.name=ds @@ -157,7 +157,7 @@ spring.shardingsphere.sharding.master-slave-rules.ds1.master-data-source-name=ma spring.shardingsphere.sharding.master-slave-rules.ds1.slave-data-source-names=master1slave0, master1slave1 ``` -### Data Sharding + Data Masking +### Data Sharding + data encryption ```properties spring.shardingsphere.datasource.names=ds_0,ds_1 @@ -346,7 +346,7 @@ spring.shardingsphere.props.executor.size= #Executing thread number; default val spring.shardingsphere.props.check.table.metadata.enabled= #Whether to check meta-data consistency of sharding table when it initializes; default value: false ``` -### Data Masking +### data encryption ```properties #Omit data source configurations; keep it consistent with data sharding @@ -362,7 +362,7 @@ spring.shardingsphere.encrypt.tables..columns..en ### Orchestration ```properties -#Omit data source, data sharding, read-write split and data masking configurations +#Omit data source, data sharding, read-write split and data encryption configurations spring.shardingsphere.orchestration.spring_boot_ds_sharding.orchestration-type= The type of orchestration center: config_center or registry_center or metadata_center spring.shardingsphere.orchestration.spring_boot_ds_sharding.instance-type= #Center instance type. Example:zookeeper#Registry center type. 
Example:zookeeper diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.cn.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.cn.md index f0e26b2190..1bbfe912f1 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.cn.md +++ b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.cn.md @@ -161,7 +161,7 @@ example: [shardingsphere-example](https://github.com/apache/shardingsphere/tree/ ``` -### 数据脱敏 +### 数据加密 ```xml @@ -325,7 +325,7 @@ example: [shardingsphere-example](https://github.com/apache/shardingsphere/tree/ ``` -### 数据分片 + 数据脱敏 +### 数据分片 + 数据加密 ```xml @@ -478,7 +478,7 @@ example: [shardingsphere-example](https://github.com/apache/shardingsphere/tree/ | default-database-strategy-ref (?) | 属性 | 默认数据库分片策略,对应\中的策略Id,缺省表示不分库 | | default-table-strategy-ref (?) | 属性 | 默认表分片策略,对应\中的策略Id,缺省表示不分表 | | default-key-generator-ref (?) | 属性 | 默认自增列值生成器引用,缺省使用`org.apache.shardingsphere.core.keygen.generator.impl.SnowflakeKeyGenerator` | -| encrypt-rule (?) | 标签 | 脱敏规则 | +| encrypt-rule (?) | 标签 | 加密规则 | #### \ @@ -628,7 +628,7 @@ example: [shardingsphere-example](https://github.com/apache/shardingsphere/tree/ | type | 属性 | 负载均衡算法类型,'RANDOM'或'ROUND_ROBIN',支持自定义拓展| | props-ref (?) 
| 属性 | 负载均衡算法配置参数 | -### 数据脱敏 +### 数据加密 命名空间:http://shardingsphere.apache.org/schema/shardingsphere/encrypt/encrypt.xsd @@ -708,7 +708,7 @@ example: [shardingsphere-example](https://github.com/apache/shardingsphere/tree/ | instance-ref | 属性 | 治理实例id | | overwrite | 属性 | 本地配置是否覆盖配置中心配置。如果可覆盖,每次启动都以本地配置为准。缺省为不覆盖 | -### 数据脱敏 + 治理 +### 数据加密 + 治理 命名空间:http://shardingsphere.apache.org/schema/shardingsphere/orchestration/orchestration.xsd diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.en.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.en.md index 676307e65b..818dbb654e 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.en.md +++ b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-spring-namespace.en.md @@ -163,7 +163,7 @@ example: [shardingsphere-example](https://github.com/apache/shardingsphere/tree/ ``` -### Data Masking +### data encryption ```xml @@ -331,7 +331,7 @@ example: [shardingsphere-example](https://github.com/apache/shardingsphere/tree/ ``` -### Sharding + Data Masking +### Sharding + data encryption ```xml @@ -647,7 +647,7 @@ Namespace: http://shardingsphere.apache.org/schema/shardingsphere/masterslave/ma | type | Attribute | Type of load balance algorithm, 'RANDOM'或'ROUND_ROBIN', support custom extension| | props-ref (?) 
| Attribute | Properties of load balance algorithm | -### Data Masking +### data encryption Namespace: http://shardingsphere.apache.org/schema/shardingsphere/encrypt/encrypt.xsd @@ -727,7 +727,7 @@ Namespace: http://shardingsphere.apache.org/schema/shardingsphere/orchestration/ | instance-ref | Attribute | The id of orchestration instance | | overwrite | Attribute | Use local configuration to overwrite config center or not | -### Data Masking + Orchestration +### data encryption + Orchestration Namespace: http://shardingsphere.apache.org/schema/shardingsphere/orchestration/orchestration.xsd diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.cn.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.cn.md index 101fdaf02c..a2dee8df95 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.cn.md +++ b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.cn.md @@ -92,7 +92,7 @@ props: sql.show: true ``` -### 数据脱敏 +### 数据加密 ```yaml dataSource: !!org.apache.commons.dbcp2.BasicDataSource driverClassName: com.mysql.jdbc.Driver @@ -211,7 +211,7 @@ props: sql.show: true ``` -### 数据分片 + 数据脱敏 +### 数据分片 + 数据加密 ```yaml dataSources: @@ -278,7 +278,7 @@ props: ### 治理 ```yaml -#省略数据分片、读写分离和数据脱敏配置 +#省略数据分片、读写分离和数据加密配置 orchestration: orchestration_ds: @@ -378,7 +378,7 @@ masterSlaveRule: : #属性值 ``` -### 数据脱敏 +### 数据加密 ```yaml dataSource: #省略数据源配置 @@ -405,7 +405,7 @@ encryptRule: dataSources: #省略数据源配置 shardingRule: #省略分片规则配置 masterSlaveRule: #省略读写分离规则配置 -encryptRule: #省略数据脱敏规则配置 +encryptRule: #省略数据加密规则配置 orchestration: orchestration_ds: #治理实例名称 diff --git a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.en.md b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.en.md index 8c949c0087..b40e0b887a 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.en.md +++ 
b/docs/document/content/manual/shardingsphere-jdbc/configuration/config-yaml.en.md @@ -92,7 +92,7 @@ props: sql.show: true ``` -### Data Masking +### data encryption ```yaml dataSource: !!org.apache.commons.dbcp2.BasicDataSource @@ -212,7 +212,7 @@ props: sql.show: true ``` -### Data Sharding + Data Masking +### Data Sharding + data encryption ```yaml dataSources: @@ -384,7 +384,7 @@ props: #Property configuration max.connections.size.per.query: #The maximum connection number allocated by each query of each physical database. default value: 1 ``` -### Data Masking +### data encryption ```yaml dataSource: #Ignore data sources configuration diff --git a/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.cn.md b/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.cn.md index 71173625c9..9822499352 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.cn.md +++ b/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.cn.md @@ -1,10 +1,10 @@ +++ -title = "数据脱敏" +title = "数据加密" weight = 6 +++ -该章节主要介绍如何使用数据脱敏功能,如何进行相关配置。数据脱敏功能即可与数据分片功能共同使用,又可作为单独功能组件,独立使用。 -与数据分片功能共同使用时,会创建ShardingDataSource;单独使用时,会创建EncryptDataSource来完成数据脱敏功能。 +该章节主要介绍如何使用数据加密功能,如何进行相关配置。数据加密功能即可与数据分片功能共同使用,又可作为单独功能组件,独立使用。 +与数据分片功能共同使用时,会创建ShardingDataSource;单独使用时,会创建EncryptDataSource来完成数据加密功能。 ## 不使用Spring @@ -28,7 +28,7 @@ weight = 6 dataSource.setUsername("root"); dataSource.setPassword(""); - // 配置脱敏规则 + // 配置加密规则 Properties props = new Properties(); props.setProperty("aes.key.value", "123456"); EncryptorRuleConfiguration encryptorConfig = new EncryptorRuleConfiguration("AES", props); diff --git a/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.en.md b/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.en.md index ea90a7d30b..0929e8f879 100644 --- a/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.en.md +++ b/docs/document/content/manual/shardingsphere-jdbc/usage/encrypt.en.md @@ -1,10 +1,10 @@ +++ -title 
= "Data Masking" +title = "data encryption" weight = 6 +++ -This chapter mainly introduces how to use the feather of Data Masking. On one hand User can use Data Masking and Sharding together, which will -create ShardingDataSource, On another hand, when user only adopt the feather of Data Masking, ShardingSphere will create EncryptDataSource. +This chapter mainly introduces how to use the data encryption feature. On the one hand, users can use data encryption and sharding together, which will +create ShardingDataSource; on the other hand, when users adopt only the data encryption feature, ShardingSphere will create EncryptDataSource. ## Not Use Spring diff --git a/docs/document/content/manual/shardingsphere-proxy/configuration.cn.md b/docs/document/content/manual/shardingsphere-proxy/configuration.cn.md index 67e3fbfd76..731e950cb2 100644 --- a/docs/document/content/manual/shardingsphere-proxy/configuration.cn.md +++ b/docs/document/content/manual/shardingsphere-proxy/configuration.cn.md @@ -106,7 +106,7 @@ masterSlaveRule: - ds_slave1 ``` -### 数据脱敏 +### 数据加密 ```yaml schemaName: encrypt_db @@ -247,7 +247,7 @@ shardingRule: loadBalanceAlgorithmType: ROUND_ROBIN ``` -### 数据分片 + 数据脱敏 +### 数据分片 + 数据加密 dataSources: @@ -393,7 +393,7 @@ dataSources: #省略数据源配置,与数据分片一致 masterSlaveRule: #省略读写分离配置,与ShardingSphere-JDBC配置一致 ``` -### 数据脱敏 +### 数据加密 ```yaml dataSource: #省略数据源配置 diff --git a/docs/document/content/manual/shardingsphere-proxy/configuration.en.md b/docs/document/content/manual/shardingsphere-proxy/configuration.en.md index fb57650739..12ec61145d 100644 --- a/docs/document/content/manual/shardingsphere-proxy/configuration.en.md +++ b/docs/document/content/manual/shardingsphere-proxy/configuration.en.md @@ -106,7 +106,7 @@ masterSlaveRule: - ds_slave1 ``` -### Data Masking +### data encryption ```yaml schemaName: encrypt_db @@ -246,7 +246,7 @@ shardingRule: loadBalanceAlgorithmType: ROUND_ROBIN ``` -### Data Sharding + Data Masking +### Data Sharding + data encryption 
dataSources: @@ -392,7 +392,7 @@ dataSources: #Omit data source configurations; keep it consistent with data shar masterSlaveRule: #Omit data source configurations; keep it consistent with ShardingSphere-JDBC ``` -### Data Masking +### data encryption ```yaml dataSource: #Ignore data sources configuration diff --git a/docs/document/content/overview/_index.cn.md b/docs/document/content/overview/_index.cn.md index 9733586778..f07cfd7f9c 100644 --- a/docs/document/content/overview/_index.cn.md +++ b/docs/document/content/overview/_index.cn.md @@ -113,4 +113,4 @@ Apache ShardingSphere 是多接入端共同组成的生态圈。 * 分布式治理 * 弹性伸缩 * 可视化链路追踪 -* 数据脱敏 +* 数据加密 diff --git a/examples/README_ZH.md b/examples/README_ZH.md index 17fd3f8af2..2bdbc7a863 100644 --- a/examples/README_ZH.md +++ b/examples/README_ZH.md @@ -88,7 +88,7 @@ shardingsphere-example | [orchestration](shardingsphere-jdbc-example/orchestration-example) | 演示了如何在 ShardingSphere 中使用 orchestration | | [事务](shardingsphere-jdbc-example/transaction-example) | 演示了如何在 ShardingSphere 中使用事务 | | [hint](shardingsphere-jdbc-example/other-feature-example/hint-example) | 演示了如何在 ShardingSphere 中使用 hint | -| [脱敏](shardingsphere-jdbc-example/other-feature-example/encrypt-example) | 演示了如何在 ShardingSphere 中使用脱敏 | +| [加密](shardingsphere-jdbc-example/other-feature-example/encrypt-example) | 演示了如何在 ShardingSphere 中使用加密 | | APM监控(Pending) | 演示了如何在 ShardingSphere 中使用 APM 监控 | | proxy(Pending) | 演示了如何使用 sharding proxy | | [docker](./docker/docker-compose.md) | 演示了如何通过 docker 创建 ShardingSphere 所依赖的环境 | -- GitLab