# Release Note

## Important Update
This version is the beta release of PaddlePaddle Framework v2.0. The most important changes are a full upgrade of the API system and a comprehensive improvement of the imperative programming (dynamic graph) capability. This version systematically reorganizes the directory structure of the PaddlePaddle basic APIs, fixes long-standing issues in existing APIs, fills in missing APIs, and provides more complete high-level APIs. It also adds support for quantitative training and mixed precision training under a dynamic graph. Syntax support in the dynamic-to-static conversion is now largely complete, and usability is substantially improved. Dynamic graph functionality is approaching completeness, and the default development mode of PaddlePaddle is changed to the dynamic graph mode. In addition, the C++ APIs for the inference library are upgraded and optimized, and both the inference library's support for quantitative models and its inference performance are enhanced.

## Training Framework

### Basic APIs

#### Compatibility Description

For Paddle 2.x, users are recommended to use the APIs in the paddle root directory. All the APIs of Paddle 1.x are retained in the paddle.fluid directory. By design, training code written for Paddle 1.x runs on Paddle 2.x without modification, and models saved by Paddle 1.x training can be used for inference with Paddle 2.x.

#### Directory Structure Adjustment
- Based on the 2.0-alpha version, this version has made some adjustments to the directory structure. The latest adjusted directory structure is as follows:

  | Directory | Functions and Included APIs |
  | :--- | --------------- |
  | paddle.* | The aliases of commonly used APIs are reserved in the paddle root directory, which currently include all the APIs in the paddle.tensor and paddle.framework directories |
  | paddle.tensor | APIs related to tensor operations such as creating zeros, matrix operation matmul, transforming concat, computing add, and finding argmax |
  | paddle.nn | Networking-related APIs such as Linear, Conv2d, loss function, convolution, LSTM, and activation function |
  | paddle.static.nn | Special APIs for networking under a static graph such as input placeholder data/Input and control flow while_loop/cond |
  | paddle.static | APIs related to the basic framework under a static graph such as Variable, Program, and Executor |
  | paddle.framework | Universal APIs and imperative mode APIs such as to_variable and prepare_context |
  | paddle.optimizer | APIs related to optimization algorithms such as SGD, Adagrad, and Adam |
  | paddle.optimizer.lr_scheduler | APIs related to learning rate attenuation |
  | paddle.metric | APIs related to evaluation index computation such as accuracy and auc |
  | paddle.io | APIs related to data input and output such as save, load, Dataset, and DataLoader |
  | paddle.device | APIs related to device management such as CPUPlace and CUDAPlace |
  | paddle.distributed | Distributed related basic APIs |
  | paddle.distributed.fleet | Distributed related high-level APIs |
  | paddle.vision | Vision domain APIs such as datasets, data processing, and commonly used basic network structures like resnet |
  | paddle.text | NLP domain APIs such as datasets, data processing, and commonly used basic network structures like transformer |

#### API Alias Rules
- For the convenience of users, APIs have aliases in different paths, such as `paddle.add -> paddle.tensor.add`. Users are recommended to use the shorter path `paddle.add`.

- All the APIs in the framework and tensor directories are aliased in the paddle root directory. Except for a few special APIs, all other APIs have no aliases in the paddle root directory.

- All the APIs in the paddle.nn directory, except those in the functional directory, have aliases in the paddle.nn directory. All the APIs in the functional directory have no aliases in the paddle.nn directory.

- The following are some special alias relations. It is recommended to use the names on the left.
  - paddle.sigmoid -> paddle.tensor.sigmoid -> paddle.nn.functional.sigmoid
  - paddle.tanh -> paddle.tensor.tanh -> paddle.nn.functional.tanh
  - paddle.remainder -> paddle.mod -> paddle.floor_mod
  - paddle.divide -> paddle.true_divide
  - paddle.rand -> paddle.uniform
  - paddle.randn -> paddle.standard_normal
  - Optimizer.clear_grad -> Optimizer.clear_gradients
  - Optimizer.set_state_dict -> Optimizer.set_dict
  - Optimizer.get_lr -> Optimizer.current_step_lr
  - Layer.clear_grad -> Layer.clear_gradients
  - Layer.set_state_dict -> Layer.set_dict

#### Name Change of Commonly Used APIs

- This version uses Tensor to represent data and adds Tensor creation APIs; paddle.fluid.dygraph.to_variable is changed to paddle.to_tensor
- Addition, subtraction, multiplication, and division use full names only
- Element-by-element operations no longer carry the elementwise prefix
- Operations along a given axis no longer carry the reduce prefix
- For the Conv, Pool, Dropout, BatchNorm, and Pad networking APIs, 1d, 2d, and 3d suffixes are added according to the dimensionality of the input data

  | Paddle 1.8    | Paddle 2.0-beta |
  | --------------- | ------------------------ |
  | paddle.fluid.layers.elementwise_add | paddle.add               |
  | paddle.fluid.layers.elementwise_sub | paddle.subtract          |
  | paddle.fluid.layers.elementwise_mul | paddle.multiply          |
  | paddle.fluid.layers.elementwise_div | paddle.divide |
  | paddle.fluid.layers.elementwise_max | paddle.maximum             |
  | paddle.fluid.layers.elementwise_min | paddle.minimum |
  | paddle.fluid.layers.reduce_sum | paddle.sum |
  | paddle.fluid.layers.reduce_prod | paddle.prod |
  | paddle.fluid.layers.reduce_max | paddle.max        |
  | paddle.fluid.layers.reduce_min | paddle.min        |
  | paddle.fluid.layers.reduce_all | paddle.all        |
  | paddle.fluid.layers.reduce_any | paddle.any        |
  | paddle.fluid.dygraph.Conv2D | paddle.nn.Conv2d |
  | paddle.fluid.dygraph.Conv2DTranspose | paddle.nn.ConvTranspose2d |
  | paddle.fluid.dygraph.Pool2D | paddle.nn.MaxPool2d, paddle.nn.AvgPool2d |
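
The one-to-one renames in the table can be applied mechanically. The following is a minimal sketch of such a rewrite, not an official migration tool: the `migrate` function and the `RENAMES` dictionary are hypothetical names introduced here, only a subset of the table is included, and it assumes the old APIs appear in source code under their fully qualified names. `Pool2D` is left out because it maps to two different classes and needs a manual choice.

```python
import re

# A subset of the one-to-one renames listed in the table above.
RENAMES = {
    "paddle.fluid.layers.elementwise_add": "paddle.add",
    "paddle.fluid.layers.elementwise_mul": "paddle.multiply",
    "paddle.fluid.layers.elementwise_max": "paddle.maximum",
    "paddle.fluid.layers.reduce_sum": "paddle.sum",
    "paddle.fluid.layers.reduce_max": "paddle.max",
    "paddle.fluid.dygraph.Conv2D": "paddle.nn.Conv2d",
    "paddle.fluid.dygraph.Conv2DTranspose": "paddle.nn.ConvTranspose2d",
}

# Try longer names first so Conv2DTranspose wins over Conv2D, and use
# lookarounds so that only complete dotted names are rewritten.
_NAMES = sorted(RENAMES, key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(name) for name in _NAMES) + r")(?!\w)"
)


def migrate(source: str) -> str:
    """Rewrite fully qualified Paddle 1.8 names in `source` to 2.0-beta names."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


if __name__ == "__main__":
    print(migrate("y = paddle.fluid.layers.elementwise_add(a, b)"))
```

Code that imports the old modules under an alias (for example `import paddle.fluid as fluid`) is not touched by this sketch and would need the alias resolved first.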

#### Fixing and Improving APIs
- Modified and improved a total of 155 APIs. See the [upgraded API list](https://github.com/PaddlePaddle/Paddle/wiki/Paddle-2.0beta-Upgraded-API-List) and the API document
- Fixed APIs related to random number generation, including seed setting, paddle.rand, randn, randint, randperm, dropout, Uniform, and Normal
- Upgraded the code of the underlying C++ operators corresponding to the following APIs. The upgrade is compatible in principle, but slight incompatibilities cannot be ruled out: linspace, concat, gather, gather_nd, split, squeeze, unsqueeze, clip, argmax, argmin, mean, norm, unique, cumsum, LeakyReLU, leaky_relu, hardshrink, embedding, margin_ranking_loss, grid_sample, affine_grid
- Added oneDNN support for the relu6 and Sigmoid activation functions

#### Multi-device/Distributed Training APIs
- Single-Machine Multi-Card Training Under a Dynamic Graph
  - Added paddle.distributed.spawn(func, args=(), nprocs=-1, join=True, daemon=False, **options), which is used to start multi-card training under a dynamic graph.
  - Added paddle.distributed.init_parallel_env(), which is used to initialize the environment of multi-card training under a dynamic graph.
  - Added paddle.distributed.get_rank(), which is used to get the rank of the current process during multi-card training.
  - Added paddle.distributed.get_world_size(), which is used to get the total number of processes participating in multi-card training.

- Distributed Collective Communication
  - Added paddle.distributed.broadcast(tensor, src, group=0), which broadcasts a tensor of a specified process to all the processes.
  - Added paddle.distributed.all_reduce(tensor, op=ReduceOp.SUM, group=0), which performs the reduce operation on specified tensors of all the processes and returns results to all the processes.
  - Added paddle.distributed.reduce(tensor, dst, op=ReduceOp.SUM, group=0), which performs the reduce operation on specified tensors of all the processes and returns results to specified processes.
  - Added paddle.distributed.all_gather(tensor_list, tensor, group=0), which gathers specified tensors of all the processes and returns results to all the processes.
  - Added paddle.distributed.scatter(tensor, tensor_list=None, src=0, group=0), which distributes tensors in a specified tensor list to all the processes.
  - Added paddle.distributed.barrier(group=0), which synchronizes all the processes.
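
The data movement performed by these collectives can be illustrated without a multi-process setup. The sketch below is a pure-Python simulation, not Paddle code: each list entry stands for the value held by one process (index = rank), plain numbers stand in for tensors, and the function names merely mirror the APIs above. For `reduce_to`, the text only specifies what the destination process receives, so other ranks are shown as `None` here as an assumption of this sketch.

```python
import functools
import operator


def broadcast(values, src):
    """Every rank ends up with the value held by rank `src`."""
    return [values[src] for _ in values]


def all_reduce(values, op=operator.add):
    """Combine the values of all ranks and give the result to every rank."""
    total = functools.reduce(op, values)
    return [total for _ in values]


def reduce_to(values, dst, op=operator.add):
    """Combine the values of all ranks; only rank `dst` receives the result."""
    total = functools.reduce(op, values)
    return [total if rank == dst else None for rank in range(len(values))]


def all_gather(values):
    """Every rank receives the list of values from all ranks."""
    return [list(values) for _ in values]


def scatter(tensor_list):
    """Rank i receives the i-th entry of the source rank's list."""
    return list(tensor_list)


if __name__ == "__main__":
    held = [1, 2, 3, 4]  # value on rank 0, 1, 2, 3
    print(all_reduce(held))
    print(reduce_to(held, dst=1))
```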

### High-level APIs

- Added PaddlePaddle high-level APIs to encapsulate common operations such as networking, training, evaluation, inference, and model saving and loading so as to implement low-code development. Compared with the imperative programming implementation of the MNIST handwritten digit recognition task, the high-level APIs reduce the amount of executable code by 80%.

- **Data Management**
  - Unified data loading and usage method
    - Dataset definition, which is implemented by inheriting `paddle.io.Dataset`.
    - Multi-process data loading using `paddle.io.DataLoader`.
  - Added `paddle.io.IterableDataset`, which is used for a streaming dataset and supports its concurrent acceleration in `paddle.io.DataLoader`.
  - Added `paddle.io.get_worker_info` for dividing child process data in `paddle.io.IterableDataset`.

- **Model Networking**
  - Added the encapsulation of the common loss API `paddle.nn.loss.*` and metric API `paddle.metric.*`
  - Released 12 models based on high-level API implementations, including Transformer, Seq2seq, LAC, BMN, ResNet, YOLOv3, VGG, MobileNet, TSM, CycleGAN, Bert, OCR. The code can be found in [PaddlePaddle/hapi examples](https://github.com/PaddlePaddle/hapi/tree/master/examples).

- **Model Execution**
  - Added class API `paddle.Model`, which encapsulates the common model development methods:
    - API `Model.summary` to view the network structure and the number of parameters of the dynamic graph networking.
    - API `Model.prepare` to specify a loss function and an optimization algorithm.
    - API `Model.fit` to implement training and evaluation; user-defined functions such as model storage can be executed through callbacks.
    - API `Model.evaluate` to run inference and compute evaluation indexes on the evaluation set.
    - API `Model.predict` to run inference on specific test data.
    - API `Model.train_batch` to implement training on a single batch of data.
    - API `Model.eval_batch` to implement evaluation on a single batch of data.
    - API `Model.test_batch` to implement testing on a single batch of data.
    - API `Model.save`/`Model.load`, which supports storing an inference model in dynamic graph training mode.
  - Added callback API `paddle.callbacks.*` as a model execution API, which performs logging and Checkpoint model saving, etc. Users can customize a callback by inheriting `paddle.callbacks.Callback`.

- **Domain APIs**
  - Added computer vision (CV) APIs `paddle.vision`
    - Added dataset API `paddle.vision.datasets.*`, which encapsulates common public datasets and supports random access to data.
    - Added 24 common data preprocessing APIs `paddle.vision.transforms.*` such as Resize, Normalize, etc.
    - Added image classification backbone network and pre-training parameters:
      - `paddle.vision.models.lenet` or `paddle.vision.lenet`
      - `paddle.vision.models.vgg` or `paddle.vision.vgg`
      - `paddle.vision.models.resnet` or `paddle.vision.resnet`
      - `paddle.vision.models.mobilenetv1` or `paddle.vision.mobilenetv1`
      - `paddle.vision.models.mobilenetv2` or `paddle.vision.mobilenetv2`
  - Added natural language processing (NLP)  APIs `paddle.text`.
    - Added dataset API `paddle.text.datasets.*`, which encapsulates commonly-used datasets and supports random access to data.
    - Added networking API `paddle.text.*`.
- **Automatic Breakpoint Restart**
  - Added API `train_epoch_range`, which implements the epoch-level `checkpoint` autosave and autoloading functions on a static graph and supports automatic breakpoint restart.
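
Returning to the `paddle.io.IterableDataset` item under Data Management: when a streaming dataset is read by several worker processes, the stream has to be divided so that workers do not all yield the same samples. The sketch below shows one common way to do that, round-robin sharding, in plain Python. The `shard` helper is a hypothetical name introduced here; in Paddle the worker index and worker count would presumably be read from the object returned by `paddle.io.get_worker_info()`, but they are passed explicitly so the example runs without Paddle.

```python
from itertools import islice


def shard(stream, worker_id, num_workers):
    """Yield every `num_workers`-th sample, starting at offset `worker_id`."""
    return islice(stream, worker_id, None, num_workers)


if __name__ == "__main__":
    # Three hypothetical workers reading the same stream of ten samples.
    for worker_id in range(3):
        print(worker_id, list(shard(range(10), worker_id, 3)))
```

Each sample is read by exactly one worker, so the union of the shards is the original stream with no duplicates.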

### Function Optimization (Including Distributed)

#### Dynamic Graph to Static Graph

- **Added Syntax Support for ProgramTranslator**

  - Added dynamic-to-static support for the return syntax, so that a function can return early, or return different types of tensors or None, inside if-elif-else or loop conditions during the dynamic-to-static conversion.

  - Added dynamic-to-static support for the print syntax so that print(tensor) can also print out a tensor in the dynamic-to-static conversion.

  - Added dynamic-to-static support for the “for traversing a tensor”, “for traversing a tensor using enumeration”, “for traversing a TensorList”, and “for traversing a TensorList using enumeration” syntaxes so that operations that loop over tensors can be flexibly used in the dynamic-to-static conversion.

  - Added dynamic-to-static support for the assert syntax, so that a tensor can be asserted to be true (bool type) or non-0 (other data types) in the dynamic-to-static conversion.

  - Added support for converting data type casts, so that dynamic graph type conversion statements such as float(tensor) and int(tensor) can also be performed in a static graph.

- **ProgramTranslator Usability Optimization Function**

  - Changed the dynamic-to-static return type from callable to class StaticLayer. This class makes it easier to obtain the converted static graph information by calling .code, .main_program, and other APIs.

  - Added the set_verbosity and set_code_level APIs so that users can set the log level to view logs from the dynamic-to-static running process, or view the converted code in intermediate states.

  - Added InputSpec to specify the shape and data type of an input tensor variable.

  - Optimized the error messages reported during dynamic-to-static running, so that a runtime error in the converted static graph code is traced back to the original line of dynamic graph code; removed some dynamic-to-static frames from the Python stack so that error messages relate more closely to user code.

  - Supported breakpoint debugging with pdb.set_trace() during the dynamic-to-static conversion.

- **Optimized Deployment of Model Storage and Loading APIs**

  - Added the paddle.jit.save API, which provides an easier way to save a dynamic-to-static model; deleted the old ProgramTranslator.save_inference_model API.
  - Added paddle.jit.load API, which is used to load inference models including models saved by paddle.jit.save and paddle.io.save_inference_model. After being loaded, models can be used for model inference or model training optimization in a dynamic graph.


#### Mixed Precision Training
- Added support for mixed precision training of dynamic graphs. Training the ResNet-50 model on V100 with mixed precision is 2.6 times as fast as training with fp32.

#### Quantitative Training

- Added the `ImperativeQuantAware` class, which provides the dynamic graph quantitative training function. Currently, the quantization of Conv2D, Linear, and other layers is supported. The supported model types include MobileNetV1/MobileNetV2/ResNet50.
- After dynamic graph quantitative training is performed on a model, a quantitative model saved using the `ImperativeQuantAware.save_quantized_model` API can be deployed for inference with the Paddle-Lite inference library.
- For static graph quantization, Conv2d_transpose quantization and per-channel Linear quantization are supported.

#### Performance Optimization (Including Distributed)

- Simplified the underlying DataLoader implementation logic in dynamic graph mode, reduced the thread reading overhead, and further improved the data reading efficiency and the overall model training speed. The overall training speed of MobileNetV1 on a single V100 card with BatchSize = 128 is increased by 34%.
- Upgraded and optimized the performance of dynamic graph networking. A large number of dynamic graph APIs now directly call automatically generated Pybind APIs, which improves performance.

#### Basic Functions for Dynamic Graph

- Supported updating gradients using sparse parameters by configuring embedding and other APIs.
- Added over 120 member functions of Tensor type, including Tensor().abs(), Tensor().add(), and Tensor().cos().
- Added a dir() API for a layer to facilitate viewing the attributes and functions in the layer.
- Added an optimizer.set_lr() API so that users can flexibly adjust the learning rate in dynamic graph mode.
- Added a global parameter initialization API set_global_initializer to define a global parameter initialization method.
- Added oneDNN (former MKL-DNN) support for dynamic training and inference. ResNet50 oneDNN dynamic training with the MNIST dataset is enabled.

#### Debugging Analysis

- Uniformly replaced LOG(FATAL), used to throw exceptions at some 100 places, with PADDLE_THROW; optimized the format and content of the errors reported when the framework does not support a behavior.
- Improved the Signal Handler implementation within the framework; optimized the format and content of the errors reported when a system signal error occurs during execution.
- Optimized the framework error stack format. The Python error stack produced during compilation is moved below the native error stack to make error messages easier to read.
- Further improved the error types and messages of about 1,300 check errors within the framework in total, to enhance the overall debugging usability of the framework.
- Enhanced dynamic graph error messages. Error messages on the Pybind layer under a dynamic graph are systematically enhanced to improve user experience.

### Bug Fixing

- Fixed the problem that AttributeError may unexpectedly occur when the add_parameter API is used on a layer under a dynamic graph; enhanced the input check.
- Fixed the problem that tensors of int_8 and uint_8 types cannot be normally printed so that data can be normally output.

#### Dependency Library Upgrading
- Upgraded oneDNN (former MKL-DNN) to Version 1.5 from Version 1.3.


## Inference

### Paddle Inference

#### API
- Fully upgraded the inference C++ APIs. The new version of the APIs is recommended. The original APIs are retained for now but emit a warning when used, and are planned for removal in the future. The upgrade mainly involves naming standardization and simpler usage. The important changes include:
  - adding a `paddle_infer` namespace for the C++ APIs, containing inference-related APIs.
  - renaming `ZeroCopyTensor` to `Tensor` as the default input/output representation for the inference APIs.
  - simplifying `CreatePaddlePredictor` to `CreatePredictor`, which supports only `AnalysisConfig` and no longer supports other Configs.
  - adding service-related utility classes such as `PredictorPool`, which can be used when multiple predictors are created.

#### Functional Upgrading
- Upgraded the operator version compatibility information registry to support more accurate Op version information and improve inference compatibility.
- Added support for TRT 7.1.
- Enhanced Paddle-TensorRT support for PaddleSlim quantitative models. Multiple CV tasks such as detection, classification, and segmentation are covered.
- Added support for user-defined operators in Python-side inference.
- Added INT8 oneDNN (former MKL-DNN) kernel support for `elementwise_add` and `elementwise_mul` on the CPU side.
- Improved the usability of testing quantitative models on the CPU side. A side-by-side comparison test of original models and quantitative models is supported.
- Added support for Jetson Nx hardware.

#### Performance Optimization
- Added conv + affine_op pass. The MASK-RCNN fp32 single thread performance is improved by 26% (1.26x) on an Intel Xeon 6248 machine.
- Added fc + gru fuse pass and enabled the oneDNN (former MKL-DNN) GRU fp32 kernel, speeding up GRU fp32 model inference on 4 CPU threads by 20% (1.2x) on an Intel Xeon 6248 machine.
- Added oneDNN inplace support for many operators (speedup 2% for the Feature fp32 model).
- Optimized the oneDNN LRN operator (speedup 1% for the GoogleNet fp32 model).
- Improved the transformation and optimization of quantitative models.
- Optimized the CUDA ArgMin and ArgMax operators so that the binary size of the operators is decreased from 60 M to 1.3 M.

#### Bug Fixing

- Fixed the mask-rcnn inference error under CPU inference.
- Fixed the error occurring in CPU multithread inference on oneDNN quantized INT8 models.