This release is a weekly alpha version of PaddlePaddle. It should only be used for internal testing; it is not production-ready.

Release log

Performance gain and memory optimization

Config and Env:

  • model: SE-ResNet-150
  • Input: 3 x 224 x 224
  • batch_size: 25
  • CentOS 6.3, Tesla P40, single card.

The comparison results before optimization:

                   Speed            Memory
Fluid(before)      1.95 sec/iter    18341 MB
PyTorch            1.154 sec/iter   13359 MB
Fluid/PyTorch      1.6898           1.3729

After optimizing the speed:

                   Speed            Memory
Fluid(opti_speed)  1.45 sec/iter    17222 MB
PyTorch            1.154 sec/iter   13359 MB
Fluid/PyTorch      1.2565           1.2892

After optimizing the memory usage:

                   Speed            Memory
Fluid(opti_mem)    1.93 sec/iter    14388 MB
PyTorch            1.154 sec/iter   13359 MB
Fluid/PyTorch      1.6724           1.0770

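The Fluid/PyTorch rows are plain ratios of the two measurements (lower is better; 1.0 would mean parity with PyTorch). The short Python script below recomputes them from the figures in the tables and also derives how much each optimization reduced time per iteration and memory relative to Fluid(before). The helper names are only for illustration.

```python
# Figures copied from the tables above (SE-ResNet-150, batch_size 25, one Tesla P40).
# Each entry: (seconds per iteration, GPU memory in MB).
PYTORCH = (1.154, 13359)
FLUID = {
    "before": (1.95, 18341),
    "opti_speed": (1.45, 17222),
    "opti_mem": (1.93, 14388),
}


def ratios(fluid, baseline=PYTORCH):
    """Return (time ratio, memory ratio) of Fluid vs. the baseline, rounded to 4 places."""
    speed = round(fluid[0] / baseline[0], 4)
    memory = round(fluid[1] / baseline[1], 4)
    return speed, memory


def relative_change(before, after):
    """Percentage reduction from `before` to `after`, rounded to 1 decimal place."""
    return round(100 * (before - after) / before, 1)


if __name__ == "__main__":
    for name, values in FLUID.items():
        speed, memory = ratios(values)
        print(f"{name:>10}: time x{speed:.4f}, memory x{memory:.4f}")
    before = FLUID["before"]
    print("time/iter reduced by", relative_change(before[0], FLUID["opti_speed"][0]), "%")
    print("memory reduced by", relative_change(before[1], FLUID["opti_mem"][1]), "%")
```

Running it reproduces the Fluid/PyTorch ratios listed above; by the same arithmetic, the speed optimization cuts time per iteration by about 25.6% and the memory optimization cuts memory usage by about 21.6%.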
  • Overall performance improvements.
  • Free GPU memory during training.
  • [WIP] Feed data from C++
    • Add basic RecordIO API
    • Polish C++ Reader operators
    • Add DoubleBuffer Reader
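These notes do not describe how the new DoubleBuffer Reader is implemented. As a framework-agnostic illustration of the double-buffering idea only, the Python sketch below prefetches batches on a background thread while the consumer processes the current one. The `DoubleBufferReader` class and its `capacity` parameter are hypothetical and are not Paddle's API.

```python
import queue
import threading


class DoubleBufferReader:
    """Hypothetical sketch of a double-buffered reader (not Paddle's API).

    A background thread pulls items from `source` and parks them in a bounded
    queue, so the next batch is being prepared while the consumer handles the
    current one. With capacity=2 this behaves like a classic double buffer.
    """

    _END = object()  # sentinel: the source is exhausted

    def __init__(self, source, capacity=2):
        self._source = source
        self._buffer = queue.Queue(maxsize=capacity)

    def _produce(self):
        try:
            for item in self._source:
                self._buffer.put(item)  # blocks while every slot is full
        finally:
            self._buffer.put(self._END)  # always wake the consumer, even on error

    def __iter__(self):
        worker = threading.Thread(target=self._produce, daemon=True)
        worker.start()
        while True:
            item = self._buffer.get()  # blocks until the next slot is ready
            if item is self._END:
                break
            yield item
        worker.join()


if __name__ == "__main__":
    batches = ([i, i + 1] for i in range(0, 6, 2))  # stand-in for a slow data source
    for batch in DoubleBufferReader(batches):
        print(batch)
```

The bounded queue is what keeps memory in check: the producer can run at most `capacity` items ahead of the consumer and then blocks until a slot frees up.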

Distributed training

  • Now supports distributed sparse updates.
  • [WIP] Send/recv using zero-copy gRPC transfer.
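These notes do not explain how distributed sparse updates are implemented. As a general illustration of the idea: with a large parameter table such as an embedding, a mini-batch usually touches only a few rows, so only those rows (their indices plus gradient values) need to be sent and updated instead of the whole dense gradient. The Python sketch below shows this with plain SGD. `sparse_sgd_update`, `sparse_payload`, and the (rows, grads) layout are hypothetical names chosen for illustration, not Paddle's API or wire format.

```python
def sparse_sgd_update(table, rows, grads, lr):
    """Apply plain SGD only to the rows of `table` listed in `rows`.

    table: parameter rows (list of lists of floats), e.g. an embedding table.
    rows:  indices touched in this mini-batch; duplicates are allowed and their
           gradients are applied to the same row one after another.
    grads: one gradient vector per entry in `rows`.
    """
    if len(rows) != len(grads):
        raise ValueError("rows and grads must have the same length")
    for row, grad in zip(rows, grads):
        param = table[row]
        for j, g in enumerate(grad):
            param[j] -= lr * g  # rows not listed in `rows` are never touched
    return table


def sparse_payload(rows, grads):
    """Number of values a trainer would send: row indices plus gradient entries."""
    return len(rows) + sum(len(g) for g in grads)


if __name__ == "__main__":
    table = [[0.0] * 4 for _ in range(1000)]  # 1000-row table, 4 floats per row
    rows, grads = [3, 42], [[1.0] * 4, [0.5] * 4]
    sparse_sgd_update(table, rows, grads, lr=0.1)
    print("dense gradient size:", len(table) * 4)
    print("sparse payload size:", sparse_payload(rows, grads))
```

With the sample numbers in the `__main__` block, the sparse payload is 10 values versus 4000 for a dense gradient of the same table.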
