PaddlePaddle / Paddle
Opened Dec 13, 2016 by saxon_zh (@saxon_zh, Guest) · 0 of 10 tasks completed

Paddle Multiple Language API/SDK

Created by: reyoung

Paddle is currently a standalone application, which makes it inconvenient to customize the training process. The current Paddle API supports only model inference.

We are considering rewriting the current API so that Paddle becomes a standard Python library that can also be ported easily to other programming languages.

Several points of agreement and to-dos for this feature are listed below.

Using standard C99 API instead of SWIG

SWIG works very well for Python bindings, but it does not seem to work smoothly for other languages such as Julia and Go. Making Paddle easy to integrate into other systems is an essential requirement for the Paddle API.
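
To illustrate why a plain C API ports easily, here is a minimal sketch that calls a C function from Python using only the standard `ctypes` module, with no generated wrapper code. It uses `strlen` from the C standard library as a stand-in, because the Paddle C-API proposed here is not defined yet; the helper name `c_strlen` is hypothetical. Julia's `ccall` and Go's cgo consume C functions in a similar declare-then-call fashion. The example assumes a POSIX system.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None)
# on POSIX exposes the symbols already loaded into the current process,
# which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Call the C function directly through its plain C ABI."""
    return libc.strlen(text)


if __name__ == "__main__":
    # Length of the byte string, computed by the C library.
    print(c_strlen(b"paddle"))
```

A binding for a real C-API would follow the same steps: load the shared library, declare each function's argument and return types, and wrap the calls in idiomatic functions or classes.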

Only expose GradientMachine.

The GradientMachine is an abstraction of a neural network that can perform forward/backward passes on multiple local devices (CPU cores, GPU cards). In a cluster environment, we should provide the same abstraction with some additional configuration, such as the node count.

The GradientMachine will always act as a single-threaded program. We won't provide APIs for sending data from one GPU to another, using many CPUs, and so on. We think such APIs are too low-level and do not need to be exposed.

There are a few rules for the GradientMachine API:

  • Expose GradientMachine in as much detail as possible.
  • The ParameterUpdater is exposed in the C-API, but it is not intended for end users.
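
The idea of a single forward/backward interface that hides device placement can be sketched with a toy class. Everything here is hypothetical and not Paddle's actual GradientMachine: `ToyGradientMachine`, its `num_devices` argument, and the one-weight linear model are assumptions chosen only to show that the caller's code and results stay the same regardless of how the batch is split internally.

```python
class ToyGradientMachine:
    """Hypothetical facade: one linear unit y = w * x with squared-error cost.

    The caller sees a single forward/backward interface; how the batch is
    split across devices is an internal detail.
    """

    def __init__(self, weight, num_devices=1):
        self.weight = weight
        self.num_devices = num_devices

    def _shards(self, batch):
        # Round-robin split of the batch, one shard per simulated device.
        return [batch[i::self.num_devices] for i in range(self.num_devices)]

    def forward(self, batch):
        """Return the total cost over a batch of (x, label) pairs."""
        return sum(
            (self.weight * x - label) ** 2
            for shard in self._shards(batch)
            for x, label in shard
        )

    def backward(self, batch):
        """Return d(cost)/d(weight), accumulated over all shards."""
        return sum(
            2 * (self.weight * x - label) * x
            for shard in self._shards(batch)
            for x, label in shard
        )


if __name__ == "__main__":
    batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    single = ToyGradientMachine(weight=1.0, num_devices=1)
    multi = ToyGradientMachine(weight=1.0, num_devices=2)
    # Same interface and same results, whatever the device count.
    print(single.forward(batch), multi.forward(batch))
    print(single.backward(batch), multi.backward(batch))
```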

Wrap the C-API into a standard Python library.

Python is widely used in the neural network domain. We will write a standard Python library as the first language binding.

However, the Python library can be considered a demo only; contributions of other language bindings are welcome.


Possible Python API Demos

Here is one possible Python usage under the current design. It is still in flux.

import paddle

@paddle.network(
    input_types = {
        'img': dense_vector(784),
        'label': integer_value(10)
    }
)
def mnist_network(img, label):
    hidden1 = fc_layer(input=img, size=200)
    hidden2 = fc_layer(input=hidden1, size=200)
    inference = fc_layer(input=hidden2, size=10, act=SoftmaxActivation())
    cost = classification_cost(input=inference, label=label)
    return cost


@mnist_network.train_data(files = ['dataset1.txt', 'dataset2.txt'])
@mnist_network.test_data(files=['dataset_test.txt'])
def provider(filename):
    with open(filename) as f:
        for each_sample in readFromFile(f):
            yield each_sample

if __name__ == '__main__':  #main function.
    network = mnist_network()
    #trainer = network.createClusterTrainer("node0, node1")
    trainer = network.createLocalTrainer("gpu0, gpu1").withSGDOptimizer(learning_rate=0.001, batch_size=200)

    for _ in range(100):
        trainer.trainOnePass()
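
One way the decorator style in the demo above could be implemented is sketched below. This is only an assumption about the mechanism, not the proposed library: the `Network` class, its `samples` method, and the toy provider are hypothetical names introduced for illustration, and the sketch covers only the decorators, not calling the network or creating trainers.

```python
class Network:
    """Hypothetical object returned by the @network decorator."""

    def __init__(self, build_fn, input_types):
        self.build_fn = build_fn
        self.input_types = input_types
        self.providers = {}  # maps "train"/"test" to (provider, files)

    def _register(self, kind, files):
        def decorator(provider):
            self.providers[kind] = (provider, list(files))
            return provider  # returned unchanged, so decorators can be stacked
        return decorator

    def train_data(self, files):
        return self._register("train", files)

    def test_data(self, files):
        return self._register("test", files)

    def samples(self, kind):
        """Yield every sample the registered provider produces for `kind`."""
        provider, files = self.providers[kind]
        for filename in files:
            for sample in provider(filename):
                yield sample


def network(input_types):
    """Decorator factory: wrap a network-building function in a Network."""
    def decorator(build_fn):
        return Network(build_fn, input_types)
    return decorator


@network(input_types={"img": 784, "label": 10})
def toy_network(img, label):
    return ("cost", img, label)


@toy_network.train_data(files=["a.txt", "b.txt"])
@toy_network.test_data(files=["t.txt"])
def provider(filename):
    # Stand-in for reading a file: yield two fake samples per file name.
    yield (filename, 0)
    yield (filename, 1)
```

Because each registration decorator returns the provider unchanged, the same generator function can serve both the training and the test file lists, as in the demo.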

Tasks

Step 1. Single Machine Development.

To implement this feature, several tasks should be done.

  • Remove all global variables in Paddle. Most of them are command line flags.

    • Related Issue #852 (closed). @reyoung @gangliao
  • Find a way to avoid a core dump when a log(FATAL) or CHECK error occurs.

    • Merely keeping the program from exiting is not enough; we should also recover the process.
    • @hohdiy @jacquesqiao
  • Expose a C-API for:

    • Paddle Matrix/SparseMatrix/Vector, with unit tests.
      • These are used to feed data to GradientMachine, so only get/set methods need to be exposed. The calculation methods are not urgent for now.
    • Paddle Parameter/Argument, with unit tests.
      • These are used to feed data, get parameters, etc.
    • Optimizers and parameter updaters, with unit tests.
      • Optimizers such as Adam and SGD.
      • Whether parameter updaters should be exposed from C++ or reimplemented in other languages still needs to be discussed.
    • GradientMachines, with unit tests.
  • Python library [should proceed in parallel with the C-API exposure]

    • Python Matrix/SparseMatrix/Vector with unit tests.
      • Exchange data with NumPy.
    • Parameter/Arguments Python API with unit tests.
    • Optimizers and parameter updaters in Python.
    • GradientMachines in Python.
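
For the task above about avoiding core dumps on log(FATAL) or CHECK errors, one possible approach is for each C-API function to return a status code instead of aborting, and for the language binding to turn a failing status into an exception. The sketch below simulates this in pure Python. `ErrorCode`, `PaddleError`, `fake_capi_matrix_get`, and `matrix_get` are all hypothetical names, and the "C function" is a Python stand-in, so this shows the pattern only, not an existing interface.

```python
import enum


class ErrorCode(enum.IntEnum):
    """Hypothetical status codes a C-API function might return."""
    OK = 0
    CHECK_FAILED = 1


class PaddleError(RuntimeError):
    """Raised by the binding when a C-API call reports a failure."""


def fake_capi_matrix_get(values, index):
    """Stand-in for a C function: returns (status, value) instead of aborting."""
    if not 0 <= index < len(values):
        return ErrorCode.CHECK_FAILED, 0.0
    return ErrorCode.OK, values[index]


def matrix_get(values, index):
    """Binding-side wrapper: convert a non-OK status into an exception."""
    status, value = fake_capi_matrix_get(values, index)
    if status != ErrorCode.OK:
        raise PaddleError("C-API call failed with status %s" % status.name)
    return value


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(matrix_get(data, 1))
    try:
        matrix_get(data, 10)
    except PaddleError as err:
        # The process is still alive and can continue after the failure.
        print("recovered:", err)
```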

Step 2. Cluster development.

TBD

Reference: paddlepaddle/Paddle#849