[简体中文](./README.md)|English
# Introducing Propeller
This document introduces Propeller, a high-level PaddlePaddle API for general machine learning. Propeller encapsulates the following actions:
-  training
-  evaluation
-  prediction
-  exporting models for serving
  
Propeller provides the following benefits:

-   You can run Propeller-based models on a local host or in a distributed multi-server environment without changing your model. Furthermore, you can run Propeller-based models on CPUs or GPUs without recoding your model.
-   Propeller simplifies sharing implementations between model developers.
-   Propeller handles routine chores for you (logging, hot-starting, ...).
-   Propeller builds the `Program` and `PyReader` for you.
-   Propeller provides a safe distributed training loop that controls how and when to:
    -   build the Program
    -   initialize variables
    -   create checkpoint files and recover from failures
    -   save visualizable results
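The checkpoint-and-recover behavior listed above can be pictured with a minimal plain-Python sketch. This is not Propeller's implementation: the `train_loop` and `toy_step` functions and the JSON checkpoint file are hypothetical, chosen only to show how a loop that saves its state periodically can resume after a failure instead of starting over.

```python
import json
import os
import tempfile


def train_loop(step_fn, state, num_steps, ckpt_path, save_every=10):
    """Hypothetical loop: run step_fn until num_steps, checkpointing every save_every steps."""
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # recover from the last saved checkpoint
    while state["step"] < num_steps:
        state = step_fn(state)
        state["step"] += 1
        if state["step"] % save_every == 0:
            # write to a temp file, then rename, so a crash mid-write
            # never leaves a half-written checkpoint behind
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, ckpt_path)
    return state


def toy_step(state):
    # stand-in for one optimizer update: accumulate a running total
    state["total"] += state["step"]
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
# the first run stops after 20 steps (think of it as a crash)
train_loop(toy_step, {"step": 0, "total": 0}, num_steps=20, ckpt_path=ckpt)
# the second run picks up at step 20 instead of step 0
print(train_loop(toy_step, {"step": 0, "total": 0}, num_steps=50, ckpt_path=ckpt))
```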

## Install

From the Propeller source directory, run:

```shell
pip install --user .
```

## Getting Started
```python
import propeller  # import path assumed; adjust to your Propeller version

# Embedding, FC, softsign, dot, sigmoid_cross_entropy_with_logits and
# AdamOptimizer stand for the corresponding PaddlePaddle layers/optimizer.

# Define the model
class BowModel(propeller.Model):
    def __init__(self, config, mode):
        self.embedding = Embedding(config['emb_size'], config['vocab_size'])
        self.fc1 = FC(config['hidden_size'])
        self.fc2 = FC(config['hidden_size'])

    def forward(self, features):
        q, t = features
        q_emb = softsign(self.embedding(q))
        t_emb = softsign(self.embedding(t))
        q_emb = self.fc1(q_emb)
        t_emb = self.fc2(t_emb)
        prediction = dot(q_emb, t_emb)
        return prediction

    def loss(self, predictions, label):
        return sigmoid_cross_entropy_with_logits(predictions, label)

    def backward(self, loss):
        opt = AdamOptimizer(1.e-3)
        opt.minimize(loss)

    def metrics(self, predictions, label):
        auc = propeller.metrics.Auc(predictions, label)
        return {'auc': auc}

# Hyperparameters come from files, command-line arguments, or environment variables.
# `args` holds your parsed command-line arguments.
run_config = propeller.parse_runconfig(args)
hparams = propeller.parse_hparam(args)

# Define data
# `FeatureColumns` helps you organize training/evaluation files.
feature_column = propeller.data.FeatureColumns(columns=[
    propeller.data.TextColumn('query', vocab='./vocab'),
    propeller.data.TextColumn('title', vocab='./vocab'),
    propeller.data.LabelColumn('label'),
])
train_ds = feature_column.build_dataset(data_dir='./data', shuffle=True, repeat=True)
eval_ds = feature_column.build_dataset(data_dir='./data', shuffle=False, repeat=False)

# Start training!
propeller.train_and_eval(BowModel, hparams, run_config, train_ds, eval_ds)
```
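To make the math in `BowModel` concrete, the sketch below evaluates the same forward pass and loss on plain Python lists with toy numbers. Several details are assumptions not stated in the example above: token embeddings are summed into one bag-of-words vector before `softsign`, each `FC` layer is a bias-free matrix multiply, and the loss uses the numerically stable form of sigmoid cross-entropy with logits, `max(x, 0) - x*z + log(1 + exp(-|x|))`. All helper names here are hypothetical.

```python
import math


def softsign(v):
    # elementwise x / (1 + |x|)
    return [x / (1.0 + abs(x)) for x in v]


def bow_embed(token_ids, table):
    # bag of words: sum the embedding rows of all tokens
    out = [0.0] * len(table[0])
    for tid in token_ids:
        for j, val in enumerate(table[tid]):
            out[j] += val
    return out


def fc(v, weight):
    # bias-free fully connected layer; weight has shape [len(v)][hidden]
    hidden = len(weight[0])
    return [sum(v[i] * weight[i][k] for i in range(len(v))) for k in range(hidden)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid_ce_with_logits(logit, label):
    # stable form of -[z*log(sigmoid(x)) + (1 - z)*log(1 - sigmoid(x))]
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def bow_forward(q, t, table, w1, w2):
    q_emb = fc(softsign(bow_embed(q, table)), w1)
    t_emb = fc(softsign(bow_embed(t, table)), w2)
    return dot(q_emb, t_emb)


table = [[0.5, -1.0], [1.0, 1.0], [-2.0, 0.0]]  # vocab_size=3, emb_size=2
w1 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, hidden_size=2
w2 = [[1.0, 0.0], [0.0, 1.0]]
logit = bow_forward([0, 1], [1], table, w1, w2)
print(logit, sigmoid_ce_with_logits(logit, 1.0))
```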

## Main Features
1. train_and_eval

    Given a user-defined `propeller.Model` class, `train_and_eval` initializes the model in two modes, TRAIN and EVAL, and then runs training together with evaluation.

2. FeatureColumns
    
    `FeatureColumns` is used to organize training data. With customizable `Column` properties, it can adapt to many ML tasks (NLP, CV, ...).
    `FeatureColumns` also handles preprocessing for you (tokenization, vocabulary lookup, serialization, batching, etc.).


3. Dataset

    `FeatureColumns` generates a `Dataset`; alternatively, you can call `propeller.Dataset.from_generator_func` to build your own `Dataset`.

4. Summary

    To trace a tensor's histogram during training, simply call:
```python
# record the value distribution of `tensor` under the name 'loss'
propeller.summary.histogram('loss', tensor)
```
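The `train_and_eval` workflow (item 1 above) can be sketched in plain Python. This is a hypothetical illustration, not Propeller's code: the `Model` base class mirrors the `forward`/`loss`/`backward`/`metrics` hooks from the Getting Started example, while the `TRAIN`/`EVAL` strings, the toy `LinearModel`, the shared `params` dict, and the lists standing in for datasets are assumptions. It shows one user-defined class being instantiated once per mode, with the loop alternating a training pass and an evaluation pass.

```python
TRAIN, EVAL = "train", "eval"  # hypothetical mode flags


class Model:
    """Hypothetical base class with the same four hooks as `propeller.Model`."""

    def __init__(self, config, mode, params):
        self.config = config
        self.mode = mode
        self.params = params  # shared by the TRAIN and EVAL instances

    def forward(self, features):
        raise NotImplementedError

    def loss(self, predictions, label):
        raise NotImplementedError

    def backward(self, loss):
        raise NotImplementedError

    def metrics(self, predictions, label):
        raise NotImplementedError


class LinearModel(Model):
    """Toy model y = w * x trained with squared error."""

    def forward(self, features):
        self._x = features
        return self.params["w"] * features

    def loss(self, predictions, label):
        self._pred, self._label = predictions, label
        return (predictions - label) ** 2

    def backward(self, loss):
        # `loss` is unused: the gradient of (w*x - y)^2 w.r.t. w is computed
        # from values cached in forward/loss, followed by a plain SGD step
        grad = 2 * (self._pred - self._label) * self._x
        self.params["w"] -= self.config["lr"] * grad

    def metrics(self, predictions, label):
        return {"abs_err": abs(predictions - label)}


def train_and_eval(model_class, config, params, train_data, eval_data, epochs):
    """Build one instance per mode, then alternate training and evaluation."""
    train_model = model_class(config, TRAIN, params)
    eval_model = model_class(config, EVAL, params)
    history = []
    for _ in range(epochs):
        for x, y in train_data:
            pred = train_model.forward(x)
            train_model.backward(train_model.loss(pred, y))
        # evaluation pass: forward + metrics only, no parameter updates
        errs = [eval_model.metrics(eval_model.forward(x), y)["abs_err"]
                for x, y in eval_data]
        history.append(sum(errs) / len(errs))
    return history


params = {"w": 0.0}
history = train_and_eval(LinearModel, {"lr": 0.1}, params,
                         train_data=[(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)],
                         eval_data=[(4.0, 12.0)], epochs=20)
print(params["w"], history[0], history[-1])
```

Sharing one `params` dict between the two instances plays the role that a shared parameter scope plays in a real framework: what the TRAIN instance learns is immediately visible to the EVAL instance.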
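The preprocessing attributed to `FeatureColumns` (item 2 above) can be pictured with the plain-Python sketch below. The `TextColumn` and `LabelColumn` classes here only borrow their names from the Getting Started example; they are hypothetical stand-ins, not Propeller's classes. Assumptions: whitespace tokenization, an in-memory vocabulary with an `[UNK]` fallback, tab-separated input lines, and padding each text field to the longest sequence in its batch with id 0. Serialization is left out.

```python
class TextColumn:
    """Hypothetical text column: whitespace tokenization + vocabulary lookup."""

    def __init__(self, name, vocab, unk="[UNK]"):
        self.name = name
        self.vocab = vocab
        self.unk_id = vocab[unk]

    def process(self, raw):
        return [self.vocab.get(tok, self.unk_id) for tok in raw.split()]


class LabelColumn:
    """Hypothetical label column: parses the raw field as an integer."""

    def __init__(self, name):
        self.name = name

    def process(self, raw):
        return int(raw)


def pad(seqs, pad_id=0):
    # right-pad every sequence to the longest one in the batch
    longest = max(len(s) for s in seqs)
    return [s + [pad_id] * (longest - len(s)) for s in seqs]


def build_batches(columns, lines, batch_size):
    """Parse tab-separated lines column by column, then group them into padded batches."""
    examples = [[col.process(field) for col, field in zip(columns, line.split("\t"))]
                for line in lines]
    batches = []
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        batch = []
        for i, col in enumerate(columns):
            values = [ex[i] for ex in chunk]
            batch.append(pad(values) if isinstance(col, TextColumn) else values)
        batches.append(batch)
    return batches


vocab = {"[PAD]": 0, "[UNK]": 1, "cheap": 2, "flights": 3, "to": 4, "paris": 5}
columns = [TextColumn("query", vocab), TextColumn("title", vocab), LabelColumn("label")]
lines = ["cheap flights\tflights to paris\t1", "paris\tcheap hotels\t0"]
for batch in build_batches(columns, lines, batch_size=2):
    print(batch)
```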
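Item 3 above mentions building a `Dataset` from a generator function. The sketch below imitates that idea in plain Python; the `Dataset` class and its `from_generator_func`, `shuffle`, and `batch` methods here are hypothetical and do not reproduce the signature of `propeller.Dataset.from_generator_func`. The design point it illustrates: the dataset wraps a zero-argument function that returns a fresh generator, so the data can be iterated again every epoch, and transformations such as shuffling and batching wrap that function lazily.

```python
import random


class Dataset:
    """Hypothetical dataset wrapping a zero-argument generator function."""

    def __init__(self, gen_func):
        self._gen_func = gen_func

    @classmethod
    def from_generator_func(cls, gen_func):
        return cls(gen_func)

    def __iter__(self):
        # calling the function again starts a fresh pass over the data
        return iter(self._gen_func())

    def shuffle(self, buffer_size, seed=None):
        def gen():
            rng = random.Random(seed)
            buf = []
            for item in self._gen_func():
                buf.append(item)
                if len(buf) >= buffer_size:
                    # emit a random element once the buffer is full
                    yield buf.pop(rng.randrange(len(buf)))
            rng.shuffle(buf)
            yield from buf
        return Dataset(gen)

    def batch(self, batch_size):
        def gen():
            batch = []
            for item in self._gen_func():
                batch.append(item)
                if len(batch) == batch_size:
                    yield batch
                    batch = []
            if batch:
                yield batch  # last, possibly smaller, batch
        return Dataset(gen)


def read_examples():
    # hypothetical source; in practice this might read and parse data files
    for i in range(5):
        yield (i, i % 2)


ds = Dataset.from_generator_func(read_examples).shuffle(buffer_size=3, seed=0).batch(2)
for epoch in range(2):
    print(epoch, list(ds))
```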


## Contributing

1. This project is in the alpha stage, and all contributions are welcome. Feel free to create a PR.