Committed by Dong Zhihong.
## Evaluator Design

### The Problem

During training or serving, we provide an evaluation function to measure model performance, e.g., accuracy or precision. In an operator-based framework design, data flows through the network pipeline batch by batch. As a result, an operator can only compute the metrics of a single mini-batch. We need a mechanism to compute the metrics over every N passes or mini-batches that the user specifies.
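A small worked example shows why per-batch results cannot simply be averaged: when mini-batches differ in size, the mean of the per-batch accuracies differs from the accuracy over all samples. The predictions and labels below are made-up numbers chosen only for illustration, and `batch_accuracy` is a hypothetical helper standing in for one run of an accuracy operator.

```python
# Hypothetical predictions and labels for two mini-batches of different sizes.
batches = [
    ([1], [1]),              # batch 1: 1 of 1 correct
    ([0, 1, 1], [0, 0, 0]),  # batch 2: 1 of 3 correct
]


def batch_accuracy(preds, labels):
    """Accuracy of a single mini-batch, as one operator run would compute it."""
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct / len(labels)


# Naive approach: average the per-batch results.
mean_of_batches = sum(batch_accuracy(p, l) for p, l in batches) / len(batches)

# Accumulate the raw counts (the metric state) and divide only once.
correct = sum(sum(p == l for p, l in zip(ps, ls)) for ps, ls in batches)
total = sum(len(ls) for _, ls in batches)
overall = correct / total

print(round(mean_of_batches, 4), overall)  # 0.6667 0.5
```

This is why the design keeps counts as metric states and only turns them into a metric at evaluation time.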

### Evaluator Design
Currently, every operation is expressed in the graph, so we divide the evaluation process into three steps.

1. Initialize the metric state and add it to the block.

2. Calculate the statistics of the metric state in every mini-batch. A single operator is only responsible for calculating the necessary statistics for one mini-batch. For example, one run of the accuracy operator only computes the accuracy of one mini-batch.


3. Merge the mini-batch statistics to form the evaluation result over multiple mini-batches. In distributed or multi-GPU training, also aggregate the values from the different devices.
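The three steps can be sketched in plain Python for an accuracy metric. This is a minimal sketch that assumes each device produces a `(correct, total)` pair per mini-batch; the functions `init_state`, `batch_statistics`, `merge` and `accuracy` are hypothetical stand-ins for the graph operators, not an existing API.

```python
from typing import Dict, List, Tuple


def init_state() -> Dict[str, int]:
    # Step 1: create the metric state (two counters for accuracy).
    return {"correct": 0, "total": 0}


def batch_statistics(preds: List[int], labels: List[int]) -> Tuple[int, int]:
    # Step 2: statistics for a single mini-batch only.
    correct = sum(int(p == l) for p, l in zip(preds, labels))
    return correct, len(labels)


def merge(state: Dict[str, int], per_device: List[Tuple[int, int]]) -> None:
    # Step 3: fold the statistics from every device into the shared state.
    for correct, total in per_device:
        state["correct"] += correct
        state["total"] += total


def accuracy(state: Dict[str, int]) -> float:
    # Turn the accumulated counts into the metric; guard against empty state.
    return state["correct"] / state["total"] if state["total"] else 0.0


state = init_state()
# Two mini-batches, each split across two hypothetical devices.
for device_batches in [
    [([1, 0], [1, 1]), ([1, 1], [1, 1])],
    [([0, 0], [0, 1]), ([1, 0], [0, 0])],
]:
    merge(state, [batch_statistics(p, l) for p, l in device_batches])

print(state, accuracy(state))  # {'correct': 5, 'total': 8} 0.625
```

Merging raw counts rather than per-device accuracies keeps the result exact regardless of how samples are split across devices.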

### Implementation
This design is expressed in the Python API. Each metric operator needs to calculate the metric statistics and return the batch-aware states; the Python side is responsible for accumulating the states for each pass.

```python
class Evaluator(object):
    """
    Evaluator base class.
    """

    def __init__(self, name, **kwargs):
        """
        Different evaluators may have different metric states. E.g., Accuracy
        needs two variables: the total and the correct sample counts. Auc needs
        four variables: `true_positives`, `true_negatives`, `false_positives`
        and `false_negatives`. So every evaluator should create the variables
        it needs and append them to the main_program.

        The initialization of an Evaluator is responsible for creating the
        metric states and appending them to the main_program.
        """
        pass

    def _update_ops(self, input, label, **kwargs):
        """
        Add the operators that calculate the mini-batch statistics to the
        main_program, plus increment operators that accumulate them into the
        metric states.
        """
        # Implemented by each concrete evaluator.
        raise NotImplementedError()

    def reset(self, executor, program=None):
        """
        Reset the metric states at the beginning of each pass or after a
        user-specified number of batches, by executing the reset_program.
        """
        raise NotImplementedError()

    def eval(self, executor, program=None):
        """
        Merge the mini-batch statistics to form the evaluation result for
        multiple mini-batches: execute the eval_program and return the result.
        """
        raise NotImplementedError()
```
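The interface above can be illustrated with a concrete evaluator. The sketch below is hypothetical: `PrecisionEvaluator` and its `update` method are illustrative names, and it keeps the metric states (true and false positive counts) as Python integers and runs eagerly, instead of adding operators to `main_program` and executing `reset_program`/`eval_program` through an executor. It shows the intended call pattern: reset at the start of each pass, update once per mini-batch, evaluate at the end of the pass.

```python
class PrecisionEvaluator(object):
    """Hypothetical eager stand-in for an Evaluator subclass.

    Metric states are plain ints rather than variables in main_program,
    and no executor is involved.
    """

    def __init__(self, name="precision"):
        self.name = name
        self.reset()

    def reset(self):
        # Metric states: the counts needed to compute precision.
        self.true_positives = 0
        self.false_positives = 0

    def update(self, preds, labels):
        # Compute this mini-batch's statistics and add them to the states.
        for p, l in zip(preds, labels):
            if p == 1:
                if l == 1:
                    self.true_positives += 1
                else:
                    self.false_positives += 1

    def eval(self):
        # Precision = TP / (TP + FP), guarded against an empty state.
        predicted_positive = self.true_positives + self.false_positives
        if predicted_positive == 0:
            return 0.0
        return self.true_positives / predicted_positive


evaluator = PrecisionEvaluator()
passes = [
    [([1, 1, 0], [1, 0, 0]), ([1, 0], [1, 1])],  # pass 1: two mini-batches
    [([1, 1], [1, 1])],                          # pass 2: one mini-batch
]
results = []
for batches in passes:
    evaluator.reset()  # start every pass from zeroed states
    for preds, labels in batches:
        evaluator.update(preds, labels)
    results.append(evaluator.eval())

print(results)  # [0.6666666666666666, 1.0]
```

Because the states are reset at the start of each pass, the pass-2 result does not depend on anything accumulated during pass 1.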