# VisualDL user guide

## Overview

VisualDL is a toolkit for visualizing data generated in deep learning tasks. VisualDL uses [ECharts](https://echarts.apache.org/en/feature.html) to display the distribution and trend of data, so that users can view data more clearly and intuitively.

To help users analyze the characteristics of data, detect errors, and optimize neural network models, VisualDL provides seven functional components: scalar, histogram, image, text, audio, high dimensional, and graph.

| Component name | Display chart | Function of component |
|:----:|:----:|:---|
|<a href="#1">scalar</a>| Line Chart | Dynamically display scalar data, such as loss, accuracy, etc.|
|<a href="#2">histogram</a>| Histogram | Dynamically display the numerical distribution and trend of parameters (such as weight matrices, biases, gradients, etc.)|
|<a href="#3">image</a>| Image | Dynamically display images, including input images and convolution results, making it convenient to view how intermediate results change|
|<a href="#4">text</a>| Text | Dynamically display text |
|<a href="#5">audio</a>| Audio | Dynamically display audio; users can play it directly or download it|
|<a href="#6">high dimensional</a>| Coordinate | Map high dimensional data into 2D/3D space, making it easy to observe the correlation between different data|
|<a href="#7">graph</a>| Directed Graph | Display the neural networks |

## Toolkits of adding data

The six components (scalar, histogram, image, text, audio and high dimensional) are used to add data while the program is running. The LogWriter class must be initialized before adding data, in order to set the storage path and synchronization cycle. The input parameters of each component are saved to a log file on disk, and the log file is then loaded into the frontend for display.  

### LogWriter

LogWriter is a Python wrapper to write data to log file with the data format defined as in protobuf file [storage.proto](https://github.com/PaddlePaddle/VisualDL/blob/develop/visualdl/storage/storage.proto).

The definition of LogWriter :

```python
class LogWriter(dir, sync_cycle)
```

> :param dir : the directory path to the saved log files.  
> :param sync_cycle : how often the system stores data to the file system; data is saved once the number of operations reaches sync_cycle.  
> :return: a new LogWriter instance.  

Demo 1.  Create a LogWriter instance

```python
# Create a LogWriter instance named log_writer
log_writer = LogWriter("./log", sync_cycle=10)
```
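
To make the role of `sync_cycle` more concrete, the following is a minimal sketch of a writer that buffers records in memory and flushes them once the number of operations reaches `sync_cycle`. The class `ToyBufferedWriter` and its attributes are hypothetical and exist only for illustration; this is not how VisualDL itself is implemented.

```python
class ToyBufferedWriter:
    """Hypothetical stand-in that mimics the sync_cycle idea; not VisualDL code."""

    def __init__(self, sync_cycle):
        self.sync_cycle = sync_cycle
        self.pending = []      # records still held in memory
        self.flushed = []      # records treated as "written to disk"
        self.flush_count = 0

    def add_record(self, step, value):
        self.pending.append((step, value))
        # Flush once the number of buffered operations reaches sync_cycle
        if len(self.pending) >= self.sync_cycle:
            self.flush()

    def flush(self):
        self.flushed.extend(self.pending)
        self.pending.clear()
        self.flush_count += 1


writer = ToyBufferedWriter(sync_cycle=10)
for step in range(25):
    writer.add_record(step, step * 0.1)

print(writer.flush_count, len(writer.flushed), len(writer.pending))
```

In this toy model, 25 operations with `sync_cycle=10` trigger two flushes, and the last five records stay in memory. A smaller `sync_cycle` writes more often; a larger one batches more records per write.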

The LogWriter class includes the following member functions:

* `mode()`  
* `scalar()`, `histogram()`, `image()`, `text()`, `audio()`, `embedding()`  

The member function `mode()` specifies the phase of the running program. The mode name is a user-defined string, such as `train`, `validation`, `test`, or `conv_layer1`. Components with the same mode are grouped together, so users can choose which modes to display on the frontend webpage.

The member functions `scalar()`, `histogram()`, `image()`, `text()`, `audio()` and `embedding()` are used to create component instances.

Demo 2. Use LogWriter instance to create component instance

```python
# Set the name of mode to "train", and create a scalar component instance
with log_writer.mode("train") as logger:
    train_scalar = logger.scalar("acc")

# Set the name of mode to "test", and create an image component instance
with log_writer.mode("test") as shower:
    test_image = shower.image("conv_image", 10, 1)
```
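
The grouping by mode can be pictured as a two-level lookup: first by mode, then by tag. The sketch below uses a plain in-memory dictionary and a hypothetical helper `add_record(mode, tag, step, value)` to show why the same tag (for example `acc`) can be used under both `train` and `test` without the records mixing. It only illustrates the idea and is not VisualDL's storage format.

```python
from collections import defaultdict

# Hypothetical in-memory store: mode -> tag -> list of (step, value)
records = defaultdict(lambda: defaultdict(list))


def add_record(mode, tag, step, value):
    """Append one record under the given mode and tag."""
    records[mode][tag].append((step, value))


# The same tag "acc" is used in two different modes
add_record("train", "acc", 0, 0.50)
add_record("train", "acc", 1, 0.62)
add_record("test", "acc", 0, 0.48)

train_acc = records["train"]["acc"]
test_acc = records["test"]["acc"]
print(sorted(records), train_acc, test_acc)
```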

### scalar -- component to draw line charts

The <a name="1">scalar</a> component is used to draw line charts. When scalar data such as loss or accuracy is recorded through the scalar component, the frontend webpage displays it as line charts, which helps users follow the trend of the training process.

To use the scalar component, first call the member function `scalar()` of a LogWriter instance to create a ScalarWriter instance, then add data through its member function `add_record()`.

* The member function `scalar()` of LogWriter instance :  

```python
def scalar(tag, type)  
```  

> :param tag : The scalar writer will label the data with tag.  
> :param type : Data type. Must be one of "float", "double", or "int"; the default is "float".  
> :return : A ScalarWriter instance to handle step and value records.  

* The member function `add_record()` of ScalarWriter instance :  

```python
def add_record(step, value)  
```

> :param step : Step number.  
> :param value : Input data.  

Demo 3. scalar demo program [Github](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/component/scalar-demo.py)

```python
# coding=utf-8
from visualdl import LogWriter

# Create a LogWriter instance
log_writer = LogWriter("./log", sync_cycle=20)

# Create two ScalarWriter instances, whose mode is set to be "train"
with log_writer.mode("train") as logger:
    train_acc = logger.scalar("acc")
    train_loss = logger.scalar("loss")

# Create a ScalarWriter instance, whose mode is set to be "test"
with log_writer.mode("test") as logger:
    test_acc = logger.scalar("acc")

value = [i/1000.0 for i in range(1000)]
for step in range(1000):
    # Add data
    train_acc.add_record(step, value[step])
    train_loss.add_record(step, 1 / (value[step] + 1))
    test_acc.add_record(step, 1 - value[step])
```

After running the demo program above, you can start the flask server with command ``visualdl`` :

```shell
visualdl --logdir ./log --host 0.0.0.0 --port 8080
```
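
If the server is started from a Python script rather than from a terminal, the same command can be assembled as an argument list. The helper name `build_visualdl_command` is hypothetical; it uses only the `--logdir`, `--host` and `--port` options shown above, and the actual launch is left as a comment because it requires VisualDL to be installed.

```python
import shlex


def build_visualdl_command(logdir, host="0.0.0.0", port=8080):
    """Return the argument list for the visualdl command shown above."""
    return ["visualdl", "--logdir", logdir, "--host", host, "--port", str(port)]


command = build_visualdl_command("./log")
print(shlex.join(command))

# To launch the server for real (requires visualdl to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```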

Open [http://0.0.0.0:8080](http://0.0.0.0:8080) in your browser to see the interface below.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/scalar-interface.png" width=800><br/>
Figure 1. scalar component displays line charts <br/>
</p>

The right sidebar of VisualDL has adjustment options for each component. Taking the scalar component as an example:

* Smoothing : Adjusts the smoothness of the line charts.  
* X-axis : The horizontal axis of the line charts. Options: Step, Relative, Wall Time.  
* Tooltip sorting : The sorting method for tags. Options: default, descending, ascending, nearest.  
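
This guide does not state which algorithm the Smoothing option uses. A common choice for this kind of slider is an exponential moving average, so the sketch below assumes that technique purely for illustration. The function name `ema_smooth` and the parameter `weight` are hypothetical.

```python
def ema_smooth(values, weight):
    """Exponential moving average; weight in [0, 1), higher means smoother.

    Assumed technique for illustration only, not VisualDL's confirmed algorithm.
    """
    smoothed = []
    last = values[0] if values else 0.0
    for value in values:
        # Blend the previous smoothed point with the new raw point
        last = weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed


raw = [0.0, 10.0, 0.0, 10.0]
print(ema_smooth(raw, 0.5))
print(ema_smooth(raw, 0.0))  # weight 0 leaves the curve unchanged
```

With a weight of 0 the curve is returned unchanged; as the weight approaches 1, each point depends more on its predecessors and the line becomes flatter.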

There is also a ``RUNNING`` button at the bottom of the right sidebar. While it shows ``RUNNING``, the frontend webpage keeps sending requests to the flask server to synchronize data. Switching it to ``Stopped`` pauses the data update.  

### histogram -- component to display data distribution

The <a name="2">histogram</a> component is used to draw histograms that display the distribution of input data. When model parameters such as weight matrices, biases, and gradients are recorded through the histogram component, the frontend webpage displays them as histograms, which helps users see how the distribution of the parameters changes.

To use the histogram component, first call the member function `histogram()` of a LogWriter instance to create a HistogramWriter instance, then add data through its member function `add_record()`.

* The member function histogram() of LogWriter instance :

```python
def histogram(tag, num_buckets, type)  
```

> :param tag : The histogram writer will label the data with tag.  
> :param num_buckets : The number of buckets (bars) in the histogram.  
> :param type : Data type. Must be one of "float", "double", or "int"; the default is "float".  
> :return : A HistogramWriter instance to record distribution.  

* The member function add_record() of HistogramWriter instance :

```python
def add_record(step, value)  
```

> :param step : Step number.  
> :param value : Input data, of type list.  
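
To see what `num_buckets` means for the data, the sketch below buckets a sample with `numpy.histogram`. It only illustrates bucketing in general; this guide does not describe how VisualDL computes its buckets internally, so that part is an assumption.

```python
import numpy as np

# Sample data similar in spirit to a layer's parameters at one step
rng = np.random.default_rng(seed=0)
data = rng.uniform(1.0, 6.0, size=10000)

num_buckets = 100
# counts[i] is the number of values that fall between edges[i] and edges[i + 1]
counts, edges = np.histogram(data, bins=num_buckets)

print(len(counts), len(edges), counts.sum())
```

A histogram with `num_buckets` buckets has `num_buckets + 1` edges, and every input value lands in exactly one bucket.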

Demo 4. histogram demo program [Github](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/component/histogram-demo.py)

```python
# coding=utf-8
import numpy as np
from visualdl import LogWriter

# Create a LogWriter instance
log_writer = LogWriter('./log', sync_cycle=10)

# Create a HistogramWriter instance, whose mode is set to be "train"
with log_writer.mode("train") as logger:
    param1_histogram = logger.histogram("param1", num_buckets=100)

# Loop
for step in range(1, 101):
    # Create input data
    interval_start = 1 + 2 * step/100.0
    interval_end = 6 - 2 * step/100.0
    data = np.random.uniform(interval_start, interval_end, size=(10000))

    # Use member function add_record() to add data
    param1_histogram.add_record(step, data)
```

After running the demo program above, you can start the flask server with command ``visualdl`` :

```shell
visualdl --logdir ./log --host 0.0.0.0 --port 8080
```

Open [http://0.0.0.0:8080](http://0.0.0.0:8080) in your browser to see the interface below.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/histogram-interface.png" width=800><br/>
Figure 2. histogram component displays histograms <br/>
</p>

### image -- component to display image

The <a name="3">image</a> component is used to visualize the image data. By passing the image data (type numpy.ndarray) into the image() function, the frontend webpage will display the image directly.

To use the image component, first call the member function `image()` of a LogWriter instance to create an ImageWriter instance. Then add data through its member functions `start_sampling()`, `is_sample_taken()`, `set_sample()`, and `finish_sampling()`.

* The member function image() of LogWriter instance :

```python
def image(tag, num_samples, step_cycle)  
```

> :param tag : The image writer will label the image with tag.  
> :param num_samples : The number of samples to take in a step.  
> :param step_cycle : Store a record every `step_cycle` steps; the default value is 1.  
> :return:  An ImageWriter instance to sample images.  

* Start a new sampling cycle, allocate memory space for the sampled data

```python
def start_sampling()
```

* Determine whether the picture should be sampled. A return value of -1 means the picture is not sampled; any other return value is the index to pass to `set_sample()` :

```python
def is_sample_taken()
```

* Add image data :

```python
def set_sample(index, image_shape, image_data)  
```  

> :param index : Combined with tag, used to determine the sub-frame of the image display.  
> :param image_shape : The shape of the image, [width, height, channel (3 for RGB, 1 for grayscale)].  
> :param image_data : Image data with type numpy.ndarray, member function flatten() can turn the shape to row vector.  

* End the current sampling period, write the sampled data to disk, and release the memory space :

```python
def finish_sampling()  
```
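
The call order of the four sampling functions can be tried out without VisualDL by using a small stand-in. The class `ToySampleWriter` below is hypothetical: its rule for accepting samples (take samples until every slot is full) is an assumption made for illustration, and the real ImageWriter may choose samples differently. The last call is named `finish_sampling()` here, as in Demo 5.

```python
import numpy as np


class ToySampleWriter:
    """Hypothetical stand-in for the sampling call order; not VisualDL code."""

    def __init__(self, num_samples):
        self.num_samples = num_samples
        self.slots = None      # memory for the current sampling cycle
        self.taken = 0
        self.records = []      # finished cycles, as if written to disk

    def start_sampling(self):
        self.slots = [None] * self.num_samples
        self.taken = 0

    def is_sample_taken(self):
        # Toy rule (an assumption): accept samples until every slot is full
        if self.taken < self.num_samples:
            return self.taken
        return -1

    def set_sample(self, index, image_shape, image_data):
        self.slots[index] = (tuple(image_shape), np.asarray(image_data))
        self.taken += 1

    def finish_sampling(self):
        self.records.append(self.slots)
        self.slots = None


writer = ToySampleWriter(num_samples=2)
writer.start_sampling()
for _ in range(3):
    idx = writer.is_sample_taken()
    if idx != -1:
        image = np.zeros((100, 100, 3), dtype=np.uint8)
        # flatten() turns the 3D array into a row vector of 100 * 100 * 3 values
        writer.set_sample(idx, image.shape, image.flatten())
writer.finish_sampling()

print(len(writer.records), len(writer.records[0]))
```

The shape is passed separately because the flattened row vector no longer carries it; with the shape, the original array can be restored with `reshape()`.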

Demo 5. image demo program [Github](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/component/image-demo.py)

```python
# coding=utf-8
import numpy as np
from visualdl import LogWriter
from PIL import Image


def random_crop(img):
    '''
    This function is used to get a random block (100*100 pixels) of data img.
    '''
    img = Image.open(img)
    w, h = img.size
    random_w = np.random.randint(0, w - 100)
    random_h = np.random.randint(0, h - 100)
    return img.crop((random_w, random_h, random_w + 100, random_h + 100))


# Create a LogWriter instance
log_writer = LogWriter("./log", sync_cycle=10)

# Create an ImageWriter instance
ns = 2
with log_writer.mode("train") as logger:
    input_image = logger.image(tag="test", num_samples=ns)

# The variable sample_num is used to record the number of image data that have been sampled
sample_num = 0

for step in range(6):
    # Set the condition of start_sampling()
    if sample_num == 0:
        input_image.start_sampling()

    idx = input_image.is_sample_taken()
    # if idx != -1,sample this data, otherwise skip
    if idx != -1:
        # Get image data
        image_path = "test.jpg"
        image_data = np.array(random_crop(image_path))

        # Add data
        input_image.set_sample(idx, image_data.shape, image_data.flatten())
        sample_num += 1

        # If sampling for the current period is complete, call finish_sampling()
        if sample_num % ns == 0:
            input_image.finish_sampling()
            sample_num = 0
```

After running the demo program above, you can start the flask server with command ``visualdl`` :

```shell
visualdl --logdir ./log --host 0.0.0.0 --port 8080
```

Open [http://0.0.0.0:8080](http://0.0.0.0:8080) in your browser, then click the ``SAMPLES`` option at the top of the webpage to see the interface below.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/image-interface.png" width=800><br/>
Figure 3. image component displays images <br/>
</p>

Each subgraph has a horizontal axis which can be dragged to display images of different steps.

### text -- component to display text

The <a name="4">text</a> component is used to visualize text data. When text data (type string) is recorded through the text component, the frontend webpage displays the text directly.

To use the text component, first call the member function `text()` of a LogWriter instance to create a TextWriter instance, then add data through its member function `add_record()`.

* The member function text() of LogWriter instance :  

```python
def text(tag)
```

> :param tag : The text writer will label the text with tag.  

* The member function add_record() of TextWriter instance :  

```python
def add_record(step, str)
```

> :param step : Step number.  
> :param str : Input data, type is string.  

Demo 6. text demo program [Github](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/component/text-demo.py)

```python
# coding=utf-8
from visualdl import LogWriter

# Create a LogWriter instance
log_writer = LogWriter("./log", sync_cycle=10)

# Create a TextWriter instance
with log_writer.mode("train") as logger:
    vdl_text_comp = logger.text(tag="test")

# Use member function add_record() to add data
for i in range(1, 6):
    vdl_text_comp.add_record(i, "这是第 %d 个 step 的数据。" % i)
    vdl_text_comp.add_record(i, "This is data %d ." % i)
```

After running the demo program above, you can start the flask server with command ``visualdl`` :

```shell
visualdl --logdir ./log --host 0.0.0.0 --port 8080
```

Open [http://0.0.0.0:8080](http://0.0.0.0:8080) in your browser, then click the ``SAMPLES`` option at the top of the webpage to see the interface below.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/text-interface.png" width=800><br/>
Figure 4. text component displays texts  <br/>
</p>

Each subgraph has a horizontal axis which can be dragged to display text of different steps.

### audio -- component to play audio

The <a name="5">audio</a> component is used to play audio. When audio data (type numpy.ndarray) is recorded through the audio component, users can play the audio directly or download it.  

To use the audio component, first call the member function `audio()` of a LogWriter instance to create an AudioWriter instance. Then add data through its member functions `start_sampling()`, `is_sample_taken()`, `set_sample()`, and `finish_sampling()`.

* The member function audio() of LogWriter instance :

```python  
def audio(tag, num_samples, step_cycle)  
```

> :param tag : The audio writer will label the audio with tag.  
> :param num_samples : The number of samples to take in a step.  
> :param step_cycle : Store a record every `step_cycle` steps; the default value is 1.  
> :return:  An AudioWriter instance to sample audio.  

* Start a new sampling cycle, allocate memory space for the sampled data :

```python
def start_sampling()
```

* Determine whether the audio should be sampled. A return value of -1 means the audio is not sampled; any other return value is the index to pass to `set_sample()` :

```python
def is_sample_taken()
```

* Add audio data :

```python
def set_sample(index, audio_params, audio_data)
```

> :param index : Combined with tag, used to determine the sub-frame of the audio.  
> :param audio_params : The parameters of audio, [sample rate, sample width, channels].  
> :param audio_data : Audio data with type numpy.ndarray, member function flatten() can turn the shape to row vector.  

* End the current sampling period, write the sampled data to disk, and release the memory space :

```python
def finish_sampling()  
```
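
The three values in `audio_params` determine how many bytes one second of audio occupies. The sketch below works through that arithmetic, assuming the sample width is given in bytes (so 2 corresponds to 16-bit audio, as in the comment in Demo 7). The function name `audio_duration_seconds` is hypothetical.

```python
def audio_duration_seconds(num_bytes, audio_params):
    """Duration implied by [sample rate, sample width, channels].

    Assumes the sample width is in bytes (2 means 16-bit audio).
    """
    sample_rate, sample_width, channels = audio_params
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second


# 8 kHz sample rate, 2 bytes per sample, 1 channel
params = [8000, 2, 1]
duration = audio_duration_seconds(32000, params)
print(duration)
```

With these parameters one second of audio takes 8000 * 2 * 1 = 16000 bytes, so 32000 bytes correspond to two seconds.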

Demo 7. audio demo program [Github](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/component/audio-demo.py)

```python
# coding=utf-8
import numpy as np
import wave
from visualdl import LogWriter


def read_audio_data(audio_path):
    """
    Read audio data
    """
    CHUNK = 4096
    f = wave.open(audio_path, "rb")
    wavdata = []
    chunk = f.readframes(CHUNK)

    while chunk:
        # np.frombuffer replaces the deprecated np.fromstring for binary data
        data = np.frombuffer(chunk, dtype='uint8')
        wavdata.extend(data)
        chunk = f.readframes(CHUNK)

    # 8k sample rate, 16bit frame, 1 channel
    shape = [8000, 2, 1]

    return shape, wavdata


# Create a LogWriter instance
log_writer = LogWriter("./log", sync_cycle=10)

# Create an AudioWriter instance
ns = 2
with log_writer.mode("train") as logger:
    input_audio = logger.audio(tag="test", num_samples=ns)

# The variable audio_sample_num is used to record the number of audio data that have been sampled
audio_sample_num = 0

for step in range(9):
    # Set the condition of start_sampling()
    if audio_sample_num == 0:
        input_audio.start_sampling()

    # Get idx
    idx = input_audio.is_sample_taken()
    # if idx != -1,sample this data, otherwise skip
    if idx != -1:
        # Read audio data
        audio_path = "test.wav"
        audio_shape, audio_data = read_audio_data(audio_path)

        # Add data through member function set_sample()
        input_audio.set_sample(idx, audio_shape, audio_data)
        audio_sample_num += 1

        # If sampling for the current period is complete, call finish_sampling()
        if audio_sample_num % ns == 0:
            input_audio.finish_sampling()
            audio_sample_num = 0
```

After running the demo program above, you can start the flask server with command ``visualdl`` :

```shell
visualdl --logdir ./log --host 0.0.0.0 --port 8080
```

Open [http://0.0.0.0:8080](http://0.0.0.0:8080) in your browser, then click the ``SAMPLES`` option at the top of the webpage to see the interface below.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/audio-interface.png" width=800><br/>
Figure 5. audio component plays audio <br/>
</p>

Each subgraph has a horizontal axis which can be dragged to play audio of different steps.

### high dimensional -- component of dimensionality reduction

The role of <a name="6">high dimensional</a> component is to map data into 2D or 3D space for embedding visualization, which is helpful for users to understand the relevance of different data.

The high dimensional component supports the following two dimensionality reduction algorithms :

* PCA    : Principal Component Analysis  
* [t-SNE](https://lvdmaaten.github.io/tsne/)  : t-distributed stochastic neighbor embedding  
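
As an illustration of what dimensionality reduction does, the following is a minimal PCA sketch with NumPy that projects 3D vectors onto their first two principal components. It shows the general technique only; the function name `pca_project` is hypothetical, and this is not the implementation used by VisualDL.

```python
import numpy as np


def pca_project(data, n_components=2):
    """Project rows of data onto the top principal components."""
    data = np.asarray(data, dtype=float)
    # Center each feature so the components describe variance around the mean
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(seed=0)
vectors = rng.uniform(1, 2, size=(10, 3))
points_2d = pca_project(vectors, n_components=2)

print(points_2d.shape)
```

The first output column carries at least as much variance as the second, which is why plotting the first two or three components preserves as much of the spread in the data as a linear projection can.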

To use the high dimensional component, first call the member function `embedding()` of a LogWriter instance to create an EmbeddingWriter instance. Then add data through its member function `add_embeddings_with_word_dict()`.


* The member function embedding() of LogWriter instance

```python
def embedding()  
```

* The member function add_embeddings_with_word_dict() of EmbeddingWriter instance :

```python
def add_embeddings_with_word_dict(data, Dict)  
```

> :param data : Input data, type List[List(float)].  
> :param Dict : Dictionary, type Dict[str, int].  

Demo 8. high dimensional demo program [Github](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/component/embedding-demo.py)

```python
# coding=utf-8
import numpy as np
from visualdl import LogWriter

# Create a LogWriter instance
log_writer = LogWriter("./log", sync_cycle=10)

# Create an EmbeddingWriter instance
with log_writer.mode("train") as logger:
    train_embedding = logger.embedding()

# Initialize data List[List(float)]  
hot_vectors = np.random.uniform(1, 2, size=(10, 3))  
word_dict = {
    "label_1": 5,
    "label_2": 4,
    "label_3": 3,
    "label_4": 2,
    "label_5": 1,}

# Add data through member function add_embeddings_with_word_dict(data, Dict)
train_embedding.add_embeddings_with_word_dict(hot_vectors, word_dict)
```

After running the demo program above, you can start the flask server with command ``visualdl`` :

```shell
visualdl --logdir ./log --host 0.0.0.0 --port 8080
```

Open [http://0.0.0.0:8080](http://0.0.0.0:8080) in your browser, then click the ``HIGHDIMENSIONAL`` option at the top of the webpage to see the interface below.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/embedding-2D.png" width=800><br/>
Figure 6. high dimensional component displays plane coordinates <br/>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/embedding-3D.png" width=800><br/>
Figure 7. high dimensional component displays 3D Cartesian coordinates <br/>
</p>

## graph -- component to visualize neural network

The <a name="7">graph</a> component is used to visualize neural networks. It can display models in Paddle format or [ONNX](https://onnx.ai) format. The graph component helps users understand the model structure of a neural network and troubleshoot configuration errors.

Unlike the other components, which need recorded data, the graph component has only one prerequisite: the storage path of the model file. Add the option `--model_pb` to the ``visualdl`` command to specify that path, and the corresponding neural network will be shown on the frontend webpage.

Demo 9. graph demo program (how to save a LeNet-5 model with Paddle) [Github](https://github.com/PaddlePaddle/VisualDL/blob/develop/demo/component/graph-demo.py)

```python
# coding=utf-8
import paddle.fluid as fluid


def lenet_5(img):
    '''
    Define the Lenet-5 model
    '''
    conv1 = fluid.nets.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        pool_size=2,
        pool_stride=2,
        act="relu")

    conv1_bn = fluid.layers.batch_norm(input=conv1)

    conv2 = fluid.nets.simple_img_conv_pool(
        input=conv1_bn,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")

    predition = fluid.layers.fc(input=conv2, size=10, act="softmax")
    return predition


# Variable assignment
image = fluid.layers.data(name="img", shape=[1, 28, 28], dtype="float32")
predition = lenet_5(image)

place = fluid.CPUPlace()
exe = fluid.Executor(place=place)
exe.run(fluid.default_startup_program())

# save the result to "./paddle_lenet_5_model"
fluid.io.save_inference_model(
    "./paddle_lenet_5_model",
    feeded_var_names=[image.name],
    target_vars=[predition],
    executor=exe)
```

After running the demo program above, you can start the flask server with command ``visualdl`` :  

```shell
visualdl --logdir ./log --host 0.0.0.0 --port 8080 --model_pb paddle_lenet_5_model
```

Open [http://0.0.0.0:8080](http://0.0.0.0:8080) in your browser, then click the `GRAPHS` option at the top of the webpage to see the interface below.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/demo/component/usage-interface/graph.png" width=800><br/>
Figure 8. graph component displays the model structure of Lenet-5 <br/>
</p>