Add high dimensional into vdl_create_log, update README and screenshots (#470)

66c979ef · Nicky Chan · daminglu · edec7564 · 66c979ef · 66c979ef
7 changed files
--- a/demo/vdl_create_scratch_log
+++ b/demo/vdl_create_scratch_log
@@ -149,6 +149,54 @@ with logw.mode("train") as logger:
        # Add a new text record to the log writer
        text.add_record(i, str)

+
+# Create embeddings
+with logw.mode("train") as logger:
+    embedding = logger.embedding()
+
+    hot_vectors = [
+        [10.0, 8.04, 2],
+        [8.0, 6.95, 2],
+        [13.0, 7.58, 2],
+        [9.0, 8.81, 3],
+        [11.0, 8.33, 4],
+        [14.0, 9.96, 5],
+        [6.0, 7.24, 6],
+        [4.0, 4.26, 7],
+        [12.0, 10.84, 8],
+        [7.0, 4.8, 1],
+        [5.0, 5.68, 2]
+    ]
+
+    labels = [
+        "yellow",
+        "blue",
+        "red",
+        "king",
+        "queen",
+        "man",
+        "women",
+        "kid",
+        "adult",
+        "light",
+        "dark"
+    ]
+    word_dict = {
+        "yellow": 1,
+        "blue": 2,
+        "red": 3,
+        "king": 4,
+        "queen": 5,
+        "man": 6,
+        "women": 7,
+        "kid": 8,
+        "adult": 9,
+        "light": 10,
+        "dark": 0
+    }
+    embedding.add_embeddings_with_word_dict(hot_vectors, word_dict)
+
+
 def download_graph_image():
    '''
    This is a scratch demo; it does not generate an ONNX proto, it just downloads an image

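Note on the demo above: the diff does not show how `add_embeddings_with_word_dict` interprets the integers in `word_dict`. The sketch below assumes each value is a row index into `hot_vectors`; that is an assumption, not something this change confirms. `pair_words_with_vectors` is a hypothetical helper and is not part of VisualDL. It uses a three-entry subset of the demo data.

```python
# Hypothetical helper, not part of VisualDL: pair each word with a vector row,
# ASSUMING the integer stored in word_dict is a row index into the vector list.
def pair_words_with_vectors(vectors, word_dict):
    indices = sorted(word_dict.values())
    # Every row must be claimed by exactly one word, otherwise the pairing is ambiguous.
    if indices != list(range(len(vectors))):
        raise ValueError("word_dict indices must cover each row exactly once")
    return {word: vectors[index] for word, index in word_dict.items()}


# Subset of the demo data from the diff above.
hot_vectors = [[10.0, 8.04, 2], [8.0, 6.95, 2], [13.0, 7.58, 2]]
word_dict = {"yellow": 1, "blue": 2, "dark": 0}

pairs = pair_words_with_vectors(hot_vectors, word_dict)
for word, vector in pairs.items():
    print(word, vector)
```

Under that assumption, "dark" is paired with the first row and "yellow" with the second, so the pairing would not follow the order of the `labels` list in the demo.
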
--- a/docs/getting_started/high_dimensional_2d.png
+++ b/docs/getting_started/high_dimensional_2d.png
--- a/docs/getting_started/high_dimensional_3d.png
+++ b/docs/getting_started/high_dimensional_3d.png
--- a/docs/getting_started/introduction_cn.md
+++ b/docs/getting_started/introduction_cn.md
@@ -52,6 +52,7 @@ visualdl --logdir ./log --port 8080
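 1. Scalar. Displays scalar data points, as shown in the figure above:
 	- Plots training metrics such as loss and accuracy as line charts, making the overall trend easy to see.
 	- Multiple curves can be drawn in the same chart for side-by-side comparison.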
+
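 2. Image. Displays images.
 	- Shows input images, making it easy to check the quality of data samples.
 	- Shows convolutional layer outputs, making it easy to inspect the effect of a convolution.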
@@ -63,14 +64,19 @@ visualdl --logdir ./log --port 8080
        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/image-gan.png" height="300" width="300"/>
        </p>

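-3. Histogram. Displays parameter distributions, making it easy to see the distribution curve of the values in a parameter matrix and how that distribution changes as training proceeds.
+3. Audio. Plays back audio samples.
+    - Displays input or generated audio so that users can play or download it
+    - Intended for training speech recognition and speech synthesis models
+    - Makes it easy to check sample quality or investigate problems
+
+4. Histogram. Displays parameter distributions, making it easy to see the distribution curve of the values in a parameter matrix and how that distribution changes as training proceeds.
 	- Helps users understand the training process: as training proceeds, parameters shift from one distribution to another, and the histogram helps explain the reasons behind that shift.
 	- Helps users judge whether training is healthy; for example, parameters rapidly shrinking to 0 or rapidly growing indicate vanishing or exploding gradients.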
        <p align="left">
        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/histogram.png" />
        </p>

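-4. Graph. Makes it easy to view the model structure of a deep neural network.
+5. Graph. Makes it easy to view the model structure of a deep neural network.
 	- Graph can preview [ONNX](http://onnx.ai/) models directly. Because MXNet, Caffe2, PyTorch, and CNTK models can all be converted to ONNX, models from these frameworks can be visualized indirectly as well.
 	- Makes it easier to track down network configuration errors
 	- Helps in understanding the network structure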
@@ -78,6 +84,19 @@ visualdl --logdir ./log --port 8080
        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/graph.png" height="250" width="400"/>
        </p>

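+6. High Dimensional: visualizes embeddings by projecting high-dimensional data into 2D/3D
+    - Helps in understanding the similarity and correlation between objects (words, images)
+    - Embeds objects in a vector space to visualize neighbor distances and clusters
+    - Supports dimensionality reduction algorithms such as PCA and t-SNE
+    - In the image below (generated by the `vdl_create_log` example), the similarity of 'man' and 'women', and of 'king' and 'queen', is visible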
+        <p align="left">
+        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/high_dimensional_2d.png" height="250" width="400"/>
+        </p>
+    - This is how it looks in 3D
+        <p align="right">
+        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/high_dimensional_3d.png" height="250" width="400"/>
+        </p>
+
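 ### Fully Open
 As a visualization tool for deep learning tasks, Visual DL supports all deep learning frameworks. At the SDK level it integrates easily into Python or C++ projects, and through ONNX, Graph supports popular deep learning frameworks such as PaddlePaddle, MXNet, PyTorch, and Caffe2.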


--- a/docs/getting_started/introduction_en.md
+++ b/docs/getting_started/introduction_en.md
@@ -68,7 +68,12 @@ visualdl --logdir ./log --port 8080
        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/image-gan.png" height="300" width="300"/>
        </p>

-3. Histogram: display of parameter distribution, easy to check distribution curves in each tensor,
+3. Audio: collect audio samples for playback.
+    - display input or generated audio so that users can play or download the clips
+    - useful for training models for Speech Recognition or Text to Speech
+    - help users to investigate issues or check audio quality
+
+4. Histogram: display of parameter distribution, easy to check distribution curves in each tensor,
 show the trend of parameter distribution.

 	- help users understand the training process and the underlying reason for the change from one parameter distribution to another
@@ -78,7 +83,7 @@ show the trend of parameter distribution.
        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/histogram.png" />
        </p>

-4. Graph: visualize the model structure of deep learning networks.
+5. Graph: visualize the model structure of deep learning networks.
    - Graph supports the preview of [ONNX](http://onnx.ai/) model. Since models of MXNet, Caffe2, PyTorch and CNTK can be converted to ONNX models easily,
    Visual DL can also support these models indirectly
    - easy to see wrong configuration of a network
@@ -87,6 +92,19 @@ show the trend of parameter distribution.
        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/graph.png" height="250" width="400"/>
        </p>

+6. High Dimensional: visualize data embeddings by projecting high-dimensional data into 2D / 3D.
+    - Help users understand the similarity and correlation of different objects (e.g. words / images)
+    - Map objects to vectors in a vector space to visualize neighbor distances and clusters
+    - Support dimensionality reduction algorithms such as PCA and t-SNE
+    - In the following image (generated by the `vdl_create_log` example), you can tell the similarity of 'man' and 'women', and of 'king' and 'queen', by their positions.
+        <p align="left">
+        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/high_dimensional_2d.png" height="250" width="400"/>
+        </p>
+    - Here is how it looks in 3D:
+        <p align="right">
+        <img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/high_dimensional_3d.png" height="250" width="400"/>
+        </p>
+
 ### Purely Open Source
 As a deep learning visualization tool, Visual DL supports most deep learning frameworks. From the SDK perspective,
 it is easy to integrate into Python and C++ projects. Through ONNX, Visual DL's Graph component can support

--- a/docs/getting_started/quick_start_cn.md
+++ b/docs/getting_started/quick_start_cn.md
@@ -6,8 +6,10 @@ VisualDL is a visualization tool for deep learning tasks, usable for training

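 - scalar: trend charts, usable for showing training and test error
 - image: image visualization, usable for graphical display of convolutional layers or other parameters
+- audio: usable for playing input or generated audio samples
 - histogram: for showing parameter distributions and how they change
 - graph: for visualizing the structure of the trained model
+- high dimensional: for visualizing high-dimensional data to show how objects relate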

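 VisualDL provides native Python and C++ SDKs and can support a variety of deep learning platforms. Users can enable visualization on a given platform with a little configuration through the Python SDK, or embed deeply into the platform's lower layers through the C++ SDK.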


--- a/docs/getting_started/quick_start_en.md
+++ b/docs/getting_started/quick_start_en.md
@@ -6,8 +6,10 @@ Currently, VisualDL supports visualization features as follows:

 - Scalar: plot of trends, can be used to show error trends during training.
 - Image: image visualization, can be used to show intermediate images from CNN.
+- Audio: can be used to play input audio samples or generated audio samples.
 - Histogram: can be used to show parameter distribution and trend.
 - Graph: can be used to visualize model structure.
+- High dimensional: can be used to visualize embedding projections.

 VisualDL provides native Python and C++ SDKs and can support various frameworks.
 Users can retrieve visualization data by simply adding a few lines of code using Python SDK.
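The docs in this change describe the High Dimensional view as projecting embeddings into 2D / 3D with algorithms such as PCA. As a rough illustration of that idea only, here is a minimal pure-Python PCA sketch applied to the demo vectors from `vdl_create_scratch_log`. It is not VisualDL's implementation; the function names are hypothetical, and using power iteration with deflation to find the principal components is a choice made here to keep the example dependency-free.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(matrix, iterations=500):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(d) for j in range(d))
    return vec, value


def pca_project(rows, k=2):
    """Project rows onto their first k principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        vec, value = top_component(cov)
        components.append(vec)
        # Deflation: remove the direction just found so the next pass finds the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


# Demo vectors copied from the diff of vdl_create_scratch_log.
hot_vectors = [
    [10.0, 8.04, 2], [8.0, 6.95, 2], [13.0, 7.58, 2], [9.0, 8.81, 3],
    [11.0, 8.33, 4], [14.0, 9.96, 5], [6.0, 7.24, 6], [4.0, 4.26, 7],
    [12.0, 10.84, 8], [7.0, 4.8, 1], [5.0, 5.68, 2],
]

points_2d = pca_project(hot_vectors, k=2)
for point in points_2d:
    print([round(coord, 3) for coord in point])
```

Each input row becomes one 2D point; points that land close together correspond to vectors that are similar along the directions of greatest variance. Calling `pca_project(hot_vectors, k=3)` gives the 3D analogue.
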