diff --git a/RELEASE.md b/RELEASE.md
index cd090b96f6bae5093a2560f574c12e2e40069e53..fb222866179bd98d5aa3ad8b6d0167be5fac8156 100644
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,25 +1,54 @@
-Release Notes
-=====
-
-v0.6.0 (2018-04-04)
+# v0.9.0 (2018-07-20)
 ------
-1. Change mace header interfaces, only including necessary methods.
+## Improvements
+1. New workflow and documentation.
+2. Separate the model library from the MACE library.
+3. Reduce the size of the static and dynamic libraries.
+4. Support the `ArgMax` operation.
+5. Support Caffe `Deconvolution`.
+6. Support NDK r17b.
 
-v0.6.2 (2018-05-17)
------
-* Return status instead of abort when allocate failed
+## Incompatible Changes
+1. Store OpenCL tuned parameters in a file and add the `SetOpenCLParameterPath` API.
+
+## New APIs
+1. Add a new `MaceEngine::Init` API that takes a model data file.
+
+## Bugs Fixed
+1. The model data file was not unmapped when loading a model from files with the CPU runtime.
+2. 2D LWS tuning did not work.
+3. GPU Winograd convolution failed when tuning was enabled.
+4. Incorrect dynamic library for the host.
 
-v0.6.3 (2018-05-21)
+## Acknowledgements
+Thanks to the following contributors for making MACE better:
+
+Zero King(@l2dy), James Bie(@JamesBie), Sun Aries(@SunAriesCN), Allen(@allen0125),
+conansherry(@conansherry), 黎明灰烬(@jackwish)
+
+
+# v0.8.0 (2018-05-31)
 ------
-1. support `float` `data_type` when running in GPU
+1. Change build and run tools
+2. Handle runtime failure
 
-v0.7.0 (2018-05-18)
+# v0.7.0 (2018-05-18)
 ------
 1. Change interface that report error type
 2. Improve CPU performance
 3. Merge CPU/GPU engine to on
 
-v0.8.0 (2018-05-31)
+# v0.6.3 (2018-05-21)
 ------
-1. Change build and run tools
-2. Handle runtime failure
+1. support `float` `data_type` when running in GPU
+
+
+# v0.6.2 (2018-05-17)
+------
+* Return status instead of abort when allocate failed
+
+
+# v0.6.0 (2018-04-04)
+------
+1. Change mace header interfaces, only including necessary methods.
+

diff --git a/docs/installation/env_requirement.rst b/docs/installation/env_requirement.rst
index 8f4d491c96351554069abe66f0a6abfd08282a8e..2cb7b6b35ae24cb0f5d5590298371c9fd87e274e 100644
--- a/docs/installation/env_requirement.rst
+++ b/docs/installation/env_requirement.rst
@@ -65,4 +65,4 @@ Optional dependencies
 
 .. note::
     - For Android build, `ANDROID_NDK_HOME` must be confifigured by using ``export ANDROID_NDK_HOME=/path/to/ndk``
-    - It will link ``libc++`` instead of ``libgnustl`` if ``NDK version >= r17b`` and ``bazel version >= 0.13.0``
+    - It will link ``libc++`` instead of ``gnustl`` if ``NDK version >= r17b`` and ``bazel version >= 0.13.0``; please refer to `NDK cpp-support `__.

diff --git a/docs/user_guide/basic_usage.rst b/docs/user_guide/basic_usage.rst
index fac53270c8aa58f06237b7cb7ec5d2f7c5df4e22..72ed0b82f1766aea70cde83074c81de953120138 100644
--- a/docs/user_guide/basic_usage.rst
+++ b/docs/user_guide/basic_usage.rst
@@ -204,8 +204,9 @@ The generated model files will be stored in ``build/${library_name}/model`` fold
 
 =============================
 4. Build MACE into a library
 =============================
+You can download a prebuilt MACE library from the `GitHub MACE release page `__.
 
-Use bazel to build MACE source code into a library.
+Or use bazel to build the MACE source code into a library.
 
 .. code:: sh
 
@@ -259,9 +260,26 @@ to run and validate your model.
 
 6. Deploy your model into applications
 =======================================
+You can run your model on the CPU, GPU, or DSP (depending on the `runtime` set in your model deployment file).
+However, there are some differences between devices.
+
+* **CPU**
+
+  Almost all mobile SoCs use ARM-based CPUs, so in theory your model can run on different SoCs.
+
+* **GPU**
+
+  Although most GPUs follow the OpenCL standard, some SoCs do not comply with it fully,
+  or their GPUs are too limited to use. You should therefore have a fallback strategy for when the GPU run fails.
+
+* **DSP**
+
+  MACE only supports Qualcomm DSPs.
+
 In the converting and building steps, you've got the static/shared library, model files and header files.
+
 ``${library_name}`` is the name you defined in the first line of your deployment YAML file.
 
 .. note::
 
@@ -313,12 +331,7 @@ Please refer to \ ``mace/examples/example.cc``\ for full usage. The following li
 
     #include "mace/public/mace.h"
     #include "mace/public/mace_runtime.h"
 
-    // 0. Set pre-compiled OpenCL binary program file paths when available
-    if (device_type == DeviceType::GPU) {
-      mace::SetOpenCLBinaryPaths(opencl_binary_paths);
-    }
-
-    // 1. Set compiled OpenCL kernel cache, this is used to reduce the
+    // 0. Set the compiled OpenCL kernel cache. This is used to reduce the
     // initialization time since the compiling is too slow. It's suggested
     // to set this even when pre-compiled OpenCL program file is provided
     // because the OpenCL version upgrade may also leads to kernel
@@ -328,14 +341,14 @@ Please refer to \ ``mace/examples/example.cc``\ for full usage. The following li
                          new FileStorageFactory(file_path));
     ConfigKVStorageFactory(storage_factory);
 
-    // 2. Declare the device type (must be same with ``runtime`` in configuration file)
+    // 1. Declare the device type (must match ``runtime`` in the configuration file)
     DeviceType device_type = DeviceType::GPU;
 
-    // 3. Define the input and output tensor names.
+    // 2. Define the input and output tensor names.
     std::vector<std::string> input_names = {...};
     std::vector<std::string> output_names = {...};
 
-    // 4. Create MaceEngine instance
+    // 3. Create a MaceEngine instance
     std::shared_ptr<mace::MaceEngine> engine;
     MaceStatus create_engine_status;
 
@@ -348,10 +361,10 @@ Please refer to \ ``mace/examples/example.cc``\ for full usage. The following li
                          device_type,
                          &engine);
     if (create_engine_status != MaceStatus::MACE_SUCCESS) {
-      // Report error
+      // Fall back to another strategy.
     }
 
-    // 5. Create Input and Output tensor buffers
+    // 4. Create input and output tensor buffers
     std::map<std::string, mace::MaceTensor> inputs;
     std::map<std::string, mace::MaceTensor> outputs;
     for (size_t i = 0; i < input_count; ++i) {
@@ -376,7 +389,7 @@ Please refer to \ ``mace/examples/example.cc``\ for full usage. The following li
       outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
     }
 
-    // 6. Run the model
+    // 5. Run the model
     MaceStatus status = engine->Run(inputs, &outputs);
 
 More details are in :doc:`advanced_usage`.