diff --git a/doc/design/speech/README.MD b/doc/design/speech/README.MD
index cc03aac7b4970a66a5c8c7591aecf6349ae92f8a..4509d6453d6774eba7268004d2fcca041eeaa7fe 100644
--- a/doc/design/speech/README.MD
+++ b/doc/design/speech/README.MD
@@ -142,13 +142,15 @@ TODO by Assignees
 
 <div align="center">
 <img src="image/beam_search.png" width=400><br/>
-Figure 2. Algorithm for Beam Search Decoder.
+Figure 2. Algorithm for CTC Beam Search Decoder.
 </div>
 
-- The **Beam Search Decoder** for DS2 CTC-trained network follows the similar approach in \[[3](#references)\] with a modification for the ambiguous part, as shown in Figure 2.
-- An **external defined scorer** would be passed into the decoder to evaluate a candidate prefix during decoding whenever a space character appended.
-- Such scorer is a unified class, may consisting of language model, word count or any customed evaluators.
-- The **language model** is built from Task 5, with a parameter should be carefully tuned to achieve minimum WER/CER (c.f. Task 7)
+- The **Beam Search Decoder** for the DS2 CTC-trained network follows an approach similar to that in \[[3](#references)\], as shown in Figure 2, with two important modifications for the ambiguous parts:
+   - 1) in the iterative computation of probabilities, the assignment operation is changed to accumulation, because one prefix may come from different paths;
+   - 2) the condition ```if l^+ not in A_prev then``` after the probability computation is removed, because it is hard to understand and seems unnecessary.
+- An **external scorer** is passed into the decoder to evaluate a candidate prefix during decoding: whenever a whitespace character is appended in English decoding, and whenever any character is appended in Mandarin decoding.
+- The external scorer can consist of a language model, a word count, or any other custom scorer.
+- The **language model** is built in Task 5; its parameters should be carefully tuned to achieve minimum WER/CER (cf. Task 7).
 - This decoder needs to perform with **high efficiency** for the convenience of parameters tuning and speech recognition in reality.