From 7c53d72719c6f964e81363fb502f757501c99446 Mon Sep 17 00:00:00 2001
From: Yibing Liu
Date: Tue, 30 Jan 2018 22:12:57 -0800
Subject: [PATCH] Refine the design doc for ctc_beam_search_decoder

---
 doc/design/speech/README.MD | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/doc/design/speech/README.MD b/doc/design/speech/README.MD
index cc03aac7b4..4509d6453d 100644
--- a/doc/design/speech/README.MD
+++ b/doc/design/speech/README.MD
@@ -142,13 +142,15 @@ TODO by Assignees

-Figure 2. Algorithm for Beam Search Decoder.
+Figure 2. Algorithm for CTC Beam Search Decoder.
-- The **Beam Search Decoder** for DS2 CTC-trained network follows the similar approach in \[[3](#references)\] with a modification for the ambiguous part, as shown in Figure 2.
-- An **external defined scorer** would be passed into the decoder to evaluate a candidate prefix during decoding whenever a space character appended.
-- Such scorer is a unified class, may consisting of language model, word count or any customed evaluators.
-- The **language model** is built from Task 5, with a parameter should be carefully tuned to achieve minimum WER/CER (c.f. Task 7)
+- The **Beam Search Decoder** for the DS2 CTC-trained network follows an approach similar to the one in \[[3](#references)\], as shown in Figure 2, with two important modifications for the ambiguous parts:
+  - 1) in the iterative computation of probabilities, the assignment operation is changed to accumulation, because one prefix may come from different paths;
+  - 2) the if condition ```if l^+ not in A_prev then``` after the computation of probabilities is dropped, because it is hard to understand and seems unnecessary.
+- An **external scorer** is passed into the decoder to evaluate a candidate prefix during decoding whenever a white space is appended in English decoding or any character is appended in Mandarin decoding.
+- The external scorer consists of a language model, a word count, or any other custom scorers.
+- The **language model** is built from Task 5, with parameters that should be carefully tuned to achieve minimum WER/CER (c.f. Task 7)
 - This decoder needs to perform with **high efficiency** for the convenience of parameters tuning and speech recognition in reality.
--
GitLab
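
Reviewer note: the two modifications introduced by this patch can be illustrated with a minimal Python sketch. It is not the implementation from the repository; the function name `ctc_beam_search`, the `scorer` and `is_boundary` callbacks, and the convention that the blank symbol is the last index of each frame are all assumptions made for the example. Probabilities are accumulated with `+=` so that different paths collapsing to the same prefix add up, and there is no `if l^+ not in A_prev` branch.

```python
from collections import defaultdict


def ctc_beam_search(probs, vocab, beam_size, blank_id,
                    scorer=None, is_boundary=lambda ch: ch == " "):
    """Sketch of CTC prefix beam search (hypothetical helper, not the repo code).

    probs:       list of frames; each frame is a list of probabilities over
                 the characters in `vocab` plus the blank symbol.
    vocab:       list of characters; index i in a frame maps to vocab[i].
    blank_id:    index of the blank symbol (assumed here to be len(vocab)).
    scorer:      optional callable(prefix) -> weight, e.g. a language model.
    is_boundary: decides when the scorer is consulted (a space for English;
                 a Mandarin setup could pass `lambda ch: True`).
    """
    # Each prefix maps to [P_blank, P_non_blank].
    beams = {"": [1.0, 0.0]}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p_c in enumerate(frame):
                if c == blank_id:
                    # Blank keeps the prefix unchanged.
                    next_beams[prefix][0] += p_c * (p_b + p_nb)
                    continue
                char = vocab[c]
                new_prefix = prefix + char
                if prefix and char == prefix[-1]:
                    # A repeated character extends the prefix only after a
                    # blank; otherwise it collapses into the same prefix.
                    next_beams[new_prefix][1] += p_c * p_b
                    next_beams[prefix][1] += p_c * p_nb
                else:
                    weight = 1.0
                    if scorer is not None and is_boundary(char):
                        weight = scorer(new_prefix)
                    # Accumulation (+=), not assignment: several paths may
                    # reach the same prefix within one time step.
                    next_beams[new_prefix][1] += weight * p_c * (p_b + p_nb)
        # Keep the beam_size most probable prefixes.
        ranked = sorted(next_beams.items(),
                        key=lambda kv: sum(kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    return sorted(((sum(p), prefix) for prefix, p in beams.items()),
                  reverse=True)


if __name__ == "__main__":
    frames = [[0.6, 0.1, 0.3], [0.5, 0.1, 0.4]]  # columns: "a", "b", blank
    for prob, prefix in ctc_beam_search(frames, ["a", "b"], 3, blank_id=2):
        print(repr(prefix), round(prob, 4))
```

In this toy input the prefix `"a"` is reached by three paths (`a a`, `a blank`, `blank a`), and its score of about 0.69 is the sum of all three, which is the effect the accumulation change is meant to capture.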