diff --git a/docs/_tutorials/bert-pretraining.md b/docs/_tutorials/bert-pretraining.md
index 659c98be7e7feb4c5796d64e8e09efe767675f75..c1f607f75f8f0a7ddf6f333de1fad606a49f6200 100755
--- a/docs/_tutorials/bert-pretraining.md
+++ b/docs/_tutorials/bert-pretraining.md
@@ -373,7 +373,26 @@ for more details in
 
 ## DeepSpeed Single GPU Throughput Results
 
-![DeepSpeed Single GPU Bert Training Throughput](/assets/images/single-gpu-throughput.png){: .align-center}
+![DeepSpeed Single GPU Bert Training Throughput 128](/assets/images/deepspeed-throughput-seq128.png){: .align-center}
+
+![DeepSpeed Single GPU Bert Training Throughput 512](/assets/images/deepspeed-throughput-seq512.png){: .align-center}
 
 Compared to SOTA, DeepSpeed significantly improves single GPU performance for transformer-based models like BERT. The figures above show the single GPU throughput of training BERT-Large optimized through DeepSpeed, compared with two well-known PyTorch implementations, NVIDIA BERT and HuggingFace BERT. DeepSpeed reaches as high as 64 and 53 teraflops of throughput (corresponding to 272 and 52 samples/second) for sequence lengths of 128 and 512, respectively, exhibiting up to 28% throughput improvement over NVIDIA BERT and up to 62% over HuggingFace BERT. We also support up to 1.8x larger batch sizes without running out of memory.
 
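+As a reference point, a minimal sketch of a single-GPU DeepSpeed configuration is shown below. The batch size, learning rate, and optimizer settings are illustrative placeholders rather than the exact values behind the numbers above; `train_micro_batch_size_per_gpu` and `fp16` are the options most directly related to the batch sizes and mixed-precision throughput discussed here.
+
+```json
+{
+  "train_batch_size": 64,
+  "train_micro_batch_size_per_gpu": 64,
+  "optimizer": {
+    "type": "Adam",
+    "params": {
+      "lr": 4e-4
+    }
+  },
+  "fp16": {
+    "enabled": true
+  }
+}
+```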