diff --git a/tutorials/notebook/nlp_application.ipynb b/tutorials/notebook/nlp_application.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..d09dd09ff4d4a43004c58bac40d2697ac22b642a
--- /dev/null
+++ b/tutorials/notebook/nlp_application.ipynb
@@ -0,0 +1,5146 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 自然语言处理应用"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 概述\n",
+    "\n",
+    "情感分类是自然语言处理中文本分类问题的子集,属于自然语言处理最基础的应用。它是对带有感情色彩的主观性文本进行分析和推理的过程,即分析说话人的态度,是倾向正面还是反面。\n",
+    "\n",
+    "> 通常情况下,我们会把情感类别分为正面、反面和中性三类。虽然“面无表情”的评论也有不少;不过,大部分时候会只采用正面和反面的案例进行训练,下面这个数据集就是很好的例子。\n",
+    "\n",
+    "传统的文本主题分类问题的典型参考数据集为[20 Newsgroups](http://qwone.com/~jason/20Newsgroups/),该数据集由20组新闻数据组成,包含约20000个新闻文档。\n",
+    "其主题列表中有些类别的数据比较相似,例如comp.sys.ibm.pc.hardware和comp.sys.mac.hardware都是和电脑系统硬件相关的题目,相似度比较高。而有些主题类别的数据相对来说就毫无关联,例如misc.forsale和soc.religion.christian。\n",
+    "\n",
+    "就网络本身而言,文本主题分类的网络结构和情感分类的网络结构大致相似。在掌握了情感分类网络如何构造之后,很容易可以构造一个类似的网络,稍作调参即可用于文本主题分类任务。\n",
+    "\n",
+    "但在业务上下文侧,文本主题分类是分析文本讨论的客观内容,而情感分类是要从文本中得到它是否支持某种观点的信息。比如,“《阿甘正传》真是好看极了,影片主题明确,节奏流畅。”这句话,在文本主题分类是要将其归为类别为“电影”主题,而情感分类则要挖掘出这一影评的态度是正面还是负面。\n",
+    "\n",
+    "相对于传统的文本主题分类,情感分类较为简单,实用性也较强。常见的购物网站、电影网站都可以采集到相对高质量的数据集,也很容易给业务领域带来收益。例如,可以结合领域上下文,自动分析特定类型客户对当前产品的意见,可以分主题分用户类型对情感进行分析,以作针对性的处理,甚至基于此进一步推荐产品,提高转化率,带来更高的商业收益。\n",
+    "\n",
+    "特殊领域中,某些非极性词也充分表达了用户的情感倾向,比如下载使用APP时,“卡死了”、“下载太慢了”就表达了用户的负面情感倾向;股票领域中,“看涨”、“牛市”表达的就是用户的正面情感倾向。所以,本质上,我们希望模型能够在垂直领域中,挖掘出一些特殊的表达,作为极性词给情感分类系统使用:\n",
+    "\n",
+    "$垂直极性词 = 通用极性词 + 领域特有极性词$\n",
+    "\n",
+    "按照处理文本的粒度不同,情感分析可分为词语级、短语级、句子级、段落级以及篇章级等几个研究层次。这里以“段落级”为例,输入为一个段落,输出为影评是正面还是负面的信息。\n",
+    "\n",
+    "接下来,以IMDB影评情感分类为例来体验MindSpore在自然语言处理上的应用。"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 整体流程\n",
+    "\n",
+    "1. 准备环节。\n",
+    "2. 加载数据集,进行数据处理。\n",
+    "3. 定义网络。\n",
+    "4. 定义优化器和损失函数。\n",
+    "5. 使用网络训练数据,生成模型。\n",
+    "6. 得到模型之后,使用验证数据集,查看模型精度情况。"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 准备环节\n",
+    "\n",
+    "### 下载数据集\n",
+    "\n",
+    "本次体验采用IMDB影评数据集作为实验数据。\n",
+    "\n",
+    "1. 下载IMDB影评数据集,数据集下载地址:<http://ai.stanford.edu/~amaas/data/sentiment/>。\n",
+    "\n",
+    "    以下是负面影评(Negative)和正面影评(Positive)的案例。\n",
+    "\n",
+    "| Review  | Label  | \n",
+    "|:---|:---:|\n",
+    "| \"Quitting\" may be as much about exiting a pre-ordained identity as about drug withdrawal. As a rural guy coming to Beijing, class and success must have struck this young artist face on as an appeal to separate from his roots and far surpass his peasant parents' acting success. Troubles arise, however, when the new man is too new, when it demands too big a departure from family, history, nature, and personal identity. The ensuing splits, and confusion between the imaginary and the real and the dissonance between the ordinary and the heroic are the stuff of a gut check on the one hand or a complete escape from self on the other.  |  Negative |  \n",
+    "| This movie is amazing because the fact that the real people portray themselves and their real life experience and do such a good job it's like they're almost living the past over again. Jia Hongsheng plays himself an actor who quit everything except music and drugs struggling with depression and searching for the meaning of life while being angry at everyone especially the people who care for him most.  | Positive  |\n",
+    "    \n",
+    "    将下载好的数据集解压并放在当前工作目录下。\n",
+    "\n",
+    "\n",
+    "2. 下载GloVe文件\n",
+    "    下载并解压GloVe文件到当前工作目录下,修改解压后的目录名为`glove`,并在所有Glove文件开头处添加如下所示新的一行,意思是总共读取400000个单词,每个单词用300纬度的词向量表示。\n",
+    "\n",
+    "    ```\n",
+    "    400000 300\n",
+    "    ```\n",
+    "\n",
+    "    GloVe文件下载地址:<http://nlp.stanford.edu/data/glove.6B.zip>\n",
+    "\n",
+    "\n",
+    "3. 在当前工作目录创建名为`preprocess`的空目录,该目录将用于存储在数据集预处理操作中IMDB数据集转换为MindRecord格式后的文件。\n",
+    "\n",
+    "    此时当前工作目录结构如下所示。\n",
+    "    \n",
+    "    ```shell\n",
+    "    $ tree -L 2 lstm\n",
+    "    lstm\n",
+    "    ├── aclImdb\n",
+    "    │   ├── imdbEr.txt\n",
+    "    │   ├── imdb.vocab\n",
+    "    │   ├── README\n",
+    "    │   ├── test\n",
+    "    │   └── train\n",
+    "    ├── glove\n",
+    "    │   ├── glove.6B.100d.txt\n",
+    "    │   ├── glove.6B.200d.txt\n",
+    "    │   ├── glove.6B.300d.txt\n",
+    "    │   └── glove.6B.50d.txt\n",
+    "    └── preprocess\n",
+    "    ```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 确定评价标准\n",
+    "\n",
+    "作为典型的分类问题,情感分类的评价标准可以比照普通的分类问题处理。常见的精度(Accuracy)、精准度(Precision)、召回率(Recall)和F_beta分数都可以作为参考。\n",
+    "\n",
+    "$精度(Accuracy)= 分类正确的样本数目 / 总样本数目$\n",
+    "\n",
+    "$精准度(Precision)= 真阳性样本数目 / 所有预测类别为阳性的样本数目$\n",
+    "\n",
+    "$召回率(Recall)= 真阳性样本数目 / 所有真实类别为阳性的样本数目$ \n",
+    "\n",
+    "$F1分数 = (2 * Precision * Recall) / (Precision + Recall)$\n",
+    "\n",
+    "在IMDB这个数据集中,正负样本数差别不大,可以简单地用精度(accuracy)作为分类器的衡量标准。\n",
+    "\n",
+    "### 确定网络\n",
+    "\n",
+    "我们使用基于LSTM构建的SentimentNet网络进行自然语言处理。\n",
+    "\n",
+    "> LSTM(Long short-term memory,长短期记忆)网络是一种时间循环神经网络,适合于处理和预测时间序列中间隔和延迟非常长的重要事件。\n",
+    "> 本次体验面向GPU或CPU硬件平台。"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 配置运行信息"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "1. 使用`parser`模块传入运行必要的信息。\n",
+    "    \n",
+    "    - `preprocess`:是否预处理数据集,默认为否。\n",
+    "    - `aclimdb_path`:数据集存放路径。\n",
+    "    - `glove_path`:GloVe文件存放路径。\n",
+    "    - `preprocess_path`:预处理数据集的结果文件夹。\n",
+    "    - `ckpt_path`:CheckPoint文件路径。\n",
+    "    - `pre_trained`:预加载CheckPoint文件。\n",
+    "    - `device_target`:指定GPU或CPU环境。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import argparse\n",
+    "\n",
+    "\n",
+    "parser = argparse.ArgumentParser(description='MindSpore LSTM Example')\n",
+    "parser.add_argument('--preprocess', type=str, default='false', choices=['true', 'false'],\n",
+    "                    help='whether to preprocess data.')\n",
+    "parser.add_argument('--aclimdb_path', type=str, default=\"./aclImdb\",\n",
+    "                    help='path where the dataset is stored.')\n",
+    "parser.add_argument('--glove_path', type=str, default=\"./glove\",\n",
+    "                    help='path where the GloVe is stored.')\n",
+    "parser.add_argument('--preprocess_path', type=str, default=\"./preprocess\",\n",
+    "                    help='path where the pre-process data is stored.')\n",
+    "parser.add_argument('--ckpt_path', type=str, default=\"./\",\n",
+    "                    help='the path to save the checkpoint file.')\n",
+    "parser.add_argument('--pre_trained', type=str, default=None,\n",
+    "                    help='the pretrained checkpoint file path.')\n",
+    "parser.add_argument('--device_target', type=str, default=\"GPU\", choices=['GPU', 'CPU'],\n",
+    "                    help='the target device to run, support \"GPU\", \"CPU\". Default: \"GPU\".')\n",
+    "args = parser.parse_args(['--device_target', 'GPU', '--preprocess', 'true'])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "2. 进行训练前,需要配置必要的信息,包括环境信息、执行的模式、后端信息及硬件信息。 \n",
+    "    \n",
+    "> 详细的接口配置信息,请参见MindSpore官网`context.set_context`API接口说明。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from mindspore import context\n",
+    "\n",
+    "\n",
+    "context.set_context(\n",
+    "        mode=context.GRAPH_MODE,\n",
+    "        save_graphs=False,\n",
+    "        device_target=args.device_target)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 配置SentimentNet网络参数\n",
+    "\n",
+    "在以下一段代码中配置基于LSTM构建的SentimentNet网络所需相关参数。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from easydict import EasyDict as edict\n",
+    "\n",
+    "\n",
+    "# LSTM CONFIG\n",
+    "lstm_cfg = edict({\n",
+    "    'num_classes': 2,\n",
+    "    'learning_rate': 0.1,\n",
+    "    'momentum': 0.9,\n",
+    "    'num_epochs': 10,\n",
+    "    'batch_size': 64,\n",
+    "    'embed_size': 300,\n",
+    "    'num_hiddens': 100,\n",
+    "    'num_layers': 2,\n",
+    "    'bidirectional': True,\n",
+    "    'save_checkpoint_steps': 390,\n",
+    "    'keep_checkpoint_max': 10\n",
+    "})\n",
+    "\n",
+    "cfg = lstm_cfg"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 数据处理\n",
+    "\n",
+    "## 预处理数据集"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "1. 定义`ImdbParser`类解析文本数据集,包括编码、分词、对齐、处理GloVe原始数据,使之能够适应网络结构。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from itertools import chain\n",
+    "import numpy as np\n",
+    "import gensim\n",
+    "\n",
+    "\n",
+    "class ImdbParser():\n",
+    "    \"\"\"\n",
+    "    parse aclImdb data to features and labels.\n",
+    "    sentence->tokenized->encoded->padding->features\n",
+    "    \"\"\"\n",
+    "\n",
+    "    def __init__(self, imdb_path, glove_path, embed_size=300):\n",
+    "        self.__segs = ['train', 'test']\n",
+    "        self.__label_dic = {'pos': 1, 'neg': 0}\n",
+    "        self.__imdb_path = imdb_path\n",
+    "        self.__glove_dim = embed_size\n",
+    "        self.__glove_file = os.path.join(glove_path, 'glove.6B.' + str(self.__glove_dim) + 'd.txt')\n",
+    "\n",
+    "        # properties\n",
+    "        self.__imdb_datas = {}\n",
+    "        self.__features = {}\n",
+    "        self.__labels = {}\n",
+    "        self.__vacab = {}\n",
+    "        self.__word2idx = {}\n",
+    "        self.__weight_np = {}\n",
+    "        self.__wvmodel = None\n",
+    "\n",
+    "    def parse(self):\n",
+    "        \"\"\"\n",
+    "        parse imdb data to memory\n",
+    "        \"\"\"\n",
+    "        self.__wvmodel = gensim.models.KeyedVectors.load_word2vec_format(self.__glove_file)\n",
+    "\n",
+    "        for seg in self.__segs:\n",
+    "            self.__parse_imdb_datas(seg)\n",
+    "            self.__parse_features_and_labels(seg)\n",
+    "            self.__gen_weight_np(seg)\n",
+    "\n",
+    "    def __parse_imdb_datas(self, seg):\n",
+    "        \"\"\"\n",
+    "        load data from txt\n",
+    "        \"\"\"\n",
+    "        data_lists = []\n",
+    "        for label_name, label_id in self.__label_dic.items():\n",
+    "            sentence_dir = os.path.join(self.__imdb_path, seg, label_name)\n",
+    "            for file in os.listdir(sentence_dir):\n",
+    "                with open(os.path.join(sentence_dir, file), mode='r', encoding='utf8') as f:\n",
+    "                    sentence = f.read().replace('\\n', '')\n",
+    "                    data_lists.append([sentence, label_id])\n",
+    "        self.__imdb_datas[seg] = data_lists\n",
+    "\n",
+    "    def __parse_features_and_labels(self, seg):\n",
+    "        \"\"\"\n",
+    "        parse features and labels\n",
+    "        \"\"\"\n",
+    "        features = []\n",
+    "        labels = []\n",
+    "        for sentence, label in self.__imdb_datas[seg]:\n",
+    "            features.append(sentence)\n",
+    "            labels.append(label)\n",
+    "\n",
+    "        self.__features[seg] = features\n",
+    "        self.__labels[seg] = labels\n",
+    "\n",
+    "        # update feature to tokenized\n",
+    "        self.__updata_features_to_tokenized(seg)\n",
+    "        # parse vacab\n",
+    "        self.__parse_vacab(seg)\n",
+    "        # encode feature\n",
+    "        self.__encode_features(seg)\n",
+    "        # padding feature\n",
+    "        self.__padding_features(seg)\n",
+    "\n",
+    "    def __updata_features_to_tokenized(self, seg):\n",
+    "        tokenized_features = []\n",
+    "        for sentence in self.__features[seg]:\n",
+    "            tokenized_sentence = [word.lower() for word in sentence.split(\" \")]\n",
+    "            tokenized_features.append(tokenized_sentence)\n",
+    "        self.__features[seg] = tokenized_features\n",
+    "\n",
+    "    def __parse_vacab(self, seg):\n",
+    "        # vocab\n",
+    "        tokenized_features = self.__features[seg]\n",
+    "        vocab = set(chain(*tokenized_features))\n",
+    "        self.__vacab[seg] = vocab\n",
+    "\n",
+    "        # word_to_idx: {'hello': 1, 'world':111, ... '<unk>': 0}\n",
+    "        word_to_idx = {word: i + 1 for i, word in enumerate(vocab)}\n",
+    "        word_to_idx['<unk>'] = 0\n",
+    "        self.__word2idx[seg] = word_to_idx\n",
+    "\n",
+    "    def __encode_features(self, seg):\n",
+    "        \"\"\" encode word to index \"\"\"\n",
+    "        word_to_idx = self.__word2idx['train']\n",
+    "        encoded_features = []\n",
+    "        for tokenized_sentence in self.__features[seg]:\n",
+    "            encoded_sentence = []\n",
+    "            for word in tokenized_sentence:\n",
+    "                encoded_sentence.append(word_to_idx.get(word, 0))\n",
+    "            encoded_features.append(encoded_sentence)\n",
+    "        self.__features[seg] = encoded_features\n",
+    "\n",
+    "    def __padding_features(self, seg, maxlen=500, pad=0):\n",
+    "        \"\"\" pad all features to the same length \"\"\"\n",
+    "        padded_features = []\n",
+    "        for feature in self.__features[seg]:\n",
+    "            if len(feature) >= maxlen:\n",
+    "                padded_feature = feature[:maxlen]\n",
+    "            else:\n",
+    "                padded_feature = feature\n",
+    "                while len(padded_feature) < maxlen:\n",
+    "                    padded_feature.append(pad)\n",
+    "            padded_features.append(padded_feature)\n",
+    "        self.__features[seg] = padded_features\n",
+    "\n",
+    "    def __gen_weight_np(self, seg):\n",
+    "        \"\"\"\n",
+    "        generate weight by gensim\n",
+    "        \"\"\"\n",
+    "        weight_np = np.zeros((len(self.__word2idx[seg]), self.__glove_dim), dtype=np.float32)\n",
+    "        for word, idx in self.__word2idx[seg].items():\n",
+    "            if word not in self.__wvmodel:\n",
+    "                continue\n",
+    "            word_vector = self.__wvmodel.get_vector(word)\n",
+    "            weight_np[idx, :] = word_vector\n",
+    "\n",
+    "        self.__weight_np[seg] = weight_np\n",
+    "\n",
+    "    def get_datas(self, seg):\n",
+    "        \"\"\"\n",
+    "        return features, labels, and weight\n",
+    "        \"\"\"\n",
+    "        features = np.array(self.__features[seg]).astype(np.int32)\n",
+    "        labels = np.array(self.__labels[seg]).astype(np.int32)\n",
+    "        weight = np.array(self.__weight_np[seg])\n",
+    "        return features, labels, weight"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "2. 定义`convert_to_mindrecord`函数将数据集格式转换为MindRecord格式,便于MindSpore读取。\n",
+    "\n",
+    "    函数`_convert_to_mindrecord`中`weight.txt`为数据预处理后自动生成的weight参数信息文件。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import numpy as np\n",
+    "from mindspore.mindrecord import FileWriter\n",
+    "\n",
+    "\n",
+    "def _convert_to_mindrecord(data_home, features, labels, weight_np=None, training=True):\n",
+    "    \"\"\"\n",
+    "    convert imdb dataset to mindrecoed dataset\n",
+    "    \"\"\"\n",
+    "    if weight_np is not None:\n",
+    "        np.savetxt(os.path.join(data_home, 'weight.txt'), weight_np)\n",
+    "\n",
+    "    # write mindrecord\n",
+    "    schema_json = {\"id\": {\"type\": \"int32\"},\n",
+    "                   \"label\": {\"type\": \"int32\"},\n",
+    "                   \"feature\": {\"type\": \"int32\", \"shape\": [-1]}}\n",
+    "\n",
+    "    data_dir = os.path.join(data_home, \"aclImdb_train.mindrecord\")\n",
+    "    if not training:\n",
+    "        data_dir = os.path.join(data_home, \"aclImdb_test.mindrecord\")\n",
+    "\n",
+    "    def get_imdb_data(features, labels):\n",
+    "        data_list = []\n",
+    "        for i, (label, feature) in enumerate(zip(labels, features)):\n",
+    "            data_json = {\"id\": i,\n",
+    "                         \"label\": int(label),\n",
+    "                         \"feature\": feature.reshape(-1)}\n",
+    "            data_list.append(data_json)\n",
+    "        return data_list\n",
+    "\n",
+    "    writer = FileWriter(data_dir, shard_num=4)\n",
+    "    data = get_imdb_data(features, labels)\n",
+    "    writer.add_schema(schema_json, \"nlp_schema\")\n",
+    "    writer.add_index([\"id\", \"label\"])\n",
+    "    writer.write_raw_data(data)\n",
+    "    writer.commit()\n",
+    "\n",
+    "\n",
+    "def convert_to_mindrecord(embed_size, aclimdb_path, preprocess_path, glove_path):\n",
+    "    \"\"\"\n",
+    "    convert imdb dataset to mindrecoed dataset\n",
+    "    \"\"\"\n",
+    "    parser = ImdbParser(aclimdb_path, glove_path, embed_size)\n",
+    "    parser.parse()\n",
+    "\n",
+    "    if not os.path.exists(preprocess_path):\n",
+    "        print(f\"preprocess path {preprocess_path} is not exist\")\n",
+    "        os.makedirs(preprocess_path)\n",
+    "\n",
+    "    train_features, train_labels, train_weight_np = parser.get_datas('train')\n",
+    "    _convert_to_mindrecord(preprocess_path, train_features, train_labels, train_weight_np)\n",
+    "\n",
+    "    test_features, test_labels, _ = parser.get_datas('test')\n",
+    "    _convert_to_mindrecord(preprocess_path, test_features, test_labels, training=False)\n",
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "3. 调用`convert_to_mindrecord`函数执行数据集预处理,此处用时约3分钟。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "============== Starting Data Pre-processing ==============\n",
+      "======================= Successful =======================\n"
+     ]
+    }
+   ],
+   "source": [
+    "if args.preprocess == \"true\":\n",
+    "    print(\"============== Starting Data Pre-processing ==============\")\n",
+    "    convert_to_mindrecord(cfg.embed_size, args.aclimdb_path, args.preprocess_path, args.glove_path)\n",
+    "    print(\"======================= Successful =======================\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "&nbsp;&nbsp;&nbsp;&nbsp;转换成功后会在`preprocess`目录下生成MindRecord文件,通常该操作在数据集不变的情况下,无需每次训练都执行,此时`preprocess`文件目录如下所示:\n",
+    "\n",
+    "```shell\n",
+    " $ tree preprocess\n",
+    " ├── aclImdb_test.mindrecord0\n",
+    " ├── aclImdb_test.mindrecord0.db\n",
+    " ├── aclImdb_test.mindrecord1\n",
+    " ├── aclImdb_test.mindrecord1.db\n",
+    " ├── aclImdb_test.mindrecord2\n",
+    " ├── aclImdb_test.mindrecord2.db\n",
+    " ├── aclImdb_test.mindrecord3\n",
+    " ├── aclImdb_test.mindrecord3.db\n",
+    " ├── aclImdb_train.mindrecord0\n",
+    " ├── aclImdb_train.mindrecord0.db\n",
+    " ├── aclImdb_train.mindrecord1\n",
+    " ├── aclImdb_train.mindrecord1.db\n",
+    " ├── aclImdb_train.mindrecord2\n",
+    " ├── aclImdb_train.mindrecord2.db\n",
+    " ├── aclImdb_train.mindrecord3\n",
+    " ├── aclImdb_train.mindrecord3.db\n",
+    " └── weight.txt\n",
+    "```\n",
+    "\n",
+    "- 以上各文件中:\n",
+    "    - 名称包含`aclImdb_train.mindrecord`的为转换后的MindRecord格式的训练数据集。\n",
+    "    - 名称包含`aclImdb_test.mindrecord`的为转换后的MindRecord格式的测试数据集。\n",
+    "    - `weight.txt`为预处理后自动生成的weight参数信息文件。\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "4. 定义创建数据集函数`lstm_create_dataset`,创建训练集`ds_train`。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import mindspore.dataset as ds\n",
+    "\n",
+    "\n",
+    "def lstm_create_dataset(data_home, batch_size, repeat_num=1, training=True):\n",
+    "    \"\"\"Data operations.\"\"\"\n",
+    "    ds.config.set_seed(1)\n",
+    "    data_dir = os.path.join(data_home, \"aclImdb_train.mindrecord0\")\n",
+    "    if not training:\n",
+    "        data_dir = os.path.join(data_home, \"aclImdb_test.mindrecord0\")\n",
+    "\n",
+    "    data_set = ds.MindDataset(data_dir, columns_list=[\"feature\", \"label\"], num_parallel_workers=4)\n",
+    "\n",
+    "    # apply map operations on images\n",
+    "    data_set = data_set.shuffle(buffer_size=data_set.get_dataset_size())\n",
+    "    data_set = data_set.batch(batch_size=batch_size, drop_remainder=True)\n",
+    "    data_set = data_set.repeat(count=repeat_num)\n",
+    "\n",
+    "    return data_set\n",
+    "\n",
+    "ds_train = lstm_create_dataset(args.preprocess_path, cfg.batch_size, cfg.num_epochs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "5. 通过`create_dict_iterator`方法创建字典迭代器,读取已创建的数据集`ds_train`中的数据。\n",
+    "\n",
+    "    运行以下一段代码,读取第1个`batch`中的`label`数据列表,和第1个`batch`中第1个元素的`feature`数据。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The first batch contains label below:\n",
+      "[0 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 0 1\n",
+      " 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 0 1 0 0 1 1 0 1 1 0]\n",
+      "\n",
+      "The feature of the first item in the first batch is below vector:\n",
+      "[210974 227370 167874 221440 205821 250308  57410 167874 157597 211314\n",
+      " 104140 154424 238018 167874 216357  23869 209921 187724 131973 144940\n",
+      " 177558 221440 205821 119691 149127 137330 212709 117415  61509  42345\n",
+      " 166849 155531 219231  64473 210974 103293 225985 181047  41304 210974\n",
+      " 132905  33755  96216   8987 210974 195260 117816  15665 241057   8987\n",
+      "  93501 155531 118935 110275 101659 181047 226216 133895 114115   6596\n",
+      " 189694 210974  56753   3426  29344 103100 131973  46391  25351  35080\n",
+      "  27231  69404 190304 212709 117415 157277 167874 210974 109102  92239\n",
+      " 101085 123273  64473 117415 176947  27231 168206 219146 167874 210974\n",
+      " 227370  18539 155531 219231  64473 210974 155781  93577 192315 157597\n",
+      " 213189  66091 216583 100381 158491 181047  15368 221440  15353 137110\n",
+      " 190640 197076 150070 117415 216583  60701 227370 238232 117593 210974\n",
+      " 141131 167874  59907 238018 184247 156120  68959 117415 149618 167874\n",
+      " 110810 216614  80083  10164 238272 245070 213136 205012  53662 199932\n",
+      " 208769 153768 225055  86111 156120  55862 199932  25351 243753 156120\n",
+      " 232746 239534 210974   1793 245184 210005 232746   8987  25351 239106\n",
+      "  23869  89702 232746 175897 221440 202670 181047 117415 169623 206889\n",
+      " 236706 155531   8803 208769  36551      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0\n",
+      "      0      0      0      0      0      0      0      0      0      0]\n"
+     ]
+    }
+   ],
+   "source": [
+    "iterator = ds_train.create_dict_iterator().get_next()\n",
+    "first_batch_label = iterator[\"label\"]\n",
+    "first_batch_first_feature = iterator[\"feature\"][0]\n",
+    "print(f\"The first batch contains label below:\\n{first_batch_label}\\n\")\n",
+    "print(f\"The feature of the first item in the first batch is below vector:\\n{first_batch_first_feature}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 定义网络\n",
+    "\n",
+    "1. 导入初始化网络所需模块。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from mindspore import Tensor, nn, context\n",
+    "from mindspore.ops import operations as P\n",
+    "from mindspore.train.serialization import load_param_into_net, load_checkpoint"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "2. 定义`lstm_default_state`函数来初始化网络参数及网络状态。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Initialize short-term memory (h) and long-term memory (c) to 0\n",
+    "def lstm_default_state(batch_size, hidden_size, num_layers, bidirectional):\n",
+    "    \"\"\"init default input.\"\"\"\n",
+    "    num_directions = 1\n",
+    "    if bidirectional:\n",
+    "        num_directions = 2\n",
+    "\n",
+    "    if context.get_context(\"device_target\") == \"CPU\":\n",
+    "        h_list = []\n",
+    "        c_list = []\n",
+    "        i = 0\n",
+    "        while i < num_layers:\n",
+    "            hi = Tensor(np.zeros((num_directions, batch_size, hidden_size)).astype(np.float32))\n",
+    "            h_list.append(hi)\n",
+    "            ci = Tensor(np.zeros((num_directions, batch_size, hidden_size)).astype(np.float32))\n",
+    "            c_list.append(ci)\n",
+    "            i = i + 1\n",
+    "        h = tuple(h_list)\n",
+    "        c = tuple(c_list)\n",
+    "        return h, c\n",
+    "\n",
+    "    h = Tensor(\n",
+    "        np.zeros((num_layers * num_directions, batch_size, hidden_size)).astype(np.float32))\n",
+    "    c = Tensor(\n",
+    "        np.zeros((num_layers * num_directions, batch_size, hidden_size)).astype(np.float32))\n",
+    "    return h, c"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "3. 使用`Cell`方法,定义网络结构(`SentimentNet`网络)。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class SentimentNet(nn.Cell):\n",
+    "    \"\"\"Sentiment network structure.\"\"\"\n",
+    "\n",
+    "    def __init__(self,\n",
+    "                 vocab_size,\n",
+    "                 embed_size,\n",
+    "                 num_hiddens,\n",
+    "                 num_layers,\n",
+    "                 bidirectional,\n",
+    "                 num_classes,\n",
+    "                 weight,\n",
+    "                 batch_size):\n",
+    "        super(SentimentNet, self).__init__()\n",
+    "        # Mapp words to vectors\n",
+    "        self.embedding = nn.Embedding(vocab_size,\n",
+    "                                      embed_size,\n",
+    "                                      embedding_table=weight)\n",
+    "        self.embedding.embedding_table.requires_grad = False\n",
+    "        self.trans = P.Transpose()\n",
+    "        self.perm = (1, 0, 2)\n",
+    "        self.encoder = nn.LSTM(input_size=embed_size,\n",
+    "                               hidden_size=num_hiddens,\n",
+    "                               num_layers=num_layers,\n",
+    "                               has_bias=True,\n",
+    "                               bidirectional=bidirectional,\n",
+    "                               dropout=0.0)\n",
+    "\n",
+    "        self.h, self.c = lstm_default_state(batch_size, num_hiddens, num_layers, bidirectional)\n",
+    "\n",
+    "        self.concat = P.Concat(1)\n",
+    "        if bidirectional:\n",
+    "            self.decoder = nn.Dense(num_hiddens * 4, num_classes)\n",
+    "        else:\n",
+    "            self.decoder = nn.Dense(num_hiddens * 2, num_classes)\n",
+    "\n",
+    "    def construct(self, inputs):\n",
+    "        # input:(64,500,300)\n",
+    "        embeddings = self.embedding(inputs)\n",
+    "        embeddings = self.trans(embeddings, self.perm)\n",
+    "        output, _ = self.encoder(embeddings, (self.h, self.c))\n",
+    "        # states[i] size(64,200)  -> encoding.size(64,400)\n",
+    "        encoding = self.concat((output[0], output[499]))\n",
+    "        outputs = self.decoder(encoding)\n",
+    "        return outputs"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "4. 实例化`SentimentNet`,创建网络,此步骤用时约1分钟。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "embedding_table = np.loadtxt(os.path.join(args.preprocess_path, \"weight.txt\")).astype(np.float32)\n",
+    "network = SentimentNet(vocab_size=embedding_table.shape[0],\n",
+    "                       embed_size=cfg.embed_size,\n",
+    "                       num_hiddens=cfg.num_hiddens,\n",
+    "                       num_layers=cfg.num_layers,\n",
+    "                       bidirectional=cfg.bidirectional,\n",
+    "                       num_classes=cfg.num_classes,\n",
+    "                       weight=Tensor(embedding_table),\n",
+    "                       batch_size=cfg.batch_size)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 定义优化器及损失函数\n",
+    "\n",
+    "运行以下一段代码,创建优化器和损失函数模型。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from mindspore import nn\n",
+    "\n",
+    "\n",
+    "loss = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True)\n",
+    "opt = nn.Momentum(network.trainable_params(), cfg.learning_rate, cfg.momentum)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 训练并保存模型\n",
+    "\n",
+    "加载训练数据集(`ds_train`)并配置好`CheckPoint`生成信息,然后使用`model.train`接口,进行模型训练,此步骤用时约7分钟。根据输出可以看到loss值随着训练逐步降低,最后达到0.262左右。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "============== Starting Training ==============\n",
+      "Epoch: [  1/ 10], step: [    1/  390], loss: [0.6938], avg loss: [0.6938], time: [445.6811ms]\n",
+      "Epoch: [  1/ 10], step: [    2/  390], loss: [0.6922], avg loss: [0.6930], time: [106.1635ms]\n",
+      "Epoch: [  1/ 10], step: [    3/  390], loss: [0.6917], avg loss: [0.6926], time: [103.0388ms]\n",
+      "Epoch: [  1/ 10], step: [    4/  390], loss: [0.6952], avg loss: [0.6932], time: [102.2997ms]\n",
+      "Epoch: [  1/ 10], step: [    5/  390], loss: [0.6868], avg loss: [0.6920], time: [102.2105ms]\n",
+      "Epoch: [  1/ 10], step: [    6/  390], loss: [0.6982], avg loss: [0.6930], time: [67.6618ms]\n",
+      "Epoch: [  1/ 10], step: [    7/  390], loss: [0.6856], avg loss: [0.6919], time: [99.7233ms]\n",
+      "Epoch: [  1/ 10], step: [    8/  390], loss: [0.6819], avg loss: [0.6907], time: [102.4535ms]\n",
+      "Epoch: [  1/ 10], step: [    9/  390], loss: [0.7372], avg loss: [0.6959], time: [99.7229ms]\n",
+      "Epoch: [  1/ 10], step: [   10/  390], loss: [0.6948], avg loss: [0.6957], time: [101.9838ms]\n",
+      "Epoch: [  1/ 10], step: [   11/  390], loss: [0.6961], avg loss: [0.6958], time: [98.3083ms]\n",
+      "Epoch: [  1/ 10], step: [   12/  390], loss: [0.6975], avg loss: [0.6959], time: [102.0112ms]\n",
+      "Epoch: [  1/ 10], step: [   13/  390], loss: [0.6931], avg loss: [0.6957], time: [99.0953ms]\n",
+      "Epoch: [  1/ 10], step: [   14/  390], loss: [0.6903], avg loss: [0.6953], time: [103.9104ms]\n",
+      "Epoch: [  1/ 10], step: [   15/  390], loss: [0.6720], avg loss: [0.6938], time: [98.5680ms]\n",
+      "Epoch: [  1/ 10], step: [   16/  390], loss: [0.7079], avg loss: [0.6946], time: [104.0378ms]\n",
+      "Epoch: [  1/ 10], step: [   17/  390], loss: [0.7125], avg loss: [0.6957], time: [102.7691ms]\n",
+      "Epoch: [  1/ 10], step: [   18/  390], loss: [0.7477], avg loss: [0.6986], time: [101.6769ms]\n",
+      "Epoch: [  1/ 10], step: [   19/  390], loss: [0.6924], avg loss: [0.6983], time: [102.2615ms]\n",
+      "Epoch: [  1/ 10], step: [   20/  390], loss: [0.7085], avg loss: [0.6988], time: [103.7412ms]\n",
+      "Epoch: [  1/ 10], step: [   21/  390], loss: [0.6958], avg loss: [0.6986], time: [99.9415ms]\n",
+      "Epoch: [  1/ 10], step: [   22/  390], loss: [0.6918], avg loss: [0.6983], time: [104.0506ms]\n",
+      "Epoch: [  1/ 10], step: [   23/  390], loss: [0.6985], avg loss: [0.6983], time: [99.7727ms]\n",
+      "Epoch: [  1/ 10], step: [   24/  390], loss: [0.6919], avg loss: [0.6981], time: [103.3425ms]\n",
+      "Epoch: [  1/ 10], step: [   25/  390], loss: [0.6858], avg loss: [0.6976], time: [100.5120ms]\n",
+      "Epoch: [  1/ 10], step: [   26/  390], loss: [0.6796], avg loss: [0.6969], time: [103.8098ms]\n",
+      "Epoch: [  1/ 10], step: [   27/  390], loss: [0.7113], avg loss: [0.6974], time: [100.4076ms]\n",
+      "Epoch: [  1/ 10], step: [   28/  390], loss: [0.7065], avg loss: [0.6977], time: [105.4525ms]\n",
+      "Epoch: [  1/ 10], step: [   29/  390], loss: [0.6910], avg loss: [0.6975], time: [101.6884ms]\n",
+      "Epoch: [  1/ 10], step: [   30/  390], loss: [0.6896], avg loss: [0.6972], time: [104.4266ms]\n",
+      "Epoch: [  1/ 10], step: [   31/  390], loss: [0.6968], avg loss: [0.6972], time: [98.8655ms]\n",
+      "Epoch: [  1/ 10], step: [   32/  390], loss: [0.6906], avg loss: [0.6970], time: [104.9941ms]\n",
+      "Epoch: [  1/ 10], step: [   33/  390], loss: [0.6932], avg loss: [0.6969], time: [99.1578ms]\n",
+      "Epoch: [  1/ 10], step: [   34/  390], loss: [0.6872], avg loss: [0.6966], time: [101.4924ms]\n",
+      "Epoch: [  1/ 10], step: [   35/  390], loss: [0.6887], avg loss: [0.6964], time: [100.5478ms]\n",
+      "Epoch: [  1/ 10], step: [   36/  390], loss: [0.6789], avg loss: [0.6959], time: [102.1488ms]\n",
+      "Epoch: [  1/ 10], step: [   37/  390], loss: [0.6729], avg loss: [0.6953], time: [99.9565ms]\n",
+      "Epoch: [  1/ 10], step: [   38/  390], loss: [0.7344], avg loss: [0.6963], time: [102.1416ms]\n",
+      "Epoch: [  1/ 10], step: [   39/  390], loss: [0.6946], avg loss: [0.6963], time: [101.9769ms]\n",
+      "Epoch: [  1/ 10], step: [   40/  390], loss: [0.6977], avg loss: [0.6963], time: [103.1592ms]\n",
+      "Epoch: [  1/ 10], step: [   41/  390], loss: [0.7134], avg loss: [0.6967], time: [97.4550ms]\n",
+      "Epoch: [  1/ 10], step: [   42/  390], loss: [0.6807], avg loss: [0.6963], time: [101.9244ms]\n",
+      "Epoch: [  1/ 10], step: [   43/  390], loss: [0.6798], avg loss: [0.6960], time: [98.6509ms]\n",
+      "Epoch: [  1/ 10], step: [   44/  390], loss: [0.7065], avg loss: [0.6962], time: [100.4102ms]\n",
+      "Epoch: [  1/ 10], step: [   45/  390], loss: [0.6930], avg loss: [0.6961], time: [99.2818ms]\n",
+      "Epoch: [  1/ 10], step: [   46/  390], loss: [0.6925], avg loss: [0.6960], time: [97.1210ms]\n",
+      "Epoch: [  1/ 10], step: [   47/  390], loss: [0.6824], avg loss: [0.6958], time: [98.3243ms]\n",
+      "Epoch: [  1/ 10], step: [   48/  390], loss: [0.7224], avg loss: [0.6963], time: [99.4642ms]\n",
+      "Epoch: [  1/ 10], step: [   49/  390], loss: [0.7051], avg loss: [0.6965], time: [95.3386ms]\n",
+      "Epoch: [  1/ 10], step: [   50/  390], loss: [0.7195], avg loss: [0.6970], time: [101.0215ms]\n",
+      "Epoch: [  1/ 10], step: [   51/  390], loss: [0.6927], avg loss: [0.6969], time: [96.9672ms]\n",
+      "Epoch: [  1/ 10], step: [   52/  390], loss: [0.7097], avg loss: [0.6971], time: [97.8920ms]\n",
+      "Epoch: [  1/ 10], step: [   53/  390], loss: [0.6849], avg loss: [0.6969], time: [96.2329ms]\n",
+      "Epoch: [  1/ 10], step: [   54/  390], loss: [0.6892], avg loss: [0.6967], time: [103.4982ms]\n",
+      "Epoch: [  1/ 10], step: [   55/  390], loss: [0.6926], avg loss: [0.6967], time: [95.6774ms]\n",
+      "Epoch: [  1/ 10], step: [   56/  390], loss: [0.6934], avg loss: [0.6966], time: [100.6739ms]\n",
+      "Epoch: [  1/ 10], step: [   57/  390], loss: [0.6891], avg loss: [0.6965], time: [97.0731ms]\n",
+      "Epoch: [  1/ 10], step: [   58/  390], loss: [0.7068], avg loss: [0.6967], time: [99.1342ms]\n",
+      "Epoch: [  1/ 10], step: [   59/  390], loss: [0.6920], avg loss: [0.6966], time: [96.6048ms]\n",
+      "Epoch: [  1/ 10], step: [   60/  390], loss: [0.7120], avg loss: [0.6968], time: [106.0467ms]\n",
+      "Epoch: [  1/ 10], step: [   61/  390], loss: [0.6930], avg loss: [0.6968], time: [98.2921ms]\n",
+      "Epoch: [  1/ 10], step: [   62/  390], loss: [0.7112], avg loss: [0.6970], time: [99.8714ms]\n",
+      "Epoch: [  1/ 10], step: [   63/  390], loss: [0.6845], avg loss: [0.6968], time: [99.9265ms]\n",
+      "Epoch: [  1/ 10], step: [   64/  390], loss: [0.6958], avg loss: [0.6968], time: [101.6951ms]\n",
+      "Epoch: [  1/ 10], step: [   65/  390], loss: [0.6909], avg loss: [0.6967], time: [95.9563ms]\n",
+      "Epoch: [  1/ 10], step: [   66/  390], loss: [0.6876], avg loss: [0.6966], time: [102.0942ms]\n",
+      "Epoch: [  1/ 10], step: [   67/  390], loss: [0.6800], avg loss: [0.6963], time: [97.1215ms]\n",
+      "Epoch: [  1/ 10], step: [   68/  390], loss: [0.7101], avg loss: [0.6965], time: [102.3653ms]\n",
+      "Epoch: [  1/ 10], step: [   69/  390], loss: [0.7078], avg loss: [0.6967], time: [97.5039ms]\n",
+      "Epoch: [  1/ 10], step: [   70/  390], loss: [0.6890], avg loss: [0.6966], time: [103.4834ms]\n",
+      "Epoch: [  1/ 10], step: [   71/  390], loss: [0.6859], avg loss: [0.6964], time: [98.1841ms]\n",
+      "Epoch: [  1/ 10], step: [   72/  390], loss: [0.6913], avg loss: [0.6963], time: [98.9609ms]\n",
+      "Epoch: [  1/ 10], step: [   73/  390], loss: [0.6935], avg loss: [0.6963], time: [98.4514ms]\n",
+      "Epoch: [  1/ 10], step: [   74/  390], loss: [0.6905], avg loss: [0.6962], time: [100.3788ms]\n",
+      "Epoch: [  1/ 10], step: [   75/  390], loss: [0.6936], avg loss: [0.6962], time: [99.1523ms]\n",
+      "Epoch: [  1/ 10], step: [   76/  390], loss: [0.6901], avg loss: [0.6961], time: [98.4559ms]\n",
+      "Epoch: [  1/ 10], step: [   77/  390], loss: [0.6826], avg loss: [0.6959], time: [96.8366ms]\n",
+      "Epoch: [  1/ 10], step: [   78/  390], loss: [0.6930], avg loss: [0.6959], time: [101.1457ms]\n",
+      "Epoch: [  1/ 10], step: [   79/  390], loss: [0.6936], avg loss: [0.6959], time: [98.6462ms]\n",
+      "Epoch: [  1/ 10], step: [   80/  390], loss: [0.6921], avg loss: [0.6958], time: [104.7125ms]\n",
+      "Epoch: [  1/ 10], step: [   81/  390], loss: [0.6839], avg loss: [0.6957], time: [95.8931ms]\n",
+      "Epoch: [  1/ 10], step: [   82/  390], loss: [0.6910], avg loss: [0.6956], time: [102.4179ms]\n",
+      "Epoch: [  1/ 10], step: [   83/  390], loss: [0.6954], avg loss: [0.6956], time: [96.1897ms]\n",
+      "Epoch: [  1/ 10], step: [   84/  390], loss: [0.6838], avg loss: [0.6955], time: [101.9053ms]\n",
+      "Epoch: [  1/ 10], step: [   85/  390], loss: [0.6928], avg loss: [0.6954], time: [96.2470ms]\n",
+      "Epoch: [  1/ 10], step: [   86/  390], loss: [0.6931], avg loss: [0.6954], time: [100.2293ms]\n",
+      "Epoch: [  1/ 10], step: [   87/  390], loss: [0.6784], avg loss: [0.6952], time: [99.4971ms]\n",
+      "Epoch: [  1/ 10], step: [   88/  390], loss: [0.6821], avg loss: [0.6951], time: [101.0315ms]\n",
+      "Epoch: [  1/ 10], step: [   89/  390], loss: [0.6899], avg loss: [0.6950], time: [96.1020ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  1/ 10], step: [   90/  390], loss: [0.6860], avg loss: [0.6949], time: [104.0313ms]\n",
+      "Epoch: [  1/ 10], step: [   91/  390], loss: [0.6900], avg loss: [0.6949], time: [98.9680ms]\n",
+      "Epoch: [  1/ 10], step: [   92/  390], loss: [0.6846], avg loss: [0.6947], time: [100.7631ms]\n",
+      "Epoch: [  1/ 10], step: [   93/  390], loss: [0.6833], avg loss: [0.6946], time: [99.0198ms]\n",
+      "Epoch: [  1/ 10], step: [   94/  390], loss: [0.6901], avg loss: [0.6946], time: [99.3226ms]\n",
+      "Epoch: [  1/ 10], step: [   95/  390], loss: [0.6831], avg loss: [0.6945], time: [97.3852ms]\n",
+      "Epoch: [  1/ 10], step: [   96/  390], loss: [0.7010], avg loss: [0.6945], time: [102.8271ms]\n",
+      "Epoch: [  1/ 10], step: [   97/  390], loss: [0.6925], avg loss: [0.6945], time: [96.1418ms]\n",
+      "Epoch: [  1/ 10], step: [   98/  390], loss: [0.6768], avg loss: [0.6943], time: [98.8572ms]\n",
+      "Epoch: [  1/ 10], step: [   99/  390], loss: [0.6848], avg loss: [0.6942], time: [96.3254ms]\n",
+      "Epoch: [  1/ 10], step: [  100/  390], loss: [0.6925], avg loss: [0.6942], time: [105.0456ms]\n",
+      "Epoch: [  1/ 10], step: [  101/  390], loss: [0.7067], avg loss: [0.6943], time: [96.3614ms]\n",
+      "Epoch: [  1/ 10], step: [  102/  390], loss: [0.7053], avg loss: [0.6944], time: [100.3454ms]\n",
+      "Epoch: [  1/ 10], step: [  103/  390], loss: [0.6841], avg loss: [0.6943], time: [97.2383ms]\n",
+      "Epoch: [  1/ 10], step: [  104/  390], loss: [0.6882], avg loss: [0.6943], time: [99.4568ms]\n",
+      "Epoch: [  1/ 10], step: [  105/  390], loss: [0.6794], avg loss: [0.6941], time: [97.8613ms]\n",
+      "Epoch: [  1/ 10], step: [  106/  390], loss: [0.6754], avg loss: [0.6940], time: [103.1914ms]\n",
+      "Epoch: [  1/ 10], step: [  107/  390], loss: [0.6788], avg loss: [0.6938], time: [100.8022ms]\n",
+      "Epoch: [  1/ 10], step: [  108/  390], loss: [0.6930], avg loss: [0.6938], time: [101.5067ms]\n",
+      "Epoch: [  1/ 10], step: [  109/  390], loss: [0.6792], avg loss: [0.6937], time: [98.3841ms]\n",
+      "Epoch: [  1/ 10], step: [  110/  390], loss: [0.6889], avg loss: [0.6936], time: [100.2100ms]\n",
+      "Epoch: [  1/ 10], step: [  111/  390], loss: [0.6800], avg loss: [0.6935], time: [99.3590ms]\n",
+      "Epoch: [  1/ 10], step: [  112/  390], loss: [0.6881], avg loss: [0.6935], time: [104.4703ms]\n",
+      "Epoch: [  1/ 10], step: [  113/  390], loss: [0.6866], avg loss: [0.6934], time: [99.1809ms]\n",
+      "Epoch: [  1/ 10], step: [  114/  390], loss: [0.6963], avg loss: [0.6934], time: [101.8333ms]\n",
+      "Epoch: [  1/ 10], step: [  115/  390], loss: [0.6698], avg loss: [0.6932], time: [102.0269ms]\n",
+      "Epoch: [  1/ 10], step: [  116/  390], loss: [0.6795], avg loss: [0.6931], time: [104.2218ms]\n",
+      "Epoch: [  1/ 10], step: [  117/  390], loss: [0.7177], avg loss: [0.6933], time: [98.0408ms]\n",
+      "Epoch: [  1/ 10], step: [  118/  390], loss: [0.6559], avg loss: [0.6930], time: [104.4662ms]\n",
+      "Epoch: [  1/ 10], step: [  119/  390], loss: [0.6949], avg loss: [0.6930], time: [97.7294ms]\n",
+      "Epoch: [  1/ 10], step: [  120/  390], loss: [0.6934], avg loss: [0.6930], time: [100.3757ms]\n",
+      "Epoch: [  1/ 10], step: [  121/  390], loss: [0.6854], avg loss: [0.6930], time: [95.2394ms]\n",
+      "Epoch: [  1/ 10], step: [  122/  390], loss: [0.6730], avg loss: [0.6928], time: [105.2535ms]\n",
+      "Epoch: [  1/ 10], step: [  123/  390], loss: [0.6616], avg loss: [0.6925], time: [96.2501ms]\n",
+      "Epoch: [  1/ 10], step: [  124/  390], loss: [0.6572], avg loss: [0.6923], time: [97.7559ms]\n",
+      "Epoch: [  1/ 10], step: [  125/  390], loss: [0.6612], avg loss: [0.6920], time: [96.9963ms]\n",
+      "Epoch: [  1/ 10], step: [  126/  390], loss: [0.6623], avg loss: [0.6918], time: [98.4409ms]\n",
+      "Epoch: [  1/ 10], step: [  127/  390], loss: [0.6790], avg loss: [0.6917], time: [96.9732ms]\n",
+      "Epoch: [  1/ 10], step: [  128/  390], loss: [0.6518], avg loss: [0.6914], time: [98.8829ms]\n",
+      "Epoch: [  1/ 10], step: [  129/  390], loss: [0.6196], avg loss: [0.6908], time: [97.8017ms]\n",
+      "Epoch: [  1/ 10], step: [  130/  390], loss: [0.6518], avg loss: [0.6905], time: [98.0737ms]\n",
+      "Epoch: [  1/ 10], step: [  131/  390], loss: [0.7111], avg loss: [0.6907], time: [101.8670ms]\n",
+      "Epoch: [  1/ 10], step: [  132/  390], loss: [0.6345], avg loss: [0.6902], time: [100.6875ms]\n",
+      "Epoch: [  1/ 10], step: [  133/  390], loss: [0.6846], avg loss: [0.6902], time: [100.5409ms]\n",
+      "Epoch: [  1/ 10], step: [  134/  390], loss: [0.6700], avg loss: [0.6900], time: [99.1569ms]\n",
+      "Epoch: [  1/ 10], step: [  135/  390], loss: [0.6939], avg loss: [0.6901], time: [98.1600ms]\n",
+      "Epoch: [  1/ 10], step: [  136/  390], loss: [0.6846], avg loss: [0.6900], time: [100.2150ms]\n",
+      "Epoch: [  1/ 10], step: [  137/  390], loss: [0.6408], avg loss: [0.6897], time: [99.3683ms]\n",
+      "Epoch: [  1/ 10], step: [  138/  390], loss: [0.6886], avg loss: [0.6897], time: [100.5194ms]\n",
+      "Epoch: [  1/ 10], step: [  139/  390], loss: [0.7377], avg loss: [0.6900], time: [97.7733ms]\n",
+      "Epoch: [  1/ 10], step: [  140/  390], loss: [0.7049], avg loss: [0.6901], time: [99.1342ms]\n",
+      "Epoch: [  1/ 10], step: [  141/  390], loss: [0.6946], avg loss: [0.6901], time: [101.1744ms]\n",
+      "Epoch: [  1/ 10], step: [  142/  390], loss: [0.7178], avg loss: [0.6903], time: [103.1477ms]\n",
+      "Epoch: [  1/ 10], step: [  143/  390], loss: [0.6664], avg loss: [0.6902], time: [96.6640ms]\n",
+      "Epoch: [  1/ 10], step: [  144/  390], loss: [0.6791], avg loss: [0.6901], time: [101.9955ms]\n",
+      "Epoch: [  1/ 10], step: [  145/  390], loss: [0.6599], avg loss: [0.6899], time: [96.8127ms]\n",
+      "Epoch: [  1/ 10], step: [  146/  390], loss: [0.6665], avg loss: [0.6897], time: [102.8697ms]\n",
+      "Epoch: [  1/ 10], step: [  147/  390], loss: [0.6800], avg loss: [0.6897], time: [97.5199ms]\n",
+      "Epoch: [  1/ 10], step: [  148/  390], loss: [0.6777], avg loss: [0.6896], time: [100.7779ms]\n",
+      "Epoch: [  1/ 10], step: [  149/  390], loss: [0.6690], avg loss: [0.6894], time: [96.0045ms]\n",
+      "Epoch: [  1/ 10], step: [  150/  390], loss: [0.6887], avg loss: [0.6894], time: [105.4056ms]\n",
+      "Epoch: [  1/ 10], step: [  151/  390], loss: [0.6878], avg loss: [0.6894], time: [102.3405ms]\n",
+      "Epoch: [  1/ 10], step: [  152/  390], loss: [0.7036], avg loss: [0.6895], time: [100.7700ms]\n",
+      "Epoch: [  1/ 10], step: [  153/  390], loss: [0.6570], avg loss: [0.6893], time: [98.1102ms]\n",
+      "Epoch: [  1/ 10], step: [  154/  390], loss: [0.6865], avg loss: [0.6893], time: [99.4499ms]\n",
+      "Epoch: [  1/ 10], step: [  155/  390], loss: [0.6811], avg loss: [0.6892], time: [102.2248ms]\n",
+      "Epoch: [  1/ 10], step: [  156/  390], loss: [0.6733], avg loss: [0.6891], time: [102.7901ms]\n",
+      "Epoch: [  1/ 10], step: [  157/  390], loss: [0.6737], avg loss: [0.6890], time: [98.1770ms]\n",
+      "Epoch: [  1/ 10], step: [  158/  390], loss: [0.6779], avg loss: [0.6890], time: [102.8569ms]\n",
+      "Epoch: [  1/ 10], step: [  159/  390], loss: [0.6573], avg loss: [0.6888], time: [98.7372ms]\n",
+      "Epoch: [  1/ 10], step: [  160/  390], loss: [0.6782], avg loss: [0.6887], time: [104.6941ms]\n",
+      "Epoch: [  1/ 10], step: [  161/  390], loss: [0.6704], avg loss: [0.6886], time: [98.0818ms]\n",
+      "Epoch: [  1/ 10], step: [  162/  390], loss: [0.6862], avg loss: [0.6886], time: [102.0994ms]\n",
+      "Epoch: [  1/ 10], step: [  163/  390], loss: [0.6740], avg loss: [0.6885], time: [95.8638ms]\n",
+      "Epoch: [  1/ 10], step: [  164/  390], loss: [0.6466], avg loss: [0.6882], time: [99.0460ms]\n",
+      "Epoch: [  1/ 10], step: [  165/  390], loss: [0.6506], avg loss: [0.6880], time: [98.9368ms]\n",
+      "Epoch: [  1/ 10], step: [  166/  390], loss: [0.6750], avg loss: [0.6879], time: [101.0432ms]\n",
+      "Epoch: [  1/ 10], step: [  167/  390], loss: [0.6466], avg loss: [0.6877], time: [101.4707ms]\n",
+      "Epoch: [  1/ 10], step: [  168/  390], loss: [0.6610], avg loss: [0.6875], time: [106.3678ms]\n",
+      "Epoch: [  1/ 10], step: [  169/  390], loss: [0.6550], avg loss: [0.6873], time: [98.6440ms]\n",
+      "Epoch: [  1/ 10], step: [  170/  390], loss: [0.6806], avg loss: [0.6873], time: [100.2948ms]\n",
+      "Epoch: [  1/ 10], step: [  171/  390], loss: [0.6723], avg loss: [0.6872], time: [98.7585ms]\n",
+      "Epoch: [  1/ 10], step: [  172/  390], loss: [0.6515], avg loss: [0.6870], time: [99.8044ms]\n",
+      "Epoch: [  1/ 10], step: [  173/  390], loss: [0.6704], avg loss: [0.6869], time: [101.5334ms]\n",
+      "Epoch: [  1/ 10], step: [  174/  390], loss: [0.6675], avg loss: [0.6868], time: [99.2222ms]\n",
+      "Epoch: [  1/ 10], step: [  175/  390], loss: [0.6535], avg loss: [0.6866], time: [99.7167ms]\n",
+      "Epoch: [  1/ 10], step: [  176/  390], loss: [0.6660], avg loss: [0.6865], time: [103.0588ms]\n",
+      "Epoch: [  1/ 10], step: [  177/  390], loss: [0.6390], avg loss: [0.6862], time: [99.4101ms]\n",
+      "Epoch: [  1/ 10], step: [  178/  390], loss: [0.6589], avg loss: [0.6861], time: [98.7556ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  1/ 10], step: [  179/  390], loss: [0.6838], avg loss: [0.6860], time: [97.2741ms]\n",
+      "Epoch: [  1/ 10], step: [  180/  390], loss: [0.7194], avg loss: [0.6862], time: [104.8265ms]\n",
+      "Epoch: [  1/ 10], step: [  181/  390], loss: [0.5811], avg loss: [0.6856], time: [96.4580ms]\n",
+      "Epoch: [  1/ 10], step: [  182/  390], loss: [0.7140], avg loss: [0.6858], time: [99.6931ms]\n",
+      "Epoch: [  1/ 10], step: [  183/  390], loss: [0.7558], avg loss: [0.6862], time: [100.7893ms]\n",
+      "Epoch: [  1/ 10], step: [  184/  390], loss: [0.6419], avg loss: [0.6859], time: [99.4534ms]\n",
+      "Epoch: [  1/ 10], step: [  185/  390], loss: [0.5970], avg loss: [0.6855], time: [98.1152ms]\n",
+      "Epoch: [  1/ 10], step: [  186/  390], loss: [0.7137], avg loss: [0.6856], time: [99.8573ms]\n",
+      "Epoch: [  1/ 10], step: [  187/  390], loss: [0.6258], avg loss: [0.6853], time: [99.4055ms]\n",
+      "Epoch: [  1/ 10], step: [  188/  390], loss: [0.6423], avg loss: [0.6851], time: [100.4550ms]\n",
+      "Epoch: [  1/ 10], step: [  189/  390], loss: [0.6785], avg loss: [0.6850], time: [95.7451ms]\n",
+      "Epoch: [  1/ 10], step: [  190/  390], loss: [0.6613], avg loss: [0.6849], time: [99.9112ms]\n",
+      "Epoch: [  1/ 10], step: [  191/  390], loss: [0.6538], avg loss: [0.6847], time: [97.4913ms]\n",
+      "Epoch: [  1/ 10], step: [  192/  390], loss: [0.6377], avg loss: [0.6845], time: [105.1736ms]\n",
+      "Epoch: [  1/ 10], step: [  193/  390], loss: [0.7727], avg loss: [0.6850], time: [98.9897ms]\n",
+      "Epoch: [  1/ 10], step: [  194/  390], loss: [0.6539], avg loss: [0.6848], time: [104.8808ms]\n",
+      "Epoch: [  1/ 10], step: [  195/  390], loss: [0.6855], avg loss: [0.6848], time: [101.0678ms]\n",
+      "Epoch: [  1/ 10], step: [  196/  390], loss: [0.6523], avg loss: [0.6846], time: [98.1519ms]\n",
+      "Epoch: [  1/ 10], step: [  197/  390], loss: [0.6892], avg loss: [0.6847], time: [97.2669ms]\n",
+      "Epoch: [  1/ 10], step: [  198/  390], loss: [0.6495], avg loss: [0.6845], time: [104.5499ms]\n",
+      "Epoch: [  1/ 10], step: [  199/  390], loss: [0.6546], avg loss: [0.6843], time: [96.3891ms]\n",
+      "Epoch: [  1/ 10], step: [  200/  390], loss: [0.6856], avg loss: [0.6843], time: [99.0260ms]\n",
+      "Epoch: [  1/ 10], step: [  201/  390], loss: [0.6739], avg loss: [0.6843], time: [98.8247ms]\n",
+      "Epoch: [  1/ 10], step: [  202/  390], loss: [0.6894], avg loss: [0.6843], time: [99.9596ms]\n",
+      "Epoch: [  1/ 10], step: [  203/  390], loss: [0.6625], avg loss: [0.6842], time: [98.8562ms]\n",
+      "Epoch: [  1/ 10], step: [  204/  390], loss: [0.6656], avg loss: [0.6841], time: [100.0981ms]\n",
+      "Epoch: [  1/ 10], step: [  205/  390], loss: [0.6302], avg loss: [0.6838], time: [96.4847ms]\n",
+      "Epoch: [  1/ 10], step: [  206/  390], loss: [0.6459], avg loss: [0.6837], time: [104.7690ms]\n",
+      "Epoch: [  1/ 10], step: [  207/  390], loss: [0.6626], avg loss: [0.6836], time: [98.5188ms]\n",
+      "Epoch: [  1/ 10], step: [  208/  390], loss: [0.6679], avg loss: [0.6835], time: [103.6048ms]\n",
+      "Epoch: [  1/ 10], step: [  209/  390], loss: [0.6209], avg loss: [0.6832], time: [97.0821ms]\n",
+      "Epoch: [  1/ 10], step: [  210/  390], loss: [0.6665], avg loss: [0.6831], time: [100.1060ms]\n",
+      "Epoch: [  1/ 10], step: [  211/  390], loss: [0.6486], avg loss: [0.6829], time: [96.8394ms]\n",
+      "Epoch: [  1/ 10], step: [  212/  390], loss: [0.6675], avg loss: [0.6829], time: [100.1658ms]\n",
+      "Epoch: [  1/ 10], step: [  213/  390], loss: [0.6709], avg loss: [0.6828], time: [97.8286ms]\n",
+      "Epoch: [  1/ 10], step: [  214/  390], loss: [0.6539], avg loss: [0.6827], time: [97.7886ms]\n",
+      "Epoch: [  1/ 10], step: [  215/  390], loss: [0.6299], avg loss: [0.6824], time: [97.8947ms]\n",
+      "Epoch: [  1/ 10], step: [  216/  390], loss: [0.6258], avg loss: [0.6822], time: [103.6613ms]\n",
+      "Epoch: [  1/ 10], step: [  217/  390], loss: [0.6113], avg loss: [0.6818], time: [96.6077ms]\n",
+      "Epoch: [  1/ 10], step: [  218/  390], loss: [0.6566], avg loss: [0.6817], time: [103.4837ms]\n",
+      "Epoch: [  1/ 10], step: [  219/  390], loss: [0.6309], avg loss: [0.6815], time: [96.1127ms]\n",
+      "Epoch: [  1/ 10], step: [  220/  390], loss: [0.7080], avg loss: [0.6816], time: [100.4570ms]\n",
+      "Epoch: [  1/ 10], step: [  221/  390], loss: [0.6745], avg loss: [0.6816], time: [95.4196ms]\n",
+      "Epoch: [  1/ 10], step: [  222/  390], loss: [0.7327], avg loss: [0.6818], time: [99.2377ms]\n",
+      "Epoch: [  1/ 10], step: [  223/  390], loss: [0.6556], avg loss: [0.6817], time: [101.9399ms]\n",
+      "Epoch: [  1/ 10], step: [  224/  390], loss: [0.5917], avg loss: [0.6813], time: [100.4891ms]\n",
+      "Epoch: [  1/ 10], step: [  225/  390], loss: [0.6625], avg loss: [0.6812], time: [96.1914ms]\n",
+      "Epoch: [  1/ 10], step: [  226/  390], loss: [0.5993], avg loss: [0.6808], time: [104.5396ms]\n",
+      "Epoch: [  1/ 10], step: [  227/  390], loss: [0.6162], avg loss: [0.6806], time: [99.0164ms]\n",
+      "Epoch: [  1/ 10], step: [  228/  390], loss: [0.5698], avg loss: [0.6801], time: [101.6934ms]\n",
+      "Epoch: [  1/ 10], step: [  229/  390], loss: [0.6088], avg loss: [0.6798], time: [98.6443ms]\n",
+      "Epoch: [  1/ 10], step: [  230/  390], loss: [0.6212], avg loss: [0.6795], time: [102.3512ms]\n",
+      "Epoch: [  1/ 10], step: [  231/  390], loss: [0.5745], avg loss: [0.6791], time: [98.6683ms]\n",
+      "Epoch: [  1/ 10], step: [  232/  390], loss: [0.6947], avg loss: [0.6791], time: [100.1174ms]\n",
+      "Epoch: [  1/ 10], step: [  233/  390], loss: [0.6499], avg loss: [0.6790], time: [97.1973ms]\n",
+      "Epoch: [  1/ 10], step: [  234/  390], loss: [0.6867], avg loss: [0.6790], time: [100.7817ms]\n",
+      "Epoch: [  1/ 10], step: [  235/  390], loss: [0.6241], avg loss: [0.6788], time: [96.5447ms]\n",
+      "Epoch: [  1/ 10], step: [  236/  390], loss: [0.8216], avg loss: [0.6794], time: [103.3659ms]\n",
+      "Epoch: [  1/ 10], step: [  237/  390], loss: [0.6029], avg loss: [0.6791], time: [98.2652ms]\n",
+      "Epoch: [  1/ 10], step: [  238/  390], loss: [0.7373], avg loss: [0.6793], time: [99.4091ms]\n",
+      "Epoch: [  1/ 10], step: [  239/  390], loss: [0.7275], avg loss: [0.6795], time: [95.6967ms]\n",
+      "Epoch: [  1/ 10], step: [  240/  390], loss: [0.6317], avg loss: [0.6793], time: [101.9759ms]\n",
+      "Epoch: [  1/ 10], step: [  241/  390], loss: [0.6836], avg loss: [0.6793], time: [95.0050ms]\n",
+      "Epoch: [  1/ 10], step: [  242/  390], loss: [0.7143], avg loss: [0.6795], time: [99.9835ms]\n",
+      "Epoch: [  1/ 10], step: [  243/  390], loss: [0.6408], avg loss: [0.6793], time: [100.2116ms]\n",
+      "Epoch: [  1/ 10], step: [  244/  390], loss: [0.6520], avg loss: [0.6792], time: [106.4565ms]\n",
+      "Epoch: [  1/ 10], step: [  245/  390], loss: [0.6602], avg loss: [0.6791], time: [97.6610ms]\n",
+      "Epoch: [  1/ 10], step: [  246/  390], loss: [0.6279], avg loss: [0.6789], time: [99.3872ms]\n",
+      "Epoch: [  1/ 10], step: [  247/  390], loss: [0.6336], avg loss: [0.6787], time: [100.8461ms]\n",
+      "Epoch: [  1/ 10], step: [  248/  390], loss: [0.6832], avg loss: [0.6788], time: [102.9501ms]\n",
+      "Epoch: [  1/ 10], step: [  249/  390], loss: [0.6762], avg loss: [0.6788], time: [97.1913ms]\n",
+      "Epoch: [  1/ 10], step: [  250/  390], loss: [0.7123], avg loss: [0.6789], time: [104.1181ms]\n",
+      "Epoch: [  1/ 10], step: [  251/  390], loss: [0.7057], avg loss: [0.6790], time: [98.4380ms]\n",
+      "Epoch: [  1/ 10], step: [  252/  390], loss: [0.6579], avg loss: [0.6789], time: [100.9729ms]\n",
+      "Epoch: [  1/ 10], step: [  253/  390], loss: [0.6746], avg loss: [0.6789], time: [99.4725ms]\n",
+      "Epoch: [  1/ 10], step: [  254/  390], loss: [0.6690], avg loss: [0.6789], time: [101.0892ms]\n",
+      "Epoch: [  1/ 10], step: [  255/  390], loss: [0.6963], avg loss: [0.6789], time: [100.0776ms]\n",
+      "Epoch: [  1/ 10], step: [  256/  390], loss: [0.6519], avg loss: [0.6788], time: [101.8677ms]\n",
+      "Epoch: [  1/ 10], step: [  257/  390], loss: [0.6771], avg loss: [0.6788], time: [95.8588ms]\n",
+      "Epoch: [  1/ 10], step: [  258/  390], loss: [0.6355], avg loss: [0.6786], time: [100.1673ms]\n",
+      "Epoch: [  1/ 10], step: [  259/  390], loss: [0.6587], avg loss: [0.6786], time: [96.8349ms]\n",
+      "Epoch: [  1/ 10], step: [  260/  390], loss: [0.6374], avg loss: [0.6784], time: [100.6339ms]\n",
+      "Epoch: [  1/ 10], step: [  261/  390], loss: [0.6249], avg loss: [0.6782], time: [101.6564ms]\n",
+      "Epoch: [  1/ 10], step: [  262/  390], loss: [0.6486], avg loss: [0.6781], time: [99.8287ms]\n",
+      "Epoch: [  1/ 10], step: [  263/  390], loss: [0.6340], avg loss: [0.6779], time: [101.6524ms]\n",
+      "Epoch: [  1/ 10], step: [  264/  390], loss: [0.6180], avg loss: [0.6777], time: [103.4558ms]\n",
+      "Epoch: [  1/ 10], step: [  265/  390], loss: [0.6825], avg loss: [0.6777], time: [98.4805ms]\n",
+      "Epoch: [  1/ 10], step: [  266/  390], loss: [0.6412], avg loss: [0.6776], time: [103.6050ms]\n",
+      "Epoch: [  1/ 10], step: [  267/  390], loss: [0.6883], avg loss: [0.6776], time: [97.1916ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  1/ 10], step: [  268/  390], loss: [0.6293], avg loss: [0.6774], time: [99.0884ms]\n",
+      "Epoch: [  1/ 10], step: [  269/  390], loss: [0.6679], avg loss: [0.6774], time: [98.5043ms]\n",
+      "Epoch: [  1/ 10], step: [  270/  390], loss: [0.6610], avg loss: [0.6773], time: [104.2540ms]\n",
+      "Epoch: [  1/ 10], step: [  271/  390], loss: [0.6144], avg loss: [0.6771], time: [98.5131ms]\n",
+      "Epoch: [  1/ 10], step: [  272/  390], loss: [0.6461], avg loss: [0.6770], time: [98.3980ms]\n",
+      "Epoch: [  1/ 10], step: [  273/  390], loss: [0.6446], avg loss: [0.6769], time: [97.9280ms]\n",
+      "Epoch: [  1/ 10], step: [  274/  390], loss: [0.7186], avg loss: [0.6770], time: [100.8170ms]\n",
+      "Epoch: [  1/ 10], step: [  275/  390], loss: [0.7003], avg loss: [0.6771], time: [100.1592ms]\n",
+      "Epoch: [  1/ 10], step: [  276/  390], loss: [0.6935], avg loss: [0.6772], time: [101.7003ms]\n",
+      "Epoch: [  1/ 10], step: [  277/  390], loss: [0.7605], avg loss: [0.6775], time: [102.7503ms]\n",
+      "Epoch: [  1/ 10], step: [  278/  390], loss: [0.6664], avg loss: [0.6774], time: [102.6635ms]\n",
+      "Epoch: [  1/ 10], step: [  279/  390], loss: [0.5582], avg loss: [0.6770], time: [97.8162ms]\n",
+      "Epoch: [  1/ 10], step: [  280/  390], loss: [0.6123], avg loss: [0.6768], time: [100.5421ms]\n",
+      "Epoch: [  1/ 10], step: [  281/  390], loss: [0.6410], avg loss: [0.6766], time: [96.9605ms]\n",
+      "Epoch: [  1/ 10], step: [  282/  390], loss: [0.6696], avg loss: [0.6766], time: [102.0803ms]\n",
+      "Epoch: [  1/ 10], step: [  283/  390], loss: [0.6637], avg loss: [0.6766], time: [97.7211ms]\n",
+      "Epoch: [  1/ 10], step: [  284/  390], loss: [0.6558], avg loss: [0.6765], time: [101.4807ms]\n",
+      "Epoch: [  1/ 10], step: [  285/  390], loss: [0.6364], avg loss: [0.6764], time: [97.6901ms]\n",
+      "Epoch: [  1/ 10], step: [  286/  390], loss: [0.6613], avg loss: [0.6763], time: [104.0516ms]\n",
+      "Epoch: [  1/ 10], step: [  287/  390], loss: [0.6815], avg loss: [0.6763], time: [97.9962ms]\n",
+      "Epoch: [  1/ 10], step: [  288/  390], loss: [0.6551], avg loss: [0.6763], time: [99.8640ms]\n",
+      "Epoch: [  1/ 10], step: [  289/  390], loss: [0.6071], avg loss: [0.6760], time: [97.0018ms]\n",
+      "Epoch: [  1/ 10], step: [  290/  390], loss: [0.6287], avg loss: [0.6759], time: [98.8023ms]\n",
+      "Epoch: [  1/ 10], step: [  291/  390], loss: [0.6090], avg loss: [0.6756], time: [97.9438ms]\n",
+      "Epoch: [  1/ 10], step: [  292/  390], loss: [0.6697], avg loss: [0.6756], time: [102.2677ms]\n",
+      "Epoch: [  1/ 10], step: [  293/  390], loss: [0.6100], avg loss: [0.6754], time: [100.4937ms]\n",
+      "Epoch: [  1/ 10], step: [  294/  390], loss: [0.6452], avg loss: [0.6753], time: [99.9360ms]\n",
+      "Epoch: [  1/ 10], step: [  295/  390], loss: [0.5721], avg loss: [0.6749], time: [98.2065ms]\n",
+      "Epoch: [  1/ 10], step: [  296/  390], loss: [0.6412], avg loss: [0.6748], time: [102.8442ms]\n",
+      "Epoch: [  1/ 10], step: [  297/  390], loss: [0.6133], avg loss: [0.6746], time: [97.3332ms]\n",
+      "Epoch: [  1/ 10], step: [  298/  390], loss: [0.7127], avg loss: [0.6747], time: [100.2576ms]\n",
+      "Epoch: [  1/ 10], step: [  299/  390], loss: [0.6043], avg loss: [0.6745], time: [99.0622ms]\n",
+      "Epoch: [  1/ 10], step: [  300/  390], loss: [0.6349], avg loss: [0.6744], time: [101.5828ms]\n",
+      "Epoch: [  1/ 10], step: [  301/  390], loss: [0.6233], avg loss: [0.6742], time: [101.6316ms]\n",
+      "Epoch: [  1/ 10], step: [  302/  390], loss: [0.6955], avg loss: [0.6743], time: [100.1947ms]\n",
+      "Epoch: [  1/ 10], step: [  303/  390], loss: [0.5825], avg loss: [0.6740], time: [98.4261ms]\n",
+      "Epoch: [  1/ 10], step: [  304/  390], loss: [0.6163], avg loss: [0.6738], time: [101.7635ms]\n",
+      "Epoch: [  1/ 10], step: [  305/  390], loss: [0.6739], avg loss: [0.6738], time: [99.4513ms]\n",
+      "Epoch: [  1/ 10], step: [  306/  390], loss: [0.6409], avg loss: [0.6737], time: [102.5743ms]\n",
+      "Epoch: [  1/ 10], step: [  307/  390], loss: [0.6608], avg loss: [0.6736], time: [97.9807ms]\n",
+      "Epoch: [  1/ 10], step: [  308/  390], loss: [0.6505], avg loss: [0.6736], time: [104.2030ms]\n",
+      "Epoch: [  1/ 10], step: [  309/  390], loss: [0.6090], avg loss: [0.6733], time: [95.8602ms]\n",
+      "Epoch: [  1/ 10], step: [  310/  390], loss: [0.6088], avg loss: [0.6731], time: [101.6569ms]\n",
+      "Epoch: [  1/ 10], step: [  311/  390], loss: [0.6254], avg loss: [0.6730], time: [98.6173ms]\n",
+      "Epoch: [  1/ 10], step: [  312/  390], loss: [0.6485], avg loss: [0.6729], time: [99.7775ms]\n",
+      "Epoch: [  1/ 10], step: [  313/  390], loss: [0.7142], avg loss: [0.6730], time: [95.9973ms]\n",
+      "Epoch: [  1/ 10], step: [  314/  390], loss: [0.5787], avg loss: [0.6727], time: [103.0574ms]\n",
+      "Epoch: [  1/ 10], step: [  315/  390], loss: [0.6295], avg loss: [0.6726], time: [99.2737ms]\n",
+      "Epoch: [  1/ 10], step: [  316/  390], loss: [0.6210], avg loss: [0.6724], time: [101.5892ms]\n",
+      "Epoch: [  1/ 10], step: [  317/  390], loss: [0.7650], avg loss: [0.6727], time: [96.6787ms]\n",
+      "Epoch: [  1/ 10], step: [  318/  390], loss: [0.6355], avg loss: [0.6726], time: [104.0988ms]\n",
+      "Epoch: [  1/ 10], step: [  319/  390], loss: [0.6717], avg loss: [0.6726], time: [101.0303ms]\n",
+      "Epoch: [  1/ 10], step: [  320/  390], loss: [0.7392], avg loss: [0.6728], time: [100.6036ms]\n",
+      "Epoch: [  1/ 10], step: [  321/  390], loss: [0.6969], avg loss: [0.6729], time: [97.0156ms]\n",
+      "Epoch: [  1/ 10], step: [  322/  390], loss: [0.6394], avg loss: [0.6728], time: [104.1842ms]\n",
+      "Epoch: [  1/ 10], step: [  323/  390], loss: [0.6603], avg loss: [0.6727], time: [101.1569ms]\n",
+      "Epoch: [  1/ 10], step: [  324/  390], loss: [0.6058], avg loss: [0.6725], time: [99.5979ms]\n",
+      "Epoch: [  1/ 10], step: [  325/  390], loss: [0.6332], avg loss: [0.6724], time: [97.6386ms]\n",
+      "Epoch: [  1/ 10], step: [  326/  390], loss: [0.6236], avg loss: [0.6723], time: [100.7688ms]\n",
+      "Epoch: [  1/ 10], step: [  327/  390], loss: [0.6483], avg loss: [0.6722], time: [99.6730ms]\n",
+      "Epoch: [  1/ 10], step: [  328/  390], loss: [0.6229], avg loss: [0.6720], time: [98.9954ms]\n",
+      "Epoch: [  1/ 10], step: [  329/  390], loss: [0.6022], avg loss: [0.6718], time: [99.1223ms]\n",
+      "Epoch: [  1/ 10], step: [  330/  390], loss: [0.6393], avg loss: [0.6717], time: [99.3533ms]\n",
+      "Epoch: [  1/ 10], step: [  331/  390], loss: [0.5813], avg loss: [0.6715], time: [99.1678ms]\n",
+      "Epoch: [  1/ 10], step: [  332/  390], loss: [0.6013], avg loss: [0.6712], time: [101.7318ms]\n",
+      "Epoch: [  1/ 10], step: [  333/  390], loss: [0.6026], avg loss: [0.6710], time: [97.6651ms]\n",
+      "Epoch: [  1/ 10], step: [  334/  390], loss: [0.5768], avg loss: [0.6708], time: [99.6068ms]\n",
+      "Epoch: [  1/ 10], step: [  335/  390], loss: [0.6915], avg loss: [0.6708], time: [99.7696ms]\n",
+      "Epoch: [  1/ 10], step: [  336/  390], loss: [0.6256], avg loss: [0.6707], time: [104.4483ms]\n",
+      "Epoch: [  1/ 10], step: [  337/  390], loss: [0.7781], avg loss: [0.6710], time: [101.9986ms]\n",
+      "Epoch: [  1/ 10], step: [  338/  390], loss: [0.7050], avg loss: [0.6711], time: [107.2345ms]\n",
+      "Epoch: [  1/ 10], step: [  339/  390], loss: [0.7328], avg loss: [0.6713], time: [102.3960ms]\n",
+      "Epoch: [  1/ 10], step: [  340/  390], loss: [0.7076], avg loss: [0.6714], time: [101.7249ms]\n",
+      "Epoch: [  1/ 10], step: [  341/  390], loss: [0.7222], avg loss: [0.6715], time: [98.3591ms]\n",
+      "Epoch: [  1/ 10], step: [  342/  390], loss: [0.6022], avg loss: [0.6713], time: [103.7915ms]\n",
+      "Epoch: [  1/ 10], step: [  343/  390], loss: [0.6293], avg loss: [0.6712], time: [103.1730ms]\n",
+      "Epoch: [  1/ 10], step: [  344/  390], loss: [0.6443], avg loss: [0.6711], time: [98.4271ms]\n",
+      "Epoch: [  1/ 10], step: [  345/  390], loss: [0.6849], avg loss: [0.6712], time: [101.1391ms]\n",
+      "Epoch: [  1/ 10], step: [  346/  390], loss: [0.6910], avg loss: [0.6712], time: [102.3850ms]\n",
+      "Epoch: [  1/ 10], step: [  347/  390], loss: [0.7112], avg loss: [0.6714], time: [100.4372ms]\n",
+      "Epoch: [  1/ 10], step: [  348/  390], loss: [0.7019], avg loss: [0.6714], time: [100.4596ms]\n",
+      "Epoch: [  1/ 10], step: [  349/  390], loss: [0.6608], avg loss: [0.6714], time: [98.1493ms]\n",
+      "Epoch: [  1/ 10], step: [  350/  390], loss: [0.6993], avg loss: [0.6715], time: [98.9270ms]\n",
+      "Epoch: [  1/ 10], step: [  351/  390], loss: [0.6632], avg loss: [0.6715], time: [95.9554ms]\n",
+      "Epoch: [  1/ 10], step: [  352/  390], loss: [0.6706], avg loss: [0.6715], time: [100.6606ms]\n",
+      "Epoch: [  1/ 10], step: [  353/  390], loss: [0.6401], avg loss: [0.6714], time: [96.2012ms]\n",
+      "Epoch: [  1/ 10], step: [  354/  390], loss: [0.6503], avg loss: [0.6713], time: [98.9563ms]\n",
+      "Epoch: [  1/ 10], step: [  355/  390], loss: [0.6477], avg loss: [0.6712], time: [96.5178ms]\n",
+      "Epoch: [  1/ 10], step: [  356/  390], loss: [0.6509], avg loss: [0.6712], time: [103.2085ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  1/ 10], step: [  357/  390], loss: [0.6403], avg loss: [0.6711], time: [98.1786ms]\n",
+      "Epoch: [  1/ 10], step: [  358/  390], loss: [0.6679], avg loss: [0.6711], time: [103.6110ms]\n",
+      "Epoch: [  1/ 10], step: [  359/  390], loss: [0.6559], avg loss: [0.6711], time: [97.9443ms]\n",
+      "Epoch: [  1/ 10], step: [  360/  390], loss: [0.6298], avg loss: [0.6709], time: [103.2884ms]\n",
+      "Epoch: [  1/ 10], step: [  361/  390], loss: [0.6193], avg loss: [0.6708], time: [98.4299ms]\n",
+      "Epoch: [  1/ 10], step: [  362/  390], loss: [0.6649], avg loss: [0.6708], time: [102.1912ms]\n",
+      "Epoch: [  1/ 10], step: [  363/  390], loss: [0.6179], avg loss: [0.6706], time: [102.8066ms]\n",
+      "Epoch: [  1/ 10], step: [  364/  390], loss: [0.6771], avg loss: [0.6707], time: [102.7441ms]\n",
+      "Epoch: [  1/ 10], step: [  365/  390], loss: [0.6193], avg loss: [0.6705], time: [97.2888ms]\n",
+      "Epoch: [  1/ 10], step: [  366/  390], loss: [0.5615], avg loss: [0.6702], time: [100.9829ms]\n",
+      "Epoch: [  1/ 10], step: [  367/  390], loss: [0.6999], avg loss: [0.6703], time: [101.9204ms]\n",
+      "Epoch: [  1/ 10], step: [  368/  390], loss: [0.6330], avg loss: [0.6702], time: [103.1630ms]\n",
+      "Epoch: [  1/ 10], step: [  369/  390], loss: [0.6941], avg loss: [0.6703], time: [100.9564ms]\n",
+      "Epoch: [  1/ 10], step: [  370/  390], loss: [0.7298], avg loss: [0.6704], time: [104.1873ms]\n",
+      "Epoch: [  1/ 10], step: [  371/  390], loss: [0.7247], avg loss: [0.6706], time: [96.9987ms]\n",
+      "Epoch: [  1/ 10], step: [  372/  390], loss: [0.5866], avg loss: [0.6703], time: [100.1570ms]\n",
+      "Epoch: [  1/ 10], step: [  373/  390], loss: [0.6025], avg loss: [0.6702], time: [93.6594ms]\n",
+      "Epoch: [  1/ 10], step: [  374/  390], loss: [0.6047], avg loss: [0.6700], time: [103.0066ms]\n",
+      "Epoch: [  1/ 10], step: [  375/  390], loss: [0.5705], avg loss: [0.6697], time: [97.8637ms]\n",
+      "Epoch: [  1/ 10], step: [  376/  390], loss: [0.7009], avg loss: [0.6698], time: [106.2992ms]\n",
+      "Epoch: [  1/ 10], step: [  377/  390], loss: [0.6272], avg loss: [0.6697], time: [102.3777ms]\n",
+      "Epoch: [  1/ 10], step: [  378/  390], loss: [0.6697], avg loss: [0.6697], time: [105.3488ms]\n",
+      "Epoch: [  1/ 10], step: [  379/  390], loss: [0.6578], avg loss: [0.6697], time: [98.5255ms]\n",
+      "Epoch: [  1/ 10], step: [  380/  390], loss: [0.5431], avg loss: [0.6693], time: [101.3367ms]\n",
+      "Epoch: [  1/ 10], step: [  381/  390], loss: [0.7024], avg loss: [0.6694], time: [99.8781ms]\n",
+      "Epoch: [  1/ 10], step: [  382/  390], loss: [0.5866], avg loss: [0.6692], time: [102.2394ms]\n",
+      "Epoch: [  1/ 10], step: [  383/  390], loss: [0.6498], avg loss: [0.6691], time: [97.1584ms]\n",
+      "Epoch: [  1/ 10], step: [  384/  390], loss: [0.5926], avg loss: [0.6689], time: [101.0160ms]\n",
+      "Epoch: [  1/ 10], step: [  385/  390], loss: [0.6094], avg loss: [0.6688], time: [94.8484ms]\n",
+      "Epoch: [  1/ 10], step: [  386/  390], loss: [0.5663], avg loss: [0.6685], time: [104.9602ms]\n",
+      "Epoch: [  1/ 10], step: [  387/  390], loss: [0.6087], avg loss: [0.6684], time: [99.5123ms]\n",
+      "Epoch: [  1/ 10], step: [  388/  390], loss: [0.5394], avg loss: [0.6680], time: [102.6874ms]\n",
+      "Epoch: [  1/ 10], step: [  389/  390], loss: [0.7825], avg loss: [0.6683], time: [97.5268ms]\n",
+      "Epoch: [  1/ 10], step: [  390/  390], loss: [0.6069], avg loss: [0.6682], time: [956.3572ms]\n",
+      "Epoch time: 40590.405, per step time: 104.078\n",
+      "Epoch time: 40590.832, per step time: 104.079, avg loss: 0.668\n",
+      "************************************************************\n",
+      "Epoch: [  2/ 10], step: [    1/  390], loss: [0.7305], avg loss: [0.7305], time: [100.4219ms]\n",
+      "Epoch: [  2/ 10], step: [    2/  390], loss: [0.7044], avg loss: [0.7175], time: [103.8322ms]\n",
+      "Epoch: [  2/ 10], step: [    3/  390], loss: [0.5188], avg loss: [0.6512], time: [103.1067ms]\n",
+      "Epoch: [  2/ 10], step: [    4/  390], loss: [0.5801], avg loss: [0.6334], time: [100.9061ms]\n",
+      "Epoch: [  2/ 10], step: [    5/  390], loss: [0.6629], avg loss: [0.6393], time: [105.7405ms]\n",
+      "Epoch: [  2/ 10], step: [    6/  390], loss: [0.6763], avg loss: [0.6455], time: [101.5539ms]\n",
+      "Epoch: [  2/ 10], step: [    7/  390], loss: [0.6314], avg loss: [0.6435], time: [103.1678ms]\n",
+      "Epoch: [  2/ 10], step: [    8/  390], loss: [0.6936], avg loss: [0.6497], time: [102.1249ms]\n",
+      "Epoch: [  2/ 10], step: [    9/  390], loss: [0.5945], avg loss: [0.6436], time: [102.1895ms]\n",
+      "Epoch: [  2/ 10], step: [   10/  390], loss: [0.7017], avg loss: [0.6494], time: [101.5260ms]\n",
+      "Epoch: [  2/ 10], step: [   11/  390], loss: [0.6935], avg loss: [0.6534], time: [105.4053ms]\n",
+      "Epoch: [  2/ 10], step: [   12/  390], loss: [0.6426], avg loss: [0.6525], time: [101.2661ms]\n",
+      "Epoch: [  2/ 10], step: [   13/  390], loss: [0.6689], avg loss: [0.6538], time: [105.3002ms]\n",
+      "Epoch: [  2/ 10], step: [   14/  390], loss: [0.6623], avg loss: [0.6544], time: [102.6111ms]\n",
+      "Epoch: [  2/ 10], step: [   15/  390], loss: [0.6948], avg loss: [0.6571], time: [101.8302ms]\n",
+      "Epoch: [  2/ 10], step: [   16/  390], loss: [0.6518], avg loss: [0.6568], time: [100.3373ms]\n",
+      "Epoch: [  2/ 10], step: [   17/  390], loss: [0.6611], avg loss: [0.6570], time: [102.5856ms]\n",
+      "Epoch: [  2/ 10], step: [   18/  390], loss: [0.6519], avg loss: [0.6567], time: [101.8844ms]\n",
+      "Epoch: [  2/ 10], step: [   19/  390], loss: [0.6549], avg loss: [0.6566], time: [103.6978ms]\n",
+      "Epoch: [  2/ 10], step: [   20/  390], loss: [0.6685], avg loss: [0.6572], time: [100.9917ms]\n",
+      "Epoch: [  2/ 10], step: [   21/  390], loss: [0.6782], avg loss: [0.6582], time: [101.8064ms]\n",
+      "Epoch: [  2/ 10], step: [   22/  390], loss: [0.6741], avg loss: [0.6589], time: [105.1960ms]\n",
+      "Epoch: [  2/ 10], step: [   23/  390], loss: [0.6394], avg loss: [0.6581], time: [102.1986ms]\n",
+      "Epoch: [  2/ 10], step: [   24/  390], loss: [0.6587], avg loss: [0.6581], time: [100.5142ms]\n",
+      "Epoch: [  2/ 10], step: [   25/  390], loss: [0.6442], avg loss: [0.6576], time: [105.2723ms]\n",
+      "Epoch: [  2/ 10], step: [   26/  390], loss: [0.6268], avg loss: [0.6564], time: [101.6653ms]\n",
+      "Epoch: [  2/ 10], step: [   27/  390], loss: [0.6517], avg loss: [0.6562], time: [101.8386ms]\n",
+      "Epoch: [  2/ 10], step: [   28/  390], loss: [0.6195], avg loss: [0.6549], time: [100.9421ms]\n",
+      "Epoch: [  2/ 10], step: [   29/  390], loss: [0.6192], avg loss: [0.6537], time: [101.0842ms]\n",
+      "Epoch: [  2/ 10], step: [   30/  390], loss: [0.6432], avg loss: [0.6533], time: [99.7970ms]\n",
+      "Epoch: [  2/ 10], step: [   31/  390], loss: [0.6170], avg loss: [0.6521], time: [102.1228ms]\n",
+      "Epoch: [  2/ 10], step: [   32/  390], loss: [0.6446], avg loss: [0.6519], time: [98.0394ms]\n",
+      "Epoch: [  2/ 10], step: [   33/  390], loss: [0.6830], avg loss: [0.6528], time: [103.9677ms]\n",
+      "Epoch: [  2/ 10], step: [   34/  390], loss: [0.6451], avg loss: [0.6526], time: [102.4153ms]\n",
+      "Epoch: [  2/ 10], step: [   35/  390], loss: [0.6049], avg loss: [0.6513], time: [102.6518ms]\n",
+      "Epoch: [  2/ 10], step: [   36/  390], loss: [0.6155], avg loss: [0.6503], time: [104.4526ms]\n",
+      "Epoch: [  2/ 10], step: [   37/  390], loss: [0.6176], avg loss: [0.6494], time: [102.4578ms]\n",
+      "Epoch: [  2/ 10], step: [   38/  390], loss: [0.7299], avg loss: [0.6515], time: [100.3780ms]\n",
+      "Epoch: [  2/ 10], step: [   39/  390], loss: [0.6515], avg loss: [0.6515], time: [105.2492ms]\n",
+      "Epoch: [  2/ 10], step: [   40/  390], loss: [0.5711], avg loss: [0.6495], time: [100.1759ms]\n",
+      "Epoch: [  2/ 10], step: [   41/  390], loss: [0.6730], avg loss: [0.6501], time: [106.7350ms]\n",
+      "Epoch: [  2/ 10], step: [   42/  390], loss: [0.6650], avg loss: [0.6504], time: [99.0667ms]\n",
+      "Epoch: [  2/ 10], step: [   43/  390], loss: [0.6340], avg loss: [0.6500], time: [105.6554ms]\n",
+      "Epoch: [  2/ 10], step: [   44/  390], loss: [0.5755], avg loss: [0.6483], time: [102.4828ms]\n",
+      "Epoch: [  2/ 10], step: [   45/  390], loss: [0.6111], avg loss: [0.6475], time: [105.1023ms]\n",
+      "Epoch: [  2/ 10], step: [   46/  390], loss: [0.5814], avg loss: [0.6461], time: [100.3659ms]\n",
+      "Epoch: [  2/ 10], step: [   47/  390], loss: [0.6620], avg loss: [0.6464], time: [103.0633ms]\n",
+      "Epoch: [  2/ 10], step: [   48/  390], loss: [0.5942], avg loss: [0.6453], time: [102.9394ms]\n",
+      "Epoch: [  2/ 10], step: [   49/  390], loss: [0.7082], avg loss: [0.6466], time: [101.4242ms]\n",
+      "Epoch: [  2/ 10], step: [   50/  390], loss: [0.5765], avg loss: [0.6452], time: [101.2902ms]\n",
+      "Epoch: [  2/ 10], step: [   51/  390], loss: [0.5995], avg loss: [0.6443], time: [104.4326ms]\n",
+      "Epoch: [  2/ 10], step: [   52/  390], loss: [0.6466], avg loss: [0.6444], time: [101.6693ms]\n",
+      "Epoch: [  2/ 10], step: [   53/  390], loss: [0.5725], avg loss: [0.6430], time: [106.1947ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  2/ 10], step: [   54/  390], loss: [0.5748], avg loss: [0.6417], time: [99.1259ms]\n",
+      "Epoch: [  2/ 10], step: [   55/  390], loss: [0.5293], avg loss: [0.6397], time: [104.4934ms]\n",
+      "Epoch: [  2/ 10], step: [   56/  390], loss: [0.5660], avg loss: [0.6384], time: [100.8847ms]\n",
+      "Epoch: [  2/ 10], step: [   57/  390], loss: [0.5283], avg loss: [0.6364], time: [108.6614ms]\n",
+      "Epoch: [  2/ 10], step: [   58/  390], loss: [0.5347], avg loss: [0.6347], time: [101.3596ms]\n",
+      "Epoch: [  2/ 10], step: [   59/  390], loss: [0.5154], avg loss: [0.6327], time: [100.2188ms]\n",
+      "Epoch: [  2/ 10], step: [   60/  390], loss: [0.6732], avg loss: [0.6333], time: [99.3276ms]\n",
+      "Epoch: [  2/ 10], step: [   61/  390], loss: [0.5197], avg loss: [0.6315], time: [104.8396ms]\n",
+      "Epoch: [  2/ 10], step: [   62/  390], loss: [0.7254], avg loss: [0.6330], time: [100.4605ms]\n",
+      "Epoch: [  2/ 10], step: [   63/  390], loss: [0.9070], avg loss: [0.6373], time: [108.4003ms]\n",
+      "Epoch: [  2/ 10], step: [   64/  390], loss: [0.5558], avg loss: [0.6361], time: [100.4941ms]\n",
+      "Epoch: [  2/ 10], step: [   65/  390], loss: [1.0045], avg loss: [0.6417], time: [103.8544ms]\n",
+      "Epoch: [  2/ 10], step: [   66/  390], loss: [0.8933], avg loss: [0.6456], time: [100.5349ms]\n",
+      "Epoch: [  2/ 10], step: [   67/  390], loss: [1.0105], avg loss: [0.6510], time: [103.8167ms]\n",
+      "Epoch: [  2/ 10], step: [   68/  390], loss: [0.6910], avg loss: [0.6516], time: [102.1645ms]\n",
+      "Epoch: [  2/ 10], step: [   69/  390], loss: [0.6372], avg loss: [0.6514], time: [103.2877ms]\n",
+      "Epoch: [  2/ 10], step: [   70/  390], loss: [0.6704], avg loss: [0.6517], time: [100.8592ms]\n",
+      "Epoch: [  2/ 10], step: [   71/  390], loss: [0.7066], avg loss: [0.6524], time: [103.1342ms]\n",
+      "Epoch: [  2/ 10], step: [   72/  390], loss: [0.7282], avg loss: [0.6535], time: [100.5018ms]\n",
+      "Epoch: [  2/ 10], step: [   73/  390], loss: [0.7256], avg loss: [0.6545], time: [107.3256ms]\n",
+      "Epoch: [  2/ 10], step: [   74/  390], loss: [0.7049], avg loss: [0.6551], time: [102.4539ms]\n",
+      "Epoch: [  2/ 10], step: [   75/  390], loss: [0.7688], avg loss: [0.6567], time: [103.9751ms]\n",
+      "Epoch: [  2/ 10], step: [   76/  390], loss: [0.6864], avg loss: [0.6571], time: [98.9461ms]\n",
+      "Epoch: [  2/ 10], step: [   77/  390], loss: [0.6767], avg loss: [0.6573], time: [107.2121ms]\n",
+      "Epoch: [  2/ 10], step: [   78/  390], loss: [0.6959], avg loss: [0.6578], time: [100.4789ms]\n",
+      "Epoch: [  2/ 10], step: [   79/  390], loss: [0.6960], avg loss: [0.6583], time: [107.1048ms]\n",
+      "Epoch: [  2/ 10], step: [   80/  390], loss: [0.6875], avg loss: [0.6587], time: [100.2309ms]\n",
+      "Epoch: [  2/ 10], step: [   81/  390], loss: [0.6882], avg loss: [0.6590], time: [102.3347ms]\n",
+      "Epoch: [  2/ 10], step: [   82/  390], loss: [0.6958], avg loss: [0.6595], time: [103.2414ms]\n",
+      "Epoch: [  2/ 10], step: [   83/  390], loss: [0.6996], avg loss: [0.6599], time: [106.8683ms]\n",
+      "Epoch: [  2/ 10], step: [   84/  390], loss: [0.6975], avg loss: [0.6604], time: [101.3956ms]\n",
+      "Epoch: [  2/ 10], step: [   85/  390], loss: [0.6863], avg loss: [0.6607], time: [102.7884ms]\n",
+      "Epoch: [  2/ 10], step: [   86/  390], loss: [0.6881], avg loss: [0.6610], time: [101.3696ms]\n",
+      "Epoch: [  2/ 10], step: [   87/  390], loss: [0.6797], avg loss: [0.6612], time: [102.0527ms]\n",
+      "Epoch: [  2/ 10], step: [   88/  390], loss: [0.6784], avg loss: [0.6614], time: [102.5679ms]\n",
+      "Epoch: [  2/ 10], step: [   89/  390], loss: [0.6775], avg loss: [0.6616], time: [103.6713ms]\n",
+      "Epoch: [  2/ 10], step: [   90/  390], loss: [0.6681], avg loss: [0.6617], time: [101.7852ms]\n",
+      "Epoch: [  2/ 10], step: [   91/  390], loss: [0.6906], avg loss: [0.6620], time: [105.4647ms]\n",
+      "Epoch: [  2/ 10], step: [   92/  390], loss: [0.6787], avg loss: [0.6622], time: [99.4866ms]\n",
+      "Epoch: [  2/ 10], step: [   93/  390], loss: [0.6724], avg loss: [0.6623], time: [101.1353ms]\n",
+      "Epoch: [  2/ 10], step: [   94/  390], loss: [0.6556], avg loss: [0.6622], time: [100.1878ms]\n",
+      "Epoch: [  2/ 10], step: [   95/  390], loss: [0.6690], avg loss: [0.6623], time: [104.3959ms]\n",
+      "Epoch: [  2/ 10], step: [   96/  390], loss: [0.6389], avg loss: [0.6620], time: [102.6981ms]\n",
+      "Epoch: [  2/ 10], step: [   97/  390], loss: [0.6665], avg loss: [0.6621], time: [102.3762ms]\n",
+      "Epoch: [  2/ 10], step: [   98/  390], loss: [0.6657], avg loss: [0.6621], time: [98.5200ms]\n",
+      "Epoch: [  2/ 10], step: [   99/  390], loss: [0.6476], avg loss: [0.6620], time: [104.3057ms]\n",
+      "Epoch: [  2/ 10], step: [  100/  390], loss: [0.6320], avg loss: [0.6617], time: [100.6835ms]\n",
+      "Epoch: [  2/ 10], step: [  101/  390], loss: [0.6269], avg loss: [0.6613], time: [102.8285ms]\n",
+      "Epoch: [  2/ 10], step: [  102/  390], loss: [0.6891], avg loss: [0.6616], time: [100.7109ms]\n",
+      "Epoch: [  2/ 10], step: [  103/  390], loss: [0.6737], avg loss: [0.6617], time: [107.4386ms]\n",
+      "Epoch: [  2/ 10], step: [  104/  390], loss: [0.6194], avg loss: [0.6613], time: [99.6273ms]\n",
+      "Epoch: [  2/ 10], step: [  105/  390], loss: [0.6310], avg loss: [0.6610], time: [104.0485ms]\n",
+      "Epoch: [  2/ 10], step: [  106/  390], loss: [0.6765], avg loss: [0.6612], time: [99.9987ms]\n",
+      "Epoch: [  2/ 10], step: [  107/  390], loss: [0.5332], avg loss: [0.6600], time: [100.6081ms]\n",
+      "Epoch: [  2/ 10], step: [  108/  390], loss: [0.6403], avg loss: [0.6598], time: [97.5361ms]\n",
+      "Epoch: [  2/ 10], step: [  109/  390], loss: [0.6084], avg loss: [0.6593], time: [102.1194ms]\n",
+      "Epoch: [  2/ 10], step: [  110/  390], loss: [0.6587], avg loss: [0.6593], time: [100.1413ms]\n",
+      "Epoch: [  2/ 10], step: [  111/  390], loss: [0.5721], avg loss: [0.6585], time: [100.9169ms]\n",
+      "Epoch: [  2/ 10], step: [  112/  390], loss: [0.6253], avg loss: [0.6582], time: [101.2611ms]\n",
+      "Epoch: [  2/ 10], step: [  113/  390], loss: [0.5386], avg loss: [0.6572], time: [105.2873ms]\n",
+      "Epoch: [  2/ 10], step: [  114/  390], loss: [0.6135], avg loss: [0.6568], time: [103.6065ms]\n",
+      "Epoch: [  2/ 10], step: [  115/  390], loss: [0.4770], avg loss: [0.6552], time: [104.1057ms]\n",
+      "Epoch: [  2/ 10], step: [  116/  390], loss: [0.5140], avg loss: [0.6540], time: [100.6660ms]\n",
+      "Epoch: [  2/ 10], step: [  117/  390], loss: [0.7868], avg loss: [0.6552], time: [102.5906ms]\n",
+      "Epoch: [  2/ 10], step: [  118/  390], loss: [0.6497], avg loss: [0.6551], time: [99.2391ms]\n",
+      "Epoch: [  2/ 10], step: [  119/  390], loss: [0.6640], avg loss: [0.6552], time: [104.5156ms]\n",
+      "Epoch: [  2/ 10], step: [  120/  390], loss: [0.7578], avg loss: [0.6560], time: [99.5181ms]\n",
+      "Epoch: [  2/ 10], step: [  121/  390], loss: [0.6687], avg loss: [0.6561], time: [101.7456ms]\n",
+      "Epoch: [  2/ 10], step: [  122/  390], loss: [0.5661], avg loss: [0.6554], time: [99.5414ms]\n",
+      "Epoch: [  2/ 10], step: [  123/  390], loss: [0.5133], avg loss: [0.6542], time: [105.3555ms]\n",
+      "Epoch: [  2/ 10], step: [  124/  390], loss: [0.6696], avg loss: [0.6544], time: [100.5726ms]\n",
+      "Epoch: [  2/ 10], step: [  125/  390], loss: [0.5755], avg loss: [0.6537], time: [104.1057ms]\n",
+      "Epoch: [  2/ 10], step: [  126/  390], loss: [0.6681], avg loss: [0.6539], time: [105.4931ms]\n",
+      "Epoch: [  2/ 10], step: [  127/  390], loss: [0.6086], avg loss: [0.6535], time: [104.8403ms]\n",
+      "Epoch: [  2/ 10], step: [  128/  390], loss: [0.6800], avg loss: [0.6537], time: [99.4642ms]\n",
+      "Epoch: [  2/ 10], step: [  129/  390], loss: [0.6341], avg loss: [0.6536], time: [102.6721ms]\n",
+      "Epoch: [  2/ 10], step: [  130/  390], loss: [0.5987], avg loss: [0.6531], time: [100.3625ms]\n",
+      "Epoch: [  2/ 10], step: [  131/  390], loss: [0.7033], avg loss: [0.6535], time: [103.8911ms]\n",
+      "Epoch: [  2/ 10], step: [  132/  390], loss: [0.6140], avg loss: [0.6532], time: [103.9319ms]\n",
+      "Epoch: [  2/ 10], step: [  133/  390], loss: [0.6079], avg loss: [0.6529], time: [106.1463ms]\n",
+      "Epoch: [  2/ 10], step: [  134/  390], loss: [0.7079], avg loss: [0.6533], time: [99.8135ms]\n",
+      "Epoch: [  2/ 10], step: [  135/  390], loss: [0.5892], avg loss: [0.6528], time: [106.6921ms]\n",
+      "Epoch: [  2/ 10], step: [  136/  390], loss: [0.6120], avg loss: [0.6525], time: [104.1679ms]\n",
+      "Epoch: [  2/ 10], step: [  137/  390], loss: [0.5910], avg loss: [0.6521], time: [103.2820ms]\n",
+      "Epoch: [  2/ 10], step: [  138/  390], loss: [0.6155], avg loss: [0.6518], time: [99.8282ms]\n",
+      "Epoch: [  2/ 10], step: [  139/  390], loss: [0.5877], avg loss: [0.6513], time: [101.1064ms]\n",
+      "Epoch: [  2/ 10], step: [  140/  390], loss: [0.6593], avg loss: [0.6514], time: [99.6432ms]\n",
+      "Epoch: [  2/ 10], step: [  141/  390], loss: [0.6068], avg loss: [0.6511], time: [103.2786ms]\n",
+      "Epoch: [  2/ 10], step: [  142/  390], loss: [0.5731], avg loss: [0.6505], time: [101.6634ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  2/ 10], step: [  143/  390], loss: [0.5716], avg loss: [0.6500], time: [102.3705ms]\n",
+      "Epoch: [  2/ 10], step: [  144/  390], loss: [0.6271], avg loss: [0.6498], time: [98.9797ms]\n",
+      "Epoch: [  2/ 10], step: [  145/  390], loss: [0.5050], avg loss: [0.6488], time: [101.5060ms]\n",
+      "Epoch: [  2/ 10], step: [  146/  390], loss: [0.5590], avg loss: [0.6482], time: [98.9730ms]\n",
+      "Epoch: [  2/ 10], step: [  147/  390], loss: [0.6321], avg loss: [0.6481], time: [102.5190ms]\n",
+      "Epoch: [  2/ 10], step: [  148/  390], loss: [0.6130], avg loss: [0.6479], time: [100.9171ms]\n",
+      "Epoch: [  2/ 10], step: [  149/  390], loss: [0.5702], avg loss: [0.6473], time: [105.2413ms]\n",
+      "Epoch: [  2/ 10], step: [  150/  390], loss: [0.5732], avg loss: [0.6468], time: [99.3552ms]\n",
+      "Epoch: [  2/ 10], step: [  151/  390], loss: [0.5903], avg loss: [0.6465], time: [100.8067ms]\n",
+      "Epoch: [  2/ 10], step: [  152/  390], loss: [0.5511], avg loss: [0.6458], time: [100.9417ms]\n",
+      "Epoch: [  2/ 10], step: [  153/  390], loss: [0.6821], avg loss: [0.6461], time: [102.9837ms]\n",
+      "Epoch: [  2/ 10], step: [  154/  390], loss: [0.4778], avg loss: [0.6450], time: [101.2075ms]\n",
+      "Epoch: [  2/ 10], step: [  155/  390], loss: [0.6927], avg loss: [0.6453], time: [102.1085ms]\n",
+      "Epoch: [  2/ 10], step: [  156/  390], loss: [0.5322], avg loss: [0.6446], time: [101.6080ms]\n",
+      "Epoch: [  2/ 10], step: [  157/  390], loss: [0.4992], avg loss: [0.6436], time: [104.5997ms]\n",
+      "Epoch: [  2/ 10], step: [  158/  390], loss: [0.5179], avg loss: [0.6428], time: [102.2282ms]\n",
+      "Epoch: [  2/ 10], step: [  159/  390], loss: [0.7331], avg loss: [0.6434], time: [106.7560ms]\n",
+      "Epoch: [  2/ 10], step: [  160/  390], loss: [0.6702], avg loss: [0.6436], time: [102.6788ms]\n",
+      "Epoch: [  2/ 10], step: [  161/  390], loss: [0.5674], avg loss: [0.6431], time: [104.6824ms]\n",
+      "Epoch: [  2/ 10], step: [  162/  390], loss: [0.6555], avg loss: [0.6432], time: [102.5145ms]\n",
+      "Epoch: [  2/ 10], step: [  163/  390], loss: [0.6740], avg loss: [0.6434], time: [104.1384ms]\n",
+      "Epoch: [  2/ 10], step: [  164/  390], loss: [0.6001], avg loss: [0.6431], time: [101.2747ms]\n",
+      "Epoch: [  2/ 10], step: [  165/  390], loss: [0.6950], avg loss: [0.6434], time: [101.4607ms]\n",
+      "Epoch: [  2/ 10], step: [  166/  390], loss: [0.6409], avg loss: [0.6434], time: [98.8727ms]\n",
+      "Epoch: [  2/ 10], step: [  167/  390], loss: [0.5637], avg loss: [0.6429], time: [101.1057ms]\n",
+      "Epoch: [  2/ 10], step: [  168/  390], loss: [0.5931], avg loss: [0.6426], time: [100.9135ms]\n",
+      "Epoch: [  2/ 10], step: [  169/  390], loss: [0.5834], avg loss: [0.6423], time: [101.4235ms]\n",
+      "Epoch: [  2/ 10], step: [  170/  390], loss: [0.6347], avg loss: [0.6422], time: [100.3559ms]\n",
+      "Epoch: [  2/ 10], step: [  171/  390], loss: [0.5378], avg loss: [0.6416], time: [101.4481ms]\n",
+      "Epoch: [  2/ 10], step: [  172/  390], loss: [0.5672], avg loss: [0.6412], time: [102.9112ms]\n",
+      "Epoch: [  2/ 10], step: [  173/  390], loss: [0.5801], avg loss: [0.6408], time: [105.4759ms]\n",
+      "Epoch: [  2/ 10], step: [  174/  390], loss: [0.4901], avg loss: [0.6400], time: [100.9290ms]\n",
+      "Epoch: [  2/ 10], step: [  175/  390], loss: [0.6125], avg loss: [0.6398], time: [101.1415ms]\n",
+      "Epoch: [  2/ 10], step: [  176/  390], loss: [0.5406], avg loss: [0.6393], time: [99.3505ms]\n",
+      "Epoch: [  2/ 10], step: [  177/  390], loss: [0.5562], avg loss: [0.6388], time: [104.1486ms]\n",
+      "Epoch: [  2/ 10], step: [  178/  390], loss: [0.5569], avg loss: [0.6383], time: [97.9052ms]\n",
+      "Epoch: [  2/ 10], step: [  179/  390], loss: [0.3951], avg loss: [0.6370], time: [104.2259ms]\n",
+      "Epoch: [  2/ 10], step: [  180/  390], loss: [0.5006], avg loss: [0.6362], time: [103.7629ms]\n",
+      "Epoch: [  2/ 10], step: [  181/  390], loss: [0.5864], avg loss: [0.6359], time: [105.7148ms]\n",
+      "Epoch: [  2/ 10], step: [  182/  390], loss: [0.4957], avg loss: [0.6352], time: [101.3072ms]\n",
+      "Epoch: [  2/ 10], step: [  183/  390], loss: [0.6649], avg loss: [0.6353], time: [104.5079ms]\n",
+      "Epoch: [  2/ 10], step: [  184/  390], loss: [0.6399], avg loss: [0.6354], time: [99.3087ms]\n",
+      "Epoch: [  2/ 10], step: [  185/  390], loss: [0.5149], avg loss: [0.6347], time: [103.5523ms]\n",
+      "Epoch: [  2/ 10], step: [  186/  390], loss: [0.4174], avg loss: [0.6335], time: [99.2980ms]\n",
+      "Epoch: [  2/ 10], step: [  187/  390], loss: [0.7648], avg loss: [0.6342], time: [102.2778ms]\n",
+      "Epoch: [  2/ 10], step: [  188/  390], loss: [0.5523], avg loss: [0.6338], time: [97.9698ms]\n",
+      "Epoch: [  2/ 10], step: [  189/  390], loss: [0.5934], avg loss: [0.6336], time: [105.1064ms]\n",
+      "Epoch: [  2/ 10], step: [  190/  390], loss: [0.6805], avg loss: [0.6338], time: [99.9675ms]\n",
+      "Epoch: [  2/ 10], step: [  191/  390], loss: [0.6683], avg loss: [0.6340], time: [103.8826ms]\n",
+      "Epoch: [  2/ 10], step: [  192/  390], loss: [0.6629], avg loss: [0.6342], time: [100.7442ms]\n",
+      "Epoch: [  2/ 10], step: [  193/  390], loss: [0.6230], avg loss: [0.6341], time: [105.6695ms]\n",
+      "Epoch: [  2/ 10], step: [  194/  390], loss: [0.6168], avg loss: [0.6340], time: [99.7014ms]\n",
+      "Epoch: [  2/ 10], step: [  195/  390], loss: [0.6821], avg loss: [0.6343], time: [102.6065ms]\n",
+      "Epoch: [  2/ 10], step: [  196/  390], loss: [0.7211], avg loss: [0.6347], time: [104.3158ms]\n",
+      "Epoch: [  2/ 10], step: [  197/  390], loss: [0.6533], avg loss: [0.6348], time: [105.0341ms]\n",
+      "Epoch: [  2/ 10], step: [  198/  390], loss: [0.6404], avg loss: [0.6348], time: [99.8521ms]\n",
+      "Epoch: [  2/ 10], step: [  199/  390], loss: [0.6608], avg loss: [0.6350], time: [104.3358ms]\n",
+      "Epoch: [  2/ 10], step: [  200/  390], loss: [0.6375], avg loss: [0.6350], time: [99.2339ms]\n",
+      "Epoch: [  2/ 10], step: [  201/  390], loss: [0.6338], avg loss: [0.6350], time: [104.8193ms]\n",
+      "Epoch: [  2/ 10], step: [  202/  390], loss: [0.6354], avg loss: [0.6350], time: [105.6454ms]\n",
+      "Epoch: [  2/ 10], step: [  203/  390], loss: [0.6465], avg loss: [0.6350], time: [101.4972ms]\n",
+      "Epoch: [  2/ 10], step: [  204/  390], loss: [0.6536], avg loss: [0.6351], time: [102.9661ms]\n",
+      "Epoch: [  2/ 10], step: [  205/  390], loss: [0.5844], avg loss: [0.6349], time: [106.3495ms]\n",
+      "Epoch: [  2/ 10], step: [  206/  390], loss: [0.6177], avg loss: [0.6348], time: [101.8763ms]\n",
+      "Epoch: [  2/ 10], step: [  207/  390], loss: [0.5648], avg loss: [0.6344], time: [101.6600ms]\n",
+      "Epoch: [  2/ 10], step: [  208/  390], loss: [0.6025], avg loss: [0.6343], time: [101.8512ms]\n",
+      "Epoch: [  2/ 10], step: [  209/  390], loss: [0.6338], avg loss: [0.6343], time: [102.0007ms]\n",
+      "Epoch: [  2/ 10], step: [  210/  390], loss: [0.6129], avg loss: [0.6342], time: [100.6916ms]\n",
+      "Epoch: [  2/ 10], step: [  211/  390], loss: [0.5973], avg loss: [0.6340], time: [103.9431ms]\n",
+      "Epoch: [  2/ 10], step: [  212/  390], loss: [0.5701], avg loss: [0.6337], time: [101.2611ms]\n",
+      "Epoch: [  2/ 10], step: [  213/  390], loss: [0.6290], avg loss: [0.6337], time: [105.3917ms]\n",
+      "Epoch: [  2/ 10], step: [  214/  390], loss: [0.6365], avg loss: [0.6337], time: [99.4079ms]\n",
+      "Epoch: [  2/ 10], step: [  215/  390], loss: [0.5804], avg loss: [0.6335], time: [106.0677ms]\n",
+      "Epoch: [  2/ 10], step: [  216/  390], loss: [0.5661], avg loss: [0.6331], time: [100.2049ms]\n",
+      "Epoch: [  2/ 10], step: [  217/  390], loss: [0.5607], avg loss: [0.6328], time: [102.9911ms]\n",
+      "Epoch: [  2/ 10], step: [  218/  390], loss: [0.5945], avg loss: [0.6326], time: [102.4287ms]\n",
+      "Epoch: [  2/ 10], step: [  219/  390], loss: [0.5714], avg loss: [0.6324], time: [100.8162ms]\n",
+      "Epoch: [  2/ 10], step: [  220/  390], loss: [0.5354], avg loss: [0.6319], time: [101.4338ms]\n",
+      "Epoch: [  2/ 10], step: [  221/  390], loss: [0.5116], avg loss: [0.6314], time: [107.0015ms]\n",
+      "Epoch: [  2/ 10], step: [  222/  390], loss: [0.6198], avg loss: [0.6313], time: [99.7632ms]\n",
+      "Epoch: [  2/ 10], step: [  223/  390], loss: [0.6505], avg loss: [0.6314], time: [104.2926ms]\n",
+      "Epoch: [  2/ 10], step: [  224/  390], loss: [0.5248], avg loss: [0.6309], time: [103.3173ms]\n",
+      "Epoch: [  2/ 10], step: [  225/  390], loss: [0.6669], avg loss: [0.6311], time: [104.2821ms]\n",
+      "Epoch: [  2/ 10], step: [  226/  390], loss: [0.5932], avg loss: [0.6309], time: [102.8402ms]\n",
+      "Epoch: [  2/ 10], step: [  227/  390], loss: [0.5155], avg loss: [0.6304], time: [103.1370ms]\n",
+      "Epoch: [  2/ 10], step: [  228/  390], loss: [0.7595], avg loss: [0.6310], time: [101.3536ms]\n",
+      "Epoch: [  2/ 10], step: [  229/  390], loss: [0.5325], avg loss: [0.6305], time: [107.0580ms]\n",
+      "Epoch: [  2/ 10], step: [  230/  390], loss: [0.4261], avg loss: [0.6297], time: [100.3525ms]\n",
+      "Epoch: [  2/ 10], step: [  231/  390], loss: [0.7548], avg loss: [0.6302], time: [101.5737ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  2/ 10], step: [  232/  390], loss: [0.5745], avg loss: [0.6300], time: [99.3531ms]\n",
+      "Epoch: [  2/ 10], step: [  233/  390], loss: [0.5614], avg loss: [0.6297], time: [107.9669ms]\n",
+      "Epoch: [  2/ 10], step: [  234/  390], loss: [0.5357], avg loss: [0.6293], time: [101.3925ms]\n",
+      "Epoch: [  2/ 10], step: [  235/  390], loss: [0.5186], avg loss: [0.6288], time: [102.6874ms]\n",
+      "Epoch: [  2/ 10], step: [  236/  390], loss: [0.6700], avg loss: [0.6290], time: [103.9753ms]\n",
+      "Epoch: [  2/ 10], step: [  237/  390], loss: [0.5584], avg loss: [0.6287], time: [101.3067ms]\n",
+      "Epoch: [  2/ 10], step: [  238/  390], loss: [0.5589], avg loss: [0.6284], time: [103.8983ms]\n",
+      "Epoch: [  2/ 10], step: [  239/  390], loss: [0.5363], avg loss: [0.6280], time: [105.3576ms]\n",
+      "Epoch: [  2/ 10], step: [  240/  390], loss: [0.5776], avg loss: [0.6278], time: [101.3453ms]\n",
+      "Epoch: [  2/ 10], step: [  241/  390], loss: [0.7283], avg loss: [0.6282], time: [104.6221ms]\n",
+      "Epoch: [  2/ 10], step: [  242/  390], loss: [0.5002], avg loss: [0.6277], time: [98.7802ms]\n",
+      "Epoch: [  2/ 10], step: [  243/  390], loss: [0.5267], avg loss: [0.6273], time: [107.5363ms]\n",
+      "Epoch: [  2/ 10], step: [  244/  390], loss: [0.7191], avg loss: [0.6276], time: [103.0767ms]\n",
+      "Epoch: [  2/ 10], step: [  245/  390], loss: [0.5527], avg loss: [0.6273], time: [106.7441ms]\n",
+      "Epoch: [  2/ 10], step: [  246/  390], loss: [0.6456], avg loss: [0.6274], time: [102.1268ms]\n",
+      "Epoch: [  2/ 10], step: [  247/  390], loss: [0.4888], avg loss: [0.6268], time: [105.3157ms]\n",
+      "Epoch: [  2/ 10], step: [  248/  390], loss: [0.5648], avg loss: [0.6266], time: [97.7314ms]\n",
+      "Epoch: [  2/ 10], step: [  249/  390], loss: [0.5652], avg loss: [0.6263], time: [103.8227ms]\n",
+      "Epoch: [  2/ 10], step: [  250/  390], loss: [0.5415], avg loss: [0.6260], time: [101.5322ms]\n",
+      "Epoch: [  2/ 10], step: [  251/  390], loss: [0.5158], avg loss: [0.6256], time: [104.6233ms]\n",
+      "Epoch: [  2/ 10], step: [  252/  390], loss: [0.6121], avg loss: [0.6255], time: [100.1446ms]\n",
+      "Epoch: [  2/ 10], step: [  253/  390], loss: [0.4672], avg loss: [0.6249], time: [102.5453ms]\n",
+      "Epoch: [  2/ 10], step: [  254/  390], loss: [0.5177], avg loss: [0.6245], time: [100.5530ms]\n",
+      "Epoch: [  2/ 10], step: [  255/  390], loss: [0.5891], avg loss: [0.6243], time: [103.0157ms]\n",
+      "Epoch: [  2/ 10], step: [  256/  390], loss: [0.5838], avg loss: [0.6242], time: [101.9323ms]\n",
+      "Epoch: [  2/ 10], step: [  257/  390], loss: [0.5129], avg loss: [0.6237], time: [103.1694ms]\n",
+      "Epoch: [  2/ 10], step: [  258/  390], loss: [0.4615], avg loss: [0.6231], time: [100.1391ms]\n",
+      "Epoch: [  2/ 10], step: [  259/  390], loss: [0.4765], avg loss: [0.6225], time: [104.8985ms]\n",
+      "Epoch: [  2/ 10], step: [  260/  390], loss: [0.5161], avg loss: [0.6221], time: [99.1123ms]\n",
+      "Epoch: [  2/ 10], step: [  261/  390], loss: [0.5247], avg loss: [0.6218], time: [104.0046ms]\n",
+      "Epoch: [  2/ 10], step: [  262/  390], loss: [0.4824], avg loss: [0.6212], time: [103.3409ms]\n",
+      "Epoch: [  2/ 10], step: [  263/  390], loss: [0.4950], avg loss: [0.6207], time: [102.7055ms]\n",
+      "Epoch: [  2/ 10], step: [  264/  390], loss: [0.4001], avg loss: [0.6199], time: [105.7043ms]\n",
+      "Epoch: [  2/ 10], step: [  265/  390], loss: [0.3896], avg loss: [0.6190], time: [102.8132ms]\n",
+      "Epoch: [  2/ 10], step: [  266/  390], loss: [0.5145], avg loss: [0.6186], time: [100.1244ms]\n",
+      "Epoch: [  2/ 10], step: [  267/  390], loss: [0.4265], avg loss: [0.6179], time: [105.5131ms]\n",
+      "Epoch: [  2/ 10], step: [  268/  390], loss: [0.3818], avg loss: [0.6170], time: [96.6427ms]\n",
+      "Epoch: [  2/ 10], step: [  269/  390], loss: [0.2814], avg loss: [0.6158], time: [106.8134ms]\n",
+      "Epoch: [  2/ 10], step: [  270/  390], loss: [0.5369], avg loss: [0.6155], time: [101.3644ms]\n",
+      "Epoch: [  2/ 10], step: [  271/  390], loss: [0.3595], avg loss: [0.6146], time: [103.8542ms]\n",
+      "Epoch: [  2/ 10], step: [  272/  390], loss: [0.4517], avg loss: [0.6140], time: [105.8245ms]\n",
+      "Epoch: [  2/ 10], step: [  273/  390], loss: [0.7099], avg loss: [0.6143], time: [104.4259ms]\n",
+      "Epoch: [  2/ 10], step: [  274/  390], loss: [0.4052], avg loss: [0.6135], time: [99.7348ms]\n",
+      "Epoch: [  2/ 10], step: [  275/  390], loss: [0.4128], avg loss: [0.6128], time: [105.0575ms]\n",
+      "Epoch: [  2/ 10], step: [  276/  390], loss: [0.7017], avg loss: [0.6131], time: [101.4626ms]\n",
+      "Epoch: [  2/ 10], step: [  277/  390], loss: [0.4718], avg loss: [0.6126], time: [104.1567ms]\n",
+      "Epoch: [  2/ 10], step: [  278/  390], loss: [0.4687], avg loss: [0.6121], time: [101.0737ms]\n",
+      "Epoch: [  2/ 10], step: [  279/  390], loss: [0.4270], avg loss: [0.6114], time: [102.0095ms]\n",
+      "Epoch: [  2/ 10], step: [  280/  390], loss: [0.4992], avg loss: [0.6110], time: [102.6990ms]\n",
+      "Epoch: [  2/ 10], step: [  281/  390], loss: [0.4861], avg loss: [0.6106], time: [105.7308ms]\n",
+      "Epoch: [  2/ 10], step: [  282/  390], loss: [0.5556], avg loss: [0.6104], time: [100.7905ms]\n",
+      "Epoch: [  2/ 10], step: [  283/  390], loss: [0.5015], avg loss: [0.6100], time: [103.3075ms]\n",
+      "Epoch: [  2/ 10], step: [  284/  390], loss: [0.5049], avg loss: [0.6097], time: [101.5048ms]\n",
+      "Epoch: [  2/ 10], step: [  285/  390], loss: [0.5007], avg loss: [0.6093], time: [99.9162ms]\n",
+      "Epoch: [  2/ 10], step: [  286/  390], loss: [0.5154], avg loss: [0.6089], time: [103.2648ms]\n",
+      "Epoch: [  2/ 10], step: [  287/  390], loss: [0.5927], avg loss: [0.6089], time: [107.6899ms]\n",
+      "Epoch: [  2/ 10], step: [  288/  390], loss: [0.5553], avg loss: [0.6087], time: [98.1123ms]\n",
+      "Epoch: [  2/ 10], step: [  289/  390], loss: [0.5091], avg loss: [0.6084], time: [106.3700ms]\n",
+      "Epoch: [  2/ 10], step: [  290/  390], loss: [0.4555], avg loss: [0.6078], time: [101.5131ms]\n",
+      "Epoch: [  2/ 10], step: [  291/  390], loss: [0.4482], avg loss: [0.6073], time: [103.6847ms]\n",
+      "Epoch: [  2/ 10], step: [  292/  390], loss: [0.4880], avg loss: [0.6069], time: [97.4264ms]\n",
+      "Epoch: [  2/ 10], step: [  293/  390], loss: [0.4739], avg loss: [0.6064], time: [102.1864ms]\n",
+      "Epoch: [  2/ 10], step: [  294/  390], loss: [0.4351], avg loss: [0.6058], time: [100.0397ms]\n",
+      "Epoch: [  2/ 10], step: [  295/  390], loss: [0.5434], avg loss: [0.6056], time: [107.4185ms]\n",
+      "Epoch: [  2/ 10], step: [  296/  390], loss: [0.4808], avg loss: [0.6052], time: [100.1728ms]\n",
+      "Epoch: [  2/ 10], step: [  297/  390], loss: [0.5042], avg loss: [0.6049], time: [104.5783ms]\n",
+      "Epoch: [  2/ 10], step: [  298/  390], loss: [0.4165], avg loss: [0.6042], time: [102.1380ms]\n",
+      "Epoch: [  2/ 10], step: [  299/  390], loss: [0.3246], avg loss: [0.6033], time: [106.3824ms]\n",
+      "Epoch: [  2/ 10], step: [  300/  390], loss: [0.4363], avg loss: [0.6027], time: [99.7972ms]\n",
+      "Epoch: [  2/ 10], step: [  301/  390], loss: [0.4205], avg loss: [0.6021], time: [101.7594ms]\n",
+      "Epoch: [  2/ 10], step: [  302/  390], loss: [0.4846], avg loss: [0.6017], time: [98.7082ms]\n",
+      "Epoch: [  2/ 10], step: [  303/  390], loss: [0.3752], avg loss: [0.6010], time: [102.6835ms]\n",
+      "Epoch: [  2/ 10], step: [  304/  390], loss: [0.5174], avg loss: [0.6007], time: [99.9498ms]\n",
+      "Epoch: [  2/ 10], step: [  305/  390], loss: [0.4815], avg loss: [0.6003], time: [105.4149ms]\n",
+      "Epoch: [  2/ 10], step: [  306/  390], loss: [0.5788], avg loss: [0.6003], time: [100.3840ms]\n",
+      "Epoch: [  2/ 10], step: [  307/  390], loss: [0.3501], avg loss: [0.5994], time: [102.5753ms]\n",
+      "Epoch: [  2/ 10], step: [  308/  390], loss: [0.5348], avg loss: [0.5992], time: [100.6083ms]\n",
+      "Epoch: [  2/ 10], step: [  309/  390], loss: [0.4691], avg loss: [0.5988], time: [107.1885ms]\n",
+      "Epoch: [  2/ 10], step: [  310/  390], loss: [0.5035], avg loss: [0.5985], time: [103.5054ms]\n",
+      "Epoch: [  2/ 10], step: [  311/  390], loss: [0.5681], avg loss: [0.5984], time: [107.4185ms]\n",
+      "Epoch: [  2/ 10], step: [  312/  390], loss: [0.5657], avg loss: [0.5983], time: [101.8474ms]\n",
+      "Epoch: [  2/ 10], step: [  313/  390], loss: [0.4784], avg loss: [0.5979], time: [104.7087ms]\n",
+      "Epoch: [  2/ 10], step: [  314/  390], loss: [0.5547], avg loss: [0.5978], time: [103.3826ms]\n",
+      "Epoch: [  2/ 10], step: [  315/  390], loss: [0.5812], avg loss: [0.5977], time: [102.4528ms]\n",
+      "Epoch: [  2/ 10], step: [  316/  390], loss: [0.4795], avg loss: [0.5974], time: [100.2314ms]\n",
+      "Epoch: [  2/ 10], step: [  317/  390], loss: [0.5181], avg loss: [0.5971], time: [105.5236ms]\n",
+      "Epoch: [  2/ 10], step: [  318/  390], loss: [0.4481], avg loss: [0.5966], time: [99.0460ms]\n",
+      "Epoch: [  2/ 10], step: [  319/  390], loss: [0.3989], avg loss: [0.5960], time: [101.9380ms]\n",
+      "Epoch: [  2/ 10], step: [  320/  390], loss: [0.4208], avg loss: [0.5955], time: [99.8416ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  2/ 10], step: [  321/  390], loss: [0.3705], avg loss: [0.5948], time: [105.0887ms]\n",
+      "Epoch: [  2/ 10], step: [  322/  390], loss: [0.4149], avg loss: [0.5942], time: [101.4295ms]\n",
+      "Epoch: [  2/ 10], step: [  323/  390], loss: [0.4527], avg loss: [0.5938], time: [102.3750ms]\n",
+      "Epoch: [  2/ 10], step: [  324/  390], loss: [0.3693], avg loss: [0.5931], time: [100.5075ms]\n",
+      "Epoch: [  2/ 10], step: [  325/  390], loss: [0.4761], avg loss: [0.5927], time: [100.6401ms]\n",
+      "Epoch: [  2/ 10], step: [  326/  390], loss: [0.3317], avg loss: [0.5919], time: [102.6881ms]\n",
+      "Epoch: [  2/ 10], step: [  327/  390], loss: [0.5316], avg loss: [0.5917], time: [103.8661ms]\n",
+      "Epoch: [  2/ 10], step: [  328/  390], loss: [0.4163], avg loss: [0.5912], time: [99.2351ms]\n",
+      "Epoch: [  2/ 10], step: [  329/  390], loss: [0.3904], avg loss: [0.5906], time: [100.8196ms]\n",
+      "Epoch: [  2/ 10], step: [  330/  390], loss: [0.6191], avg loss: [0.5907], time: [99.3040ms]\n",
+      "Epoch: [  2/ 10], step: [  331/  390], loss: [0.3622], avg loss: [0.5900], time: [107.1546ms]\n",
+      "Epoch: [  2/ 10], step: [  332/  390], loss: [0.4183], avg loss: [0.5895], time: [99.9868ms]\n",
+      "Epoch: [  2/ 10], step: [  333/  390], loss: [0.5975], avg loss: [0.5895], time: [100.5914ms]\n",
+      "Epoch: [  2/ 10], step: [  334/  390], loss: [0.3783], avg loss: [0.5889], time: [102.3099ms]\n",
+      "Epoch: [  2/ 10], step: [  335/  390], loss: [0.4401], avg loss: [0.5884], time: [104.2039ms]\n",
+      "Epoch: [  2/ 10], step: [  336/  390], loss: [0.3810], avg loss: [0.5878], time: [99.0837ms]\n",
+      "Epoch: [  2/ 10], step: [  337/  390], loss: [0.3814], avg loss: [0.5872], time: [108.1743ms]\n",
+      "Epoch: [  2/ 10], step: [  338/  390], loss: [0.4297], avg loss: [0.5867], time: [102.0694ms]\n",
+      "Epoch: [  2/ 10], step: [  339/  390], loss: [0.2906], avg loss: [0.5858], time: [102.8275ms]\n",
+      "Epoch: [  2/ 10], step: [  340/  390], loss: [0.3323], avg loss: [0.5851], time: [99.8542ms]\n",
+      "Epoch: [  2/ 10], step: [  341/  390], loss: [0.4465], avg loss: [0.5847], time: [106.7357ms]\n",
+      "Epoch: [  2/ 10], step: [  342/  390], loss: [0.4510], avg loss: [0.5843], time: [101.4016ms]\n",
+      "Epoch: [  2/ 10], step: [  343/  390], loss: [0.4552], avg loss: [0.5839], time: [103.0991ms]\n",
+      "Epoch: [  2/ 10], step: [  344/  390], loss: [0.3955], avg loss: [0.5834], time: [97.3434ms]\n",
+      "Epoch: [  2/ 10], step: [  345/  390], loss: [0.3395], avg loss: [0.5827], time: [105.7241ms]\n",
+      "Epoch: [  2/ 10], step: [  346/  390], loss: [0.5065], avg loss: [0.5825], time: [69.6723ms]\n",
+      "Epoch: [  2/ 10], step: [  347/  390], loss: [0.4705], avg loss: [0.5821], time: [106.5094ms]\n",
+      "Epoch: [  2/ 10], step: [  348/  390], loss: [0.4732], avg loss: [0.5818], time: [101.4545ms]\n",
+      "Epoch: [  2/ 10], step: [  349/  390], loss: [0.3764], avg loss: [0.5812], time: [105.5155ms]\n",
+      "Epoch: [  2/ 10], step: [  350/  390], loss: [0.3716], avg loss: [0.5806], time: [105.5231ms]\n",
+      "Epoch: [  2/ 10], step: [  351/  390], loss: [0.4724], avg loss: [0.5803], time: [104.2888ms]\n",
+      "Epoch: [  2/ 10], step: [  352/  390], loss: [0.3549], avg loss: [0.5797], time: [102.2933ms]\n",
+      "Epoch: [  2/ 10], step: [  353/  390], loss: [0.4010], avg loss: [0.5792], time: [103.0416ms]\n",
+      "Epoch: [  2/ 10], step: [  354/  390], loss: [0.4539], avg loss: [0.5788], time: [98.9168ms]\n",
+      "Epoch: [  2/ 10], step: [  355/  390], loss: [0.5552], avg loss: [0.5788], time: [103.7295ms]\n",
+      "Epoch: [  2/ 10], step: [  356/  390], loss: [0.3861], avg loss: [0.5782], time: [101.1715ms]\n",
+      "Epoch: [  2/ 10], step: [  357/  390], loss: [0.4465], avg loss: [0.5778], time: [104.1949ms]\n",
+      "Epoch: [  2/ 10], step: [  358/  390], loss: [0.3775], avg loss: [0.5773], time: [97.9333ms]\n",
+      "Epoch: [  2/ 10], step: [  359/  390], loss: [0.5041], avg loss: [0.5771], time: [107.3239ms]\n",
+      "Epoch: [  2/ 10], step: [  360/  390], loss: [0.4034], avg loss: [0.5766], time: [101.4485ms]\n",
+      "Epoch: [  2/ 10], step: [  361/  390], loss: [0.3989], avg loss: [0.5761], time: [102.8931ms]\n",
+      "Epoch: [  2/ 10], step: [  362/  390], loss: [0.4578], avg loss: [0.5758], time: [101.5456ms]\n",
+      "Epoch: [  2/ 10], step: [  363/  390], loss: [0.4256], avg loss: [0.5754], time: [101.1083ms]\n",
+      "Epoch: [  2/ 10], step: [  364/  390], loss: [0.4483], avg loss: [0.5750], time: [101.5882ms]\n",
+      "Epoch: [  2/ 10], step: [  365/  390], loss: [0.5041], avg loss: [0.5748], time: [106.0119ms]\n",
+      "Epoch: [  2/ 10], step: [  366/  390], loss: [0.4134], avg loss: [0.5744], time: [103.4594ms]\n",
+      "Epoch: [  2/ 10], step: [  367/  390], loss: [0.5226], avg loss: [0.5742], time: [104.2194ms]\n",
+      "Epoch: [  2/ 10], step: [  368/  390], loss: [0.3384], avg loss: [0.5736], time: [99.5936ms]\n",
+      "Epoch: [  2/ 10], step: [  369/  390], loss: [0.4365], avg loss: [0.5732], time: [105.2415ms]\n",
+      "Epoch: [  2/ 10], step: [  370/  390], loss: [0.3390], avg loss: [0.5726], time: [98.5243ms]\n",
+      "Epoch: [  2/ 10], step: [  371/  390], loss: [0.3794], avg loss: [0.5721], time: [103.2960ms]\n",
+      "Epoch: [  2/ 10], step: [  372/  390], loss: [0.4667], avg loss: [0.5718], time: [100.5466ms]\n",
+      "Epoch: [  2/ 10], step: [  373/  390], loss: [0.2798], avg loss: [0.5710], time: [102.4997ms]\n",
+      "Epoch: [  2/ 10], step: [  374/  390], loss: [0.4289], avg loss: [0.5706], time: [98.2299ms]\n",
+      "Epoch: [  2/ 10], step: [  375/  390], loss: [0.4372], avg loss: [0.5703], time: [104.4865ms]\n",
+      "Epoch: [  2/ 10], step: [  376/  390], loss: [0.3608], avg loss: [0.5697], time: [102.8581ms]\n",
+      "Epoch: [  2/ 10], step: [  377/  390], loss: [0.3193], avg loss: [0.5691], time: [105.5598ms]\n",
+      "Epoch: [  2/ 10], step: [  378/  390], loss: [0.3597], avg loss: [0.5685], time: [100.1291ms]\n",
+      "Epoch: [  2/ 10], step: [  379/  390], loss: [0.4859], avg loss: [0.5683], time: [101.2230ms]\n",
+      "Epoch: [  2/ 10], step: [  380/  390], loss: [0.3780], avg loss: [0.5678], time: [99.0679ms]\n",
+      "Epoch: [  2/ 10], step: [  381/  390], loss: [0.3072], avg loss: [0.5671], time: [103.2448ms]\n",
+      "Epoch: [  2/ 10], step: [  382/  390], loss: [0.4727], avg loss: [0.5668], time: [99.4911ms]\n",
+      "Epoch: [  2/ 10], step: [  383/  390], loss: [0.4112], avg loss: [0.5664], time: [104.1186ms]\n",
+      "Epoch: [  2/ 10], step: [  384/  390], loss: [0.4523], avg loss: [0.5661], time: [103.1640ms]\n",
+      "Epoch: [  2/ 10], step: [  385/  390], loss: [0.3574], avg loss: [0.5656], time: [100.5161ms]\n",
+      "Epoch: [  2/ 10], step: [  386/  390], loss: [0.3551], avg loss: [0.5651], time: [102.0844ms]\n",
+      "Epoch: [  2/ 10], step: [  387/  390], loss: [0.5766], avg loss: [0.5651], time: [105.4604ms]\n",
+      "Epoch: [  2/ 10], step: [  388/  390], loss: [0.5247], avg loss: [0.5650], time: [98.8855ms]\n",
+      "Epoch: [  2/ 10], step: [  389/  390], loss: [0.4281], avg loss: [0.5646], time: [99.1461ms]\n",
+      "Epoch: [  2/ 10], step: [  390/  390], loss: [0.4206], avg loss: [0.5643], time: [917.7203ms]\n",
+      "Epoch time: 41080.828, per step time: 105.335\n",
+      "Epoch time: 41081.172, per step time: 105.336, avg loss: 0.564\n",
+      "************************************************************\n",
+      "Epoch: [  3/ 10], step: [    1/  390], loss: [0.3717], avg loss: [0.3717], time: [101.0344ms]\n",
+      "Epoch: [  3/ 10], step: [    2/  390], loss: [0.4016], avg loss: [0.3867], time: [104.5682ms]\n",
+      "Epoch: [  3/ 10], step: [    3/  390], loss: [0.4964], avg loss: [0.4233], time: [104.6176ms]\n",
+      "Epoch: [  3/ 10], step: [    4/  390], loss: [0.4364], avg loss: [0.4265], time: [103.5764ms]\n",
+      "Epoch: [  3/ 10], step: [    5/  390], loss: [0.4573], avg loss: [0.4327], time: [108.1645ms]\n",
+      "Epoch: [  3/ 10], step: [    6/  390], loss: [0.4915], avg loss: [0.4425], time: [106.4374ms]\n",
+      "Epoch: [  3/ 10], step: [    7/  390], loss: [0.3635], avg loss: [0.4312], time: [107.9173ms]\n",
+      "Epoch: [  3/ 10], step: [    8/  390], loss: [0.4102], avg loss: [0.4286], time: [102.2503ms]\n",
+      "Epoch: [  3/ 10], step: [    9/  390], loss: [0.4057], avg loss: [0.4260], time: [107.1496ms]\n",
+      "Epoch: [  3/ 10], step: [   10/  390], loss: [0.4424], avg loss: [0.4277], time: [102.6711ms]\n",
+      "Epoch: [  3/ 10], step: [   11/  390], loss: [0.4570], avg loss: [0.4303], time: [107.5852ms]\n",
+      "Epoch: [  3/ 10], step: [   12/  390], loss: [0.4399], avg loss: [0.4311], time: [105.2186ms]\n",
+      "Epoch: [  3/ 10], step: [   13/  390], loss: [0.3412], avg loss: [0.4242], time: [104.2027ms]\n",
+      "Epoch: [  3/ 10], step: [   14/  390], loss: [0.4659], avg loss: [0.4272], time: [106.1070ms]\n",
+      "Epoch: [  3/ 10], step: [   15/  390], loss: [0.5166], avg loss: [0.4332], time: [106.4236ms]\n",
+      "Epoch: [  3/ 10], step: [   16/  390], loss: [0.3432], avg loss: [0.4275], time: [101.7649ms]\n",
+      "Epoch: [  3/ 10], step: [   17/  390], loss: [0.2530], avg loss: [0.4173], time: [104.6968ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  3/ 10], step: [   18/  390], loss: [0.3993], avg loss: [0.4163], time: [104.6028ms]\n",
+      "Epoch: [  3/ 10], step: [   19/  390], loss: [0.4321], avg loss: [0.4171], time: [106.9975ms]\n",
+      "Epoch: [  3/ 10], step: [   20/  390], loss: [0.3459], avg loss: [0.4135], time: [101.6135ms]\n",
+      "Epoch: [  3/ 10], step: [   21/  390], loss: [0.3473], avg loss: [0.4104], time: [106.1561ms]\n",
+      "Epoch: [  3/ 10], step: [   22/  390], loss: [0.4423], avg loss: [0.4118], time: [102.2394ms]\n",
+      "Epoch: [  3/ 10], step: [   23/  390], loss: [0.5265], avg loss: [0.4168], time: [106.6220ms]\n",
+      "Epoch: [  3/ 10], step: [   24/  390], loss: [0.4170], avg loss: [0.4168], time: [105.6414ms]\n",
+      "Epoch: [  3/ 10], step: [   25/  390], loss: [0.4483], avg loss: [0.4181], time: [108.6771ms]\n",
+      "Epoch: [  3/ 10], step: [   26/  390], loss: [0.5304], avg loss: [0.4224], time: [107.1980ms]\n",
+      "Epoch: [  3/ 10], step: [   27/  390], loss: [0.4433], avg loss: [0.4232], time: [105.8927ms]\n",
+      "Epoch: [  3/ 10], step: [   28/  390], loss: [0.4486], avg loss: [0.4241], time: [100.7690ms]\n",
+      "Epoch: [  3/ 10], step: [   29/  390], loss: [0.3785], avg loss: [0.4225], time: [104.8286ms]\n",
+      "Epoch: [  3/ 10], step: [   30/  390], loss: [0.4524], avg loss: [0.4235], time: [100.2972ms]\n",
+      "Epoch: [  3/ 10], step: [   31/  390], loss: [0.4300], avg loss: [0.4237], time: [104.8031ms]\n",
+      "Epoch: [  3/ 10], step: [   32/  390], loss: [0.3490], avg loss: [0.4214], time: [102.2110ms]\n",
+      "Epoch: [  3/ 10], step: [   33/  390], loss: [0.4418], avg loss: [0.4220], time: [104.5287ms]\n",
+      "Epoch: [  3/ 10], step: [   34/  390], loss: [0.4400], avg loss: [0.4225], time: [105.4187ms]\n",
+      "Epoch: [  3/ 10], step: [   35/  390], loss: [0.4215], avg loss: [0.4225], time: [106.2911ms]\n",
+      "Epoch: [  3/ 10], step: [   36/  390], loss: [0.4959], avg loss: [0.4245], time: [103.3683ms]\n",
+      "Epoch: [  3/ 10], step: [   37/  390], loss: [0.4083], avg loss: [0.4241], time: [105.6094ms]\n",
+      "Epoch: [  3/ 10], step: [   38/  390], loss: [0.3641], avg loss: [0.4225], time: [103.6904ms]\n",
+      "Epoch: [  3/ 10], step: [   39/  390], loss: [0.4726], avg loss: [0.4238], time: [104.6524ms]\n",
+      "Epoch: [  3/ 10], step: [   40/  390], loss: [0.3642], avg loss: [0.4223], time: [102.7660ms]\n",
+      "Epoch: [  3/ 10], step: [   41/  390], loss: [0.4058], avg loss: [0.4219], time: [105.2840ms]\n",
+      "Epoch: [  3/ 10], step: [   42/  390], loss: [0.4929], avg loss: [0.4236], time: [107.5082ms]\n",
+      "Epoch: [  3/ 10], step: [   43/  390], loss: [0.3960], avg loss: [0.4230], time: [104.9497ms]\n",
+      "Epoch: [  3/ 10], step: [   44/  390], loss: [0.5293], avg loss: [0.4254], time: [101.9835ms]\n",
+      "Epoch: [  3/ 10], step: [   45/  390], loss: [0.4512], avg loss: [0.4260], time: [104.2707ms]\n",
+      "Epoch: [  3/ 10], step: [   46/  390], loss: [0.4348], avg loss: [0.4261], time: [104.0783ms]\n",
+      "Epoch: [  3/ 10], step: [   47/  390], loss: [0.3913], avg loss: [0.4254], time: [110.6560ms]\n",
+      "Epoch: [  3/ 10], step: [   48/  390], loss: [0.5439], avg loss: [0.4279], time: [106.7963ms]\n",
+      "Epoch: [  3/ 10], step: [   49/  390], loss: [0.3946], avg loss: [0.4272], time: [104.5280ms]\n",
+      "Epoch: [  3/ 10], step: [   50/  390], loss: [0.3742], avg loss: [0.4261], time: [101.4905ms]\n",
+      "Epoch: [  3/ 10], step: [   51/  390], loss: [0.3904], avg loss: [0.4254], time: [108.8462ms]\n",
+      "Epoch: [  3/ 10], step: [   52/  390], loss: [0.3143], avg loss: [0.4233], time: [105.3758ms]\n",
+      "Epoch: [  3/ 10], step: [   53/  390], loss: [0.3225], avg loss: [0.4214], time: [104.1429ms]\n",
+      "Epoch: [  3/ 10], step: [   54/  390], loss: [0.5099], avg loss: [0.4230], time: [105.7603ms]\n",
+      "Epoch: [  3/ 10], step: [   55/  390], loss: [0.3449], avg loss: [0.4216], time: [104.8827ms]\n",
+      "Epoch: [  3/ 10], step: [   56/  390], loss: [0.3859], avg loss: [0.4210], time: [103.6806ms]\n",
+      "Epoch: [  3/ 10], step: [   57/  390], loss: [0.3710], avg loss: [0.4201], time: [106.0722ms]\n",
+      "Epoch: [  3/ 10], step: [   58/  390], loss: [0.3936], avg loss: [0.4196], time: [100.5962ms]\n",
+      "Epoch: [  3/ 10], step: [   59/  390], loss: [0.2827], avg loss: [0.4173], time: [106.4043ms]\n",
+      "Epoch: [  3/ 10], step: [   60/  390], loss: [0.2523], avg loss: [0.4146], time: [104.5611ms]\n",
+      "Epoch: [  3/ 10], step: [   61/  390], loss: [0.2955], avg loss: [0.4126], time: [104.2995ms]\n",
+      "Epoch: [  3/ 10], step: [   62/  390], loss: [0.3792], avg loss: [0.4121], time: [105.1340ms]\n",
+      "Epoch: [  3/ 10], step: [   63/  390], loss: [0.3951], avg loss: [0.4118], time: [104.1362ms]\n",
+      "Epoch: [  3/ 10], step: [   64/  390], loss: [0.3538], avg loss: [0.4109], time: [102.9949ms]\n",
+      "Epoch: [  3/ 10], step: [   65/  390], loss: [0.2615], avg loss: [0.4086], time: [103.0068ms]\n",
+      "Epoch: [  3/ 10], step: [   66/  390], loss: [0.2563], avg loss: [0.4063], time: [102.3242ms]\n",
+      "Epoch: [  3/ 10], step: [   67/  390], loss: [0.3461], avg loss: [0.4054], time: [109.4780ms]\n",
+      "Epoch: [  3/ 10], step: [   68/  390], loss: [0.4189], avg loss: [0.4056], time: [101.6436ms]\n",
+      "Epoch: [  3/ 10], step: [   69/  390], loss: [0.1861], avg loss: [0.4024], time: [108.0732ms]\n",
+      "Epoch: [  3/ 10], step: [   70/  390], loss: [0.5654], avg loss: [0.4047], time: [100.9367ms]\n",
+      "Epoch: [  3/ 10], step: [   71/  390], loss: [0.3408], avg loss: [0.4038], time: [109.4954ms]\n",
+      "Epoch: [  3/ 10], step: [   72/  390], loss: [0.4145], avg loss: [0.4040], time: [101.1722ms]\n",
+      "Epoch: [  3/ 10], step: [   73/  390], loss: [0.3291], avg loss: [0.4030], time: [104.6176ms]\n",
+      "Epoch: [  3/ 10], step: [   74/  390], loss: [0.3935], avg loss: [0.4028], time: [103.2479ms]\n",
+      "Epoch: [  3/ 10], step: [   75/  390], loss: [0.4106], avg loss: [0.4029], time: [103.3683ms]\n",
+      "Epoch: [  3/ 10], step: [   76/  390], loss: [0.4341], avg loss: [0.4033], time: [107.6040ms]\n",
+      "Epoch: [  3/ 10], step: [   77/  390], loss: [0.3573], avg loss: [0.4028], time: [105.2480ms]\n",
+      "Epoch: [  3/ 10], step: [   78/  390], loss: [0.2479], avg loss: [0.4008], time: [101.9013ms]\n",
+      "Epoch: [  3/ 10], step: [   79/  390], loss: [0.3640], avg loss: [0.4003], time: [106.1730ms]\n",
+      "Epoch: [  3/ 10], step: [   80/  390], loss: [0.2931], avg loss: [0.3990], time: [105.1567ms]\n",
+      "Epoch: [  3/ 10], step: [   81/  390], loss: [0.4537], avg loss: [0.3996], time: [106.9849ms]\n",
+      "Epoch: [  3/ 10], step: [   82/  390], loss: [0.3663], avg loss: [0.3992], time: [102.8011ms]\n",
+      "Epoch: [  3/ 10], step: [   83/  390], loss: [0.4545], avg loss: [0.3999], time: [107.5125ms]\n",
+      "Epoch: [  3/ 10], step: [   84/  390], loss: [0.3072], avg loss: [0.3988], time: [105.4533ms]\n",
+      "Epoch: [  3/ 10], step: [   85/  390], loss: [0.3475], avg loss: [0.3982], time: [103.9536ms]\n",
+      "Epoch: [  3/ 10], step: [   86/  390], loss: [0.3380], avg loss: [0.3975], time: [103.4360ms]\n",
+      "Epoch: [  3/ 10], step: [   87/  390], loss: [0.3027], avg loss: [0.3964], time: [104.1820ms]\n",
+      "Epoch: [  3/ 10], step: [   88/  390], loss: [0.3898], avg loss: [0.3963], time: [102.8357ms]\n",
+      "Epoch: [  3/ 10], step: [   89/  390], loss: [0.3724], avg loss: [0.3961], time: [109.4525ms]\n",
+      "Epoch: [  3/ 10], step: [   90/  390], loss: [0.3696], avg loss: [0.3958], time: [103.3630ms]\n",
+      "Epoch: [  3/ 10], step: [   91/  390], loss: [0.5897], avg loss: [0.3979], time: [103.6150ms]\n",
+      "Epoch: [  3/ 10], step: [   92/  390], loss: [0.3328], avg loss: [0.3972], time: [105.8285ms]\n",
+      "Epoch: [  3/ 10], step: [   93/  390], loss: [0.4406], avg loss: [0.3977], time: [104.1517ms]\n",
+      "Epoch: [  3/ 10], step: [   94/  390], loss: [0.3753], avg loss: [0.3974], time: [106.1327ms]\n",
+      "Epoch: [  3/ 10], step: [   95/  390], loss: [0.4312], avg loss: [0.3978], time: [102.4258ms]\n",
+      "Epoch: [  3/ 10], step: [   96/  390], loss: [0.2916], avg loss: [0.3967], time: [105.0375ms]\n",
+      "Epoch: [  3/ 10], step: [   97/  390], loss: [0.4791], avg loss: [0.3975], time: [104.1269ms]\n",
+      "Epoch: [  3/ 10], step: [   98/  390], loss: [0.4071], avg loss: [0.3976], time: [102.0269ms]\n",
+      "Epoch: [  3/ 10], step: [   99/  390], loss: [0.3603], avg loss: [0.3972], time: [102.6518ms]\n",
+      "Epoch: [  3/ 10], step: [  100/  390], loss: [0.2947], avg loss: [0.3962], time: [102.4597ms]\n",
+      "Epoch: [  3/ 10], step: [  101/  390], loss: [0.3169], avg loss: [0.3954], time: [108.8419ms]\n",
+      "Epoch: [  3/ 10], step: [  102/  390], loss: [0.3696], avg loss: [0.3952], time: [104.0246ms]\n",
+      "Epoch: [  3/ 10], step: [  103/  390], loss: [0.3359], avg loss: [0.3946], time: [108.8769ms]\n",
+      "Epoch: [  3/ 10], step: [  104/  390], loss: [0.3557], avg loss: [0.3942], time: [102.0548ms]\n",
+      "Epoch: [  3/ 10], step: [  105/  390], loss: [0.4236], avg loss: [0.3945], time: [103.1075ms]\n",
+      "Epoch: [  3/ 10], step: [  106/  390], loss: [0.3706], avg loss: [0.3943], time: [103.7037ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  3/ 10], step: [  107/  390], loss: [0.4050], avg loss: [0.3944], time: [105.6411ms]\n",
+      "Epoch: [  3/ 10], step: [  108/  390], loss: [0.4224], avg loss: [0.3946], time: [105.2477ms]\n",
+      "Epoch: [  3/ 10], step: [  109/  390], loss: [0.3945], avg loss: [0.3946], time: [104.9845ms]\n",
+      "Epoch: [  3/ 10], step: [  110/  390], loss: [0.3166], avg loss: [0.3939], time: [102.7188ms]\n",
+      "Epoch: [  3/ 10], step: [  111/  390], loss: [0.4504], avg loss: [0.3944], time: [106.3836ms]\n",
+      "Epoch: [  3/ 10], step: [  112/  390], loss: [0.4167], avg loss: [0.3946], time: [105.3114ms]\n",
+      "Epoch: [  3/ 10], step: [  113/  390], loss: [0.4151], avg loss: [0.3948], time: [104.2454ms]\n",
+      "Epoch: [  3/ 10], step: [  114/  390], loss: [0.4592], avg loss: [0.3954], time: [101.6955ms]\n",
+      "Epoch: [  3/ 10], step: [  115/  390], loss: [0.4591], avg loss: [0.3959], time: [108.1009ms]\n",
+      "Epoch: [  3/ 10], step: [  116/  390], loss: [0.4377], avg loss: [0.3963], time: [102.1514ms]\n",
+      "Epoch: [  3/ 10], step: [  117/  390], loss: [0.3935], avg loss: [0.3963], time: [105.4153ms]\n",
+      "Epoch: [  3/ 10], step: [  118/  390], loss: [0.4603], avg loss: [0.3968], time: [106.0941ms]\n",
+      "Epoch: [  3/ 10], step: [  119/  390], loss: [0.4321], avg loss: [0.3971], time: [105.8371ms]\n",
+      "Epoch: [  3/ 10], step: [  120/  390], loss: [0.3649], avg loss: [0.3968], time: [102.9572ms]\n",
+      "Epoch: [  3/ 10], step: [  121/  390], loss: [0.2203], avg loss: [0.3954], time: [109.3593ms]\n",
+      "Epoch: [  3/ 10], step: [  122/  390], loss: [0.4187], avg loss: [0.3956], time: [103.2066ms]\n",
+      "Epoch: [  3/ 10], step: [  123/  390], loss: [0.4314], avg loss: [0.3959], time: [108.8202ms]\n",
+      "Epoch: [  3/ 10], step: [  124/  390], loss: [0.4402], avg loss: [0.3962], time: [100.9719ms]\n",
+      "Epoch: [  3/ 10], step: [  125/  390], loss: [0.4183], avg loss: [0.3964], time: [104.0988ms]\n",
+      "Epoch: [  3/ 10], step: [  126/  390], loss: [0.2995], avg loss: [0.3956], time: [105.5896ms]\n",
+      "Epoch: [  3/ 10], step: [  127/  390], loss: [0.5258], avg loss: [0.3966], time: [104.3465ms]\n",
+      "Epoch: [  3/ 10], step: [  128/  390], loss: [0.3425], avg loss: [0.3962], time: [103.1077ms]\n",
+      "Epoch: [  3/ 10], step: [  129/  390], loss: [0.4904], avg loss: [0.3970], time: [105.9737ms]\n",
+      "Epoch: [  3/ 10], step: [  130/  390], loss: [0.3656], avg loss: [0.3967], time: [102.1028ms]\n",
+      "Epoch: [  3/ 10], step: [  131/  390], loss: [0.2937], avg loss: [0.3959], time: [102.6547ms]\n",
+      "Epoch: [  3/ 10], step: [  132/  390], loss: [0.3514], avg loss: [0.3956], time: [105.2051ms]\n",
+      "Epoch: [  3/ 10], step: [  133/  390], loss: [0.4062], avg loss: [0.3957], time: [104.4257ms]\n",
+      "Epoch: [  3/ 10], step: [  134/  390], loss: [0.4585], avg loss: [0.3961], time: [105.3288ms]\n",
+      "Epoch: [  3/ 10], step: [  135/  390], loss: [0.4663], avg loss: [0.3967], time: [104.2590ms]\n",
+      "Epoch: [  3/ 10], step: [  136/  390], loss: [0.4121], avg loss: [0.3968], time: [101.5935ms]\n",
+      "Epoch: [  3/ 10], step: [  137/  390], loss: [0.5713], avg loss: [0.3980], time: [105.9940ms]\n",
+      "Epoch: [  3/ 10], step: [  138/  390], loss: [0.5436], avg loss: [0.3991], time: [104.3868ms]\n",
+      "Epoch: [  3/ 10], step: [  139/  390], loss: [0.3907], avg loss: [0.3990], time: [107.1291ms]\n",
+      "Epoch: [  3/ 10], step: [  140/  390], loss: [0.3895], avg loss: [0.3990], time: [103.6448ms]\n",
+      "Epoch: [  3/ 10], step: [  141/  390], loss: [0.2858], avg loss: [0.3982], time: [107.0457ms]\n",
+      "Epoch: [  3/ 10], step: [  142/  390], loss: [0.3387], avg loss: [0.3978], time: [104.5744ms]\n",
+      "Epoch: [  3/ 10], step: [  143/  390], loss: [0.2160], avg loss: [0.3965], time: [107.1973ms]\n",
+      "Epoch: [  3/ 10], step: [  144/  390], loss: [0.3003], avg loss: [0.3958], time: [103.3521ms]\n",
+      "Epoch: [  3/ 10], step: [  145/  390], loss: [0.4193], avg loss: [0.3960], time: [108.0887ms]\n",
+      "Epoch: [  3/ 10], step: [  146/  390], loss: [0.2822], avg loss: [0.3952], time: [101.7621ms]\n",
+      "Epoch: [  3/ 10], step: [  147/  390], loss: [0.4882], avg loss: [0.3958], time: [103.4474ms]\n",
+      "Epoch: [  3/ 10], step: [  148/  390], loss: [0.3009], avg loss: [0.3952], time: [107.7070ms]\n",
+      "Epoch: [  3/ 10], step: [  149/  390], loss: [0.4665], avg loss: [0.3957], time: [106.5691ms]\n",
+      "Epoch: [  3/ 10], step: [  150/  390], loss: [0.1979], avg loss: [0.3943], time: [102.4261ms]\n",
+      "Epoch: [  3/ 10], step: [  151/  390], loss: [0.5718], avg loss: [0.3955], time: [105.1044ms]\n",
+      "Epoch: [  3/ 10], step: [  152/  390], loss: [0.4232], avg loss: [0.3957], time: [105.0439ms]\n",
+      "Epoch: [  3/ 10], step: [  153/  390], loss: [0.3551], avg loss: [0.3954], time: [108.5839ms]\n",
+      "Epoch: [  3/ 10], step: [  154/  390], loss: [0.4726], avg loss: [0.3959], time: [107.5211ms]\n",
+      "Epoch: [  3/ 10], step: [  155/  390], loss: [0.4916], avg loss: [0.3966], time: [107.9369ms]\n",
+      "Epoch: [  3/ 10], step: [  156/  390], loss: [0.2972], avg loss: [0.3959], time: [103.8926ms]\n",
+      "Epoch: [  3/ 10], step: [  157/  390], loss: [0.5057], avg loss: [0.3966], time: [105.1493ms]\n",
+      "Epoch: [  3/ 10], step: [  158/  390], loss: [0.3771], avg loss: [0.3965], time: [102.5431ms]\n",
+      "Epoch: [  3/ 10], step: [  159/  390], loss: [0.4795], avg loss: [0.3970], time: [103.7226ms]\n",
+      "Epoch: [  3/ 10], step: [  160/  390], loss: [0.3869], avg loss: [0.3970], time: [102.3602ms]\n",
+      "Epoch: [  3/ 10], step: [  161/  390], loss: [0.4202], avg loss: [0.3971], time: [103.5981ms]\n",
+      "Epoch: [  3/ 10], step: [  162/  390], loss: [0.4563], avg loss: [0.3975], time: [100.3494ms]\n",
+      "Epoch: [  3/ 10], step: [  163/  390], loss: [0.4568], avg loss: [0.3978], time: [104.7633ms]\n",
+      "Epoch: [  3/ 10], step: [  164/  390], loss: [0.4694], avg loss: [0.3983], time: [101.3589ms]\n",
+      "Epoch: [  3/ 10], step: [  165/  390], loss: [0.4631], avg loss: [0.3987], time: [105.4330ms]\n",
+      "Epoch: [  3/ 10], step: [  166/  390], loss: [0.4519], avg loss: [0.3990], time: [99.0994ms]\n",
+      "Epoch: [  3/ 10], step: [  167/  390], loss: [0.3601], avg loss: [0.3987], time: [104.2192ms]\n",
+      "Epoch: [  3/ 10], step: [  168/  390], loss: [0.4120], avg loss: [0.3988], time: [105.3808ms]\n",
+      "Epoch: [  3/ 10], step: [  169/  390], loss: [0.4180], avg loss: [0.3989], time: [104.8212ms]\n",
+      "Epoch: [  3/ 10], step: [  170/  390], loss: [0.4114], avg loss: [0.3990], time: [104.3897ms]\n",
+      "Epoch: [  3/ 10], step: [  171/  390], loss: [0.4114], avg loss: [0.3991], time: [103.5337ms]\n",
+      "Epoch: [  3/ 10], step: [  172/  390], loss: [0.4159], avg loss: [0.3992], time: [102.5863ms]\n",
+      "Epoch: [  3/ 10], step: [  173/  390], loss: [0.4097], avg loss: [0.3992], time: [108.9649ms]\n",
+      "Epoch: [  3/ 10], step: [  174/  390], loss: [0.4147], avg loss: [0.3993], time: [101.7361ms]\n",
+      "Epoch: [  3/ 10], step: [  175/  390], loss: [0.4558], avg loss: [0.3997], time: [102.9305ms]\n",
+      "Epoch: [  3/ 10], step: [  176/  390], loss: [0.4649], avg loss: [0.4000], time: [105.9370ms]\n",
+      "Epoch: [  3/ 10], step: [  177/  390], loss: [0.3569], avg loss: [0.3998], time: [106.8923ms]\n",
+      "Epoch: [  3/ 10], step: [  178/  390], loss: [0.3931], avg loss: [0.3997], time: [106.4448ms]\n",
+      "Epoch: [  3/ 10], step: [  179/  390], loss: [0.4755], avg loss: [0.4002], time: [108.0317ms]\n",
+      "Epoch: [  3/ 10], step: [  180/  390], loss: [0.3079], avg loss: [0.3997], time: [107.6589ms]\n",
+      "Epoch: [  3/ 10], step: [  181/  390], loss: [0.2524], avg loss: [0.3988], time: [105.7515ms]\n",
+      "Epoch: [  3/ 10], step: [  182/  390], loss: [0.4180], avg loss: [0.3989], time: [104.0635ms]\n",
+      "Epoch: [  3/ 10], step: [  183/  390], loss: [0.3591], avg loss: [0.3987], time: [107.6572ms]\n",
+      "Epoch: [  3/ 10], step: [  184/  390], loss: [0.4032], avg loss: [0.3988], time: [104.4724ms]\n",
+      "Epoch: [  3/ 10], step: [  185/  390], loss: [0.4342], avg loss: [0.3989], time: [103.9944ms]\n",
+      "Epoch: [  3/ 10], step: [  186/  390], loss: [0.4754], avg loss: [0.3994], time: [107.6875ms]\n",
+      "Epoch: [  3/ 10], step: [  187/  390], loss: [0.4542], avg loss: [0.3996], time: [103.8816ms]\n",
+      "Epoch: [  3/ 10], step: [  188/  390], loss: [0.4420], avg loss: [0.3999], time: [103.5764ms]\n",
+      "Epoch: [  3/ 10], step: [  189/  390], loss: [0.4167], avg loss: [0.4000], time: [102.6843ms]\n",
+      "Epoch: [  3/ 10], step: [  190/  390], loss: [0.3310], avg loss: [0.3996], time: [106.3318ms]\n",
+      "Epoch: [  3/ 10], step: [  191/  390], loss: [0.3687], avg loss: [0.3994], time: [104.1405ms]\n",
+      "Epoch: [  3/ 10], step: [  192/  390], loss: [0.5318], avg loss: [0.4001], time: [106.7257ms]\n",
+      "Epoch: [  3/ 10], step: [  193/  390], loss: [0.4974], avg loss: [0.4006], time: [104.1036ms]\n",
+      "Epoch: [  3/ 10], step: [  194/  390], loss: [0.3833], avg loss: [0.4005], time: [100.6978ms]\n",
+      "Epoch: [  3/ 10], step: [  195/  390], loss: [0.3165], avg loss: [0.4001], time: [104.9170ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  3/ 10], step: [  196/  390], loss: [0.3696], avg loss: [0.4000], time: [104.5880ms]\n",
+      "Epoch: [  3/ 10], step: [  197/  390], loss: [0.3521], avg loss: [0.3997], time: [102.7956ms]\n",
+      "Epoch: [  3/ 10], step: [  198/  390], loss: [0.3601], avg loss: [0.3995], time: [104.1501ms]\n",
+      "Epoch: [  3/ 10], step: [  199/  390], loss: [0.4757], avg loss: [0.3999], time: [102.9751ms]\n",
+      "Epoch: [  3/ 10], step: [  200/  390], loss: [0.4163], avg loss: [0.4000], time: [103.6022ms]\n",
+      "Epoch: [  3/ 10], step: [  201/  390], loss: [0.3398], avg loss: [0.3997], time: [104.8150ms]\n",
+      "Epoch: [  3/ 10], step: [  202/  390], loss: [0.4203], avg loss: [0.3998], time: [103.7111ms]\n",
+      "Epoch: [  3/ 10], step: [  203/  390], loss: [0.3198], avg loss: [0.3994], time: [102.7951ms]\n",
+      "Epoch: [  3/ 10], step: [  204/  390], loss: [0.3190], avg loss: [0.3990], time: [103.6525ms]\n",
+      "Epoch: [  3/ 10], step: [  205/  390], loss: [0.3116], avg loss: [0.3986], time: [103.7445ms]\n",
+      "Epoch: [  3/ 10], step: [  206/  390], loss: [0.3934], avg loss: [0.3985], time: [103.9851ms]\n",
+      "Epoch: [  3/ 10], step: [  207/  390], loss: [0.4535], avg loss: [0.3988], time: [107.6472ms]\n",
+      "Epoch: [  3/ 10], step: [  208/  390], loss: [0.4659], avg loss: [0.3991], time: [104.4419ms]\n",
+      "Epoch: [  3/ 10], step: [  209/  390], loss: [0.3414], avg loss: [0.3989], time: [104.8763ms]\n",
+      "Epoch: [  3/ 10], step: [  210/  390], loss: [0.4802], avg loss: [0.3992], time: [105.4237ms]\n",
+      "Epoch: [  3/ 10], step: [  211/  390], loss: [0.5756], avg loss: [0.4001], time: [104.9984ms]\n",
+      "Epoch: [  3/ 10], step: [  212/  390], loss: [0.3171], avg loss: [0.3997], time: [104.9805ms]\n",
+      "Epoch: [  3/ 10], step: [  213/  390], loss: [0.4107], avg loss: [0.3997], time: [106.6091ms]\n",
+      "Epoch: [  3/ 10], step: [  214/  390], loss: [0.3674], avg loss: [0.3996], time: [104.7139ms]\n",
+      "Epoch: [  3/ 10], step: [  215/  390], loss: [0.4184], avg loss: [0.3997], time: [103.4434ms]\n",
+      "Epoch: [  3/ 10], step: [  216/  390], loss: [0.3420], avg loss: [0.3994], time: [104.7640ms]\n",
+      "Epoch: [  3/ 10], step: [  217/  390], loss: [0.6002], avg loss: [0.4003], time: [106.9977ms]\n",
+      "Epoch: [  3/ 10], step: [  218/  390], loss: [0.2872], avg loss: [0.3998], time: [105.8819ms]\n",
+      "Epoch: [  3/ 10], step: [  219/  390], loss: [0.3229], avg loss: [0.3995], time: [105.8280ms]\n",
+      "Epoch: [  3/ 10], step: [  220/  390], loss: [0.4415], avg loss: [0.3997], time: [102.8762ms]\n",
+      "Epoch: [  3/ 10], step: [  221/  390], loss: [0.3746], avg loss: [0.3995], time: [106.3321ms]\n",
+      "Epoch: [  3/ 10], step: [  222/  390], loss: [0.2635], avg loss: [0.3989], time: [103.1306ms]\n",
+      "Epoch: [  3/ 10], step: [  223/  390], loss: [0.3991], avg loss: [0.3989], time: [105.2806ms]\n",
+      "Epoch: [  3/ 10], step: [  224/  390], loss: [0.3567], avg loss: [0.3987], time: [102.5407ms]\n",
+      "Epoch: [  3/ 10], step: [  225/  390], loss: [0.3465], avg loss: [0.3985], time: [105.3410ms]\n",
+      "Epoch: [  3/ 10], step: [  226/  390], loss: [0.3587], avg loss: [0.3983], time: [105.5880ms]\n",
+      "Epoch: [  3/ 10], step: [  227/  390], loss: [0.5150], avg loss: [0.3988], time: [104.2035ms]\n",
+      "Epoch: [  3/ 10], step: [  228/  390], loss: [0.4710], avg loss: [0.3992], time: [104.1911ms]\n",
+      "Epoch: [  3/ 10], step: [  229/  390], loss: [0.2521], avg loss: [0.3985], time: [109.4351ms]\n",
+      "Epoch: [  3/ 10], step: [  230/  390], loss: [0.4252], avg loss: [0.3986], time: [101.8806ms]\n",
+      "Epoch: [  3/ 10], step: [  231/  390], loss: [0.3643], avg loss: [0.3985], time: [104.1234ms]\n",
+      "Epoch: [  3/ 10], step: [  232/  390], loss: [0.4818], avg loss: [0.3988], time: [100.6560ms]\n",
+      "Epoch: [  3/ 10], step: [  233/  390], loss: [0.4397], avg loss: [0.3990], time: [109.3936ms]\n",
+      "Epoch: [  3/ 10], step: [  234/  390], loss: [0.3876], avg loss: [0.3990], time: [101.9568ms]\n",
+      "Epoch: [  3/ 10], step: [  235/  390], loss: [0.3596], avg loss: [0.3988], time: [105.1791ms]\n",
+      "Epoch: [  3/ 10], step: [  236/  390], loss: [0.3529], avg loss: [0.3986], time: [101.6073ms]\n",
+      "Epoch: [  3/ 10], step: [  237/  390], loss: [0.3215], avg loss: [0.3983], time: [108.1150ms]\n",
+      "Epoch: [  3/ 10], step: [  238/  390], loss: [0.4018], avg loss: [0.3983], time: [107.6634ms]\n",
+      "Epoch: [  3/ 10], step: [  239/  390], loss: [0.4951], avg loss: [0.3987], time: [105.2194ms]\n",
+      "Epoch: [  3/ 10], step: [  240/  390], loss: [0.5848], avg loss: [0.3995], time: [102.4096ms]\n",
+      "Epoch: [  3/ 10], step: [  241/  390], loss: [0.2801], avg loss: [0.3990], time: [105.2697ms]\n",
+      "Epoch: [  3/ 10], step: [  242/  390], loss: [0.3817], avg loss: [0.3989], time: [101.7780ms]\n",
+      "Epoch: [  3/ 10], step: [  243/  390], loss: [0.3129], avg loss: [0.3986], time: [108.2616ms]\n",
+      "Epoch: [  3/ 10], step: [  244/  390], loss: [0.3563], avg loss: [0.3984], time: [102.7660ms]\n",
+      "Epoch: [  3/ 10], step: [  245/  390], loss: [0.4328], avg loss: [0.3985], time: [102.7133ms]\n",
+      "Epoch: [  3/ 10], step: [  246/  390], loss: [0.2599], avg loss: [0.3980], time: [104.3119ms]\n",
+      "Epoch: [  3/ 10], step: [  247/  390], loss: [0.3628], avg loss: [0.3978], time: [104.1727ms]\n",
+      "Epoch: [  3/ 10], step: [  248/  390], loss: [0.3745], avg loss: [0.3977], time: [105.3689ms]\n",
+      "Epoch: [  3/ 10], step: [  249/  390], loss: [0.5442], avg loss: [0.3983], time: [103.2224ms]\n",
+      "Epoch: [  3/ 10], step: [  250/  390], loss: [0.2922], avg loss: [0.3979], time: [102.6263ms]\n",
+      "Epoch: [  3/ 10], step: [  251/  390], loss: [0.5088], avg loss: [0.3983], time: [105.4394ms]\n",
+      "Epoch: [  3/ 10], step: [  252/  390], loss: [0.4104], avg loss: [0.3984], time: [102.2727ms]\n",
+      "Epoch: [  3/ 10], step: [  253/  390], loss: [0.3428], avg loss: [0.3982], time: [105.7491ms]\n",
+      "Epoch: [  3/ 10], step: [  254/  390], loss: [0.2948], avg loss: [0.3978], time: [101.4915ms]\n",
+      "Epoch: [  3/ 10], step: [  255/  390], loss: [0.2938], avg loss: [0.3973], time: [108.3610ms]\n",
+      "Epoch: [  3/ 10], step: [  256/  390], loss: [0.3375], avg loss: [0.3971], time: [101.7067ms]\n",
+      "Epoch: [  3/ 10], step: [  257/  390], loss: [0.4268], avg loss: [0.3972], time: [105.7875ms]\n",
+      "Epoch: [  3/ 10], step: [  258/  390], loss: [0.4184], avg loss: [0.3973], time: [106.6694ms]\n",
+      "Epoch: [  3/ 10], step: [  259/  390], loss: [0.4208], avg loss: [0.3974], time: [103.3854ms]\n",
+      "Epoch: [  3/ 10], step: [  260/  390], loss: [0.4031], avg loss: [0.3974], time: [101.1827ms]\n",
+      "Epoch: [  3/ 10], step: [  261/  390], loss: [0.4611], avg loss: [0.3977], time: [108.3522ms]\n",
+      "Epoch: [  3/ 10], step: [  262/  390], loss: [0.4319], avg loss: [0.3978], time: [107.0139ms]\n",
+      "Epoch: [  3/ 10], step: [  263/  390], loss: [0.3944], avg loss: [0.3978], time: [104.6364ms]\n",
+      "Epoch: [  3/ 10], step: [  264/  390], loss: [0.3305], avg loss: [0.3975], time: [101.7764ms]\n",
+      "Epoch: [  3/ 10], step: [  265/  390], loss: [0.3527], avg loss: [0.3974], time: [108.5494ms]\n",
+      "Epoch: [  3/ 10], step: [  266/  390], loss: [0.4057], avg loss: [0.3974], time: [104.1663ms]\n",
+      "Epoch: [  3/ 10], step: [  267/  390], loss: [0.4273], avg loss: [0.3975], time: [103.1766ms]\n",
+      "Epoch: [  3/ 10], step: [  268/  390], loss: [0.3185], avg loss: [0.3972], time: [103.4164ms]\n",
+      "Epoch: [  3/ 10], step: [  269/  390], loss: [0.3514], avg loss: [0.3970], time: [106.0576ms]\n",
+      "Epoch: [  3/ 10], step: [  270/  390], loss: [0.3194], avg loss: [0.3967], time: [102.9601ms]\n",
+      "Epoch: [  3/ 10], step: [  271/  390], loss: [0.3234], avg loss: [0.3965], time: [107.7216ms]\n",
+      "Epoch: [  3/ 10], step: [  272/  390], loss: [0.4830], avg loss: [0.3968], time: [101.5913ms]\n",
+      "Epoch: [  3/ 10], step: [  273/  390], loss: [0.4117], avg loss: [0.3969], time: [104.7854ms]\n",
+      "Epoch: [  3/ 10], step: [  274/  390], loss: [0.4786], avg loss: [0.3971], time: [104.3963ms]\n",
+      "Epoch: [  3/ 10], step: [  275/  390], loss: [0.4281], avg loss: [0.3973], time: [109.1735ms]\n",
+      "Epoch: [  3/ 10], step: [  276/  390], loss: [0.3829], avg loss: [0.3972], time: [104.0406ms]\n",
+      "Epoch: [  3/ 10], step: [  277/  390], loss: [0.5034], avg loss: [0.3976], time: [106.9276ms]\n",
+      "Epoch: [  3/ 10], step: [  278/  390], loss: [0.5044], avg loss: [0.3980], time: [101.9976ms]\n",
+      "Epoch: [  3/ 10], step: [  279/  390], loss: [0.4408], avg loss: [0.3981], time: [107.4753ms]\n",
+      "Epoch: [  3/ 10], step: [  280/  390], loss: [0.3188], avg loss: [0.3978], time: [101.8198ms]\n",
+      "Epoch: [  3/ 10], step: [  281/  390], loss: [0.3911], avg loss: [0.3978], time: [109.5376ms]\n",
+      "Epoch: [  3/ 10], step: [  282/  390], loss: [0.3954], avg loss: [0.3978], time: [103.4002ms]\n",
+      "Epoch: [  3/ 10], step: [  283/  390], loss: [0.4993], avg loss: [0.3982], time: [104.7726ms]\n",
+      "Epoch: [  3/ 10], step: [  284/  390], loss: [0.3837], avg loss: [0.3981], time: [102.5836ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  3/ 10], step: [  285/  390], loss: [0.4163], avg loss: [0.3982], time: [105.0472ms]\n",
+      "Epoch: [  3/ 10], step: [  286/  390], loss: [0.4400], avg loss: [0.3983], time: [101.8260ms]\n",
+      "Epoch: [  3/ 10], step: [  287/  390], loss: [0.5866], avg loss: [0.3990], time: [107.6553ms]\n",
+      "Epoch: [  3/ 10], step: [  288/  390], loss: [0.5641], avg loss: [0.3996], time: [104.2180ms]\n",
+      "Epoch: [  3/ 10], step: [  289/  390], loss: [0.4612], avg loss: [0.3998], time: [105.5670ms]\n",
+      "Epoch: [  3/ 10], step: [  290/  390], loss: [0.2980], avg loss: [0.3994], time: [101.8584ms]\n",
+      "Epoch: [  3/ 10], step: [  291/  390], loss: [0.4731], avg loss: [0.3997], time: [107.9223ms]\n",
+      "Epoch: [  3/ 10], step: [  292/  390], loss: [0.3319], avg loss: [0.3994], time: [102.9286ms]\n",
+      "Epoch: [  3/ 10], step: [  293/  390], loss: [0.2109], avg loss: [0.3988], time: [102.7219ms]\n",
+      "Epoch: [  3/ 10], step: [  294/  390], loss: [0.3556], avg loss: [0.3987], time: [106.2779ms]\n",
+      "Epoch: [  3/ 10], step: [  295/  390], loss: [0.5077], avg loss: [0.3990], time: [108.3124ms]\n",
+      "Epoch: [  3/ 10], step: [  296/  390], loss: [0.3730], avg loss: [0.3989], time: [105.7329ms]\n",
+      "Epoch: [  3/ 10], step: [  297/  390], loss: [0.3788], avg loss: [0.3989], time: [103.3728ms]\n",
+      "Epoch: [  3/ 10], step: [  298/  390], loss: [0.4189], avg loss: [0.3989], time: [102.2112ms]\n",
+      "Epoch: [  3/ 10], step: [  299/  390], loss: [0.4771], avg loss: [0.3992], time: [107.6407ms]\n",
+      "Epoch: [  3/ 10], step: [  300/  390], loss: [0.4764], avg loss: [0.3995], time: [106.6926ms]\n",
+      "Epoch: [  3/ 10], step: [  301/  390], loss: [0.2127], avg loss: [0.3988], time: [109.6792ms]\n",
+      "Epoch: [  3/ 10], step: [  302/  390], loss: [0.3632], avg loss: [0.3987], time: [106.6904ms]\n",
+      "Epoch: [  3/ 10], step: [  303/  390], loss: [0.4322], avg loss: [0.3988], time: [107.4171ms]\n",
+      "Epoch: [  3/ 10], step: [  304/  390], loss: [0.2149], avg loss: [0.3982], time: [102.3970ms]\n",
+      "Epoch: [  3/ 10], step: [  305/  390], loss: [0.3922], avg loss: [0.3982], time: [103.8301ms]\n",
+      "Epoch: [  3/ 10], step: [  306/  390], loss: [0.3648], avg loss: [0.3981], time: [104.2843ms]\n",
+      "Epoch: [  3/ 10], step: [  307/  390], loss: [0.4253], avg loss: [0.3982], time: [108.5942ms]\n",
+      "Epoch: [  3/ 10], step: [  308/  390], loss: [0.2997], avg loss: [0.3979], time: [108.4452ms]\n",
+      "Epoch: [  3/ 10], step: [  309/  390], loss: [0.4857], avg loss: [0.3981], time: [106.3452ms]\n",
+      "Epoch: [  3/ 10], step: [  310/  390], loss: [0.2400], avg loss: [0.3976], time: [104.8474ms]\n",
+      "Epoch: [  3/ 10], step: [  311/  390], loss: [0.3372], avg loss: [0.3974], time: [106.9918ms]\n",
+      "Epoch: [  3/ 10], step: [  312/  390], loss: [0.3999], avg loss: [0.3974], time: [107.0898ms]\n",
+      "Epoch: [  3/ 10], step: [  313/  390], loss: [0.3966], avg loss: [0.3974], time: [104.3730ms]\n",
+      "Epoch: [  3/ 10], step: [  314/  390], loss: [0.3356], avg loss: [0.3972], time: [106.7452ms]\n",
+      "Epoch: [  3/ 10], step: [  315/  390], loss: [0.4338], avg loss: [0.3974], time: [104.3062ms]\n",
+      "Epoch: [  3/ 10], step: [  316/  390], loss: [0.4492], avg loss: [0.3975], time: [106.8077ms]\n",
+      "Epoch: [  3/ 10], step: [  317/  390], loss: [0.4842], avg loss: [0.3978], time: [105.7394ms]\n",
+      "Epoch: [  3/ 10], step: [  318/  390], loss: [0.4107], avg loss: [0.3978], time: [103.0300ms]\n",
+      "Epoch: [  3/ 10], step: [  319/  390], loss: [0.4075], avg loss: [0.3979], time: [109.4158ms]\n",
+      "Epoch: [  3/ 10], step: [  320/  390], loss: [0.2865], avg loss: [0.3975], time: [107.0471ms]\n",
+      "Epoch: [  3/ 10], step: [  321/  390], loss: [0.4206], avg loss: [0.3976], time: [107.0685ms]\n",
+      "Epoch: [  3/ 10], step: [  322/  390], loss: [0.3023], avg loss: [0.3973], time: [104.3916ms]\n",
+      "Epoch: [  3/ 10], step: [  323/  390], loss: [0.5861], avg loss: [0.3979], time: [104.4352ms]\n",
+      "Epoch: [  3/ 10], step: [  324/  390], loss: [0.3894], avg loss: [0.3979], time: [103.2722ms]\n",
+      "Epoch: [  3/ 10], step: [  325/  390], loss: [0.4065], avg loss: [0.3979], time: [105.9616ms]\n",
+      "Epoch: [  3/ 10], step: [  326/  390], loss: [0.4846], avg loss: [0.3982], time: [106.9846ms]\n",
+      "Epoch: [  3/ 10], step: [  327/  390], loss: [0.3179], avg loss: [0.3979], time: [104.4197ms]\n",
+      "Epoch: [  3/ 10], step: [  328/  390], loss: [0.4151], avg loss: [0.3980], time: [102.4847ms]\n",
+      "Epoch: [  3/ 10], step: [  329/  390], loss: [0.4456], avg loss: [0.3981], time: [106.0786ms]\n",
+      "Epoch: [  3/ 10], step: [  330/  390], loss: [0.5323], avg loss: [0.3985], time: [104.0373ms]\n",
+      "Epoch: [  3/ 10], step: [  331/  390], loss: [0.4364], avg loss: [0.3986], time: [107.2137ms]\n",
+      "Epoch: [  3/ 10], step: [  332/  390], loss: [0.3513], avg loss: [0.3985], time: [104.4378ms]\n",
+      "Epoch: [  3/ 10], step: [  333/  390], loss: [0.3349], avg loss: [0.3983], time: [104.3508ms]\n",
+      "Epoch: [  3/ 10], step: [  334/  390], loss: [0.4467], avg loss: [0.3984], time: [106.8215ms]\n",
+      "Epoch: [  3/ 10], step: [  335/  390], loss: [0.3192], avg loss: [0.3982], time: [107.9407ms]\n",
+      "Epoch: [  3/ 10], step: [  336/  390], loss: [0.3861], avg loss: [0.3982], time: [106.5843ms]\n",
+      "Epoch: [  3/ 10], step: [  337/  390], loss: [0.4852], avg loss: [0.3984], time: [105.2516ms]\n",
+      "Epoch: [  3/ 10], step: [  338/  390], loss: [0.5865], avg loss: [0.3990], time: [106.8594ms]\n",
+      "Epoch: [  3/ 10], step: [  339/  390], loss: [0.4505], avg loss: [0.3991], time: [103.7941ms]\n",
+      "Epoch: [  3/ 10], step: [  340/  390], loss: [0.3992], avg loss: [0.3991], time: [106.0836ms]\n",
+      "Epoch: [  3/ 10], step: [  341/  390], loss: [0.4544], avg loss: [0.3993], time: [102.8435ms]\n",
+      "Epoch: [  3/ 10], step: [  342/  390], loss: [0.6408], avg loss: [0.4000], time: [101.7888ms]\n",
+      "Epoch: [  3/ 10], step: [  343/  390], loss: [0.4806], avg loss: [0.4002], time: [105.1824ms]\n",
+      "Epoch: [  3/ 10], step: [  344/  390], loss: [0.4758], avg loss: [0.4005], time: [102.1531ms]\n",
+      "Epoch: [  3/ 10], step: [  345/  390], loss: [0.3838], avg loss: [0.4004], time: [108.0832ms]\n",
+      "Epoch: [  3/ 10], step: [  346/  390], loss: [0.4273], avg loss: [0.4005], time: [107.1842ms]\n",
+      "Epoch: [  3/ 10], step: [  347/  390], loss: [0.3675], avg loss: [0.4004], time: [107.6250ms]\n",
+      "Epoch: [  3/ 10], step: [  348/  390], loss: [0.4613], avg loss: [0.4006], time: [102.1802ms]\n",
+      "Epoch: [  3/ 10], step: [  349/  390], loss: [0.5186], avg loss: [0.4009], time: [105.7131ms]\n",
+      "Epoch: [  3/ 10], step: [  350/  390], loss: [0.4531], avg loss: [0.4011], time: [106.6077ms]\n",
+      "Epoch: [  3/ 10], step: [  351/  390], loss: [0.3558], avg loss: [0.4009], time: [104.0854ms]\n",
+      "Epoch: [  3/ 10], step: [  352/  390], loss: [0.3800], avg loss: [0.4009], time: [103.4341ms]\n",
+      "Epoch: [  3/ 10], step: [  353/  390], loss: [0.4185], avg loss: [0.4009], time: [106.5884ms]\n",
+      "Epoch: [  3/ 10], step: [  354/  390], loss: [0.3551], avg loss: [0.4008], time: [100.7278ms]\n",
+      "Epoch: [  3/ 10], step: [  355/  390], loss: [0.3627], avg loss: [0.4007], time: [104.9283ms]\n",
+      "Epoch: [  3/ 10], step: [  356/  390], loss: [0.3571], avg loss: [0.4006], time: [101.3572ms]\n",
+      "Epoch: [  3/ 10], step: [  357/  390], loss: [0.5939], avg loss: [0.4011], time: [104.5790ms]\n",
+      "Epoch: [  3/ 10], step: [  358/  390], loss: [0.5010], avg loss: [0.4014], time: [103.0738ms]\n",
+      "Epoch: [  3/ 10], step: [  359/  390], loss: [0.3568], avg loss: [0.4013], time: [108.9163ms]\n",
+      "Epoch: [  3/ 10], step: [  360/  390], loss: [0.3379], avg loss: [0.4011], time: [102.8593ms]\n",
+      "Epoch: [  3/ 10], step: [  361/  390], loss: [0.3807], avg loss: [0.4010], time: [108.5019ms]\n",
+      "Epoch: [  3/ 10], step: [  362/  390], loss: [0.5156], avg loss: [0.4013], time: [103.0893ms]\n",
+      "Epoch: [  3/ 10], step: [  363/  390], loss: [0.4275], avg loss: [0.4014], time: [104.1322ms]\n",
+      "Epoch: [  3/ 10], step: [  364/  390], loss: [0.4519], avg loss: [0.4015], time: [105.7711ms]\n",
+      "Epoch: [  3/ 10], step: [  365/  390], loss: [0.4699], avg loss: [0.4017], time: [105.3543ms]\n",
+      "Epoch: [  3/ 10], step: [  366/  390], loss: [0.3991], avg loss: [0.4017], time: [103.7087ms]\n",
+      "Epoch: [  3/ 10], step: [  367/  390], loss: [0.5582], avg loss: [0.4022], time: [104.6586ms]\n",
+      "Epoch: [  3/ 10], step: [  368/  390], loss: [0.3483], avg loss: [0.4020], time: [102.0648ms]\n",
+      "Epoch: [  3/ 10], step: [  369/  390], loss: [0.5089], avg loss: [0.4023], time: [102.4358ms]\n",
+      "Epoch: [  3/ 10], step: [  370/  390], loss: [0.4907], avg loss: [0.4025], time: [103.3907ms]\n",
+      "Epoch: [  3/ 10], step: [  371/  390], loss: [0.3668], avg loss: [0.4024], time: [105.0577ms]\n",
+      "Epoch: [  3/ 10], step: [  372/  390], loss: [0.4605], avg loss: [0.4026], time: [102.5267ms]\n",
+      "Epoch: [  3/ 10], step: [  373/  390], loss: [0.4048], avg loss: [0.4026], time: [110.1303ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  3/ 10], step: [  374/  390], loss: [0.3921], avg loss: [0.4026], time: [107.5480ms]\n",
+      "Epoch: [  3/ 10], step: [  375/  390], loss: [0.4149], avg loss: [0.4026], time: [106.8847ms]\n",
+      "Epoch: [  3/ 10], step: [  376/  390], loss: [0.4907], avg loss: [0.4028], time: [104.5911ms]\n",
+      "Epoch: [  3/ 10], step: [  377/  390], loss: [0.3688], avg loss: [0.4027], time: [109.8232ms]\n",
+      "Epoch: [  3/ 10], step: [  378/  390], loss: [0.3472], avg loss: [0.4026], time: [101.5713ms]\n",
+      "Epoch: [  3/ 10], step: [  379/  390], loss: [0.4601], avg loss: [0.4028], time: [109.2470ms]\n",
+      "Epoch: [  3/ 10], step: [  380/  390], loss: [0.3989], avg loss: [0.4027], time: [105.3076ms]\n",
+      "Epoch: [  3/ 10], step: [  381/  390], loss: [0.4383], avg loss: [0.4028], time: [104.4583ms]\n",
+      "Epoch: [  3/ 10], step: [  382/  390], loss: [0.4026], avg loss: [0.4028], time: [105.3464ms]\n",
+      "Epoch: [  3/ 10], step: [  383/  390], loss: [0.4012], avg loss: [0.4028], time: [101.9688ms]\n",
+      "Epoch: [  3/ 10], step: [  384/  390], loss: [0.3780], avg loss: [0.4028], time: [103.9000ms]\n",
+      "Epoch: [  3/ 10], step: [  385/  390], loss: [0.4996], avg loss: [0.4030], time: [109.7128ms]\n",
+      "Epoch: [  3/ 10], step: [  386/  390], loss: [0.4128], avg loss: [0.4030], time: [101.9919ms]\n",
+      "Epoch: [  3/ 10], step: [  387/  390], loss: [0.4403], avg loss: [0.4031], time: [105.8900ms]\n",
+      "Epoch: [  3/ 10], step: [  388/  390], loss: [0.3133], avg loss: [0.4029], time: [100.8637ms]\n",
+      "Epoch: [  3/ 10], step: [  389/  390], loss: [0.3768], avg loss: [0.4028], time: [101.2883ms]\n",
+      "Epoch: [  3/ 10], step: [  390/  390], loss: [0.3408], avg loss: [0.4027], time: [828.4981ms]\n",
+      "Epoch time: 41946.674, per step time: 107.556\n",
+      "Epoch time: 41946.990, per step time: 107.556, avg loss: 0.403\n",
+      "************************************************************\n",
+      "Epoch: [  4/ 10], step: [    1/  390], loss: [0.4017], avg loss: [0.4017], time: [73.9217ms]\n",
+      "Epoch: [  4/ 10], step: [    2/  390], loss: [0.4795], avg loss: [0.4406], time: [103.4141ms]\n",
+      "Epoch: [  4/ 10], step: [    3/  390], loss: [0.2870], avg loss: [0.3894], time: [107.5652ms]\n",
+      "Epoch: [  4/ 10], step: [    4/  390], loss: [0.4298], avg loss: [0.3995], time: [107.0776ms]\n",
+      "Epoch: [  4/ 10], step: [    5/  390], loss: [0.3789], avg loss: [0.3954], time: [107.3534ms]\n",
+      "Epoch: [  4/ 10], step: [    6/  390], loss: [0.3850], avg loss: [0.3936], time: [104.1341ms]\n",
+      "Epoch: [  4/ 10], step: [    7/  390], loss: [0.5787], avg loss: [0.4201], time: [102.0460ms]\n",
+      "Epoch: [  4/ 10], step: [    8/  390], loss: [0.4739], avg loss: [0.4268], time: [104.0175ms]\n",
+      "Epoch: [  4/ 10], step: [    9/  390], loss: [0.3946], avg loss: [0.4232], time: [105.8216ms]\n",
+      "Epoch: [  4/ 10], step: [   10/  390], loss: [0.4048], avg loss: [0.4214], time: [103.3876ms]\n",
+      "Epoch: [  4/ 10], step: [   11/  390], loss: [0.2484], avg loss: [0.4057], time: [102.9871ms]\n",
+      "Epoch: [  4/ 10], step: [   12/  390], loss: [0.2323], avg loss: [0.3912], time: [100.4424ms]\n",
+      "Epoch: [  4/ 10], step: [   13/  390], loss: [0.4067], avg loss: [0.3924], time: [101.6319ms]\n",
+      "Epoch: [  4/ 10], step: [   14/  390], loss: [0.3270], avg loss: [0.3877], time: [102.6273ms]\n",
+      "Epoch: [  4/ 10], step: [   15/  390], loss: [0.4092], avg loss: [0.3892], time: [102.1852ms]\n",
+      "Epoch: [  4/ 10], step: [   16/  390], loss: [0.3262], avg loss: [0.3852], time: [106.1547ms]\n",
+      "Epoch: [  4/ 10], step: [   17/  390], loss: [0.3273], avg loss: [0.3818], time: [104.9578ms]\n",
+      "Epoch: [  4/ 10], step: [   18/  390], loss: [0.3551], avg loss: [0.3803], time: [104.7392ms]\n",
+      "Epoch: [  4/ 10], step: [   19/  390], loss: [0.2978], avg loss: [0.3760], time: [103.5285ms]\n",
+      "Epoch: [  4/ 10], step: [   20/  390], loss: [0.3568], avg loss: [0.3750], time: [104.4667ms]\n",
+      "Epoch: [  4/ 10], step: [   21/  390], loss: [0.3576], avg loss: [0.3742], time: [106.6606ms]\n",
+      "Epoch: [  4/ 10], step: [   22/  390], loss: [0.4565], avg loss: [0.3779], time: [108.4642ms]\n",
+      "Epoch: [  4/ 10], step: [   23/  390], loss: [0.3130], avg loss: [0.3751], time: [106.9276ms]\n",
+      "Epoch: [  4/ 10], step: [   24/  390], loss: [0.3228], avg loss: [0.3729], time: [101.8312ms]\n",
+      "Epoch: [  4/ 10], step: [   25/  390], loss: [0.4285], avg loss: [0.3752], time: [106.0030ms]\n",
+      "Epoch: [  4/ 10], step: [   26/  390], loss: [0.4040], avg loss: [0.3763], time: [105.2804ms]\n",
+      "Epoch: [  4/ 10], step: [   27/  390], loss: [0.2316], avg loss: [0.3709], time: [107.4708ms]\n",
+      "Epoch: [  4/ 10], step: [   28/  390], loss: [0.2661], avg loss: [0.3672], time: [103.3499ms]\n",
+      "Epoch: [  4/ 10], step: [   29/  390], loss: [0.3404], avg loss: [0.3662], time: [102.5293ms]\n",
+      "Epoch: [  4/ 10], step: [   30/  390], loss: [0.4828], avg loss: [0.3701], time: [105.6926ms]\n",
+      "Epoch: [  4/ 10], step: [   31/  390], loss: [0.3574], avg loss: [0.3697], time: [106.9551ms]\n",
+      "Epoch: [  4/ 10], step: [   32/  390], loss: [0.5177], avg loss: [0.3743], time: [107.4872ms]\n",
+      "Epoch: [  4/ 10], step: [   33/  390], loss: [0.4476], avg loss: [0.3766], time: [105.5844ms]\n",
+      "Epoch: [  4/ 10], step: [   34/  390], loss: [0.4039], avg loss: [0.3774], time: [102.4349ms]\n",
+      "Epoch: [  4/ 10], step: [   35/  390], loss: [0.4306], avg loss: [0.3789], time: [102.9930ms]\n",
+      "Epoch: [  4/ 10], step: [   36/  390], loss: [0.3846], avg loss: [0.3790], time: [104.8679ms]\n",
+      "Epoch: [  4/ 10], step: [   37/  390], loss: [0.3046], avg loss: [0.3770], time: [103.7214ms]\n",
+      "Epoch: [  4/ 10], step: [   38/  390], loss: [0.3345], avg loss: [0.3759], time: [104.6548ms]\n",
+      "Epoch: [  4/ 10], step: [   39/  390], loss: [0.4613], avg loss: [0.3781], time: [102.4151ms]\n",
+      "Epoch: [  4/ 10], step: [   40/  390], loss: [0.4372], avg loss: [0.3796], time: [104.0227ms]\n",
+      "Epoch: [  4/ 10], step: [   41/  390], loss: [0.3131], avg loss: [0.3780], time: [104.7094ms]\n",
+      "Epoch: [  4/ 10], step: [   42/  390], loss: [0.3185], avg loss: [0.3765], time: [103.5342ms]\n",
+      "Epoch: [  4/ 10], step: [   43/  390], loss: [0.4237], avg loss: [0.3776], time: [106.8850ms]\n",
+      "Epoch: [  4/ 10], step: [   44/  390], loss: [0.3446], avg loss: [0.3769], time: [104.2166ms]\n",
+      "Epoch: [  4/ 10], step: [   45/  390], loss: [0.3386], avg loss: [0.3760], time: [102.8478ms]\n",
+      "Epoch: [  4/ 10], step: [   46/  390], loss: [0.2380], avg loss: [0.3730], time: [106.4036ms]\n",
+      "Epoch: [  4/ 10], step: [   47/  390], loss: [0.2631], avg loss: [0.3707], time: [100.6525ms]\n",
+      "Epoch: [  4/ 10], step: [   48/  390], loss: [0.3154], avg loss: [0.3695], time: [103.7781ms]\n",
+      "Epoch: [  4/ 10], step: [   49/  390], loss: [0.3512], avg loss: [0.3692], time: [104.8393ms]\n",
+      "Epoch: [  4/ 10], step: [   50/  390], loss: [0.3820], avg loss: [0.3694], time: [105.8657ms]\n",
+      "Epoch: [  4/ 10], step: [   51/  390], loss: [0.4683], avg loss: [0.3714], time: [103.6565ms]\n",
+      "Epoch: [  4/ 10], step: [   52/  390], loss: [0.3854], avg loss: [0.3716], time: [104.9857ms]\n",
+      "Epoch: [  4/ 10], step: [   53/  390], loss: [0.4999], avg loss: [0.3741], time: [101.6917ms]\n",
+      "Epoch: [  4/ 10], step: [   54/  390], loss: [0.5073], avg loss: [0.3765], time: [105.3748ms]\n",
+      "Epoch: [  4/ 10], step: [   55/  390], loss: [0.4146], avg loss: [0.3772], time: [105.8059ms]\n",
+      "Epoch: [  4/ 10], step: [   56/  390], loss: [0.4214], avg loss: [0.3780], time: [103.3521ms]\n",
+      "Epoch: [  4/ 10], step: [   57/  390], loss: [0.3034], avg loss: [0.3767], time: [106.8072ms]\n",
+      "Epoch: [  4/ 10], step: [   58/  390], loss: [0.3051], avg loss: [0.3755], time: [103.3835ms]\n",
+      "Epoch: [  4/ 10], step: [   59/  390], loss: [0.3742], avg loss: [0.3754], time: [105.8295ms]\n",
+      "Epoch: [  4/ 10], step: [   60/  390], loss: [0.4394], avg loss: [0.3765], time: [105.0005ms]\n",
+      "Epoch: [  4/ 10], step: [   61/  390], loss: [0.2594], avg loss: [0.3746], time: [106.1208ms]\n",
+      "Epoch: [  4/ 10], step: [   62/  390], loss: [0.4522], avg loss: [0.3758], time: [106.8385ms]\n",
+      "Epoch: [  4/ 10], step: [   63/  390], loss: [0.4361], avg loss: [0.3768], time: [102.1011ms]\n",
+      "Epoch: [  4/ 10], step: [   64/  390], loss: [0.3397], avg loss: [0.3762], time: [104.5163ms]\n",
+      "Epoch: [  4/ 10], step: [   65/  390], loss: [0.2726], avg loss: [0.3746], time: [105.8998ms]\n",
+      "Epoch: [  4/ 10], step: [   66/  390], loss: [0.3973], avg loss: [0.3750], time: [103.3664ms]\n",
+      "Epoch: [  4/ 10], step: [   67/  390], loss: [0.3567], avg loss: [0.3747], time: [105.4764ms]\n",
+      "Epoch: [  4/ 10], step: [   68/  390], loss: [0.3505], avg loss: [0.3743], time: [102.8385ms]\n",
+      "Epoch: [  4/ 10], step: [   69/  390], loss: [0.3896], avg loss: [0.3746], time: [103.4474ms]\n",
+      "Epoch: [  4/ 10], step: [   70/  390], loss: [0.3462], avg loss: [0.3741], time: [105.1519ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  4/ 10], step: [   71/  390], loss: [0.3085], avg loss: [0.3732], time: [103.1454ms]\n",
+      "Epoch: [  4/ 10], step: [   72/  390], loss: [0.2767], avg loss: [0.3719], time: [102.9990ms]\n",
+      "Epoch: [  4/ 10], step: [   73/  390], loss: [0.3353], avg loss: [0.3714], time: [107.5771ms]\n",
+      "Epoch: [  4/ 10], step: [   74/  390], loss: [0.4800], avg loss: [0.3729], time: [104.0356ms]\n",
+      "Epoch: [  4/ 10], step: [   75/  390], loss: [0.2814], avg loss: [0.3716], time: [104.0728ms]\n",
+      "Epoch: [  4/ 10], step: [   76/  390], loss: [0.4233], avg loss: [0.3723], time: [104.9471ms]\n",
+      "Epoch: [  4/ 10], step: [   77/  390], loss: [0.2641], avg loss: [0.3709], time: [103.7886ms]\n",
+      "Epoch: [  4/ 10], step: [   78/  390], loss: [0.3865], avg loss: [0.3711], time: [107.5280ms]\n",
+      "Epoch: [  4/ 10], step: [   79/  390], loss: [0.2459], avg loss: [0.3695], time: [106.9174ms]\n",
+      "Epoch: [  4/ 10], step: [   80/  390], loss: [0.4205], avg loss: [0.3702], time: [104.5945ms]\n",
+      "Epoch: [  4/ 10], step: [   81/  390], loss: [0.4781], avg loss: [0.3715], time: [102.2487ms]\n",
+      "Epoch: [  4/ 10], step: [   82/  390], loss: [0.5155], avg loss: [0.3732], time: [103.3859ms]\n",
+      "Epoch: [  4/ 10], step: [   83/  390], loss: [0.3062], avg loss: [0.3724], time: [107.1780ms]\n",
+      "Epoch: [  4/ 10], step: [   84/  390], loss: [0.4246], avg loss: [0.3731], time: [105.0212ms]\n",
+      "Epoch: [  4/ 10], step: [   85/  390], loss: [0.4452], avg loss: [0.3739], time: [104.6870ms]\n",
+      "Epoch: [  4/ 10], step: [   86/  390], loss: [0.4439], avg loss: [0.3747], time: [102.5512ms]\n",
+      "Epoch: [  4/ 10], step: [   87/  390], loss: [0.3794], avg loss: [0.3748], time: [106.2095ms]\n",
+      "Epoch: [  4/ 10], step: [   88/  390], loss: [0.4272], avg loss: [0.3754], time: [104.0113ms]\n",
+      "Epoch: [  4/ 10], step: [   89/  390], loss: [0.3608], avg loss: [0.3752], time: [105.8640ms]\n",
+      "Epoch: [  4/ 10], step: [   90/  390], loss: [0.3053], avg loss: [0.3744], time: [106.1811ms]\n",
+      "Epoch: [  4/ 10], step: [   91/  390], loss: [0.3505], avg loss: [0.3742], time: [106.4234ms]\n",
+      "Epoch: [  4/ 10], step: [   92/  390], loss: [0.2630], avg loss: [0.3730], time: [101.3699ms]\n",
+      "Epoch: [  4/ 10], step: [   93/  390], loss: [0.4086], avg loss: [0.3733], time: [102.1168ms]\n",
+      "Epoch: [  4/ 10], step: [   94/  390], loss: [0.3074], avg loss: [0.3726], time: [106.9517ms]\n",
+      "Epoch: [  4/ 10], step: [   95/  390], loss: [0.2860], avg loss: [0.3717], time: [107.4505ms]\n",
+      "Epoch: [  4/ 10], step: [   96/  390], loss: [0.3472], avg loss: [0.3715], time: [105.2582ms]\n",
+      "Epoch: [  4/ 10], step: [   97/  390], loss: [0.4399], avg loss: [0.3722], time: [104.7673ms]\n",
+      "Epoch: [  4/ 10], step: [   98/  390], loss: [0.2984], avg loss: [0.3714], time: [102.5717ms]\n",
+      "Epoch: [  4/ 10], step: [   99/  390], loss: [0.5062], avg loss: [0.3728], time: [106.9460ms]\n",
+      "Epoch: [  4/ 10], step: [  100/  390], loss: [0.5517], avg loss: [0.3746], time: [105.6576ms]\n",
+      "Epoch: [  4/ 10], step: [  101/  390], loss: [0.5153], avg loss: [0.3760], time: [105.8364ms]\n",
+      "Epoch: [  4/ 10], step: [  102/  390], loss: [0.4030], avg loss: [0.3762], time: [105.5653ms]\n",
+      "Epoch: [  4/ 10], step: [  103/  390], loss: [0.3423], avg loss: [0.3759], time: [105.3305ms]\n",
+      "Epoch: [  4/ 10], step: [  104/  390], loss: [0.5257], avg loss: [0.3773], time: [102.0906ms]\n",
+      "Epoch: [  4/ 10], step: [  105/  390], loss: [0.3724], avg loss: [0.3773], time: [104.0003ms]\n",
+      "Epoch: [  4/ 10], step: [  106/  390], loss: [0.3023], avg loss: [0.3766], time: [106.1201ms]\n",
+      "Epoch: [  4/ 10], step: [  107/  390], loss: [0.3482], avg loss: [0.3763], time: [107.1177ms]\n",
+      "Epoch: [  4/ 10], step: [  108/  390], loss: [0.3615], avg loss: [0.3762], time: [106.6062ms]\n",
+      "Epoch: [  4/ 10], step: [  109/  390], loss: [0.4316], avg loss: [0.3767], time: [103.5779ms]\n",
+      "Epoch: [  4/ 10], step: [  110/  390], loss: [0.3250], avg loss: [0.3762], time: [104.2085ms]\n",
+      "Epoch: [  4/ 10], step: [  111/  390], loss: [0.4009], avg loss: [0.3764], time: [107.6093ms]\n",
+      "Epoch: [  4/ 10], step: [  112/  390], loss: [0.3942], avg loss: [0.3766], time: [106.7224ms]\n",
+      "Epoch: [  4/ 10], step: [  113/  390], loss: [0.2140], avg loss: [0.3752], time: [104.5911ms]\n",
+      "Epoch: [  4/ 10], step: [  114/  390], loss: [0.4001], avg loss: [0.3754], time: [103.5399ms]\n",
+      "Epoch: [  4/ 10], step: [  115/  390], loss: [0.4625], avg loss: [0.3761], time: [106.3213ms]\n",
+      "Epoch: [  4/ 10], step: [  116/  390], loss: [0.3707], avg loss: [0.3761], time: [108.2938ms]\n",
+      "Epoch: [  4/ 10], step: [  117/  390], loss: [0.5109], avg loss: [0.3772], time: [108.2978ms]\n",
+      "Epoch: [  4/ 10], step: [  118/  390], loss: [0.3670], avg loss: [0.3772], time: [107.1320ms]\n",
+      "Epoch: [  4/ 10], step: [  119/  390], loss: [0.3501], avg loss: [0.3769], time: [106.8544ms]\n",
+      "Epoch: [  4/ 10], step: [  120/  390], loss: [0.3834], avg loss: [0.3770], time: [106.4882ms]\n",
+      "Epoch: [  4/ 10], step: [  121/  390], loss: [0.3532], avg loss: [0.3768], time: [107.4162ms]\n",
+      "Epoch: [  4/ 10], step: [  122/  390], loss: [0.3031], avg loss: [0.3762], time: [103.9529ms]\n",
+      "Epoch: [  4/ 10], step: [  123/  390], loss: [0.3020], avg loss: [0.3756], time: [105.1888ms]\n",
+      "Epoch: [  4/ 10], step: [  124/  390], loss: [0.2292], avg loss: [0.3744], time: [106.4696ms]\n",
+      "Epoch: [  4/ 10], step: [  125/  390], loss: [0.4072], avg loss: [0.3747], time: [108.0911ms]\n",
+      "Epoch: [  4/ 10], step: [  126/  390], loss: [0.3180], avg loss: [0.3742], time: [108.1126ms]\n",
+      "Epoch: [  4/ 10], step: [  127/  390], loss: [0.3820], avg loss: [0.3743], time: [102.9165ms]\n",
+      "Epoch: [  4/ 10], step: [  128/  390], loss: [0.4190], avg loss: [0.3746], time: [106.5381ms]\n",
+      "Epoch: [  4/ 10], step: [  129/  390], loss: [0.2390], avg loss: [0.3736], time: [112.4887ms]\n",
+      "Epoch: [  4/ 10], step: [  130/  390], loss: [0.3056], avg loss: [0.3731], time: [106.4742ms]\n",
+      "Epoch: [  4/ 10], step: [  131/  390], loss: [0.3209], avg loss: [0.3727], time: [107.4369ms]\n",
+      "Epoch: [  4/ 10], step: [  132/  390], loss: [0.3113], avg loss: [0.3722], time: [105.5963ms]\n",
+      "Epoch: [  4/ 10], step: [  133/  390], loss: [0.2161], avg loss: [0.3710], time: [105.7923ms]\n",
+      "Epoch: [  4/ 10], step: [  134/  390], loss: [0.3602], avg loss: [0.3709], time: [107.3329ms]\n",
+      "Epoch: [  4/ 10], step: [  135/  390], loss: [0.3843], avg loss: [0.3710], time: [107.0983ms]\n",
+      "Epoch: [  4/ 10], step: [  136/  390], loss: [0.4002], avg loss: [0.3712], time: [105.2377ms]\n",
+      "Epoch: [  4/ 10], step: [  137/  390], loss: [0.3382], avg loss: [0.3710], time: [106.5493ms]\n",
+      "Epoch: [  4/ 10], step: [  138/  390], loss: [0.4547], avg loss: [0.3716], time: [104.9621ms]\n",
+      "Epoch: [  4/ 10], step: [  139/  390], loss: [0.4897], avg loss: [0.3725], time: [104.1026ms]\n",
+      "Epoch: [  4/ 10], step: [  140/  390], loss: [0.2613], avg loss: [0.3717], time: [108.1474ms]\n",
+      "Epoch: [  4/ 10], step: [  141/  390], loss: [0.3163], avg loss: [0.3713], time: [109.3452ms]\n",
+      "Epoch: [  4/ 10], step: [  142/  390], loss: [0.3970], avg loss: [0.3715], time: [106.0774ms]\n",
+      "Epoch: [  4/ 10], step: [  143/  390], loss: [0.4706], avg loss: [0.3722], time: [108.7382ms]\n",
+      "Epoch: [  4/ 10], step: [  144/  390], loss: [0.2520], avg loss: [0.3713], time: [107.2977ms]\n",
+      "Epoch: [  4/ 10], step: [  145/  390], loss: [0.2754], avg loss: [0.3707], time: [105.1550ms]\n",
+      "Epoch: [  4/ 10], step: [  146/  390], loss: [0.3478], avg loss: [0.3705], time: [103.5702ms]\n",
+      "Epoch: [  4/ 10], step: [  147/  390], loss: [0.3348], avg loss: [0.3703], time: [107.3720ms]\n",
+      "Epoch: [  4/ 10], step: [  148/  390], loss: [0.4345], avg loss: [0.3707], time: [108.3019ms]\n",
+      "Epoch: [  4/ 10], step: [  149/  390], loss: [0.2415], avg loss: [0.3698], time: [105.2392ms]\n",
+      "Epoch: [  4/ 10], step: [  150/  390], loss: [0.4655], avg loss: [0.3705], time: [106.0359ms]\n",
+      "Epoch: [  4/ 10], step: [  151/  390], loss: [0.3261], avg loss: [0.3702], time: [105.4292ms]\n",
+      "Epoch: [  4/ 10], step: [  152/  390], loss: [0.5246], avg loss: [0.3712], time: [105.9971ms]\n",
+      "Epoch: [  4/ 10], step: [  153/  390], loss: [0.4512], avg loss: [0.3717], time: [107.9559ms]\n",
+      "Epoch: [  4/ 10], step: [  154/  390], loss: [0.2818], avg loss: [0.3711], time: [105.0174ms]\n",
+      "Epoch: [  4/ 10], step: [  155/  390], loss: [0.4020], avg loss: [0.3713], time: [107.2338ms]\n",
+      "Epoch: [  4/ 10], step: [  156/  390], loss: [0.3509], avg loss: [0.3712], time: [106.5938ms]\n",
+      "Epoch: [  4/ 10], step: [  157/  390], loss: [0.5440], avg loss: [0.3723], time: [111.7208ms]\n",
+      "Epoch: [  4/ 10], step: [  158/  390], loss: [0.3820], avg loss: [0.3724], time: [107.2078ms]\n",
+      "Epoch: [  4/ 10], step: [  159/  390], loss: [0.3345], avg loss: [0.3721], time: [107.3508ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  4/ 10], step: [  160/  390], loss: [0.4387], avg loss: [0.3725], time: [105.3257ms]\n",
+      "Epoch: [  4/ 10], step: [  161/  390], loss: [0.3441], avg loss: [0.3724], time: [105.7281ms]\n",
+      "Epoch: [  4/ 10], step: [  162/  390], loss: [0.3684], avg loss: [0.3723], time: [105.7646ms]\n",
+      "Epoch: [  4/ 10], step: [  163/  390], loss: [0.3465], avg loss: [0.3722], time: [106.7050ms]\n",
+      "Epoch: [  4/ 10], step: [  164/  390], loss: [0.5299], avg loss: [0.3731], time: [105.3362ms]\n",
+      "Epoch: [  4/ 10], step: [  165/  390], loss: [0.5045], avg loss: [0.3739], time: [106.9767ms]\n",
+      "Epoch: [  4/ 10], step: [  166/  390], loss: [0.3958], avg loss: [0.3741], time: [106.9121ms]\n",
+      "Epoch: [  4/ 10], step: [  167/  390], loss: [0.3517], avg loss: [0.3739], time: [107.1458ms]\n",
+      "Epoch: [  4/ 10], step: [  168/  390], loss: [0.4668], avg loss: [0.3745], time: [107.9512ms]\n",
+      "Epoch: [  4/ 10], step: [  169/  390], loss: [0.2722], avg loss: [0.3739], time: [102.9236ms]\n",
+      "Epoch: [  4/ 10], step: [  170/  390], loss: [0.4252], avg loss: [0.3742], time: [104.2573ms]\n",
+      "Epoch: [  4/ 10], step: [  171/  390], loss: [0.4219], avg loss: [0.3745], time: [109.7882ms]\n",
+      "Epoch: [  4/ 10], step: [  172/  390], loss: [0.4034], avg loss: [0.3746], time: [107.9049ms]\n",
+      "Epoch: [  4/ 10], step: [  173/  390], loss: [0.4636], avg loss: [0.3751], time: [105.2358ms]\n",
+      "Epoch: [  4/ 10], step: [  174/  390], loss: [0.3881], avg loss: [0.3752], time: [107.3976ms]\n",
+      "Epoch: [  4/ 10], step: [  175/  390], loss: [0.3162], avg loss: [0.3749], time: [103.8535ms]\n",
+      "Epoch: [  4/ 10], step: [  176/  390], loss: [0.3936], avg loss: [0.3750], time: [107.2187ms]\n",
+      "Epoch: [  4/ 10], step: [  177/  390], loss: [0.3591], avg loss: [0.3749], time: [106.3905ms]\n",
+      "Epoch: [  4/ 10], step: [  178/  390], loss: [0.3104], avg loss: [0.3745], time: [106.2949ms]\n",
+      "Epoch: [  4/ 10], step: [  179/  390], loss: [0.2385], avg loss: [0.3738], time: [105.2456ms]\n",
+      "Epoch: [  4/ 10], step: [  180/  390], loss: [0.2899], avg loss: [0.3733], time: [105.3367ms]\n",
+      "Epoch: [  4/ 10], step: [  181/  390], loss: [0.3091], avg loss: [0.3729], time: [106.3597ms]\n",
+      "Epoch: [  4/ 10], step: [  182/  390], loss: [0.4573], avg loss: [0.3734], time: [105.8962ms]\n",
+      "Epoch: [  4/ 10], step: [  183/  390], loss: [0.4415], avg loss: [0.3738], time: [105.4540ms]\n",
+      "Epoch: [  4/ 10], step: [  184/  390], loss: [0.2995], avg loss: [0.3734], time: [103.1325ms]\n",
+      "Epoch: [  4/ 10], step: [  185/  390], loss: [0.2719], avg loss: [0.3728], time: [107.6870ms]\n",
+      "Epoch: [  4/ 10], step: [  186/  390], loss: [0.3571], avg loss: [0.3727], time: [106.4153ms]\n",
+      "Epoch: [  4/ 10], step: [  187/  390], loss: [0.3442], avg loss: [0.3726], time: [108.4838ms]\n",
+      "Epoch: [  4/ 10], step: [  188/  390], loss: [0.3863], avg loss: [0.3727], time: [105.6416ms]\n",
+      "Epoch: [  4/ 10], step: [  189/  390], loss: [0.3299], avg loss: [0.3724], time: [107.1110ms]\n",
+      "Epoch: [  4/ 10], step: [  190/  390], loss: [0.2998], avg loss: [0.3721], time: [104.6271ms]\n",
+      "Epoch: [  4/ 10], step: [  191/  390], loss: [0.3399], avg loss: [0.3719], time: [109.9551ms]\n",
+      "Epoch: [  4/ 10], step: [  192/  390], loss: [0.2481], avg loss: [0.3712], time: [106.9565ms]\n",
+      "Epoch: [  4/ 10], step: [  193/  390], loss: [0.3842], avg loss: [0.3713], time: [105.9043ms]\n",
+      "Epoch: [  4/ 10], step: [  194/  390], loss: [0.3805], avg loss: [0.3714], time: [107.5363ms]\n",
+      "Epoch: [  4/ 10], step: [  195/  390], loss: [0.4114], avg loss: [0.3716], time: [106.0455ms]\n",
+      "Epoch: [  4/ 10], step: [  196/  390], loss: [0.2850], avg loss: [0.3711], time: [106.7135ms]\n",
+      "Epoch: [  4/ 10], step: [  197/  390], loss: [0.2693], avg loss: [0.3706], time: [107.2354ms]\n",
+      "Epoch: [  4/ 10], step: [  198/  390], loss: [0.2606], avg loss: [0.3701], time: [104.0814ms]\n",
+      "Epoch: [  4/ 10], step: [  199/  390], loss: [0.3752], avg loss: [0.3701], time: [105.1664ms]\n",
+      "Epoch: [  4/ 10], step: [  200/  390], loss: [0.4419], avg loss: [0.3704], time: [107.0666ms]\n",
+      "Epoch: [  4/ 10], step: [  201/  390], loss: [0.3777], avg loss: [0.3705], time: [106.7691ms]\n",
+      "Epoch: [  4/ 10], step: [  202/  390], loss: [0.4244], avg loss: [0.3707], time: [105.9108ms]\n",
+      "Epoch: [  4/ 10], step: [  203/  390], loss: [0.3185], avg loss: [0.3705], time: [108.5169ms]\n",
+      "Epoch: [  4/ 10], step: [  204/  390], loss: [0.3078], avg loss: [0.3702], time: [104.2497ms]\n",
+      "Epoch: [  4/ 10], step: [  205/  390], loss: [0.3949], avg loss: [0.3703], time: [104.3434ms]\n",
+      "Epoch: [  4/ 10], step: [  206/  390], loss: [0.3288], avg loss: [0.3701], time: [104.9526ms]\n",
+      "Epoch: [  4/ 10], step: [  207/  390], loss: [0.4153], avg loss: [0.3703], time: [104.3918ms]\n",
+      "Epoch: [  4/ 10], step: [  208/  390], loss: [0.2307], avg loss: [0.3696], time: [106.4024ms]\n",
+      "Epoch: [  4/ 10], step: [  209/  390], loss: [0.3982], avg loss: [0.3698], time: [105.8433ms]\n",
+      "Epoch: [  4/ 10], step: [  210/  390], loss: [0.3027], avg loss: [0.3695], time: [106.5593ms]\n",
+      "Epoch: [  4/ 10], step: [  211/  390], loss: [0.3901], avg loss: [0.3696], time: [104.8863ms]\n",
+      "Epoch: [  4/ 10], step: [  212/  390], loss: [0.4023], avg loss: [0.3697], time: [106.0429ms]\n",
+      "Epoch: [  4/ 10], step: [  213/  390], loss: [0.2610], avg loss: [0.3692], time: [105.1226ms]\n",
+      "Epoch: [  4/ 10], step: [  214/  390], loss: [0.3141], avg loss: [0.3689], time: [106.7166ms]\n",
+      "Epoch: [  4/ 10], step: [  215/  390], loss: [0.2775], avg loss: [0.3685], time: [107.2128ms]\n",
+      "Epoch: [  4/ 10], step: [  216/  390], loss: [0.4507], avg loss: [0.3689], time: [107.1441ms]\n",
+      "Epoch: [  4/ 10], step: [  217/  390], loss: [0.3489], avg loss: [0.3688], time: [109.1814ms]\n",
+      "Epoch: [  4/ 10], step: [  218/  390], loss: [0.4935], avg loss: [0.3694], time: [105.3557ms]\n",
+      "Epoch: [  4/ 10], step: [  219/  390], loss: [0.3538], avg loss: [0.3693], time: [108.0081ms]\n",
+      "Epoch: [  4/ 10], step: [  220/  390], loss: [0.3235], avg loss: [0.3691], time: [106.7159ms]\n",
+      "Epoch: [  4/ 10], step: [  221/  390], loss: [0.2939], avg loss: [0.3688], time: [105.4449ms]\n",
+      "Epoch: [  4/ 10], step: [  222/  390], loss: [0.3348], avg loss: [0.3686], time: [105.8254ms]\n",
+      "Epoch: [  4/ 10], step: [  223/  390], loss: [0.3916], avg loss: [0.3687], time: [106.7023ms]\n",
+      "Epoch: [  4/ 10], step: [  224/  390], loss: [0.4481], avg loss: [0.3691], time: [104.8224ms]\n",
+      "Epoch: [  4/ 10], step: [  225/  390], loss: [0.2748], avg loss: [0.3686], time: [104.7447ms]\n",
+      "Epoch: [  4/ 10], step: [  226/  390], loss: [0.3481], avg loss: [0.3686], time: [105.0262ms]\n",
+      "Epoch: [  4/ 10], step: [  227/  390], loss: [0.4186], avg loss: [0.3688], time: [109.0863ms]\n",
+      "Epoch: [  4/ 10], step: [  228/  390], loss: [0.4347], avg loss: [0.3691], time: [108.5703ms]\n",
+      "Epoch: [  4/ 10], step: [  229/  390], loss: [0.3251], avg loss: [0.3689], time: [104.6019ms]\n",
+      "Epoch: [  4/ 10], step: [  230/  390], loss: [0.3473], avg loss: [0.3688], time: [106.7584ms]\n",
+      "Epoch: [  4/ 10], step: [  231/  390], loss: [0.3915], avg loss: [0.3689], time: [106.0672ms]\n",
+      "Epoch: [  4/ 10], step: [  232/  390], loss: [0.2889], avg loss: [0.3685], time: [106.2105ms]\n",
+      "Epoch: [  4/ 10], step: [  233/  390], loss: [0.2659], avg loss: [0.3681], time: [105.5975ms]\n",
+      "Epoch: [  4/ 10], step: [  234/  390], loss: [0.3052], avg loss: [0.3678], time: [108.1386ms]\n",
+      "Epoch: [  4/ 10], step: [  235/  390], loss: [0.4258], avg loss: [0.3681], time: [103.0502ms]\n",
+      "Epoch: [  4/ 10], step: [  236/  390], loss: [0.3783], avg loss: [0.3681], time: [108.2599ms]\n",
+      "Epoch: [  4/ 10], step: [  237/  390], loss: [0.4851], avg loss: [0.3686], time: [107.0790ms]\n",
+      "Epoch: [  4/ 10], step: [  238/  390], loss: [0.3114], avg loss: [0.3684], time: [106.1127ms]\n",
+      "Epoch: [  4/ 10], step: [  239/  390], loss: [0.2487], avg loss: [0.3679], time: [107.1804ms]\n",
+      "Epoch: [  4/ 10], step: [  240/  390], loss: [0.4030], avg loss: [0.3680], time: [106.5235ms]\n",
+      "Epoch: [  4/ 10], step: [  241/  390], loss: [0.4842], avg loss: [0.3685], time: [105.6583ms]\n",
+      "Epoch: [  4/ 10], step: [  242/  390], loss: [0.4098], avg loss: [0.3687], time: [106.6108ms]\n",
+      "Epoch: [  4/ 10], step: [  243/  390], loss: [0.2414], avg loss: [0.3681], time: [108.7084ms]\n",
+      "Epoch: [  4/ 10], step: [  244/  390], loss: [0.5210], avg loss: [0.3688], time: [106.1161ms]\n",
+      "Epoch: [  4/ 10], step: [  245/  390], loss: [0.3267], avg loss: [0.3686], time: [108.6352ms]\n",
+      "Epoch: [  4/ 10], step: [  246/  390], loss: [0.4094], avg loss: [0.3688], time: [107.4882ms]\n",
+      "Epoch: [  4/ 10], step: [  247/  390], loss: [0.3241], avg loss: [0.3686], time: [107.6438ms]\n",
+      "Epoch: [  4/ 10], step: [  248/  390], loss: [0.4039], avg loss: [0.3687], time: [109.0815ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  4/ 10], step: [  249/  390], loss: [0.2710], avg loss: [0.3683], time: [106.2400ms]\n",
+      "Epoch: [  4/ 10], step: [  250/  390], loss: [0.3260], avg loss: [0.3682], time: [103.6398ms]\n",
+      "Epoch: [  4/ 10], step: [  251/  390], loss: [0.3744], avg loss: [0.3682], time: [108.1443ms]\n",
+      "Epoch: [  4/ 10], step: [  252/  390], loss: [0.2942], avg loss: [0.3679], time: [103.0304ms]\n",
+      "Epoch: [  4/ 10], step: [  253/  390], loss: [0.4133], avg loss: [0.3681], time: [103.5023ms]\n",
+      "Epoch: [  4/ 10], step: [  254/  390], loss: [0.2983], avg loss: [0.3678], time: [109.1344ms]\n",
+      "Epoch: [  4/ 10], step: [  255/  390], loss: [0.4217], avg loss: [0.3680], time: [103.4021ms]\n",
+      "Epoch: [  4/ 10], step: [  256/  390], loss: [0.3493], avg loss: [0.3679], time: [105.2632ms]\n",
+      "Epoch: [  4/ 10], step: [  257/  390], loss: [0.2805], avg loss: [0.3676], time: [110.6668ms]\n",
+      "Epoch: [  4/ 10], step: [  258/  390], loss: [0.3151], avg loss: [0.3674], time: [108.0148ms]\n",
+      "Epoch: [  4/ 10], step: [  259/  390], loss: [0.3350], avg loss: [0.3673], time: [104.0602ms]\n",
+      "Epoch: [  4/ 10], step: [  260/  390], loss: [0.5220], avg loss: [0.3679], time: [108.6056ms]\n",
+      "Epoch: [  4/ 10], step: [  261/  390], loss: [0.2808], avg loss: [0.3675], time: [107.3999ms]\n",
+      "Epoch: [  4/ 10], step: [  262/  390], loss: [0.2904], avg loss: [0.3672], time: [104.6116ms]\n",
+      "Epoch: [  4/ 10], step: [  263/  390], loss: [0.4144], avg loss: [0.3674], time: [108.1769ms]\n",
+      "Epoch: [  4/ 10], step: [  264/  390], loss: [0.3710], avg loss: [0.3674], time: [107.1186ms]\n",
+      "Epoch: [  4/ 10], step: [  265/  390], loss: [0.2993], avg loss: [0.3672], time: [106.6625ms]\n",
+      "Epoch: [  4/ 10], step: [  266/  390], loss: [0.3192], avg loss: [0.3670], time: [105.6526ms]\n",
+      "Epoch: [  4/ 10], step: [  267/  390], loss: [0.2591], avg loss: [0.3666], time: [109.6139ms]\n",
+      "Epoch: [  4/ 10], step: [  268/  390], loss: [0.4449], avg loss: [0.3669], time: [109.1521ms]\n",
+      "Epoch: [  4/ 10], step: [  269/  390], loss: [0.3405], avg loss: [0.3668], time: [103.2643ms]\n",
+      "Epoch: [  4/ 10], step: [  270/  390], loss: [0.3951], avg loss: [0.3669], time: [105.6435ms]\n",
+      "Epoch: [  4/ 10], step: [  271/  390], loss: [0.3147], avg loss: [0.3667], time: [107.1754ms]\n",
+      "Epoch: [  4/ 10], step: [  272/  390], loss: [0.3204], avg loss: [0.3665], time: [104.6278ms]\n",
+      "Epoch: [  4/ 10], step: [  273/  390], loss: [0.5377], avg loss: [0.3671], time: [105.9399ms]\n",
+      "Epoch: [  4/ 10], step: [  274/  390], loss: [0.3847], avg loss: [0.3672], time: [109.0810ms]\n",
+      "Epoch: [  4/ 10], step: [  275/  390], loss: [0.4134], avg loss: [0.3674], time: [105.8214ms]\n",
+      "Epoch: [  4/ 10], step: [  276/  390], loss: [0.3202], avg loss: [0.3672], time: [106.6880ms]\n",
+      "Epoch: [  4/ 10], step: [  277/  390], loss: [0.3618], avg loss: [0.3672], time: [101.3110ms]\n",
+      "Epoch: [  4/ 10], step: [  278/  390], loss: [0.4502], avg loss: [0.3675], time: [104.4438ms]\n",
+      "Epoch: [  4/ 10], step: [  279/  390], loss: [0.3401], avg loss: [0.3674], time: [104.1548ms]\n",
+      "Epoch: [  4/ 10], step: [  280/  390], loss: [0.4656], avg loss: [0.3677], time: [106.7429ms]\n",
+      "Epoch: [  4/ 10], step: [  281/  390], loss: [0.4343], avg loss: [0.3680], time: [107.2030ms]\n",
+      "Epoch: [  4/ 10], step: [  282/  390], loss: [0.3462], avg loss: [0.3679], time: [104.3711ms]\n",
+      "Epoch: [  4/ 10], step: [  283/  390], loss: [0.3591], avg loss: [0.3679], time: [103.3731ms]\n",
+      "Epoch: [  4/ 10], step: [  284/  390], loss: [0.2983], avg loss: [0.3676], time: [103.8666ms]\n",
+      "Epoch: [  4/ 10], step: [  285/  390], loss: [0.4017], avg loss: [0.3677], time: [104.7235ms]\n",
+      "Epoch: [  4/ 10], step: [  286/  390], loss: [0.2940], avg loss: [0.3675], time: [105.6240ms]\n",
+      "Epoch: [  4/ 10], step: [  287/  390], loss: [0.4052], avg loss: [0.3676], time: [107.6298ms]\n",
+      "Epoch: [  4/ 10], step: [  288/  390], loss: [0.2970], avg loss: [0.3674], time: [106.0195ms]\n",
+      "Epoch: [  4/ 10], step: [  289/  390], loss: [0.4640], avg loss: [0.3677], time: [108.1021ms]\n",
+      "Epoch: [  4/ 10], step: [  290/  390], loss: [0.2613], avg loss: [0.3673], time: [105.8490ms]\n",
+      "Epoch: [  4/ 10], step: [  291/  390], loss: [0.2677], avg loss: [0.3670], time: [106.0126ms]\n",
+      "Epoch: [  4/ 10], step: [  292/  390], loss: [0.3928], avg loss: [0.3671], time: [104.8071ms]\n",
+      "Epoch: [  4/ 10], step: [  293/  390], loss: [0.3033], avg loss: [0.3669], time: [107.0890ms]\n",
+      "Epoch: [  4/ 10], step: [  294/  390], loss: [0.3590], avg loss: [0.3668], time: [105.8483ms]\n",
+      "Epoch: [  4/ 10], step: [  295/  390], loss: [0.6220], avg loss: [0.3677], time: [108.0527ms]\n",
+      "Epoch: [  4/ 10], step: [  296/  390], loss: [0.4165], avg loss: [0.3679], time: [109.5693ms]\n",
+      "Epoch: [  4/ 10], step: [  297/  390], loss: [0.3620], avg loss: [0.3679], time: [108.3882ms]\n",
+      "Epoch: [  4/ 10], step: [  298/  390], loss: [0.3527], avg loss: [0.3678], time: [104.9139ms]\n",
+      "Epoch: [  4/ 10], step: [  299/  390], loss: [0.3109], avg loss: [0.3676], time: [107.6729ms]\n",
+      "Epoch: [  4/ 10], step: [  300/  390], loss: [0.4211], avg loss: [0.3678], time: [105.7062ms]\n",
+      "Epoch: [  4/ 10], step: [  301/  390], loss: [0.3927], avg loss: [0.3679], time: [107.1277ms]\n",
+      "Epoch: [  4/ 10], step: [  302/  390], loss: [0.3385], avg loss: [0.3678], time: [105.7725ms]\n",
+      "Epoch: [  4/ 10], step: [  303/  390], loss: [0.3242], avg loss: [0.3676], time: [104.2752ms]\n",
+      "Epoch: [  4/ 10], step: [  304/  390], loss: [0.3999], avg loss: [0.3677], time: [106.4508ms]\n",
+      "Epoch: [  4/ 10], step: [  305/  390], loss: [0.2473], avg loss: [0.3673], time: [107.9853ms]\n",
+      "Epoch: [  4/ 10], step: [  306/  390], loss: [0.4007], avg loss: [0.3675], time: [106.3206ms]\n",
+      "Epoch: [  4/ 10], step: [  307/  390], loss: [0.3748], avg loss: [0.3675], time: [105.5491ms]\n",
+      "Epoch: [  4/ 10], step: [  308/  390], loss: [0.3003], avg loss: [0.3673], time: [104.3818ms]\n",
+      "Epoch: [  4/ 10], step: [  309/  390], loss: [0.4165], avg loss: [0.3674], time: [106.9283ms]\n",
+      "Epoch: [  4/ 10], step: [  310/  390], loss: [0.2449], avg loss: [0.3670], time: [106.4966ms]\n",
+      "Epoch: [  4/ 10], step: [  311/  390], loss: [0.3170], avg loss: [0.3669], time: [106.8151ms]\n",
+      "Epoch: [  4/ 10], step: [  312/  390], loss: [0.3388], avg loss: [0.3668], time: [106.7402ms]\n",
+      "Epoch: [  4/ 10], step: [  313/  390], loss: [0.5385], avg loss: [0.3673], time: [103.4353ms]\n",
+      "Epoch: [  4/ 10], step: [  314/  390], loss: [0.3794], avg loss: [0.3674], time: [104.6982ms]\n",
+      "Epoch: [  4/ 10], step: [  315/  390], loss: [0.2365], avg loss: [0.3669], time: [107.2829ms]\n",
+      "Epoch: [  4/ 10], step: [  316/  390], loss: [0.4281], avg loss: [0.3671], time: [104.3134ms]\n",
+      "Epoch: [  4/ 10], step: [  317/  390], loss: [0.3258], avg loss: [0.3670], time: [107.7969ms]\n",
+      "Epoch: [  4/ 10], step: [  318/  390], loss: [0.4437], avg loss: [0.3672], time: [107.4538ms]\n",
+      "Epoch: [  4/ 10], step: [  319/  390], loss: [0.3517], avg loss: [0.3672], time: [106.1678ms]\n",
+      "Epoch: [  4/ 10], step: [  320/  390], loss: [0.3266], avg loss: [0.3671], time: [107.7993ms]\n",
+      "Epoch: [  4/ 10], step: [  321/  390], loss: [0.3717], avg loss: [0.3671], time: [105.9566ms]\n",
+      "Epoch: [  4/ 10], step: [  322/  390], loss: [0.4069], avg loss: [0.3672], time: [108.2475ms]\n",
+      "Epoch: [  4/ 10], step: [  323/  390], loss: [0.3395], avg loss: [0.3671], time: [108.8541ms]\n",
+      "Epoch: [  4/ 10], step: [  324/  390], loss: [0.4231], avg loss: [0.3673], time: [105.6015ms]\n",
+      "Epoch: [  4/ 10], step: [  325/  390], loss: [0.4355], avg loss: [0.3675], time: [108.5124ms]\n",
+      "Epoch: [  4/ 10], step: [  326/  390], loss: [0.2874], avg loss: [0.3673], time: [105.7117ms]\n",
+      "Epoch: [  4/ 10], step: [  327/  390], loss: [0.3945], avg loss: [0.3673], time: [109.5996ms]\n",
+      "Epoch: [  4/ 10], step: [  328/  390], loss: [0.3845], avg loss: [0.3674], time: [106.7383ms]\n",
+      "Epoch: [  4/ 10], step: [  329/  390], loss: [0.4375], avg loss: [0.3676], time: [107.0645ms]\n",
+      "Epoch: [  4/ 10], step: [  330/  390], loss: [0.3023], avg loss: [0.3674], time: [108.0580ms]\n",
+      "Epoch: [  4/ 10], step: [  331/  390], loss: [0.4047], avg loss: [0.3675], time: [106.3502ms]\n",
+      "Epoch: [  4/ 10], step: [  332/  390], loss: [0.3946], avg loss: [0.3676], time: [107.4476ms]\n",
+      "Epoch: [  4/ 10], step: [  333/  390], loss: [0.3176], avg loss: [0.3675], time: [105.3164ms]\n",
+      "Epoch: [  4/ 10], step: [  334/  390], loss: [0.4214], avg loss: [0.3676], time: [104.3229ms]\n",
+      "Epoch: [  4/ 10], step: [  335/  390], loss: [0.4775], avg loss: [0.3679], time: [108.3291ms]\n",
+      "Epoch: [  4/ 10], step: [  336/  390], loss: [0.3526], avg loss: [0.3679], time: [108.1722ms]\n",
+      "Epoch: [  4/ 10], step: [  337/  390], loss: [0.4519], avg loss: [0.3681], time: [110.6203ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  4/ 10], step: [  338/  390], loss: [0.4460], avg loss: [0.3684], time: [105.4652ms]\n",
+      "Epoch: [  4/ 10], step: [  339/  390], loss: [0.3561], avg loss: [0.3683], time: [105.1250ms]\n",
+      "Epoch: [  4/ 10], step: [  340/  390], loss: [0.5193], avg loss: [0.3688], time: [104.8603ms]\n",
+      "Epoch: [  4/ 10], step: [  341/  390], loss: [0.4446], avg loss: [0.3690], time: [105.6721ms]\n",
+      "Epoch: [  4/ 10], step: [  342/  390], loss: [0.3434], avg loss: [0.3689], time: [105.7959ms]\n",
+      "Epoch: [  4/ 10], step: [  343/  390], loss: [0.3595], avg loss: [0.3689], time: [105.6604ms]\n",
+      "Epoch: [  4/ 10], step: [  344/  390], loss: [0.4241], avg loss: [0.3691], time: [107.3353ms]\n",
+      "Epoch: [  4/ 10], step: [  345/  390], loss: [0.2956], avg loss: [0.3689], time: [110.8987ms]\n",
+      "Epoch: [  4/ 10], step: [  346/  390], loss: [0.3377], avg loss: [0.3688], time: [107.3465ms]\n",
+      "Epoch: [  4/ 10], step: [  347/  390], loss: [0.3574], avg loss: [0.3687], time: [109.6387ms]\n",
+      "Epoch: [  4/ 10], step: [  348/  390], loss: [0.4708], avg loss: [0.3690], time: [106.5981ms]\n",
+      "Epoch: [  4/ 10], step: [  349/  390], loss: [0.4382], avg loss: [0.3692], time: [108.3109ms]\n",
+      "Epoch: [  4/ 10], step: [  350/  390], loss: [0.3674], avg loss: [0.3692], time: [106.4668ms]\n",
+      "Epoch: [  4/ 10], step: [  351/  390], loss: [0.5617], avg loss: [0.3698], time: [107.5759ms]\n",
+      "Epoch: [  4/ 10], step: [  352/  390], loss: [0.3479], avg loss: [0.3697], time: [104.9974ms]\n",
+      "Epoch: [  4/ 10], step: [  353/  390], loss: [0.4457], avg loss: [0.3699], time: [104.9132ms]\n",
+      "Epoch: [  4/ 10], step: [  354/  390], loss: [0.4470], avg loss: [0.3701], time: [104.7776ms]\n",
+      "Epoch: [  4/ 10], step: [  355/  390], loss: [0.3042], avg loss: [0.3700], time: [105.3958ms]\n",
+      "Epoch: [  4/ 10], step: [  356/  390], loss: [0.4274], avg loss: [0.3701], time: [103.8568ms]\n",
+      "Epoch: [  4/ 10], step: [  357/  390], loss: [0.3954], avg loss: [0.3702], time: [107.7995ms]\n",
+      "Epoch: [  4/ 10], step: [  358/  390], loss: [0.3816], avg loss: [0.3702], time: [106.2849ms]\n",
+      "Epoch: [  4/ 10], step: [  359/  390], loss: [0.3290], avg loss: [0.3701], time: [109.7863ms]\n",
+      "Epoch: [  4/ 10], step: [  360/  390], loss: [0.3382], avg loss: [0.3700], time: [105.7122ms]\n",
+      "Epoch: [  4/ 10], step: [  361/  390], loss: [0.4071], avg loss: [0.3701], time: [106.6349ms]\n",
+      "Epoch: [  4/ 10], step: [  362/  390], loss: [0.3767], avg loss: [0.3701], time: [104.5048ms]\n",
+      "Epoch: [  4/ 10], step: [  363/  390], loss: [0.4927], avg loss: [0.3705], time: [106.5168ms]\n",
+      "Epoch: [  4/ 10], step: [  364/  390], loss: [0.3349], avg loss: [0.3704], time: [105.2248ms]\n",
+      "Epoch: [  4/ 10], step: [  365/  390], loss: [0.3436], avg loss: [0.3703], time: [106.6382ms]\n",
+      "Epoch: [  4/ 10], step: [  366/  390], loss: [0.2961], avg loss: [0.3701], time: [104.9166ms]\n",
+      "Epoch: [  4/ 10], step: [  367/  390], loss: [0.2820], avg loss: [0.3699], time: [106.4293ms]\n",
+      "Epoch: [  4/ 10], step: [  368/  390], loss: [0.3242], avg loss: [0.3697], time: [104.6309ms]\n",
+      "Epoch: [  4/ 10], step: [  369/  390], loss: [0.3750], avg loss: [0.3697], time: [108.9773ms]\n",
+      "Epoch: [  4/ 10], step: [  370/  390], loss: [0.4032], avg loss: [0.3698], time: [106.9911ms]\n",
+      "Epoch: [  4/ 10], step: [  371/  390], loss: [0.2909], avg loss: [0.3696], time: [109.1559ms]\n",
+      "Epoch: [  4/ 10], step: [  372/  390], loss: [0.3955], avg loss: [0.3697], time: [108.0177ms]\n",
+      "Epoch: [  4/ 10], step: [  373/  390], loss: [0.2918], avg loss: [0.3695], time: [110.2009ms]\n",
+      "Epoch: [  4/ 10], step: [  374/  390], loss: [0.3997], avg loss: [0.3696], time: [106.7991ms]\n",
+      "Epoch: [  4/ 10], step: [  375/  390], loss: [0.3154], avg loss: [0.3694], time: [108.1967ms]\n",
+      "Epoch: [  4/ 10], step: [  376/  390], loss: [0.3779], avg loss: [0.3694], time: [107.9221ms]\n",
+      "Epoch: [  4/ 10], step: [  377/  390], loss: [0.3876], avg loss: [0.3695], time: [105.7892ms]\n",
+      "Epoch: [  4/ 10], step: [  378/  390], loss: [0.5116], avg loss: [0.3699], time: [103.4391ms]\n",
+      "Epoch: [  4/ 10], step: [  379/  390], loss: [0.2980], avg loss: [0.3697], time: [106.8671ms]\n",
+      "Epoch: [  4/ 10], step: [  380/  390], loss: [0.2813], avg loss: [0.3694], time: [105.5164ms]\n",
+      "Epoch: [  4/ 10], step: [  381/  390], loss: [0.2438], avg loss: [0.3691], time: [108.1378ms]\n",
+      "Epoch: [  4/ 10], step: [  382/  390], loss: [0.3873], avg loss: [0.3692], time: [106.9922ms]\n",
+      "Epoch: [  4/ 10], step: [  383/  390], loss: [0.3675], avg loss: [0.3692], time: [108.4902ms]\n",
+      "Epoch: [  4/ 10], step: [  384/  390], loss: [0.4243], avg loss: [0.3693], time: [104.8622ms]\n",
+      "Epoch: [  4/ 10], step: [  385/  390], loss: [0.3276], avg loss: [0.3692], time: [107.1630ms]\n",
+      "Epoch: [  4/ 10], step: [  386/  390], loss: [0.2505], avg loss: [0.3689], time: [105.1219ms]\n",
+      "Epoch: [  4/ 10], step: [  387/  390], loss: [0.2351], avg loss: [0.3685], time: [108.8781ms]\n",
+      "Epoch: [  4/ 10], step: [  388/  390], loss: [0.2487], avg loss: [0.3682], time: [105.7706ms]\n",
+      "Epoch: [  4/ 10], step: [  389/  390], loss: [0.3252], avg loss: [0.3681], time: [103.2357ms]\n",
+      "Epoch: [  4/ 10], step: [  390/  390], loss: [0.3969], avg loss: [0.3682], time: [920.7304ms]\n",
+      "Epoch time: 42451.183, per step time: 108.849\n",
+      "Epoch time: 42451.503, per step time: 108.850, avg loss: 0.368\n",
+      "************************************************************\n",
+      "Epoch: [  5/ 10], step: [    1/  390], loss: [0.2794], avg loss: [0.2794], time: [98.0172ms]\n",
+      "Epoch: [  5/ 10], step: [    2/  390], loss: [0.2933], avg loss: [0.2863], time: [99.0131ms]\n",
+      "Epoch: [  5/ 10], step: [    3/  390], loss: [0.3252], avg loss: [0.2993], time: [99.0434ms]\n",
+      "Epoch: [  5/ 10], step: [    4/  390], loss: [0.4135], avg loss: [0.3279], time: [102.1309ms]\n",
+      "Epoch: [  5/ 10], step: [    5/  390], loss: [0.3011], avg loss: [0.3225], time: [98.5022ms]\n",
+      "Epoch: [  5/ 10], step: [    6/  390], loss: [0.2266], avg loss: [0.3065], time: [98.1512ms]\n",
+      "Epoch: [  5/ 10], step: [    7/  390], loss: [0.3133], avg loss: [0.3075], time: [100.8203ms]\n",
+      "Epoch: [  5/ 10], step: [    8/  390], loss: [0.3449], avg loss: [0.3122], time: [100.8372ms]\n",
+      "Epoch: [  5/ 10], step: [    9/  390], loss: [0.3031], avg loss: [0.3112], time: [99.9768ms]\n",
+      "Epoch: [  5/ 10], step: [   10/  390], loss: [0.3289], avg loss: [0.3129], time: [101.7046ms]\n",
+      "Epoch: [  5/ 10], step: [   11/  390], loss: [0.3923], avg loss: [0.3201], time: [98.9933ms]\n",
+      "Epoch: [  5/ 10], step: [   12/  390], loss: [0.3127], avg loss: [0.3195], time: [99.1669ms]\n",
+      "Epoch: [  5/ 10], step: [   13/  390], loss: [0.3678], avg loss: [0.3232], time: [102.7796ms]\n",
+      "Epoch: [  5/ 10], step: [   14/  390], loss: [0.3622], avg loss: [0.3260], time: [98.2735ms]\n",
+      "Epoch: [  5/ 10], step: [   15/  390], loss: [0.2448], avg loss: [0.3206], time: [101.1198ms]\n",
+      "Epoch: [  5/ 10], step: [   16/  390], loss: [0.2788], avg loss: [0.3180], time: [101.7115ms]\n",
+      "Epoch: [  5/ 10], step: [   17/  390], loss: [0.3236], avg loss: [0.3183], time: [101.3196ms]\n",
+      "Epoch: [  5/ 10], step: [   18/  390], loss: [0.4522], avg loss: [0.3258], time: [101.2344ms]\n",
+      "Epoch: [  5/ 10], step: [   19/  390], loss: [0.2819], avg loss: [0.3234], time: [97.2536ms]\n",
+      "Epoch: [  5/ 10], step: [   20/  390], loss: [0.2288], avg loss: [0.3187], time: [99.6263ms]\n",
+      "Epoch: [  5/ 10], step: [   21/  390], loss: [0.2689], avg loss: [0.3163], time: [100.7798ms]\n",
+      "Epoch: [  5/ 10], step: [   22/  390], loss: [0.4091], avg loss: [0.3206], time: [102.3359ms]\n",
+      "Epoch: [  5/ 10], step: [   23/  390], loss: [0.2462], avg loss: [0.3173], time: [97.8270ms]\n",
+      "Epoch: [  5/ 10], step: [   24/  390], loss: [0.3900], avg loss: [0.3203], time: [102.9096ms]\n",
+      "Epoch: [  5/ 10], step: [   25/  390], loss: [0.3287], avg loss: [0.3207], time: [102.8814ms]\n",
+      "Epoch: [  5/ 10], step: [   26/  390], loss: [0.3620], avg loss: [0.3223], time: [101.8305ms]\n",
+      "Epoch: [  5/ 10], step: [   27/  390], loss: [0.3002], avg loss: [0.3215], time: [97.9817ms]\n",
+      "Epoch: [  5/ 10], step: [   28/  390], loss: [0.2733], avg loss: [0.3197], time: [102.3424ms]\n",
+      "Epoch: [  5/ 10], step: [   29/  390], loss: [0.3498], avg loss: [0.3208], time: [99.0362ms]\n",
+      "Epoch: [  5/ 10], step: [   30/  390], loss: [0.3848], avg loss: [0.3229], time: [100.5993ms]\n",
+      "Epoch: [  5/ 10], step: [   31/  390], loss: [0.3515], avg loss: [0.3238], time: [99.0884ms]\n",
+      "Epoch: [  5/ 10], step: [   32/  390], loss: [0.3267], avg loss: [0.3239], time: [102.6182ms]\n",
+      "Epoch: [  5/ 10], step: [   33/  390], loss: [0.2962], avg loss: [0.3231], time: [103.2467ms]\n",
+      "Epoch: [  5/ 10], step: [   34/  390], loss: [0.3273], avg loss: [0.3232], time: [100.3430ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  5/ 10], step: [   35/  390], loss: [0.3577], avg loss: [0.3242], time: [100.6477ms]\n",
+      "Epoch: [  5/ 10], step: [   36/  390], loss: [0.4371], avg loss: [0.3273], time: [100.9886ms]\n",
+      "Epoch: [  5/ 10], step: [   37/  390], loss: [0.4086], avg loss: [0.3295], time: [100.7073ms]\n",
+      "Epoch: [  5/ 10], step: [   38/  390], loss: [0.1705], avg loss: [0.3253], time: [101.3937ms]\n",
+      "Epoch: [  5/ 10], step: [   39/  390], loss: [0.3365], avg loss: [0.3256], time: [97.3103ms]\n",
+      "Epoch: [  5/ 10], step: [   40/  390], loss: [0.3910], avg loss: [0.3273], time: [100.9321ms]\n",
+      "Epoch: [  5/ 10], step: [   41/  390], loss: [0.3509], avg loss: [0.3278], time: [97.9929ms]\n",
+      "Epoch: [  5/ 10], step: [   42/  390], loss: [0.4014], avg loss: [0.3296], time: [98.8083ms]\n",
+      "Epoch: [  5/ 10], step: [   43/  390], loss: [0.2674], avg loss: [0.3281], time: [103.3001ms]\n",
+      "Epoch: [  5/ 10], step: [   44/  390], loss: [0.3730], avg loss: [0.3292], time: [99.5758ms]\n",
+      "Epoch: [  5/ 10], step: [   45/  390], loss: [0.2710], avg loss: [0.3279], time: [102.5946ms]\n",
+      "Epoch: [  5/ 10], step: [   46/  390], loss: [0.2464], avg loss: [0.3261], time: [98.9909ms]\n",
+      "Epoch: [  5/ 10], step: [   47/  390], loss: [0.3998], avg loss: [0.3277], time: [98.2902ms]\n",
+      "Epoch: [  5/ 10], step: [   48/  390], loss: [0.2825], avg loss: [0.3267], time: [101.1927ms]\n",
+      "Epoch: [  5/ 10], step: [   49/  390], loss: [0.2899], avg loss: [0.3260], time: [103.0283ms]\n",
+      "Epoch: [  5/ 10], step: [   50/  390], loss: [0.2653], avg loss: [0.3248], time: [102.0355ms]\n",
+      "Epoch: [  5/ 10], step: [   51/  390], loss: [0.3137], avg loss: [0.3245], time: [99.5808ms]\n",
+      "Epoch: [  5/ 10], step: [   52/  390], loss: [0.2977], avg loss: [0.3240], time: [102.6967ms]\n",
+      "Epoch: [  5/ 10], step: [   53/  390], loss: [0.1626], avg loss: [0.3210], time: [101.0261ms]\n",
+      "Epoch: [  5/ 10], step: [   54/  390], loss: [0.3451], avg loss: [0.3214], time: [100.6527ms]\n",
+      "Epoch: [  5/ 10], step: [   55/  390], loss: [0.4533], avg loss: [0.3238], time: [101.9166ms]\n",
+      "Epoch: [  5/ 10], step: [   56/  390], loss: [0.3027], avg loss: [0.3234], time: [99.3133ms]\n",
+      "Epoch: [  5/ 10], step: [   57/  390], loss: [0.3573], avg loss: [0.3240], time: [102.1547ms]\n",
+      "Epoch: [  5/ 10], step: [   58/  390], loss: [0.2549], avg loss: [0.3229], time: [101.6333ms]\n",
+      "Epoch: [  5/ 10], step: [   59/  390], loss: [0.3431], avg loss: [0.3232], time: [104.0702ms]\n",
+      "Epoch: [  5/ 10], step: [   60/  390], loss: [0.3799], avg loss: [0.3241], time: [101.0134ms]\n",
+      "Epoch: [  5/ 10], step: [   61/  390], loss: [0.2788], avg loss: [0.3234], time: [101.4233ms]\n",
+      "Epoch: [  5/ 10], step: [   62/  390], loss: [0.2534], avg loss: [0.3223], time: [99.6974ms]\n",
+      "Epoch: [  5/ 10], step: [   63/  390], loss: [0.4903], avg loss: [0.3249], time: [102.4032ms]\n",
+      "Epoch: [  5/ 10], step: [   64/  390], loss: [0.3201], avg loss: [0.3249], time: [99.7505ms]\n",
+      "Epoch: [  5/ 10], step: [   65/  390], loss: [0.3645], avg loss: [0.3255], time: [98.7120ms]\n",
+      "Epoch: [  5/ 10], step: [   66/  390], loss: [0.2357], avg loss: [0.3241], time: [101.2895ms]\n",
+      "Epoch: [  5/ 10], step: [   67/  390], loss: [0.3705], avg loss: [0.3248], time: [97.6963ms]\n",
+      "Epoch: [  5/ 10], step: [   68/  390], loss: [0.1633], avg loss: [0.3224], time: [102.0422ms]\n",
+      "Epoch: [  5/ 10], step: [   69/  390], loss: [0.2591], avg loss: [0.3215], time: [104.1415ms]\n",
+      "Epoch: [  5/ 10], step: [   70/  390], loss: [0.3557], avg loss: [0.3220], time: [102.0610ms]\n",
+      "Epoch: [  5/ 10], step: [   71/  390], loss: [0.2731], avg loss: [0.3213], time: [103.0972ms]\n",
+      "Epoch: [  5/ 10], step: [   72/  390], loss: [0.4700], avg loss: [0.3234], time: [99.8461ms]\n",
+      "Epoch: [  5/ 10], step: [   73/  390], loss: [0.3538], avg loss: [0.3238], time: [98.6857ms]\n",
+      "Epoch: [  5/ 10], step: [   74/  390], loss: [0.2912], avg loss: [0.3233], time: [100.5046ms]\n",
+      "Epoch: [  5/ 10], step: [   75/  390], loss: [0.3697], avg loss: [0.3240], time: [99.0720ms]\n",
+      "Epoch: [  5/ 10], step: [   76/  390], loss: [0.4126], avg loss: [0.3251], time: [100.3754ms]\n",
+      "Epoch: [  5/ 10], step: [   77/  390], loss: [0.4306], avg loss: [0.3265], time: [102.3202ms]\n",
+      "Epoch: [  5/ 10], step: [   78/  390], loss: [0.3097], avg loss: [0.3263], time: [101.7010ms]\n",
+      "Epoch: [  5/ 10], step: [   79/  390], loss: [0.2506], avg loss: [0.3253], time: [99.2160ms]\n",
+      "Epoch: [  5/ 10], step: [   80/  390], loss: [0.3555], avg loss: [0.3257], time: [100.8434ms]\n",
+      "Epoch: [  5/ 10], step: [   81/  390], loss: [0.4372], avg loss: [0.3271], time: [100.6939ms]\n",
+      "Epoch: [  5/ 10], step: [   82/  390], loss: [0.3791], avg loss: [0.3277], time: [98.4166ms]\n",
+      "Epoch: [  5/ 10], step: [   83/  390], loss: [0.3631], avg loss: [0.3281], time: [104.9848ms]\n",
+      "Epoch: [  5/ 10], step: [   84/  390], loss: [0.2663], avg loss: [0.3274], time: [102.7405ms]\n",
+      "Epoch: [  5/ 10], step: [   85/  390], loss: [0.4309], avg loss: [0.3286], time: [99.1342ms]\n",
+      "Epoch: [  5/ 10], step: [   86/  390], loss: [0.3595], avg loss: [0.3290], time: [101.1102ms]\n",
+      "Epoch: [  5/ 10], step: [   87/  390], loss: [0.3064], avg loss: [0.3287], time: [99.0789ms]\n",
+      "Epoch: [  5/ 10], step: [   88/  390], loss: [0.3514], avg loss: [0.3290], time: [102.4284ms]\n",
+      "Epoch: [  5/ 10], step: [   89/  390], loss: [0.3699], avg loss: [0.3294], time: [104.1958ms]\n",
+      "Epoch: [  5/ 10], step: [   90/  390], loss: [0.4920], avg loss: [0.3313], time: [102.2644ms]\n",
+      "Epoch: [  5/ 10], step: [   91/  390], loss: [0.2617], avg loss: [0.3305], time: [104.6135ms]\n",
+      "Epoch: [  5/ 10], step: [   92/  390], loss: [0.3189], avg loss: [0.3304], time: [99.1240ms]\n",
+      "Epoch: [  5/ 10], step: [   93/  390], loss: [0.2781], avg loss: [0.3298], time: [101.6312ms]\n",
+      "Epoch: [  5/ 10], step: [   94/  390], loss: [0.2895], avg loss: [0.3294], time: [101.8772ms]\n",
+      "Epoch: [  5/ 10], step: [   95/  390], loss: [0.2069], avg loss: [0.3281], time: [97.9164ms]\n",
+      "Epoch: [  5/ 10], step: [   96/  390], loss: [0.4565], avg loss: [0.3294], time: [99.3772ms]\n",
+      "Epoch: [  5/ 10], step: [   97/  390], loss: [0.2529], avg loss: [0.3286], time: [99.6089ms]\n",
+      "Epoch: [  5/ 10], step: [   98/  390], loss: [0.2671], avg loss: [0.3280], time: [103.4021ms]\n",
+      "Epoch: [  5/ 10], step: [   99/  390], loss: [0.2349], avg loss: [0.3271], time: [103.9906ms]\n",
+      "Epoch: [  5/ 10], step: [  100/  390], loss: [0.5263], avg loss: [0.3291], time: [102.5908ms]\n",
+      "Epoch: [  5/ 10], step: [  101/  390], loss: [0.4659], avg loss: [0.3304], time: [103.2996ms]\n",
+      "Epoch: [  5/ 10], step: [  102/  390], loss: [0.2615], avg loss: [0.3297], time: [101.5222ms]\n",
+      "Epoch: [  5/ 10], step: [  103/  390], loss: [0.4434], avg loss: [0.3308], time: [98.7415ms]\n",
+      "Epoch: [  5/ 10], step: [  104/  390], loss: [0.3079], avg loss: [0.3306], time: [99.9544ms]\n",
+      "Epoch: [  5/ 10], step: [  105/  390], loss: [0.4543], avg loss: [0.3318], time: [98.9020ms]\n",
+      "Epoch: [  5/ 10], step: [  106/  390], loss: [0.4415], avg loss: [0.3328], time: [103.9250ms]\n",
+      "Epoch: [  5/ 10], step: [  107/  390], loss: [0.2911], avg loss: [0.3324], time: [98.3868ms]\n",
+      "Epoch: [  5/ 10], step: [  108/  390], loss: [0.2849], avg loss: [0.3320], time: [98.7260ms]\n",
+      "Epoch: [  5/ 10], step: [  109/  390], loss: [0.2857], avg loss: [0.3316], time: [103.0591ms]\n",
+      "Epoch: [  5/ 10], step: [  110/  390], loss: [0.4117], avg loss: [0.3323], time: [100.4229ms]\n",
+      "Epoch: [  5/ 10], step: [  111/  390], loss: [0.3222], avg loss: [0.3322], time: [102.7026ms]\n",
+      "Epoch: [  5/ 10], step: [  112/  390], loss: [0.3745], avg loss: [0.3326], time: [103.8859ms]\n",
+      "Epoch: [  5/ 10], step: [  113/  390], loss: [0.3251], avg loss: [0.3325], time: [104.0967ms]\n",
+      "Epoch: [  5/ 10], step: [  114/  390], loss: [0.3649], avg loss: [0.3328], time: [97.3947ms]\n",
+      "Epoch: [  5/ 10], step: [  115/  390], loss: [0.4835], avg loss: [0.3341], time: [98.9718ms]\n",
+      "Epoch: [  5/ 10], step: [  116/  390], loss: [0.3027], avg loss: [0.3338], time: [97.8193ms]\n",
+      "Epoch: [  5/ 10], step: [  117/  390], loss: [0.2808], avg loss: [0.3334], time: [101.1531ms]\n",
+      "Epoch: [  5/ 10], step: [  118/  390], loss: [0.4715], avg loss: [0.3346], time: [100.4035ms]\n",
+      "Epoch: [  5/ 10], step: [  119/  390], loss: [0.2866], avg loss: [0.3342], time: [100.5137ms]\n",
+      "Epoch: [  5/ 10], step: [  120/  390], loss: [0.2574], avg loss: [0.3335], time: [102.9720ms]\n",
+      "Epoch: [  5/ 10], step: [  121/  390], loss: [0.4101], avg loss: [0.3342], time: [103.4522ms]\n",
+      "Epoch: [  5/ 10], step: [  122/  390], loss: [0.4093], avg loss: [0.3348], time: [99.1914ms]\n",
+      "Epoch: [  5/ 10], step: [  123/  390], loss: [0.3165], avg loss: [0.3346], time: [102.6032ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  5/ 10], step: [  124/  390], loss: [0.3165], avg loss: [0.3345], time: [98.5579ms]\n",
+      "Epoch: [  5/ 10], step: [  125/  390], loss: [0.2910], avg loss: [0.3341], time: [104.8245ms]\n",
+      "Epoch: [  5/ 10], step: [  126/  390], loss: [0.4151], avg loss: [0.3348], time: [100.1546ms]\n",
+      "Epoch: [  5/ 10], step: [  127/  390], loss: [0.3650], avg loss: [0.3350], time: [98.5594ms]\n",
+      "Epoch: [  5/ 10], step: [  128/  390], loss: [0.4466], avg loss: [0.3359], time: [98.1710ms]\n",
+      "Epoch: [  5/ 10], step: [  129/  390], loss: [0.3491], avg loss: [0.3360], time: [102.2282ms]\n",
+      "Epoch: [  5/ 10], step: [  130/  390], loss: [0.3943], avg loss: [0.3364], time: [102.3917ms]\n",
+      "Epoch: [  5/ 10], step: [  131/  390], loss: [0.3831], avg loss: [0.3368], time: [102.0710ms]\n",
+      "Epoch: [  5/ 10], step: [  132/  390], loss: [0.3353], avg loss: [0.3368], time: [99.2439ms]\n",
+      "Epoch: [  5/ 10], step: [  133/  390], loss: [0.3608], avg loss: [0.3370], time: [99.8654ms]\n",
+      "Epoch: [  5/ 10], step: [  134/  390], loss: [0.3089], avg loss: [0.3367], time: [102.2341ms]\n",
+      "Epoch: [  5/ 10], step: [  135/  390], loss: [0.3661], avg loss: [0.3370], time: [101.4662ms]\n",
+      "Epoch: [  5/ 10], step: [  136/  390], loss: [0.2462], avg loss: [0.3363], time: [100.6999ms]\n",
+      "Epoch: [  5/ 10], step: [  137/  390], loss: [0.2555], avg loss: [0.3357], time: [100.4691ms]\n",
+      "Epoch: [  5/ 10], step: [  138/  390], loss: [0.3958], avg loss: [0.3361], time: [103.4663ms]\n",
+      "Epoch: [  5/ 10], step: [  139/  390], loss: [0.3909], avg loss: [0.3365], time: [103.5306ms]\n",
+      "Epoch: [  5/ 10], step: [  140/  390], loss: [0.4445], avg loss: [0.3373], time: [99.4577ms]\n",
+      "Epoch: [  5/ 10], step: [  141/  390], loss: [0.3978], avg loss: [0.3377], time: [102.4783ms]\n",
+      "Epoch: [  5/ 10], step: [  142/  390], loss: [0.4142], avg loss: [0.3383], time: [103.3499ms]\n",
+      "Epoch: [  5/ 10], step: [  143/  390], loss: [0.5226], avg loss: [0.3396], time: [101.3515ms]\n",
+      "Epoch: [  5/ 10], step: [  144/  390], loss: [0.4125], avg loss: [0.3401], time: [100.8694ms]\n",
+      "Epoch: [  5/ 10], step: [  145/  390], loss: [0.2795], avg loss: [0.3397], time: [98.3696ms]\n",
+      "Epoch: [  5/ 10], step: [  146/  390], loss: [0.3510], avg loss: [0.3397], time: [101.3107ms]\n",
+      "Epoch: [  5/ 10], step: [  147/  390], loss: [0.3275], avg loss: [0.3396], time: [100.5538ms]\n",
+      "Epoch: [  5/ 10], step: [  148/  390], loss: [0.5054], avg loss: [0.3408], time: [100.9009ms]\n",
+      "Epoch: [  5/ 10], step: [  149/  390], loss: [0.3694], avg loss: [0.3410], time: [99.3698ms]\n",
+      "Epoch: [  5/ 10], step: [  150/  390], loss: [0.5045], avg loss: [0.3420], time: [103.3928ms]\n",
+      "Epoch: [  5/ 10], step: [  151/  390], loss: [0.3543], avg loss: [0.3421], time: [99.5314ms]\n",
+      "Epoch: [  5/ 10], step: [  152/  390], loss: [0.3545], avg loss: [0.3422], time: [101.6512ms]\n",
+      "Epoch: [  5/ 10], step: [  153/  390], loss: [0.3695], avg loss: [0.3424], time: [103.3027ms]\n",
+      "Epoch: [  5/ 10], step: [  154/  390], loss: [0.3324], avg loss: [0.3423], time: [99.4759ms]\n",
+      "Epoch: [  5/ 10], step: [  155/  390], loss: [0.4030], avg loss: [0.3427], time: [102.0808ms]\n",
+      "Epoch: [  5/ 10], step: [  156/  390], loss: [0.3399], avg loss: [0.3427], time: [101.4318ms]\n",
+      "Epoch: [  5/ 10], step: [  157/  390], loss: [0.2697], avg loss: [0.3422], time: [98.8903ms]\n",
+      "Epoch: [  5/ 10], step: [  158/  390], loss: [0.3390], avg loss: [0.3422], time: [100.8635ms]\n",
+      "Epoch: [  5/ 10], step: [  159/  390], loss: [0.3495], avg loss: [0.3423], time: [98.5677ms]\n",
+      "Epoch: [  5/ 10], step: [  160/  390], loss: [0.3949], avg loss: [0.3426], time: [102.8528ms]\n",
+      "Epoch: [  5/ 10], step: [  161/  390], loss: [0.3042], avg loss: [0.3424], time: [103.6329ms]\n",
+      "Epoch: [  5/ 10], step: [  162/  390], loss: [0.2852], avg loss: [0.3420], time: [98.7141ms]\n",
+      "Epoch: [  5/ 10], step: [  163/  390], loss: [0.4251], avg loss: [0.3425], time: [101.6843ms]\n",
+      "Epoch: [  5/ 10], step: [  164/  390], loss: [0.2808], avg loss: [0.3421], time: [99.1621ms]\n",
+      "Epoch: [  5/ 10], step: [  165/  390], loss: [0.4844], avg loss: [0.3430], time: [101.5959ms]\n",
+      "Epoch: [  5/ 10], step: [  166/  390], loss: [0.3811], avg loss: [0.3432], time: [100.4267ms]\n",
+      "Epoch: [  5/ 10], step: [  167/  390], loss: [0.4935], avg loss: [0.3441], time: [104.2638ms]\n",
+      "Epoch: [  5/ 10], step: [  168/  390], loss: [0.3312], avg loss: [0.3440], time: [99.5424ms]\n",
+      "Epoch: [  5/ 10], step: [  169/  390], loss: [0.3287], avg loss: [0.3440], time: [100.5833ms]\n",
+      "Epoch: [  5/ 10], step: [  170/  390], loss: [0.2893], avg loss: [0.3436], time: [99.6578ms]\n",
+      "Epoch: [  5/ 10], step: [  171/  390], loss: [0.3934], avg loss: [0.3439], time: [100.3275ms]\n",
+      "Epoch: [  5/ 10], step: [  172/  390], loss: [0.3728], avg loss: [0.3441], time: [101.0427ms]\n",
+      "Epoch: [  5/ 10], step: [  173/  390], loss: [0.4014], avg loss: [0.3444], time: [100.5661ms]\n",
+      "Epoch: [  5/ 10], step: [  174/  390], loss: [0.3923], avg loss: [0.3447], time: [100.2057ms]\n",
+      "Epoch: [  5/ 10], step: [  175/  390], loss: [0.3733], avg loss: [0.3449], time: [103.4234ms]\n",
+      "Epoch: [  5/ 10], step: [  176/  390], loss: [0.2801], avg loss: [0.3445], time: [101.1100ms]\n",
+      "Epoch: [  5/ 10], step: [  177/  390], loss: [0.4638], avg loss: [0.3452], time: [99.8030ms]\n",
+      "Epoch: [  5/ 10], step: [  178/  390], loss: [0.4426], avg loss: [0.3457], time: [102.2031ms]\n",
+      "Epoch: [  5/ 10], step: [  179/  390], loss: [0.3452], avg loss: [0.3457], time: [101.6047ms]\n",
+      "Epoch: [  5/ 10], step: [  180/  390], loss: [0.4646], avg loss: [0.3464], time: [100.5373ms]\n",
+      "Epoch: [  5/ 10], step: [  181/  390], loss: [0.3066], avg loss: [0.3462], time: [103.2481ms]\n",
+      "Epoch: [  5/ 10], step: [  182/  390], loss: [0.3812], avg loss: [0.3463], time: [101.4385ms]\n",
+      "Epoch: [  5/ 10], step: [  183/  390], loss: [0.3036], avg loss: [0.3461], time: [101.9561ms]\n",
+      "Epoch: [  5/ 10], step: [  184/  390], loss: [0.3178], avg loss: [0.3460], time: [102.9384ms]\n",
+      "Epoch: [  5/ 10], step: [  185/  390], loss: [0.3505], avg loss: [0.3460], time: [101.8963ms]\n",
+      "Epoch: [  5/ 10], step: [  186/  390], loss: [0.4441], avg loss: [0.3465], time: [102.1221ms]\n",
+      "Epoch: [  5/ 10], step: [  187/  390], loss: [0.2443], avg loss: [0.3460], time: [98.6722ms]\n",
+      "Epoch: [  5/ 10], step: [  188/  390], loss: [0.3056], avg loss: [0.3457], time: [103.1771ms]\n",
+      "Epoch: [  5/ 10], step: [  189/  390], loss: [0.2921], avg loss: [0.3455], time: [99.9930ms]\n",
+      "Epoch: [  5/ 10], step: [  190/  390], loss: [0.2108], avg loss: [0.3448], time: [101.2313ms]\n",
+      "Epoch: [  5/ 10], step: [  191/  390], loss: [0.3682], avg loss: [0.3449], time: [99.5693ms]\n",
+      "Epoch: [  5/ 10], step: [  192/  390], loss: [0.3154], avg loss: [0.3447], time: [100.4727ms]\n",
+      "Epoch: [  5/ 10], step: [  193/  390], loss: [0.3327], avg loss: [0.3447], time: [99.0317ms]\n",
+      "Epoch: [  5/ 10], step: [  194/  390], loss: [0.3686], avg loss: [0.3448], time: [98.9630ms]\n",
+      "Epoch: [  5/ 10], step: [  195/  390], loss: [0.3824], avg loss: [0.3450], time: [98.0654ms]\n",
+      "Epoch: [  5/ 10], step: [  196/  390], loss: [0.2827], avg loss: [0.3447], time: [98.7494ms]\n",
+      "Epoch: [  5/ 10], step: [  197/  390], loss: [0.3519], avg loss: [0.3447], time: [101.0053ms]\n",
+      "Epoch: [  5/ 10], step: [  198/  390], loss: [0.2818], avg loss: [0.3444], time: [101.0916ms]\n",
+      "Epoch: [  5/ 10], step: [  199/  390], loss: [0.2671], avg loss: [0.3440], time: [101.9878ms]\n",
+      "Epoch: [  5/ 10], step: [  200/  390], loss: [0.2776], avg loss: [0.3437], time: [101.5325ms]\n",
+      "Epoch: [  5/ 10], step: [  201/  390], loss: [0.4823], avg loss: [0.3444], time: [105.1292ms]\n",
+      "Epoch: [  5/ 10], step: [  202/  390], loss: [0.2648], avg loss: [0.3440], time: [99.5777ms]\n",
+      "Epoch: [  5/ 10], step: [  203/  390], loss: [0.2620], avg loss: [0.3436], time: [102.6535ms]\n",
+      "Epoch: [  5/ 10], step: [  204/  390], loss: [0.3181], avg loss: [0.3434], time: [100.5137ms]\n",
+      "Epoch: [  5/ 10], step: [  205/  390], loss: [0.2479], avg loss: [0.3430], time: [101.0752ms]\n",
+      "Epoch: [  5/ 10], step: [  206/  390], loss: [0.4319], avg loss: [0.3434], time: [99.4537ms]\n",
+      "Epoch: [  5/ 10], step: [  207/  390], loss: [0.3991], avg loss: [0.3437], time: [102.4334ms]\n",
+      "Epoch: [  5/ 10], step: [  208/  390], loss: [0.3004], avg loss: [0.3435], time: [102.9854ms]\n",
+      "Epoch: [  5/ 10], step: [  209/  390], loss: [0.3004], avg loss: [0.3432], time: [100.8348ms]\n",
+      "Epoch: [  5/ 10], step: [  210/  390], loss: [0.3069], avg loss: [0.3431], time: [101.9218ms]\n",
+      "Epoch: [  5/ 10], step: [  211/  390], loss: [0.2957], avg loss: [0.3429], time: [98.5749ms]\n",
+      "Epoch: [  5/ 10], step: [  212/  390], loss: [0.2999], avg loss: [0.3426], time: [104.1019ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  5/ 10], step: [  213/  390], loss: [0.4016], avg loss: [0.3429], time: [103.5557ms]\n",
+      "Epoch: [  5/ 10], step: [  214/  390], loss: [0.2758], avg loss: [0.3426], time: [99.9570ms]\n",
+      "Epoch: [  5/ 10], step: [  215/  390], loss: [0.4611], avg loss: [0.3432], time: [102.7234ms]\n",
+      "Epoch: [  5/ 10], step: [  216/  390], loss: [0.3102], avg loss: [0.3430], time: [101.8171ms]\n",
+      "Epoch: [  5/ 10], step: [  217/  390], loss: [0.3919], avg loss: [0.3432], time: [104.2428ms]\n",
+      "Epoch: [  5/ 10], step: [  218/  390], loss: [0.3644], avg loss: [0.3433], time: [102.9439ms]\n",
+      "Epoch: [  5/ 10], step: [  219/  390], loss: [0.3343], avg loss: [0.3433], time: [101.6750ms]\n",
+      "Epoch: [  5/ 10], step: [  220/  390], loss: [0.3409], avg loss: [0.3433], time: [100.8224ms]\n",
+      "Epoch: [  5/ 10], step: [  221/  390], loss: [0.3408], avg loss: [0.3433], time: [100.2448ms]\n",
+      "Epoch: [  5/ 10], step: [  222/  390], loss: [0.3310], avg loss: [0.3432], time: [101.1682ms]\n",
+      "Epoch: [  5/ 10], step: [  223/  390], loss: [0.3425], avg loss: [0.3432], time: [103.1351ms]\n",
+      "Epoch: [  5/ 10], step: [  224/  390], loss: [0.2430], avg loss: [0.3428], time: [100.4946ms]\n",
+      "Epoch: [  5/ 10], step: [  225/  390], loss: [0.2700], avg loss: [0.3424], time: [99.5657ms]\n",
+      "Epoch: [  5/ 10], step: [  226/  390], loss: [0.4033], avg loss: [0.3427], time: [99.9317ms]\n",
+      "Epoch: [  5/ 10], step: [  227/  390], loss: [0.3329], avg loss: [0.3427], time: [103.3795ms]\n",
+      "Epoch: [  5/ 10], step: [  228/  390], loss: [0.4596], avg loss: [0.3432], time: [102.3130ms]\n",
+      "Epoch: [  5/ 10], step: [  229/  390], loss: [0.3272], avg loss: [0.3431], time: [99.1611ms]\n",
+      "Epoch: [  5/ 10], step: [  230/  390], loss: [0.2274], avg loss: [0.3426], time: [103.5240ms]\n",
+      "Epoch: [  5/ 10], step: [  231/  390], loss: [0.4503], avg loss: [0.3431], time: [97.9187ms]\n",
+      "Epoch: [  5/ 10], step: [  232/  390], loss: [0.2505], avg loss: [0.3427], time: [99.9558ms]\n",
+      "Epoch: [  5/ 10], step: [  233/  390], loss: [0.3719], avg loss: [0.3428], time: [100.0600ms]\n",
+      "Epoch: [  5/ 10], step: [  234/  390], loss: [0.2949], avg loss: [0.3426], time: [101.8853ms]\n",
+      "Epoch: [  5/ 10], step: [  235/  390], loss: [0.3854], avg loss: [0.3428], time: [100.0516ms]\n",
+      "Epoch: [  5/ 10], step: [  236/  390], loss: [0.5405], avg loss: [0.3436], time: [101.5785ms]\n",
+      "Epoch: [  5/ 10], step: [  237/  390], loss: [0.3014], avg loss: [0.3434], time: [100.5096ms]\n",
+      "Epoch: [  5/ 10], step: [  238/  390], loss: [0.3945], avg loss: [0.3437], time: [100.4689ms]\n",
+      "Epoch: [  5/ 10], step: [  239/  390], loss: [0.3244], avg loss: [0.3436], time: [98.5432ms]\n",
+      "Epoch: [  5/ 10], step: [  240/  390], loss: [0.4346], avg loss: [0.3440], time: [101.4802ms]\n",
+      "Epoch: [  5/ 10], step: [  241/  390], loss: [0.3247], avg loss: [0.3439], time: [102.1733ms]\n",
+      "Epoch: [  5/ 10], step: [  242/  390], loss: [0.4067], avg loss: [0.3441], time: [102.2019ms]\n",
+      "Epoch: [  5/ 10], step: [  243/  390], loss: [0.4058], avg loss: [0.3444], time: [98.9230ms]\n",
+      "Epoch: [  5/ 10], step: [  244/  390], loss: [0.3316], avg loss: [0.3443], time: [101.2557ms]\n",
+      "Epoch: [  5/ 10], step: [  245/  390], loss: [0.3552], avg loss: [0.3444], time: [104.2919ms]\n",
+      "Epoch: [  5/ 10], step: [  246/  390], loss: [0.2829], avg loss: [0.3441], time: [98.7475ms]\n",
+      "Epoch: [  5/ 10], step: [  247/  390], loss: [0.3828], avg loss: [0.3443], time: [99.9448ms]\n",
+      "Epoch: [  5/ 10], step: [  248/  390], loss: [0.3679], avg loss: [0.3444], time: [103.4460ms]\n",
+      "Epoch: [  5/ 10], step: [  249/  390], loss: [0.3295], avg loss: [0.3443], time: [103.2743ms]\n",
+      "Epoch: [  5/ 10], step: [  250/  390], loss: [0.2944], avg loss: [0.3441], time: [102.1883ms]\n",
+      "Epoch: [  5/ 10], step: [  251/  390], loss: [0.2622], avg loss: [0.3438], time: [104.0289ms]\n",
+      "Epoch: [  5/ 10], step: [  252/  390], loss: [0.4662], avg loss: [0.3443], time: [100.0810ms]\n",
+      "Epoch: [  5/ 10], step: [  253/  390], loss: [0.4145], avg loss: [0.3446], time: [99.1349ms]\n",
+      "Epoch: [  5/ 10], step: [  254/  390], loss: [0.2235], avg loss: [0.3441], time: [102.3924ms]\n",
+      "Epoch: [  5/ 10], step: [  255/  390], loss: [0.3826], avg loss: [0.3442], time: [101.0902ms]\n",
+      "Epoch: [  5/ 10], step: [  256/  390], loss: [0.4591], avg loss: [0.3447], time: [97.8513ms]\n",
+      "Epoch: [  5/ 10], step: [  257/  390], loss: [0.2777], avg loss: [0.3444], time: [102.4303ms]\n",
+      "Epoch: [  5/ 10], step: [  258/  390], loss: [0.3017], avg loss: [0.3443], time: [98.4631ms]\n",
+      "Epoch: [  5/ 10], step: [  259/  390], loss: [0.1980], avg loss: [0.3437], time: [99.7696ms]\n",
+      "Epoch: [  5/ 10], step: [  260/  390], loss: [0.3733], avg loss: [0.3438], time: [99.4329ms]\n",
+      "Epoch: [  5/ 10], step: [  261/  390], loss: [0.3896], avg loss: [0.3440], time: [103.1668ms]\n",
+      "Epoch: [  5/ 10], step: [  262/  390], loss: [0.3417], avg loss: [0.3440], time: [99.0276ms]\n",
+      "Epoch: [  5/ 10], step: [  263/  390], loss: [0.4144], avg loss: [0.3442], time: [103.0836ms]\n",
+      "Epoch: [  5/ 10], step: [  264/  390], loss: [0.3417], avg loss: [0.3442], time: [99.4034ms]\n",
+      "Epoch: [  5/ 10], step: [  265/  390], loss: [0.3956], avg loss: [0.3444], time: [102.0124ms]\n",
+      "Epoch: [  5/ 10], step: [  266/  390], loss: [0.4007], avg loss: [0.3446], time: [103.3103ms]\n",
+      "Epoch: [  5/ 10], step: [  267/  390], loss: [0.3253], avg loss: [0.3446], time: [99.8549ms]\n",
+      "Epoch: [  5/ 10], step: [  268/  390], loss: [0.3239], avg loss: [0.3445], time: [99.4136ms]\n",
+      "Epoch: [  5/ 10], step: [  269/  390], loss: [0.2131], avg loss: [0.3440], time: [103.1957ms]\n",
+      "Epoch: [  5/ 10], step: [  270/  390], loss: [0.3470], avg loss: [0.3440], time: [99.1795ms]\n",
+      "Epoch: [  5/ 10], step: [  271/  390], loss: [0.2773], avg loss: [0.3438], time: [99.3474ms]\n",
+      "Epoch: [  5/ 10], step: [  272/  390], loss: [0.4068], avg loss: [0.3440], time: [104.6441ms]\n",
+      "Epoch: [  5/ 10], step: [  273/  390], loss: [0.2524], avg loss: [0.3437], time: [102.6011ms]\n",
+      "Epoch: [  5/ 10], step: [  274/  390], loss: [0.2715], avg loss: [0.3434], time: [100.2104ms]\n",
+      "Epoch: [  5/ 10], step: [  275/  390], loss: [0.2724], avg loss: [0.3431], time: [102.6239ms]\n",
+      "Epoch: [  5/ 10], step: [  276/  390], loss: [0.4075], avg loss: [0.3434], time: [99.4756ms]\n",
+      "Epoch: [  5/ 10], step: [  277/  390], loss: [0.1439], avg loss: [0.3426], time: [98.9230ms]\n",
+      "Epoch: [  5/ 10], step: [  278/  390], loss: [0.2628], avg loss: [0.3424], time: [101.5642ms]\n",
+      "Epoch: [  5/ 10], step: [  279/  390], loss: [0.2270], avg loss: [0.3419], time: [100.0366ms]\n",
+      "Epoch: [  5/ 10], step: [  280/  390], loss: [0.3230], avg loss: [0.3419], time: [102.0269ms]\n",
+      "Epoch: [  5/ 10], step: [  281/  390], loss: [0.3329], avg loss: [0.3418], time: [104.5034ms]\n",
+      "Epoch: [  5/ 10], step: [  282/  390], loss: [0.3126], avg loss: [0.3417], time: [102.5646ms]\n",
+      "Epoch: [  5/ 10], step: [  283/  390], loss: [0.3559], avg loss: [0.3418], time: [104.7299ms]\n",
+      "Epoch: [  5/ 10], step: [  284/  390], loss: [0.4573], avg loss: [0.3422], time: [100.5504ms]\n",
+      "Epoch: [  5/ 10], step: [  285/  390], loss: [0.3536], avg loss: [0.3422], time: [101.4774ms]\n",
+      "Epoch: [  5/ 10], step: [  286/  390], loss: [0.2524], avg loss: [0.3419], time: [101.6161ms]\n",
+      "Epoch: [  5/ 10], step: [  287/  390], loss: [0.4055], avg loss: [0.3421], time: [102.3798ms]\n",
+      "Epoch: [  5/ 10], step: [  288/  390], loss: [0.2159], avg loss: [0.3417], time: [104.3766ms]\n",
+      "Epoch: [  5/ 10], step: [  289/  390], loss: [0.3166], avg loss: [0.3416], time: [104.6736ms]\n",
+      "Epoch: [  5/ 10], step: [  290/  390], loss: [0.3783], avg loss: [0.3417], time: [100.4434ms]\n",
+      "Epoch: [  5/ 10], step: [  291/  390], loss: [0.4178], avg loss: [0.3420], time: [99.9520ms]\n",
+      "Epoch: [  5/ 10], step: [  292/  390], loss: [0.4205], avg loss: [0.3423], time: [101.4216ms]\n",
+      "Epoch: [  5/ 10], step: [  293/  390], loss: [0.3483], avg loss: [0.3423], time: [102.9370ms]\n",
+      "Epoch: [  5/ 10], step: [  294/  390], loss: [0.5168], avg loss: [0.3429], time: [100.2884ms]\n",
+      "Epoch: [  5/ 10], step: [  295/  390], loss: [0.4163], avg loss: [0.3431], time: [99.8797ms]\n",
+      "Epoch: [  5/ 10], step: [  296/  390], loss: [0.3834], avg loss: [0.3433], time: [101.5699ms]\n",
+      "Epoch: [  5/ 10], step: [  297/  390], loss: [0.3833], avg loss: [0.3434], time: [100.7218ms]\n",
+      "Epoch: [  5/ 10], step: [  298/  390], loss: [0.4084], avg loss: [0.3436], time: [101.0218ms]\n",
+      "Epoch: [  5/ 10], step: [  299/  390], loss: [0.4530], avg loss: [0.3440], time: [104.2936ms]\n",
+      "Epoch: [  5/ 10], step: [  300/  390], loss: [0.2934], avg loss: [0.3438], time: [103.4582ms]\n",
+      "Epoch: [  5/ 10], step: [  301/  390], loss: [0.4108], avg loss: [0.3441], time: [98.2459ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  5/ 10], step: [  302/  390], loss: [0.3439], avg loss: [0.3440], time: [102.3622ms]\n",
+      "Epoch: [  5/ 10], step: [  303/  390], loss: [0.4070], avg loss: [0.3443], time: [104.6326ms]\n",
+      "Epoch: [  5/ 10], step: [  304/  390], loss: [0.4360], avg loss: [0.3446], time: [100.9424ms]\n",
+      "Epoch: [  5/ 10], step: [  305/  390], loss: [0.4695], avg loss: [0.3450], time: [98.6810ms]\n",
+      "Epoch: [  5/ 10], step: [  306/  390], loss: [0.2571], avg loss: [0.3447], time: [101.9230ms]\n",
+      "Epoch: [  5/ 10], step: [  307/  390], loss: [0.2597], avg loss: [0.3444], time: [98.5708ms]\n",
+      "Epoch: [  5/ 10], step: [  308/  390], loss: [0.3709], avg loss: [0.3445], time: [98.8483ms]\n",
+      "Epoch: [  5/ 10], step: [  309/  390], loss: [0.2729], avg loss: [0.3443], time: [100.6372ms]\n",
+      "Epoch: [  5/ 10], step: [  310/  390], loss: [0.3060], avg loss: [0.3441], time: [100.9982ms]\n",
+      "Epoch: [  5/ 10], step: [  311/  390], loss: [0.2724], avg loss: [0.3439], time: [102.2642ms]\n",
+      "Epoch: [  5/ 10], step: [  312/  390], loss: [0.4042], avg loss: [0.3441], time: [101.1384ms]\n",
+      "Epoch: [  5/ 10], step: [  313/  390], loss: [0.3170], avg loss: [0.3440], time: [101.8574ms]\n",
+      "Epoch: [  5/ 10], step: [  314/  390], loss: [0.2852], avg loss: [0.3438], time: [100.0388ms]\n",
+      "Epoch: [  5/ 10], step: [  315/  390], loss: [0.3810], avg loss: [0.3439], time: [103.9889ms]\n",
+      "Epoch: [  5/ 10], step: [  316/  390], loss: [0.4999], avg loss: [0.3444], time: [104.8150ms]\n",
+      "Epoch: [  5/ 10], step: [  317/  390], loss: [0.3802], avg loss: [0.3445], time: [104.3453ms]\n",
+      "Epoch: [  5/ 10], step: [  318/  390], loss: [0.4756], avg loss: [0.3450], time: [100.0638ms]\n",
+      "Epoch: [  5/ 10], step: [  319/  390], loss: [0.2718], avg loss: [0.3447], time: [102.3681ms]\n",
+      "Epoch: [  5/ 10], step: [  320/  390], loss: [0.4197], avg loss: [0.3450], time: [101.9974ms]\n",
+      "Epoch: [  5/ 10], step: [  321/  390], loss: [0.2601], avg loss: [0.3447], time: [104.8346ms]\n",
+      "Epoch: [  5/ 10], step: [  322/  390], loss: [0.2091], avg loss: [0.3443], time: [98.9163ms]\n",
+      "Epoch: [  5/ 10], step: [  323/  390], loss: [0.4082], avg loss: [0.3445], time: [103.3297ms]\n",
+      "Epoch: [  5/ 10], step: [  324/  390], loss: [0.2823], avg loss: [0.3443], time: [98.2492ms]\n",
+      "Epoch: [  5/ 10], step: [  325/  390], loss: [0.3926], avg loss: [0.3444], time: [100.7259ms]\n",
+      "Epoch: [  5/ 10], step: [  326/  390], loss: [0.2773], avg loss: [0.3442], time: [102.9773ms]\n",
+      "Epoch: [  5/ 10], step: [  327/  390], loss: [0.4278], avg loss: [0.3445], time: [102.3576ms]\n",
+      "Epoch: [  5/ 10], step: [  328/  390], loss: [0.2811], avg loss: [0.3443], time: [100.8708ms]\n",
+      "Epoch: [  5/ 10], step: [  329/  390], loss: [0.2949], avg loss: [0.3441], time: [102.0057ms]\n",
+      "Epoch: [  5/ 10], step: [  330/  390], loss: [0.3619], avg loss: [0.3442], time: [98.7999ms]\n",
+      "Epoch: [  5/ 10], step: [  331/  390], loss: [0.3774], avg loss: [0.3443], time: [103.1680ms]\n",
+      "Epoch: [  5/ 10], step: [  332/  390], loss: [0.3439], avg loss: [0.3443], time: [100.1785ms]\n",
+      "Epoch: [  5/ 10], step: [  333/  390], loss: [0.3816], avg loss: [0.3444], time: [99.3967ms]\n",
+      "Epoch: [  5/ 10], step: [  334/  390], loss: [0.3978], avg loss: [0.3446], time: [97.2393ms]\n",
+      "Epoch: [  5/ 10], step: [  335/  390], loss: [0.3064], avg loss: [0.3445], time: [102.1647ms]\n",
+      "Epoch: [  5/ 10], step: [  336/  390], loss: [0.4427], avg loss: [0.3447], time: [102.5717ms]\n",
+      "Epoch: [  5/ 10], step: [  337/  390], loss: [0.3247], avg loss: [0.3447], time: [101.8920ms]\n",
+      "Epoch: [  5/ 10], step: [  338/  390], loss: [0.3244], avg loss: [0.3446], time: [100.7366ms]\n",
+      "Epoch: [  5/ 10], step: [  339/  390], loss: [0.4572], avg loss: [0.3450], time: [99.7155ms]\n",
+      "Epoch: [  5/ 10], step: [  340/  390], loss: [0.3603], avg loss: [0.3450], time: [102.0741ms]\n",
+      "Epoch: [  5/ 10], step: [  341/  390], loss: [0.2594], avg loss: [0.3448], time: [103.5466ms]\n",
+      "Epoch: [  5/ 10], step: [  342/  390], loss: [0.4625], avg loss: [0.3451], time: [102.2420ms]\n",
+      "Epoch: [  5/ 10], step: [  343/  390], loss: [0.4464], avg loss: [0.3454], time: [103.6968ms]\n",
+      "Epoch: [  5/ 10], step: [  344/  390], loss: [0.3788], avg loss: [0.3455], time: [102.4415ms]\n",
+      "Epoch: [  5/ 10], step: [  345/  390], loss: [0.3054], avg loss: [0.3454], time: [104.5787ms]\n",
+      "Epoch: [  5/ 10], step: [  346/  390], loss: [0.4174], avg loss: [0.3456], time: [102.5310ms]\n",
+      "Epoch: [  5/ 10], step: [  347/  390], loss: [0.2062], avg loss: [0.3452], time: [105.1259ms]\n",
+      "Epoch: [  5/ 10], step: [  348/  390], loss: [0.3455], avg loss: [0.3452], time: [99.9708ms]\n",
+      "Epoch: [  5/ 10], step: [  349/  390], loss: [0.4392], avg loss: [0.3454], time: [102.7250ms]\n",
+      "Epoch: [  5/ 10], step: [  350/  390], loss: [0.3018], avg loss: [0.3453], time: [103.0922ms]\n",
+      "Epoch: [  5/ 10], step: [  351/  390], loss: [0.2346], avg loss: [0.3450], time: [103.0869ms]\n",
+      "Epoch: [  5/ 10], step: [  352/  390], loss: [0.2619], avg loss: [0.3448], time: [102.4044ms]\n",
+      "Epoch: [  5/ 10], step: [  353/  390], loss: [0.2922], avg loss: [0.3446], time: [99.6437ms]\n",
+      "Epoch: [  5/ 10], step: [  354/  390], loss: [0.2231], avg loss: [0.3443], time: [98.7210ms]\n",
+      "Epoch: [  5/ 10], step: [  355/  390], loss: [0.4164], avg loss: [0.3445], time: [100.8945ms]\n",
+      "Epoch: [  5/ 10], step: [  356/  390], loss: [0.2650], avg loss: [0.3443], time: [104.3913ms]\n",
+      "Epoch: [  5/ 10], step: [  357/  390], loss: [0.2103], avg loss: [0.3439], time: [104.9824ms]\n",
+      "Epoch: [  5/ 10], step: [  358/  390], loss: [0.4690], avg loss: [0.3442], time: [98.4094ms]\n",
+      "Epoch: [  5/ 10], step: [  359/  390], loss: [0.2352], avg loss: [0.3439], time: [99.5321ms]\n",
+      "Epoch: [  5/ 10], step: [  360/  390], loss: [0.1806], avg loss: [0.3435], time: [101.3615ms]\n",
+      "Epoch: [  5/ 10], step: [  361/  390], loss: [0.3843], avg loss: [0.3436], time: [98.7771ms]\n",
+      "Epoch: [  5/ 10], step: [  362/  390], loss: [0.2840], avg loss: [0.3434], time: [102.2055ms]\n",
+      "Epoch: [  5/ 10], step: [  363/  390], loss: [0.2744], avg loss: [0.3432], time: [99.4380ms]\n",
+      "Epoch: [  5/ 10], step: [  364/  390], loss: [0.3938], avg loss: [0.3434], time: [99.0360ms]\n",
+      "Epoch: [  5/ 10], step: [  365/  390], loss: [0.2933], avg loss: [0.3432], time: [100.8515ms]\n",
+      "Epoch: [  5/ 10], step: [  366/  390], loss: [0.4054], avg loss: [0.3434], time: [100.8928ms]\n",
+      "Epoch: [  5/ 10], step: [  367/  390], loss: [0.3868], avg loss: [0.3435], time: [103.4963ms]\n",
+      "Epoch: [  5/ 10], step: [  368/  390], loss: [0.5758], avg loss: [0.3442], time: [99.6766ms]\n",
+      "Epoch: [  5/ 10], step: [  369/  390], loss: [0.4107], avg loss: [0.3443], time: [99.5615ms]\n",
+      "Epoch: [  5/ 10], step: [  370/  390], loss: [0.1999], avg loss: [0.3439], time: [100.8880ms]\n",
+      "Epoch: [  5/ 10], step: [  371/  390], loss: [0.3547], avg loss: [0.3440], time: [99.0014ms]\n",
+      "Epoch: [  5/ 10], step: [  372/  390], loss: [0.4353], avg loss: [0.3442], time: [102.4394ms]\n",
+      "Epoch: [  5/ 10], step: [  373/  390], loss: [0.4284], avg loss: [0.3444], time: [102.0892ms]\n",
+      "Epoch: [  5/ 10], step: [  374/  390], loss: [0.4428], avg loss: [0.3447], time: [98.7518ms]\n",
+      "Epoch: [  5/ 10], step: [  375/  390], loss: [0.3787], avg loss: [0.3448], time: [105.1798ms]\n",
+      "Epoch: [  5/ 10], step: [  376/  390], loss: [0.4395], avg loss: [0.3451], time: [99.2367ms]\n",
+      "Epoch: [  5/ 10], step: [  377/  390], loss: [0.4732], avg loss: [0.3454], time: [103.2321ms]\n",
+      "Epoch: [  5/ 10], step: [  378/  390], loss: [0.5450], avg loss: [0.3459], time: [102.8581ms]\n",
+      "Epoch: [  5/ 10], step: [  379/  390], loss: [0.4199], avg loss: [0.3461], time: [98.7153ms]\n",
+      "Epoch: [  5/ 10], step: [  380/  390], loss: [0.3545], avg loss: [0.3461], time: [102.5324ms]\n",
+      "Epoch: [  5/ 10], step: [  381/  390], loss: [0.3200], avg loss: [0.3461], time: [103.4598ms]\n",
+      "Epoch: [  5/ 10], step: [  382/  390], loss: [0.2886], avg loss: [0.3459], time: [102.0243ms]\n",
+      "Epoch: [  5/ 10], step: [  383/  390], loss: [0.4360], avg loss: [0.3462], time: [101.8207ms]\n",
+      "Epoch: [  5/ 10], step: [  384/  390], loss: [0.3312], avg loss: [0.3461], time: [100.0907ms]\n",
+      "Epoch: [  5/ 10], step: [  385/  390], loss: [0.4088], avg loss: [0.3463], time: [99.1423ms]\n",
+      "Epoch: [  5/ 10], step: [  386/  390], loss: [0.2987], avg loss: [0.3462], time: [102.1488ms]\n",
+      "Epoch: [  5/ 10], step: [  387/  390], loss: [0.3314], avg loss: [0.3461], time: [104.4145ms]\n",
+      "Epoch: [  5/ 10], step: [  388/  390], loss: [0.3461], avg loss: [0.3461], time: [99.1771ms]\n",
+      "Epoch: [  5/ 10], step: [  389/  390], loss: [0.2056], avg loss: [0.3458], time: [98.6011ms]\n",
+      "Epoch: [  5/ 10], step: [  390/  390], loss: [0.3620], avg loss: [0.3458], time: [866.3347ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch time: 40546.816, per step time: 103.966\n",
+      "Epoch time: 40547.118, per step time: 103.967, avg loss: 0.346\n",
+      "************************************************************\n",
+      "Epoch: [  6/ 10], step: [    1/  390], loss: [0.3137], avg loss: [0.3137], time: [102.8788ms]\n",
+      "Epoch: [  6/ 10], step: [    2/  390], loss: [0.3295], avg loss: [0.3216], time: [107.4462ms]\n",
+      "Epoch: [  6/ 10], step: [    3/  390], loss: [0.4285], avg loss: [0.3572], time: [107.7762ms]\n",
+      "Epoch: [  6/ 10], step: [    4/  390], loss: [0.2917], avg loss: [0.3409], time: [104.9762ms]\n",
+      "Epoch: [  6/ 10], step: [    5/  390], loss: [0.3357], avg loss: [0.3398], time: [104.1481ms]\n",
+      "Epoch: [  6/ 10], step: [    6/  390], loss: [0.3456], avg loss: [0.3408], time: [105.6588ms]\n",
+      "Epoch: [  6/ 10], step: [    7/  390], loss: [0.4375], avg loss: [0.3546], time: [105.3269ms]\n",
+      "Epoch: [  6/ 10], step: [    8/  390], loss: [0.3685], avg loss: [0.3563], time: [100.5785ms]\n",
+      "Epoch: [  6/ 10], step: [    9/  390], loss: [0.2734], avg loss: [0.3471], time: [106.1952ms]\n",
+      "Epoch: [  6/ 10], step: [   10/  390], loss: [0.2983], avg loss: [0.3422], time: [103.3828ms]\n",
+      "Epoch: [  6/ 10], step: [   11/  390], loss: [0.3373], avg loss: [0.3418], time: [108.0239ms]\n",
+      "Epoch: [  6/ 10], step: [   12/  390], loss: [0.3792], avg loss: [0.3449], time: [104.2621ms]\n",
+      "Epoch: [  6/ 10], step: [   13/  390], loss: [0.2534], avg loss: [0.3379], time: [104.8844ms]\n",
+      "Epoch: [  6/ 10], step: [   14/  390], loss: [0.2555], avg loss: [0.3320], time: [106.4305ms]\n",
+      "Epoch: [  6/ 10], step: [   15/  390], loss: [0.2536], avg loss: [0.3268], time: [107.0900ms]\n",
+      "Epoch: [  6/ 10], step: [   16/  390], loss: [0.2763], avg loss: [0.3236], time: [105.6392ms]\n",
+      "Epoch: [  6/ 10], step: [   17/  390], loss: [0.3496], avg loss: [0.3251], time: [108.0658ms]\n",
+      "Epoch: [  6/ 10], step: [   18/  390], loss: [0.2546], avg loss: [0.3212], time: [103.7407ms]\n",
+      "Epoch: [  6/ 10], step: [   19/  390], loss: [0.4003], avg loss: [0.3254], time: [103.9438ms]\n",
+      "Epoch: [  6/ 10], step: [   20/  390], loss: [0.4276], avg loss: [0.3305], time: [106.1435ms]\n",
+      "Epoch: [  6/ 10], step: [   21/  390], loss: [0.3958], avg loss: [0.3336], time: [67.6832ms]\n",
+      "Epoch: [  6/ 10], step: [   22/  390], loss: [0.2281], avg loss: [0.3288], time: [105.9005ms]\n",
+      "Epoch: [  6/ 10], step: [   23/  390], loss: [0.3480], avg loss: [0.3296], time: [106.1544ms]\n",
+      "Epoch: [  6/ 10], step: [   24/  390], loss: [0.3870], avg loss: [0.3320], time: [105.8366ms]\n",
+      "Epoch: [  6/ 10], step: [   25/  390], loss: [0.2697], avg loss: [0.3295], time: [106.5907ms]\n",
+      "Epoch: [  6/ 10], step: [   26/  390], loss: [0.2907], avg loss: [0.3280], time: [103.4799ms]\n",
+      "Epoch: [  6/ 10], step: [   27/  390], loss: [0.3572], avg loss: [0.3291], time: [108.3992ms]\n",
+      "Epoch: [  6/ 10], step: [   28/  390], loss: [0.3893], avg loss: [0.3313], time: [107.5842ms]\n",
+      "Epoch: [  6/ 10], step: [   29/  390], loss: [0.2259], avg loss: [0.3276], time: [105.2296ms]\n",
+      "Epoch: [  6/ 10], step: [   30/  390], loss: [0.3245], avg loss: [0.3275], time: [100.3568ms]\n",
+      "Epoch: [  6/ 10], step: [   31/  390], loss: [0.3229], avg loss: [0.3274], time: [103.5793ms]\n",
+      "Epoch: [  6/ 10], step: [   32/  390], loss: [0.4215], avg loss: [0.3303], time: [101.8555ms]\n",
+      "Epoch: [  6/ 10], step: [   33/  390], loss: [0.3496], avg loss: [0.3309], time: [107.1312ms]\n",
+      "Epoch: [  6/ 10], step: [   34/  390], loss: [0.2681], avg loss: [0.3291], time: [99.5495ms]\n",
+      "Epoch: [  6/ 10], step: [   35/  390], loss: [0.2482], avg loss: [0.3268], time: [104.9037ms]\n",
+      "Epoch: [  6/ 10], step: [   36/  390], loss: [0.2724], avg loss: [0.3252], time: [103.7097ms]\n",
+      "Epoch: [  6/ 10], step: [   37/  390], loss: [0.2379], avg loss: [0.3229], time: [104.9495ms]\n",
+      "Epoch: [  6/ 10], step: [   38/  390], loss: [0.3819], avg loss: [0.3244], time: [107.4395ms]\n",
+      "Epoch: [  6/ 10], step: [   39/  390], loss: [0.3537], avg loss: [0.3252], time: [102.6006ms]\n",
+      "Epoch: [  6/ 10], step: [   40/  390], loss: [0.4310], avg loss: [0.3278], time: [103.6968ms]\n",
+      "Epoch: [  6/ 10], step: [   41/  390], loss: [0.2783], avg loss: [0.3266], time: [105.6654ms]\n",
+      "Epoch: [  6/ 10], step: [   42/  390], loss: [0.2990], avg loss: [0.3260], time: [104.0280ms]\n",
+      "Epoch: [  6/ 10], step: [   43/  390], loss: [0.2777], avg loss: [0.3248], time: [106.4887ms]\n",
+      "Epoch: [  6/ 10], step: [   44/  390], loss: [0.3549], avg loss: [0.3255], time: [100.8370ms]\n",
+      "Epoch: [  6/ 10], step: [   45/  390], loss: [0.3157], avg loss: [0.3253], time: [102.5386ms]\n",
+      "Epoch: [  6/ 10], step: [   46/  390], loss: [0.3321], avg loss: [0.3255], time: [101.4981ms]\n",
+      "Epoch: [  6/ 10], step: [   47/  390], loss: [0.3563], avg loss: [0.3261], time: [107.5685ms]\n",
+      "Epoch: [  6/ 10], step: [   48/  390], loss: [0.4130], avg loss: [0.3279], time: [104.3987ms]\n",
+      "Epoch: [  6/ 10], step: [   49/  390], loss: [0.3645], avg loss: [0.3287], time: [105.2272ms]\n",
+      "Epoch: [  6/ 10], step: [   50/  390], loss: [0.2529], avg loss: [0.3272], time: [101.3525ms]\n",
+      "Epoch: [  6/ 10], step: [   51/  390], loss: [0.2823], avg loss: [0.3263], time: [106.8044ms]\n",
+      "Epoch: [  6/ 10], step: [   52/  390], loss: [0.3664], avg loss: [0.3270], time: [104.0533ms]\n",
+      "Epoch: [  6/ 10], step: [   53/  390], loss: [0.2778], avg loss: [0.3261], time: [104.0053ms]\n",
+      "Epoch: [  6/ 10], step: [   54/  390], loss: [0.2984], avg loss: [0.3256], time: [103.2319ms]\n",
+      "Epoch: [  6/ 10], step: [   55/  390], loss: [0.2269], avg loss: [0.3238], time: [103.8632ms]\n",
+      "Epoch: [  6/ 10], step: [   56/  390], loss: [0.4109], avg loss: [0.3254], time: [100.3292ms]\n",
+      "Epoch: [  6/ 10], step: [   57/  390], loss: [0.4286], avg loss: [0.3272], time: [103.4818ms]\n",
+      "Epoch: [  6/ 10], step: [   58/  390], loss: [0.2945], avg loss: [0.3266], time: [101.8953ms]\n",
+      "Epoch: [  6/ 10], step: [   59/  390], loss: [0.4755], avg loss: [0.3291], time: [104.1601ms]\n",
+      "Epoch: [  6/ 10], step: [   60/  390], loss: [0.4181], avg loss: [0.3306], time: [103.5502ms]\n",
+      "Epoch: [  6/ 10], step: [   61/  390], loss: [0.4213], avg loss: [0.3321], time: [105.1967ms]\n",
+      "Epoch: [  6/ 10], step: [   62/  390], loss: [0.1686], avg loss: [0.3295], time: [105.5331ms]\n",
+      "Epoch: [  6/ 10], step: [   63/  390], loss: [0.2477], avg loss: [0.3282], time: [106.0550ms]\n",
+      "Epoch: [  6/ 10], step: [   64/  390], loss: [0.2404], avg loss: [0.3268], time: [106.8165ms]\n",
+      "Epoch: [  6/ 10], step: [   65/  390], loss: [0.3538], avg loss: [0.3272], time: [109.0724ms]\n",
+      "Epoch: [  6/ 10], step: [   66/  390], loss: [0.2904], avg loss: [0.3267], time: [104.3298ms]\n",
+      "Epoch: [  6/ 10], step: [   67/  390], loss: [0.4119], avg loss: [0.3279], time: [108.0468ms]\n",
+      "Epoch: [  6/ 10], step: [   68/  390], loss: [0.3131], avg loss: [0.3277], time: [105.8502ms]\n",
+      "Epoch: [  6/ 10], step: [   69/  390], loss: [0.4042], avg loss: [0.3288], time: [108.2604ms]\n",
+      "Epoch: [  6/ 10], step: [   70/  390], loss: [0.4035], avg loss: [0.3299], time: [104.0134ms]\n",
+      "Epoch: [  6/ 10], step: [   71/  390], loss: [0.3474], avg loss: [0.3301], time: [106.9710ms]\n",
+      "Epoch: [  6/ 10], step: [   72/  390], loss: [0.4037], avg loss: [0.3312], time: [102.5674ms]\n",
+      "Epoch: [  6/ 10], step: [   73/  390], loss: [0.2797], avg loss: [0.3305], time: [102.9291ms]\n",
+      "Epoch: [  6/ 10], step: [   74/  390], loss: [0.3334], avg loss: [0.3305], time: [106.0703ms]\n",
+      "Epoch: [  6/ 10], step: [   75/  390], loss: [0.2892], avg loss: [0.3299], time: [102.3366ms]\n",
+      "Epoch: [  6/ 10], step: [   76/  390], loss: [0.4234], avg loss: [0.3312], time: [105.3853ms]\n",
+      "Epoch: [  6/ 10], step: [   77/  390], loss: [0.2536], avg loss: [0.3302], time: [105.9346ms]\n",
+      "Epoch: [  6/ 10], step: [   78/  390], loss: [0.3701], avg loss: [0.3307], time: [102.2868ms]\n",
+      "Epoch: [  6/ 10], step: [   79/  390], loss: [0.4579], avg loss: [0.3323], time: [103.2307ms]\n",
+      "Epoch: [  6/ 10], step: [   80/  390], loss: [0.3049], avg loss: [0.3319], time: [101.6500ms]\n",
+      "Epoch: [  6/ 10], step: [   81/  390], loss: [0.3158], avg loss: [0.3317], time: [102.8347ms]\n",
+      "Epoch: [  6/ 10], step: [   82/  390], loss: [0.4254], avg loss: [0.3329], time: [105.4041ms]\n",
+      "Epoch: [  6/ 10], step: [   83/  390], loss: [0.2563], avg loss: [0.3320], time: [103.4667ms]\n",
+      "Epoch: [  6/ 10], step: [   84/  390], loss: [0.3178], avg loss: [0.3318], time: [101.6469ms]\n",
+      "Epoch: [  6/ 10], step: [   85/  390], loss: [0.3254], avg loss: [0.3317], time: [103.3061ms]\n",
+      "Epoch: [  6/ 10], step: [   86/  390], loss: [0.2758], avg loss: [0.3311], time: [102.5198ms]\n",
+      "Epoch: [  6/ 10], step: [   87/  390], loss: [0.4271], avg loss: [0.3322], time: [103.7211ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  6/ 10], step: [   88/  390], loss: [0.3815], avg loss: [0.3327], time: [99.7779ms]\n",
+      "Epoch: [  6/ 10], step: [   89/  390], loss: [0.3205], avg loss: [0.3326], time: [102.2894ms]\n",
+      "Epoch: [  6/ 10], step: [   90/  390], loss: [0.1674], avg loss: [0.3308], time: [107.3177ms]\n",
+      "Epoch: [  6/ 10], step: [   91/  390], loss: [0.3302], avg loss: [0.3308], time: [104.4667ms]\n",
+      "Epoch: [  6/ 10], step: [   92/  390], loss: [0.3680], avg loss: [0.3312], time: [105.5598ms]\n",
+      "Epoch: [  6/ 10], step: [   93/  390], loss: [0.3370], avg loss: [0.3312], time: [103.6875ms]\n",
+      "Epoch: [  6/ 10], step: [   94/  390], loss: [0.3272], avg loss: [0.3312], time: [105.0935ms]\n",
+      "Epoch: [  6/ 10], step: [   95/  390], loss: [0.3728], avg loss: [0.3316], time: [108.2509ms]\n",
+      "Epoch: [  6/ 10], step: [   96/  390], loss: [0.2415], avg loss: [0.3307], time: [104.2969ms]\n",
+      "Epoch: [  6/ 10], step: [   97/  390], loss: [0.3413], avg loss: [0.3308], time: [106.3817ms]\n",
+      "Epoch: [  6/ 10], step: [   98/  390], loss: [0.2772], avg loss: [0.3302], time: [104.6352ms]\n",
+      "Epoch: [  6/ 10], step: [   99/  390], loss: [0.3638], avg loss: [0.3306], time: [103.6198ms]\n",
+      "Epoch: [  6/ 10], step: [  100/  390], loss: [0.4868], avg loss: [0.3321], time: [104.7192ms]\n",
+      "Epoch: [  6/ 10], step: [  101/  390], loss: [0.2709], avg loss: [0.3315], time: [104.6910ms]\n",
+      "Epoch: [  6/ 10], step: [  102/  390], loss: [0.3050], avg loss: [0.3313], time: [102.3922ms]\n",
+      "Epoch: [  6/ 10], step: [  103/  390], loss: [0.3113], avg loss: [0.3311], time: [103.4689ms]\n",
+      "Epoch: [  6/ 10], step: [  104/  390], loss: [0.3130], avg loss: [0.3309], time: [101.4366ms]\n",
+      "Epoch: [  6/ 10], step: [  105/  390], loss: [0.2987], avg loss: [0.3306], time: [109.1599ms]\n",
+      "Epoch: [  6/ 10], step: [  106/  390], loss: [0.2144], avg loss: [0.3295], time: [105.4850ms]\n",
+      "Epoch: [  6/ 10], step: [  107/  390], loss: [0.4136], avg loss: [0.3303], time: [105.8519ms]\n",
+      "Epoch: [  6/ 10], step: [  108/  390], loss: [0.2410], avg loss: [0.3295], time: [104.0261ms]\n",
+      "Epoch: [  6/ 10], step: [  109/  390], loss: [0.3518], avg loss: [0.3297], time: [104.0432ms]\n",
+      "Epoch: [  6/ 10], step: [  110/  390], loss: [0.3474], avg loss: [0.3298], time: [105.0935ms]\n",
+      "Epoch: [  6/ 10], step: [  111/  390], loss: [0.2430], avg loss: [0.3290], time: [105.2408ms]\n",
+      "Epoch: [  6/ 10], step: [  112/  390], loss: [0.3468], avg loss: [0.3292], time: [100.6699ms]\n",
+      "Epoch: [  6/ 10], step: [  113/  390], loss: [0.3406], avg loss: [0.3293], time: [105.6204ms]\n",
+      "Epoch: [  6/ 10], step: [  114/  390], loss: [0.3484], avg loss: [0.3295], time: [100.8778ms]\n",
+      "Epoch: [  6/ 10], step: [  115/  390], loss: [0.3458], avg loss: [0.3296], time: [103.7869ms]\n",
+      "Epoch: [  6/ 10], step: [  116/  390], loss: [0.4029], avg loss: [0.3302], time: [102.7882ms]\n",
+      "Epoch: [  6/ 10], step: [  117/  390], loss: [0.3123], avg loss: [0.3301], time: [105.2265ms]\n",
+      "Epoch: [  6/ 10], step: [  118/  390], loss: [0.2976], avg loss: [0.3298], time: [101.3033ms]\n",
+      "Epoch: [  6/ 10], step: [  119/  390], loss: [0.2587], avg loss: [0.3292], time: [106.1120ms]\n",
+      "Epoch: [  6/ 10], step: [  120/  390], loss: [0.2946], avg loss: [0.3289], time: [103.9181ms]\n",
+      "Epoch: [  6/ 10], step: [  121/  390], loss: [0.5230], avg loss: [0.3305], time: [104.0273ms]\n",
+      "Epoch: [  6/ 10], step: [  122/  390], loss: [0.2541], avg loss: [0.3299], time: [107.4820ms]\n",
+      "Epoch: [  6/ 10], step: [  123/  390], loss: [0.4289], avg loss: [0.3307], time: [102.1929ms]\n",
+      "Epoch: [  6/ 10], step: [  124/  390], loss: [0.3652], avg loss: [0.3310], time: [102.3753ms]\n",
+      "Epoch: [  6/ 10], step: [  125/  390], loss: [0.2435], avg loss: [0.3303], time: [103.3547ms]\n",
+      "Epoch: [  6/ 10], step: [  126/  390], loss: [0.3469], avg loss: [0.3304], time: [103.7183ms]\n",
+      "Epoch: [  6/ 10], step: [  127/  390], loss: [0.3319], avg loss: [0.3304], time: [103.5509ms]\n",
+      "Epoch: [  6/ 10], step: [  128/  390], loss: [0.3387], avg loss: [0.3305], time: [103.8632ms]\n",
+      "Epoch: [  6/ 10], step: [  129/  390], loss: [0.2644], avg loss: [0.3300], time: [102.5698ms]\n",
+      "Epoch: [  6/ 10], step: [  130/  390], loss: [0.2812], avg loss: [0.3296], time: [104.6865ms]\n",
+      "Epoch: [  6/ 10], step: [  131/  390], loss: [0.2899], avg loss: [0.3293], time: [106.7574ms]\n",
+      "Epoch: [  6/ 10], step: [  132/  390], loss: [0.2739], avg loss: [0.3289], time: [102.3071ms]\n",
+      "Epoch: [  6/ 10], step: [  133/  390], loss: [0.1730], avg loss: [0.3277], time: [105.8102ms]\n",
+      "Epoch: [  6/ 10], step: [  134/  390], loss: [0.3183], avg loss: [0.3276], time: [100.9240ms]\n",
+      "Epoch: [  6/ 10], step: [  135/  390], loss: [0.3891], avg loss: [0.3281], time: [102.3219ms]\n",
+      "Epoch: [  6/ 10], step: [  136/  390], loss: [0.3395], avg loss: [0.3282], time: [103.2267ms]\n",
+      "Epoch: [  6/ 10], step: [  137/  390], loss: [0.2796], avg loss: [0.3278], time: [105.4187ms]\n",
+      "Epoch: [  6/ 10], step: [  138/  390], loss: [0.4936], avg loss: [0.3290], time: [104.6247ms]\n",
+      "Epoch: [  6/ 10], step: [  139/  390], loss: [0.4189], avg loss: [0.3297], time: [106.3464ms]\n",
+      "Epoch: [  6/ 10], step: [  140/  390], loss: [0.3429], avg loss: [0.3298], time: [104.7597ms]\n",
+      "Epoch: [  6/ 10], step: [  141/  390], loss: [0.2839], avg loss: [0.3294], time: [103.1654ms]\n",
+      "Epoch: [  6/ 10], step: [  142/  390], loss: [0.3150], avg loss: [0.3293], time: [99.8304ms]\n",
+      "Epoch: [  6/ 10], step: [  143/  390], loss: [0.3406], avg loss: [0.3294], time: [107.5263ms]\n",
+      "Epoch: [  6/ 10], step: [  144/  390], loss: [0.3555], avg loss: [0.3296], time: [105.5160ms]\n",
+      "Epoch: [  6/ 10], step: [  145/  390], loss: [0.2782], avg loss: [0.3293], time: [101.9528ms]\n",
+      "Epoch: [  6/ 10], step: [  146/  390], loss: [0.2559], avg loss: [0.3288], time: [105.1157ms]\n",
+      "Epoch: [  6/ 10], step: [  147/  390], loss: [0.3379], avg loss: [0.3288], time: [106.3020ms]\n",
+      "Epoch: [  6/ 10], step: [  148/  390], loss: [0.3768], avg loss: [0.3291], time: [104.6147ms]\n",
+      "Epoch: [  6/ 10], step: [  149/  390], loss: [0.3913], avg loss: [0.3296], time: [106.1707ms]\n",
+      "Epoch: [  6/ 10], step: [  150/  390], loss: [0.2264], avg loss: [0.3289], time: [105.3846ms]\n",
+      "Epoch: [  6/ 10], step: [  151/  390], loss: [0.2102], avg loss: [0.3281], time: [102.5255ms]\n",
+      "Epoch: [  6/ 10], step: [  152/  390], loss: [0.3544], avg loss: [0.3283], time: [102.5755ms]\n",
+      "Epoch: [  6/ 10], step: [  153/  390], loss: [0.2458], avg loss: [0.3277], time: [103.0257ms]\n",
+      "Epoch: [  6/ 10], step: [  154/  390], loss: [0.2079], avg loss: [0.3269], time: [104.9073ms]\n",
+      "Epoch: [  6/ 10], step: [  155/  390], loss: [0.5016], avg loss: [0.3281], time: [107.5962ms]\n",
+      "Epoch: [  6/ 10], step: [  156/  390], loss: [0.3904], avg loss: [0.3285], time: [102.3004ms]\n",
+      "Epoch: [  6/ 10], step: [  157/  390], loss: [0.2560], avg loss: [0.3280], time: [103.6665ms]\n",
+      "Epoch: [  6/ 10], step: [  158/  390], loss: [0.3972], avg loss: [0.3284], time: [99.9253ms]\n",
+      "Epoch: [  6/ 10], step: [  159/  390], loss: [0.3128], avg loss: [0.3283], time: [104.0418ms]\n",
+      "Epoch: [  6/ 10], step: [  160/  390], loss: [0.3540], avg loss: [0.3285], time: [105.4263ms]\n",
+      "Epoch: [  6/ 10], step: [  161/  390], loss: [0.3925], avg loss: [0.3289], time: [108.7251ms]\n",
+      "Epoch: [  6/ 10], step: [  162/  390], loss: [0.3021], avg loss: [0.3287], time: [102.9158ms]\n",
+      "Epoch: [  6/ 10], step: [  163/  390], loss: [0.3047], avg loss: [0.3286], time: [105.1214ms]\n",
+      "Epoch: [  6/ 10], step: [  164/  390], loss: [0.2893], avg loss: [0.3283], time: [103.3919ms]\n",
+      "Epoch: [  6/ 10], step: [  165/  390], loss: [0.2883], avg loss: [0.3281], time: [104.9266ms]\n",
+      "Epoch: [  6/ 10], step: [  166/  390], loss: [0.3685], avg loss: [0.3283], time: [102.9325ms]\n",
+      "Epoch: [  6/ 10], step: [  167/  390], loss: [0.4150], avg loss: [0.3289], time: [105.2999ms]\n",
+      "Epoch: [  6/ 10], step: [  168/  390], loss: [0.3211], avg loss: [0.3288], time: [101.4946ms]\n",
+      "Epoch: [  6/ 10], step: [  169/  390], loss: [0.2711], avg loss: [0.3285], time: [105.3841ms]\n",
+      "Epoch: [  6/ 10], step: [  170/  390], loss: [0.3252], avg loss: [0.3285], time: [103.5354ms]\n",
+      "Epoch: [  6/ 10], step: [  171/  390], loss: [0.3076], avg loss: [0.3283], time: [103.5259ms]\n",
+      "Epoch: [  6/ 10], step: [  172/  390], loss: [0.3561], avg loss: [0.3285], time: [102.9892ms]\n",
+      "Epoch: [  6/ 10], step: [  173/  390], loss: [0.2063], avg loss: [0.3278], time: [106.4513ms]\n",
+      "Epoch: [  6/ 10], step: [  174/  390], loss: [0.3680], avg loss: [0.3280], time: [104.9628ms]\n",
+      "Epoch: [  6/ 10], step: [  175/  390], loss: [0.3585], avg loss: [0.3282], time: [106.5559ms]\n",
+      "Epoch: [  6/ 10], step: [  176/  390], loss: [0.2052], avg loss: [0.3275], time: [105.7465ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  6/ 10], step: [  177/  390], loss: [0.3473], avg loss: [0.3276], time: [103.0400ms]\n",
+      "Epoch: [  6/ 10], step: [  178/  390], loss: [0.4617], avg loss: [0.3284], time: [102.4530ms]\n",
+      "Epoch: [  6/ 10], step: [  179/  390], loss: [0.2574], avg loss: [0.3280], time: [104.6448ms]\n",
+      "Epoch: [  6/ 10], step: [  180/  390], loss: [0.2926], avg loss: [0.3278], time: [102.3531ms]\n",
+      "Epoch: [  6/ 10], step: [  181/  390], loss: [0.2689], avg loss: [0.3274], time: [105.6643ms]\n",
+      "Epoch: [  6/ 10], step: [  182/  390], loss: [0.2425], avg loss: [0.3270], time: [105.7646ms]\n",
+      "Epoch: [  6/ 10], step: [  183/  390], loss: [0.4197], avg loss: [0.3275], time: [104.4226ms]\n",
+      "Epoch: [  6/ 10], step: [  184/  390], loss: [0.3622], avg loss: [0.3277], time: [102.5190ms]\n",
+      "Epoch: [  6/ 10], step: [  185/  390], loss: [0.3172], avg loss: [0.3276], time: [107.5490ms]\n",
+      "Epoch: [  6/ 10], step: [  186/  390], loss: [0.2831], avg loss: [0.3274], time: [100.2440ms]\n",
+      "Epoch: [  6/ 10], step: [  187/  390], loss: [0.4395], avg loss: [0.3280], time: [107.0464ms]\n",
+      "Epoch: [  6/ 10], step: [  188/  390], loss: [0.3841], avg loss: [0.3283], time: [101.7933ms]\n",
+      "Epoch: [  6/ 10], step: [  189/  390], loss: [0.4334], avg loss: [0.3288], time: [104.6174ms]\n",
+      "Epoch: [  6/ 10], step: [  190/  390], loss: [0.5027], avg loss: [0.3297], time: [100.9848ms]\n",
+      "Epoch: [  6/ 10], step: [  191/  390], loss: [0.5141], avg loss: [0.3307], time: [103.6432ms]\n",
+      "Epoch: [  6/ 10], step: [  192/  390], loss: [0.3588], avg loss: [0.3309], time: [101.1293ms]\n",
+      "Epoch: [  6/ 10], step: [  193/  390], loss: [0.3650], avg loss: [0.3310], time: [104.4297ms]\n",
+      "Epoch: [  6/ 10], step: [  194/  390], loss: [0.3152], avg loss: [0.3310], time: [105.3679ms]\n",
+      "Epoch: [  6/ 10], step: [  195/  390], loss: [0.3063], avg loss: [0.3308], time: [108.4087ms]\n",
+      "Epoch: [  6/ 10], step: [  196/  390], loss: [0.3097], avg loss: [0.3307], time: [100.6358ms]\n",
+      "Epoch: [  6/ 10], step: [  197/  390], loss: [0.3507], avg loss: [0.3308], time: [103.8554ms]\n",
+      "Epoch: [  6/ 10], step: [  198/  390], loss: [0.2534], avg loss: [0.3304], time: [104.0516ms]\n",
+      "Epoch: [  6/ 10], step: [  199/  390], loss: [0.4216], avg loss: [0.3309], time: [104.8689ms]\n",
+      "Epoch: [  6/ 10], step: [  200/  390], loss: [0.4192], avg loss: [0.3313], time: [100.3902ms]\n",
+      "Epoch: [  6/ 10], step: [  201/  390], loss: [0.3980], avg loss: [0.3317], time: [106.2922ms]\n",
+      "Epoch: [  6/ 10], step: [  202/  390], loss: [0.3389], avg loss: [0.3317], time: [101.0244ms]\n",
+      "Epoch: [  6/ 10], step: [  203/  390], loss: [0.3186], avg loss: [0.3316], time: [103.4589ms]\n",
+      "Epoch: [  6/ 10], step: [  204/  390], loss: [0.5272], avg loss: [0.3326], time: [101.0959ms]\n",
+      "Epoch: [  6/ 10], step: [  205/  390], loss: [0.4031], avg loss: [0.3329], time: [105.9949ms]\n",
+      "Epoch: [  6/ 10], step: [  206/  390], loss: [0.3488], avg loss: [0.3330], time: [105.3770ms]\n",
+      "Epoch: [  6/ 10], step: [  207/  390], loss: [0.3204], avg loss: [0.3330], time: [103.0207ms]\n",
+      "Epoch: [  6/ 10], step: [  208/  390], loss: [0.3215], avg loss: [0.3329], time: [105.6094ms]\n",
+      "Epoch: [  6/ 10], step: [  209/  390], loss: [0.3097], avg loss: [0.3328], time: [105.9272ms]\n",
+      "Epoch: [  6/ 10], step: [  210/  390], loss: [0.2991], avg loss: [0.3326], time: [104.9280ms]\n",
+      "Epoch: [  6/ 10], step: [  211/  390], loss: [0.2512], avg loss: [0.3322], time: [103.2462ms]\n",
+      "Epoch: [  6/ 10], step: [  212/  390], loss: [0.2952], avg loss: [0.3321], time: [106.2520ms]\n",
+      "Epoch: [  6/ 10], step: [  213/  390], loss: [0.3371], avg loss: [0.3321], time: [105.0916ms]\n",
+      "Epoch: [  6/ 10], step: [  214/  390], loss: [0.3340], avg loss: [0.3321], time: [103.2071ms]\n",
+      "Epoch: [  6/ 10], step: [  215/  390], loss: [0.2598], avg loss: [0.3318], time: [104.0084ms]\n",
+      "Epoch: [  6/ 10], step: [  216/  390], loss: [0.3255], avg loss: [0.3317], time: [105.0892ms]\n",
+      "Epoch: [  6/ 10], step: [  217/  390], loss: [0.3541], avg loss: [0.3318], time: [102.8223ms]\n",
+      "Epoch: [  6/ 10], step: [  218/  390], loss: [0.3187], avg loss: [0.3318], time: [101.0220ms]\n",
+      "Epoch: [  6/ 10], step: [  219/  390], loss: [0.2939], avg loss: [0.3316], time: [109.3974ms]\n",
+      "Epoch: [  6/ 10], step: [  220/  390], loss: [0.2786], avg loss: [0.3314], time: [106.2407ms]\n",
+      "Epoch: [  6/ 10], step: [  221/  390], loss: [0.2779], avg loss: [0.3311], time: [105.8226ms]\n",
+      "Epoch: [  6/ 10], step: [  222/  390], loss: [0.4111], avg loss: [0.3315], time: [107.4440ms]\n",
+      "Epoch: [  6/ 10], step: [  223/  390], loss: [0.3184], avg loss: [0.3314], time: [105.4258ms]\n",
+      "Epoch: [  6/ 10], step: [  224/  390], loss: [0.1722], avg loss: [0.3307], time: [101.8672ms]\n",
+      "Epoch: [  6/ 10], step: [  225/  390], loss: [0.2848], avg loss: [0.3305], time: [104.1944ms]\n",
+      "Epoch: [  6/ 10], step: [  226/  390], loss: [0.3035], avg loss: [0.3304], time: [108.1917ms]\n",
+      "Epoch: [  6/ 10], step: [  227/  390], loss: [0.4568], avg loss: [0.3309], time: [103.3666ms]\n",
+      "Epoch: [  6/ 10], step: [  228/  390], loss: [0.2989], avg loss: [0.3308], time: [102.0334ms]\n",
+      "Epoch: [  6/ 10], step: [  229/  390], loss: [0.2840], avg loss: [0.3306], time: [106.0257ms]\n",
+      "Epoch: [  6/ 10], step: [  230/  390], loss: [0.3429], avg loss: [0.3307], time: [100.8019ms]\n",
+      "Epoch: [  6/ 10], step: [  231/  390], loss: [0.3582], avg loss: [0.3308], time: [103.8568ms]\n",
+      "Epoch: [  6/ 10], step: [  232/  390], loss: [0.2675], avg loss: [0.3305], time: [101.8233ms]\n",
+      "Epoch: [  6/ 10], step: [  233/  390], loss: [0.2883], avg loss: [0.3303], time: [106.8397ms]\n",
+      "Epoch: [  6/ 10], step: [  234/  390], loss: [0.3633], avg loss: [0.3305], time: [101.1724ms]\n",
+      "Epoch: [  6/ 10], step: [  235/  390], loss: [0.3305], avg loss: [0.3305], time: [103.3738ms]\n",
+      "Epoch: [  6/ 10], step: [  236/  390], loss: [0.2916], avg loss: [0.3303], time: [106.2672ms]\n",
+      "Epoch: [  6/ 10], step: [  237/  390], loss: [0.3045], avg loss: [0.3302], time: [102.8006ms]\n",
+      "Epoch: [  6/ 10], step: [  238/  390], loss: [0.2606], avg loss: [0.3299], time: [103.4052ms]\n",
+      "Epoch: [  6/ 10], step: [  239/  390], loss: [0.2456], avg loss: [0.3295], time: [102.8235ms]\n",
+      "Epoch: [  6/ 10], step: [  240/  390], loss: [0.2210], avg loss: [0.3291], time: [102.1998ms]\n",
+      "Epoch: [  6/ 10], step: [  241/  390], loss: [0.3274], avg loss: [0.3291], time: [102.6089ms]\n",
+      "Epoch: [  6/ 10], step: [  242/  390], loss: [0.4134], avg loss: [0.3294], time: [105.5417ms]\n",
+      "Epoch: [  6/ 10], step: [  243/  390], loss: [0.4599], avg loss: [0.3300], time: [107.7466ms]\n",
+      "Epoch: [  6/ 10], step: [  244/  390], loss: [0.5947], avg loss: [0.3311], time: [101.1190ms]\n",
+      "Epoch: [  6/ 10], step: [  245/  390], loss: [0.2561], avg loss: [0.3307], time: [103.1210ms]\n",
+      "Epoch: [  6/ 10], step: [  246/  390], loss: [0.2175], avg loss: [0.3303], time: [101.2173ms]\n",
+      "Epoch: [  6/ 10], step: [  247/  390], loss: [0.3314], avg loss: [0.3303], time: [103.4813ms]\n",
+      "Epoch: [  6/ 10], step: [  248/  390], loss: [0.2679], avg loss: [0.3300], time: [103.9636ms]\n",
+      "Epoch: [  6/ 10], step: [  249/  390], loss: [0.3549], avg loss: [0.3301], time: [104.9063ms]\n",
+      "Epoch: [  6/ 10], step: [  250/  390], loss: [0.2441], avg loss: [0.3298], time: [105.2394ms]\n",
+      "Epoch: [  6/ 10], step: [  251/  390], loss: [0.2675], avg loss: [0.3295], time: [104.2738ms]\n",
+      "Epoch: [  6/ 10], step: [  252/  390], loss: [0.3183], avg loss: [0.3295], time: [102.8545ms]\n",
+      "Epoch: [  6/ 10], step: [  253/  390], loss: [0.3769], avg loss: [0.3297], time: [104.5601ms]\n",
+      "Epoch: [  6/ 10], step: [  254/  390], loss: [0.2539], avg loss: [0.3294], time: [103.2653ms]\n",
+      "Epoch: [  6/ 10], step: [  255/  390], loss: [0.4019], avg loss: [0.3297], time: [105.1457ms]\n",
+      "Epoch: [  6/ 10], step: [  256/  390], loss: [0.3086], avg loss: [0.3296], time: [102.6106ms]\n",
+      "Epoch: [  6/ 10], step: [  257/  390], loss: [0.4399], avg loss: [0.3300], time: [106.8661ms]\n",
+      "Epoch: [  6/ 10], step: [  258/  390], loss: [0.2868], avg loss: [0.3299], time: [102.7610ms]\n",
+      "Epoch: [  6/ 10], step: [  259/  390], loss: [0.3434], avg loss: [0.3299], time: [102.0977ms]\n",
+      "Epoch: [  6/ 10], step: [  260/  390], loss: [0.2957], avg loss: [0.3298], time: [103.9495ms]\n",
+      "Epoch: [  6/ 10], step: [  261/  390], loss: [0.2614], avg loss: [0.3295], time: [108.6328ms]\n",
+      "Epoch: [  6/ 10], step: [  262/  390], loss: [0.2950], avg loss: [0.3294], time: [102.8056ms]\n",
+      "Epoch: [  6/ 10], step: [  263/  390], loss: [0.2932], avg loss: [0.3292], time: [101.2115ms]\n",
+      "Epoch: [  6/ 10], step: [  264/  390], loss: [0.3685], avg loss: [0.3294], time: [104.7268ms]\n",
+      "Epoch: [  6/ 10], step: [  265/  390], loss: [0.2662], avg loss: [0.3292], time: [104.7187ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  6/ 10], step: [  266/  390], loss: [0.1851], avg loss: [0.3286], time: [106.4603ms]\n",
+      "Epoch: [  6/ 10], step: [  267/  390], loss: [0.3902], avg loss: [0.3288], time: [105.4015ms]\n",
+      "Epoch: [  6/ 10], step: [  268/  390], loss: [0.1962], avg loss: [0.3284], time: [102.4544ms]\n",
+      "Epoch: [  6/ 10], step: [  269/  390], loss: [0.2614], avg loss: [0.3281], time: [105.6340ms]\n",
+      "Epoch: [  6/ 10], step: [  270/  390], loss: [0.2919], avg loss: [0.3280], time: [103.2822ms]\n",
+      "Epoch: [  6/ 10], step: [  271/  390], loss: [0.4295], avg loss: [0.3283], time: [104.4779ms]\n",
+      "Epoch: [  6/ 10], step: [  272/  390], loss: [0.3681], avg loss: [0.3285], time: [107.9822ms]\n",
+      "Epoch: [  6/ 10], step: [  273/  390], loss: [0.2417], avg loss: [0.3282], time: [106.6778ms]\n",
+      "Epoch: [  6/ 10], step: [  274/  390], loss: [0.3749], avg loss: [0.3283], time: [107.0487ms]\n",
+      "Epoch: [  6/ 10], step: [  275/  390], loss: [0.3401], avg loss: [0.3284], time: [103.5895ms]\n",
+      "Epoch: [  6/ 10], step: [  276/  390], loss: [0.3363], avg loss: [0.3284], time: [103.7929ms]\n",
+      "Epoch: [  6/ 10], step: [  277/  390], loss: [0.3809], avg loss: [0.3286], time: [106.1981ms]\n",
+      "Epoch: [  6/ 10], step: [  278/  390], loss: [0.2851], avg loss: [0.3284], time: [104.9306ms]\n",
+      "Epoch: [  6/ 10], step: [  279/  390], loss: [0.3831], avg loss: [0.3286], time: [102.7482ms]\n",
+      "Epoch: [  6/ 10], step: [  280/  390], loss: [0.3269], avg loss: [0.3286], time: [107.4982ms]\n",
+      "Epoch: [  6/ 10], step: [  281/  390], loss: [0.2682], avg loss: [0.3284], time: [107.1913ms]\n",
+      "Epoch: [  6/ 10], step: [  282/  390], loss: [0.2464], avg loss: [0.3281], time: [100.6212ms]\n",
+      "Epoch: [  6/ 10], step: [  283/  390], loss: [0.3946], avg loss: [0.3284], time: [105.3467ms]\n",
+      "Epoch: [  6/ 10], step: [  284/  390], loss: [0.3671], avg loss: [0.3285], time: [105.5491ms]\n",
+      "Epoch: [  6/ 10], step: [  285/  390], loss: [0.2973], avg loss: [0.3284], time: [107.2581ms]\n",
+      "Epoch: [  6/ 10], step: [  286/  390], loss: [0.3856], avg loss: [0.3286], time: [105.2179ms]\n",
+      "Epoch: [  6/ 10], step: [  287/  390], loss: [0.4005], avg loss: [0.3288], time: [106.7197ms]\n",
+      "Epoch: [  6/ 10], step: [  288/  390], loss: [0.3100], avg loss: [0.3288], time: [103.0917ms]\n",
+      "Epoch: [  6/ 10], step: [  289/  390], loss: [0.4213], avg loss: [0.3291], time: [102.3197ms]\n",
+      "Epoch: [  6/ 10], step: [  290/  390], loss: [0.2163], avg loss: [0.3287], time: [102.0153ms]\n",
+      "Epoch: [  6/ 10], step: [  291/  390], loss: [0.2245], avg loss: [0.3283], time: [103.6959ms]\n",
+      "Epoch: [  6/ 10], step: [  292/  390], loss: [0.2426], avg loss: [0.3281], time: [104.0010ms]\n",
+      "Epoch: [  6/ 10], step: [  293/  390], loss: [0.3086], avg loss: [0.3280], time: [104.6097ms]\n",
+      "Epoch: [  6/ 10], step: [  294/  390], loss: [0.3300], avg loss: [0.3280], time: [106.2334ms]\n",
+      "Epoch: [  6/ 10], step: [  295/  390], loss: [0.4324], avg loss: [0.3283], time: [106.7400ms]\n",
+      "Epoch: [  6/ 10], step: [  296/  390], loss: [0.4079], avg loss: [0.3286], time: [103.9681ms]\n",
+      "Epoch: [  6/ 10], step: [  297/  390], loss: [0.3564], avg loss: [0.3287], time: [104.4164ms]\n",
+      "Epoch: [  6/ 10], step: [  298/  390], loss: [0.3987], avg loss: [0.3289], time: [104.2852ms]\n",
+      "Epoch: [  6/ 10], step: [  299/  390], loss: [0.3378], avg loss: [0.3290], time: [106.4155ms]\n",
+      "Epoch: [  6/ 10], step: [  300/  390], loss: [0.4463], avg loss: [0.3294], time: [106.0929ms]\n",
+      "Epoch: [  6/ 10], step: [  301/  390], loss: [0.3557], avg loss: [0.3295], time: [105.8309ms]\n",
+      "Epoch: [  6/ 10], step: [  302/  390], loss: [0.4535], avg loss: [0.3299], time: [104.7084ms]\n",
+      "Epoch: [  6/ 10], step: [  303/  390], loss: [0.3136], avg loss: [0.3298], time: [102.8643ms]\n",
+      "Epoch: [  6/ 10], step: [  304/  390], loss: [0.2858], avg loss: [0.3297], time: [102.1047ms]\n",
+      "Epoch: [  6/ 10], step: [  305/  390], loss: [0.4527], avg loss: [0.3301], time: [107.2831ms]\n",
+      "Epoch: [  6/ 10], step: [  306/  390], loss: [0.4973], avg loss: [0.3306], time: [104.9027ms]\n",
+      "Epoch: [  6/ 10], step: [  307/  390], loss: [0.3944], avg loss: [0.3308], time: [103.0478ms]\n",
+      "Epoch: [  6/ 10], step: [  308/  390], loss: [0.3267], avg loss: [0.3308], time: [101.0103ms]\n",
+      "Epoch: [  6/ 10], step: [  309/  390], loss: [0.3917], avg loss: [0.3310], time: [107.0888ms]\n",
+      "Epoch: [  6/ 10], step: [  310/  390], loss: [0.2803], avg loss: [0.3308], time: [105.2139ms]\n",
+      "Epoch: [  6/ 10], step: [  311/  390], loss: [0.4024], avg loss: [0.3311], time: [105.0539ms]\n",
+      "Epoch: [  6/ 10], step: [  312/  390], loss: [0.4093], avg loss: [0.3313], time: [101.4223ms]\n",
+      "Epoch: [  6/ 10], step: [  313/  390], loss: [0.3855], avg loss: [0.3315], time: [107.0051ms]\n",
+      "Epoch: [  6/ 10], step: [  314/  390], loss: [0.3074], avg loss: [0.3314], time: [101.5892ms]\n",
+      "Epoch: [  6/ 10], step: [  315/  390], loss: [0.2501], avg loss: [0.3312], time: [104.3346ms]\n",
+      "Epoch: [  6/ 10], step: [  316/  390], loss: [0.3559], avg loss: [0.3312], time: [105.0799ms]\n",
+      "Epoch: [  6/ 10], step: [  317/  390], loss: [0.3158], avg loss: [0.3312], time: [105.5751ms]\n",
+      "Epoch: [  6/ 10], step: [  318/  390], loss: [0.2860], avg loss: [0.3311], time: [101.3811ms]\n",
+      "Epoch: [  6/ 10], step: [  319/  390], loss: [0.1979], avg loss: [0.3306], time: [107.5933ms]\n",
+      "Epoch: [  6/ 10], step: [  320/  390], loss: [0.2508], avg loss: [0.3304], time: [104.7511ms]\n",
+      "Epoch: [  6/ 10], step: [  321/  390], loss: [0.3351], avg loss: [0.3304], time: [103.9152ms]\n",
+      "Epoch: [  6/ 10], step: [  322/  390], loss: [0.3078], avg loss: [0.3303], time: [101.2144ms]\n",
+      "Epoch: [  6/ 10], step: [  323/  390], loss: [0.2160], avg loss: [0.3300], time: [108.0611ms]\n",
+      "Epoch: [  6/ 10], step: [  324/  390], loss: [0.2717], avg loss: [0.3298], time: [100.1897ms]\n",
+      "Epoch: [  6/ 10], step: [  325/  390], loss: [0.2465], avg loss: [0.3295], time: [106.1027ms]\n",
+      "Epoch: [  6/ 10], step: [  326/  390], loss: [0.4169], avg loss: [0.3298], time: [106.1909ms]\n",
+      "Epoch: [  6/ 10], step: [  327/  390], loss: [0.2714], avg loss: [0.3296], time: [108.6955ms]\n",
+      "Epoch: [  6/ 10], step: [  328/  390], loss: [0.2966], avg loss: [0.3295], time: [101.0115ms]\n",
+      "Epoch: [  6/ 10], step: [  329/  390], loss: [0.2984], avg loss: [0.3294], time: [102.9346ms]\n",
+      "Epoch: [  6/ 10], step: [  330/  390], loss: [0.2708], avg loss: [0.3293], time: [105.4211ms]\n",
+      "Epoch: [  6/ 10], step: [  331/  390], loss: [0.3978], avg loss: [0.3295], time: [106.9992ms]\n",
+      "Epoch: [  6/ 10], step: [  332/  390], loss: [0.3094], avg loss: [0.3294], time: [104.5735ms]\n",
+      "Epoch: [  6/ 10], step: [  333/  390], loss: [0.3462], avg loss: [0.3295], time: [105.4125ms]\n",
+      "Epoch: [  6/ 10], step: [  334/  390], loss: [0.2669], avg loss: [0.3293], time: [101.0020ms]\n",
+      "Epoch: [  6/ 10], step: [  335/  390], loss: [0.4101], avg loss: [0.3295], time: [103.3039ms]\n",
+      "Epoch: [  6/ 10], step: [  336/  390], loss: [0.3374], avg loss: [0.3295], time: [107.1084ms]\n",
+      "Epoch: [  6/ 10], step: [  337/  390], loss: [0.4897], avg loss: [0.3300], time: [102.1457ms]\n",
+      "Epoch: [  6/ 10], step: [  338/  390], loss: [0.4213], avg loss: [0.3303], time: [102.9134ms]\n",
+      "Epoch: [  6/ 10], step: [  339/  390], loss: [0.3470], avg loss: [0.3303], time: [105.8249ms]\n",
+      "Epoch: [  6/ 10], step: [  340/  390], loss: [0.3184], avg loss: [0.3303], time: [101.3196ms]\n",
+      "Epoch: [  6/ 10], step: [  341/  390], loss: [0.2712], avg loss: [0.3301], time: [102.4146ms]\n",
+      "Epoch: [  6/ 10], step: [  342/  390], loss: [0.3386], avg loss: [0.3301], time: [103.4834ms]\n",
+      "Epoch: [  6/ 10], step: [  343/  390], loss: [0.2672], avg loss: [0.3300], time: [107.3389ms]\n",
+      "Epoch: [  6/ 10], step: [  344/  390], loss: [0.2524], avg loss: [0.3297], time: [106.3159ms]\n",
+      "Epoch: [  6/ 10], step: [  345/  390], loss: [0.4011], avg loss: [0.3299], time: [105.5076ms]\n",
+      "Epoch: [  6/ 10], step: [  346/  390], loss: [0.2394], avg loss: [0.3297], time: [106.8244ms]\n",
+      "Epoch: [  6/ 10], step: [  347/  390], loss: [0.3335], avg loss: [0.3297], time: [106.9553ms]\n",
+      "Epoch: [  6/ 10], step: [  348/  390], loss: [0.4013], avg loss: [0.3299], time: [101.4163ms]\n",
+      "Epoch: [  6/ 10], step: [  349/  390], loss: [0.2915], avg loss: [0.3298], time: [108.4950ms]\n",
+      "Epoch: [  6/ 10], step: [  350/  390], loss: [0.3499], avg loss: [0.3298], time: [101.9070ms]\n",
+      "Epoch: [  6/ 10], step: [  351/  390], loss: [0.2878], avg loss: [0.3297], time: [103.7657ms]\n",
+      "Epoch: [  6/ 10], step: [  352/  390], loss: [0.3596], avg loss: [0.3298], time: [99.0610ms]\n",
+      "Epoch: [  6/ 10], step: [  353/  390], loss: [0.2053], avg loss: [0.3295], time: [105.2699ms]\n",
+      "Epoch: [  6/ 10], step: [  354/  390], loss: [0.3241], avg loss: [0.3294], time: [101.9120ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  6/ 10], step: [  355/  390], loss: [0.4533], avg loss: [0.3298], time: [105.9098ms]\n",
+      "Epoch: [  6/ 10], step: [  356/  390], loss: [0.2419], avg loss: [0.3295], time: [103.5061ms]\n",
+      "Epoch: [  6/ 10], step: [  357/  390], loss: [0.2371], avg loss: [0.3293], time: [102.0308ms]\n",
+      "Epoch: [  6/ 10], step: [  358/  390], loss: [0.3193], avg loss: [0.3293], time: [102.3316ms]\n",
+      "Epoch: [  6/ 10], step: [  359/  390], loss: [0.4685], avg loss: [0.3296], time: [103.1935ms]\n",
+      "Epoch: [  6/ 10], step: [  360/  390], loss: [0.3362], avg loss: [0.3297], time: [103.2398ms]\n",
+      "Epoch: [  6/ 10], step: [  361/  390], loss: [0.4437], avg loss: [0.3300], time: [105.7558ms]\n",
+      "Epoch: [  6/ 10], step: [  362/  390], loss: [0.3613], avg loss: [0.3301], time: [100.1587ms]\n",
+      "Epoch: [  6/ 10], step: [  363/  390], loss: [0.4118], avg loss: [0.3303], time: [106.5342ms]\n",
+      "Epoch: [  6/ 10], step: [  364/  390], loss: [0.3095], avg loss: [0.3302], time: [103.8628ms]\n",
+      "Epoch: [  6/ 10], step: [  365/  390], loss: [0.2669], avg loss: [0.3301], time: [106.3886ms]\n",
+      "Epoch: [  6/ 10], step: [  366/  390], loss: [0.2606], avg loss: [0.3299], time: [106.6759ms]\n",
+      "Epoch: [  6/ 10], step: [  367/  390], loss: [0.3994], avg loss: [0.3301], time: [104.9557ms]\n",
+      "Epoch: [  6/ 10], step: [  368/  390], loss: [0.2873], avg loss: [0.3299], time: [107.0628ms]\n",
+      "Epoch: [  6/ 10], step: [  369/  390], loss: [0.2830], avg loss: [0.3298], time: [108.3372ms]\n",
+      "Epoch: [  6/ 10], step: [  370/  390], loss: [0.2995], avg loss: [0.3297], time: [105.0665ms]\n",
+      "Epoch: [  6/ 10], step: [  371/  390], loss: [0.2545], avg loss: [0.3295], time: [103.4839ms]\n",
+      "Epoch: [  6/ 10], step: [  372/  390], loss: [0.2930], avg loss: [0.3294], time: [102.1507ms]\n",
+      "Epoch: [  6/ 10], step: [  373/  390], loss: [0.3777], avg loss: [0.3296], time: [104.1732ms]\n",
+      "Epoch: [  6/ 10], step: [  374/  390], loss: [0.5867], avg loss: [0.3302], time: [103.1616ms]\n",
+      "Epoch: [  6/ 10], step: [  375/  390], loss: [0.2580], avg loss: [0.3301], time: [105.5446ms]\n",
+      "Epoch: [  6/ 10], step: [  376/  390], loss: [0.1726], avg loss: [0.3296], time: [104.7406ms]\n",
+      "Epoch: [  6/ 10], step: [  377/  390], loss: [0.2685], avg loss: [0.3295], time: [105.3722ms]\n",
+      "Epoch: [  6/ 10], step: [  378/  390], loss: [0.2625], avg loss: [0.3293], time: [105.3462ms]\n",
+      "Epoch: [  6/ 10], step: [  379/  390], loss: [0.2591], avg loss: [0.3291], time: [106.2505ms]\n",
+      "Epoch: [  6/ 10], step: [  380/  390], loss: [0.3863], avg loss: [0.3293], time: [102.0939ms]\n",
+      "Epoch: [  6/ 10], step: [  381/  390], loss: [0.2968], avg loss: [0.3292], time: [107.4014ms]\n",
+      "Epoch: [  6/ 10], step: [  382/  390], loss: [0.3835], avg loss: [0.3293], time: [102.7396ms]\n",
+      "Epoch: [  6/ 10], step: [  383/  390], loss: [0.4430], avg loss: [0.3296], time: [103.3907ms]\n",
+      "Epoch: [  6/ 10], step: [  384/  390], loss: [0.4552], avg loss: [0.3299], time: [104.4273ms]\n",
+      "Epoch: [  6/ 10], step: [  385/  390], loss: [0.2496], avg loss: [0.3297], time: [103.3270ms]\n",
+      "Epoch: [  6/ 10], step: [  386/  390], loss: [0.2851], avg loss: [0.3296], time: [102.7114ms]\n",
+      "Epoch: [  6/ 10], step: [  387/  390], loss: [0.2592], avg loss: [0.3294], time: [103.4312ms]\n",
+      "Epoch: [  6/ 10], step: [  388/  390], loss: [0.3486], avg loss: [0.3295], time: [103.0850ms]\n",
+      "Epoch: [  6/ 10], step: [  389/  390], loss: [0.4242], avg loss: [0.3297], time: [104.1567ms]\n",
+      "Epoch: [  6/ 10], step: [  390/  390], loss: [0.4188], avg loss: [0.3300], time: [879.6704ms]\n",
+      "Epoch time: 41775.477, per step time: 107.117\n",
+      "Epoch time: 41775.770, per step time: 107.117, avg loss: 0.330\n",
+      "************************************************************\n",
+      "Epoch: [  7/ 10], step: [    1/  390], loss: [0.4987], avg loss: [0.4987], time: [103.4799ms]\n",
+      "Epoch: [  7/ 10], step: [    2/  390], loss: [0.2668], avg loss: [0.3827], time: [105.1037ms]\n",
+      "Epoch: [  7/ 10], step: [    3/  390], loss: [0.2438], avg loss: [0.3364], time: [105.2711ms]\n",
+      "Epoch: [  7/ 10], step: [    4/  390], loss: [0.2162], avg loss: [0.3064], time: [103.6854ms]\n",
+      "Epoch: [  7/ 10], step: [    5/  390], loss: [0.2195], avg loss: [0.2890], time: [109.2501ms]\n",
+      "Epoch: [  7/ 10], step: [    6/  390], loss: [0.3050], avg loss: [0.2917], time: [105.3102ms]\n",
+      "Epoch: [  7/ 10], step: [    7/  390], loss: [0.2998], avg loss: [0.2928], time: [104.3055ms]\n",
+      "Epoch: [  7/ 10], step: [    8/  390], loss: [0.2066], avg loss: [0.2820], time: [106.9927ms]\n",
+      "Epoch: [  7/ 10], step: [    9/  390], loss: [0.2900], avg loss: [0.2829], time: [107.9371ms]\n",
+      "Epoch: [  7/ 10], step: [   10/  390], loss: [0.3204], avg loss: [0.2867], time: [107.1513ms]\n",
+      "Epoch: [  7/ 10], step: [   11/  390], loss: [0.3092], avg loss: [0.2887], time: [109.4887ms]\n",
+      "Epoch: [  7/ 10], step: [   12/  390], loss: [0.2089], avg loss: [0.2821], time: [104.9278ms]\n",
+      "Epoch: [  7/ 10], step: [   13/  390], loss: [0.4390], avg loss: [0.2941], time: [104.5280ms]\n",
+      "Epoch: [  7/ 10], step: [   14/  390], loss: [0.2447], avg loss: [0.2906], time: [109.8156ms]\n",
+      "Epoch: [  7/ 10], step: [   15/  390], loss: [0.3001], avg loss: [0.2912], time: [108.1660ms]\n",
+      "Epoch: [  7/ 10], step: [   16/  390], loss: [0.2784], avg loss: [0.2904], time: [106.7076ms]\n",
+      "Epoch: [  7/ 10], step: [   17/  390], loss: [0.3556], avg loss: [0.2943], time: [105.6578ms]\n",
+      "Epoch: [  7/ 10], step: [   18/  390], loss: [0.4071], avg loss: [0.3005], time: [103.9522ms]\n",
+      "Epoch: [  7/ 10], step: [   19/  390], loss: [0.3229], avg loss: [0.3017], time: [103.2581ms]\n",
+      "Epoch: [  7/ 10], step: [   20/  390], loss: [0.3676], avg loss: [0.3050], time: [103.9524ms]\n",
+      "Epoch: [  7/ 10], step: [   21/  390], loss: [0.4012], avg loss: [0.3096], time: [107.7132ms]\n",
+      "Epoch: [  7/ 10], step: [   22/  390], loss: [0.2647], avg loss: [0.3075], time: [108.6879ms]\n",
+      "Epoch: [  7/ 10], step: [   23/  390], loss: [0.2700], avg loss: [0.3059], time: [106.2062ms]\n",
+      "Epoch: [  7/ 10], step: [   24/  390], loss: [0.2553], avg loss: [0.3038], time: [104.6212ms]\n",
+      "Epoch: [  7/ 10], step: [   25/  390], loss: [0.3872], avg loss: [0.3071], time: [104.6693ms]\n",
+      "Epoch: [  7/ 10], step: [   26/  390], loss: [0.2646], avg loss: [0.3055], time: [104.0697ms]\n",
+      "Epoch: [  7/ 10], step: [   27/  390], loss: [0.4048], avg loss: [0.3092], time: [106.9255ms]\n",
+      "Epoch: [  7/ 10], step: [   28/  390], loss: [0.2702], avg loss: [0.3078], time: [103.9541ms]\n",
+      "Epoch: [  7/ 10], step: [   29/  390], loss: [0.2565], avg loss: [0.3060], time: [105.5868ms]\n",
+      "Epoch: [  7/ 10], step: [   30/  390], loss: [0.3814], avg loss: [0.3085], time: [108.4297ms]\n",
+      "Epoch: [  7/ 10], step: [   31/  390], loss: [0.2905], avg loss: [0.3080], time: [106.5609ms]\n",
+      "Epoch: [  7/ 10], step: [   32/  390], loss: [0.3505], avg loss: [0.3093], time: [103.4110ms]\n",
+      "Epoch: [  7/ 10], step: [   33/  390], loss: [0.2309], avg loss: [0.3069], time: [105.4876ms]\n",
+      "Epoch: [  7/ 10], step: [   34/  390], loss: [0.2800], avg loss: [0.3061], time: [106.8888ms]\n",
+      "Epoch: [  7/ 10], step: [   35/  390], loss: [0.2286], avg loss: [0.3039], time: [104.6867ms]\n",
+      "Epoch: [  7/ 10], step: [   36/  390], loss: [0.2181], avg loss: [0.3015], time: [108.2883ms]\n",
+      "Epoch: [  7/ 10], step: [   37/  390], loss: [0.3667], avg loss: [0.3033], time: [104.5008ms]\n",
+      "Epoch: [  7/ 10], step: [   38/  390], loss: [0.3457], avg loss: [0.3044], time: [105.8502ms]\n",
+      "Epoch: [  7/ 10], step: [   39/  390], loss: [0.3112], avg loss: [0.3046], time: [107.2445ms]\n",
+      "Epoch: [  7/ 10], step: [   40/  390], loss: [0.2804], avg loss: [0.3040], time: [106.5450ms]\n",
+      "Epoch: [  7/ 10], step: [   41/  390], loss: [0.2552], avg loss: [0.3028], time: [103.1537ms]\n",
+      "Epoch: [  7/ 10], step: [   42/  390], loss: [0.1920], avg loss: [0.3001], time: [104.1739ms]\n",
+      "Epoch: [  7/ 10], step: [   43/  390], loss: [0.3377], avg loss: [0.3010], time: [103.8806ms]\n",
+      "Epoch: [  7/ 10], step: [   44/  390], loss: [0.2705], avg loss: [0.3003], time: [104.0616ms]\n",
+      "Epoch: [  7/ 10], step: [   45/  390], loss: [0.4264], avg loss: [0.3031], time: [105.0491ms]\n",
+      "Epoch: [  7/ 10], step: [   46/  390], loss: [0.2829], avg loss: [0.3027], time: [107.9972ms]\n",
+      "Epoch: [  7/ 10], step: [   47/  390], loss: [0.4340], avg loss: [0.3055], time: [103.6415ms]\n",
+      "Epoch: [  7/ 10], step: [   48/  390], loss: [0.2982], avg loss: [0.3053], time: [105.5567ms]\n",
+      "Epoch: [  7/ 10], step: [   49/  390], loss: [0.2619], avg loss: [0.3044], time: [103.4281ms]\n",
+      "Epoch: [  7/ 10], step: [   50/  390], loss: [0.3331], avg loss: [0.3050], time: [105.3843ms]\n",
+      "Epoch: [  7/ 10], step: [   51/  390], loss: [0.2737], avg loss: [0.3044], time: [102.9415ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  7/ 10], step: [   52/  390], loss: [0.3062], avg loss: [0.3044], time: [106.5502ms]\n",
+      "Epoch: [  7/ 10], step: [   53/  390], loss: [0.3455], avg loss: [0.3052], time: [105.5696ms]\n",
+      "Epoch: [  7/ 10], step: [   54/  390], loss: [0.3581], avg loss: [0.3062], time: [104.5468ms]\n",
+      "Epoch: [  7/ 10], step: [   55/  390], loss: [0.2514], avg loss: [0.3052], time: [104.1136ms]\n",
+      "Epoch: [  7/ 10], step: [   56/  390], loss: [0.3478], avg loss: [0.3060], time: [109.1614ms]\n",
+      "Epoch: [  7/ 10], step: [   57/  390], loss: [0.2962], avg loss: [0.3058], time: [108.6161ms]\n",
+      "Epoch: [  7/ 10], step: [   58/  390], loss: [0.2631], avg loss: [0.3050], time: [104.1448ms]\n",
+      "Epoch: [  7/ 10], step: [   59/  390], loss: [0.2864], avg loss: [0.3047], time: [105.4285ms]\n",
+      "Epoch: [  7/ 10], step: [   60/  390], loss: [0.3093], avg loss: [0.3048], time: [104.7280ms]\n",
+      "Epoch: [  7/ 10], step: [   61/  390], loss: [0.2864], avg loss: [0.3045], time: [103.9274ms]\n",
+      "Epoch: [  7/ 10], step: [   62/  390], loss: [0.1889], avg loss: [0.3026], time: [105.6631ms]\n",
+      "Epoch: [  7/ 10], step: [   63/  390], loss: [0.3674], avg loss: [0.3037], time: [108.4349ms]\n",
+      "Epoch: [  7/ 10], step: [   64/  390], loss: [0.3365], avg loss: [0.3042], time: [103.7886ms]\n",
+      "Epoch: [  7/ 10], step: [   65/  390], loss: [0.3307], avg loss: [0.3046], time: [105.7844ms]\n",
+      "Epoch: [  7/ 10], step: [   66/  390], loss: [0.1550], avg loss: [0.3023], time: [104.3255ms]\n",
+      "Epoch: [  7/ 10], step: [   67/  390], loss: [0.2388], avg loss: [0.3014], time: [105.9399ms]\n",
+      "Epoch: [  7/ 10], step: [   68/  390], loss: [0.3041], avg loss: [0.3014], time: [106.5049ms]\n",
+      "Epoch: [  7/ 10], step: [   69/  390], loss: [0.3472], avg loss: [0.3021], time: [105.2754ms]\n",
+      "Epoch: [  7/ 10], step: [   70/  390], loss: [0.3063], avg loss: [0.3021], time: [107.3632ms]\n",
+      "Epoch: [  7/ 10], step: [   71/  390], loss: [0.2721], avg loss: [0.3017], time: [103.0774ms]\n",
+      "Epoch: [  7/ 10], step: [   72/  390], loss: [0.2984], avg loss: [0.3017], time: [108.0120ms]\n",
+      "Epoch: [  7/ 10], step: [   73/  390], loss: [0.2822], avg loss: [0.3014], time: [107.9943ms]\n",
+      "Epoch: [  7/ 10], step: [   74/  390], loss: [0.2518], avg loss: [0.3007], time: [103.7858ms]\n",
+      "Epoch: [  7/ 10], step: [   75/  390], loss: [0.3445], avg loss: [0.3013], time: [105.7239ms]\n",
+      "Epoch: [  7/ 10], step: [   76/  390], loss: [0.2901], avg loss: [0.3012], time: [107.0602ms]\n",
+      "Epoch: [  7/ 10], step: [   77/  390], loss: [0.3076], avg loss: [0.3013], time: [104.1758ms]\n",
+      "Epoch: [  7/ 10], step: [   78/  390], loss: [0.1980], avg loss: [0.2999], time: [105.1855ms]\n",
+      "Epoch: [  7/ 10], step: [   79/  390], loss: [0.1895], avg loss: [0.2985], time: [103.4663ms]\n",
+      "Epoch: [  7/ 10], step: [   80/  390], loss: [0.2033], avg loss: [0.2973], time: [106.2453ms]\n",
+      "Epoch: [  7/ 10], step: [   81/  390], loss: [0.2264], avg loss: [0.2965], time: [103.9464ms]\n",
+      "Epoch: [  7/ 10], step: [   82/  390], loss: [0.2937], avg loss: [0.2964], time: [104.4967ms]\n",
+      "Epoch: [  7/ 10], step: [   83/  390], loss: [0.2607], avg loss: [0.2960], time: [104.8219ms]\n",
+      "Epoch: [  7/ 10], step: [   84/  390], loss: [0.4120], avg loss: [0.2974], time: [106.9705ms]\n",
+      "Epoch: [  7/ 10], step: [   85/  390], loss: [0.2139], avg loss: [0.2964], time: [109.8955ms]\n",
+      "Epoch: [  7/ 10], step: [   86/  390], loss: [0.2820], avg loss: [0.2962], time: [104.9821ms]\n",
+      "Epoch: [  7/ 10], step: [   87/  390], loss: [0.4323], avg loss: [0.2978], time: [105.1698ms]\n",
+      "Epoch: [  7/ 10], step: [   88/  390], loss: [0.3326], avg loss: [0.2982], time: [105.3722ms]\n",
+      "Epoch: [  7/ 10], step: [   89/  390], loss: [0.3487], avg loss: [0.2988], time: [104.0766ms]\n",
+      "Epoch: [  7/ 10], step: [   90/  390], loss: [0.3475], avg loss: [0.2993], time: [102.5507ms]\n",
+      "Epoch: [  7/ 10], step: [   91/  390], loss: [0.3121], avg loss: [0.2994], time: [105.6521ms]\n",
+      "Epoch: [  7/ 10], step: [   92/  390], loss: [0.3437], avg loss: [0.2999], time: [103.9290ms]\n",
+      "Epoch: [  7/ 10], step: [   93/  390], loss: [0.3428], avg loss: [0.3004], time: [109.7741ms]\n",
+      "Epoch: [  7/ 10], step: [   94/  390], loss: [0.3187], avg loss: [0.3006], time: [107.1169ms]\n",
+      "Epoch: [  7/ 10], step: [   95/  390], loss: [0.2734], avg loss: [0.3003], time: [103.1713ms]\n",
+      "Epoch: [  7/ 10], step: [   96/  390], loss: [0.4287], avg loss: [0.3016], time: [107.0080ms]\n",
+      "Epoch: [  7/ 10], step: [   97/  390], loss: [0.2319], avg loss: [0.3009], time: [103.9279ms]\n",
+      "Epoch: [  7/ 10], step: [   98/  390], loss: [0.2512], avg loss: [0.3004], time: [107.6398ms]\n",
+      "Epoch: [  7/ 10], step: [   99/  390], loss: [0.3681], avg loss: [0.3011], time: [105.2694ms]\n",
+      "Epoch: [  7/ 10], step: [  100/  390], loss: [0.2600], avg loss: [0.3007], time: [102.6216ms]\n",
+      "Epoch: [  7/ 10], step: [  101/  390], loss: [0.3838], avg loss: [0.3015], time: [105.0179ms]\n",
+      "Epoch: [  7/ 10], step: [  102/  390], loss: [0.2613], avg loss: [0.3011], time: [105.4828ms]\n",
+      "Epoch: [  7/ 10], step: [  103/  390], loss: [0.2161], avg loss: [0.3003], time: [105.6345ms]\n",
+      "Epoch: [  7/ 10], step: [  104/  390], loss: [0.2999], avg loss: [0.3003], time: [107.4884ms]\n",
+      "Epoch: [  7/ 10], step: [  105/  390], loss: [0.2319], avg loss: [0.2996], time: [104.6476ms]\n",
+      "Epoch: [  7/ 10], step: [  106/  390], loss: [0.3333], avg loss: [0.2999], time: [107.5947ms]\n",
+      "Epoch: [  7/ 10], step: [  107/  390], loss: [0.2740], avg loss: [0.2997], time: [105.1531ms]\n",
+      "Epoch: [  7/ 10], step: [  108/  390], loss: [0.2087], avg loss: [0.2989], time: [102.8006ms]\n",
+      "Epoch: [  7/ 10], step: [  109/  390], loss: [0.3952], avg loss: [0.2997], time: [104.7370ms]\n",
+      "Epoch: [  7/ 10], step: [  110/  390], loss: [0.1982], avg loss: [0.2988], time: [108.3288ms]\n",
+      "Epoch: [  7/ 10], step: [  111/  390], loss: [0.3236], avg loss: [0.2990], time: [104.2867ms]\n",
+      "Epoch: [  7/ 10], step: [  112/  390], loss: [0.3696], avg loss: [0.2997], time: [107.6927ms]\n",
+      "Epoch: [  7/ 10], step: [  113/  390], loss: [0.2700], avg loss: [0.2994], time: [104.3582ms]\n",
+      "Epoch: [  7/ 10], step: [  114/  390], loss: [0.2315], avg loss: [0.2988], time: [103.0068ms]\n",
+      "Epoch: [  7/ 10], step: [  115/  390], loss: [0.3591], avg loss: [0.2993], time: [103.8775ms]\n",
+      "Epoch: [  7/ 10], step: [  116/  390], loss: [0.3878], avg loss: [0.3001], time: [103.0214ms]\n",
+      "Epoch: [  7/ 10], step: [  117/  390], loss: [0.2875], avg loss: [0.3000], time: [106.9419ms]\n",
+      "Epoch: [  7/ 10], step: [  118/  390], loss: [0.2651], avg loss: [0.2997], time: [107.3637ms]\n",
+      "Epoch: [  7/ 10], step: [  119/  390], loss: [0.3032], avg loss: [0.2997], time: [105.6190ms]\n",
+      "Epoch: [  7/ 10], step: [  120/  390], loss: [0.3698], avg loss: [0.3003], time: [104.0704ms]\n",
+      "Epoch: [  7/ 10], step: [  121/  390], loss: [0.4825], avg loss: [0.3018], time: [105.0093ms]\n",
+      "Epoch: [  7/ 10], step: [  122/  390], loss: [0.3069], avg loss: [0.3019], time: [105.5503ms]\n",
+      "Epoch: [  7/ 10], step: [  123/  390], loss: [0.3896], avg loss: [0.3026], time: [105.6709ms]\n",
+      "Epoch: [  7/ 10], step: [  124/  390], loss: [0.3294], avg loss: [0.3028], time: [106.7777ms]\n",
+      "Epoch: [  7/ 10], step: [  125/  390], loss: [0.2650], avg loss: [0.3025], time: [101.7442ms]\n",
+      "Epoch: [  7/ 10], step: [  126/  390], loss: [0.3385], avg loss: [0.3028], time: [104.1543ms]\n",
+      "Epoch: [  7/ 10], step: [  127/  390], loss: [0.3434], avg loss: [0.3031], time: [104.6388ms]\n",
+      "Epoch: [  7/ 10], step: [  128/  390], loss: [0.3783], avg loss: [0.3037], time: [106.7832ms]\n",
+      "Epoch: [  7/ 10], step: [  129/  390], loss: [0.4386], avg loss: [0.3047], time: [107.1422ms]\n",
+      "Epoch: [  7/ 10], step: [  130/  390], loss: [0.2633], avg loss: [0.3044], time: [105.0718ms]\n",
+      "Epoch: [  7/ 10], step: [  131/  390], loss: [0.3878], avg loss: [0.3050], time: [105.1097ms]\n",
+      "Epoch: [  7/ 10], step: [  132/  390], loss: [0.2874], avg loss: [0.3049], time: [107.7993ms]\n",
+      "Epoch: [  7/ 10], step: [  133/  390], loss: [0.4297], avg loss: [0.3058], time: [103.5244ms]\n",
+      "Epoch: [  7/ 10], step: [  134/  390], loss: [0.4489], avg loss: [0.3069], time: [105.8481ms]\n",
+      "Epoch: [  7/ 10], step: [  135/  390], loss: [0.4091], avg loss: [0.3077], time: [109.1225ms]\n",
+      "Epoch: [  7/ 10], step: [  136/  390], loss: [0.3105], avg loss: [0.3077], time: [102.7782ms]\n",
+      "Epoch: [  7/ 10], step: [  137/  390], loss: [0.3260], avg loss: [0.3078], time: [104.6405ms]\n",
+      "Epoch: [  7/ 10], step: [  138/  390], loss: [0.4096], avg loss: [0.3086], time: [103.8883ms]\n",
+      "Epoch: [  7/ 10], step: [  139/  390], loss: [0.3988], avg loss: [0.3092], time: [105.8524ms]\n",
+      "Epoch: [  7/ 10], step: [  140/  390], loss: [0.1529], avg loss: [0.3081], time: [106.8971ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  7/ 10], step: [  141/  390], loss: [0.4725], avg loss: [0.3093], time: [102.4973ms]\n",
+      "Epoch: [  7/ 10], step: [  142/  390], loss: [0.3928], avg loss: [0.3099], time: [105.4571ms]\n",
+      "Epoch: [  7/ 10], step: [  143/  390], loss: [0.3646], avg loss: [0.3102], time: [103.7819ms]\n",
+      "Epoch: [  7/ 10], step: [  144/  390], loss: [0.2601], avg loss: [0.3099], time: [107.0786ms]\n",
+      "Epoch: [  7/ 10], step: [  145/  390], loss: [0.4328], avg loss: [0.3107], time: [107.2645ms]\n",
+      "Epoch: [  7/ 10], step: [  146/  390], loss: [0.4251], avg loss: [0.3115], time: [104.9128ms]\n",
+      "Epoch: [  7/ 10], step: [  147/  390], loss: [0.2112], avg loss: [0.3108], time: [105.6156ms]\n",
+      "Epoch: [  7/ 10], step: [  148/  390], loss: [0.3383], avg loss: [0.3110], time: [102.7992ms]\n",
+      "Epoch: [  7/ 10], step: [  149/  390], loss: [0.3793], avg loss: [0.3115], time: [109.6804ms]\n",
+      "Epoch: [  7/ 10], step: [  150/  390], loss: [0.2300], avg loss: [0.3109], time: [105.2263ms]\n",
+      "Epoch: [  7/ 10], step: [  151/  390], loss: [0.3427], avg loss: [0.3111], time: [104.9993ms]\n",
+      "Epoch: [  7/ 10], step: [  152/  390], loss: [0.3089], avg loss: [0.3111], time: [109.7083ms]\n",
+      "Epoch: [  7/ 10], step: [  153/  390], loss: [0.3507], avg loss: [0.3114], time: [106.1387ms]\n",
+      "Epoch: [  7/ 10], step: [  154/  390], loss: [0.2947], avg loss: [0.3113], time: [104.9142ms]\n",
+      "Epoch: [  7/ 10], step: [  155/  390], loss: [0.2489], avg loss: [0.3109], time: [105.5384ms]\n",
+      "Epoch: [  7/ 10], step: [  156/  390], loss: [0.2677], avg loss: [0.3106], time: [103.9970ms]\n",
+      "Epoch: [  7/ 10], step: [  157/  390], loss: [0.3559], avg loss: [0.3109], time: [104.3847ms]\n",
+      "Epoch: [  7/ 10], step: [  158/  390], loss: [0.4911], avg loss: [0.3120], time: [103.8167ms]\n",
+      "Epoch: [  7/ 10], step: [  159/  390], loss: [0.1923], avg loss: [0.3113], time: [104.1927ms]\n",
+      "Epoch: [  7/ 10], step: [  160/  390], loss: [0.2644], avg loss: [0.3110], time: [108.0878ms]\n",
+      "Epoch: [  7/ 10], step: [  161/  390], loss: [0.2804], avg loss: [0.3108], time: [105.1543ms]\n",
+      "Epoch: [  7/ 10], step: [  162/  390], loss: [0.4733], avg loss: [0.3118], time: [106.6024ms]\n",
+      "Epoch: [  7/ 10], step: [  163/  390], loss: [0.3742], avg loss: [0.3122], time: [108.6566ms]\n",
+      "Epoch: [  7/ 10], step: [  164/  390], loss: [0.1808], avg loss: [0.3114], time: [107.8246ms]\n",
+      "Epoch: [  7/ 10], step: [  165/  390], loss: [0.3073], avg loss: [0.3114], time: [106.7700ms]\n",
+      "Epoch: [  7/ 10], step: [  166/  390], loss: [0.2948], avg loss: [0.3113], time: [104.8410ms]\n",
+      "Epoch: [  7/ 10], step: [  167/  390], loss: [0.2632], avg loss: [0.3110], time: [105.4540ms]\n",
+      "Epoch: [  7/ 10], step: [  168/  390], loss: [0.3022], avg loss: [0.3109], time: [106.5183ms]\n",
+      "Epoch: [  7/ 10], step: [  169/  390], loss: [0.2658], avg loss: [0.3107], time: [106.8065ms]\n",
+      "Epoch: [  7/ 10], step: [  170/  390], loss: [0.2519], avg loss: [0.3103], time: [107.7523ms]\n",
+      "Epoch: [  7/ 10], step: [  171/  390], loss: [0.1923], avg loss: [0.3096], time: [103.9245ms]\n",
+      "Epoch: [  7/ 10], step: [  172/  390], loss: [0.4174], avg loss: [0.3102], time: [103.5368ms]\n",
+      "Epoch: [  7/ 10], step: [  173/  390], loss: [0.2779], avg loss: [0.3101], time: [108.7277ms]\n",
+      "Epoch: [  7/ 10], step: [  174/  390], loss: [0.2294], avg loss: [0.3096], time: [105.7508ms]\n",
+      "Epoch: [  7/ 10], step: [  175/  390], loss: [0.3028], avg loss: [0.3096], time: [105.8543ms]\n",
+      "Epoch: [  7/ 10], step: [  176/  390], loss: [0.2897], avg loss: [0.3094], time: [102.3648ms]\n",
+      "Epoch: [  7/ 10], step: [  177/  390], loss: [0.3320], avg loss: [0.3096], time: [107.7371ms]\n",
+      "Epoch: [  7/ 10], step: [  178/  390], loss: [0.4117], avg loss: [0.3101], time: [103.9295ms]\n",
+      "Epoch: [  7/ 10], step: [  179/  390], loss: [0.2853], avg loss: [0.3100], time: [107.1053ms]\n",
+      "Epoch: [  7/ 10], step: [  180/  390], loss: [0.2863], avg loss: [0.3099], time: [104.0466ms]\n",
+      "Epoch: [  7/ 10], step: [  181/  390], loss: [0.2929], avg loss: [0.3098], time: [103.6911ms]\n",
+      "Epoch: [  7/ 10], step: [  182/  390], loss: [0.3603], avg loss: [0.3101], time: [104.2993ms]\n",
+      "Epoch: [  7/ 10], step: [  183/  390], loss: [0.3064], avg loss: [0.3100], time: [107.1732ms]\n",
+      "Epoch: [  7/ 10], step: [  184/  390], loss: [0.3416], avg loss: [0.3102], time: [107.9051ms]\n",
+      "Epoch: [  7/ 10], step: [  185/  390], loss: [0.1937], avg loss: [0.3096], time: [107.5134ms]\n",
+      "Epoch: [  7/ 10], step: [  186/  390], loss: [0.3261], avg loss: [0.3097], time: [103.9124ms]\n",
+      "Epoch: [  7/ 10], step: [  187/  390], loss: [0.4091], avg loss: [0.3102], time: [106.4439ms]\n",
+      "Epoch: [  7/ 10], step: [  188/  390], loss: [0.3246], avg loss: [0.3103], time: [107.5549ms]\n",
+      "Epoch: [  7/ 10], step: [  189/  390], loss: [0.2380], avg loss: [0.3099], time: [109.7984ms]\n",
+      "Epoch: [  7/ 10], step: [  190/  390], loss: [0.3734], avg loss: [0.3102], time: [104.4533ms]\n",
+      "Epoch: [  7/ 10], step: [  191/  390], loss: [0.2739], avg loss: [0.3100], time: [106.0677ms]\n",
+      "Epoch: [  7/ 10], step: [  192/  390], loss: [0.1707], avg loss: [0.3093], time: [108.5434ms]\n",
+      "Epoch: [  7/ 10], step: [  193/  390], loss: [0.2889], avg loss: [0.3092], time: [105.4544ms]\n",
+      "Epoch: [  7/ 10], step: [  194/  390], loss: [0.3508], avg loss: [0.3094], time: [101.5360ms]\n",
+      "Epoch: [  7/ 10], step: [  195/  390], loss: [0.3550], avg loss: [0.3097], time: [103.0908ms]\n",
+      "Epoch: [  7/ 10], step: [  196/  390], loss: [0.3134], avg loss: [0.3097], time: [103.9538ms]\n",
+      "Epoch: [  7/ 10], step: [  197/  390], loss: [0.2662], avg loss: [0.3095], time: [103.5016ms]\n",
+      "Epoch: [  7/ 10], step: [  198/  390], loss: [0.1943], avg loss: [0.3089], time: [104.9876ms]\n",
+      "Epoch: [  7/ 10], step: [  199/  390], loss: [0.2413], avg loss: [0.3085], time: [104.7204ms]\n",
+      "Epoch: [  7/ 10], step: [  200/  390], loss: [0.4060], avg loss: [0.3090], time: [104.4509ms]\n",
+      "Epoch: [  7/ 10], step: [  201/  390], loss: [0.2927], avg loss: [0.3089], time: [105.2222ms]\n",
+      "Epoch: [  7/ 10], step: [  202/  390], loss: [0.4597], avg loss: [0.3097], time: [104.9170ms]\n",
+      "Epoch: [  7/ 10], step: [  203/  390], loss: [0.1949], avg loss: [0.3091], time: [103.9994ms]\n",
+      "Epoch: [  7/ 10], step: [  204/  390], loss: [0.2847], avg loss: [0.3090], time: [102.7470ms]\n",
+      "Epoch: [  7/ 10], step: [  205/  390], loss: [0.2219], avg loss: [0.3086], time: [108.4127ms]\n",
+      "Epoch: [  7/ 10], step: [  206/  390], loss: [0.2121], avg loss: [0.3081], time: [109.1967ms]\n",
+      "Epoch: [  7/ 10], step: [  207/  390], loss: [0.2721], avg loss: [0.3079], time: [104.8143ms]\n",
+      "Epoch: [  7/ 10], step: [  208/  390], loss: [0.3978], avg loss: [0.3084], time: [105.2120ms]\n",
+      "Epoch: [  7/ 10], step: [  209/  390], loss: [0.3549], avg loss: [0.3086], time: [108.5777ms]\n",
+      "Epoch: [  7/ 10], step: [  210/  390], loss: [0.2148], avg loss: [0.3081], time: [104.9058ms]\n",
+      "Epoch: [  7/ 10], step: [  211/  390], loss: [0.3941], avg loss: [0.3085], time: [105.1302ms]\n",
+      "Epoch: [  7/ 10], step: [  212/  390], loss: [0.3572], avg loss: [0.3088], time: [107.9516ms]\n",
+      "Epoch: [  7/ 10], step: [  213/  390], loss: [0.4223], avg loss: [0.3093], time: [105.4211ms]\n",
+      "Epoch: [  7/ 10], step: [  214/  390], loss: [0.3817], avg loss: [0.3096], time: [106.9067ms]\n",
+      "Epoch: [  7/ 10], step: [  215/  390], loss: [0.2850], avg loss: [0.3095], time: [104.4755ms]\n",
+      "Epoch: [  7/ 10], step: [  216/  390], loss: [0.3105], avg loss: [0.3095], time: [104.7082ms]\n",
+      "Epoch: [  7/ 10], step: [  217/  390], loss: [0.2596], avg loss: [0.3093], time: [106.2517ms]\n",
+      "Epoch: [  7/ 10], step: [  218/  390], loss: [0.2437], avg loss: [0.3090], time: [104.4776ms]\n",
+      "Epoch: [  7/ 10], step: [  219/  390], loss: [0.3108], avg loss: [0.3090], time: [102.9851ms]\n",
+      "Epoch: [  7/ 10], step: [  220/  390], loss: [0.2695], avg loss: [0.3088], time: [103.1549ms]\n",
+      "Epoch: [  7/ 10], step: [  221/  390], loss: [0.1840], avg loss: [0.3083], time: [106.4603ms]\n",
+      "Epoch: [  7/ 10], step: [  222/  390], loss: [0.3094], avg loss: [0.3083], time: [107.9845ms]\n",
+      "Epoch: [  7/ 10], step: [  223/  390], loss: [0.3207], avg loss: [0.3083], time: [105.9992ms]\n",
+      "Epoch: [  7/ 10], step: [  224/  390], loss: [0.2268], avg loss: [0.3080], time: [107.3406ms]\n",
+      "Epoch: [  7/ 10], step: [  225/  390], loss: [0.2396], avg loss: [0.3077], time: [105.1574ms]\n",
+      "Epoch: [  7/ 10], step: [  226/  390], loss: [0.1836], avg loss: [0.3071], time: [105.2394ms]\n",
+      "Epoch: [  7/ 10], step: [  227/  390], loss: [0.2902], avg loss: [0.3070], time: [103.7591ms]\n",
+      "Epoch: [  7/ 10], step: [  228/  390], loss: [0.3813], avg loss: [0.3074], time: [103.8945ms]\n",
+      "Epoch: [  7/ 10], step: [  229/  390], loss: [0.2926], avg loss: [0.3073], time: [104.3301ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  7/ 10], step: [  230/  390], loss: [0.4031], avg loss: [0.3077], time: [104.8095ms]\n",
+      "Epoch: [  7/ 10], step: [  231/  390], loss: [0.2659], avg loss: [0.3075], time: [106.7090ms]\n",
+      "Epoch: [  7/ 10], step: [  232/  390], loss: [0.4359], avg loss: [0.3081], time: [105.9318ms]\n",
+      "Epoch: [  7/ 10], step: [  233/  390], loss: [0.2296], avg loss: [0.3078], time: [104.9607ms]\n",
+      "Epoch: [  7/ 10], step: [  234/  390], loss: [0.3760], avg loss: [0.3080], time: [104.5735ms]\n",
+      "Epoch: [  7/ 10], step: [  235/  390], loss: [0.1930], avg loss: [0.3076], time: [106.2450ms]\n",
+      "Epoch: [  7/ 10], step: [  236/  390], loss: [0.4012], avg loss: [0.3080], time: [105.0429ms]\n",
+      "Epoch: [  7/ 10], step: [  237/  390], loss: [0.1525], avg loss: [0.3073], time: [103.3261ms]\n",
+      "Epoch: [  7/ 10], step: [  238/  390], loss: [0.4822], avg loss: [0.3080], time: [105.6840ms]\n",
+      "Epoch: [  7/ 10], step: [  239/  390], loss: [0.2978], avg loss: [0.3080], time: [108.5942ms]\n",
+      "Epoch: [  7/ 10], step: [  240/  390], loss: [0.2879], avg loss: [0.3079], time: [105.2735ms]\n",
+      "Epoch: [  7/ 10], step: [  241/  390], loss: [0.3184], avg loss: [0.3079], time: [103.0774ms]\n",
+      "Epoch: [  7/ 10], step: [  242/  390], loss: [0.3067], avg loss: [0.3079], time: [108.2737ms]\n",
+      "Epoch: [  7/ 10], step: [  243/  390], loss: [0.3059], avg loss: [0.3079], time: [105.4873ms]\n",
+      "Epoch: [  7/ 10], step: [  244/  390], loss: [0.3247], avg loss: [0.3080], time: [107.8405ms]\n",
+      "Epoch: [  7/ 10], step: [  245/  390], loss: [0.5435], avg loss: [0.3090], time: [104.5954ms]\n",
+      "Epoch: [  7/ 10], step: [  246/  390], loss: [0.3728], avg loss: [0.3092], time: [105.0222ms]\n",
+      "Epoch: [  7/ 10], step: [  247/  390], loss: [0.3015], avg loss: [0.3092], time: [104.3298ms]\n",
+      "Epoch: [  7/ 10], step: [  248/  390], loss: [0.2837], avg loss: [0.3091], time: [105.3295ms]\n",
+      "Epoch: [  7/ 10], step: [  249/  390], loss: [0.2077], avg loss: [0.3087], time: [104.7432ms]\n",
+      "Epoch: [  7/ 10], step: [  250/  390], loss: [0.1852], avg loss: [0.3082], time: [107.6858ms]\n",
+      "Epoch: [  7/ 10], step: [  251/  390], loss: [0.2704], avg loss: [0.3080], time: [105.6550ms]\n",
+      "Epoch: [  7/ 10], step: [  252/  390], loss: [0.3132], avg loss: [0.3081], time: [105.1145ms]\n",
+      "Epoch: [  7/ 10], step: [  253/  390], loss: [0.2244], avg loss: [0.3077], time: [108.8803ms]\n",
+      "Epoch: [  7/ 10], step: [  254/  390], loss: [0.2337], avg loss: [0.3074], time: [104.8040ms]\n",
+      "Epoch: [  7/ 10], step: [  255/  390], loss: [0.2662], avg loss: [0.3073], time: [105.6545ms]\n",
+      "Epoch: [  7/ 10], step: [  256/  390], loss: [0.1683], avg loss: [0.3067], time: [105.4611ms]\n",
+      "Epoch: [  7/ 10], step: [  257/  390], loss: [0.3610], avg loss: [0.3069], time: [106.9396ms]\n",
+      "Epoch: [  7/ 10], step: [  258/  390], loss: [0.2154], avg loss: [0.3066], time: [105.5369ms]\n",
+      "Epoch: [  7/ 10], step: [  259/  390], loss: [0.3245], avg loss: [0.3067], time: [105.3894ms]\n",
+      "Epoch: [  7/ 10], step: [  260/  390], loss: [0.3826], avg loss: [0.3069], time: [107.1212ms]\n",
+      "Epoch: [  7/ 10], step: [  261/  390], loss: [0.4108], avg loss: [0.3073], time: [106.4579ms]\n",
+      "Epoch: [  7/ 10], step: [  262/  390], loss: [0.2967], avg loss: [0.3073], time: [107.2414ms]\n",
+      "Epoch: [  7/ 10], step: [  263/  390], loss: [0.2311], avg loss: [0.3070], time: [103.9524ms]\n",
+      "Epoch: [  7/ 10], step: [  264/  390], loss: [0.3229], avg loss: [0.3071], time: [104.0058ms]\n",
+      "Epoch: [  7/ 10], step: [  265/  390], loss: [0.3456], avg loss: [0.3072], time: [105.0808ms]\n",
+      "Epoch: [  7/ 10], step: [  266/  390], loss: [0.2595], avg loss: [0.3070], time: [108.4282ms]\n",
+      "Epoch: [  7/ 10], step: [  267/  390], loss: [0.2446], avg loss: [0.3068], time: [106.0038ms]\n",
+      "Epoch: [  7/ 10], step: [  268/  390], loss: [0.2589], avg loss: [0.3066], time: [104.6507ms]\n",
+      "Epoch: [  7/ 10], step: [  269/  390], loss: [0.3324], avg loss: [0.3067], time: [102.9410ms]\n",
+      "Epoch: [  7/ 10], step: [  270/  390], loss: [0.2709], avg loss: [0.3066], time: [105.9430ms]\n",
+      "Epoch: [  7/ 10], step: [  271/  390], loss: [0.3636], avg loss: [0.3068], time: [104.7845ms]\n",
+      "Epoch: [  7/ 10], step: [  272/  390], loss: [0.3574], avg loss: [0.3070], time: [108.6175ms]\n",
+      "Epoch: [  7/ 10], step: [  273/  390], loss: [0.3321], avg loss: [0.3071], time: [104.9302ms]\n",
+      "Epoch: [  7/ 10], step: [  274/  390], loss: [0.2917], avg loss: [0.3070], time: [108.9404ms]\n",
+      "Epoch: [  7/ 10], step: [  275/  390], loss: [0.2740], avg loss: [0.3069], time: [105.9654ms]\n",
+      "Epoch: [  7/ 10], step: [  276/  390], loss: [0.2684], avg loss: [0.3068], time: [103.2541ms]\n",
+      "Epoch: [  7/ 10], step: [  277/  390], loss: [0.2436], avg loss: [0.3065], time: [102.8297ms]\n",
+      "Epoch: [  7/ 10], step: [  278/  390], loss: [0.4741], avg loss: [0.3071], time: [106.4193ms]\n",
+      "Epoch: [  7/ 10], step: [  279/  390], loss: [0.3996], avg loss: [0.3075], time: [107.2798ms]\n",
+      "Epoch: [  7/ 10], step: [  280/  390], loss: [0.3023], avg loss: [0.3075], time: [105.6099ms]\n",
+      "Epoch: [  7/ 10], step: [  281/  390], loss: [0.2293], avg loss: [0.3072], time: [105.9947ms]\n",
+      "Epoch: [  7/ 10], step: [  282/  390], loss: [0.3209], avg loss: [0.3072], time: [104.4874ms]\n",
+      "Epoch: [  7/ 10], step: [  283/  390], loss: [0.3115], avg loss: [0.3072], time: [105.0091ms]\n",
+      "Epoch: [  7/ 10], step: [  284/  390], loss: [0.2205], avg loss: [0.3069], time: [104.9547ms]\n",
+      "Epoch: [  7/ 10], step: [  285/  390], loss: [0.2650], avg loss: [0.3068], time: [106.4470ms]\n",
+      "Epoch: [  7/ 10], step: [  286/  390], loss: [0.3380], avg loss: [0.3069], time: [105.9015ms]\n",
+      "Epoch: [  7/ 10], step: [  287/  390], loss: [0.4386], avg loss: [0.3074], time: [108.6755ms]\n",
+      "Epoch: [  7/ 10], step: [  288/  390], loss: [0.3113], avg loss: [0.3074], time: [106.8857ms]\n",
+      "Epoch: [  7/ 10], step: [  289/  390], loss: [0.3227], avg loss: [0.3074], time: [105.7277ms]\n",
+      "Epoch: [  7/ 10], step: [  290/  390], loss: [0.2071], avg loss: [0.3071], time: [105.9880ms]\n",
+      "Epoch: [  7/ 10], step: [  291/  390], loss: [0.3814], avg loss: [0.3073], time: [105.9330ms]\n",
+      "Epoch: [  7/ 10], step: [  292/  390], loss: [0.2602], avg loss: [0.3072], time: [108.2239ms]\n",
+      "Epoch: [  7/ 10], step: [  293/  390], loss: [0.2281], avg loss: [0.3069], time: [103.5922ms]\n",
+      "Epoch: [  7/ 10], step: [  294/  390], loss: [0.4244], avg loss: [0.3073], time: [108.8440ms]\n",
+      "Epoch: [  7/ 10], step: [  295/  390], loss: [0.3539], avg loss: [0.3075], time: [105.6216ms]\n",
+      "Epoch: [  7/ 10], step: [  296/  390], loss: [0.3055], avg loss: [0.3075], time: [103.5883ms]\n",
+      "Epoch: [  7/ 10], step: [  297/  390], loss: [0.2855], avg loss: [0.3074], time: [103.2639ms]\n",
+      "Epoch: [  7/ 10], step: [  298/  390], loss: [0.3432], avg loss: [0.3075], time: [106.2555ms]\n",
+      "Epoch: [  7/ 10], step: [  299/  390], loss: [0.2286], avg loss: [0.3072], time: [110.4863ms]\n",
+      "Epoch: [  7/ 10], step: [  300/  390], loss: [0.3493], avg loss: [0.3074], time: [103.4355ms]\n",
+      "Epoch: [  7/ 10], step: [  301/  390], loss: [0.4564], avg loss: [0.3079], time: [108.3748ms]\n",
+      "Epoch: [  7/ 10], step: [  302/  390], loss: [0.2489], avg loss: [0.3077], time: [104.0170ms]\n",
+      "Epoch: [  7/ 10], step: [  303/  390], loss: [0.2173], avg loss: [0.3074], time: [106.5309ms]\n",
+      "Epoch: [  7/ 10], step: [  304/  390], loss: [0.3805], avg loss: [0.3076], time: [104.9554ms]\n",
+      "Epoch: [  7/ 10], step: [  305/  390], loss: [0.1876], avg loss: [0.3072], time: [104.3003ms]\n",
+      "Epoch: [  7/ 10], step: [  306/  390], loss: [0.4118], avg loss: [0.3076], time: [104.6221ms]\n",
+      "Epoch: [  7/ 10], step: [  307/  390], loss: [0.2634], avg loss: [0.3074], time: [105.2783ms]\n",
+      "Epoch: [  7/ 10], step: [  308/  390], loss: [0.3567], avg loss: [0.3076], time: [104.6295ms]\n",
+      "Epoch: [  7/ 10], step: [  309/  390], loss: [0.4348], avg loss: [0.3080], time: [108.4967ms]\n",
+      "Epoch: [  7/ 10], step: [  310/  390], loss: [0.2597], avg loss: [0.3078], time: [105.5329ms]\n",
+      "Epoch: [  7/ 10], step: [  311/  390], loss: [0.2622], avg loss: [0.3077], time: [108.0813ms]\n",
+      "Epoch: [  7/ 10], step: [  312/  390], loss: [0.3840], avg loss: [0.3079], time: [107.1780ms]\n",
+      "Epoch: [  7/ 10], step: [  313/  390], loss: [0.2901], avg loss: [0.3079], time: [105.6113ms]\n",
+      "Epoch: [  7/ 10], step: [  314/  390], loss: [0.3276], avg loss: [0.3079], time: [108.1054ms]\n",
+      "Epoch: [  7/ 10], step: [  315/  390], loss: [0.2987], avg loss: [0.3079], time: [108.2742ms]\n",
+      "Epoch: [  7/ 10], step: [  316/  390], loss: [0.2979], avg loss: [0.3079], time: [105.0766ms]\n",
+      "Epoch: [  7/ 10], step: [  317/  390], loss: [0.3587], avg loss: [0.3080], time: [106.2484ms]\n",
+      "Epoch: [  7/ 10], step: [  318/  390], loss: [0.3245], avg loss: [0.3081], time: [107.5473ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  7/ 10], step: [  319/  390], loss: [0.2874], avg loss: [0.3080], time: [110.2710ms]\n",
+      "Epoch: [  7/ 10], step: [  320/  390], loss: [0.2773], avg loss: [0.3079], time: [105.9952ms]\n",
+      "Epoch: [  7/ 10], step: [  321/  390], loss: [0.3119], avg loss: [0.3079], time: [103.2131ms]\n",
+      "Epoch: [  7/ 10], step: [  322/  390], loss: [0.5180], avg loss: [0.3086], time: [105.6886ms]\n",
+      "Epoch: [  7/ 10], step: [  323/  390], loss: [0.2819], avg loss: [0.3085], time: [108.1693ms]\n",
+      "Epoch: [  7/ 10], step: [  324/  390], loss: [0.2582], avg loss: [0.3084], time: [105.5784ms]\n",
+      "Epoch: [  7/ 10], step: [  325/  390], loss: [0.3137], avg loss: [0.3084], time: [107.8506ms]\n",
+      "Epoch: [  7/ 10], step: [  326/  390], loss: [0.3719], avg loss: [0.3086], time: [105.4270ms]\n",
+      "Epoch: [  7/ 10], step: [  327/  390], loss: [0.2965], avg loss: [0.3085], time: [106.1039ms]\n",
+      "Epoch: [  7/ 10], step: [  328/  390], loss: [0.2923], avg loss: [0.3085], time: [104.4450ms]\n",
+      "Epoch: [  7/ 10], step: [  329/  390], loss: [0.2939], avg loss: [0.3084], time: [105.1989ms]\n",
+      "Epoch: [  7/ 10], step: [  330/  390], loss: [0.2711], avg loss: [0.3083], time: [106.4591ms]\n",
+      "Epoch: [  7/ 10], step: [  331/  390], loss: [0.2564], avg loss: [0.3082], time: [104.8388ms]\n",
+      "Epoch: [  7/ 10], step: [  332/  390], loss: [0.2319], avg loss: [0.3079], time: [108.9432ms]\n",
+      "Epoch: [  7/ 10], step: [  333/  390], loss: [0.2975], avg loss: [0.3079], time: [110.3246ms]\n",
+      "Epoch: [  7/ 10], step: [  334/  390], loss: [0.6099], avg loss: [0.3088], time: [104.0637ms]\n",
+      "Epoch: [  7/ 10], step: [  335/  390], loss: [0.3109], avg loss: [0.3088], time: [106.6015ms]\n",
+      "Epoch: [  7/ 10], step: [  336/  390], loss: [0.1355], avg loss: [0.3083], time: [103.8537ms]\n",
+      "Epoch: [  7/ 10], step: [  337/  390], loss: [0.4506], avg loss: [0.3087], time: [104.6624ms]\n",
+      "Epoch: [  7/ 10], step: [  338/  390], loss: [0.4515], avg loss: [0.3091], time: [108.9067ms]\n",
+      "Epoch: [  7/ 10], step: [  339/  390], loss: [0.3207], avg loss: [0.3092], time: [106.6246ms]\n",
+      "Epoch: [  7/ 10], step: [  340/  390], loss: [0.3045], avg loss: [0.3092], time: [102.9556ms]\n",
+      "Epoch: [  7/ 10], step: [  341/  390], loss: [0.2666], avg loss: [0.3090], time: [108.9008ms]\n",
+      "Epoch: [  7/ 10], step: [  342/  390], loss: [0.4119], avg loss: [0.3093], time: [106.6031ms]\n",
+      "Epoch: [  7/ 10], step: [  343/  390], loss: [0.2923], avg loss: [0.3093], time: [105.3276ms]\n",
+      "Epoch: [  7/ 10], step: [  344/  390], loss: [0.3069], avg loss: [0.3093], time: [105.5856ms]\n",
+      "Epoch: [  7/ 10], step: [  345/  390], loss: [0.2237], avg loss: [0.3090], time: [104.8748ms]\n",
+      "Epoch: [  7/ 10], step: [  346/  390], loss: [0.2427], avg loss: [0.3088], time: [105.4814ms]\n",
+      "Epoch: [  7/ 10], step: [  347/  390], loss: [0.2578], avg loss: [0.3087], time: [109.9048ms]\n",
+      "Epoch: [  7/ 10], step: [  348/  390], loss: [0.3885], avg loss: [0.3089], time: [105.2608ms]\n",
+      "Epoch: [  7/ 10], step: [  349/  390], loss: [0.2785], avg loss: [0.3088], time: [102.2446ms]\n",
+      "Epoch: [  7/ 10], step: [  350/  390], loss: [0.3561], avg loss: [0.3090], time: [105.5462ms]\n",
+      "Epoch: [  7/ 10], step: [  351/  390], loss: [0.4515], avg loss: [0.3094], time: [106.4477ms]\n",
+      "Epoch: [  7/ 10], step: [  352/  390], loss: [0.2931], avg loss: [0.3093], time: [104.6855ms]\n",
+      "Epoch: [  7/ 10], step: [  353/  390], loss: [0.3824], avg loss: [0.3095], time: [105.3936ms]\n",
+      "Epoch: [  7/ 10], step: [  354/  390], loss: [0.1658], avg loss: [0.3091], time: [107.2855ms]\n",
+      "Epoch: [  7/ 10], step: [  355/  390], loss: [0.4529], avg loss: [0.3095], time: [104.5136ms]\n",
+      "Epoch: [  7/ 10], step: [  356/  390], loss: [0.3766], avg loss: [0.3097], time: [104.5198ms]\n",
+      "Epoch: [  7/ 10], step: [  357/  390], loss: [0.2821], avg loss: [0.3097], time: [103.3289ms]\n",
+      "Epoch: [  7/ 10], step: [  358/  390], loss: [0.2354], avg loss: [0.3094], time: [108.2060ms]\n",
+      "Epoch: [  7/ 10], step: [  359/  390], loss: [0.3754], avg loss: [0.3096], time: [103.9968ms]\n",
+      "Epoch: [  7/ 10], step: [  360/  390], loss: [0.3338], avg loss: [0.3097], time: [101.7914ms]\n",
+      "Epoch: [  7/ 10], step: [  361/  390], loss: [0.3404], avg loss: [0.3098], time: [106.6372ms]\n",
+      "Epoch: [  7/ 10], step: [  362/  390], loss: [0.5074], avg loss: [0.3103], time: [103.7226ms]\n",
+      "Epoch: [  7/ 10], step: [  363/  390], loss: [0.3289], avg loss: [0.3104], time: [107.6303ms]\n",
+      "Epoch: [  7/ 10], step: [  364/  390], loss: [0.2627], avg loss: [0.3102], time: [105.5136ms]\n",
+      "Epoch: [  7/ 10], step: [  365/  390], loss: [0.3471], avg loss: [0.3103], time: [105.0911ms]\n",
+      "Epoch: [  7/ 10], step: [  366/  390], loss: [0.3044], avg loss: [0.3103], time: [105.0951ms]\n",
+      "Epoch: [  7/ 10], step: [  367/  390], loss: [0.4036], avg loss: [0.3106], time: [103.2827ms]\n",
+      "Epoch: [  7/ 10], step: [  368/  390], loss: [0.3972], avg loss: [0.3108], time: [103.8237ms]\n",
+      "Epoch: [  7/ 10], step: [  369/  390], loss: [0.3652], avg loss: [0.3110], time: [105.3357ms]\n",
+      "Epoch: [  7/ 10], step: [  370/  390], loss: [0.3068], avg loss: [0.3110], time: [107.9171ms]\n",
+      "Epoch: [  7/ 10], step: [  371/  390], loss: [0.2776], avg loss: [0.3109], time: [105.5713ms]\n",
+      "Epoch: [  7/ 10], step: [  372/  390], loss: [0.3689], avg loss: [0.3110], time: [108.3863ms]\n",
+      "Epoch: [  7/ 10], step: [  373/  390], loss: [0.3331], avg loss: [0.3111], time: [104.6934ms]\n",
+      "Epoch: [  7/ 10], step: [  374/  390], loss: [0.3642], avg loss: [0.3112], time: [105.8490ms]\n",
+      "Epoch: [  7/ 10], step: [  375/  390], loss: [0.4690], avg loss: [0.3116], time: [106.0085ms]\n",
+      "Epoch: [  7/ 10], step: [  376/  390], loss: [0.3052], avg loss: [0.3116], time: [104.5957ms]\n",
+      "Epoch: [  7/ 10], step: [  377/  390], loss: [0.2689], avg loss: [0.3115], time: [106.7445ms]\n",
+      "Epoch: [  7/ 10], step: [  378/  390], loss: [0.5337], avg loss: [0.3121], time: [107.4522ms]\n",
+      "Epoch: [  7/ 10], step: [  379/  390], loss: [0.2856], avg loss: [0.3120], time: [103.2515ms]\n",
+      "Epoch: [  7/ 10], step: [  380/  390], loss: [0.2056], avg loss: [0.3118], time: [108.4552ms]\n",
+      "Epoch: [  7/ 10], step: [  381/  390], loss: [0.3496], avg loss: [0.3119], time: [102.8993ms]\n",
+      "Epoch: [  7/ 10], step: [  382/  390], loss: [0.3747], avg loss: [0.3120], time: [107.6214ms]\n",
+      "Epoch: [  7/ 10], step: [  383/  390], loss: [0.2499], avg loss: [0.3119], time: [104.1596ms]\n",
+      "Epoch: [  7/ 10], step: [  384/  390], loss: [0.3007], avg loss: [0.3118], time: [109.2494ms]\n",
+      "Epoch: [  7/ 10], step: [  385/  390], loss: [0.2983], avg loss: [0.3118], time: [104.3456ms]\n",
+      "Epoch: [  7/ 10], step: [  386/  390], loss: [0.3484], avg loss: [0.3119], time: [107.7156ms]\n",
+      "Epoch: [  7/ 10], step: [  387/  390], loss: [0.3087], avg loss: [0.3119], time: [109.5369ms]\n",
+      "Epoch: [  7/ 10], step: [  388/  390], loss: [0.3337], avg loss: [0.3119], time: [107.5892ms]\n",
+      "Epoch: [  7/ 10], step: [  389/  390], loss: [0.2782], avg loss: [0.3118], time: [102.8912ms]\n",
+      "Epoch: [  7/ 10], step: [  390/  390], loss: [0.2050], avg loss: [0.3116], time: [913.0900ms]\n",
+      "Epoch time: 42366.964, per step time: 108.633\n",
+      "Epoch time: 42367.284, per step time: 108.634, avg loss: 0.312\n",
+      "************************************************************\n",
+      "Epoch: [  8/ 10], step: [    1/  390], loss: [0.2149], avg loss: [0.2149], time: [99.9627ms]\n",
+      "Epoch: [  8/ 10], step: [    2/  390], loss: [0.1804], avg loss: [0.1977], time: [103.7598ms]\n",
+      "Epoch: [  8/ 10], step: [    3/  390], loss: [0.3627], avg loss: [0.2527], time: [101.5825ms]\n",
+      "Epoch: [  8/ 10], step: [    4/  390], loss: [0.3586], avg loss: [0.2792], time: [102.5608ms]\n",
+      "Epoch: [  8/ 10], step: [    5/  390], loss: [0.2930], avg loss: [0.2819], time: [103.3924ms]\n",
+      "Epoch: [  8/ 10], step: [    6/  390], loss: [0.2007], avg loss: [0.2684], time: [103.3638ms]\n",
+      "Epoch: [  8/ 10], step: [    7/  390], loss: [0.2223], avg loss: [0.2618], time: [106.2517ms]\n",
+      "Epoch: [  8/ 10], step: [    8/  390], loss: [0.2357], avg loss: [0.2585], time: [102.8740ms]\n",
+      "Epoch: [  8/ 10], step: [    9/  390], loss: [0.3872], avg loss: [0.2728], time: [104.3794ms]\n",
+      "Epoch: [  8/ 10], step: [   10/  390], loss: [0.1634], avg loss: [0.2619], time: [104.8412ms]\n",
+      "Epoch: [  8/ 10], step: [   11/  390], loss: [0.2364], avg loss: [0.2596], time: [104.3904ms]\n",
+      "Epoch: [  8/ 10], step: [   12/  390], loss: [0.4116], avg loss: [0.2722], time: [103.3373ms]\n",
+      "Epoch: [  8/ 10], step: [   13/  390], loss: [0.2491], avg loss: [0.2705], time: [105.9339ms]\n",
+      "Epoch: [  8/ 10], step: [   14/  390], loss: [0.3110], avg loss: [0.2734], time: [104.9607ms]\n",
+      "Epoch: [  8/ 10], step: [   15/  390], loss: [0.2004], avg loss: [0.2685], time: [102.9387ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  8/ 10], step: [   16/  390], loss: [0.2551], avg loss: [0.2677], time: [107.0006ms]\n",
+      "Epoch: [  8/ 10], step: [   17/  390], loss: [0.3402], avg loss: [0.2719], time: [102.4706ms]\n",
+      "Epoch: [  8/ 10], step: [   18/  390], loss: [0.2975], avg loss: [0.2733], time: [106.4065ms]\n",
+      "Epoch: [  8/ 10], step: [   19/  390], loss: [0.2487], avg loss: [0.2720], time: [106.8141ms]\n",
+      "Epoch: [  8/ 10], step: [   20/  390], loss: [0.2542], avg loss: [0.2712], time: [108.3596ms]\n",
+      "Epoch: [  8/ 10], step: [   21/  390], loss: [0.2751], avg loss: [0.2713], time: [101.2235ms]\n",
+      "Epoch: [  8/ 10], step: [   22/  390], loss: [0.3212], avg loss: [0.2736], time: [107.4750ms]\n",
+      "Epoch: [  8/ 10], step: [   23/  390], loss: [0.2760], avg loss: [0.2737], time: [105.3512ms]\n",
+      "Epoch: [  8/ 10], step: [   24/  390], loss: [0.1505], avg loss: [0.2686], time: [101.8736ms]\n",
+      "Epoch: [  8/ 10], step: [   25/  390], loss: [0.2349], avg loss: [0.2672], time: [104.0020ms]\n",
+      "Epoch: [  8/ 10], step: [   26/  390], loss: [0.1072], avg loss: [0.2611], time: [106.9102ms]\n",
+      "Epoch: [  8/ 10], step: [   27/  390], loss: [0.3493], avg loss: [0.2643], time: [102.9167ms]\n",
+      "Epoch: [  8/ 10], step: [   28/  390], loss: [0.1981], avg loss: [0.2620], time: [104.2376ms]\n",
+      "Epoch: [  8/ 10], step: [   29/  390], loss: [0.2218], avg loss: [0.2606], time: [100.7419ms]\n",
+      "Epoch: [  8/ 10], step: [   30/  390], loss: [0.2380], avg loss: [0.2598], time: [102.7915ms]\n",
+      "Epoch: [  8/ 10], step: [   31/  390], loss: [0.2702], avg loss: [0.2602], time: [102.2146ms]\n",
+      "Epoch: [  8/ 10], step: [   32/  390], loss: [0.2819], avg loss: [0.2609], time: [103.3094ms]\n",
+      "Epoch: [  8/ 10], step: [   33/  390], loss: [0.3173], avg loss: [0.2626], time: [104.7378ms]\n",
+      "Epoch: [  8/ 10], step: [   34/  390], loss: [0.2883], avg loss: [0.2633], time: [101.5081ms]\n",
+      "Epoch: [  8/ 10], step: [   35/  390], loss: [0.3038], avg loss: [0.2645], time: [106.4558ms]\n",
+      "Epoch: [  8/ 10], step: [   36/  390], loss: [0.3776], avg loss: [0.2676], time: [106.2822ms]\n",
+      "Epoch: [  8/ 10], step: [   37/  390], loss: [0.3619], avg loss: [0.2702], time: [99.5617ms]\n",
+      "Epoch: [  8/ 10], step: [   38/  390], loss: [0.3471], avg loss: [0.2722], time: [106.0185ms]\n",
+      "Epoch: [  8/ 10], step: [   39/  390], loss: [0.2261], avg loss: [0.2710], time: [105.8998ms]\n",
+      "Epoch: [  8/ 10], step: [   40/  390], loss: [0.2389], avg loss: [0.2702], time: [101.6922ms]\n",
+      "Epoch: [  8/ 10], step: [   41/  390], loss: [0.2973], avg loss: [0.2709], time: [103.1911ms]\n",
+      "Epoch: [  8/ 10], step: [   42/  390], loss: [0.3369], avg loss: [0.2724], time: [102.7579ms]\n",
+      "Epoch: [  8/ 10], step: [   43/  390], loss: [0.5723], avg loss: [0.2794], time: [104.4035ms]\n",
+      "Epoch: [  8/ 10], step: [   44/  390], loss: [0.3082], avg loss: [0.2801], time: [102.6764ms]\n",
+      "Epoch: [  8/ 10], step: [   45/  390], loss: [0.3245], avg loss: [0.2811], time: [104.8353ms]\n",
+      "Epoch: [  8/ 10], step: [   46/  390], loss: [0.3054], avg loss: [0.2816], time: [104.9926ms]\n",
+      "Epoch: [  8/ 10], step: [   47/  390], loss: [0.2204], avg loss: [0.2803], time: [101.2921ms]\n",
+      "Epoch: [  8/ 10], step: [   48/  390], loss: [0.4341], avg loss: [0.2835], time: [103.3404ms]\n",
+      "Epoch: [  8/ 10], step: [   49/  390], loss: [0.2574], avg loss: [0.2830], time: [101.7869ms]\n",
+      "Epoch: [  8/ 10], step: [   50/  390], loss: [0.3625], avg loss: [0.2845], time: [106.5040ms]\n",
+      "Epoch: [  8/ 10], step: [   51/  390], loss: [0.3555], avg loss: [0.2859], time: [104.4257ms]\n",
+      "Epoch: [  8/ 10], step: [   52/  390], loss: [0.2120], avg loss: [0.2845], time: [102.0887ms]\n",
+      "Epoch: [  8/ 10], step: [   53/  390], loss: [0.2403], avg loss: [0.2837], time: [103.4577ms]\n",
+      "Epoch: [  8/ 10], step: [   54/  390], loss: [0.2480], avg loss: [0.2830], time: [102.0255ms]\n",
+      "Epoch: [  8/ 10], step: [   55/  390], loss: [0.4171], avg loss: [0.2855], time: [103.0657ms]\n",
+      "Epoch: [  8/ 10], step: [   56/  390], loss: [0.3163], avg loss: [0.2860], time: [103.3020ms]\n",
+      "Epoch: [  8/ 10], step: [   57/  390], loss: [0.3176], avg loss: [0.2866], time: [101.9452ms]\n",
+      "Epoch: [  8/ 10], step: [   58/  390], loss: [0.2448], avg loss: [0.2858], time: [104.5702ms]\n",
+      "Epoch: [  8/ 10], step: [   59/  390], loss: [0.3658], avg loss: [0.2872], time: [105.9697ms]\n",
+      "Epoch: [  8/ 10], step: [   60/  390], loss: [0.3966], avg loss: [0.2890], time: [103.7662ms]\n",
+      "Epoch: [  8/ 10], step: [   61/  390], loss: [0.3659], avg loss: [0.2903], time: [104.8198ms]\n",
+      "Epoch: [  8/ 10], step: [   62/  390], loss: [0.2222], avg loss: [0.2892], time: [102.3245ms]\n",
+      "Epoch: [  8/ 10], step: [   63/  390], loss: [0.3557], avg loss: [0.2902], time: [101.8894ms]\n",
+      "Epoch: [  8/ 10], step: [   64/  390], loss: [0.2123], avg loss: [0.2890], time: [103.6301ms]\n",
+      "Epoch: [  8/ 10], step: [   65/  390], loss: [0.2045], avg loss: [0.2877], time: [100.6439ms]\n",
+      "Epoch: [  8/ 10], step: [   66/  390], loss: [0.2570], avg loss: [0.2873], time: [106.7004ms]\n",
+      "Epoch: [  8/ 10], step: [   67/  390], loss: [0.2672], avg loss: [0.2870], time: [103.2276ms]\n",
+      "Epoch: [  8/ 10], step: [   68/  390], loss: [0.1659], avg loss: [0.2852], time: [104.2333ms]\n",
+      "Epoch: [  8/ 10], step: [   69/  390], loss: [0.2854], avg loss: [0.2852], time: [104.9914ms]\n",
+      "Epoch: [  8/ 10], step: [   70/  390], loss: [0.2377], avg loss: [0.2845], time: [102.8299ms]\n",
+      "Epoch: [  8/ 10], step: [   71/  390], loss: [0.2993], avg loss: [0.2847], time: [104.7037ms]\n",
+      "Epoch: [  8/ 10], step: [   72/  390], loss: [0.2682], avg loss: [0.2845], time: [104.3675ms]\n",
+      "Epoch: [  8/ 10], step: [   73/  390], loss: [0.1733], avg loss: [0.2830], time: [103.1122ms]\n",
+      "Epoch: [  8/ 10], step: [   74/  390], loss: [0.2731], avg loss: [0.2828], time: [107.4519ms]\n",
+      "Epoch: [  8/ 10], step: [   75/  390], loss: [0.2913], avg loss: [0.2829], time: [105.0410ms]\n",
+      "Epoch: [  8/ 10], step: [   76/  390], loss: [0.1981], avg loss: [0.2818], time: [103.5707ms]\n",
+      "Epoch: [  8/ 10], step: [   77/  390], loss: [0.2849], avg loss: [0.2819], time: [101.8102ms]\n",
+      "Epoch: [  8/ 10], step: [   78/  390], loss: [0.3997], avg loss: [0.2834], time: [106.3707ms]\n",
+      "Epoch: [  8/ 10], step: [   79/  390], loss: [0.2753], avg loss: [0.2833], time: [103.1048ms]\n",
+      "Epoch: [  8/ 10], step: [   80/  390], loss: [0.3147], avg loss: [0.2837], time: [102.6042ms]\n",
+      "Epoch: [  8/ 10], step: [   81/  390], loss: [0.3199], avg loss: [0.2841], time: [102.7009ms]\n",
+      "Epoch: [  8/ 10], step: [   82/  390], loss: [0.2713], avg loss: [0.2840], time: [102.1183ms]\n",
+      "Epoch: [  8/ 10], step: [   83/  390], loss: [0.2855], avg loss: [0.2840], time: [105.1257ms]\n",
+      "Epoch: [  8/ 10], step: [   84/  390], loss: [0.2076], avg loss: [0.2831], time: [103.4346ms]\n",
+      "Epoch: [  8/ 10], step: [   85/  390], loss: [0.3363], avg loss: [0.2837], time: [103.2245ms]\n",
+      "Epoch: [  8/ 10], step: [   86/  390], loss: [0.3122], avg loss: [0.2840], time: [102.2060ms]\n",
+      "Epoch: [  8/ 10], step: [   87/  390], loss: [0.2516], avg loss: [0.2837], time: [101.2897ms]\n",
+      "Epoch: [  8/ 10], step: [   88/  390], loss: [0.2329], avg loss: [0.2831], time: [106.2698ms]\n",
+      "Epoch: [  8/ 10], step: [   89/  390], loss: [0.2841], avg loss: [0.2831], time: [103.2794ms]\n",
+      "Epoch: [  8/ 10], step: [   90/  390], loss: [0.2238], avg loss: [0.2824], time: [102.0787ms]\n",
+      "Epoch: [  8/ 10], step: [   91/  390], loss: [0.2369], avg loss: [0.2819], time: [104.2061ms]\n",
+      "Epoch: [  8/ 10], step: [   92/  390], loss: [0.2746], avg loss: [0.2819], time: [105.7801ms]\n",
+      "Epoch: [  8/ 10], step: [   93/  390], loss: [0.3308], avg loss: [0.2824], time: [102.5574ms]\n",
+      "Epoch: [  8/ 10], step: [   94/  390], loss: [0.3584], avg loss: [0.2832], time: [103.8244ms]\n",
+      "Epoch: [  8/ 10], step: [   95/  390], loss: [0.3276], avg loss: [0.2837], time: [103.2546ms]\n",
+      "Epoch: [  8/ 10], step: [   96/  390], loss: [0.3361], avg loss: [0.2842], time: [101.2833ms]\n",
+      "Epoch: [  8/ 10], step: [   97/  390], loss: [0.2652], avg loss: [0.2840], time: [105.8977ms]\n",
+      "Epoch: [  8/ 10], step: [   98/  390], loss: [0.2178], avg loss: [0.2833], time: [102.9072ms]\n",
+      "Epoch: [  8/ 10], step: [   99/  390], loss: [0.2998], avg loss: [0.2835], time: [102.4833ms]\n",
+      "Epoch: [  8/ 10], step: [  100/  390], loss: [0.2527], avg loss: [0.2832], time: [100.8837ms]\n",
+      "Epoch: [  8/ 10], step: [  101/  390], loss: [0.3188], avg loss: [0.2835], time: [102.9761ms]\n",
+      "Epoch: [  8/ 10], step: [  102/  390], loss: [0.2340], avg loss: [0.2831], time: [106.6611ms]\n",
+      "Epoch: [  8/ 10], step: [  103/  390], loss: [0.1899], avg loss: [0.2821], time: [102.8919ms]\n",
+      "Epoch: [  8/ 10], step: [  104/  390], loss: [0.3204], avg loss: [0.2825], time: [103.0960ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  8/ 10], step: [  105/  390], loss: [0.3339], avg loss: [0.2830], time: [101.1257ms]\n",
+      "Epoch: [  8/ 10], step: [  106/  390], loss: [0.3085], avg loss: [0.2832], time: [104.0373ms]\n",
+      "Epoch: [  8/ 10], step: [  107/  390], loss: [0.3561], avg loss: [0.2839], time: [104.2287ms]\n",
+      "Epoch: [  8/ 10], step: [  108/  390], loss: [0.3255], avg loss: [0.2843], time: [104.0325ms]\n",
+      "Epoch: [  8/ 10], step: [  109/  390], loss: [0.3709], avg loss: [0.2851], time: [103.4937ms]\n",
+      "Epoch: [  8/ 10], step: [  110/  390], loss: [0.2567], avg loss: [0.2848], time: [101.7263ms]\n",
+      "Epoch: [  8/ 10], step: [  111/  390], loss: [0.2285], avg loss: [0.2843], time: [103.9937ms]\n",
+      "Epoch: [  8/ 10], step: [  112/  390], loss: [0.1699], avg loss: [0.2833], time: [105.4158ms]\n",
+      "Epoch: [  8/ 10], step: [  113/  390], loss: [0.2693], avg loss: [0.2832], time: [105.9487ms]\n",
+      "Epoch: [  8/ 10], step: [  114/  390], loss: [0.4444], avg loss: [0.2846], time: [104.3928ms]\n",
+      "Epoch: [  8/ 10], step: [  115/  390], loss: [0.2116], avg loss: [0.2840], time: [106.0302ms]\n",
+      "Epoch: [  8/ 10], step: [  116/  390], loss: [0.3997], avg loss: [0.2850], time: [101.6543ms]\n",
+      "Epoch: [  8/ 10], step: [  117/  390], loss: [0.2387], avg loss: [0.2846], time: [106.1738ms]\n",
+      "Epoch: [  8/ 10], step: [  118/  390], loss: [0.2712], avg loss: [0.2845], time: [103.6386ms]\n",
+      "Epoch: [  8/ 10], step: [  119/  390], loss: [0.2482], avg loss: [0.2842], time: [103.3423ms]\n",
+      "Epoch: [  8/ 10], step: [  120/  390], loss: [0.2702], avg loss: [0.2840], time: [104.9221ms]\n",
+      "Epoch: [  8/ 10], step: [  121/  390], loss: [0.4016], avg loss: [0.2850], time: [103.2031ms]\n",
+      "Epoch: [  8/ 10], step: [  122/  390], loss: [0.3797], avg loss: [0.2858], time: [106.5831ms]\n",
+      "Epoch: [  8/ 10], step: [  123/  390], loss: [0.1121], avg loss: [0.2844], time: [102.4332ms]\n",
+      "Epoch: [  8/ 10], step: [  124/  390], loss: [0.2173], avg loss: [0.2838], time: [103.8256ms]\n",
+      "Epoch: [  8/ 10], step: [  125/  390], loss: [0.2104], avg loss: [0.2832], time: [102.2441ms]\n",
+      "Epoch: [  8/ 10], step: [  126/  390], loss: [0.2904], avg loss: [0.2833], time: [102.6425ms]\n",
+      "Epoch: [  8/ 10], step: [  127/  390], loss: [0.2524], avg loss: [0.2831], time: [100.6939ms]\n",
+      "Epoch: [  8/ 10], step: [  128/  390], loss: [0.2956], avg loss: [0.2832], time: [104.7201ms]\n",
+      "Epoch: [  8/ 10], step: [  129/  390], loss: [0.3088], avg loss: [0.2834], time: [104.7549ms]\n",
+      "Epoch: [  8/ 10], step: [  130/  390], loss: [0.2754], avg loss: [0.2833], time: [101.0370ms]\n",
+      "Epoch: [  8/ 10], step: [  131/  390], loss: [0.2397], avg loss: [0.2830], time: [105.9654ms]\n",
+      "Epoch: [  8/ 10], step: [  132/  390], loss: [0.3058], avg loss: [0.2831], time: [104.1484ms]\n",
+      "Epoch: [  8/ 10], step: [  133/  390], loss: [0.1613], avg loss: [0.2822], time: [105.4857ms]\n",
+      "Epoch: [  8/ 10], step: [  134/  390], loss: [0.2912], avg loss: [0.2823], time: [102.5901ms]\n",
+      "Epoch: [  8/ 10], step: [  135/  390], loss: [0.2714], avg loss: [0.2822], time: [105.9830ms]\n",
+      "Epoch: [  8/ 10], step: [  136/  390], loss: [0.2966], avg loss: [0.2823], time: [107.1360ms]\n",
+      "Epoch: [  8/ 10], step: [  137/  390], loss: [0.4892], avg loss: [0.2838], time: [101.4531ms]\n",
+      "Epoch: [  8/ 10], step: [  138/  390], loss: [0.4067], avg loss: [0.2847], time: [104.9294ms]\n",
+      "Epoch: [  8/ 10], step: [  139/  390], loss: [0.3947], avg loss: [0.2855], time: [103.3685ms]\n",
+      "Epoch: [  8/ 10], step: [  140/  390], loss: [0.2636], avg loss: [0.2853], time: [101.3122ms]\n",
+      "Epoch: [  8/ 10], step: [  141/  390], loss: [0.2913], avg loss: [0.2854], time: [104.4834ms]\n",
+      "Epoch: [  8/ 10], step: [  142/  390], loss: [0.3560], avg loss: [0.2859], time: [105.8872ms]\n",
+      "Epoch: [  8/ 10], step: [  143/  390], loss: [0.1532], avg loss: [0.2850], time: [102.7212ms]\n",
+      "Epoch: [  8/ 10], step: [  144/  390], loss: [0.1977], avg loss: [0.2844], time: [108.3438ms]\n",
+      "Epoch: [  8/ 10], step: [  145/  390], loss: [0.2216], avg loss: [0.2839], time: [101.2526ms]\n",
+      "Epoch: [  8/ 10], step: [  146/  390], loss: [0.3060], avg loss: [0.2841], time: [107.4364ms]\n",
+      "Epoch: [  8/ 10], step: [  147/  390], loss: [0.2543], avg loss: [0.2839], time: [105.1431ms]\n",
+      "Epoch: [  8/ 10], step: [  148/  390], loss: [0.2818], avg loss: [0.2839], time: [102.4642ms]\n",
+      "Epoch: [  8/ 10], step: [  149/  390], loss: [0.3537], avg loss: [0.2843], time: [104.4877ms]\n",
+      "Epoch: [  8/ 10], step: [  150/  390], loss: [0.2540], avg loss: [0.2841], time: [106.3020ms]\n",
+      "Epoch: [  8/ 10], step: [  151/  390], loss: [0.2113], avg loss: [0.2836], time: [101.0654ms]\n",
+      "Epoch: [  8/ 10], step: [  152/  390], loss: [0.3518], avg loss: [0.2841], time: [100.9429ms]\n",
+      "Epoch: [  8/ 10], step: [  153/  390], loss: [0.2428], avg loss: [0.2838], time: [105.3364ms]\n",
+      "Epoch: [  8/ 10], step: [  154/  390], loss: [0.2941], avg loss: [0.2839], time: [102.0465ms]\n",
+      "Epoch: [  8/ 10], step: [  155/  390], loss: [0.3129], avg loss: [0.2841], time: [104.5511ms]\n",
+      "Epoch: [  8/ 10], step: [  156/  390], loss: [0.3826], avg loss: [0.2847], time: [103.7295ms]\n",
+      "Epoch: [  8/ 10], step: [  157/  390], loss: [0.2870], avg loss: [0.2847], time: [103.0314ms]\n",
+      "Epoch: [  8/ 10], step: [  158/  390], loss: [0.3251], avg loss: [0.2850], time: [102.6587ms]\n",
+      "Epoch: [  8/ 10], step: [  159/  390], loss: [0.4708], avg loss: [0.2861], time: [101.7122ms]\n",
+      "Epoch: [  8/ 10], step: [  160/  390], loss: [0.3849], avg loss: [0.2868], time: [102.7229ms]\n",
+      "Epoch: [  8/ 10], step: [  161/  390], loss: [0.3747], avg loss: [0.2873], time: [101.2404ms]\n",
+      "Epoch: [  8/ 10], step: [  162/  390], loss: [0.2592], avg loss: [0.2871], time: [103.0819ms]\n",
+      "Epoch: [  8/ 10], step: [  163/  390], loss: [0.3399], avg loss: [0.2875], time: [103.2174ms]\n",
+      "Epoch: [  8/ 10], step: [  164/  390], loss: [0.3366], avg loss: [0.2878], time: [101.9282ms]\n",
+      "Epoch: [  8/ 10], step: [  165/  390], loss: [0.2238], avg loss: [0.2874], time: [102.0501ms]\n",
+      "Epoch: [  8/ 10], step: [  166/  390], loss: [0.2818], avg loss: [0.2873], time: [102.9756ms]\n",
+      "Epoch: [  8/ 10], step: [  167/  390], loss: [0.3048], avg loss: [0.2874], time: [100.5049ms]\n",
+      "Epoch: [  8/ 10], step: [  168/  390], loss: [0.2822], avg loss: [0.2874], time: [104.8827ms]\n",
+      "Epoch: [  8/ 10], step: [  169/  390], loss: [0.2954], avg loss: [0.2875], time: [106.9095ms]\n",
+      "Epoch: [  8/ 10], step: [  170/  390], loss: [0.2159], avg loss: [0.2870], time: [104.0988ms]\n",
+      "Epoch: [  8/ 10], step: [  171/  390], loss: [0.2859], avg loss: [0.2870], time: [103.6556ms]\n",
+      "Epoch: [  8/ 10], step: [  172/  390], loss: [0.3350], avg loss: [0.2873], time: [106.1785ms]\n",
+      "Epoch: [  8/ 10], step: [  173/  390], loss: [0.2139], avg loss: [0.2869], time: [104.7947ms]\n",
+      "Epoch: [  8/ 10], step: [  174/  390], loss: [0.3930], avg loss: [0.2875], time: [104.8133ms]\n",
+      "Epoch: [  8/ 10], step: [  175/  390], loss: [0.2229], avg loss: [0.2871], time: [102.8888ms]\n",
+      "Epoch: [  8/ 10], step: [  176/  390], loss: [0.3234], avg loss: [0.2873], time: [107.0349ms]\n",
+      "Epoch: [  8/ 10], step: [  177/  390], loss: [0.2304], avg loss: [0.2870], time: [100.7838ms]\n",
+      "Epoch: [  8/ 10], step: [  178/  390], loss: [0.3864], avg loss: [0.2876], time: [105.3085ms]\n",
+      "Epoch: [  8/ 10], step: [  179/  390], loss: [0.3090], avg loss: [0.2877], time: [104.4328ms]\n",
+      "Epoch: [  8/ 10], step: [  180/  390], loss: [0.2704], avg loss: [0.2876], time: [103.4174ms]\n",
+      "Epoch: [  8/ 10], step: [  181/  390], loss: [0.3385], avg loss: [0.2879], time: [104.2037ms]\n",
+      "Epoch: [  8/ 10], step: [  182/  390], loss: [0.2771], avg loss: [0.2878], time: [104.3608ms]\n",
+      "Epoch: [  8/ 10], step: [  183/  390], loss: [0.3193], avg loss: [0.2880], time: [105.0370ms]\n",
+      "Epoch: [  8/ 10], step: [  184/  390], loss: [0.1769], avg loss: [0.2874], time: [102.8898ms]\n",
+      "Epoch: [  8/ 10], step: [  185/  390], loss: [0.2449], avg loss: [0.2872], time: [103.1065ms]\n",
+      "Epoch: [  8/ 10], step: [  186/  390], loss: [0.2875], avg loss: [0.2872], time: [107.6005ms]\n",
+      "Epoch: [  8/ 10], step: [  187/  390], loss: [0.2144], avg loss: [0.2868], time: [102.1121ms]\n",
+      "Epoch: [  8/ 10], step: [  188/  390], loss: [0.4234], avg loss: [0.2875], time: [105.1552ms]\n",
+      "Epoch: [  8/ 10], step: [  189/  390], loss: [0.2382], avg loss: [0.2872], time: [105.0382ms]\n",
+      "Epoch: [  8/ 10], step: [  190/  390], loss: [0.2586], avg loss: [0.2871], time: [105.0529ms]\n",
+      "Epoch: [  8/ 10], step: [  191/  390], loss: [0.3665], avg loss: [0.2875], time: [101.3770ms]\n",
+      "Epoch: [  8/ 10], step: [  192/  390], loss: [0.2088], avg loss: [0.2871], time: [105.0966ms]\n",
+      "Epoch: [  8/ 10], step: [  193/  390], loss: [0.3332], avg loss: [0.2873], time: [104.6309ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  8/ 10], step: [  194/  390], loss: [0.2501], avg loss: [0.2871], time: [105.4816ms]\n",
+      "Epoch: [  8/ 10], step: [  195/  390], loss: [0.1891], avg loss: [0.2866], time: [102.4518ms]\n",
+      "Epoch: [  8/ 10], step: [  196/  390], loss: [0.2274], avg loss: [0.2863], time: [101.9406ms]\n",
+      "Epoch: [  8/ 10], step: [  197/  390], loss: [0.3215], avg loss: [0.2865], time: [100.8925ms]\n",
+      "Epoch: [  8/ 10], step: [  198/  390], loss: [0.2382], avg loss: [0.2863], time: [106.7557ms]\n",
+      "Epoch: [  8/ 10], step: [  199/  390], loss: [0.3136], avg loss: [0.2864], time: [105.3262ms]\n",
+      "Epoch: [  8/ 10], step: [  200/  390], loss: [0.3687], avg loss: [0.2868], time: [102.4990ms]\n",
+      "Epoch: [  8/ 10], step: [  201/  390], loss: [0.1899], avg loss: [0.2863], time: [101.1612ms]\n",
+      "Epoch: [  8/ 10], step: [  202/  390], loss: [0.2513], avg loss: [0.2862], time: [101.4724ms]\n",
+      "Epoch: [  8/ 10], step: [  203/  390], loss: [0.2842], avg loss: [0.2861], time: [102.1821ms]\n",
+      "Epoch: [  8/ 10], step: [  204/  390], loss: [0.2917], avg loss: [0.2862], time: [106.1144ms]\n",
+      "Epoch: [  8/ 10], step: [  205/  390], loss: [0.2588], avg loss: [0.2860], time: [106.8852ms]\n",
+      "Epoch: [  8/ 10], step: [  206/  390], loss: [0.3324], avg loss: [0.2863], time: [106.4188ms]\n",
+      "Epoch: [  8/ 10], step: [  207/  390], loss: [0.3042], avg loss: [0.2864], time: [102.8409ms]\n",
+      "Epoch: [  8/ 10], step: [  208/  390], loss: [0.2606], avg loss: [0.2862], time: [103.3425ms]\n",
+      "Epoch: [  8/ 10], step: [  209/  390], loss: [0.3536], avg loss: [0.2865], time: [105.6108ms]\n",
+      "Epoch: [  8/ 10], step: [  210/  390], loss: [0.4595], avg loss: [0.2874], time: [102.4525ms]\n",
+      "Epoch: [  8/ 10], step: [  211/  390], loss: [0.2538], avg loss: [0.2872], time: [100.4620ms]\n",
+      "Epoch: [  8/ 10], step: [  212/  390], loss: [0.3812], avg loss: [0.2877], time: [102.3226ms]\n",
+      "Epoch: [  8/ 10], step: [  213/  390], loss: [0.1679], avg loss: [0.2871], time: [101.3007ms]\n",
+      "Epoch: [  8/ 10], step: [  214/  390], loss: [0.1868], avg loss: [0.2866], time: [102.5825ms]\n",
+      "Epoch: [  8/ 10], step: [  215/  390], loss: [0.4198], avg loss: [0.2872], time: [101.8732ms]\n",
+      "Epoch: [  8/ 10], step: [  216/  390], loss: [0.3415], avg loss: [0.2875], time: [104.1446ms]\n",
+      "Epoch: [  8/ 10], step: [  217/  390], loss: [0.2309], avg loss: [0.2872], time: [104.2297ms]\n",
+      "Epoch: [  8/ 10], step: [  218/  390], loss: [0.3316], avg loss: [0.2874], time: [103.6499ms]\n",
+      "Epoch: [  8/ 10], step: [  219/  390], loss: [0.3680], avg loss: [0.2878], time: [103.9608ms]\n",
+      "Epoch: [  8/ 10], step: [  220/  390], loss: [0.2453], avg loss: [0.2876], time: [103.8964ms]\n",
+      "Epoch: [  8/ 10], step: [  221/  390], loss: [0.4186], avg loss: [0.2882], time: [103.5337ms]\n",
+      "Epoch: [  8/ 10], step: [  222/  390], loss: [0.2608], avg loss: [0.2881], time: [105.7644ms]\n",
+      "Epoch: [  8/ 10], step: [  223/  390], loss: [0.3379], avg loss: [0.2883], time: [101.0270ms]\n",
+      "Epoch: [  8/ 10], step: [  224/  390], loss: [0.2239], avg loss: [0.2880], time: [104.9690ms]\n",
+      "Epoch: [  8/ 10], step: [  225/  390], loss: [0.3269], avg loss: [0.2882], time: [103.5392ms]\n",
+      "Epoch: [  8/ 10], step: [  226/  390], loss: [0.1936], avg loss: [0.2878], time: [103.5523ms]\n",
+      "Epoch: [  8/ 10], step: [  227/  390], loss: [0.2899], avg loss: [0.2878], time: [102.5474ms]\n",
+      "Epoch: [  8/ 10], step: [  228/  390], loss: [0.2795], avg loss: [0.2877], time: [101.8667ms]\n",
+      "Epoch: [  8/ 10], step: [  229/  390], loss: [0.2784], avg loss: [0.2877], time: [101.3720ms]\n",
+      "Epoch: [  8/ 10], step: [  230/  390], loss: [0.3530], avg loss: [0.2880], time: [107.1603ms]\n",
+      "Epoch: [  8/ 10], step: [  231/  390], loss: [0.2883], avg loss: [0.2880], time: [105.8881ms]\n",
+      "Epoch: [  8/ 10], step: [  232/  390], loss: [0.3957], avg loss: [0.2885], time: [102.8709ms]\n",
+      "Epoch: [  8/ 10], step: [  233/  390], loss: [0.1569], avg loss: [0.2879], time: [104.8172ms]\n",
+      "Epoch: [  8/ 10], step: [  234/  390], loss: [0.3854], avg loss: [0.2883], time: [105.8090ms]\n",
+      "Epoch: [  8/ 10], step: [  235/  390], loss: [0.2987], avg loss: [0.2884], time: [105.6798ms]\n",
+      "Epoch: [  8/ 10], step: [  236/  390], loss: [0.4343], avg loss: [0.2890], time: [107.0321ms]\n",
+      "Epoch: [  8/ 10], step: [  237/  390], loss: [0.2411], avg loss: [0.2888], time: [106.8735ms]\n",
+      "Epoch: [  8/ 10], step: [  238/  390], loss: [0.2459], avg loss: [0.2886], time: [101.7492ms]\n",
+      "Epoch: [  8/ 10], step: [  239/  390], loss: [0.3338], avg loss: [0.2888], time: [102.7682ms]\n",
+      "Epoch: [  8/ 10], step: [  240/  390], loss: [0.3082], avg loss: [0.2889], time: [105.4368ms]\n",
+      "Epoch: [  8/ 10], step: [  241/  390], loss: [0.2265], avg loss: [0.2886], time: [99.3347ms]\n",
+      "Epoch: [  8/ 10], step: [  242/  390], loss: [0.2507], avg loss: [0.2884], time: [104.3801ms]\n",
+      "Epoch: [  8/ 10], step: [  243/  390], loss: [0.3032], avg loss: [0.2885], time: [102.6051ms]\n",
+      "Epoch: [  8/ 10], step: [  244/  390], loss: [0.3334], avg loss: [0.2887], time: [102.5877ms]\n",
+      "Epoch: [  8/ 10], step: [  245/  390], loss: [0.4204], avg loss: [0.2892], time: [102.8838ms]\n",
+      "Epoch: [  8/ 10], step: [  246/  390], loss: [0.2962], avg loss: [0.2893], time: [108.4619ms]\n",
+      "Epoch: [  8/ 10], step: [  247/  390], loss: [0.3268], avg loss: [0.2894], time: [104.6379ms]\n",
+      "Epoch: [  8/ 10], step: [  248/  390], loss: [0.3063], avg loss: [0.2895], time: [106.3936ms]\n",
+      "Epoch: [  8/ 10], step: [  249/  390], loss: [0.2344], avg loss: [0.2893], time: [103.0197ms]\n",
+      "Epoch: [  8/ 10], step: [  250/  390], loss: [0.3675], avg loss: [0.2896], time: [103.4546ms]\n",
+      "Epoch: [  8/ 10], step: [  251/  390], loss: [0.2744], avg loss: [0.2895], time: [103.1058ms]\n",
+      "Epoch: [  8/ 10], step: [  252/  390], loss: [0.4469], avg loss: [0.2901], time: [106.5910ms]\n",
+      "Epoch: [  8/ 10], step: [  253/  390], loss: [0.3931], avg loss: [0.2905], time: [105.9005ms]\n",
+      "Epoch: [  8/ 10], step: [  254/  390], loss: [0.2097], avg loss: [0.2902], time: [106.3991ms]\n",
+      "Epoch: [  8/ 10], step: [  255/  390], loss: [0.2915], avg loss: [0.2902], time: [101.1448ms]\n",
+      "Epoch: [  8/ 10], step: [  256/  390], loss: [0.2605], avg loss: [0.2901], time: [103.9233ms]\n",
+      "Epoch: [  8/ 10], step: [  257/  390], loss: [0.1835], avg loss: [0.2897], time: [104.6898ms]\n",
+      "Epoch: [  8/ 10], step: [  258/  390], loss: [0.3082], avg loss: [0.2898], time: [107.6202ms]\n",
+      "Epoch: [  8/ 10], step: [  259/  390], loss: [0.1538], avg loss: [0.2892], time: [102.6068ms]\n",
+      "Epoch: [  8/ 10], step: [  260/  390], loss: [0.2970], avg loss: [0.2893], time: [108.2559ms]\n",
+      "Epoch: [  8/ 10], step: [  261/  390], loss: [0.2292], avg loss: [0.2890], time: [102.9320ms]\n",
+      "Epoch: [  8/ 10], step: [  262/  390], loss: [0.2763], avg loss: [0.2890], time: [105.5338ms]\n",
+      "Epoch: [  8/ 10], step: [  263/  390], loss: [0.4960], avg loss: [0.2898], time: [100.9338ms]\n",
+      "Epoch: [  8/ 10], step: [  264/  390], loss: [0.3799], avg loss: [0.2901], time: [105.1140ms]\n",
+      "Epoch: [  8/ 10], step: [  265/  390], loss: [0.3887], avg loss: [0.2905], time: [104.0342ms]\n",
+      "Epoch: [  8/ 10], step: [  266/  390], loss: [0.2376], avg loss: [0.2903], time: [107.4958ms]\n",
+      "Epoch: [  8/ 10], step: [  267/  390], loss: [0.2944], avg loss: [0.2903], time: [101.5563ms]\n",
+      "Epoch: [  8/ 10], step: [  268/  390], loss: [0.2557], avg loss: [0.2902], time: [101.6114ms]\n",
+      "Epoch: [  8/ 10], step: [  269/  390], loss: [0.3924], avg loss: [0.2906], time: [106.6966ms]\n",
+      "Epoch: [  8/ 10], step: [  270/  390], loss: [0.2742], avg loss: [0.2905], time: [102.8039ms]\n",
+      "Epoch: [  8/ 10], step: [  271/  390], loss: [0.3677], avg loss: [0.2908], time: [104.4290ms]\n",
+      "Epoch: [  8/ 10], step: [  272/  390], loss: [0.3184], avg loss: [0.2909], time: [104.7673ms]\n",
+      "Epoch: [  8/ 10], step: [  273/  390], loss: [0.2249], avg loss: [0.2906], time: [103.0540ms]\n",
+      "Epoch: [  8/ 10], step: [  274/  390], loss: [0.3460], avg loss: [0.2908], time: [104.6810ms]\n",
+      "Epoch: [  8/ 10], step: [  275/  390], loss: [0.2943], avg loss: [0.2909], time: [101.3746ms]\n",
+      "Epoch: [  8/ 10], step: [  276/  390], loss: [0.3249], avg loss: [0.2910], time: [102.2553ms]\n",
+      "Epoch: [  8/ 10], step: [  277/  390], loss: [0.3228], avg loss: [0.2911], time: [104.1923ms]\n",
+      "Epoch: [  8/ 10], step: [  278/  390], loss: [0.1978], avg loss: [0.2908], time: [102.0288ms]\n",
+      "Epoch: [  8/ 10], step: [  279/  390], loss: [0.2511], avg loss: [0.2906], time: [100.1449ms]\n",
+      "Epoch: [  8/ 10], step: [  280/  390], loss: [0.2804], avg loss: [0.2906], time: [105.2427ms]\n",
+      "Epoch: [  8/ 10], step: [  281/  390], loss: [0.2771], avg loss: [0.2905], time: [105.3586ms]\n",
+      "Epoch: [  8/ 10], step: [  282/  390], loss: [0.2485], avg loss: [0.2904], time: [101.2533ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  8/ 10], step: [  283/  390], loss: [0.3052], avg loss: [0.2904], time: [100.7648ms]\n",
+      "Epoch: [  8/ 10], step: [  284/  390], loss: [0.3046], avg loss: [0.2905], time: [107.3947ms]\n",
+      "Epoch: [  8/ 10], step: [  285/  390], loss: [0.3282], avg loss: [0.2906], time: [102.7765ms]\n",
+      "Epoch: [  8/ 10], step: [  286/  390], loss: [0.2687], avg loss: [0.2905], time: [108.1564ms]\n",
+      "Epoch: [  8/ 10], step: [  287/  390], loss: [0.2085], avg loss: [0.2903], time: [102.8466ms]\n",
+      "Epoch: [  8/ 10], step: [  288/  390], loss: [0.2500], avg loss: [0.2901], time: [107.4021ms]\n",
+      "Epoch: [  8/ 10], step: [  289/  390], loss: [0.2477], avg loss: [0.2900], time: [105.6416ms]\n",
+      "Epoch: [  8/ 10], step: [  290/  390], loss: [0.1799], avg loss: [0.2896], time: [103.6391ms]\n",
+      "Epoch: [  8/ 10], step: [  291/  390], loss: [0.3890], avg loss: [0.2899], time: [103.8487ms]\n",
+      "Epoch: [  8/ 10], step: [  292/  390], loss: [0.2363], avg loss: [0.2897], time: [101.7900ms]\n",
+      "Epoch: [  8/ 10], step: [  293/  390], loss: [0.3996], avg loss: [0.2901], time: [106.0460ms]\n",
+      "Epoch: [  8/ 10], step: [  294/  390], loss: [0.3036], avg loss: [0.2902], time: [102.5286ms]\n",
+      "Epoch: [  8/ 10], step: [  295/  390], loss: [0.3625], avg loss: [0.2904], time: [106.6227ms]\n",
+      "Epoch: [  8/ 10], step: [  296/  390], loss: [0.3306], avg loss: [0.2906], time: [100.7111ms]\n",
+      "Epoch: [  8/ 10], step: [  297/  390], loss: [0.2989], avg loss: [0.2906], time: [102.4008ms]\n",
+      "Epoch: [  8/ 10], step: [  298/  390], loss: [0.3709], avg loss: [0.2908], time: [106.6396ms]\n",
+      "Epoch: [  8/ 10], step: [  299/  390], loss: [0.4077], avg loss: [0.2912], time: [103.5302ms]\n",
+      "Epoch: [  8/ 10], step: [  300/  390], loss: [0.3659], avg loss: [0.2915], time: [103.3807ms]\n",
+      "Epoch: [  8/ 10], step: [  301/  390], loss: [0.3173], avg loss: [0.2916], time: [104.9225ms]\n",
+      "Epoch: [  8/ 10], step: [  302/  390], loss: [0.2164], avg loss: [0.2913], time: [107.4843ms]\n",
+      "Epoch: [  8/ 10], step: [  303/  390], loss: [0.2811], avg loss: [0.2913], time: [105.4397ms]\n",
+      "Epoch: [  8/ 10], step: [  304/  390], loss: [0.2248], avg loss: [0.2911], time: [106.0581ms]\n",
+      "Epoch: [  8/ 10], step: [  305/  390], loss: [0.3226], avg loss: [0.2912], time: [101.0187ms]\n",
+      "Epoch: [  8/ 10], step: [  306/  390], loss: [0.4554], avg loss: [0.2917], time: [105.2384ms]\n",
+      "Epoch: [  8/ 10], step: [  307/  390], loss: [0.2045], avg loss: [0.2914], time: [101.7089ms]\n",
+      "Epoch: [  8/ 10], step: [  308/  390], loss: [0.2654], avg loss: [0.2913], time: [101.4795ms]\n",
+      "Epoch: [  8/ 10], step: [  309/  390], loss: [0.3877], avg loss: [0.2917], time: [102.4687ms]\n",
+      "Epoch: [  8/ 10], step: [  310/  390], loss: [0.3128], avg loss: [0.2917], time: [107.8303ms]\n",
+      "Epoch: [  8/ 10], step: [  311/  390], loss: [0.3225], avg loss: [0.2918], time: [106.2863ms]\n",
+      "Epoch: [  8/ 10], step: [  312/  390], loss: [0.2464], avg loss: [0.2917], time: [99.4079ms]\n",
+      "Epoch: [  8/ 10], step: [  313/  390], loss: [0.2058], avg loss: [0.2914], time: [104.8460ms]\n",
+      "Epoch: [  8/ 10], step: [  314/  390], loss: [0.2562], avg loss: [0.2913], time: [102.0980ms]\n",
+      "Epoch: [  8/ 10], step: [  315/  390], loss: [0.2906], avg loss: [0.2913], time: [102.4487ms]\n",
+      "Epoch: [  8/ 10], step: [  316/  390], loss: [0.2278], avg loss: [0.2911], time: [102.9611ms]\n",
+      "Epoch: [  8/ 10], step: [  317/  390], loss: [0.5644], avg loss: [0.2919], time: [103.7703ms]\n",
+      "Epoch: [  8/ 10], step: [  318/  390], loss: [0.2196], avg loss: [0.2917], time: [106.2648ms]\n",
+      "Epoch: [  8/ 10], step: [  319/  390], loss: [0.2686], avg loss: [0.2916], time: [104.3472ms]\n",
+      "Epoch: [  8/ 10], step: [  320/  390], loss: [0.4012], avg loss: [0.2920], time: [106.7402ms]\n",
+      "Epoch: [  8/ 10], step: [  321/  390], loss: [0.3391], avg loss: [0.2921], time: [102.1755ms]\n",
+      "Epoch: [  8/ 10], step: [  322/  390], loss: [0.2743], avg loss: [0.2921], time: [102.1249ms]\n",
+      "Epoch: [  8/ 10], step: [  323/  390], loss: [0.4422], avg loss: [0.2925], time: [105.3572ms]\n",
+      "Epoch: [  8/ 10], step: [  324/  390], loss: [0.3312], avg loss: [0.2927], time: [101.5308ms]\n",
+      "Epoch: [  8/ 10], step: [  325/  390], loss: [0.4168], avg loss: [0.2930], time: [101.8789ms]\n",
+      "Epoch: [  8/ 10], step: [  326/  390], loss: [0.2627], avg loss: [0.2930], time: [102.4492ms]\n",
+      "Epoch: [  8/ 10], step: [  327/  390], loss: [0.3838], avg loss: [0.2932], time: [104.4939ms]\n",
+      "Epoch: [  8/ 10], step: [  328/  390], loss: [0.3179], avg loss: [0.2933], time: [99.9658ms]\n",
+      "Epoch: [  8/ 10], step: [  329/  390], loss: [0.3666], avg loss: [0.2935], time: [102.0112ms]\n",
+      "Epoch: [  8/ 10], step: [  330/  390], loss: [0.3488], avg loss: [0.2937], time: [102.6850ms]\n",
+      "Epoch: [  8/ 10], step: [  331/  390], loss: [0.2525], avg loss: [0.2936], time: [104.0716ms]\n",
+      "Epoch: [  8/ 10], step: [  332/  390], loss: [0.2915], avg loss: [0.2936], time: [102.4113ms]\n",
+      "Epoch: [  8/ 10], step: [  333/  390], loss: [0.2774], avg loss: [0.2935], time: [105.7770ms]\n",
+      "Epoch: [  8/ 10], step: [  334/  390], loss: [0.2881], avg loss: [0.2935], time: [101.4314ms]\n",
+      "Epoch: [  8/ 10], step: [  335/  390], loss: [0.3295], avg loss: [0.2936], time: [104.5465ms]\n",
+      "Epoch: [  8/ 10], step: [  336/  390], loss: [0.2187], avg loss: [0.2934], time: [104.4781ms]\n",
+      "Epoch: [  8/ 10], step: [  337/  390], loss: [0.2379], avg loss: [0.2932], time: [103.0433ms]\n",
+      "Epoch: [  8/ 10], step: [  338/  390], loss: [0.3931], avg loss: [0.2935], time: [105.2401ms]\n",
+      "Epoch: [  8/ 10], step: [  339/  390], loss: [0.2094], avg loss: [0.2933], time: [104.9283ms]\n",
+      "Epoch: [  8/ 10], step: [  340/  390], loss: [0.2684], avg loss: [0.2932], time: [106.2303ms]\n",
+      "Epoch: [  8/ 10], step: [  341/  390], loss: [0.3613], avg loss: [0.2934], time: [105.0959ms]\n",
+      "Epoch: [  8/ 10], step: [  342/  390], loss: [0.2116], avg loss: [0.2932], time: [104.8949ms]\n",
+      "Epoch: [  8/ 10], step: [  343/  390], loss: [0.4666], avg loss: [0.2937], time: [102.3049ms]\n",
+      "Epoch: [  8/ 10], step: [  344/  390], loss: [0.2186], avg loss: [0.2934], time: [104.3375ms]\n",
+      "Epoch: [  8/ 10], step: [  345/  390], loss: [0.3330], avg loss: [0.2936], time: [103.9658ms]\n",
+      "Epoch: [  8/ 10], step: [  346/  390], loss: [0.2798], avg loss: [0.2935], time: [100.8401ms]\n",
+      "Epoch: [  8/ 10], step: [  347/  390], loss: [0.1680], avg loss: [0.2932], time: [104.0971ms]\n",
+      "Epoch: [  8/ 10], step: [  348/  390], loss: [0.2947], avg loss: [0.2932], time: [103.9884ms]\n",
+      "Epoch: [  8/ 10], step: [  349/  390], loss: [0.1921], avg loss: [0.2929], time: [100.9669ms]\n",
+      "Epoch: [  8/ 10], step: [  350/  390], loss: [0.2572], avg loss: [0.2928], time: [102.8190ms]\n",
+      "Epoch: [  8/ 10], step: [  351/  390], loss: [0.3251], avg loss: [0.2929], time: [103.5602ms]\n",
+      "Epoch: [  8/ 10], step: [  352/  390], loss: [0.1561], avg loss: [0.2925], time: [102.4060ms]\n",
+      "Epoch: [  8/ 10], step: [  353/  390], loss: [0.3842], avg loss: [0.2927], time: [105.7391ms]\n",
+      "Epoch: [  8/ 10], step: [  354/  390], loss: [0.3143], avg loss: [0.2928], time: [103.2579ms]\n",
+      "Epoch: [  8/ 10], step: [  355/  390], loss: [0.3157], avg loss: [0.2929], time: [106.1723ms]\n",
+      "Epoch: [  8/ 10], step: [  356/  390], loss: [0.2084], avg loss: [0.2926], time: [105.6907ms]\n",
+      "Epoch: [  8/ 10], step: [  357/  390], loss: [0.3469], avg loss: [0.2928], time: [104.6863ms]\n",
+      "Epoch: [  8/ 10], step: [  358/  390], loss: [0.2570], avg loss: [0.2927], time: [105.5679ms]\n",
+      "Epoch: [  8/ 10], step: [  359/  390], loss: [0.1771], avg loss: [0.2924], time: [106.8208ms]\n",
+      "Epoch: [  8/ 10], step: [  360/  390], loss: [0.4097], avg loss: [0.2927], time: [104.7275ms]\n",
+      "Epoch: [  8/ 10], step: [  361/  390], loss: [0.2052], avg loss: [0.2924], time: [105.3350ms]\n",
+      "Epoch: [  8/ 10], step: [  362/  390], loss: [0.2419], avg loss: [0.2923], time: [102.6196ms]\n",
+      "Epoch: [  8/ 10], step: [  363/  390], loss: [0.2891], avg loss: [0.2923], time: [102.7620ms]\n",
+      "Epoch: [  8/ 10], step: [  364/  390], loss: [0.3674], avg loss: [0.2925], time: [102.9015ms]\n",
+      "Epoch: [  8/ 10], step: [  365/  390], loss: [0.3137], avg loss: [0.2926], time: [105.0744ms]\n",
+      "Epoch: [  8/ 10], step: [  366/  390], loss: [0.3452], avg loss: [0.2927], time: [108.6090ms]\n",
+      "Epoch: [  8/ 10], step: [  367/  390], loss: [0.3247], avg loss: [0.2928], time: [103.7612ms]\n",
+      "Epoch: [  8/ 10], step: [  368/  390], loss: [0.2509], avg loss: [0.2927], time: [108.0258ms]\n",
+      "Epoch: [  8/ 10], step: [  369/  390], loss: [0.3878], avg loss: [0.2929], time: [105.3483ms]\n",
+      "Epoch: [  8/ 10], step: [  370/  390], loss: [0.3596], avg loss: [0.2931], time: [103.9951ms]\n",
+      "Epoch: [  8/ 10], step: [  371/  390], loss: [0.3270], avg loss: [0.2932], time: [107.1894ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  8/ 10], step: [  372/  390], loss: [0.2237], avg loss: [0.2930], time: [104.1131ms]\n",
+      "Epoch: [  8/ 10], step: [  373/  390], loss: [0.1964], avg loss: [0.2928], time: [107.8188ms]\n",
+      "Epoch: [  8/ 10], step: [  374/  390], loss: [0.3240], avg loss: [0.2928], time: [102.8645ms]\n",
+      "Epoch: [  8/ 10], step: [  375/  390], loss: [0.4185], avg loss: [0.2932], time: [101.9661ms]\n",
+      "Epoch: [  8/ 10], step: [  376/  390], loss: [0.2762], avg loss: [0.2931], time: [102.5932ms]\n",
+      "Epoch: [  8/ 10], step: [  377/  390], loss: [0.2433], avg loss: [0.2930], time: [105.0153ms]\n",
+      "Epoch: [  8/ 10], step: [  378/  390], loss: [0.3024], avg loss: [0.2930], time: [103.4801ms]\n",
+      "Epoch: [  8/ 10], step: [  379/  390], loss: [0.3009], avg loss: [0.2930], time: [106.7383ms]\n",
+      "Epoch: [  8/ 10], step: [  380/  390], loss: [0.3313], avg loss: [0.2931], time: [103.3118ms]\n",
+      "Epoch: [  8/ 10], step: [  381/  390], loss: [0.2318], avg loss: [0.2930], time: [100.6205ms]\n",
+      "Epoch: [  8/ 10], step: [  382/  390], loss: [0.2963], avg loss: [0.2930], time: [103.0352ms]\n",
+      "Epoch: [  8/ 10], step: [  383/  390], loss: [0.3568], avg loss: [0.2932], time: [100.2462ms]\n",
+      "Epoch: [  8/ 10], step: [  384/  390], loss: [0.2718], avg loss: [0.2931], time: [101.7380ms]\n",
+      "Epoch: [  8/ 10], step: [  385/  390], loss: [0.3772], avg loss: [0.2933], time: [106.9889ms]\n",
+      "Epoch: [  8/ 10], step: [  386/  390], loss: [0.4922], avg loss: [0.2938], time: [104.3885ms]\n",
+      "Epoch: [  8/ 10], step: [  387/  390], loss: [0.4117], avg loss: [0.2941], time: [103.9355ms]\n",
+      "Epoch: [  8/ 10], step: [  388/  390], loss: [0.3131], avg loss: [0.2942], time: [102.3018ms]\n",
+      "Epoch: [  8/ 10], step: [  389/  390], loss: [0.3322], avg loss: [0.2943], time: [98.4211ms]\n",
+      "Epoch: [  8/ 10], step: [  390/  390], loss: [0.2457], avg loss: [0.2942], time: [825.6276ms]\n",
+      "Epoch time: 41559.210, per step time: 106.562\n",
+      "Epoch time: 41559.509, per step time: 106.563, avg loss: 0.294\n",
+      "************************************************************\n",
+      "Epoch: [  9/ 10], step: [    1/  390], loss: [0.2256], avg loss: [0.2256], time: [73.0402ms]\n",
+      "Epoch: [  9/ 10], step: [    2/  390], loss: [0.3673], avg loss: [0.2965], time: [100.7907ms]\n",
+      "Epoch: [  9/ 10], step: [    3/  390], loss: [0.3487], avg loss: [0.3139], time: [102.1595ms]\n",
+      "Epoch: [  9/ 10], step: [    4/  390], loss: [0.2746], avg loss: [0.3041], time: [100.4121ms]\n",
+      "Epoch: [  9/ 10], step: [    5/  390], loss: [0.2949], avg loss: [0.3022], time: [99.5049ms]\n",
+      "Epoch: [  9/ 10], step: [    6/  390], loss: [0.2162], avg loss: [0.2879], time: [104.5511ms]\n",
+      "Epoch: [  9/ 10], step: [    7/  390], loss: [0.2553], avg loss: [0.2832], time: [101.1810ms]\n",
+      "Epoch: [  9/ 10], step: [    8/  390], loss: [0.2775], avg loss: [0.2825], time: [101.3844ms]\n",
+      "Epoch: [  9/ 10], step: [    9/  390], loss: [0.2729], avg loss: [0.2815], time: [99.4809ms]\n",
+      "Epoch: [  9/ 10], step: [   10/  390], loss: [0.3049], avg loss: [0.2838], time: [104.8808ms]\n",
+      "Epoch: [  9/ 10], step: [   11/  390], loss: [0.2232], avg loss: [0.2783], time: [99.8337ms]\n",
+      "Epoch: [  9/ 10], step: [   12/  390], loss: [0.4350], avg loss: [0.2913], time: [102.1557ms]\n",
+      "Epoch: [  9/ 10], step: [   13/  390], loss: [0.2641], avg loss: [0.2893], time: [98.5370ms]\n",
+      "Epoch: [  9/ 10], step: [   14/  390], loss: [0.2723], avg loss: [0.2880], time: [102.5147ms]\n",
+      "Epoch: [  9/ 10], step: [   15/  390], loss: [0.3581], avg loss: [0.2927], time: [101.6605ms]\n",
+      "Epoch: [  9/ 10], step: [   16/  390], loss: [0.3240], avg loss: [0.2947], time: [104.3739ms]\n",
+      "Epoch: [  9/ 10], step: [   17/  390], loss: [0.2842], avg loss: [0.2941], time: [102.5045ms]\n",
+      "Epoch: [  9/ 10], step: [   18/  390], loss: [0.2179], avg loss: [0.2898], time: [104.9385ms]\n",
+      "Epoch: [  9/ 10], step: [   19/  390], loss: [0.2201], avg loss: [0.2862], time: [101.8097ms]\n",
+      "Epoch: [  9/ 10], step: [   20/  390], loss: [0.2116], avg loss: [0.2824], time: [101.5544ms]\n",
+      "Epoch: [  9/ 10], step: [   21/  390], loss: [0.2918], avg loss: [0.2829], time: [101.5136ms]\n",
+      "Epoch: [  9/ 10], step: [   22/  390], loss: [0.3158], avg loss: [0.2844], time: [105.2084ms]\n",
+      "Epoch: [  9/ 10], step: [   23/  390], loss: [0.2919], avg loss: [0.2847], time: [98.5651ms]\n",
+      "Epoch: [  9/ 10], step: [   24/  390], loss: [0.3004], avg loss: [0.2853], time: [101.0883ms]\n",
+      "Epoch: [  9/ 10], step: [   25/  390], loss: [0.1961], avg loss: [0.2818], time: [103.5347ms]\n",
+      "Epoch: [  9/ 10], step: [   26/  390], loss: [0.1507], avg loss: [0.2767], time: [100.9510ms]\n",
+      "Epoch: [  9/ 10], step: [   27/  390], loss: [0.2368], avg loss: [0.2753], time: [105.0570ms]\n",
+      "Epoch: [  9/ 10], step: [   28/  390], loss: [0.2472], avg loss: [0.2743], time: [101.9182ms]\n",
+      "Epoch: [  9/ 10], step: [   29/  390], loss: [0.3680], avg loss: [0.2775], time: [104.1172ms]\n",
+      "Epoch: [  9/ 10], step: [   30/  390], loss: [0.2974], avg loss: [0.2782], time: [100.6999ms]\n",
+      "Epoch: [  9/ 10], step: [   31/  390], loss: [0.4239], avg loss: [0.2829], time: [100.9908ms]\n",
+      "Epoch: [  9/ 10], step: [   32/  390], loss: [0.2210], avg loss: [0.2809], time: [102.1395ms]\n",
+      "Epoch: [  9/ 10], step: [   33/  390], loss: [0.2801], avg loss: [0.2809], time: [102.2515ms]\n",
+      "Epoch: [  9/ 10], step: [   34/  390], loss: [0.3228], avg loss: [0.2821], time: [100.3528ms]\n",
+      "Epoch: [  9/ 10], step: [   35/  390], loss: [0.2770], avg loss: [0.2820], time: [103.7469ms]\n",
+      "Epoch: [  9/ 10], step: [   36/  390], loss: [0.2428], avg loss: [0.2809], time: [103.4606ms]\n",
+      "Epoch: [  9/ 10], step: [   37/  390], loss: [0.3188], avg loss: [0.2819], time: [101.4192ms]\n",
+      "Epoch: [  9/ 10], step: [   38/  390], loss: [0.3796], avg loss: [0.2845], time: [100.3752ms]\n",
+      "Epoch: [  9/ 10], step: [   39/  390], loss: [0.3048], avg loss: [0.2850], time: [101.6319ms]\n",
+      "Epoch: [  9/ 10], step: [   40/  390], loss: [0.3629], avg loss: [0.2870], time: [103.6632ms]\n",
+      "Epoch: [  9/ 10], step: [   41/  390], loss: [0.2277], avg loss: [0.2855], time: [103.6563ms]\n",
+      "Epoch: [  9/ 10], step: [   42/  390], loss: [0.3251], avg loss: [0.2865], time: [101.8090ms]\n",
+      "Epoch: [  9/ 10], step: [   43/  390], loss: [0.2962], avg loss: [0.2867], time: [99.7391ms]\n",
+      "Epoch: [  9/ 10], step: [   44/  390], loss: [0.3035], avg loss: [0.2871], time: [102.1402ms]\n",
+      "Epoch: [  9/ 10], step: [   45/  390], loss: [0.2271], avg loss: [0.2857], time: [102.7102ms]\n",
+      "Epoch: [  9/ 10], step: [   46/  390], loss: [0.3214], avg loss: [0.2865], time: [100.9073ms]\n",
+      "Epoch: [  9/ 10], step: [   47/  390], loss: [0.3241], avg loss: [0.2873], time: [102.1171ms]\n",
+      "Epoch: [  9/ 10], step: [   48/  390], loss: [0.2813], avg loss: [0.2872], time: [103.9498ms]\n",
+      "Epoch: [  9/ 10], step: [   49/  390], loss: [0.2779], avg loss: [0.2870], time: [103.4803ms]\n",
+      "Epoch: [  9/ 10], step: [   50/  390], loss: [0.3609], avg loss: [0.2885], time: [106.9639ms]\n",
+      "Epoch: [  9/ 10], step: [   51/  390], loss: [0.2184], avg loss: [0.2871], time: [103.6048ms]\n",
+      "Epoch: [  9/ 10], step: [   52/  390], loss: [0.2971], avg loss: [0.2873], time: [100.3568ms]\n",
+      "Epoch: [  9/ 10], step: [   53/  390], loss: [0.2773], avg loss: [0.2871], time: [101.9478ms]\n",
+      "Epoch: [  9/ 10], step: [   54/  390], loss: [0.2829], avg loss: [0.2870], time: [103.6716ms]\n",
+      "Epoch: [  9/ 10], step: [   55/  390], loss: [0.2038], avg loss: [0.2855], time: [106.3128ms]\n",
+      "Epoch: [  9/ 10], step: [   56/  390], loss: [0.1633], avg loss: [0.2833], time: [105.2680ms]\n",
+      "Epoch: [  9/ 10], step: [   57/  390], loss: [0.3691], avg loss: [0.2848], time: [104.1119ms]\n",
+      "Epoch: [  9/ 10], step: [   58/  390], loss: [0.2271], avg loss: [0.2838], time: [100.7919ms]\n",
+      "Epoch: [  9/ 10], step: [   59/  390], loss: [0.2663], avg loss: [0.2835], time: [106.4491ms]\n",
+      "Epoch: [  9/ 10], step: [   60/  390], loss: [0.4288], avg loss: [0.2860], time: [100.9979ms]\n",
+      "Epoch: [  9/ 10], step: [   61/  390], loss: [0.2189], avg loss: [0.2849], time: [102.9048ms]\n",
+      "Epoch: [  9/ 10], step: [   62/  390], loss: [0.4068], avg loss: [0.2868], time: [104.0356ms]\n",
+      "Epoch: [  9/ 10], step: [   63/  390], loss: [0.2435], avg loss: [0.2861], time: [105.6540ms]\n",
+      "Epoch: [  9/ 10], step: [   64/  390], loss: [0.3208], avg loss: [0.2867], time: [101.3505ms]\n",
+      "Epoch: [  9/ 10], step: [   65/  390], loss: [0.1461], avg loss: [0.2845], time: [103.0321ms]\n",
+      "Epoch: [  9/ 10], step: [   66/  390], loss: [0.2150], avg loss: [0.2835], time: [105.2101ms]\n",
+      "Epoch: [  9/ 10], step: [   67/  390], loss: [0.3100], avg loss: [0.2839], time: [99.7190ms]\n",
+      "Epoch: [  9/ 10], step: [   68/  390], loss: [0.2170], avg loss: [0.2829], time: [101.7153ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  9/ 10], step: [   69/  390], loss: [0.4718], avg loss: [0.2856], time: [103.2302ms]\n",
+      "Epoch: [  9/ 10], step: [   70/  390], loss: [0.4030], avg loss: [0.2873], time: [100.6932ms]\n",
+      "Epoch: [  9/ 10], step: [   71/  390], loss: [0.3980], avg loss: [0.2888], time: [100.3315ms]\n",
+      "Epoch: [  9/ 10], step: [   72/  390], loss: [0.2488], avg loss: [0.2883], time: [102.1490ms]\n",
+      "Epoch: [  9/ 10], step: [   73/  390], loss: [0.1879], avg loss: [0.2869], time: [103.2121ms]\n",
+      "Epoch: [  9/ 10], step: [   74/  390], loss: [0.3052], avg loss: [0.2872], time: [105.9821ms]\n",
+      "Epoch: [  9/ 10], step: [   75/  390], loss: [0.1858], avg loss: [0.2858], time: [103.2846ms]\n",
+      "Epoch: [  9/ 10], step: [   76/  390], loss: [0.1737], avg loss: [0.2843], time: [102.9892ms]\n",
+      "Epoch: [  9/ 10], step: [   77/  390], loss: [0.3333], avg loss: [0.2850], time: [101.4016ms]\n",
+      "Epoch: [  9/ 10], step: [   78/  390], loss: [0.1959], avg loss: [0.2838], time: [103.2929ms]\n",
+      "Epoch: [  9/ 10], step: [   79/  390], loss: [0.2411], avg loss: [0.2833], time: [104.4221ms]\n",
+      "Epoch: [  9/ 10], step: [   80/  390], loss: [0.2749], avg loss: [0.2832], time: [101.3968ms]\n",
+      "Epoch: [  9/ 10], step: [   81/  390], loss: [0.1702], avg loss: [0.2818], time: [103.2207ms]\n",
+      "Epoch: [  9/ 10], step: [   82/  390], loss: [0.1831], avg loss: [0.2806], time: [101.0070ms]\n",
+      "Epoch: [  9/ 10], step: [   83/  390], loss: [0.3682], avg loss: [0.2816], time: [103.2400ms]\n",
+      "Epoch: [  9/ 10], step: [   84/  390], loss: [0.1844], avg loss: [0.2805], time: [101.6552ms]\n",
+      "Epoch: [  9/ 10], step: [   85/  390], loss: [0.2799], avg loss: [0.2805], time: [100.9090ms]\n",
+      "Epoch: [  9/ 10], step: [   86/  390], loss: [0.2805], avg loss: [0.2805], time: [102.1225ms]\n",
+      "Epoch: [  9/ 10], step: [   87/  390], loss: [0.3685], avg loss: [0.2815], time: [105.3522ms]\n",
+      "Epoch: [  9/ 10], step: [   88/  390], loss: [0.2802], avg loss: [0.2815], time: [103.0819ms]\n",
+      "Epoch: [  9/ 10], step: [   89/  390], loss: [0.1326], avg loss: [0.2798], time: [101.0733ms]\n",
+      "Epoch: [  9/ 10], step: [   90/  390], loss: [0.1912], avg loss: [0.2788], time: [101.8717ms]\n",
+      "Epoch: [  9/ 10], step: [   91/  390], loss: [0.3006], avg loss: [0.2791], time: [103.6603ms]\n",
+      "Epoch: [  9/ 10], step: [   92/  390], loss: [0.1286], avg loss: [0.2774], time: [105.4566ms]\n",
+      "Epoch: [  9/ 10], step: [   93/  390], loss: [0.2179], avg loss: [0.2768], time: [103.0874ms]\n",
+      "Epoch: [  9/ 10], step: [   94/  390], loss: [0.1999], avg loss: [0.2760], time: [104.0862ms]\n",
+      "Epoch: [  9/ 10], step: [   95/  390], loss: [0.2278], avg loss: [0.2755], time: [101.6471ms]\n",
+      "Epoch: [  9/ 10], step: [   96/  390], loss: [0.1420], avg loss: [0.2741], time: [103.2350ms]\n",
+      "Epoch: [  9/ 10], step: [   97/  390], loss: [0.1676], avg loss: [0.2730], time: [103.5488ms]\n",
+      "Epoch: [  9/ 10], step: [   98/  390], loss: [0.2984], avg loss: [0.2732], time: [102.4947ms]\n",
+      "Epoch: [  9/ 10], step: [   99/  390], loss: [0.2156], avg loss: [0.2726], time: [102.6287ms]\n",
+      "Epoch: [  9/ 10], step: [  100/  390], loss: [0.2189], avg loss: [0.2721], time: [101.0580ms]\n",
+      "Epoch: [  9/ 10], step: [  101/  390], loss: [0.2909], avg loss: [0.2723], time: [105.7246ms]\n",
+      "Epoch: [  9/ 10], step: [  102/  390], loss: [0.3303], avg loss: [0.2729], time: [101.7463ms]\n",
+      "Epoch: [  9/ 10], step: [  103/  390], loss: [0.4217], avg loss: [0.2743], time: [102.3078ms]\n",
+      "Epoch: [  9/ 10], step: [  104/  390], loss: [0.2753], avg loss: [0.2743], time: [104.3847ms]\n",
+      "Epoch: [  9/ 10], step: [  105/  390], loss: [0.2595], avg loss: [0.2742], time: [100.9948ms]\n",
+      "Epoch: [  9/ 10], step: [  106/  390], loss: [0.2275], avg loss: [0.2737], time: [104.4550ms]\n",
+      "Epoch: [  9/ 10], step: [  107/  390], loss: [0.3049], avg loss: [0.2740], time: [104.4550ms]\n",
+      "Epoch: [  9/ 10], step: [  108/  390], loss: [0.3463], avg loss: [0.2747], time: [101.8348ms]\n",
+      "Epoch: [  9/ 10], step: [  109/  390], loss: [0.2354], avg loss: [0.2743], time: [102.1039ms]\n",
+      "Epoch: [  9/ 10], step: [  110/  390], loss: [0.2470], avg loss: [0.2741], time: [100.6250ms]\n",
+      "Epoch: [  9/ 10], step: [  111/  390], loss: [0.2685], avg loss: [0.2740], time: [101.1736ms]\n",
+      "Epoch: [  9/ 10], step: [  112/  390], loss: [0.2859], avg loss: [0.2741], time: [102.1161ms]\n",
+      "Epoch: [  9/ 10], step: [  113/  390], loss: [0.2302], avg loss: [0.2738], time: [103.1101ms]\n",
+      "Epoch: [  9/ 10], step: [  114/  390], loss: [0.2259], avg loss: [0.2733], time: [101.3191ms]\n",
+      "Epoch: [  9/ 10], step: [  115/  390], loss: [0.2267], avg loss: [0.2729], time: [104.1887ms]\n",
+      "Epoch: [  9/ 10], step: [  116/  390], loss: [0.2309], avg loss: [0.2726], time: [103.5342ms]\n",
+      "Epoch: [  9/ 10], step: [  117/  390], loss: [0.3122], avg loss: [0.2729], time: [102.5238ms]\n",
+      "Epoch: [  9/ 10], step: [  118/  390], loss: [0.2515], avg loss: [0.2727], time: [107.3129ms]\n",
+      "Epoch: [  9/ 10], step: [  119/  390], loss: [0.2786], avg loss: [0.2728], time: [103.4453ms]\n",
+      "Epoch: [  9/ 10], step: [  120/  390], loss: [0.2677], avg loss: [0.2727], time: [102.1745ms]\n",
+      "Epoch: [  9/ 10], step: [  121/  390], loss: [0.3950], avg loss: [0.2737], time: [101.2299ms]\n",
+      "Epoch: [  9/ 10], step: [  122/  390], loss: [0.2902], avg loss: [0.2739], time: [99.9250ms]\n",
+      "Epoch: [  9/ 10], step: [  123/  390], loss: [0.2933], avg loss: [0.2740], time: [101.8765ms]\n",
+      "Epoch: [  9/ 10], step: [  124/  390], loss: [0.3831], avg loss: [0.2749], time: [103.2825ms]\n",
+      "Epoch: [  9/ 10], step: [  125/  390], loss: [0.2204], avg loss: [0.2745], time: [104.0571ms]\n",
+      "Epoch: [  9/ 10], step: [  126/  390], loss: [0.3225], avg loss: [0.2749], time: [101.8565ms]\n",
+      "Epoch: [  9/ 10], step: [  127/  390], loss: [0.3021], avg loss: [0.2751], time: [103.0264ms]\n",
+      "Epoch: [  9/ 10], step: [  128/  390], loss: [0.3702], avg loss: [0.2758], time: [102.6061ms]\n",
+      "Epoch: [  9/ 10], step: [  129/  390], loss: [0.3515], avg loss: [0.2764], time: [106.0541ms]\n",
+      "Epoch: [  9/ 10], step: [  130/  390], loss: [0.2547], avg loss: [0.2762], time: [102.6504ms]\n",
+      "Epoch: [  9/ 10], step: [  131/  390], loss: [0.2681], avg loss: [0.2762], time: [102.6218ms]\n",
+      "Epoch: [  9/ 10], step: [  132/  390], loss: [0.3002], avg loss: [0.2764], time: [105.4668ms]\n",
+      "Epoch: [  9/ 10], step: [  133/  390], loss: [0.3737], avg loss: [0.2771], time: [100.7266ms]\n",
+      "Epoch: [  9/ 10], step: [  134/  390], loss: [0.2523], avg loss: [0.2769], time: [104.7270ms]\n",
+      "Epoch: [  9/ 10], step: [  135/  390], loss: [0.3247], avg loss: [0.2773], time: [100.9140ms]\n",
+      "Epoch: [  9/ 10], step: [  136/  390], loss: [0.3409], avg loss: [0.2777], time: [101.0613ms]\n",
+      "Epoch: [  9/ 10], step: [  137/  390], loss: [0.3709], avg loss: [0.2784], time: [101.1641ms]\n",
+      "Epoch: [  9/ 10], step: [  138/  390], loss: [0.1743], avg loss: [0.2776], time: [100.7006ms]\n",
+      "Epoch: [  9/ 10], step: [  139/  390], loss: [0.3687], avg loss: [0.2783], time: [102.7555ms]\n",
+      "Epoch: [  9/ 10], step: [  140/  390], loss: [0.3255], avg loss: [0.2786], time: [101.9893ms]\n",
+      "Epoch: [  9/ 10], step: [  141/  390], loss: [0.2741], avg loss: [0.2786], time: [102.7770ms]\n",
+      "Epoch: [  9/ 10], step: [  142/  390], loss: [0.1603], avg loss: [0.2778], time: [103.8938ms]\n",
+      "Epoch: [  9/ 10], step: [  143/  390], loss: [0.3056], avg loss: [0.2780], time: [102.9959ms]\n",
+      "Epoch: [  9/ 10], step: [  144/  390], loss: [0.3297], avg loss: [0.2783], time: [100.5542ms]\n",
+      "Epoch: [  9/ 10], step: [  145/  390], loss: [0.2882], avg loss: [0.2784], time: [102.2995ms]\n",
+      "Epoch: [  9/ 10], step: [  146/  390], loss: [0.3367], avg loss: [0.2788], time: [102.5589ms]\n",
+      "Epoch: [  9/ 10], step: [  147/  390], loss: [0.1517], avg loss: [0.2779], time: [104.6801ms]\n",
+      "Epoch: [  9/ 10], step: [  148/  390], loss: [0.2856], avg loss: [0.2780], time: [104.3811ms]\n",
+      "Epoch: [  9/ 10], step: [  149/  390], loss: [0.3148], avg loss: [0.2782], time: [104.4352ms]\n",
+      "Epoch: [  9/ 10], step: [  150/  390], loss: [0.2960], avg loss: [0.2783], time: [101.1314ms]\n",
+      "Epoch: [  9/ 10], step: [  151/  390], loss: [0.2638], avg loss: [0.2783], time: [104.2705ms]\n",
+      "Epoch: [  9/ 10], step: [  152/  390], loss: [0.1726], avg loss: [0.2776], time: [99.9887ms]\n",
+      "Epoch: [  9/ 10], step: [  153/  390], loss: [0.3240], avg loss: [0.2779], time: [102.6273ms]\n",
+      "Epoch: [  9/ 10], step: [  154/  390], loss: [0.2530], avg loss: [0.2777], time: [106.0991ms]\n",
+      "Epoch: [  9/ 10], step: [  155/  390], loss: [0.2303], avg loss: [0.2774], time: [101.4292ms]\n",
+      "Epoch: [  9/ 10], step: [  156/  390], loss: [0.2816], avg loss: [0.2774], time: [101.7385ms]\n",
+      "Epoch: [  9/ 10], step: [  157/  390], loss: [0.3392], avg loss: [0.2778], time: [99.7787ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  9/ 10], step: [  158/  390], loss: [0.2664], avg loss: [0.2777], time: [101.3439ms]\n",
+      "Epoch: [  9/ 10], step: [  159/  390], loss: [0.4234], avg loss: [0.2787], time: [104.7285ms]\n",
+      "Epoch: [  9/ 10], step: [  160/  390], loss: [0.2787], avg loss: [0.2787], time: [101.1209ms]\n",
+      "Epoch: [  9/ 10], step: [  161/  390], loss: [0.3272], avg loss: [0.2790], time: [101.0838ms]\n",
+      "Epoch: [  9/ 10], step: [  162/  390], loss: [0.3409], avg loss: [0.2793], time: [101.6126ms]\n",
+      "Epoch: [  9/ 10], step: [  163/  390], loss: [0.3722], avg loss: [0.2799], time: [102.9675ms]\n",
+      "Epoch: [  9/ 10], step: [  164/  390], loss: [0.2464], avg loss: [0.2797], time: [103.0622ms]\n",
+      "Epoch: [  9/ 10], step: [  165/  390], loss: [0.1451], avg loss: [0.2789], time: [101.7148ms]\n",
+      "Epoch: [  9/ 10], step: [  166/  390], loss: [0.3036], avg loss: [0.2790], time: [102.2775ms]\n",
+      "Epoch: [  9/ 10], step: [  167/  390], loss: [0.2150], avg loss: [0.2787], time: [102.2885ms]\n",
+      "Epoch: [  9/ 10], step: [  168/  390], loss: [0.2903], avg loss: [0.2787], time: [105.4242ms]\n",
+      "Epoch: [  9/ 10], step: [  169/  390], loss: [0.4836], avg loss: [0.2799], time: [99.7148ms]\n",
+      "Epoch: [  9/ 10], step: [  170/  390], loss: [0.2690], avg loss: [0.2799], time: [100.1005ms]\n",
+      "Epoch: [  9/ 10], step: [  171/  390], loss: [0.3030], avg loss: [0.2800], time: [103.6029ms]\n",
+      "Epoch: [  9/ 10], step: [  172/  390], loss: [0.2788], avg loss: [0.2800], time: [101.7156ms]\n",
+      "Epoch: [  9/ 10], step: [  173/  390], loss: [0.3095], avg loss: [0.2802], time: [101.5306ms]\n",
+      "Epoch: [  9/ 10], step: [  174/  390], loss: [0.3485], avg loss: [0.2806], time: [99.8614ms]\n",
+      "Epoch: [  9/ 10], step: [  175/  390], loss: [0.3854], avg loss: [0.2812], time: [102.8297ms]\n",
+      "Epoch: [  9/ 10], step: [  176/  390], loss: [0.2738], avg loss: [0.2811], time: [101.7272ms]\n",
+      "Epoch: [  9/ 10], step: [  177/  390], loss: [0.2012], avg loss: [0.2807], time: [101.0392ms]\n",
+      "Epoch: [  9/ 10], step: [  178/  390], loss: [0.1913], avg loss: [0.2802], time: [105.7694ms]\n",
+      "Epoch: [  9/ 10], step: [  179/  390], loss: [0.1811], avg loss: [0.2796], time: [101.5816ms]\n",
+      "Epoch: [  9/ 10], step: [  180/  390], loss: [0.2216], avg loss: [0.2793], time: [103.5488ms]\n",
+      "Epoch: [  9/ 10], step: [  181/  390], loss: [0.3418], avg loss: [0.2796], time: [103.5931ms]\n",
+      "Epoch: [  9/ 10], step: [  182/  390], loss: [0.4854], avg loss: [0.2808], time: [101.1360ms]\n",
+      "Epoch: [  9/ 10], step: [  183/  390], loss: [0.3358], avg loss: [0.2811], time: [101.8438ms]\n",
+      "Epoch: [  9/ 10], step: [  184/  390], loss: [0.1935], avg loss: [0.2806], time: [104.8281ms]\n",
+      "Epoch: [  9/ 10], step: [  185/  390], loss: [0.3501], avg loss: [0.2810], time: [103.6060ms]\n",
+      "Epoch: [  9/ 10], step: [  186/  390], loss: [0.2153], avg loss: [0.2806], time: [102.9382ms]\n",
+      "Epoch: [  9/ 10], step: [  187/  390], loss: [0.2664], avg loss: [0.2805], time: [103.3432ms]\n",
+      "Epoch: [  9/ 10], step: [  188/  390], loss: [0.1765], avg loss: [0.2800], time: [100.9405ms]\n",
+      "Epoch: [  9/ 10], step: [  189/  390], loss: [0.1346], avg loss: [0.2792], time: [105.4928ms]\n",
+      "Epoch: [  9/ 10], step: [  190/  390], loss: [0.1991], avg loss: [0.2788], time: [102.2525ms]\n",
+      "Epoch: [  9/ 10], step: [  191/  390], loss: [0.2464], avg loss: [0.2786], time: [103.5113ms]\n",
+      "Epoch: [  9/ 10], step: [  192/  390], loss: [0.2229], avg loss: [0.2783], time: [100.6746ms]\n",
+      "Epoch: [  9/ 10], step: [  193/  390], loss: [0.3363], avg loss: [0.2786], time: [101.0532ms]\n",
+      "Epoch: [  9/ 10], step: [  194/  390], loss: [0.2198], avg loss: [0.2783], time: [100.8210ms]\n",
+      "Epoch: [  9/ 10], step: [  195/  390], loss: [0.1779], avg loss: [0.2778], time: [104.1412ms]\n",
+      "Epoch: [  9/ 10], step: [  196/  390], loss: [0.1399], avg loss: [0.2771], time: [104.0020ms]\n",
+      "Epoch: [  9/ 10], step: [  197/  390], loss: [0.3189], avg loss: [0.2773], time: [100.5628ms]\n",
+      "Epoch: [  9/ 10], step: [  198/  390], loss: [0.2123], avg loss: [0.2770], time: [102.3779ms]\n",
+      "Epoch: [  9/ 10], step: [  199/  390], loss: [0.3724], avg loss: [0.2775], time: [101.0129ms]\n",
+      "Epoch: [  9/ 10], step: [  200/  390], loss: [0.2345], avg loss: [0.2773], time: [105.2034ms]\n",
+      "Epoch: [  9/ 10], step: [  201/  390], loss: [0.2212], avg loss: [0.2770], time: [103.3015ms]\n",
+      "Epoch: [  9/ 10], step: [  202/  390], loss: [0.3194], avg loss: [0.2772], time: [104.3479ms]\n",
+      "Epoch: [  9/ 10], step: [  203/  390], loss: [0.1540], avg loss: [0.2766], time: [103.8370ms]\n",
+      "Epoch: [  9/ 10], step: [  204/  390], loss: [0.3313], avg loss: [0.2769], time: [102.0615ms]\n",
+      "Epoch: [  9/ 10], step: [  205/  390], loss: [0.2585], avg loss: [0.2768], time: [103.6479ms]\n",
+      "Epoch: [  9/ 10], step: [  206/  390], loss: [0.1736], avg loss: [0.2763], time: [105.9792ms]\n",
+      "Epoch: [  9/ 10], step: [  207/  390], loss: [0.3516], avg loss: [0.2766], time: [100.9440ms]\n",
+      "Epoch: [  9/ 10], step: [  208/  390], loss: [0.4077], avg loss: [0.2773], time: [101.6276ms]\n",
+      "Epoch: [  9/ 10], step: [  209/  390], loss: [0.2779], avg loss: [0.2773], time: [103.9333ms]\n",
+      "Epoch: [  9/ 10], step: [  210/  390], loss: [0.2984], avg loss: [0.2774], time: [104.6572ms]\n",
+      "Epoch: [  9/ 10], step: [  211/  390], loss: [0.3921], avg loss: [0.2779], time: [100.5015ms]\n",
+      "Epoch: [  9/ 10], step: [  212/  390], loss: [0.2446], avg loss: [0.2778], time: [102.1032ms]\n",
+      "Epoch: [  9/ 10], step: [  213/  390], loss: [0.2475], avg loss: [0.2776], time: [103.1590ms]\n",
+      "Epoch: [  9/ 10], step: [  214/  390], loss: [0.2972], avg loss: [0.2777], time: [102.2763ms]\n",
+      "Epoch: [  9/ 10], step: [  215/  390], loss: [0.2834], avg loss: [0.2777], time: [102.6273ms]\n",
+      "Epoch: [  9/ 10], step: [  216/  390], loss: [0.2070], avg loss: [0.2774], time: [104.4176ms]\n",
+      "Epoch: [  9/ 10], step: [  217/  390], loss: [0.3333], avg loss: [0.2777], time: [101.0535ms]\n",
+      "Epoch: [  9/ 10], step: [  218/  390], loss: [0.2225], avg loss: [0.2774], time: [103.3754ms]\n",
+      "Epoch: [  9/ 10], step: [  219/  390], loss: [0.3896], avg loss: [0.2779], time: [99.9773ms]\n",
+      "Epoch: [  9/ 10], step: [  220/  390], loss: [0.2675], avg loss: [0.2779], time: [104.7366ms]\n",
+      "Epoch: [  9/ 10], step: [  221/  390], loss: [0.2908], avg loss: [0.2779], time: [101.2146ms]\n",
+      "Epoch: [  9/ 10], step: [  222/  390], loss: [0.4031], avg loss: [0.2785], time: [102.5598ms]\n",
+      "Epoch: [  9/ 10], step: [  223/  390], loss: [0.1974], avg loss: [0.2781], time: [103.5082ms]\n",
+      "Epoch: [  9/ 10], step: [  224/  390], loss: [0.3648], avg loss: [0.2785], time: [102.3223ms]\n",
+      "Epoch: [  9/ 10], step: [  225/  390], loss: [0.3166], avg loss: [0.2787], time: [101.7206ms]\n",
+      "Epoch: [  9/ 10], step: [  226/  390], loss: [0.2183], avg loss: [0.2784], time: [101.8331ms]\n",
+      "Epoch: [  9/ 10], step: [  227/  390], loss: [0.3256], avg loss: [0.2786], time: [103.4844ms]\n",
+      "Epoch: [  9/ 10], step: [  228/  390], loss: [0.2786], avg loss: [0.2786], time: [101.1503ms]\n",
+      "Epoch: [  9/ 10], step: [  229/  390], loss: [0.3497], avg loss: [0.2789], time: [102.6747ms]\n",
+      "Epoch: [  9/ 10], step: [  230/  390], loss: [0.3478], avg loss: [0.2792], time: [101.1243ms]\n",
+      "Epoch: [  9/ 10], step: [  231/  390], loss: [0.3882], avg loss: [0.2797], time: [103.8375ms]\n",
+      "Epoch: [  9/ 10], step: [  232/  390], loss: [0.2460], avg loss: [0.2796], time: [100.4696ms]\n",
+      "Epoch: [  9/ 10], step: [  233/  390], loss: [0.1955], avg loss: [0.2792], time: [104.6753ms]\n",
+      "Epoch: [  9/ 10], step: [  234/  390], loss: [0.2888], avg loss: [0.2792], time: [106.8163ms]\n",
+      "Epoch: [  9/ 10], step: [  235/  390], loss: [0.2994], avg loss: [0.2793], time: [102.1876ms]\n",
+      "Epoch: [  9/ 10], step: [  236/  390], loss: [0.3871], avg loss: [0.2798], time: [106.7333ms]\n",
+      "Epoch: [  9/ 10], step: [  237/  390], loss: [0.3991], avg loss: [0.2803], time: [104.6433ms]\n",
+      "Epoch: [  9/ 10], step: [  238/  390], loss: [0.3099], avg loss: [0.2804], time: [101.4895ms]\n",
+      "Epoch: [  9/ 10], step: [  239/  390], loss: [0.3141], avg loss: [0.2806], time: [102.6342ms]\n",
+      "Epoch: [  9/ 10], step: [  240/  390], loss: [0.3390], avg loss: [0.2808], time: [102.2451ms]\n",
+      "Epoch: [  9/ 10], step: [  241/  390], loss: [0.2310], avg loss: [0.2806], time: [104.7823ms]\n",
+      "Epoch: [  9/ 10], step: [  242/  390], loss: [0.2700], avg loss: [0.2805], time: [103.6384ms]\n",
+      "Epoch: [  9/ 10], step: [  243/  390], loss: [0.2811], avg loss: [0.2805], time: [102.1147ms]\n",
+      "Epoch: [  9/ 10], step: [  244/  390], loss: [0.2345], avg loss: [0.2804], time: [103.6584ms]\n",
+      "Epoch: [  9/ 10], step: [  245/  390], loss: [0.2672], avg loss: [0.2803], time: [101.4009ms]\n",
+      "Epoch: [  9/ 10], step: [  246/  390], loss: [0.1876], avg loss: [0.2799], time: [106.1995ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  9/ 10], step: [  247/  390], loss: [0.3666], avg loss: [0.2803], time: [103.4901ms]\n",
+      "Epoch: [  9/ 10], step: [  248/  390], loss: [0.2445], avg loss: [0.2801], time: [101.3665ms]\n",
+      "Epoch: [  9/ 10], step: [  249/  390], loss: [0.2603], avg loss: [0.2801], time: [103.7555ms]\n",
+      "Epoch: [  9/ 10], step: [  250/  390], loss: [0.2571], avg loss: [0.2800], time: [99.9429ms]\n",
+      "Epoch: [  9/ 10], step: [  251/  390], loss: [0.4252], avg loss: [0.2805], time: [101.2111ms]\n",
+      "Epoch: [  9/ 10], step: [  252/  390], loss: [0.3173], avg loss: [0.2807], time: [100.2634ms]\n",
+      "Epoch: [  9/ 10], step: [  253/  390], loss: [0.2151], avg loss: [0.2804], time: [101.1360ms]\n",
+      "Epoch: [  9/ 10], step: [  254/  390], loss: [0.3287], avg loss: [0.2806], time: [103.8651ms]\n",
+      "Epoch: [  9/ 10], step: [  255/  390], loss: [0.2224], avg loss: [0.2804], time: [104.4121ms]\n",
+      "Epoch: [  9/ 10], step: [  256/  390], loss: [0.2287], avg loss: [0.2802], time: [102.6955ms]\n",
+      "Epoch: [  9/ 10], step: [  257/  390], loss: [0.2828], avg loss: [0.2802], time: [100.9130ms]\n",
+      "Epoch: [  9/ 10], step: [  258/  390], loss: [0.4278], avg loss: [0.2808], time: [105.5205ms]\n",
+      "Epoch: [  9/ 10], step: [  259/  390], loss: [0.2781], avg loss: [0.2808], time: [103.6808ms]\n",
+      "Epoch: [  9/ 10], step: [  260/  390], loss: [0.2918], avg loss: [0.2808], time: [101.3958ms]\n",
+      "Epoch: [  9/ 10], step: [  261/  390], loss: [0.2349], avg loss: [0.2806], time: [101.6803ms]\n",
+      "Epoch: [  9/ 10], step: [  262/  390], loss: [0.3005], avg loss: [0.2807], time: [102.3669ms]\n",
+      "Epoch: [  9/ 10], step: [  263/  390], loss: [0.2941], avg loss: [0.2808], time: [105.2403ms]\n",
+      "Epoch: [  9/ 10], step: [  264/  390], loss: [0.2351], avg loss: [0.2806], time: [102.0873ms]\n",
+      "Epoch: [  9/ 10], step: [  265/  390], loss: [0.3136], avg loss: [0.2807], time: [104.4846ms]\n",
+      "Epoch: [  9/ 10], step: [  266/  390], loss: [0.3938], avg loss: [0.2811], time: [101.5229ms]\n",
+      "Epoch: [  9/ 10], step: [  267/  390], loss: [0.1917], avg loss: [0.2808], time: [103.2660ms]\n",
+      "Epoch: [  9/ 10], step: [  268/  390], loss: [0.2223], avg loss: [0.2806], time: [105.1934ms]\n",
+      "Epoch: [  9/ 10], step: [  269/  390], loss: [0.1965], avg loss: [0.2803], time: [103.9193ms]\n",
+      "Epoch: [  9/ 10], step: [  270/  390], loss: [0.2173], avg loss: [0.2800], time: [100.4248ms]\n",
+      "Epoch: [  9/ 10], step: [  271/  390], loss: [0.3242], avg loss: [0.2802], time: [100.8677ms]\n",
+      "Epoch: [  9/ 10], step: [  272/  390], loss: [0.2942], avg loss: [0.2802], time: [105.2458ms]\n",
+      "Epoch: [  9/ 10], step: [  273/  390], loss: [0.3043], avg loss: [0.2803], time: [100.7495ms]\n",
+      "Epoch: [  9/ 10], step: [  274/  390], loss: [0.5046], avg loss: [0.2812], time: [105.1064ms]\n",
+      "Epoch: [  9/ 10], step: [  275/  390], loss: [0.2275], avg loss: [0.2810], time: [101.8381ms]\n",
+      "Epoch: [  9/ 10], step: [  276/  390], loss: [0.2391], avg loss: [0.2808], time: [100.6587ms]\n",
+      "Epoch: [  9/ 10], step: [  277/  390], loss: [0.2364], avg loss: [0.2806], time: [102.7703ms]\n",
+      "Epoch: [  9/ 10], step: [  278/  390], loss: [0.2180], avg loss: [0.2804], time: [101.0392ms]\n",
+      "Epoch: [  9/ 10], step: [  279/  390], loss: [0.2443], avg loss: [0.2803], time: [101.6967ms]\n",
+      "Epoch: [  9/ 10], step: [  280/  390], loss: [0.3269], avg loss: [0.2805], time: [104.4416ms]\n",
+      "Epoch: [  9/ 10], step: [  281/  390], loss: [0.2290], avg loss: [0.2803], time: [99.6125ms]\n",
+      "Epoch: [  9/ 10], step: [  282/  390], loss: [0.2864], avg loss: [0.2803], time: [106.7188ms]\n",
+      "Epoch: [  9/ 10], step: [  283/  390], loss: [0.5133], avg loss: [0.2811], time: [98.8364ms]\n",
+      "Epoch: [  9/ 10], step: [  284/  390], loss: [0.3965], avg loss: [0.2815], time: [100.8110ms]\n",
+      "Epoch: [  9/ 10], step: [  285/  390], loss: [0.2694], avg loss: [0.2815], time: [103.5564ms]\n",
+      "Epoch: [  9/ 10], step: [  286/  390], loss: [0.2299], avg loss: [0.2813], time: [100.5192ms]\n",
+      "Epoch: [  9/ 10], step: [  287/  390], loss: [0.2477], avg loss: [0.2812], time: [102.1681ms]\n",
+      "Epoch: [  9/ 10], step: [  288/  390], loss: [0.2881], avg loss: [0.2812], time: [101.5325ms]\n",
+      "Epoch: [  9/ 10], step: [  289/  390], loss: [0.2389], avg loss: [0.2811], time: [106.5199ms]\n",
+      "Epoch: [  9/ 10], step: [  290/  390], loss: [0.1957], avg loss: [0.2808], time: [105.9854ms]\n",
+      "Epoch: [  9/ 10], step: [  291/  390], loss: [0.4758], avg loss: [0.2814], time: [101.9738ms]\n",
+      "Epoch: [  9/ 10], step: [  292/  390], loss: [0.2147], avg loss: [0.2812], time: [104.7535ms]\n",
+      "Epoch: [  9/ 10], step: [  293/  390], loss: [0.1834], avg loss: [0.2809], time: [102.1748ms]\n",
+      "Epoch: [  9/ 10], step: [  294/  390], loss: [0.3235], avg loss: [0.2810], time: [100.9235ms]\n",
+      "Epoch: [  9/ 10], step: [  295/  390], loss: [0.2626], avg loss: [0.2810], time: [103.4160ms]\n",
+      "Epoch: [  9/ 10], step: [  296/  390], loss: [0.2007], avg loss: [0.2807], time: [101.7847ms]\n",
+      "Epoch: [  9/ 10], step: [  297/  390], loss: [0.3185], avg loss: [0.2808], time: [101.8078ms]\n",
+      "Epoch: [  9/ 10], step: [  298/  390], loss: [0.2742], avg loss: [0.2808], time: [102.3898ms]\n",
+      "Epoch: [  9/ 10], step: [  299/  390], loss: [0.3474], avg loss: [0.2810], time: [102.4704ms]\n",
+      "Epoch: [  9/ 10], step: [  300/  390], loss: [0.4156], avg loss: [0.2815], time: [100.8892ms]\n",
+      "Epoch: [  9/ 10], step: [  301/  390], loss: [0.3393], avg loss: [0.2817], time: [100.4710ms]\n",
+      "Epoch: [  9/ 10], step: [  302/  390], loss: [0.2162], avg loss: [0.2814], time: [102.9241ms]\n",
+      "Epoch: [  9/ 10], step: [  303/  390], loss: [0.3120], avg loss: [0.2815], time: [104.3801ms]\n",
+      "Epoch: [  9/ 10], step: [  304/  390], loss: [0.3075], avg loss: [0.2816], time: [101.2642ms]\n",
+      "Epoch: [  9/ 10], step: [  305/  390], loss: [0.2437], avg loss: [0.2815], time: [104.6188ms]\n",
+      "Epoch: [  9/ 10], step: [  306/  390], loss: [0.1778], avg loss: [0.2812], time: [100.4915ms]\n",
+      "Epoch: [  9/ 10], step: [  307/  390], loss: [0.3741], avg loss: [0.2815], time: [100.5793ms]\n",
+      "Epoch: [  9/ 10], step: [  308/  390], loss: [0.2621], avg loss: [0.2814], time: [101.7408ms]\n",
+      "Epoch: [  9/ 10], step: [  309/  390], loss: [0.2012], avg loss: [0.2811], time: [100.9459ms]\n",
+      "Epoch: [  9/ 10], step: [  310/  390], loss: [0.2965], avg loss: [0.2812], time: [106.0131ms]\n",
+      "Epoch: [  9/ 10], step: [  311/  390], loss: [0.2786], avg loss: [0.2812], time: [106.2589ms]\n",
+      "Epoch: [  9/ 10], step: [  312/  390], loss: [0.3387], avg loss: [0.2814], time: [100.6551ms]\n",
+      "Epoch: [  9/ 10], step: [  313/  390], loss: [0.1744], avg loss: [0.2810], time: [100.9841ms]\n",
+      "Epoch: [  9/ 10], step: [  314/  390], loss: [0.1716], avg loss: [0.2807], time: [107.1465ms]\n",
+      "Epoch: [  9/ 10], step: [  315/  390], loss: [0.2732], avg loss: [0.2807], time: [100.9767ms]\n",
+      "Epoch: [  9/ 10], step: [  316/  390], loss: [0.2169], avg loss: [0.2805], time: [101.7272ms]\n",
+      "Epoch: [  9/ 10], step: [  317/  390], loss: [0.2133], avg loss: [0.2802], time: [103.0226ms]\n",
+      "Epoch: [  9/ 10], step: [  318/  390], loss: [0.2757], avg loss: [0.2802], time: [100.7805ms]\n",
+      "Epoch: [  9/ 10], step: [  319/  390], loss: [0.2565], avg loss: [0.2802], time: [103.9732ms]\n",
+      "Epoch: [  9/ 10], step: [  320/  390], loss: [0.3456], avg loss: [0.2804], time: [100.8816ms]\n",
+      "Epoch: [  9/ 10], step: [  321/  390], loss: [0.1643], avg loss: [0.2800], time: [99.6258ms]\n",
+      "Epoch: [  9/ 10], step: [  322/  390], loss: [0.2130], avg loss: [0.2798], time: [101.2025ms]\n",
+      "Epoch: [  9/ 10], step: [  323/  390], loss: [0.2580], avg loss: [0.2797], time: [101.8205ms]\n",
+      "Epoch: [  9/ 10], step: [  324/  390], loss: [0.4480], avg loss: [0.2802], time: [101.9170ms]\n",
+      "Epoch: [  9/ 10], step: [  325/  390], loss: [0.1572], avg loss: [0.2799], time: [102.2940ms]\n",
+      "Epoch: [  9/ 10], step: [  326/  390], loss: [0.2302], avg loss: [0.2797], time: [100.7555ms]\n",
+      "Epoch: [  9/ 10], step: [  327/  390], loss: [0.3327], avg loss: [0.2799], time: [103.6375ms]\n",
+      "Epoch: [  9/ 10], step: [  328/  390], loss: [0.2224], avg loss: [0.2797], time: [102.1852ms]\n",
+      "Epoch: [  9/ 10], step: [  329/  390], loss: [0.1517], avg loss: [0.2793], time: [102.0155ms]\n",
+      "Epoch: [  9/ 10], step: [  330/  390], loss: [0.3094], avg loss: [0.2794], time: [103.5008ms]\n",
+      "Epoch: [  9/ 10], step: [  331/  390], loss: [0.3399], avg loss: [0.2796], time: [103.2772ms]\n",
+      "Epoch: [  9/ 10], step: [  332/  390], loss: [0.3457], avg loss: [0.2798], time: [101.5022ms]\n",
+      "Epoch: [  9/ 10], step: [  333/  390], loss: [0.4346], avg loss: [0.2802], time: [102.2038ms]\n",
+      "Epoch: [  9/ 10], step: [  334/  390], loss: [0.3131], avg loss: [0.2803], time: [105.9587ms]\n",
+      "Epoch: [  9/ 10], step: [  335/  390], loss: [0.2407], avg loss: [0.2802], time: [101.8615ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [  9/ 10], step: [  336/  390], loss: [0.2749], avg loss: [0.2802], time: [101.8798ms]\n",
+      "Epoch: [  9/ 10], step: [  337/  390], loss: [0.1938], avg loss: [0.2800], time: [100.8859ms]\n",
+      "Epoch: [  9/ 10], step: [  338/  390], loss: [0.2136], avg loss: [0.2798], time: [101.9592ms]\n",
+      "Epoch: [  9/ 10], step: [  339/  390], loss: [0.1703], avg loss: [0.2794], time: [98.4647ms]\n",
+      "Epoch: [  9/ 10], step: [  340/  390], loss: [0.1344], avg loss: [0.2790], time: [100.3985ms]\n",
+      "Epoch: [  9/ 10], step: [  341/  390], loss: [0.2446], avg loss: [0.2789], time: [100.2448ms]\n",
+      "Epoch: [  9/ 10], step: [  342/  390], loss: [0.2180], avg loss: [0.2787], time: [103.6801ms]\n",
+      "Epoch: [  9/ 10], step: [  343/  390], loss: [0.3273], avg loss: [0.2789], time: [101.4183ms]\n",
+      "Epoch: [  9/ 10], step: [  344/  390], loss: [0.3550], avg loss: [0.2791], time: [106.7882ms]\n",
+      "Epoch: [  9/ 10], step: [  345/  390], loss: [0.2465], avg loss: [0.2790], time: [104.7385ms]\n",
+      "Epoch: [  9/ 10], step: [  346/  390], loss: [0.2084], avg loss: [0.2788], time: [101.6912ms]\n",
+      "Epoch: [  9/ 10], step: [  347/  390], loss: [0.3962], avg loss: [0.2791], time: [104.4145ms]\n",
+      "Epoch: [  9/ 10], step: [  348/  390], loss: [0.2505], avg loss: [0.2790], time: [102.1519ms]\n",
+      "Epoch: [  9/ 10], step: [  349/  390], loss: [0.2329], avg loss: [0.2789], time: [103.4329ms]\n",
+      "Epoch: [  9/ 10], step: [  350/  390], loss: [0.3404], avg loss: [0.2791], time: [102.0799ms]\n",
+      "Epoch: [  9/ 10], step: [  351/  390], loss: [0.3228], avg loss: [0.2792], time: [105.7591ms]\n",
+      "Epoch: [  9/ 10], step: [  352/  390], loss: [0.2663], avg loss: [0.2792], time: [100.6315ms]\n",
+      "Epoch: [  9/ 10], step: [  353/  390], loss: [0.2314], avg loss: [0.2790], time: [103.6012ms]\n",
+      "Epoch: [  9/ 10], step: [  354/  390], loss: [0.4019], avg loss: [0.2794], time: [101.9282ms]\n",
+      "Epoch: [  9/ 10], step: [  355/  390], loss: [0.2190], avg loss: [0.2792], time: [101.0091ms]\n",
+      "Epoch: [  9/ 10], step: [  356/  390], loss: [0.2142], avg loss: [0.2790], time: [103.3542ms]\n",
+      "Epoch: [  9/ 10], step: [  357/  390], loss: [0.2802], avg loss: [0.2790], time: [102.8593ms]\n",
+      "Epoch: [  9/ 10], step: [  358/  390], loss: [0.2102], avg loss: [0.2789], time: [105.3081ms]\n",
+      "Epoch: [  9/ 10], step: [  359/  390], loss: [0.1795], avg loss: [0.2786], time: [102.2232ms]\n",
+      "Epoch: [  9/ 10], step: [  360/  390], loss: [0.2005], avg loss: [0.2784], time: [106.1468ms]\n",
+      "Epoch: [  9/ 10], step: [  361/  390], loss: [0.2372], avg loss: [0.2782], time: [101.1386ms]\n",
+      "Epoch: [  9/ 10], step: [  362/  390], loss: [0.1931], avg loss: [0.2780], time: [101.7928ms]\n",
+      "Epoch: [  9/ 10], step: [  363/  390], loss: [0.3196], avg loss: [0.2781], time: [104.0721ms]\n",
+      "Epoch: [  9/ 10], step: [  364/  390], loss: [0.2563], avg loss: [0.2781], time: [101.5668ms]\n",
+      "Epoch: [  9/ 10], step: [  365/  390], loss: [0.2488], avg loss: [0.2780], time: [102.6225ms]\n",
+      "Epoch: [  9/ 10], step: [  366/  390], loss: [0.2499], avg loss: [0.2779], time: [102.6690ms]\n",
+      "Epoch: [  9/ 10], step: [  367/  390], loss: [0.1904], avg loss: [0.2777], time: [105.2618ms]\n",
+      "Epoch: [  9/ 10], step: [  368/  390], loss: [0.2042], avg loss: [0.2775], time: [102.2058ms]\n",
+      "Epoch: [  9/ 10], step: [  369/  390], loss: [0.3357], avg loss: [0.2776], time: [100.1673ms]\n",
+      "Epoch: [  9/ 10], step: [  370/  390], loss: [0.3050], avg loss: [0.2777], time: [103.9114ms]\n",
+      "Epoch: [  9/ 10], step: [  371/  390], loss: [0.3618], avg loss: [0.2779], time: [99.7210ms]\n",
+      "Epoch: [  9/ 10], step: [  372/  390], loss: [0.2830], avg loss: [0.2779], time: [101.2034ms]\n",
+      "Epoch: [  9/ 10], step: [  373/  390], loss: [0.3102], avg loss: [0.2780], time: [101.0656ms]\n",
+      "Epoch: [  9/ 10], step: [  374/  390], loss: [0.1494], avg loss: [0.2777], time: [101.4268ms]\n",
+      "Epoch: [  9/ 10], step: [  375/  390], loss: [0.3108], avg loss: [0.2778], time: [105.6063ms]\n",
+      "Epoch: [  9/ 10], step: [  376/  390], loss: [0.2621], avg loss: [0.2777], time: [105.5107ms]\n",
+      "Epoch: [  9/ 10], step: [  377/  390], loss: [0.3015], avg loss: [0.2778], time: [103.9784ms]\n",
+      "Epoch: [  9/ 10], step: [  378/  390], loss: [0.3440], avg loss: [0.2780], time: [103.6081ms]\n",
+      "Epoch: [  9/ 10], step: [  379/  390], loss: [0.2310], avg loss: [0.2778], time: [101.1114ms]\n",
+      "Epoch: [  9/ 10], step: [  380/  390], loss: [0.4890], avg loss: [0.2784], time: [103.5185ms]\n",
+      "Epoch: [  9/ 10], step: [  381/  390], loss: [0.3627], avg loss: [0.2786], time: [102.2093ms]\n",
+      "Epoch: [  9/ 10], step: [  382/  390], loss: [0.2582], avg loss: [0.2786], time: [101.8181ms]\n",
+      "Epoch: [  9/ 10], step: [  383/  390], loss: [0.3308], avg loss: [0.2787], time: [102.9022ms]\n",
+      "Epoch: [  9/ 10], step: [  384/  390], loss: [0.2705], avg loss: [0.2787], time: [101.2967ms]\n",
+      "Epoch: [  9/ 10], step: [  385/  390], loss: [0.2209], avg loss: [0.2785], time: [103.0319ms]\n",
+      "Epoch: [  9/ 10], step: [  386/  390], loss: [0.3860], avg loss: [0.2788], time: [101.7649ms]\n",
+      "Epoch: [  9/ 10], step: [  387/  390], loss: [0.3459], avg loss: [0.2790], time: [104.6145ms]\n",
+      "Epoch: [  9/ 10], step: [  388/  390], loss: [0.1994], avg loss: [0.2788], time: [101.5496ms]\n",
+      "Epoch: [  9/ 10], step: [  389/  390], loss: [0.2605], avg loss: [0.2787], time: [101.8271ms]\n",
+      "Epoch: [  9/ 10], step: [  390/  390], loss: [0.3933], avg loss: [0.2790], time: [828.5053ms]\n",
+      "Epoch time: 41044.284, per step time: 105.242\n",
+      "Epoch time: 41044.824, per step time: 105.243, avg loss: 0.279\n",
+      "************************************************************\n",
+      "Epoch: [ 10/ 10], step: [    1/  390], loss: [0.2735], avg loss: [0.2735], time: [107.2650ms]\n",
+      "Epoch: [ 10/ 10], step: [    2/  390], loss: [0.2958], avg loss: [0.2847], time: [109.0686ms]\n",
+      "Epoch: [ 10/ 10], step: [    3/  390], loss: [0.3449], avg loss: [0.3047], time: [104.8942ms]\n",
+      "Epoch: [ 10/ 10], step: [    4/  390], loss: [0.2454], avg loss: [0.2899], time: [109.5195ms]\n",
+      "Epoch: [ 10/ 10], step: [    5/  390], loss: [0.2612], avg loss: [0.2841], time: [105.2437ms]\n",
+      "Epoch: [ 10/ 10], step: [    6/  390], loss: [0.1682], avg loss: [0.2648], time: [108.9017ms]\n",
+      "Epoch: [ 10/ 10], step: [    7/  390], loss: [0.3401], avg loss: [0.2756], time: [109.3621ms]\n",
+      "Epoch: [ 10/ 10], step: [    8/  390], loss: [0.2339], avg loss: [0.2704], time: [106.6265ms]\n",
+      "Epoch: [ 10/ 10], step: [    9/  390], loss: [0.1695], avg loss: [0.2591], time: [110.9552ms]\n",
+      "Epoch: [ 10/ 10], step: [   10/  390], loss: [0.2723], avg loss: [0.2605], time: [109.6358ms]\n",
+      "Epoch: [ 10/ 10], step: [   11/  390], loss: [0.1482], avg loss: [0.2503], time: [105.4866ms]\n",
+      "Epoch: [ 10/ 10], step: [   12/  390], loss: [0.4558], avg loss: [0.2674], time: [112.1240ms]\n",
+      "Epoch: [ 10/ 10], step: [   13/  390], loss: [0.2686], avg loss: [0.2675], time: [105.1400ms]\n",
+      "Epoch: [ 10/ 10], step: [   14/  390], loss: [0.2011], avg loss: [0.2627], time: [105.2475ms]\n",
+      "Epoch: [ 10/ 10], step: [   15/  390], loss: [0.2906], avg loss: [0.2646], time: [112.1695ms]\n",
+      "Epoch: [ 10/ 10], step: [   16/  390], loss: [0.2876], avg loss: [0.2660], time: [108.7074ms]\n",
+      "Epoch: [ 10/ 10], step: [   17/  390], loss: [0.1365], avg loss: [0.2584], time: [106.4265ms]\n",
+      "Epoch: [ 10/ 10], step: [   18/  390], loss: [0.1849], avg loss: [0.2543], time: [105.9957ms]\n",
+      "Epoch: [ 10/ 10], step: [   19/  390], loss: [0.2352], avg loss: [0.2533], time: [109.8456ms]\n",
+      "Epoch: [ 10/ 10], step: [   20/  390], loss: [0.3400], avg loss: [0.2577], time: [110.5843ms]\n",
+      "Epoch: [ 10/ 10], step: [   21/  390], loss: [0.2153], avg loss: [0.2556], time: [106.0696ms]\n",
+      "Epoch: [ 10/ 10], step: [   22/  390], loss: [0.3523], avg loss: [0.2600], time: [109.1082ms]\n",
+      "Epoch: [ 10/ 10], step: [   23/  390], loss: [0.2171], avg loss: [0.2582], time: [105.2434ms]\n",
+      "Epoch: [ 10/ 10], step: [   24/  390], loss: [0.1697], avg loss: [0.2545], time: [108.7265ms]\n",
+      "Epoch: [ 10/ 10], step: [   25/  390], loss: [0.2121], avg loss: [0.2528], time: [110.8615ms]\n",
+      "Epoch: [ 10/ 10], step: [   26/  390], loss: [0.2590], avg loss: [0.2530], time: [106.7364ms]\n",
+      "Epoch: [ 10/ 10], step: [   27/  390], loss: [0.1709], avg loss: [0.2500], time: [105.6533ms]\n",
+      "Epoch: [ 10/ 10], step: [   28/  390], loss: [0.2462], avg loss: [0.2499], time: [107.2991ms]\n",
+      "Epoch: [ 10/ 10], step: [   29/  390], loss: [0.2153], avg loss: [0.2487], time: [107.5842ms]\n",
+      "Epoch: [ 10/ 10], step: [   30/  390], loss: [0.2079], avg loss: [0.2473], time: [112.8020ms]\n",
+      "Epoch: [ 10/ 10], step: [   31/  390], loss: [0.3354], avg loss: [0.2501], time: [105.5183ms]\n",
+      "Epoch: [ 10/ 10], step: [   32/  390], loss: [0.2214], avg loss: [0.2492], time: [107.1084ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [ 10/ 10], step: [   33/  390], loss: [0.2019], avg loss: [0.2478], time: [108.4888ms]\n",
+      "Epoch: [ 10/ 10], step: [   34/  390], loss: [0.2363], avg loss: [0.2475], time: [109.5145ms]\n",
+      "Epoch: [ 10/ 10], step: [   35/  390], loss: [0.1242], avg loss: [0.2440], time: [105.1886ms]\n",
+      "Epoch: [ 10/ 10], step: [   36/  390], loss: [0.1880], avg loss: [0.2424], time: [104.9492ms]\n",
+      "Epoch: [ 10/ 10], step: [   37/  390], loss: [0.2874], avg loss: [0.2436], time: [111.3589ms]\n",
+      "Epoch: [ 10/ 10], step: [   38/  390], loss: [0.1517], avg loss: [0.2412], time: [107.8691ms]\n",
+      "Epoch: [ 10/ 10], step: [   39/  390], loss: [0.2969], avg loss: [0.2426], time: [107.7905ms]\n",
+      "Epoch: [ 10/ 10], step: [   40/  390], loss: [0.2387], avg loss: [0.2425], time: [108.2902ms]\n",
+      "Epoch: [ 10/ 10], step: [   41/  390], loss: [0.1753], avg loss: [0.2409], time: [109.0782ms]\n",
+      "Epoch: [ 10/ 10], step: [   42/  390], loss: [0.1604], avg loss: [0.2390], time: [111.8894ms]\n",
+      "Epoch: [ 10/ 10], step: [   43/  390], loss: [0.2058], avg loss: [0.2382], time: [111.7148ms]\n",
+      "Epoch: [ 10/ 10], step: [   44/  390], loss: [0.1899], avg loss: [0.2371], time: [106.8997ms]\n",
+      "Epoch: [ 10/ 10], step: [   45/  390], loss: [0.1511], avg loss: [0.2352], time: [106.4148ms]\n",
+      "Epoch: [ 10/ 10], step: [   46/  390], loss: [0.2173], avg loss: [0.2348], time: [108.7077ms]\n",
+      "Epoch: [ 10/ 10], step: [   47/  390], loss: [0.1632], avg loss: [0.2333], time: [110.7609ms]\n",
+      "Epoch: [ 10/ 10], step: [   48/  390], loss: [0.3122], avg loss: [0.2349], time: [108.4449ms]\n",
+      "Epoch: [ 10/ 10], step: [   49/  390], loss: [0.3052], avg loss: [0.2364], time: [106.8697ms]\n",
+      "Epoch: [ 10/ 10], step: [   50/  390], loss: [0.3136], avg loss: [0.2379], time: [108.3434ms]\n",
+      "Epoch: [ 10/ 10], step: [   51/  390], loss: [0.3212], avg loss: [0.2395], time: [108.5846ms]\n",
+      "Epoch: [ 10/ 10], step: [   52/  390], loss: [0.3128], avg loss: [0.2409], time: [106.3209ms]\n",
+      "Epoch: [ 10/ 10], step: [   53/  390], loss: [0.2322], avg loss: [0.2408], time: [105.8729ms]\n",
+      "Epoch: [ 10/ 10], step: [   54/  390], loss: [0.1590], avg loss: [0.2393], time: [106.7362ms]\n",
+      "Epoch: [ 10/ 10], step: [   55/  390], loss: [0.2994], avg loss: [0.2404], time: [105.9508ms]\n",
+      "Epoch: [ 10/ 10], step: [   56/  390], loss: [0.1690], avg loss: [0.2391], time: [108.0444ms]\n",
+      "Epoch: [ 10/ 10], step: [   57/  390], loss: [0.2279], avg loss: [0.2389], time: [106.2686ms]\n",
+      "Epoch: [ 10/ 10], step: [   58/  390], loss: [0.2540], avg loss: [0.2392], time: [105.5427ms]\n",
+      "Epoch: [ 10/ 10], step: [   59/  390], loss: [0.3558], avg loss: [0.2411], time: [105.2840ms]\n",
+      "Epoch: [ 10/ 10], step: [   60/  390], loss: [0.2341], avg loss: [0.2410], time: [105.8826ms]\n",
+      "Epoch: [ 10/ 10], step: [   61/  390], loss: [0.2298], avg loss: [0.2408], time: [106.4563ms]\n",
+      "Epoch: [ 10/ 10], step: [   62/  390], loss: [0.3778], avg loss: [0.2430], time: [111.9242ms]\n",
+      "Epoch: [ 10/ 10], step: [   63/  390], loss: [0.3423], avg loss: [0.2446], time: [111.2790ms]\n",
+      "Epoch: [ 10/ 10], step: [   64/  390], loss: [0.3083], avg loss: [0.2456], time: [112.4961ms]\n",
+      "Epoch: [ 10/ 10], step: [   65/  390], loss: [0.2735], avg loss: [0.2460], time: [106.2584ms]\n",
+      "Epoch: [ 10/ 10], step: [   66/  390], loss: [0.2864], avg loss: [0.2466], time: [111.2642ms]\n",
+      "Epoch: [ 10/ 10], step: [   67/  390], loss: [0.1541], avg loss: [0.2453], time: [107.1799ms]\n",
+      "Epoch: [ 10/ 10], step: [   68/  390], loss: [0.2601], avg loss: [0.2455], time: [110.7268ms]\n",
+      "Epoch: [ 10/ 10], step: [   69/  390], loss: [0.1962], avg loss: [0.2448], time: [109.5340ms]\n",
+      "Epoch: [ 10/ 10], step: [   70/  390], loss: [0.3097], avg loss: [0.2457], time: [109.6215ms]\n",
+      "Epoch: [ 10/ 10], step: [   71/  390], loss: [0.3367], avg loss: [0.2470], time: [109.2317ms]\n",
+      "Epoch: [ 10/ 10], step: [   72/  390], loss: [0.1662], avg loss: [0.2459], time: [112.4499ms]\n",
+      "Epoch: [ 10/ 10], step: [   73/  390], loss: [0.2811], avg loss: [0.2463], time: [110.3280ms]\n",
+      "Epoch: [ 10/ 10], step: [   74/  390], loss: [0.2674], avg loss: [0.2466], time: [112.1042ms]\n",
+      "Epoch: [ 10/ 10], step: [   75/  390], loss: [0.1980], avg loss: [0.2460], time: [108.3951ms]\n",
+      "Epoch: [ 10/ 10], step: [   76/  390], loss: [0.3432], avg loss: [0.2473], time: [109.1630ms]\n",
+      "Epoch: [ 10/ 10], step: [   77/  390], loss: [0.2632], avg loss: [0.2475], time: [108.6371ms]\n",
+      "Epoch: [ 10/ 10], step: [   78/  390], loss: [0.3397], avg loss: [0.2486], time: [107.8193ms]\n",
+      "Epoch: [ 10/ 10], step: [   79/  390], loss: [0.2095], avg loss: [0.2482], time: [105.7773ms]\n",
+      "Epoch: [ 10/ 10], step: [   80/  390], loss: [0.2881], avg loss: [0.2487], time: [105.9463ms]\n",
+      "Epoch: [ 10/ 10], step: [   81/  390], loss: [0.2335], avg loss: [0.2485], time: [106.2307ms]\n",
+      "Epoch: [ 10/ 10], step: [   82/  390], loss: [0.2270], avg loss: [0.2482], time: [111.5823ms]\n",
+      "Epoch: [ 10/ 10], step: [   83/  390], loss: [0.2386], avg loss: [0.2481], time: [110.8563ms]\n",
+      "Epoch: [ 10/ 10], step: [   84/  390], loss: [0.3727], avg loss: [0.2496], time: [109.3268ms]\n",
+      "Epoch: [ 10/ 10], step: [   85/  390], loss: [0.2267], avg loss: [0.2493], time: [106.5936ms]\n",
+      "Epoch: [ 10/ 10], step: [   86/  390], loss: [0.3805], avg loss: [0.2508], time: [108.8834ms]\n",
+      "Epoch: [ 10/ 10], step: [   87/  390], loss: [0.2122], avg loss: [0.2504], time: [105.8927ms]\n",
+      "Epoch: [ 10/ 10], step: [   88/  390], loss: [0.2837], avg loss: [0.2508], time: [105.3069ms]\n",
+      "Epoch: [ 10/ 10], step: [   89/  390], loss: [0.2378], avg loss: [0.2506], time: [108.7477ms]\n",
+      "Epoch: [ 10/ 10], step: [   90/  390], loss: [0.2685], avg loss: [0.2508], time: [108.6853ms]\n",
+      "Epoch: [ 10/ 10], step: [   91/  390], loss: [0.2153], avg loss: [0.2504], time: [104.4049ms]\n",
+      "Epoch: [ 10/ 10], step: [   92/  390], loss: [0.1842], avg loss: [0.2497], time: [107.8780ms]\n",
+      "Epoch: [ 10/ 10], step: [   93/  390], loss: [0.2125], avg loss: [0.2493], time: [106.1418ms]\n",
+      "Epoch: [ 10/ 10], step: [   94/  390], loss: [0.2021], avg loss: [0.2488], time: [111.5768ms]\n",
+      "Epoch: [ 10/ 10], step: [   95/  390], loss: [0.3926], avg loss: [0.2503], time: [109.9434ms]\n",
+      "Epoch: [ 10/ 10], step: [   96/  390], loss: [0.1395], avg loss: [0.2492], time: [111.1112ms]\n",
+      "Epoch: [ 10/ 10], step: [   97/  390], loss: [0.1523], avg loss: [0.2482], time: [110.6732ms]\n",
+      "Epoch: [ 10/ 10], step: [   98/  390], loss: [0.1938], avg loss: [0.2476], time: [107.2299ms]\n",
+      "Epoch: [ 10/ 10], step: [   99/  390], loss: [0.2635], avg loss: [0.2478], time: [110.0309ms]\n",
+      "Epoch: [ 10/ 10], step: [  100/  390], loss: [0.2527], avg loss: [0.2478], time: [107.9366ms]\n",
+      "Epoch: [ 10/ 10], step: [  101/  390], loss: [0.2829], avg loss: [0.2482], time: [110.1241ms]\n",
+      "Epoch: [ 10/ 10], step: [  102/  390], loss: [0.2359], avg loss: [0.2480], time: [112.0892ms]\n",
+      "Epoch: [ 10/ 10], step: [  103/  390], loss: [0.0937], avg loss: [0.2465], time: [106.8828ms]\n",
+      "Epoch: [ 10/ 10], step: [  104/  390], loss: [0.1952], avg loss: [0.2461], time: [111.2108ms]\n",
+      "Epoch: [ 10/ 10], step: [  105/  390], loss: [0.2002], avg loss: [0.2456], time: [105.9570ms]\n",
+      "Epoch: [ 10/ 10], step: [  106/  390], loss: [0.3605], avg loss: [0.2467], time: [108.7041ms]\n",
+      "Epoch: [ 10/ 10], step: [  107/  390], loss: [0.3041], avg loss: [0.2472], time: [106.9710ms]\n",
+      "Epoch: [ 10/ 10], step: [  108/  390], loss: [0.2202], avg loss: [0.2470], time: [110.6596ms]\n",
+      "Epoch: [ 10/ 10], step: [  109/  390], loss: [0.2284], avg loss: [0.2468], time: [105.1683ms]\n",
+      "Epoch: [ 10/ 10], step: [  110/  390], loss: [0.2802], avg loss: [0.2471], time: [110.3268ms]\n",
+      "Epoch: [ 10/ 10], step: [  111/  390], loss: [0.2795], avg loss: [0.2474], time: [106.1668ms]\n",
+      "Epoch: [ 10/ 10], step: [  112/  390], loss: [0.2365], avg loss: [0.2473], time: [106.5586ms]\n",
+      "Epoch: [ 10/ 10], step: [  113/  390], loss: [0.3807], avg loss: [0.2485], time: [110.3783ms]\n",
+      "Epoch: [ 10/ 10], step: [  114/  390], loss: [0.2560], avg loss: [0.2486], time: [109.5302ms]\n",
+      "Epoch: [ 10/ 10], step: [  115/  390], loss: [0.2673], avg loss: [0.2487], time: [106.9703ms]\n",
+      "Epoch: [ 10/ 10], step: [  116/  390], loss: [0.3012], avg loss: [0.2492], time: [105.8586ms]\n",
+      "Epoch: [ 10/ 10], step: [  117/  390], loss: [0.2159], avg loss: [0.2489], time: [110.1904ms]\n",
+      "Epoch: [ 10/ 10], step: [  118/  390], loss: [0.1535], avg loss: [0.2481], time: [107.0156ms]\n",
+      "Epoch: [ 10/ 10], step: [  119/  390], loss: [0.2864], avg loss: [0.2484], time: [105.7677ms]\n",
+      "Epoch: [ 10/ 10], step: [  120/  390], loss: [0.2596], avg loss: [0.2485], time: [110.0290ms]\n",
+      "Epoch: [ 10/ 10], step: [  121/  390], loss: [0.3524], avg loss: [0.2494], time: [107.3010ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [ 10/ 10], step: [  122/  390], loss: [0.3619], avg loss: [0.2503], time: [106.2956ms]\n",
+      "Epoch: [ 10/ 10], step: [  123/  390], loss: [0.2152], avg loss: [0.2500], time: [106.0219ms]\n",
+      "Epoch: [ 10/ 10], step: [  124/  390], loss: [0.3646], avg loss: [0.2509], time: [107.1625ms]\n",
+      "Epoch: [ 10/ 10], step: [  125/  390], loss: [0.2300], avg loss: [0.2507], time: [107.6951ms]\n",
+      "Epoch: [ 10/ 10], step: [  126/  390], loss: [0.2405], avg loss: [0.2507], time: [112.6771ms]\n",
+      "Epoch: [ 10/ 10], step: [  127/  390], loss: [0.2607], avg loss: [0.2507], time: [109.9849ms]\n",
+      "Epoch: [ 10/ 10], step: [  128/  390], loss: [0.3845], avg loss: [0.2518], time: [107.1432ms]\n",
+      "Epoch: [ 10/ 10], step: [  129/  390], loss: [0.4600], avg loss: [0.2534], time: [110.2462ms]\n",
+      "Epoch: [ 10/ 10], step: [  130/  390], loss: [0.3505], avg loss: [0.2542], time: [110.1310ms]\n",
+      "Epoch: [ 10/ 10], step: [  131/  390], loss: [0.1911], avg loss: [0.2537], time: [108.8314ms]\n",
+      "Epoch: [ 10/ 10], step: [  132/  390], loss: [0.1612], avg loss: [0.2530], time: [109.6549ms]\n",
+      "Epoch: [ 10/ 10], step: [  133/  390], loss: [0.3517], avg loss: [0.2537], time: [107.7724ms]\n",
+      "Epoch: [ 10/ 10], step: [  134/  390], loss: [0.2793], avg loss: [0.2539], time: [105.5598ms]\n",
+      "Epoch: [ 10/ 10], step: [  135/  390], loss: [0.1697], avg loss: [0.2533], time: [110.5382ms]\n",
+      "Epoch: [ 10/ 10], step: [  136/  390], loss: [0.1566], avg loss: [0.2526], time: [109.5769ms]\n",
+      "Epoch: [ 10/ 10], step: [  137/  390], loss: [0.3282], avg loss: [0.2531], time: [106.1785ms]\n",
+      "Epoch: [ 10/ 10], step: [  138/  390], loss: [0.3097], avg loss: [0.2535], time: [109.4475ms]\n",
+      "Epoch: [ 10/ 10], step: [  139/  390], loss: [0.2631], avg loss: [0.2536], time: [104.8338ms]\n",
+      "Epoch: [ 10/ 10], step: [  140/  390], loss: [0.3907], avg loss: [0.2546], time: [108.2377ms]\n",
+      "Epoch: [ 10/ 10], step: [  141/  390], loss: [0.3358], avg loss: [0.2552], time: [109.1771ms]\n",
+      "Epoch: [ 10/ 10], step: [  142/  390], loss: [0.3061], avg loss: [0.2555], time: [112.1247ms]\n",
+      "Epoch: [ 10/ 10], step: [  143/  390], loss: [0.1727], avg loss: [0.2549], time: [107.6701ms]\n",
+      "Epoch: [ 10/ 10], step: [  144/  390], loss: [0.2522], avg loss: [0.2549], time: [110.6756ms]\n",
+      "Epoch: [ 10/ 10], step: [  145/  390], loss: [0.3008], avg loss: [0.2552], time: [107.2681ms]\n",
+      "Epoch: [ 10/ 10], step: [  146/  390], loss: [0.3309], avg loss: [0.2558], time: [108.1991ms]\n",
+      "Epoch: [ 10/ 10], step: [  147/  390], loss: [0.3308], avg loss: [0.2563], time: [106.8130ms]\n",
+      "Epoch: [ 10/ 10], step: [  148/  390], loss: [0.2165], avg loss: [0.2560], time: [109.6725ms]\n",
+      "Epoch: [ 10/ 10], step: [  149/  390], loss: [0.2901], avg loss: [0.2562], time: [109.9777ms]\n",
+      "Epoch: [ 10/ 10], step: [  150/  390], loss: [0.2647], avg loss: [0.2563], time: [105.8912ms]\n",
+      "Epoch: [ 10/ 10], step: [  151/  390], loss: [0.3280], avg loss: [0.2568], time: [106.5724ms]\n",
+      "Epoch: [ 10/ 10], step: [  152/  390], loss: [0.2017], avg loss: [0.2564], time: [105.4506ms]\n",
+      "Epoch: [ 10/ 10], step: [  153/  390], loss: [0.2675], avg loss: [0.2565], time: [106.7176ms]\n",
+      "Epoch: [ 10/ 10], step: [  154/  390], loss: [0.2361], avg loss: [0.2563], time: [107.7940ms]\n",
+      "Epoch: [ 10/ 10], step: [  155/  390], loss: [0.3119], avg loss: [0.2567], time: [110.4105ms]\n",
+      "Epoch: [ 10/ 10], step: [  156/  390], loss: [0.3522], avg loss: [0.2573], time: [107.7125ms]\n",
+      "Epoch: [ 10/ 10], step: [  157/  390], loss: [0.1649], avg loss: [0.2567], time: [107.8327ms]\n",
+      "Epoch: [ 10/ 10], step: [  158/  390], loss: [0.3038], avg loss: [0.2570], time: [111.7573ms]\n",
+      "Epoch: [ 10/ 10], step: [  159/  390], loss: [0.3336], avg loss: [0.2575], time: [108.8490ms]\n",
+      "Epoch: [ 10/ 10], step: [  160/  390], loss: [0.3087], avg loss: [0.2578], time: [109.7050ms]\n",
+      "Epoch: [ 10/ 10], step: [  161/  390], loss: [0.2617], avg loss: [0.2578], time: [106.5736ms]\n",
+      "Epoch: [ 10/ 10], step: [  162/  390], loss: [0.3109], avg loss: [0.2582], time: [109.8616ms]\n",
+      "Epoch: [ 10/ 10], step: [  163/  390], loss: [0.2865], avg loss: [0.2583], time: [109.3175ms]\n",
+      "Epoch: [ 10/ 10], step: [  164/  390], loss: [0.2566], avg loss: [0.2583], time: [110.2631ms]\n",
+      "Epoch: [ 10/ 10], step: [  165/  390], loss: [0.1423], avg loss: [0.2576], time: [107.8370ms]\n",
+      "Epoch: [ 10/ 10], step: [  166/  390], loss: [0.2079], avg loss: [0.2573], time: [110.5609ms]\n",
+      "Epoch: [ 10/ 10], step: [  167/  390], loss: [0.2017], avg loss: [0.2570], time: [107.8053ms]\n",
+      "Epoch: [ 10/ 10], step: [  168/  390], loss: [0.2564], avg loss: [0.2570], time: [104.9502ms]\n",
+      "Epoch: [ 10/ 10], step: [  169/  390], loss: [0.2955], avg loss: [0.2572], time: [105.9213ms]\n",
+      "Epoch: [ 10/ 10], step: [  170/  390], loss: [0.2940], avg loss: [0.2574], time: [106.9319ms]\n",
+      "Epoch: [ 10/ 10], step: [  171/  390], loss: [0.2015], avg loss: [0.2571], time: [111.7773ms]\n",
+      "Epoch: [ 10/ 10], step: [  172/  390], loss: [0.2100], avg loss: [0.2568], time: [108.8059ms]\n",
+      "Epoch: [ 10/ 10], step: [  173/  390], loss: [0.3030], avg loss: [0.2571], time: [107.4409ms]\n",
+      "Epoch: [ 10/ 10], step: [  174/  390], loss: [0.1818], avg loss: [0.2567], time: [107.5635ms]\n",
+      "Epoch: [ 10/ 10], step: [  175/  390], loss: [0.3993], avg loss: [0.2575], time: [111.4011ms]\n",
+      "Epoch: [ 10/ 10], step: [  176/  390], loss: [0.2567], avg loss: [0.2575], time: [105.8695ms]\n",
+      "Epoch: [ 10/ 10], step: [  177/  390], loss: [0.1747], avg loss: [0.2570], time: [111.1901ms]\n",
+      "Epoch: [ 10/ 10], step: [  178/  390], loss: [0.2136], avg loss: [0.2568], time: [106.9398ms]\n",
+      "Epoch: [ 10/ 10], step: [  179/  390], loss: [0.3743], avg loss: [0.2574], time: [105.7682ms]\n",
+      "Epoch: [ 10/ 10], step: [  180/  390], loss: [0.2902], avg loss: [0.2576], time: [111.6295ms]\n",
+      "Epoch: [ 10/ 10], step: [  181/  390], loss: [0.3440], avg loss: [0.2581], time: [105.8998ms]\n",
+      "Epoch: [ 10/ 10], step: [  182/  390], loss: [0.1998], avg loss: [0.2578], time: [109.8609ms]\n",
+      "Epoch: [ 10/ 10], step: [  183/  390], loss: [0.2522], avg loss: [0.2577], time: [105.9847ms]\n",
+      "Epoch: [ 10/ 10], step: [  184/  390], loss: [0.2341], avg loss: [0.2576], time: [106.9517ms]\n",
+      "Epoch: [ 10/ 10], step: [  185/  390], loss: [0.4920], avg loss: [0.2589], time: [109.2141ms]\n",
+      "Epoch: [ 10/ 10], step: [  186/  390], loss: [0.2786], avg loss: [0.2590], time: [108.4044ms]\n",
+      "Epoch: [ 10/ 10], step: [  187/  390], loss: [0.2460], avg loss: [0.2589], time: [107.6396ms]\n",
+      "Epoch: [ 10/ 10], step: [  188/  390], loss: [0.2580], avg loss: [0.2589], time: [107.5847ms]\n",
+      "Epoch: [ 10/ 10], step: [  189/  390], loss: [0.3025], avg loss: [0.2591], time: [103.6551ms]\n",
+      "Epoch: [ 10/ 10], step: [  190/  390], loss: [0.4090], avg loss: [0.2599], time: [111.1393ms]\n",
+      "Epoch: [ 10/ 10], step: [  191/  390], loss: [0.3260], avg loss: [0.2603], time: [107.5289ms]\n",
+      "Epoch: [ 10/ 10], step: [  192/  390], loss: [0.3987], avg loss: [0.2610], time: [108.2554ms]\n",
+      "Epoch: [ 10/ 10], step: [  193/  390], loss: [0.2152], avg loss: [0.2608], time: [108.1114ms]\n",
+      "Epoch: [ 10/ 10], step: [  194/  390], loss: [0.3381], avg loss: [0.2612], time: [108.8758ms]\n",
+      "Epoch: [ 10/ 10], step: [  195/  390], loss: [0.2607], avg loss: [0.2611], time: [108.9261ms]\n",
+      "Epoch: [ 10/ 10], step: [  196/  390], loss: [0.4350], avg loss: [0.2620], time: [111.8217ms]\n",
+      "Epoch: [ 10/ 10], step: [  197/  390], loss: [0.2947], avg loss: [0.2622], time: [104.9793ms]\n",
+      "Epoch: [ 10/ 10], step: [  198/  390], loss: [0.3505], avg loss: [0.2626], time: [105.8023ms]\n",
+      "Epoch: [ 10/ 10], step: [  199/  390], loss: [0.2185], avg loss: [0.2624], time: [107.3391ms]\n",
+      "Epoch: [ 10/ 10], step: [  200/  390], loss: [0.1370], avg loss: [0.2618], time: [110.1713ms]\n",
+      "Epoch: [ 10/ 10], step: [  201/  390], loss: [0.2361], avg loss: [0.2617], time: [110.3864ms]\n",
+      "Epoch: [ 10/ 10], step: [  202/  390], loss: [0.3012], avg loss: [0.2619], time: [106.5571ms]\n",
+      "Epoch: [ 10/ 10], step: [  203/  390], loss: [0.4109], avg loss: [0.2626], time: [106.7629ms]\n",
+      "Epoch: [ 10/ 10], step: [  204/  390], loss: [0.1969], avg loss: [0.2623], time: [106.7109ms]\n",
+      "Epoch: [ 10/ 10], step: [  205/  390], loss: [0.2397], avg loss: [0.2622], time: [109.5626ms]\n",
+      "Epoch: [ 10/ 10], step: [  206/  390], loss: [0.1650], avg loss: [0.2617], time: [106.1697ms]\n",
+      "Epoch: [ 10/ 10], step: [  207/  390], loss: [0.1021], avg loss: [0.2609], time: [109.8776ms]\n",
+      "Epoch: [ 10/ 10], step: [  208/  390], loss: [0.2504], avg loss: [0.2609], time: [107.1084ms]\n",
+      "Epoch: [ 10/ 10], step: [  209/  390], loss: [0.3086], avg loss: [0.2611], time: [108.1409ms]\n",
+      "Epoch: [ 10/ 10], step: [  210/  390], loss: [0.5832], avg loss: [0.2626], time: [110.9455ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [ 10/ 10], step: [  211/  390], loss: [0.3747], avg loss: [0.2632], time: [110.2545ms]\n",
+      "Epoch: [ 10/ 10], step: [  212/  390], loss: [0.1915], avg loss: [0.2628], time: [109.9415ms]\n",
+      "Epoch: [ 10/ 10], step: [  213/  390], loss: [0.2435], avg loss: [0.2627], time: [104.7785ms]\n",
+      "Epoch: [ 10/ 10], step: [  214/  390], loss: [0.1964], avg loss: [0.2624], time: [107.4312ms]\n",
+      "Epoch: [ 10/ 10], step: [  215/  390], loss: [0.1412], avg loss: [0.2619], time: [104.5918ms]\n",
+      "Epoch: [ 10/ 10], step: [  216/  390], loss: [0.3663], avg loss: [0.2623], time: [108.3212ms]\n",
+      "Epoch: [ 10/ 10], step: [  217/  390], loss: [0.2127], avg loss: [0.2621], time: [108.3050ms]\n",
+      "Epoch: [ 10/ 10], step: [  218/  390], loss: [0.3638], avg loss: [0.2626], time: [110.9850ms]\n",
+      "Epoch: [ 10/ 10], step: [  219/  390], loss: [0.2969], avg loss: [0.2627], time: [105.5558ms]\n",
+      "Epoch: [ 10/ 10], step: [  220/  390], loss: [0.2878], avg loss: [0.2629], time: [108.3949ms]\n",
+      "Epoch: [ 10/ 10], step: [  221/  390], loss: [0.3518], avg loss: [0.2633], time: [110.9416ms]\n",
+      "Epoch: [ 10/ 10], step: [  222/  390], loss: [0.2342], avg loss: [0.2631], time: [109.9663ms]\n",
+      "Epoch: [ 10/ 10], step: [  223/  390], loss: [0.2159], avg loss: [0.2629], time: [105.5830ms]\n",
+      "Epoch: [ 10/ 10], step: [  224/  390], loss: [0.3619], avg loss: [0.2634], time: [110.3156ms]\n",
+      "Epoch: [ 10/ 10], step: [  225/  390], loss: [0.2785], avg loss: [0.2634], time: [104.5990ms]\n",
+      "Epoch: [ 10/ 10], step: [  226/  390], loss: [0.2721], avg loss: [0.2635], time: [109.7393ms]\n",
+      "Epoch: [ 10/ 10], step: [  227/  390], loss: [0.2554], avg loss: [0.2634], time: [111.9573ms]\n",
+      "Epoch: [ 10/ 10], step: [  228/  390], loss: [0.3147], avg loss: [0.2637], time: [107.5308ms]\n",
+      "Epoch: [ 10/ 10], step: [  229/  390], loss: [0.2355], avg loss: [0.2635], time: [107.7449ms]\n",
+      "Epoch: [ 10/ 10], step: [  230/  390], loss: [0.2799], avg loss: [0.2636], time: [107.4722ms]\n",
+      "Epoch: [ 10/ 10], step: [  231/  390], loss: [0.3037], avg loss: [0.2638], time: [106.1707ms]\n",
+      "Epoch: [ 10/ 10], step: [  232/  390], loss: [0.3153], avg loss: [0.2640], time: [107.8808ms]\n",
+      "Epoch: [ 10/ 10], step: [  233/  390], loss: [0.2251], avg loss: [0.2638], time: [107.4030ms]\n",
+      "Epoch: [ 10/ 10], step: [  234/  390], loss: [0.3054], avg loss: [0.2640], time: [107.9900ms]\n",
+      "Epoch: [ 10/ 10], step: [  235/  390], loss: [0.2202], avg loss: [0.2638], time: [106.9703ms]\n",
+      "Epoch: [ 10/ 10], step: [  236/  390], loss: [0.3073], avg loss: [0.2640], time: [109.4832ms]\n",
+      "Epoch: [ 10/ 10], step: [  237/  390], loss: [0.2066], avg loss: [0.2638], time: [110.3923ms]\n",
+      "Epoch: [ 10/ 10], step: [  238/  390], loss: [0.1443], avg loss: [0.2633], time: [108.6230ms]\n",
+      "Epoch: [ 10/ 10], step: [  239/  390], loss: [0.2317], avg loss: [0.2631], time: [108.2540ms]\n",
+      "Epoch: [ 10/ 10], step: [  240/  390], loss: [0.3590], avg loss: [0.2635], time: [111.8877ms]\n",
+      "Epoch: [ 10/ 10], step: [  241/  390], loss: [0.2146], avg loss: [0.2633], time: [106.5567ms]\n",
+      "Epoch: [ 10/ 10], step: [  242/  390], loss: [0.3797], avg loss: [0.2638], time: [106.3190ms]\n",
+      "Epoch: [ 10/ 10], step: [  243/  390], loss: [0.2756], avg loss: [0.2639], time: [107.1041ms]\n",
+      "Epoch: [ 10/ 10], step: [  244/  390], loss: [0.1608], avg loss: [0.2634], time: [107.5895ms]\n",
+      "Epoch: [ 10/ 10], step: [  245/  390], loss: [0.2442], avg loss: [0.2634], time: [108.3300ms]\n",
+      "Epoch: [ 10/ 10], step: [  246/  390], loss: [0.2288], avg loss: [0.2632], time: [110.5840ms]\n",
+      "Epoch: [ 10/ 10], step: [  247/  390], loss: [0.2711], avg loss: [0.2632], time: [104.5589ms]\n",
+      "Epoch: [ 10/ 10], step: [  248/  390], loss: [0.0924], avg loss: [0.2626], time: [107.1560ms]\n",
+      "Epoch: [ 10/ 10], step: [  249/  390], loss: [0.3406], avg loss: [0.2629], time: [106.8752ms]\n",
+      "Epoch: [ 10/ 10], step: [  250/  390], loss: [0.2317], avg loss: [0.2627], time: [106.2210ms]\n",
+      "Epoch: [ 10/ 10], step: [  251/  390], loss: [0.2523], avg loss: [0.2627], time: [107.2834ms]\n",
+      "Epoch: [ 10/ 10], step: [  252/  390], loss: [0.2392], avg loss: [0.2626], time: [106.4541ms]\n",
+      "Epoch: [ 10/ 10], step: [  253/  390], loss: [0.2634], avg loss: [0.2626], time: [106.4119ms]\n",
+      "Epoch: [ 10/ 10], step: [  254/  390], loss: [0.3347], avg loss: [0.2629], time: [106.7481ms]\n",
+      "Epoch: [ 10/ 10], step: [  255/  390], loss: [0.2345], avg loss: [0.2628], time: [110.1220ms]\n",
+      "Epoch: [ 10/ 10], step: [  256/  390], loss: [0.3497], avg loss: [0.2631], time: [106.4658ms]\n",
+      "Epoch: [ 10/ 10], step: [  257/  390], loss: [0.2975], avg loss: [0.2633], time: [106.3850ms]\n",
+      "Epoch: [ 10/ 10], step: [  258/  390], loss: [0.2213], avg loss: [0.2631], time: [108.4590ms]\n",
+      "Epoch: [ 10/ 10], step: [  259/  390], loss: [0.2213], avg loss: [0.2629], time: [109.8859ms]\n",
+      "Epoch: [ 10/ 10], step: [  260/  390], loss: [0.3164], avg loss: [0.2631], time: [112.5925ms]\n",
+      "Epoch: [ 10/ 10], step: [  261/  390], loss: [0.2560], avg loss: [0.2631], time: [110.0361ms]\n",
+      "Epoch: [ 10/ 10], step: [  262/  390], loss: [0.1884], avg loss: [0.2628], time: [109.5445ms]\n",
+      "Epoch: [ 10/ 10], step: [  263/  390], loss: [0.3105], avg loss: [0.2630], time: [107.9669ms]\n",
+      "Epoch: [ 10/ 10], step: [  264/  390], loss: [0.2927], avg loss: [0.2631], time: [107.5475ms]\n",
+      "Epoch: [ 10/ 10], step: [  265/  390], loss: [0.2530], avg loss: [0.2631], time: [109.3583ms]\n",
+      "Epoch: [ 10/ 10], step: [  266/  390], loss: [0.3810], avg loss: [0.2635], time: [111.5427ms]\n",
+      "Epoch: [ 10/ 10], step: [  267/  390], loss: [0.2432], avg loss: [0.2635], time: [108.4878ms]\n",
+      "Epoch: [ 10/ 10], step: [  268/  390], loss: [0.3442], avg loss: [0.2638], time: [110.7740ms]\n",
+      "Epoch: [ 10/ 10], step: [  269/  390], loss: [0.2244], avg loss: [0.2636], time: [107.2319ms]\n",
+      "Epoch: [ 10/ 10], step: [  270/  390], loss: [0.3054], avg loss: [0.2638], time: [109.4458ms]\n",
+      "Epoch: [ 10/ 10], step: [  271/  390], loss: [0.2844], avg loss: [0.2638], time: [110.3549ms]\n",
+      "Epoch: [ 10/ 10], step: [  272/  390], loss: [0.3220], avg loss: [0.2641], time: [106.1864ms]\n",
+      "Epoch: [ 10/ 10], step: [  273/  390], loss: [0.2778], avg loss: [0.2641], time: [111.6083ms]\n",
+      "Epoch: [ 10/ 10], step: [  274/  390], loss: [0.2705], avg loss: [0.2641], time: [107.1002ms]\n",
+      "Epoch: [ 10/ 10], step: [  275/  390], loss: [0.1720], avg loss: [0.2638], time: [105.9129ms]\n",
+      "Epoch: [ 10/ 10], step: [  276/  390], loss: [0.1866], avg loss: [0.2635], time: [110.8553ms]\n",
+      "Epoch: [ 10/ 10], step: [  277/  390], loss: [0.3264], avg loss: [0.2637], time: [111.2182ms]\n",
+      "Epoch: [ 10/ 10], step: [  278/  390], loss: [0.3074], avg loss: [0.2639], time: [109.3917ms]\n",
+      "Epoch: [ 10/ 10], step: [  279/  390], loss: [0.1466], avg loss: [0.2635], time: [105.2890ms]\n",
+      "Epoch: [ 10/ 10], step: [  280/  390], loss: [0.1658], avg loss: [0.2631], time: [112.2177ms]\n",
+      "Epoch: [ 10/ 10], step: [  281/  390], loss: [0.2875], avg loss: [0.2632], time: [108.7754ms]\n",
+      "Epoch: [ 10/ 10], step: [  282/  390], loss: [0.2496], avg loss: [0.2632], time: [106.4527ms]\n",
+      "Epoch: [ 10/ 10], step: [  283/  390], loss: [0.2294], avg loss: [0.2630], time: [113.1241ms]\n",
+      "Epoch: [ 10/ 10], step: [  284/  390], loss: [0.2058], avg loss: [0.2628], time: [112.6666ms]\n",
+      "Epoch: [ 10/ 10], step: [  285/  390], loss: [0.2605], avg loss: [0.2628], time: [107.1360ms]\n",
+      "Epoch: [ 10/ 10], step: [  286/  390], loss: [0.3054], avg loss: [0.2630], time: [109.3924ms]\n",
+      "Epoch: [ 10/ 10], step: [  287/  390], loss: [0.2496], avg loss: [0.2629], time: [105.4373ms]\n",
+      "Epoch: [ 10/ 10], step: [  288/  390], loss: [0.1728], avg loss: [0.2626], time: [106.0085ms]\n",
+      "Epoch: [ 10/ 10], step: [  289/  390], loss: [0.3792], avg loss: [0.2630], time: [111.7752ms]\n",
+      "Epoch: [ 10/ 10], step: [  290/  390], loss: [0.1727], avg loss: [0.2627], time: [106.4239ms]\n",
+      "Epoch: [ 10/ 10], step: [  291/  390], loss: [0.2272], avg loss: [0.2626], time: [105.7825ms]\n",
+      "Epoch: [ 10/ 10], step: [  292/  390], loss: [0.2899], avg loss: [0.2627], time: [110.7328ms]\n",
+      "Epoch: [ 10/ 10], step: [  293/  390], loss: [0.3781], avg loss: [0.2631], time: [111.2952ms]\n",
+      "Epoch: [ 10/ 10], step: [  294/  390], loss: [0.2894], avg loss: [0.2632], time: [107.3008ms]\n",
+      "Epoch: [ 10/ 10], step: [  295/  390], loss: [0.2592], avg loss: [0.2632], time: [105.6979ms]\n",
+      "Epoch: [ 10/ 10], step: [  296/  390], loss: [0.2395], avg loss: [0.2631], time: [111.0971ms]\n",
+      "Epoch: [ 10/ 10], step: [  297/  390], loss: [0.2941], avg loss: [0.2632], time: [106.8094ms]\n",
+      "Epoch: [ 10/ 10], step: [  298/  390], loss: [0.2771], avg loss: [0.2632], time: [106.8492ms]\n",
+      "Epoch: [ 10/ 10], step: [  299/  390], loss: [0.2782], avg loss: [0.2633], time: [106.0860ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [ 10/ 10], step: [  300/  390], loss: [0.2784], avg loss: [0.2633], time: [106.0503ms]\n",
+      "Epoch: [ 10/ 10], step: [  301/  390], loss: [0.2806], avg loss: [0.2634], time: [105.9799ms]\n",
+      "Epoch: [ 10/ 10], step: [  302/  390], loss: [0.2436], avg loss: [0.2633], time: [106.9660ms]\n",
+      "Epoch: [ 10/ 10], step: [  303/  390], loss: [0.3769], avg loss: [0.2637], time: [109.2713ms]\n",
+      "Epoch: [ 10/ 10], step: [  304/  390], loss: [0.3425], avg loss: [0.2640], time: [108.8324ms]\n",
+      "Epoch: [ 10/ 10], step: [  305/  390], loss: [0.2269], avg loss: [0.2638], time: [107.3177ms]\n",
+      "Epoch: [ 10/ 10], step: [  306/  390], loss: [0.4220], avg loss: [0.2643], time: [109.8456ms]\n",
+      "Epoch: [ 10/ 10], step: [  307/  390], loss: [0.2467], avg loss: [0.2643], time: [105.4230ms]\n",
+      "Epoch: [ 10/ 10], step: [  308/  390], loss: [0.1316], avg loss: [0.2639], time: [110.8382ms]\n",
+      "Epoch: [ 10/ 10], step: [  309/  390], loss: [0.1762], avg loss: [0.2636], time: [109.5812ms]\n",
+      "Epoch: [ 10/ 10], step: [  310/  390], loss: [0.3126], avg loss: [0.2637], time: [108.0234ms]\n",
+      "Epoch: [ 10/ 10], step: [  311/  390], loss: [0.3991], avg loss: [0.2642], time: [111.2289ms]\n",
+      "Epoch: [ 10/ 10], step: [  312/  390], loss: [0.1567], avg loss: [0.2638], time: [107.8992ms]\n",
+      "Epoch: [ 10/ 10], step: [  313/  390], loss: [0.2893], avg loss: [0.2639], time: [106.9951ms]\n",
+      "Epoch: [ 10/ 10], step: [  314/  390], loss: [0.1417], avg loss: [0.2635], time: [106.4160ms]\n",
+      "Epoch: [ 10/ 10], step: [  315/  390], loss: [0.2252], avg loss: [0.2634], time: [105.4993ms]\n",
+      "Epoch: [ 10/ 10], step: [  316/  390], loss: [0.2381], avg loss: [0.2633], time: [109.7620ms]\n",
+      "Epoch: [ 10/ 10], step: [  317/  390], loss: [0.2423], avg loss: [0.2633], time: [111.3331ms]\n",
+      "Epoch: [ 10/ 10], step: [  318/  390], loss: [0.2374], avg loss: [0.2632], time: [108.0234ms]\n",
+      "Epoch: [ 10/ 10], step: [  319/  390], loss: [0.2307], avg loss: [0.2631], time: [108.1231ms]\n",
+      "Epoch: [ 10/ 10], step: [  320/  390], loss: [0.0773], avg loss: [0.2625], time: [112.3674ms]\n",
+      "Epoch: [ 10/ 10], step: [  321/  390], loss: [0.2638], avg loss: [0.2625], time: [107.9214ms]\n",
+      "Epoch: [ 10/ 10], step: [  322/  390], loss: [0.2122], avg loss: [0.2623], time: [105.8342ms]\n",
+      "Epoch: [ 10/ 10], step: [  323/  390], loss: [0.3638], avg loss: [0.2626], time: [111.7072ms]\n",
+      "Epoch: [ 10/ 10], step: [  324/  390], loss: [0.2257], avg loss: [0.2625], time: [110.1723ms]\n",
+      "Epoch: [ 10/ 10], step: [  325/  390], loss: [0.1227], avg loss: [0.2621], time: [109.4925ms]\n",
+      "Epoch: [ 10/ 10], step: [  326/  390], loss: [0.2076], avg loss: [0.2619], time: [111.5489ms]\n",
+      "Epoch: [ 10/ 10], step: [  327/  390], loss: [0.3363], avg loss: [0.2622], time: [110.1079ms]\n",
+      "Epoch: [ 10/ 10], step: [  328/  390], loss: [0.2720], avg loss: [0.2622], time: [107.8780ms]\n",
+      "Epoch: [ 10/ 10], step: [  329/  390], loss: [0.3177], avg loss: [0.2624], time: [106.6554ms]\n",
+      "Epoch: [ 10/ 10], step: [  330/  390], loss: [0.3589], avg loss: [0.2627], time: [112.8848ms]\n",
+      "Epoch: [ 10/ 10], step: [  331/  390], loss: [0.2251], avg loss: [0.2625], time: [106.4649ms]\n",
+      "Epoch: [ 10/ 10], step: [  332/  390], loss: [0.2356], avg loss: [0.2625], time: [111.0713ms]\n",
+      "Epoch: [ 10/ 10], step: [  333/  390], loss: [0.2400], avg loss: [0.2624], time: [109.3411ms]\n",
+      "Epoch: [ 10/ 10], step: [  334/  390], loss: [0.2644], avg loss: [0.2624], time: [109.3733ms]\n",
+      "Epoch: [ 10/ 10], step: [  335/  390], loss: [0.1769], avg loss: [0.2621], time: [109.7476ms]\n",
+      "Epoch: [ 10/ 10], step: [  336/  390], loss: [0.2161], avg loss: [0.2620], time: [112.4244ms]\n",
+      "Epoch: [ 10/ 10], step: [  337/  390], loss: [0.2156], avg loss: [0.2619], time: [106.9608ms]\n",
+      "Epoch: [ 10/ 10], step: [  338/  390], loss: [0.1552], avg loss: [0.2616], time: [111.5198ms]\n",
+      "Epoch: [ 10/ 10], step: [  339/  390], loss: [0.3564], avg loss: [0.2618], time: [108.8660ms]\n",
+      "Epoch: [ 10/ 10], step: [  340/  390], loss: [0.3401], avg loss: [0.2621], time: [112.4554ms]\n",
+      "Epoch: [ 10/ 10], step: [  341/  390], loss: [0.2185], avg loss: [0.2619], time: [107.0263ms]\n",
+      "Epoch: [ 10/ 10], step: [  342/  390], loss: [0.1962], avg loss: [0.2617], time: [108.1316ms]\n",
+      "Epoch: [ 10/ 10], step: [  343/  390], loss: [0.2351], avg loss: [0.2617], time: [106.5624ms]\n",
+      "Epoch: [ 10/ 10], step: [  344/  390], loss: [0.2256], avg loss: [0.2616], time: [106.1256ms]\n",
+      "Epoch: [ 10/ 10], step: [  345/  390], loss: [0.3031], avg loss: [0.2617], time: [107.1990ms]\n",
+      "Epoch: [ 10/ 10], step: [  346/  390], loss: [0.3497], avg loss: [0.2619], time: [107.4386ms]\n",
+      "Epoch: [ 10/ 10], step: [  347/  390], loss: [0.3768], avg loss: [0.2623], time: [105.7918ms]\n",
+      "Epoch: [ 10/ 10], step: [  348/  390], loss: [0.2074], avg loss: [0.2621], time: [105.9022ms]\n",
+      "Epoch: [ 10/ 10], step: [  349/  390], loss: [0.1948], avg loss: [0.2619], time: [106.8838ms]\n",
+      "Epoch: [ 10/ 10], step: [  350/  390], loss: [0.2780], avg loss: [0.2620], time: [110.0733ms]\n",
+      "Epoch: [ 10/ 10], step: [  351/  390], loss: [0.2888], avg loss: [0.2620], time: [108.5923ms]\n",
+      "Epoch: [ 10/ 10], step: [  352/  390], loss: [0.2742], avg loss: [0.2621], time: [109.2544ms]\n",
+      "Epoch: [ 10/ 10], step: [  353/  390], loss: [0.3123], avg loss: [0.2622], time: [106.8628ms]\n",
+      "Epoch: [ 10/ 10], step: [  354/  390], loss: [0.3578], avg loss: [0.2625], time: [109.7829ms]\n",
+      "Epoch: [ 10/ 10], step: [  355/  390], loss: [0.1633], avg loss: [0.2622], time: [110.7647ms]\n",
+      "Epoch: [ 10/ 10], step: [  356/  390], loss: [0.2015], avg loss: [0.2620], time: [107.0683ms]\n",
+      "Epoch: [ 10/ 10], step: [  357/  390], loss: [0.2081], avg loss: [0.2619], time: [109.8542ms]\n",
+      "Epoch: [ 10/ 10], step: [  358/  390], loss: [0.2807], avg loss: [0.2619], time: [110.7974ms]\n",
+      "Epoch: [ 10/ 10], step: [  359/  390], loss: [0.2153], avg loss: [0.2618], time: [108.9861ms]\n",
+      "Epoch: [ 10/ 10], step: [  360/  390], loss: [0.3053], avg loss: [0.2619], time: [105.6628ms]\n",
+      "Epoch: [ 10/ 10], step: [  361/  390], loss: [0.3514], avg loss: [0.2622], time: [108.7244ms]\n",
+      "Epoch: [ 10/ 10], step: [  362/  390], loss: [0.2499], avg loss: [0.2621], time: [105.8607ms]\n",
+      "Epoch: [ 10/ 10], step: [  363/  390], loss: [0.2624], avg loss: [0.2621], time: [110.3516ms]\n",
+      "Epoch: [ 10/ 10], step: [  364/  390], loss: [0.2889], avg loss: [0.2622], time: [108.8624ms]\n",
+      "Epoch: [ 10/ 10], step: [  365/  390], loss: [0.2481], avg loss: [0.2622], time: [107.4762ms]\n",
+      "Epoch: [ 10/ 10], step: [  366/  390], loss: [0.2942], avg loss: [0.2623], time: [111.0370ms]\n",
+      "Epoch: [ 10/ 10], step: [  367/  390], loss: [0.3332], avg loss: [0.2625], time: [107.2378ms]\n",
+      "Epoch: [ 10/ 10], step: [  368/  390], loss: [0.3419], avg loss: [0.2627], time: [109.6377ms]\n",
+      "Epoch: [ 10/ 10], step: [  369/  390], loss: [0.1517], avg loss: [0.2624], time: [105.7010ms]\n",
+      "Epoch: [ 10/ 10], step: [  370/  390], loss: [0.2912], avg loss: [0.2625], time: [106.8776ms]\n",
+      "Epoch: [ 10/ 10], step: [  371/  390], loss: [0.2824], avg loss: [0.2625], time: [109.8986ms]\n",
+      "Epoch: [ 10/ 10], step: [  372/  390], loss: [0.2197], avg loss: [0.2624], time: [112.1969ms]\n",
+      "Epoch: [ 10/ 10], step: [  373/  390], loss: [0.4275], avg loss: [0.2628], time: [110.7540ms]\n",
+      "Epoch: [ 10/ 10], step: [  374/  390], loss: [0.3104], avg loss: [0.2630], time: [110.7924ms]\n",
+      "Epoch: [ 10/ 10], step: [  375/  390], loss: [0.1147], avg loss: [0.2626], time: [107.7850ms]\n",
+      "Epoch: [ 10/ 10], step: [  376/  390], loss: [0.2216], avg loss: [0.2625], time: [106.1397ms]\n",
+      "Epoch: [ 10/ 10], step: [  377/  390], loss: [0.2799], avg loss: [0.2625], time: [105.4490ms]\n",
+      "Epoch: [ 10/ 10], step: [  378/  390], loss: [0.2447], avg loss: [0.2625], time: [110.6384ms]\n",
+      "Epoch: [ 10/ 10], step: [  379/  390], loss: [0.2776], avg loss: [0.2625], time: [107.8119ms]\n",
+      "Epoch: [ 10/ 10], step: [  380/  390], loss: [0.3090], avg loss: [0.2626], time: [109.9536ms]\n",
+      "Epoch: [ 10/ 10], step: [  381/  390], loss: [0.2692], avg loss: [0.2626], time: [105.9964ms]\n",
+      "Epoch: [ 10/ 10], step: [  382/  390], loss: [0.3088], avg loss: [0.2628], time: [109.5507ms]\n",
+      "Epoch: [ 10/ 10], step: [  383/  390], loss: [0.2008], avg loss: [0.2626], time: [105.6182ms]\n",
+      "Epoch: [ 10/ 10], step: [  384/  390], loss: [0.1450], avg loss: [0.2623], time: [107.9791ms]\n",
+      "Epoch: [ 10/ 10], step: [  385/  390], loss: [0.2522], avg loss: [0.2623], time: [108.9914ms]\n",
+      "Epoch: [ 10/ 10], step: [  386/  390], loss: [0.2532], avg loss: [0.2622], time: [106.7832ms]\n",
+      "Epoch: [ 10/ 10], step: [  387/  390], loss: [0.3558], avg loss: [0.2625], time: [111.0179ms]\n",
+      "Epoch: [ 10/ 10], step: [  388/  390], loss: [0.2641], avg loss: [0.2625], time: [107.2817ms]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: [ 10/ 10], step: [  389/  390], loss: [0.2334], avg loss: [0.2624], time: [110.2269ms]\n",
+      "Epoch: [ 10/ 10], step: [  390/  390], loss: [0.1966], avg loss: [0.2622], time: [829.2229ms]\n",
+      "Epoch time: 43320.503, per step time: 111.078\n",
+      "Epoch time: 43320.815, per step time: 111.079, avg loss: 0.262\n",
+      "************************************************************\n",
+      "============== Training Success ==============\n"
+     ]
+    }
+   ],
+   "source": [
+    "from mindspore import Model\n",
+    "from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor, LossMonitor\n",
+    "from mindspore.nn import Accuracy\n",
+    "\n",
+    "\n",
+    "model = Model(network, loss, opt, {'acc': Accuracy()})\n",
+    "loss_cb = LossMonitor()\n",
+    "print(\"============== Starting Training ==============\")\n",
+    "config_ck = CheckpointConfig(save_checkpoint_steps=cfg.save_checkpoint_steps,\n",
+    "                             keep_checkpoint_max=cfg.keep_checkpoint_max)\n",
+    "ckpoint_cb = ModelCheckpoint(prefix=\"lstm\", directory=args.ckpt_path, config=config_ck)\n",
+    "time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())\n",
+    "if args.device_target == \"CPU\":\n",
+    "    model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb], dataset_sink_mode=False)\n",
+    "else:\n",
+    "    model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb])\n",
+    "print(\"============== Training Success ==============\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 模型验证\n",
+    "\n",
+    "创建并加载验证数据集(`ds_eval`),加载由**训练**保存的CheckPoint文件,进行验证,查看模型质量,此步骤用时约30秒。"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "============== Starting Testing ==============\n",
+      "============== {'acc': 0.8495592948717948} ==============\n"
+     ]
+    }
+   ],
+   "source": [
+    "from mindspore.train.serialization import load_checkpoint, load_param_into_net\n",
+    "\n",
+    "\n",
+    "args.ckpt_path = f'./lstm-{cfg.num_epochs}_390.ckpt'\n",
+    "print(\"============== Starting Testing ==============\")\n",
+    "ds_eval = lstm_create_dataset(args.preprocess_path, cfg.batch_size, training=False)\n",
+    "param_dict = load_checkpoint(args.ckpt_path)\n",
+    "load_param_into_net(network, param_dict)\n",
+    "if args.device_target == \"CPU\":\n",
+    "    acc = model.eval(ds_eval, dataset_sink_mode=False)\n",
+    "else:\n",
+    "    acc = model.eval(ds_eval)\n",
+    "print(\"============== {} ==============\".format(acc))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 训练结果评价\n",
+    "\n",
+    "根据以上一段代码的输出可以看到,在经历了10轮epoch之后,使用验证的数据集,对文本的情感分析正确率在85%左右,达到一个基本满意的结果。"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 总结\n",
+    "\n",
+    "以上便完成了MindSpore自然语言处理应用的体验,我们通过本次体验全面了解了如何使用MindSpore进行自然语言中处理情感分类问题,理解了如何通过定义和初始化基于LSTM的`SentimentNet`网络进行训练模型及验证正确率。"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/tutorials/source_zh_cn/advanced_use/nlp_application.md b/tutorials/source_zh_cn/advanced_use/nlp_application.md
index 5ff36c18fdc6af77e3ba34b5a785dd0970003cd5..69c99787f96254f7843e002ab55c5aa5d798617f 100644
--- a/tutorials/source_zh_cn/advanced_use/nlp_application.md
+++ b/tutorials/source_zh_cn/advanced_use/nlp_application.md
@@ -21,7 +21,8 @@
 
 <!-- /TOC -->
 
-<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/advanced_use/nlp_application.md" target="_blank"><img src="../_static/logo_source.png"></a>
+<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/advanced_use/nlp_application.md" target="_blank"><img src="../_static/logo_source.png"></a>&nbsp;&nbsp;
+<a href="https://gitee.com/mindspore/docs/blob/master/tutorials/notebook/nlp_application.ipynb" target="_blank"><img src="../_static/logo_notebook.png">
 
 ## 概述