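# Natural Language Processing Applications

## Overview

Sentiment classification is a subset of text classification in natural language processing and one of the most basic NLP applications. It analyzes and reasons about subjective, emotionally colored text to determine whether the speaker's attitude leans positive or negative.

> Sentiment is usually divided into three classes: positive, negative, and neutral. Although neutral ("expressionless") reviews are common, most of the time only positive and negative examples are used for training. The dataset used below is a good example.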

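The typical reference dataset for traditional text topic classification is [20 Newsgroups](http://qwone.com/~jason/20Newsgroups/), which consists of 20 newsgroups and contains about 20,000 news documents.
Some categories in its topic list are quite similar. For example, comp.sys.ibm.pc.hardware and comp.sys.mac.hardware both concern computer system hardware. Other categories are essentially unrelated, such as misc.forsale and soc.religion.christian.

As far as the network itself is concerned, the structures used for text topic classification and sentiment classification are broadly similar. Once you know how to build a sentiment classification network, it is easy to build a similar one and, with a little parameter tuning, use it for text topic classification.

From a business perspective, however, text topic classification analyzes the objective content a text discusses, whereas sentiment classification extracts whether the text supports a certain point of view. Take the sentence "Forrest Gump is really wonderful, with a clear theme and smooth pacing." Text topic classification would assign it to the "movie" topic, while sentiment classification would determine whether the review's attitude is positive or negative.

Compared with traditional text topic classification, sentiment classification is relatively simple and highly practical. Reasonably high-quality datasets can be collected from common shopping and movie websites, and the results readily benefit the business. For example, by combining domain context, you can automatically analyze the opinions of specific customer types about the current product, break sentiment down by topic and user type for targeted handling, and even build on this to recommend products, improving conversion rates and commercial returns.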

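In specialized domains, some non-polar words also clearly express users' sentiment. For example, when downloading or using an app, "it froze" and "the download is too slow" express negative sentiment; in the stock domain, "bullish" and "bull market" express positive sentiment. In essence, we want the model to mine such domain-specific expressions in vertical domains and feed them to the sentiment classification system as polarity words:

$\text{vertical-domain polarity words} = \text{general polarity words} + \text{domain-specific polarity words}$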
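
The relationship above can be sketched as a simple dictionary merge. The word lists below are hypothetical illustrations: the domain-specific entries paraphrase the examples in the text, the general entries are assumptions, and none of this is a real lexicon.

```python
# Hypothetical general-purpose polarity words (assumed for illustration).
general_polarity = {"great": 1, "terrible": -1}

# Domain-specific expressions paraphrased from the examples above.
app_polarity = {"froze": -1, "download is too slow": -1}
stock_polarity = {"bullish": 1, "bull market": 1}


def build_domain_lexicon(general, domain_specific):
    """Merge general and domain-specific polarity words into one lexicon."""
    lexicon = dict(general)
    lexicon.update(domain_specific)  # domain entries override general ones
    return lexicon


app_lexicon = build_domain_lexicon(general_polarity, app_polarity)
print(sorted(app_lexicon))
```

Each vertical domain gets its own merged lexicon, so a stock-market expression does not leak into the app-review lexicon.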

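Depending on the granularity of the text being processed, sentiment analysis can be studied at the word, phrase, sentence, paragraph, and document levels. This tutorial works at the paragraph level: the input is a paragraph, and the output indicates whether the movie review is positive or negative.

Next, we use IMDB movie review sentiment classification as an example to try out MindSpore on a natural language processing task.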

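## Overall Workflow

1. Prepare the environment and data.
2. Load the dataset and process the data.
3. Define the network.
4. Define the optimizer and loss function.
5. Train the network on the data to generate a model.
6. Use the validation dataset to check the accuracy of the trained model.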

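## Preparation

### Downloading the Dataset

This tutorial uses the IMDB movie review dataset as experimental data.

1. Download the IMDB movie review dataset from <http://ai.stanford.edu/~amaas/data/sentiment/>.

    The following are examples of a negative review and a positive review.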

| Review  | Label  | 
|:---|:---:|
| "Quitting" may be as much about exiting a pre-ordained identity as about drug withdrawal. As a rural guy coming to Beijing, class and success must have struck this young artist face on as an appeal to separate from his roots and far surpass his peasant parents' acting success. Troubles arise, however, when the new man is too new, when it demands too big a departure from family, history, nature, and personal identity. The ensuing splits, and confusion between the imaginary and the real and the dissonance between the ordinary and the heroic are the stuff of a gut check on the one hand or a complete escape from self on the other.  |  Negative |  
| This movie is amazing because the fact that the real people portray themselves and their real life experience and do such a good job it's like they're almost living the past over again. Jia Hongsheng plays himself an actor who quit everything except music and drugs struggling with depression and searching for the meaning of life while being angry at everyone especially the people who care for him most.  | Positive  |
    
    Decompress the downloaded dataset and place it in the current working directory.


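2. Download the GloVe file.

    Download and decompress the GloVe file into the current working directory and rename the decompressed directory to `glove`. Then add the line shown below at the very beginning of each GloVe file you intend to use (this tutorial uses `glove.6B.300d.txt`). The line means that 400,000 words are read in total and that each word is represented by a 300-dimensional word vector. For the other GloVe files, the second number must match that file's vector dimension.

    ```
    400000 300
    ```

    GloVe download address: <http://nlp.stanford.edu/data/glove.6B.zip>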


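3. Create an empty directory named `preprocess` in the current working directory. It will store the MindRecord files produced when the IMDB dataset is converted during preprocessing.

    The current working directory structure should now look like this: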
    
    ```shell
    $ tree -L 2 lstm
    lstm
    ├── aclImdb
    │   ├── imdbEr.txt
    │   ├── imdb.vocab
    │   ├── README
    │   ├── test
    │   └── train
    ├── glove
    │   ├── glove.6B.100d.txt
    │   ├── glove.6B.200d.txt
    │   ├── glove.6B.300d.txt
    │   └── glove.6B.50d.txt
    └── preprocess
    ```
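
The header step for the GloVe file described in item 2 can also be scripted instead of edited by hand. The following is a minimal sketch, assuming a plain-text GloVe file with one word followed by its space-separated vector components per line. The function name `add_word2vec_header` and the toy file are hypothetical and only illustrate the idea.

```python
import os
import tempfile


def add_word2vec_header(src_path, dst_path):
    """Copy a GloVe text file, prepending a "<word count> <dimension>" line."""
    with open(src_path, encoding="utf8") as src:
        first = src.readline()
        if not first:
            raise ValueError("empty embedding file")
        dim = len(first.rstrip("\n").split(" ")) - 1  # tokens after the word
        count = 1 + sum(1 for _ in src)               # first line + the rest
    with open(src_path, encoding="utf8") as src, \
            open(dst_path, "w", encoding="utf8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in src:
            dst.write(line)
    return count, dim


# Tiny stand-in for a GloVe file: two words with 3-dimensional vectors.
workdir = tempfile.mkdtemp()
toy_src = os.path.join(workdir, "toy.3d.txt")
toy_dst = os.path.join(workdir, "toy.3d.w2v.txt")
with open(toy_src, "w", encoding="utf8") as f:
    f.write("the 0.1 0.2 0.3\nmovie 0.4 0.5 0.6\n")

header = add_word2vec_header(toy_src, toy_dst)
print(header)
```

Deriving both numbers from the file itself avoids a mismatch between the header and the actual vector dimension when a file other than `glove.6B.300d.txt` is used.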

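### Choosing Evaluation Metrics

As a typical classification problem, sentiment classification can be evaluated like any other classification task. Common metrics such as accuracy, precision, recall, and the $F_\beta$ score can all serve as references.

$\text{Accuracy} = \text{number of correctly classified samples} / \text{total number of samples}$

$\text{Precision} = \text{number of true positives} / \text{number of samples predicted as positive}$

$\text{Recall} = \text{number of true positives} / \text{number of samples whose true class is positive}$

$F_1 = (2 \times \text{Precision} \times \text{Recall}) / (\text{Precision} + \text{Recall})$

In the IMDB dataset, the numbers of positive and negative samples are roughly balanced, so accuracy can simply be used as the classifier's metric.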
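
The four formulas above can be checked on a small example. The sketch below is plain Python with made-up labels; `binary_metrics` is a hypothetical helper name, not part of MindSpore.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Made-up ground truth and predictions (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, round(f1, 4))
```

Here three of the four positive reviews are found (recall 0.75), but two negatives are also predicted as positive (precision 0.6), which accuracy alone would not reveal.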

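### Choosing the Network

We use SentimentNet, a network built on LSTM, for this natural language processing task.

> LSTM (long short-term memory) is a type of recurrent neural network suited to processing and predicting important events separated by very long intervals and delays in a time series.
> This tutorial targets GPU or CPU hardware platforms.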
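
For reference, a commonly used textbook formulation of a single LSTM step is given below. The notation is assumed here rather than taken from this tutorial: $x_t$ is the input, $h_t$ the hidden (short-term) state, $c_t$ the cell (long-term) state, $W$, $U$, $b$ are learned parameters, $\sigma$ is the logistic sigmoid, and $\odot$ is the element-wise product. The exact parameter layout inside MindSpore's `nn.LSTM` may differ.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The forget gate $f_t$ controls how much of the previous cell state is kept, which is what lets information survive across long gaps in the sequence.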

### Configuring Run Information

1. Use the `argparse` module to pass in the information required to run.

    - `preprocess`: whether to preprocess the dataset. Defaults to `false`.
    - `aclimdb_path`: path where the dataset is stored.
    - `glove_path`: path where the GloVe file is stored.
    - `preprocess_path`: directory that holds the preprocessed dataset.
    - `ckpt_path`: path where CheckPoint files are saved.
    - `pre_trained`: path of a CheckPoint file to preload.
    - `device_target`: GPU or CPU environment.

```python
import argparse


parser = argparse.ArgumentParser(description='MindSpore LSTM Example')
parser.add_argument('--preprocess', type=str, default='false', choices=['true', 'false'],
                    help='whether to preprocess data.')
parser.add_argument('--aclimdb_path', type=str, default="./aclImdb",
                    help='path where the dataset is stored.')
parser.add_argument('--glove_path', type=str, default="./glove",
                    help='path where the GloVe is stored.')
parser.add_argument('--preprocess_path', type=str, default="./preprocess",
                    help='path where the pre-process data is stored.')
parser.add_argument('--ckpt_path', type=str, default="./",
                    help='the path to save the checkpoint file.')
parser.add_argument('--pre_trained', type=str, default=None,
                    help='the pretrained checkpoint file path.')
parser.add_argument('--device_target', type=str, default="GPU", choices=['GPU', 'CPU'],
                    help='the target device to run, support "GPU", "CPU". Default: "GPU".')
# In a notebook the arguments are passed explicitly instead of read from the command line.
args = parser.parse_args(['--device_target', 'GPU', '--preprocess', 'true'])
```

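2. Before training, configure the necessary information, including the environment, execution mode, backend, and hardware.

> For details on the interface configuration, see the `context.set_context` API description on the MindSpore website.

```python
from mindspore import context


context.set_context(
        mode=context.GRAPH_MODE,
        save_graphs=False,
        device_target=args.device_target)
```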

### Configuring SentimentNet Parameters

The following code configures the parameters required by SentimentNet, the LSTM-based network.

```python
from easydict import EasyDict as edict


# LSTM CONFIG
lstm_cfg = edict({
    'num_classes': 2,               # positive / negative
    'learning_rate': 0.1,
    'momentum': 0.9,
    'num_epochs': 10,
    'batch_size': 64,
    'embed_size': 300,              # must match the GloVe vector dimension
    'num_hiddens': 100,
    'num_layers': 2,
    'bidirectional': True,
    'save_checkpoint_steps': 390,   # one epoch is 390 steps in the training log
    'keep_checkpoint_max': 10
})

cfg = lstm_cfg
```

# Data Processing

## Preprocessing the Dataset

1. Define the `ImdbParser` class to parse the text dataset, including encoding, tokenization, alignment, and processing of the raw GloVe data, so that the data fits the network structure.

```python
import os
from itertools import chain
import numpy as np
import gensim


class ImdbParser():
    """
    Parse aclImdb data into features and labels.
    sentence -> tokenized -> encoded -> padded -> features
    """

    def __init__(self, imdb_path, glove_path, embed_size=300):
        self.__segs = ['train', 'test']
        self.__label_dic = {'pos': 1, 'neg': 0}
        self.__imdb_path = imdb_path
        self.__glove_dim = embed_size
        self.__glove_file = os.path.join(glove_path, 'glove.6B.' + str(self.__glove_dim) + 'd.txt')

        # properties
        self.__imdb_datas = {}
        self.__features = {}
        self.__labels = {}
        self.__vocab = {}
        self.__word2idx = {}
        self.__weight_np = {}
        self.__wvmodel = None

    def parse(self):
        """
        parse imdb data to memory
        """
        # Requires the "<word count> <dimension>" header line added to the GloVe file.
        self.__wvmodel = gensim.models.KeyedVectors.load_word2vec_format(self.__glove_file)

        for seg in self.__segs:
            self.__parse_imdb_datas(seg)
            self.__parse_features_and_labels(seg)
            self.__gen_weight_np(seg)

    def __parse_imdb_datas(self, seg):
        """
        load data from txt
        """
        data_lists = []
        for label_name, label_id in self.__label_dic.items():
            sentence_dir = os.path.join(self.__imdb_path, seg, label_name)
            for file in os.listdir(sentence_dir):
                with open(os.path.join(sentence_dir, file), mode='r', encoding='utf8') as f:
                    sentence = f.read().replace('\n', '')
                    data_lists.append([sentence, label_id])
        self.__imdb_datas[seg] = data_lists

    def __parse_features_and_labels(self, seg):
        """
        parse features and labels
        """
        features = []
        labels = []
        for sentence, label in self.__imdb_datas[seg]:
            features.append(sentence)
            labels.append(label)

        self.__features[seg] = features
        self.__labels[seg] = labels

        # update feature to tokenized
        self.__update_features_to_tokenized(seg)
        # parse vocab
        self.__parse_vocab(seg)
        # encode feature
        self.__encode_features(seg)
        # padding feature
        self.__padding_features(seg)

    def __update_features_to_tokenized(self, seg):
        """ split each sentence on spaces and lowercase the words """
        tokenized_features = []
        for sentence in self.__features[seg]:
            tokenized_sentence = [word.lower() for word in sentence.split(" ")]
            tokenized_features.append(tokenized_sentence)
        self.__features[seg] = tokenized_features

    def __parse_vocab(self, seg):
        # vocab
        tokenized_features = self.__features[seg]
        vocab = set(chain(*tokenized_features))
        self.__vocab[seg] = vocab

        # word_to_idx: {'hello': 1, 'world':111, ... '<unk>': 0}
        word_to_idx = {word: i + 1 for i, word in enumerate(vocab)}
        word_to_idx['<unk>'] = 0
        self.__word2idx[seg] = word_to_idx

    def __encode_features(self, seg):
        """ encode word to index """
        # The training vocabulary is used for both segments; unseen words map to 0.
        word_to_idx = self.__word2idx['train']
        encoded_features = []
        for tokenized_sentence in self.__features[seg]:
            encoded_sentence = []
            for word in tokenized_sentence:
                encoded_sentence.append(word_to_idx.get(word, 0))
            encoded_features.append(encoded_sentence)
        self.__features[seg] = encoded_features

    def __padding_features(self, seg, maxlen=500, pad=0):
        """ pad all features to the same length """
        padded_features = []
        for feature in self.__features[seg]:
            if len(feature) >= maxlen:
                padded_feature = feature[:maxlen]
            else:
                padded_feature = feature
                while len(padded_feature) < maxlen:
                    padded_feature.append(pad)
            padded_features.append(padded_feature)
        self.__features[seg] = padded_features

    def __gen_weight_np(self, seg):
        """
        generate weight by gensim
        """
        # Rows of words missing from GloVe (and row 0, '<unk>') stay all zeros.
        weight_np = np.zeros((len(self.__word2idx[seg]), self.__glove_dim), dtype=np.float32)
        for word, idx in self.__word2idx[seg].items():
            if word not in self.__wvmodel:
                continue
            word_vector = self.__wvmodel.get_vector(word)
            weight_np[idx, :] = word_vector

        self.__weight_np[seg] = weight_np

    def get_datas(self, seg):
        """
        return features, labels, and weight
        """
        features = np.array(self.__features[seg]).astype(np.int32)
        labels = np.array(self.__labels[seg]).astype(np.int32)
        weight = np.array(self.__weight_np[seg])
        return features, labels, weight
```
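
The tokenize, encode, and pad steps performed by `ImdbParser` can be followed on a toy example. The sketch below is plain Python and mirrors the logic of the class on two made-up sentences. The function names are hypothetical, and the vocabulary is sorted here only to make the indices reproducible (the class itself iterates over an unordered `set`).

```python
from itertools import chain


def tokenize(sentence):
    """Split on spaces and lowercase, as the parser does."""
    return [word.lower() for word in sentence.split(" ")]


def build_word_to_idx(tokenized_sentences):
    """Assign indices from 1 upward; index 0 is reserved for '<unk>'."""
    vocab = sorted(set(chain(*tokenized_sentences)))  # sorted: assumption for determinism
    word_to_idx = {word: i + 1 for i, word in enumerate(vocab)}
    word_to_idx["<unk>"] = 0
    return word_to_idx


def encode(tokens, word_to_idx):
    """Map words to indices; unknown words become 0."""
    return [word_to_idx.get(word, 0) for word in tokens]


def pad(encoded, maxlen, pad_id=0):
    """Truncate or right-pad to exactly maxlen entries."""
    return encoded[:maxlen] + [pad_id] * max(0, maxlen - len(encoded))


train = [tokenize("A great movie"), tokenize("A dull plot")]
word_to_idx = build_word_to_idx(train)
features = [pad(encode(tokens, word_to_idx), maxlen=5) for tokens in train]
unseen = pad(encode(tokenize("A great soundtrack"), word_to_idx), maxlen=5)

print(word_to_idx)
print(features)
print(unseen)
```

Note that index 0 serves both as the unknown-word index and as the padding value, which matches the defaults used in the class above.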

2. Define the `convert_to_mindrecord` function to convert the dataset to MindRecord format so that MindSpore can read it easily.

    In the `_convert_to_mindrecord` function, `weight.txt` is the weight parameter file generated automatically after data preprocessing.

```python
import os
import numpy as np
from mindspore.mindrecord import FileWriter


def _convert_to_mindrecord(data_home, features, labels, weight_np=None, training=True):
    """
    convert imdb dataset to MindRecord dataset
    """
    # Save the embedding table so it can be loaded when the network is built.
    if weight_np is not None:
        np.savetxt(os.path.join(data_home, 'weight.txt'), weight_np)

    # write mindrecord
    schema_json = {"id": {"type": "int32"},
                   "label": {"type": "int32"},
                   "feature": {"type": "int32", "shape": [-1]}}

    data_dir = os.path.join(data_home, "aclImdb_train.mindrecord")
    if not training:
        data_dir = os.path.join(data_home, "aclImdb_test.mindrecord")

    def get_imdb_data(features, labels):
        data_list = []
        for i, (label, feature) in enumerate(zip(labels, features)):
            data_json = {"id": i,
                         "label": int(label),
                         "feature": feature.reshape(-1)}
            data_list.append(data_json)
        return data_list

    # shard_num=4 splits the output into four files (suffixes 0 to 3).
    writer = FileWriter(data_dir, shard_num=4)
    data = get_imdb_data(features, labels)
    writer.add_schema(schema_json, "nlp_schema")
    writer.add_index(["id", "label"])
    writer.write_raw_data(data)
    writer.commit()


def convert_to_mindrecord(embed_size, aclimdb_path, preprocess_path, glove_path):
    """
    convert imdb dataset to MindRecord dataset
    """
    parser = ImdbParser(aclimdb_path, glove_path, embed_size)
    parser.parse()

    if not os.path.exists(preprocess_path):
        print(f"preprocess path {preprocess_path} does not exist, creating it")
        os.makedirs(preprocess_path)

    train_features, train_labels, train_weight_np = parser.get_datas('train')
    _convert_to_mindrecord(preprocess_path, train_features, train_labels, train_weight_np)

    test_features, test_labels, _ = parser.get_datas('test')
    _convert_to_mindrecord(preprocess_path, test_features, test_labels, training=False)
```
    

3. Call the `convert_to_mindrecord` function to preprocess the dataset. This takes about 3 minutes.

```python
if args.preprocess == "true":
    print("============== Starting Data Pre-processing ==============")
    convert_to_mindrecord(cfg.embed_size, args.aclimdb_path, args.preprocess_path, args.glove_path)
    print("======================= Successful =======================")
```




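After the conversion succeeds, MindRecord files are generated in the `preprocess` directory. As long as the dataset does not change, this step does not need to be repeated for every training run. The `preprocess` directory then looks like this: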

```shell
 $ tree preprocess
 ├── aclImdb_test.mindrecord0
 ├── aclImdb_test.mindrecord0.db
 ├── aclImdb_test.mindrecord1
 ├── aclImdb_test.mindrecord1.db
 ├── aclImdb_test.mindrecord2
 ├── aclImdb_test.mindrecord2.db
 ├── aclImdb_test.mindrecord3
 ├── aclImdb_test.mindrecord3.db
 ├── aclImdb_train.mindrecord0
 ├── aclImdb_train.mindrecord0.db
 ├── aclImdb_train.mindrecord1
 ├── aclImdb_train.mindrecord1.db
 ├── aclImdb_train.mindrecord2
 ├── aclImdb_train.mindrecord2.db
 ├── aclImdb_train.mindrecord3
 ├── aclImdb_train.mindrecord3.db
 └── weight.txt
```

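- In the files above:
    - Files whose names contain `aclImdb_train.mindrecord` are the training dataset converted to MindRecord format.
    - Files whose names contain `aclImdb_test.mindrecord` are the test dataset converted to MindRecord format.
    - `weight.txt` is the weight parameter file generated automatically after preprocessing.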


4. Define the `lstm_create_dataset` function for creating datasets, and create the training set `ds_train`.

```python
import os
import mindspore.dataset as ds


def lstm_create_dataset(data_home, batch_size, repeat_num=1, training=True):
    """Data operations."""
    ds.config.set_seed(1)
    data_dir = os.path.join(data_home, "aclImdb_train.mindrecord0")
    if not training:
        data_dir = os.path.join(data_home, "aclImdb_test.mindrecord0")

    data_set = ds.MindDataset(data_dir, columns_list=["feature", "label"], num_parallel_workers=4)

    # shuffle the samples, group them into fixed-size batches, then repeat
    data_set = data_set.shuffle(buffer_size=data_set.get_dataset_size())
    data_set = data_set.batch(batch_size=batch_size, drop_remainder=True)
    data_set = data_set.repeat(count=repeat_num)

    return data_set

ds_train = lstm_create_dataset(args.preprocess_path, cfg.batch_size)
```
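
The effect of shuffle, batch with `drop_remainder=True`, and repeat can be imitated on a plain Python list. This is only a sketch of the semantics, not how MindSpore implements its pipeline, and `shuffle_batch_repeat` is a hypothetical name. It uses the fact that the IMDB training split contains 25,000 reviews.

```python
import random


def shuffle_batch_repeat(samples, batch_size, repeat_num=1, seed=1):
    """Shuffle, cut into full batches (dropping the remainder), and repeat."""
    rng = random.Random(seed)
    batches = []
    for _ in range(repeat_num):
        shuffled = list(samples)
        rng.shuffle(shuffled)                      # new order for every pass
        full = len(shuffled) // batch_size         # incomplete last batch is dropped
        for i in range(full):
            batches.append(shuffled[i * batch_size:(i + 1) * batch_size])
    return batches


batches = shuffle_batch_repeat(range(25000), batch_size=64)
steps_per_epoch = len(batches)
dropped = 25000 - steps_per_epoch * 64
print(steps_per_epoch, dropped)  # 25000 // 64 = 390 full batches, 40 samples dropped
```

With a batch size of 64 this gives 390 steps per epoch, which agrees with the step counter in the training log and with `save_checkpoint_steps` in the configuration.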

5. Create a dictionary iterator with the `create_dict_iterator` method and read data from the created dataset `ds_train`.

    Run the following code to read the list of `label` values in the first `batch` and the `feature` data of the first element in the first `batch`.

```python
# Take the first batch from the dictionary iterator.
batch = next(ds_train.create_dict_iterator())
first_batch_label = batch["label"]
first_batch_first_feature = batch["feature"][0]
print(f"The first batch contains label below:\n{first_batch_label}\n")
print(f"The feature of the first item in the first batch is below vector:\n{first_batch_first_feature}")
```

```
The first batch contains label below:
[0 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 0 1
 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 0 1 0 0 1 1 0 1 1 0]

The feature of the first item in the first batch is below vector:
[210974 227370 167874 221440 205821 250308  57410 167874 157597 211314
 104140 154424 238018 167874 216357  23869 209921 187724 131973 144940
 177558 221440 205821 119691 149127 137330 212709 117415  61509  42345
 166849 155531 219231  64473 210974 103293 225985 181047  41304 210974
 132905  33755  96216   8987 210974 195260 117816  15665 241057   8987
  93501 155531 118935 110275 101659 181047 226216 133895 114115   6596
 189694 210974  56753   3426  29344 103100 131973  46391  25351  35080
  27231  69404 190304 212709 117415 157277 167874 210974 109102  92239
 101085 123273  64473 117415 176947  27231 168206 219146 167874 210974
 227370  18539 155531 219231  64473 210974 155781  93577 192315 157597
 213189  66091 216583 100381 158491 181047  15368 2214
```

## Defining the Network

1. Import the modules required to initialize the network.

```python
import numpy as np
from mindspore import Tensor, nn, context
from mindspore.ops import operations as P
from mindspore.train.serialization import load_param_into_net, load_checkpoint
```

2. Define the `lstm_default_state` function to initialize the network state.

```python
# Initialize short-term memory (h) and long-term memory (c) to 0
def lstm_default_state(batch_size, hidden_size, num_layers, bidirectional):
    """init default input."""
    num_directions = 1
    if bidirectional:
        num_directions = 2

    # On CPU the states are passed as one tensor per layer.
    if context.get_context("device_target") == "CPU":
        h_list = []
        c_list = []
        i = 0
        while i < num_layers:
            hi = Tensor(np.zeros((num_directions, batch_size, hidden_size)).astype(np.float32))
            h_list.append(hi)
            ci = Tensor(np.zeros((num_directions, batch_size, hidden_size)).astype(np.float32))
            c_list.append(ci)
            i = i + 1
        h = tuple(h_list)
        c = tuple(c_list)
        return h, c

    # Otherwise a single tensor covers all layers and directions.
    h = Tensor(
        np.zeros((num_layers * num_directions, batch_size, hidden_size)).astype(np.float32))
    c = Tensor(
        np.zeros((num_layers * num_directions, batch_size, hidden_size)).astype(np.float32))
    return h, c
```

3. Subclass `nn.Cell` to define the network structure (the `SentimentNet` network).

```python
class SentimentNet(nn.Cell):
    """Sentiment network structure."""

    def __init__(self,
                 vocab_size,
                 embed_size,
                 num_hiddens,
                 num_layers,
                 bidirectional,
                 num_classes,
                 weight,
                 batch_size):
        super(SentimentNet, self).__init__()
        # Map words to vectors; the pretrained GloVe table is kept frozen
        self.embedding = nn.Embedding(vocab_size,
                                      embed_size,
                                      embedding_table=weight)
        self.embedding.embedding_table.requires_grad = False
        self.trans = P.Transpose()
        self.perm = (1, 0, 2)
        self.encoder = nn.LSTM(input_size=embed_size,
                               hidden_size=num_hiddens,
                               num_layers=num_layers,
                               has_bias=True,
                               bidirectional=bidirectional,
                               dropout=0.0)

        self.h, self.c = lstm_default_state(batch_size, num_hiddens, num_layers, bidirectional)

        self.concat = P.Concat(1)
        if bidirectional:
            self.decoder = nn.Dense(num_hiddens * 4, num_classes)
        else:
            self.decoder = nn.Dense(num_hiddens * 2, num_classes)

    def construct(self, inputs):
        # inputs: (64, 500) word indices -> embeddings: (64, 500, 300)
        embeddings = self.embedding(inputs)
        # to time-major layout: (500, 64, 300)
        embeddings = self.trans(embeddings, self.perm)
        output, _ = self.encoder(embeddings, (self.h, self.c))
        # output[t]: (64, 200) -> encoding: (64, 400)
        # first and last time steps; 499 assumes sequences padded to length 500
        encoding = self.concat((output[0], output[499]))
        outputs = self.decoder(encoding)
        return outputs
```
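
The tensor shapes flowing through `SentimentNet` explain why the decoder takes `num_hiddens * 4` inputs in the bidirectional case. The following sketch only does shape arithmetic in plain Python. `sentiment_net_shapes` is a hypothetical helper, and the time-major LSTM output layout is assumed from the shape comments in the network code.

```python
def sentiment_net_shapes(batch_size, seq_len, embed_size, num_hiddens,
                         bidirectional, num_classes):
    """Return the shape of each intermediate tensor in the forward pass."""
    num_directions = 2 if bidirectional else 1
    return {
        "inputs": (batch_size, seq_len),
        "embeddings": (batch_size, seq_len, embed_size),
        "transposed": (seq_len, batch_size, embed_size),
        # each time step holds the hidden state of every direction
        "lstm_output": (seq_len, batch_size, num_hiddens * num_directions),
        # first and last time steps concatenated along the feature axis
        "encoding": (batch_size, 2 * num_hiddens * num_directions),
        "logits": (batch_size, num_classes),
    }


shapes = sentiment_net_shapes(batch_size=64, seq_len=500, embed_size=300,
                              num_hiddens=100, bidirectional=True, num_classes=2)
for name, shape in shapes.items():
    print(f"{name:12s} {shape}")
```

With the tutorial's configuration the encoding has 400 features, which equals `num_hiddens * 4`. For a unidirectional LSTM it would be 200, matching the `num_hiddens * 2` branch.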

4. Instantiate `SentimentNet` to create the network. This step takes about 1 minute.

```python
# Load the embedding table saved during preprocessing; its row count is the vocabulary size.
embedding_table = np.loadtxt(os.path.join(args.preprocess_path, "weight.txt")).astype(np.float32)
network = SentimentNet(vocab_size=embedding_table.shape[0],
                       embed_size=cfg.embed_size,
                       num_hiddens=cfg.num_hiddens,
                       num_layers=cfg.num_layers,
                       bidirectional=cfg.bidirectional,
                       num_classes=cfg.num_classes,
                       weight=Tensor(embedding_table),
                       batch_size=cfg.batch_size)
```

## Defining the Optimizer and Loss Function

Run the following code to create the optimizer and the loss function.

```python
from mindspore import nn


loss = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True)
opt = nn.Momentum(network.trainable_params(), cfg.learning_rate, cfg.momentum)
```
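
To make the two components more concrete, the sketch below computes a softmax cross-entropy for a single sample with an integer class label and applies one classical momentum update to a scalar parameter. It is plain Python for illustration, with hypothetical function names. It is not MindSpore's implementation, and the update rule shown is the standard momentum formulation assumed here.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample given raw logits and an integer class label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def momentum_step(param, grad, velocity, lr, momentum):
    """One update: v <- momentum * v + grad, then p <- p - lr * v."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# An untrained two-class model that outputs equal logits has loss ln(2).
initial_loss = sparse_softmax_cross_entropy([0.0, 0.0], 1)
print(round(initial_loss, 4))

# Two updates with the tutorial's learning rate and momentum, and a constant gradient.
p, v = momentum_step(param=1.0, grad=0.5, velocity=0.0, lr=0.1, momentum=0.9)
p, v = momentum_step(param=p, grad=0.5, velocity=v, lr=0.1, momentum=0.9)
print(round(p, 4), round(v, 4))
```

The value ln 2 ≈ 0.693 is a useful sanity check: the first loss values in the training log of this tutorial are close to it, as expected for a binary classifier that has not yet learned anything.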

## Training and Saving the Model

Load the training dataset (`ds_train`), configure the `CheckPoint` generation settings, and then train the model through the `model.train` interface. This step takes about 7 minutes. The output shows that the loss decreases gradually during training and finally reaches about 0.262.

```python
from mindspore import Model
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor, LossMonitor
from mindspore.nn import Accuracy


model = Model(network, loss, opt, {'acc': Accuracy()})
loss_cb = LossMonitor()
print("============== Starting Training ==============")
# Save a checkpoint every `save_checkpoint_steps` steps and keep at most `keep_checkpoint_max` files.
config_ck = CheckpointConfig(save_checkpoint_steps=cfg.save_checkpoint_steps,
                             keep_checkpoint_max=cfg.keep_checkpoint_max)
ckpoint_cb = ModelCheckpoint(prefix="lstm", directory=args.ckpt_path, config=config_ck)
time_cb = TimeMonitor(data_size=ds_train.get_dataset_size())
# Dataset sink mode is disabled when running on CPU.
if args.device_target == "CPU":
    model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb], dataset_sink_mode=False)
else:
    model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb])
print("============== Training Success ==============")
```

Epoch: [  1/ 10], step: [    1/  390], loss: [0.6938], avg loss: [0.6938], time: [445.6811ms]
Epoch: [  1/ 10], step: [    2/  390], loss: [0.6922], avg loss: [0.6930], time: [106.1635ms]
Epoch: [  1/ 10], step: [    3/  390], loss: [0.6917], avg loss: [0.6926], time: [103.0388ms]
Epoch: [  1/ 10], step: [    4/  390], loss: [0.6952], avg loss: [0.6932], time: [102.2997ms]
Epoch: [  1/ 10], step: [    5/  390], loss: [0.6868], avg loss: [0.6920], time: [102.2105ms]
Epoch: [  1/ 10], step: [    6/  390], loss: [0.6982], avg loss: [0.6930], time: [67.6618ms]
Epoch: [  1/ 10], step: [    7/  390], loss: [0.6856], avg loss: [0.6919], time: [99.7233ms]
Epoch: [  1/ 10], step: [    8/  390], loss: [0.6819], avg loss: [0.6907], time: [102.4535ms]
Epoch: [  1/ 10], step: [    9/  390], loss: [0.7372], avg loss: [0.6959], time: [99.7229ms]
Epoch: [  1/ 10], step: [   10/  390], loss: [0.6948], avg loss: [0.6957], time: [101.9838ms]
Epoch: [  1/ 10], step: [   11/  390], loss: [0.6961], avg loss

Epoch: [  1/ 10], step: [   90/  390], loss: [0.6860], avg loss: [0.6949], time: [104.0313ms]
Epoch: [  1/ 10], step: [   91/  390], loss: [0.6900], avg loss: [0.6949], time: [98.9680ms]
Epoch: [  1/ 10], step: [   92/  390], loss: [0.6846], avg loss: [0.6947], time: [100.7631ms]
Epoch: [  1/ 10], step: [   93/  390], loss: [0.6833], avg loss: [0.6946], time: [99.0198ms]
Epoch: [  1/ 10], step: [   94/  390], loss: [0.6901], avg loss: [0.6946], time: [99.3226ms]
Epoch: [  1/ 10], step: [   95/  390], loss: [0.6831], avg loss: [0.6945], time: [97.3852ms]
Epoch: [  1/ 10], step: [   96/  390], loss: [0.7010], avg loss: [0.6945], time: [102.8271ms]
Epoch: [  1/ 10], step: [   97/  390], loss: [0.6925], avg loss: [0.6945], time: [96.1418ms]
Epoch: [  1/ 10], step: [   98/  390], loss: [0.6768], avg loss: [0.6943], time: [98.8572ms]
Epoch: [  1/ 10], step: [   99/  390], loss: [0.6848], avg loss: [0.6942], time: [96.3254ms]
Epoch: [  1/ 10], step: [  100/  390], loss: [0.6925], avg loss: [0

Epoch: [  1/ 10], step: [  179/  390], loss: [0.6838], avg loss: [0.6860], time: [97.2741ms]
Epoch: [  1/ 10], step: [  180/  390], loss: [0.7194], avg loss: [0.6862], time: [104.8265ms]
Epoch: [  1/ 10], step: [  181/  390], loss: [0.5811], avg loss: [0.6856], time: [96.4580ms]
Epoch: [  1/ 10], step: [  182/  390], loss: [0.7140], avg loss: [0.6858], time: [99.6931ms]
Epoch: [  1/ 10], step: [  183/  390], loss: [0.7558], avg loss: [0.6862], time: [100.7893ms]
Epoch: [  1/ 10], step: [  184/  390], loss: [0.6419], avg loss: [0.6859], time: [99.4534ms]
Epoch: [  1/ 10], step: [  185/  390], loss: [0.5970], avg loss: [0.6855], time: [98.1152ms]
Epoch: [  1/ 10], step: [  186/  390], loss: [0.7137], avg loss: [0.6856], time: [99.8573ms]
Epoch: [  1/ 10], step: [  187/  390], loss: [0.6258], avg loss: [0.6853], time: [99.4055ms]
Epoch: [  1/ 10], step: [  188/  390], loss: [0.6423], avg loss: [0.6851], time: [100.4550ms]
Epoch: [  1/ 10], step: [  189/  390], loss: [0.6785], avg loss: [0

Epoch: [  1/ 10], step: [  268/  390], loss: [0.6293], avg loss: [0.6774], time: [99.0884ms]
Epoch: [  1/ 10], step: [  269/  390], loss: [0.6679], avg loss: [0.6774], time: [98.5043ms]
Epoch: [  1/ 10], step: [  270/  390], loss: [0.6610], avg loss: [0.6773], time: [104.2540ms]
Epoch: [  1/ 10], step: [  271/  390], loss: [0.6144], avg loss: [0.6771], time: [98.5131ms]
Epoch: [  1/ 10], step: [  272/  390], loss: [0.6461], avg loss: [0.6770], time: [98.3980ms]
Epoch: [  1/ 10], step: [  273/  390], loss: [0.6446], avg loss: [0.6769], time: [97.9280ms]
Epoch: [  1/ 10], step: [  274/  390], loss: [0.7186], avg loss: [0.6770], time: [100.8170ms]
Epoch: [  1/ 10], step: [  275/  390], loss: [0.7003], avg loss: [0.6771], time: [100.1592ms]
Epoch: [  1/ 10], step: [  276/  390], loss: [0.6935], avg loss: [0.6772], time: [101.7003ms]
Epoch: [  1/ 10], step: [  277/  390], loss: [0.7605], avg loss: [0.6775], time: [102.7503ms]
Epoch: [  1/ 10], step: [  278/  390], loss: [0.6664], avg loss: 

Epoch: [  1/ 10], step: [  357/  390], loss: [0.6403], avg loss: [0.6711], time: [98.1786ms]
Epoch: [  1/ 10], step: [  358/  390], loss: [0.6679], avg loss: [0.6711], time: [103.6110ms]
Epoch: [  1/ 10], step: [  359/  390], loss: [0.6559], avg loss: [0.6711], time: [97.9443ms]
Epoch: [  1/ 10], step: [  360/  390], loss: [0.6298], avg loss: [0.6709], time: [103.2884ms]
Epoch: [  1/ 10], step: [  361/  390], loss: [0.6193], avg loss: [0.6708], time: [98.4299ms]
Epoch: [  1/ 10], step: [  362/  390], loss: [0.6649], avg loss: [0.6708], time: [102.1912ms]
Epoch: [  1/ 10], step: [  363/  390], loss: [0.6179], avg loss: [0.6706], time: [102.8066ms]
Epoch: [  1/ 10], step: [  364/  390], loss: [0.6771], avg loss: [0.6707], time: [102.7441ms]
Epoch: [  1/ 10], step: [  365/  390], loss: [0.6193], avg loss: [0.6705], time: [97.2888ms]
Epoch: [  1/ 10], step: [  366/  390], loss: [0.5615], avg loss: [0.6702], time: [100.9829ms]
Epoch: [  1/ 10], step: [  367/  390], loss: [0.6999], avg loss:

Epoch: [  2/ 10], step: [   54/  390], loss: [0.5748], avg loss: [0.6417], time: [99.1259ms]
Epoch: [  2/ 10], step: [   55/  390], loss: [0.5293], avg loss: [0.6397], time: [104.4934ms]
Epoch: [  2/ 10], step: [   56/  390], loss: [0.5660], avg loss: [0.6384], time: [100.8847ms]
Epoch: [  2/ 10], step: [   57/  390], loss: [0.5283], avg loss: [0.6364], time: [108.6614ms]
Epoch: [  2/ 10], step: [   58/  390], loss: [0.5347], avg loss: [0.6347], time: [101.3596ms]
Epoch: [  2/ 10], step: [   59/  390], loss: [0.5154], avg loss: [0.6327], time: [100.2188ms]
Epoch: [  2/ 10], step: [   60/  390], loss: [0.6732], avg loss: [0.6333], time: [99.3276ms]
Epoch: [  2/ 10], step: [   61/  390], loss: [0.5197], avg loss: [0.6315], time: [104.8396ms]
Epoch: [  2/ 10], step: [   62/  390], loss: [0.7254], avg loss: [0.6330], time: [100.4605ms]
Epoch: [  2/ 10], step: [   63/  390], loss: [0.9070], avg loss: [0.6373], time: [108.4003ms]
Epoch: [  2/ 10], step: [   64/  390], loss: [0.5558], avg los

Epoch: [  2/ 10], step: [  143/  390], loss: [0.5716], avg loss: [0.6500], time: [102.3705ms]
Epoch: [  2/ 10], step: [  144/  390], loss: [0.6271], avg loss: [0.6498], time: [98.9797ms]
Epoch: [  2/ 10], step: [  145/  390], loss: [0.5050], avg loss: [0.6488], time: [101.5060ms]
Epoch: [  2/ 10], step: [  146/  390], loss: [0.5590], avg loss: [0.6482], time: [98.9730ms]
Epoch: [  2/ 10], step: [  147/  390], loss: [0.6321], avg loss: [0.6481], time: [102.5190ms]
Epoch: [  2/ 10], step: [  148/  390], loss: [0.6130], avg loss: [0.6479], time: [100.9171ms]
Epoch: [  2/ 10], step: [  149/  390], loss: [0.5702], avg loss: [0.6473], time: [105.2413ms]
Epoch: [  2/ 10], step: [  150/  390], loss: [0.5732], avg loss: [0.6468], time: [99.3552ms]
Epoch: [  2/ 10], step: [  151/  390], loss: [0.5903], avg loss: [0.6465], time: [100.8067ms]
Epoch: [  2/ 10], step: [  152/  390], loss: [0.5511], avg loss: [0.6458], time: [100.9417ms]
Epoch: [  2/ 10], step: [  153/  390], loss: [0.6821], avg loss

Epoch: [  2/ 10], step: [  232/  390], loss: [0.5745], avg loss: [0.6300], time: [99.3531ms]
Epoch: [  2/ 10], step: [  233/  390], loss: [0.5614], avg loss: [0.6297], time: [107.9669ms]
Epoch: [  2/ 10], step: [  234/  390], loss: [0.5357], avg loss: [0.6293], time: [101.3925ms]
Epoch: [  2/ 10], step: [  235/  390], loss: [0.5186], avg loss: [0.6288], time: [102.6874ms]
Epoch: [  2/ 10], step: [  236/  390], loss: [0.6700], avg loss: [0.6290], time: [103.9753ms]
Epoch: [  2/ 10], step: [  237/  390], loss: [0.5584], avg loss: [0.6287], time: [101.3067ms]
Epoch: [  2/ 10], step: [  238/  390], loss: [0.5589], avg loss: [0.6284], time: [103.8983ms]
Epoch: [  2/ 10], step: [  239/  390], loss: [0.5363], avg loss: [0.6280], time: [105.3576ms]
Epoch: [  2/ 10], step: [  240/  390], loss: [0.5776], avg loss: [0.6278], time: [101.3453ms]
Epoch: [  2/ 10], step: [  241/  390], loss: [0.7283], avg loss: [0.6282], time: [104.6221ms]
Epoch: [  2/ 10], step: [  242/  390], loss: [0.5002], avg lo

Epoch: [  2/ 10], step: [  321/  390], loss: [0.3705], avg loss: [0.5948], time: [105.0887ms]
Epoch: [  2/ 10], step: [  322/  390], loss: [0.4149], avg loss: [0.5942], time: [101.4295ms]
Epoch: [  2/ 10], step: [  323/  390], loss: [0.4527], avg loss: [0.5938], time: [102.3750ms]
Epoch: [  2/ 10], step: [  324/  390], loss: [0.3693], avg loss: [0.5931], time: [100.5075ms]
Epoch: [  2/ 10], step: [  325/  390], loss: [0.4761], avg loss: [0.5927], time: [100.6401ms]
Epoch: [  2/ 10], step: [  326/  390], loss: [0.3317], avg loss: [0.5919], time: [102.6881ms]
Epoch: [  2/ 10], step: [  327/  390], loss: [0.5316], avg loss: [0.5917], time: [103.8661ms]
Epoch: [  2/ 10], step: [  328/  390], loss: [0.4163], avg loss: [0.5912], time: [99.2351ms]
Epoch: [  2/ 10], step: [  329/  390], loss: [0.3904], avg loss: [0.5906], time: [100.8196ms]
Epoch: [  2/ 10], step: [  330/  390], loss: [0.6191], avg loss: [0.5907], time: [99.3040ms]
Epoch: [  2/ 10], step: [  331/  390], loss: [0.3622], avg los

Epoch: [  3/ 10], step: [   18/  390], loss: [0.3993], avg loss: [0.4163], time: [104.6028ms]
Epoch: [  3/ 10], step: [   19/  390], loss: [0.4321], avg loss: [0.4171], time: [106.9975ms]
Epoch: [  3/ 10], step: [   20/  390], loss: [0.3459], avg loss: [0.4135], time: [101.6135ms]
Epoch: [  3/ 10], step: [   21/  390], loss: [0.3473], avg loss: [0.4104], time: [106.1561ms]
Epoch: [  3/ 10], step: [   22/  390], loss: [0.4423], avg loss: [0.4118], time: [102.2394ms]
Epoch: [  3/ 10], step: [   23/  390], loss: [0.5265], avg loss: [0.4168], time: [106.6220ms]
Epoch: [  3/ 10], step: [   24/  390], loss: [0.4170], avg loss: [0.4168], time: [105.6414ms]
Epoch: [  3/ 10], step: [   25/  390], loss: [0.4483], avg loss: [0.4181], time: [108.6771ms]
Epoch: [  3/ 10], step: [   26/  390], loss: [0.5304], avg loss: [0.4224], time: [107.1980ms]
Epoch: [  3/ 10], step: [   27/  390], loss: [0.4433], avg loss: [0.4232], time: [105.8927ms]
Epoch: [  3/ 10], step: [   28/  390], loss: [0.4486], avg l

Epoch: [  3/ 10], step: [  107/  390], loss: [0.4050], avg loss: [0.3944], time: [105.6411ms]
Epoch: [  3/ 10], step: [  108/  390], loss: [0.4224], avg loss: [0.3946], time: [105.2477ms]
Epoch: [  3/ 10], step: [  109/  390], loss: [0.3945], avg loss: [0.3946], time: [104.9845ms]
Epoch: [  3/ 10], step: [  110/  390], loss: [0.3166], avg loss: [0.3939], time: [102.7188ms]
Epoch: [  3/ 10], step: [  111/  390], loss: [0.4504], avg loss: [0.3944], time: [106.3836ms]
Epoch: [  3/ 10], step: [  112/  390], loss: [0.4167], avg loss: [0.3946], time: [105.3114ms]
Epoch: [  3/ 10], step: [  113/  390], loss: [0.4151], avg loss: [0.3948], time: [104.2454ms]
Epoch: [  3/ 10], step: [  114/  390], loss: [0.4592], avg loss: [0.3954], time: [101.6955ms]
Epoch: [  3/ 10], step: [  115/  390], loss: [0.4591], avg loss: [0.3959], time: [108.1009ms]
Epoch: [  3/ 10], step: [  116/  390], loss: [0.4377], avg loss: [0.3963], time: [102.1514ms]
Epoch: [  3/ 10], step: [  117/  390], loss: [0.3935], avg l

Epoch: [  3/ 10], step: [  196/  390], loss: [0.3696], avg loss: [0.4000], time: [104.5880ms]
Epoch: [  3/ 10], step: [  197/  390], loss: [0.3521], avg loss: [0.3997], time: [102.7956ms]
Epoch: [  3/ 10], step: [  198/  390], loss: [0.3601], avg loss: [0.3995], time: [104.1501ms]
Epoch: [  3/ 10], step: [  199/  390], loss: [0.4757], avg loss: [0.3999], time: [102.9751ms]
Epoch: [  3/ 10], step: [  200/  390], loss: [0.4163], avg loss: [0.4000], time: [103.6022ms]
Epoch: [  3/ 10], step: [  201/  390], loss: [0.3398], avg loss: [0.3997], time: [104.8150ms]
Epoch: [  3/ 10], step: [  202/  390], loss: [0.4203], avg loss: [0.3998], time: [103.7111ms]
Epoch: [  3/ 10], step: [  203/  390], loss: [0.3198], avg loss: [0.3994], time: [102.7951ms]
Epoch: [  3/ 10], step: [  204/  390], loss: [0.3190], avg loss: [0.3990], time: [103.6525ms]
Epoch: [  3/ 10], step: [  205/  390], loss: [0.3116], avg loss: [0.3986], time: [103.7445ms]
Epoch: [  3/ 10], step: [  206/  390], loss: [0.3934], ...

Epoch: [  3/ 10], step: [  285/  390], loss: [0.4163], avg loss: [0.3982], time: [105.0472ms]
Epoch: [  3/ 10], step: [  286/  390], loss: [0.4400], avg loss: [0.3983], time: [101.8260ms]
Epoch: [  3/ 10], step: [  287/  390], loss: [0.5866], avg loss: [0.3990], time: [107.6553ms]
Epoch: [  3/ 10], step: [  288/  390], loss: [0.5641], avg loss: [0.3996], time: [104.2180ms]
Epoch: [  3/ 10], step: [  289/  390], loss: [0.4612], avg loss: [0.3998], time: [105.5670ms]
Epoch: [  3/ 10], step: [  290/  390], loss: [0.2980], avg loss: [0.3994], time: [101.8584ms]
Epoch: [  3/ 10], step: [  291/  390], loss: [0.4731], avg loss: [0.3997], time: [107.9223ms]
Epoch: [  3/ 10], step: [  292/  390], loss: [0.3319], avg loss: [0.3994], time: [102.9286ms]
Epoch: [  3/ 10], step: [  293/  390], loss: [0.2109], avg loss: [0.3988], time: [102.7219ms]
Epoch: [  3/ 10], step: [  294/  390], loss: [0.3556], avg loss: [0.3987], time: [106.2779ms]
Epoch: [  3/ 10], step: [  295/  390], loss: [0.5077], ...

Epoch: [  3/ 10], step: [  374/  390], loss: [0.3921], avg loss: [0.4026], time: [107.5480ms]
Epoch: [  3/ 10], step: [  375/  390], loss: [0.4149], avg loss: [0.4026], time: [106.8847ms]
Epoch: [  3/ 10], step: [  376/  390], loss: [0.4907], avg loss: [0.4028], time: [104.5911ms]
Epoch: [  3/ 10], step: [  377/  390], loss: [0.3688], avg loss: [0.4027], time: [109.8232ms]
Epoch: [  3/ 10], step: [  378/  390], loss: [0.3472], avg loss: [0.4026], time: [101.5713ms]
Epoch: [  3/ 10], step: [  379/  390], loss: [0.4601], avg loss: [0.4028], time: [109.2470ms]
Epoch: [  3/ 10], step: [  380/  390], loss: [0.3989], avg loss: [0.4027], time: [105.3076ms]
Epoch: [  3/ 10], step: [  381/  390], loss: [0.4383], avg loss: [0.4028], time: [104.4583ms]
Epoch: [  3/ 10], step: [  382/  390], loss: [0.4026], avg loss: [0.4028], time: [105.3464ms]
Epoch: [  3/ 10], step: [  383/  390], loss: [0.4012], avg loss: [0.4028], time: [101.9688ms]
Epoch: [  3/ 10], step: [  384/  390], loss: [0.3780], ...

Epoch: [  4/ 10], step: [   71/  390], loss: [0.3085], avg loss: [0.3732], time: [103.1454ms]
Epoch: [  4/ 10], step: [   72/  390], loss: [0.2767], avg loss: [0.3719], time: [102.9990ms]
Epoch: [  4/ 10], step: [   73/  390], loss: [0.3353], avg loss: [0.3714], time: [107.5771ms]
Epoch: [  4/ 10], step: [   74/  390], loss: [0.4800], avg loss: [0.3729], time: [104.0356ms]
Epoch: [  4/ 10], step: [   75/  390], loss: [0.2814], avg loss: [0.3716], time: [104.0728ms]
Epoch: [  4/ 10], step: [   76/  390], loss: [0.4233], avg loss: [0.3723], time: [104.9471ms]
Epoch: [  4/ 10], step: [   77/  390], loss: [0.2641], avg loss: [0.3709], time: [103.7886ms]
Epoch: [  4/ 10], step: [   78/  390], loss: [0.3865], avg loss: [0.3711], time: [107.5280ms]
Epoch: [  4/ 10], step: [   79/  390], loss: [0.2459], avg loss: [0.3695], time: [106.9174ms]
Epoch: [  4/ 10], step: [   80/  390], loss: [0.4205], avg loss: [0.3702], time: [104.5945ms]
Epoch: [  4/ 10], step: [   81/  390], loss: [0.4781], ...

Epoch: [  4/ 10], step: [  160/  390], loss: [0.4387], avg loss: [0.3725], time: [105.3257ms]
Epoch: [  4/ 10], step: [  161/  390], loss: [0.3441], avg loss: [0.3724], time: [105.7281ms]
Epoch: [  4/ 10], step: [  162/  390], loss: [0.3684], avg loss: [0.3723], time: [105.7646ms]
Epoch: [  4/ 10], step: [  163/  390], loss: [0.3465], avg loss: [0.3722], time: [106.7050ms]
Epoch: [  4/ 10], step: [  164/  390], loss: [0.5299], avg loss: [0.3731], time: [105.3362ms]
Epoch: [  4/ 10], step: [  165/  390], loss: [0.5045], avg loss: [0.3739], time: [106.9767ms]
Epoch: [  4/ 10], step: [  166/  390], loss: [0.3958], avg loss: [0.3741], time: [106.9121ms]
Epoch: [  4/ 10], step: [  167/  390], loss: [0.3517], avg loss: [0.3739], time: [107.1458ms]
Epoch: [  4/ 10], step: [  168/  390], loss: [0.4668], avg loss: [0.3745], time: [107.9512ms]
Epoch: [  4/ 10], step: [  169/  390], loss: [0.2722], avg loss: [0.3739], time: [102.9236ms]
Epoch: [  4/ 10], step: [  170/  390], loss: [0.4252], ...

Epoch: [  4/ 10], step: [  249/  390], loss: [0.2710], avg loss: [0.3683], time: [106.2400ms]
Epoch: [  4/ 10], step: [  250/  390], loss: [0.3260], avg loss: [0.3682], time: [103.6398ms]
Epoch: [  4/ 10], step: [  251/  390], loss: [0.3744], avg loss: [0.3682], time: [108.1443ms]
Epoch: [  4/ 10], step: [  252/  390], loss: [0.2942], avg loss: [0.3679], time: [103.0304ms]
Epoch: [  4/ 10], step: [  253/  390], loss: [0.4133], avg loss: [0.3681], time: [103.5023ms]
Epoch: [  4/ 10], step: [  254/  390], loss: [0.2983], avg loss: [0.3678], time: [109.1344ms]
Epoch: [  4/ 10], step: [  255/  390], loss: [0.4217], avg loss: [0.3680], time: [103.4021ms]
Epoch: [  4/ 10], step: [  256/  390], loss: [0.3493], avg loss: [0.3679], time: [105.2632ms]
Epoch: [  4/ 10], step: [  257/  390], loss: [0.2805], avg loss: [0.3676], time: [110.6668ms]
Epoch: [  4/ 10], step: [  258/  390], loss: [0.3151], avg loss: [0.3674], time: [108.0148ms]
Epoch: [  4/ 10], step: [  259/  390], loss: [0.3350], ...

Epoch: [  4/ 10], step: [  338/  390], loss: [0.4460], avg loss: [0.3684], time: [105.4652ms]
Epoch: [  4/ 10], step: [  339/  390], loss: [0.3561], avg loss: [0.3683], time: [105.1250ms]
Epoch: [  4/ 10], step: [  340/  390], loss: [0.5193], avg loss: [0.3688], time: [104.8603ms]
Epoch: [  4/ 10], step: [  341/  390], loss: [0.4446], avg loss: [0.3690], time: [105.6721ms]
Epoch: [  4/ 10], step: [  342/  390], loss: [0.3434], avg loss: [0.3689], time: [105.7959ms]
Epoch: [  4/ 10], step: [  343/  390], loss: [0.3595], avg loss: [0.3689], time: [105.6604ms]
Epoch: [  4/ 10], step: [  344/  390], loss: [0.4241], avg loss: [0.3691], time: [107.3353ms]
Epoch: [  4/ 10], step: [  345/  390], loss: [0.2956], avg loss: [0.3689], time: [110.8987ms]
Epoch: [  4/ 10], step: [  346/  390], loss: [0.3377], avg loss: [0.3688], time: [107.3465ms]
Epoch: [  4/ 10], step: [  347/  390], loss: [0.3574], avg loss: [0.3687], time: [109.6387ms]
Epoch: [  4/ 10], step: [  348/  390], loss: [0.4708], ...

Epoch: [  5/ 10], step: [   35/  390], loss: [0.3577], avg loss: [0.3242], time: [100.6477ms]
Epoch: [  5/ 10], step: [   36/  390], loss: [0.4371], avg loss: [0.3273], time: [100.9886ms]
Epoch: [  5/ 10], step: [   37/  390], loss: [0.4086], avg loss: [0.3295], time: [100.7073ms]
Epoch: [  5/ 10], step: [   38/  390], loss: [0.1705], avg loss: [0.3253], time: [101.3937ms]
Epoch: [  5/ 10], step: [   39/  390], loss: [0.3365], avg loss: [0.3256], time: [97.3103ms]
Epoch: [  5/ 10], step: [   40/  390], loss: [0.3910], avg loss: [0.3273], time: [100.9321ms]
Epoch: [  5/ 10], step: [   41/  390], loss: [0.3509], avg loss: [0.3278], time: [97.9929ms]
Epoch: [  5/ 10], step: [   42/  390], loss: [0.4014], avg loss: [0.3296], time: [98.8083ms]
Epoch: [  5/ 10], step: [   43/  390], loss: [0.2674], avg loss: [0.3281], time: [103.3001ms]
Epoch: [  5/ 10], step: [   44/  390], loss: [0.3730], avg loss: [0.3292], time: [99.5758ms]
Epoch: [  5/ 10], step: [   45/  390], loss: [0.2710], ...

Epoch: [  5/ 10], step: [  124/  390], loss: [0.3165], avg loss: [0.3345], time: [98.5579ms]
Epoch: [  5/ 10], step: [  125/  390], loss: [0.2910], avg loss: [0.3341], time: [104.8245ms]
Epoch: [  5/ 10], step: [  126/  390], loss: [0.4151], avg loss: [0.3348], time: [100.1546ms]
Epoch: [  5/ 10], step: [  127/  390], loss: [0.3650], avg loss: [0.3350], time: [98.5594ms]
Epoch: [  5/ 10], step: [  128/  390], loss: [0.4466], avg loss: [0.3359], time: [98.1710ms]
Epoch: [  5/ 10], step: [  129/  390], loss: [0.3491], avg loss: [0.3360], time: [102.2282ms]
Epoch: [  5/ 10], step: [  130/  390], loss: [0.3943], avg loss: [0.3364], time: [102.3917ms]
Epoch: [  5/ 10], step: [  131/  390], loss: [0.3831], avg loss: [0.3368], time: [102.0710ms]
Epoch: [  5/ 10], step: [  132/  390], loss: [0.3353], avg loss: [0.3368], time: [99.2439ms]
Epoch: [  5/ 10], step: [  133/  390], loss: [0.3608], avg loss: [0.3370], time: [99.8654ms]
Epoch: [  5/ 10], step: [  134/  390], loss: [0.3089], ...

Epoch: [  5/ 10], step: [  213/  390], loss: [0.4016], avg loss: [0.3429], time: [103.5557ms]
Epoch: [  5/ 10], step: [  214/  390], loss: [0.2758], avg loss: [0.3426], time: [99.9570ms]
Epoch: [  5/ 10], step: [  215/  390], loss: [0.4611], avg loss: [0.3432], time: [102.7234ms]
Epoch: [  5/ 10], step: [  216/  390], loss: [0.3102], avg loss: [0.3430], time: [101.8171ms]
Epoch: [  5/ 10], step: [  217/  390], loss: [0.3919], avg loss: [0.3432], time: [104.2428ms]
Epoch: [  5/ 10], step: [  218/  390], loss: [0.3644], avg loss: [0.3433], time: [102.9439ms]
Epoch: [  5/ 10], step: [  219/  390], loss: [0.3343], avg loss: [0.3433], time: [101.6750ms]
Epoch: [  5/ 10], step: [  220/  390], loss: [0.3409], avg loss: [0.3433], time: [100.8224ms]
Epoch: [  5/ 10], step: [  221/  390], loss: [0.3408], avg loss: [0.3433], time: [100.2448ms]
Epoch: [  5/ 10], step: [  222/  390], loss: [0.3310], avg loss: [0.3432], time: [101.1682ms]
Epoch: [  5/ 10], step: [  223/  390], loss: [0.3425], ...

Epoch: [  5/ 10], step: [  302/  390], loss: [0.3439], avg loss: [0.3440], time: [102.3622ms]
Epoch: [  5/ 10], step: [  303/  390], loss: [0.4070], avg loss: [0.3443], time: [104.6326ms]
Epoch: [  5/ 10], step: [  304/  390], loss: [0.4360], avg loss: [0.3446], time: [100.9424ms]
Epoch: [  5/ 10], step: [  305/  390], loss: [0.4695], avg loss: [0.3450], time: [98.6810ms]
Epoch: [  5/ 10], step: [  306/  390], loss: [0.2571], avg loss: [0.3447], time: [101.9230ms]
Epoch: [  5/ 10], step: [  307/  390], loss: [0.2597], avg loss: [0.3444], time: [98.5708ms]
Epoch: [  5/ 10], step: [  308/  390], loss: [0.3709], avg loss: [0.3445], time: [98.8483ms]
Epoch: [  5/ 10], step: [  309/  390], loss: [0.2729], avg loss: [0.3443], time: [100.6372ms]
Epoch: [  5/ 10], step: [  310/  390], loss: [0.3060], avg loss: [0.3441], time: [100.9982ms]
Epoch: [  5/ 10], step: [  311/  390], loss: [0.2724], avg loss: [0.3439], time: [102.2642ms]
Epoch: [  5/ 10], step: [  312/  390], loss: [0.4042], ...

Epoch time: 40546.816, per step time: 103.966
Epoch time: 40547.118, per step time: 103.967, avg loss: 0.346
************************************************************
Epoch: [  6/ 10], step: [    1/  390], loss: [0.3137], avg loss: [0.3137], time: [102.8788ms]
Epoch: [  6/ 10], step: [    2/  390], loss: [0.3295], avg loss: [0.3216], time: [107.4462ms]
Epoch: [  6/ 10], step: [    3/  390], loss: [0.4285], avg loss: [0.3572], time: [107.7762ms]
Epoch: [  6/ 10], step: [    4/  390], loss: [0.2917], avg loss: [0.3409], time: [104.9762ms]
Epoch: [  6/ 10], step: [    5/  390], loss: [0.3357], avg loss: [0.3398], time: [104.1481ms]
Epoch: [  6/ 10], step: [    6/  390], loss: [0.3456], avg loss: [0.3408], time: [105.6588ms]
Epoch: [  6/ 10], step: [    7/  390], loss: [0.4375], avg loss: [0.3546], time: [105.3269ms]
Epoch: [  6/ 10], step: [    8/  390], loss: [0.3685], avg loss: [0.3563], time: [100.5785ms]
Epoch: [  6/ 10], step: [    9/  390], loss: [0.2734], avg loss: [0.3471], ...

Epoch: [  6/ 10], step: [   88/  390], loss: [0.3815], avg loss: [0.3327], time: [99.7779ms]
Epoch: [  6/ 10], step: [   89/  390], loss: [0.3205], avg loss: [0.3326], time: [102.2894ms]
Epoch: [  6/ 10], step: [   90/  390], loss: [0.1674], avg loss: [0.3308], time: [107.3177ms]
Epoch: [  6/ 10], step: [   91/  390], loss: [0.3302], avg loss: [0.3308], time: [104.4667ms]
Epoch: [  6/ 10], step: [   92/  390], loss: [0.3680], avg loss: [0.3312], time: [105.5598ms]
Epoch: [  6/ 10], step: [   93/  390], loss: [0.3370], avg loss: [0.3312], time: [103.6875ms]
Epoch: [  6/ 10], step: [   94/  390], loss: [0.3272], avg loss: [0.3312], time: [105.0935ms]
Epoch: [  6/ 10], step: [   95/  390], loss: [0.3728], avg loss: [0.3316], time: [108.2509ms]
Epoch: [  6/ 10], step: [   96/  390], loss: [0.2415], avg loss: [0.3307], time: [104.2969ms]
Epoch: [  6/ 10], step: [   97/  390], loss: [0.3413], avg loss: [0.3308], time: [106.3817ms]
Epoch: [  6/ 10], step: [   98/  390], loss: [0.2772], ...

Epoch: [  6/ 10], step: [  177/  390], loss: [0.3473], avg loss: [0.3276], time: [103.0400ms]
Epoch: [  6/ 10], step: [  178/  390], loss: [0.4617], avg loss: [0.3284], time: [102.4530ms]
Epoch: [  6/ 10], step: [  179/  390], loss: [0.2574], avg loss: [0.3280], time: [104.6448ms]
Epoch: [  6/ 10], step: [  180/  390], loss: [0.2926], avg loss: [0.3278], time: [102.3531ms]
Epoch: [  6/ 10], step: [  181/  390], loss: [0.2689], avg loss: [0.3274], time: [105.6643ms]
Epoch: [  6/ 10], step: [  182/  390], loss: [0.2425], avg loss: [0.3270], time: [105.7646ms]
Epoch: [  6/ 10], step: [  183/  390], loss: [0.4197], avg loss: [0.3275], time: [104.4226ms]
Epoch: [  6/ 10], step: [  184/  390], loss: [0.3622], avg loss: [0.3277], time: [102.5190ms]
Epoch: [  6/ 10], step: [  185/  390], loss: [0.3172], avg loss: [0.3276], time: [107.5490ms]
Epoch: [  6/ 10], step: [  186/  390], loss: [0.2831], avg loss: [0.3274], time: [100.2440ms]
Epoch: [  6/ 10], step: [  187/  390], loss: [0.4395], ...

Epoch: [  6/ 10], step: [  266/  390], loss: [0.1851], avg loss: [0.3286], time: [106.4603ms]
Epoch: [  6/ 10], step: [  267/  390], loss: [0.3902], avg loss: [0.3288], time: [105.4015ms]
Epoch: [  6/ 10], step: [  268/  390], loss: [0.1962], avg loss: [0.3284], time: [102.4544ms]
Epoch: [  6/ 10], step: [  269/  390], loss: [0.2614], avg loss: [0.3281], time: [105.6340ms]
Epoch: [  6/ 10], step: [  270/  390], loss: [0.2919], avg loss: [0.3280], time: [103.2822ms]
Epoch: [  6/ 10], step: [  271/  390], loss: [0.4295], avg loss: [0.3283], time: [104.4779ms]
Epoch: [  6/ 10], step: [  272/  390], loss: [0.3681], avg loss: [0.3285], time: [107.9822ms]
Epoch: [  6/ 10], step: [  273/  390], loss: [0.2417], avg loss: [0.3282], time: [106.6778ms]
Epoch: [  6/ 10], step: [  274/  390], loss: [0.3749], avg loss: [0.3283], time: [107.0487ms]
Epoch: [  6/ 10], step: [  275/  390], loss: [0.3401], avg loss: [0.3284], time: [103.5895ms]
Epoch: [  6/ 10], step: [  276/  390], loss: [0.3363], ...

Epoch: [  6/ 10], step: [  355/  390], loss: [0.4533], avg loss: [0.3298], time: [105.9098ms]
Epoch: [  6/ 10], step: [  356/  390], loss: [0.2419], avg loss: [0.3295], time: [103.5061ms]
Epoch: [  6/ 10], step: [  357/  390], loss: [0.2371], avg loss: [0.3293], time: [102.0308ms]
Epoch: [  6/ 10], step: [  358/  390], loss: [0.3193], avg loss: [0.3293], time: [102.3316ms]
Epoch: [  6/ 10], step: [  359/  390], loss: [0.4685], avg loss: [0.3296], time: [103.1935ms]
Epoch: [  6/ 10], step: [  360/  390], loss: [0.3362], avg loss: [0.3297], time: [103.2398ms]
Epoch: [  6/ 10], step: [  361/  390], loss: [0.4437], avg loss: [0.3300], time: [105.7558ms]
Epoch: [  6/ 10], step: [  362/  390], loss: [0.3613], avg loss: [0.3301], time: [100.1587ms]
Epoch: [  6/ 10], step: [  363/  390], loss: [0.4118], avg loss: [0.3303], time: [106.5342ms]
Epoch: [  6/ 10], step: [  364/  390], loss: [0.3095], avg loss: [0.3302], time: [103.8628ms]
Epoch: [  6/ 10], step: [  365/  390], loss: [0.2669], ...

Epoch: [  7/ 10], step: [   52/  390], loss: [0.3062], avg loss: [0.3044], time: [106.5502ms]
Epoch: [  7/ 10], step: [   53/  390], loss: [0.3455], avg loss: [0.3052], time: [105.5696ms]
Epoch: [  7/ 10], step: [   54/  390], loss: [0.3581], avg loss: [0.3062], time: [104.5468ms]
Epoch: [  7/ 10], step: [   55/  390], loss: [0.2514], avg loss: [0.3052], time: [104.1136ms]
Epoch: [  7/ 10], step: [   56/  390], loss: [0.3478], avg loss: [0.3060], time: [109.1614ms]
Epoch: [  7/ 10], step: [   57/  390], loss: [0.2962], avg loss: [0.3058], time: [108.6161ms]
Epoch: [  7/ 10], step: [   58/  390], loss: [0.2631], avg loss: [0.3050], time: [104.1448ms]
Epoch: [  7/ 10], step: [   59/  390], loss: [0.2864], avg loss: [0.3047], time: [105.4285ms]
Epoch: [  7/ 10], step: [   60/  390], loss: [0.3093], avg loss: [0.3048], time: [104.7280ms]
Epoch: [  7/ 10], step: [   61/  390], loss: [0.2864], avg loss: [0.3045], time: [103.9274ms]
Epoch: [  7/ 10], step: [   62/  390], loss: [0.1889], ...

Epoch: [  7/ 10], step: [  141/  390], loss: [0.4725], avg loss: [0.3093], time: [102.4973ms]
Epoch: [  7/ 10], step: [  142/  390], loss: [0.3928], avg loss: [0.3099], time: [105.4571ms]
Epoch: [  7/ 10], step: [  143/  390], loss: [0.3646], avg loss: [0.3102], time: [103.7819ms]
Epoch: [  7/ 10], step: [  144/  390], loss: [0.2601], avg loss: [0.3099], time: [107.0786ms]
Epoch: [  7/ 10], step: [  145/  390], loss: [0.4328], avg loss: [0.3107], time: [107.2645ms]
Epoch: [  7/ 10], step: [  146/  390], loss: [0.4251], avg loss: [0.3115], time: [104.9128ms]
Epoch: [  7/ 10], step: [  147/  390], loss: [0.2112], avg loss: [0.3108], time: [105.6156ms]
Epoch: [  7/ 10], step: [  148/  390], loss: [0.3383], avg loss: [0.3110], time: [102.7992ms]
Epoch: [  7/ 10], step: [  149/  390], loss: [0.3793], avg loss: [0.3115], time: [109.6804ms]
Epoch: [  7/ 10], step: [  150/  390], loss: [0.2300], avg loss: [0.3109], time: [105.2263ms]
Epoch: [  7/ 10], step: [  151/  390], loss: [0.3427], ...

Epoch: [  7/ 10], step: [  230/  390], loss: [0.4031], avg loss: [0.3077], time: [104.8095ms]
Epoch: [  7/ 10], step: [  231/  390], loss: [0.2659], avg loss: [0.3075], time: [106.7090ms]
Epoch: [  7/ 10], step: [  232/  390], loss: [0.4359], avg loss: [0.3081], time: [105.9318ms]
Epoch: [  7/ 10], step: [  233/  390], loss: [0.2296], avg loss: [0.3078], time: [104.9607ms]
Epoch: [  7/ 10], step: [  234/  390], loss: [0.3760], avg loss: [0.3080], time: [104.5735ms]
Epoch: [  7/ 10], step: [  235/  390], loss: [0.1930], avg loss: [0.3076], time: [106.2450ms]
Epoch: [  7/ 10], step: [  236/  390], loss: [0.4012], avg loss: [0.3080], time: [105.0429ms]
Epoch: [  7/ 10], step: [  237/  390], loss: [0.1525], avg loss: [0.3073], time: [103.3261ms]
Epoch: [  7/ 10], step: [  238/  390], loss: [0.4822], avg loss: [0.3080], time: [105.6840ms]
Epoch: [  7/ 10], step: [  239/  390], loss: [0.2978], avg loss: [0.3080], time: [108.5942ms]
Epoch: [  7/ 10], step: [  240/  390], loss: [0.2879], ...

Epoch: [  7/ 10], step: [  319/  390], loss: [0.2874], avg loss: [0.3080], time: [110.2710ms]
Epoch: [  7/ 10], step: [  320/  390], loss: [0.2773], avg loss: [0.3079], time: [105.9952ms]
Epoch: [  7/ 10], step: [  321/  390], loss: [0.3119], avg loss: [0.3079], time: [103.2131ms]
Epoch: [  7/ 10], step: [  322/  390], loss: [0.5180], avg loss: [0.3086], time: [105.6886ms]
Epoch: [  7/ 10], step: [  323/  390], loss: [0.2819], avg loss: [0.3085], time: [108.1693ms]
Epoch: [  7/ 10], step: [  324/  390], loss: [0.2582], avg loss: [0.3084], time: [105.5784ms]
Epoch: [  7/ 10], step: [  325/  390], loss: [0.3137], avg loss: [0.3084], time: [107.8506ms]
Epoch: [  7/ 10], step: [  326/  390], loss: [0.3719], avg loss: [0.3086], time: [105.4270ms]
Epoch: [  7/ 10], step: [  327/  390], loss: [0.2965], avg loss: [0.3085], time: [106.1039ms]
Epoch: [  7/ 10], step: [  328/  390], loss: [0.2923], avg loss: [0.3085], time: [104.4450ms]
Epoch: [  7/ 10], step: [  329/  390], loss: [0.2939], ...

Epoch: [  8/ 10], step: [   16/  390], loss: [0.2551], avg loss: [0.2677], time: [107.0006ms]
Epoch: [  8/ 10], step: [   17/  390], loss: [0.3402], avg loss: [0.2719], time: [102.4706ms]
Epoch: [  8/ 10], step: [   18/  390], loss: [0.2975], avg loss: [0.2733], time: [106.4065ms]
Epoch: [  8/ 10], step: [   19/  390], loss: [0.2487], avg loss: [0.2720], time: [106.8141ms]
Epoch: [  8/ 10], step: [   20/  390], loss: [0.2542], avg loss: [0.2712], time: [108.3596ms]
Epoch: [  8/ 10], step: [   21/  390], loss: [0.2751], avg loss: [0.2713], time: [101.2235ms]
Epoch: [  8/ 10], step: [   22/  390], loss: [0.3212], avg loss: [0.2736], time: [107.4750ms]
Epoch: [  8/ 10], step: [   23/  390], loss: [0.2760], avg loss: [0.2737], time: [105.3512ms]
Epoch: [  8/ 10], step: [   24/  390], loss: [0.1505], avg loss: [0.2686], time: [101.8736ms]
Epoch: [  8/ 10], step: [   25/  390], loss: [0.2349], avg loss: [0.2672], time: [104.0020ms]
Epoch: [  8/ 10], step: [   26/  390], loss: [0.1072], ...

Epoch: [  8/ 10], step: [  105/  390], loss: [0.3339], avg loss: [0.2830], time: [101.1257ms]
Epoch: [  8/ 10], step: [  106/  390], loss: [0.3085], avg loss: [0.2832], time: [104.0373ms]
Epoch: [  8/ 10], step: [  107/  390], loss: [0.3561], avg loss: [0.2839], time: [104.2287ms]
Epoch: [  8/ 10], step: [  108/  390], loss: [0.3255], avg loss: [0.2843], time: [104.0325ms]
Epoch: [  8/ 10], step: [  109/  390], loss: [0.3709], avg loss: [0.2851], time: [103.4937ms]
Epoch: [  8/ 10], step: [  110/  390], loss: [0.2567], avg loss: [0.2848], time: [101.7263ms]
Epoch: [  8/ 10], step: [  111/  390], loss: [0.2285], avg loss: [0.2843], time: [103.9937ms]
Epoch: [  8/ 10], step: [  112/  390], loss: [0.1699], avg loss: [0.2833], time: [105.4158ms]
Epoch: [  8/ 10], step: [  113/  390], loss: [0.2693], avg loss: [0.2832], time: [105.9487ms]
Epoch: [  8/ 10], step: [  114/  390], loss: [0.4444], avg loss: [0.2846], time: [104.3928ms]
Epoch: [  8/ 10], step: [  115/  390], loss: [0.2116], ...

Epoch: [  8/ 10], step: [  194/  390], loss: [0.2501], avg loss: [0.2871], time: [105.4816ms]
Epoch: [  8/ 10], step: [  195/  390], loss: [0.1891], avg loss: [0.2866], time: [102.4518ms]
Epoch: [  8/ 10], step: [  196/  390], loss: [0.2274], avg loss: [0.2863], time: [101.9406ms]
Epoch: [  8/ 10], step: [  197/  390], loss: [0.3215], avg loss: [0.2865], time: [100.8925ms]
Epoch: [  8/ 10], step: [  198/  390], loss: [0.2382], avg loss: [0.2863], time: [106.7557ms]
Epoch: [  8/ 10], step: [  199/  390], loss: [0.3136], avg loss: [0.2864], time: [105.3262ms]
Epoch: [  8/ 10], step: [  200/  390], loss: [0.3687], avg loss: [0.2868], time: [102.4990ms]
Epoch: [  8/ 10], step: [  201/  390], loss: [0.1899], avg loss: [0.2863], time: [101.1612ms]
Epoch: [  8/ 10], step: [  202/  390], loss: [0.2513], avg loss: [0.2862], time: [101.4724ms]
Epoch: [  8/ 10], step: [  203/  390], loss: [0.2842], avg loss: [0.2861], time: [102.1821ms]
Epoch: [  8/ 10], step: [  204/  390], loss: [0.2917], ...

Epoch: [  8/ 10], step: [  283/  390], loss: [0.3052], avg loss: [0.2904], time: [100.7648ms]
Epoch: [  8/ 10], step: [  284/  390], loss: [0.3046], avg loss: [0.2905], time: [107.3947ms]
Epoch: [  8/ 10], step: [  285/  390], loss: [0.3282], avg loss: [0.2906], time: [102.7765ms]
Epoch: [  8/ 10], step: [  286/  390], loss: [0.2687], avg loss: [0.2905], time: [108.1564ms]
Epoch: [  8/ 10], step: [  287/  390], loss: [0.2085], avg loss: [0.2903], time: [102.8466ms]
Epoch: [  8/ 10], step: [  288/  390], loss: [0.2500], avg loss: [0.2901], time: [107.4021ms]
Epoch: [  8/ 10], step: [  289/  390], loss: [0.2477], avg loss: [0.2900], time: [105.6416ms]
Epoch: [  8/ 10], step: [  290/  390], loss: [0.1799], avg loss: [0.2896], time: [103.6391ms]
Epoch: [  8/ 10], step: [  291/  390], loss: [0.3890], avg loss: [0.2899], time: [103.8487ms]
Epoch: [  8/ 10], step: [  292/  390], loss: [0.2363], avg loss: [0.2897], time: [101.7900ms]
Epoch: [  8/ 10], step: [  293/  390], loss: [0.3996], ...

Epoch: [  8/ 10], step: [  372/  390], loss: [0.2237], avg loss: [0.2930], time: [104.1131ms]
Epoch: [  8/ 10], step: [  373/  390], loss: [0.1964], avg loss: [0.2928], time: [107.8188ms]
Epoch: [  8/ 10], step: [  374/  390], loss: [0.3240], avg loss: [0.2928], time: [102.8645ms]
Epoch: [  8/ 10], step: [  375/  390], loss: [0.4185], avg loss: [0.2932], time: [101.9661ms]
Epoch: [  8/ 10], step: [  376/  390], loss: [0.2762], avg loss: [0.2931], time: [102.5932ms]
Epoch: [  8/ 10], step: [  377/  390], loss: [0.2433], avg loss: [0.2930], time: [105.0153ms]
Epoch: [  8/ 10], step: [  378/  390], loss: [0.3024], avg loss: [0.2930], time: [103.4801ms]
Epoch: [  8/ 10], step: [  379/  390], loss: [0.3009], avg loss: [0.2930], time: [106.7383ms]
Epoch: [  8/ 10], step: [  380/  390], loss: [0.3313], avg loss: [0.2931], time: [103.3118ms]
Epoch: [  8/ 10], step: [  381/  390], loss: [0.2318], avg loss: [0.2930], time: [100.6205ms]
Epoch: [  8/ 10], step: [  382/  390], loss: [0.2963], ...

Epoch: [  9/ 10], step: [   69/  390], loss: [0.4718], avg loss: [0.2856], time: [103.2302ms]
Epoch: [  9/ 10], step: [   70/  390], loss: [0.4030], avg loss: [0.2873], time: [100.6932ms]
Epoch: [  9/ 10], step: [   71/  390], loss: [0.3980], avg loss: [0.2888], time: [100.3315ms]
Epoch: [  9/ 10], step: [   72/  390], loss: [0.2488], avg loss: [0.2883], time: [102.1490ms]
Epoch: [  9/ 10], step: [   73/  390], loss: [0.1879], avg loss: [0.2869], time: [103.2121ms]
Epoch: [  9/ 10], step: [   74/  390], loss: [0.3052], avg loss: [0.2872], time: [105.9821ms]
Epoch: [  9/ 10], step: [   75/  390], loss: [0.1858], avg loss: [0.2858], time: [103.2846ms]
Epoch: [  9/ 10], step: [   76/  390], loss: [0.1737], avg loss: [0.2843], time: [102.9892ms]
Epoch: [  9/ 10], step: [   77/  390], loss: [0.3333], avg loss: [0.2850], time: [101.4016ms]
Epoch: [  9/ 10], step: [   78/  390], loss: [0.1959], avg loss: [0.2838], time: [103.2929ms]
Epoch: [  9/ 10], step: [   79/  390], loss: [0.2411], ...

Epoch: [  9/ 10], step: [  158/  390], loss: [0.2664], avg loss: [0.2777], time: [101.3439ms]
Epoch: [  9/ 10], step: [  159/  390], loss: [0.4234], avg loss: [0.2787], time: [104.7285ms]
Epoch: [  9/ 10], step: [  160/  390], loss: [0.2787], avg loss: [0.2787], time: [101.1209ms]
Epoch: [  9/ 10], step: [  161/  390], loss: [0.3272], avg loss: [0.2790], time: [101.0838ms]
Epoch: [  9/ 10], step: [  162/  390], loss: [0.3409], avg loss: [0.2793], time: [101.6126ms]
Epoch: [  9/ 10], step: [  163/  390], loss: [0.3722], avg loss: [0.2799], time: [102.9675ms]
Epoch: [  9/ 10], step: [  164/  390], loss: [0.2464], avg loss: [0.2797], time: [103.0622ms]
Epoch: [  9/ 10], step: [  165/  390], loss: [0.1451], avg loss: [0.2789], time: [101.7148ms]
Epoch: [  9/ 10], step: [  166/  390], loss: [0.3036], avg loss: [0.2790], time: [102.2775ms]
Epoch: [  9/ 10], step: [  167/  390], loss: [0.2150], avg loss: [0.2787], time: [102.2885ms]
Epoch: [  9/ 10], step: [  168/  390], loss: [0.2903], ...

Epoch: [  9/ 10], step: [  247/  390], loss: [0.3666], avg loss: [0.2803], time: [103.4901ms]
Epoch: [  9/ 10], step: [  248/  390], loss: [0.2445], avg loss: [0.2801], time: [101.3665ms]
Epoch: [  9/ 10], step: [  249/  390], loss: [0.2603], avg loss: [0.2801], time: [103.7555ms]
Epoch: [  9/ 10], step: [  250/  390], loss: [0.2571], avg loss: [0.2800], time: [99.9429ms]
Epoch: [  9/ 10], step: [  251/  390], loss: [0.4252], avg loss: [0.2805], time: [101.2111ms]
Epoch: [  9/ 10], step: [  252/  390], loss: [0.3173], avg loss: [0.2807], time: [100.2634ms]
Epoch: [  9/ 10], step: [  253/  390], loss: [0.2151], avg loss: [0.2804], time: [101.1360ms]
Epoch: [  9/ 10], step: [  254/  390], loss: [0.3287], avg loss: [0.2806], time: [103.8651ms]
Epoch: [  9/ 10], step: [  255/  390], loss: [0.2224], avg loss: [0.2804], time: [104.4121ms]
Epoch: [  9/ 10], step: [  256/  390], loss: [0.2287], avg loss: [0.2802], time: [102.6955ms]
Epoch: [  9/ 10], step: [  257/  390], loss: [0.2828], ...

Epoch: [  9/ 10], step: [  336/  390], loss: [0.2749], avg loss: [0.2802], time: [101.8798ms]
Epoch: [  9/ 10], step: [  337/  390], loss: [0.1938], avg loss: [0.2800], time: [100.8859ms]
Epoch: [  9/ 10], step: [  338/  390], loss: [0.2136], avg loss: [0.2798], time: [101.9592ms]
Epoch: [  9/ 10], step: [  339/  390], loss: [0.1703], avg loss: [0.2794], time: [98.4647ms]
Epoch: [  9/ 10], step: [  340/  390], loss: [0.1344], avg loss: [0.2790], time: [100.3985ms]
Epoch: [  9/ 10], step: [  341/  390], loss: [0.2446], avg loss: [0.2789], time: [100.2448ms]
Epoch: [  9/ 10], step: [  342/  390], loss: [0.2180], avg loss: [0.2787], time: [103.6801ms]
Epoch: [  9/ 10], step: [  343/  390], loss: [0.3273], avg loss: [0.2789], time: [101.4183ms]
Epoch: [  9/ 10], step: [  344/  390], loss: [0.3550], avg loss: [0.2791], time: [106.7882ms]
Epoch: [  9/ 10], step: [  345/  390], loss: [0.2465], avg loss: [0.2790], time: [104.7385ms]
Epoch: [  9/ 10], step: [  346/  390], loss: [0.2084], ...

Epoch: [ 10/ 10], step: [   33/  390], loss: [0.2019], avg loss: [0.2478], time: [108.4888ms]
Epoch: [ 10/ 10], step: [   34/  390], loss: [0.2363], avg loss: [0.2475], time: [109.5145ms]
Epoch: [ 10/ 10], step: [   35/  390], loss: [0.1242], avg loss: [0.2440], time: [105.1886ms]
Epoch: [ 10/ 10], step: [   36/  390], loss: [0.1880], avg loss: [0.2424], time: [104.9492ms]
Epoch: [ 10/ 10], step: [   37/  390], loss: [0.2874], avg loss: [0.2436], time: [111.3589ms]
Epoch: [ 10/ 10], step: [   38/  390], loss: [0.1517], avg loss: [0.2412], time: [107.8691ms]
Epoch: [ 10/ 10], step: [   39/  390], loss: [0.2969], avg loss: [0.2426], time: [107.7905ms]
Epoch: [ 10/ 10], step: [   40/  390], loss: [0.2387], avg loss: [0.2425], time: [108.2902ms]
Epoch: [ 10/ 10], step: [   41/  390], loss: [0.1753], avg loss: [0.2409], time: [109.0782ms]
Epoch: [ 10/ 10], step: [   42/  390], loss: [0.1604], avg loss: [0.2390], time: [111.8894ms]
Epoch: [ 10/ 10], step: [   43/  390], loss: [0.2058], ...

Epoch: [ 10/ 10], step: [  122/  390], loss: [0.3619], avg loss: [0.2503], time: [106.2956ms]
Epoch: [ 10/ 10], step: [  123/  390], loss: [0.2152], avg loss: [0.2500], time: [106.0219ms]
Epoch: [ 10/ 10], step: [  124/  390], loss: [0.3646], avg loss: [0.2509], time: [107.1625ms]
Epoch: [ 10/ 10], step: [  125/  390], loss: [0.2300], avg loss: [0.2507], time: [107.6951ms]
Epoch: [ 10/ 10], step: [  126/  390], loss: [0.2405], avg loss: [0.2507], time: [112.6771ms]
Epoch: [ 10/ 10], step: [  127/  390], loss: [0.2607], avg loss: [0.2507], time: [109.9849ms]
Epoch: [ 10/ 10], step: [  128/  390], loss: [0.3845], avg loss: [0.2518], time: [107.1432ms]
Epoch: [ 10/ 10], step: [  129/  390], loss: [0.4600], avg loss: [0.2534], time: [110.2462ms]
Epoch: [ 10/ 10], step: [  130/  390], loss: [0.3505], avg loss: [0.2542], time: [110.1310ms]
Epoch: [ 10/ 10], step: [  131/  390], loss: [0.1911], avg loss: [0.2537], time: [108.8314ms]
Epoch: [ 10/ 10], step: [  132/  390], loss: [0.1612], ...

Epoch: [ 10/ 10], step: [  211/  390], loss: [0.3747], avg loss: [0.2632], time: [110.2545ms]
Epoch: [ 10/ 10], step: [  212/  390], loss: [0.1915], avg loss: [0.2628], time: [109.9415ms]
Epoch: [ 10/ 10], step: [  213/  390], loss: [0.2435], avg loss: [0.2627], time: [104.7785ms]
Epoch: [ 10/ 10], step: [  214/  390], loss: [0.1964], avg loss: [0.2624], time: [107.4312ms]
Epoch: [ 10/ 10], step: [  215/  390], loss: [0.1412], avg loss: [0.2619], time: [104.5918ms]
Epoch: [ 10/ 10], step: [  216/  390], loss: [0.3663], avg loss: [0.2623], time: [108.3212ms]
Epoch: [ 10/ 10], step: [  217/  390], loss: [0.2127], avg loss: [0.2621], time: [108.3050ms]
Epoch: [ 10/ 10], step: [  218/  390], loss: [0.3638], avg loss: [0.2626], time: [110.9850ms]
Epoch: [ 10/ 10], step: [  219/  390], loss: [0.2969], avg loss: [0.2627], time: [105.5558ms]
Epoch: [ 10/ 10], step: [  220/  390], loss: [0.2878], avg loss: [0.2629], time: [108.3949ms]
Epoch: [ 10/ 10], step: [  221/  390], loss: [0.3518], avg l

Epoch: [ 10/ 10], step: [  300/  390], loss: [0.2784], avg loss: [0.2633], time: [106.0503ms]
Epoch: [ 10/ 10], step: [  301/  390], loss: [0.2806], avg loss: [0.2634], time: [105.9799ms]
Epoch: [ 10/ 10], step: [  302/  390], loss: [0.2436], avg loss: [0.2633], time: [106.9660ms]
Epoch: [ 10/ 10], step: [  303/  390], loss: [0.3769], avg loss: [0.2637], time: [109.2713ms]
Epoch: [ 10/ 10], step: [  304/  390], loss: [0.3425], avg loss: [0.2640], time: [108.8324ms]
Epoch: [ 10/ 10], step: [  305/  390], loss: [0.2269], avg loss: [0.2638], time: [107.3177ms]
Epoch: [ 10/ 10], step: [  306/  390], loss: [0.4220], avg loss: [0.2643], time: [109.8456ms]
Epoch: [ 10/ 10], step: [  307/  390], loss: [0.2467], avg loss: [0.2643], time: [105.4230ms]
Epoch: [ 10/ 10], step: [  308/  390], loss: [0.1316], avg loss: [0.2639], time: [110.8382ms]
Epoch: [ 10/ 10], step: [  309/  390], loss: [0.1762], avg loss: [0.2636], time: [109.5812ms]
Epoch: [ 10/ 10], step: [  310/  390], loss: [0.3126], avg l

Epoch: [ 10/ 10], step: [  389/  390], loss: [0.2334], avg loss: [0.2624], time: [110.2269ms]
Epoch: [ 10/ 10], step: [  390/  390], loss: [0.1966], avg loss: [0.2622], time: [829.2229ms]
Epoch time: 43320.503, per step time: 111.078
Epoch time: 43320.815, per step time: 111.079, avg loss: 0.262
************************************************************
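
The log above prints both a per-step `loss` and an `avg loss`. The numbers are consistent with `avg loss` being a running mean of the step losses within the epoch, and with `per step time` being the epoch time divided by the 390 steps. The sketch below checks that reading against values copied from the log. The helper name `running_mean` is hypothetical, and how the training callback actually computes these values is an assumption.

```python
def running_mean(prev_avg, prev_count, new_value):
    """Update a cumulative mean with one new observation."""
    return (prev_avg * prev_count + new_value) / (prev_count + 1)


# Values copied from the log: after step 122 the average loss is 0.2503,
# step 123 reports loss 0.2152 and step 124 reports loss 0.3646.
avg_123 = running_mean(0.2503, 122, 0.2152)
avg_124 = running_mean(avg_123, 123, 0.3646)

# Epoch time (ms) divided by the number of steps in the epoch.
per_step_ms = 43320.503 / 390

print(f"avg loss after step 123: {avg_123:.4f}")
print(f"avg loss after step 124: {avg_124:.4f}")
print(f"per step time: {per_step_ms:.3f} ms")
```

The recomputed values agree with the logged ones to the printed precision, which supports the running-mean interpretation without proving it.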


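## Model Validation

Create and load the validation dataset (`ds_eval`), load the checkpoint file saved during **training**, and run the evaluation to check the quality of the model. This step takes about 30 seconds.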

from mindspore.train.serialization import load_checkpoint, load_param_into_net


args.ckpt_path = f'./lstm-{cfg.num_epochs}_390.ckpt'
print("============== Starting Testing ==============")
ds_eval = lstm_create_dataset(args.preprocess_path, cfg.batch_size, training=False)
param_dict = load_checkpoint(args.ckpt_path)
load_param_into_net(network, param_dict)
if args.device_target == "CPU":
    acc = model.eval(ds_eval, dataset_sink_mode=False)
else:
    acc = model.eval(ds_eval)
print("============== {} ==============".format(acc))
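
The checkpoint path in the code above follows the pattern `<prefix>-<epoch>_<step>.ckpt`. As a minimal sketch, the helper below (hypothetical name `checkpoint_path`) rebuilds that file name and checks that the file exists before loading. The prefix `lstm` and the 390 steps per epoch are taken from the code and the training log. The naming pattern itself is inferred from the f-string above, not from a confirmed specification.

```python
from pathlib import Path


def checkpoint_path(prefix, epoch, steps_per_epoch, directory="."):
    """Build the expected path of the checkpoint saved after the last step of an epoch."""
    return Path(directory) / f"{prefix}-{epoch}_{steps_per_epoch}.ckpt"


# 10 epochs and 390 steps per epoch, as shown in the training log.
path = checkpoint_path("lstm", 10, 390)
print(path.name)

# Fail early with a clear message instead of a loader error deep inside the framework.
if not path.is_file():
    print(f"Checkpoint {path} not found; run the training step first.")
```

Checking the path up front makes a failure caused by a changed epoch count or batch size easier to diagnose, since both change the file name.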




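### Evaluating the Training Results

The output of the code above shows that, after 10 epochs of training, the sentiment classification accuracy on the validation dataset is about 85%, which is a reasonably satisfactory result.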
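
For reference, accuracy here means the fraction of validation reviews whose predicted label matches the true label. The following is a minimal sketch of that computation in plain Python. The function name `accuracy` and the sample labels are hypothetical illustrations, not data from the IMDB run, and this is not the metric implementation used by the framework.

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical example: 1 = positive review, 0 = negative review.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 0, 0, 1, 0, 1, 1, 0]
acc = accuracy(preds, labels)
print(f"accuracy: {acc:.2%}")
```

Because the task uses only two classes, a classifier that guesses at random would score around 50%, which is a useful baseline when judging the 85% figure.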

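## Summary

This completes the walkthrough of a natural language processing application in MindSpore. Along the way, we saw how to use MindSpore to handle a sentiment classification problem, and how to define and initialize the LSTM-based `SentimentNet` network, train the model, and verify its accuracy.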