Training speed difference between v2 and fluid 1.0
Created by: shiyazhou121
I trained the same model on a cluster with both the V2 API and fluid 1.0. The V2 version finishes 10 epochs in about 10 hours, but the fluid version has already been running for 24 hours and has only reached the 7th epoch. Is this a logic problem in my code, or something else? How should I optimize it? Part of the fluid code is below:

```python
emb = fluid.layers.embedding(
    input=feature, size=[input_dim, emb_dim], is_sparse=True)
conv_1 = fluid.nets.sequence_conv_pool(
    input=emb,
    num_filters=hid_dim,
    filter_size=1,
    act="tanh",
    pool_type="average")
conv_2 = fluid.nets.sequence_conv_pool(
    input=emb,
    num_filters=hid_dim,
    filter_size=2,
    act="tanh",
    pool_type="average")
conv_3 = fluid.nets.sequence_conv_pool(
    input=emb,
    num_filters=hid_dim,
    filter_size=3,
    act="tanh",
    pool_type="average")
pred = fluid.layers.fc(
    input=[conv_1, conv_2, conv_3], size=class_dim, act="softmax")
cost = fluid.layers.cross_entropy(input=pred, label=label)
avg_cost = fluid.layers.mean(cost)
acc = fluid.layers.accuracy(input=pred, label=label)
test_program = fluid.default_main_program().clone(for_test=True)
adam_optimizer = fluid.optimizer.Adam(learning_rate=4e-3)
adam_optimizer.minimize(avg_cost)
for pass_id in range(EPOCHS):
    # train in the epoch
    for batch_id, data in enumerate(train_dataReader()):
        # comp.reset()
        r_cost, r_acc, r_pred, r_label = exe.run(
            program=main_program,
            feed=feeder.feed(data),
            fetch_list=[avg_cost, acc, pred, label])
```
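To help narrow down whether the time goes into the Python-side reader/feeding or into the executor itself, here is a minimal diagnostic sketch I could run. It reuses the variables from the snippet above (`train_dataReader`, `feeder`, `exe`, `main_program`, `avg_cost`, `acc`); the timing code and the reduced `fetch_list` are only illustrative additions for profiling, not part of the original training script:

```python
import time

# Rough per-batch breakdown: reader time vs. feed time vs. executor time.
# Diagnostic sketch only; it reuses the variables defined above.
read_time, feed_time, run_time = 0.0, 0.0, 0.0
t0 = time.time()
for batch_id, data in enumerate(train_dataReader()):
    t1 = time.time()
    read_time += t1 - t0

    feed_dict = feeder.feed(data)
    t2 = time.time()
    feed_time += t2 - t1

    # Fetch only avg_cost and acc while profiling; fetching pred/label
    # every batch adds extra copies back to Python.
    r_cost, r_acc = exe.run(
        program=main_program,
        feed=feed_dict,
        fetch_list=[avg_cost, acc])
    t3 = time.time()
    run_time += t3 - t2
    t0 = t3

    if batch_id > 0 and batch_id % 100 == 0:
        print("batch %d: read %.1fs, feed %.1fs, run %.1fs"
              % (batch_id, read_time, feed_time, run_time))
```

If most of the time shows up in the read/feed columns rather than in `exe.run`, the bottleneck would be the data pipeline rather than the network definition itself.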