Same response time on both 1-GPU (p2.xlarge) and 8-GPU (p2.8xlarge) AWS instances
Created by: nayanhalder
I am getting the same response time on both the 1-GPU and the 8-GPU instance.
I am using CUDA 10.1 and cuDNN 7.6.
On the 8-GPU instance, before running the Python program, I run the following command in Ubuntu:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
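For reference, here is a quick sanity check (a minimal sketch of mine, assuming nvidia-smi is on the PATH, as it is on the AWS Deep Learning AMIs) to confirm what the process can actually see. Note that CUDA_VISIBLE_DEVICES only controls which GPUs are visible to the process; it does not by itself spread one program across them:

import os
import subprocess

# CUDA_VISIBLE_DEVICES limits which devices CUDA programs may use.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))

# `nvidia-smi -L` prints one line per physical GPU on the machine.
gpus = subprocess.check_output(["nvidia-smi", "-L"]).decode().strip().splitlines()
print("physical GPUs: %d" % len(gpus))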
Am I missing a parameter that needs to be set?
Please help.
I am running with the following settings:
import argparse
import functools

# add_arguments is the argument helper shipped with the DeepSpeech repo;
# the exact import path may differ between versions of the repo.
from utils.utility import add_arguments

parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser)
# yapf: disable
add_arg('num_samples',      int,    1,      "# of samples to infer.")
add_arg('beam_size',        int,    500,    "Beam search width.")
add_arg('num_proc_bsearch', int,    8,      "# of CPUs for beam search.")
add_arg('num_conv_layers',  int,    2,      "# of convolution layers.")
add_arg('num_rnn_layers',   int,    3,      "# of recurrent layers.")
add_arg('rnn_layer_size',   int,    1024,   "# of recurrent cells per layer.")
add_arg('alpha',            float,  2.5,    "Coef of LM for beam search.")
add_arg('beta',             float,  0.3,    "Coef of WC for beam search.")
add_arg('cutoff_prob',      float,  1.0,    "Cutoff probability for pruning.")
add_arg('cutoff_top_n',     int,    40,     "Cutoff number for pruning.")
add_arg('use_gru',          bool,   True,   "Use GRUs instead of simple RNNs.")
add_arg('use_gpu',          bool,   True,   "Use GPU or not.")
add_arg('share_rnn_weights', bool,  False,  "Share input-hidden weights across "
                                            "bi-directional RNNs. Not for GRU.")