
Weird latency behavior - Multiple models for multiple batch sizes

Hi all, I hope you can help me with this.

I tried to run mobilenetv1 with a dynamic batch size but got the error "RuntimeError: MKLDNNGraph::CreateGraph: such topology cannot be compiled for dynamic batch!". This is probably due to the squeeze layer inside mobilenetv1, which changes the shape of the tensor.
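
For reference, this is roughly how I tried to enable dynamic batching (a sketch assuming the 2019-era IEPlugin Python API; the DYN_BATCH_ENABLED config key and its value string may differ between releases):

from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
plugin.set_config({"DYN_BATCH_ENABLED": "YES"})  # ask the CPU plugin for dynamic batching

net = IENetwork(model=model_xml, weights=model_bin)
net.batch_size = 16                  # upper bound for the dynamic batch
exec_net = plugin.load(network=net)  # fails here with the CreateGraph error above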

So in the end I decided to create a separate model for each batch size and ran some benchmarks, but I got weird latency/throughput behavior.

My program is pretty simple:

import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")

# 64 random images in NCHW layout, matching mobilenetv1's 224x224 input
# (model_xml / model_bin are the paths to the IR files)
images = np.random.uniform(-1, 1, size=[64, 3, 224, 224]).astype(np.float32)

for batch_size in range(1, 16):
    # create the network and fix it to this batch size
    net = IENetwork(model=model_xml, weights=model_bin)
    net.batch_size = batch_size
    exec_net = plugin.load(network=net)

    input_blob = next(iter(net.inputs))

    # run inference on the first batch_size images
    batch = images[:batch_size]
    res = exec_net.infer(inputs={input_blob: batch})
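
To get the throughput numbers below, I time repeated synchronous infer() calls per batch size, roughly like this (N_RUNS is an arbitrary repeat count I chose, not part of the snippet above):

import time

N_RUNS = 100
start = time.time()
for _ in range(N_RUNS):
    exec_net.infer(inputs={input_blob: batch})
elapsed = time.time() - start
print("Batch_size: %d, Throughput: %.2f imgs/s" % (batch_size, N_RUNS * batch_size / elapsed))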

However, from batch size 5 onwards, OpenVINO started using only a single CPU core instead of all of my cores (I'm using an Intel(R) Xeon(R) Gold 6140), and throughput dropped sharply:

Batch_size: 1, Throughput: 643.86 imgs/s
Batch_size: 2, Throughput: 924.83 imgs/s
Batch_size: 3, Throughput: 1064.74 imgs/s
Batch_size: 4, Throughput: 1245.72 imgs/s
Batch_size: 5, Throughput: 168.25 imgs/s
Batch_size: 6, Throughput: 168.66 imgs/s

Do you have any suggestions to fix this problem?

Thank you