Hi all, hope you can help me with this.
I have tried to run MobileNetV1 with a dynamic batch size but got the error "RuntimeError: MKLDNNGraph::CreateGraph: such topology cannot be compiled for dynamic batch!". This is probably due to the squeeze layer inside MobileNetV1 that changes the shape of the tensor.
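For reference, the dynamic-batch attempt looked roughly like this (a minimal sketch using the old IEPlugin API; the upper bound of 16 is just what I picked, and model_xml/model_bin are my model paths):

import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
net = IENetwork(model=model_xml, weights=model_bin)
net.batch_size = 16  # upper bound for dynamic batching

# Enable dynamic batching; this load call is where the RuntimeError
# is raised for MobileNetV1:
exec_net = plugin.load(network=net, config={"DYN_BATCH_ENABLED": "YES"})

# With a supported topology, the effective batch could then be changed
# per infer request:
request = exec_net.requests[0]
request.set_batch(4)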
So in the end, I've decided to create multiple models for each batch size and did some benchmarks. But I got weird latency/throughput behavior.
My program is pretty simple:
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

plugin = IEPlugin(device="CPU")
images = np.random.uniform(-1, 1, size=[64, 3, 224, 224]).astype(np.float32)

for batch_size in range(1, 16):
    # create a model for this batch size
    net = IENetwork(model=model_xml, weights=model_bin)
    net.batch_size = batch_size
    exec_net = plugin.load(network=net)
    input_blob = next(iter(net.inputs))
    # run inference
    batch = images[np.arange(batch_size)]
    res = exec_net.infer(inputs={input_blob: batch})
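The throughput numbers below were measured with a simple timing loop along these lines (a rough sketch; n_runs is an arbitrary repeat count, not part of the snippet above):

import time

n_runs = 100  # arbitrary repeat count for averaging
start = time.time()
for _ in range(n_runs):
    exec_net.infer(inputs={input_blob: batch})
elapsed = time.time() - start
print("Batch_size: %d, Throughput: %.2f imgs/s"
      % (batch_size, batch_size * n_runs / elapsed))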
However, after the first 4 iterations, OpenVINO started using only a single CPU core instead of all my CPU cores (I'm on an Intel(R) Xeon(R) Gold 6140).
Batch_size: 1, Throughput: 643.86 imgs/s
Batch_size: 2, Throughput: 924.83 imgs/s
Batch_size: 3, Throughput: 1064.74 imgs/s
Batch_size: 4, Throughput: 1245.72 imgs/s
Batch_size: 5, Throughput: 168.25 imgs/s
Batch_size: 6, Throughput: 168.66 imgs/s
Do you have any suggestions to fix this problem?
Thank you