When running an MKL DNN operation previously created with dnnConvolutionCreateForwardBias_F32(), the XMM7 and XMM8 registers (and possibly more) are not preserved.
I have code similar to this (but of course much more elaborate) which does not work in release mode:
void RunConvolution()
{
    dnnPrimitive_t handle;
    dnnConvolutionCreateForwardBias_F32(&handle, --- other pars ---);

    void* resources[dnnResourceNumber];
    --- fill in pointers to buffers, kernel and bias ---

    dnnExecute_F32(handle, resources);
}

int main()
{
    Timer timer;
    for (int o = 0; o < 10; o++) {
        for (int i = 0; i < 10; i++)
            RunConvolution();

        // The compiler keeps the constant 10.0 in XMM7 across both loops.
        cout << timer.Elapsed() / 10.0 << endl;
    }
}
I am using the Visual Studio 2017 compiler, and I noticed that it keeps the constant 10.0 in XMM7 across both loops that call RunConvolution(). I tracked the contents of the register and found that it is overwritten inside dnnExecute_F32(). That function is just a thin wrapper, so the actual problem is probably in the assembly code it dispatches to when the handle refers to a primitive created by dnnConvolutionCreateForwardBias_F32().
According to Microsoft's documentation of the x64 calling convention, XMM6 through XMM15 are nonvolatile and must be preserved by the callee across function calls. Because XMM7 is overwritten, the numbers printed are bogus. I can't see any explanation for this other than that the MKL DNN code violates the calling convention.
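To check this independently of the timing code, something like the following sketch could be wrapped around the call. LiveValuesSurviveCall is a hypothetical helper of my own, not part of MKL. It holds eight doubles live across the call and compares them afterwards. Whether the optimizer actually keeps any of them in XMM6 through XMM15 is an assumption that has to be confirmed in the release-mode disassembly, so a passing result does not prove the callee is well behaved.

```cpp
#include <cstdio>

// Hypothetical helper (not part of MKL): keeps several doubles live across a
// call so that an optimizing x64 compiler has a reason to place some of them
// in callee-saved registers (XMM6..XMM15 under the Microsoft x64 convention).
// Assumption: register allocation is entirely up to the compiler, so this is
// only a heuristic detector.
template <typename Callee>
bool LiveValuesSurviveCall(Callee&& callee)
{
    // Reading through a volatile prevents the compiler from folding the
    // values into constants and eliminating the comparisons below.
    volatile double seed = 10.0;
    const double s = seed;

    const double a = s + 0.0, b = s + 1.0, c = s + 2.0, d = s + 3.0;
    const double e = s + 4.0, f = s + 5.0, g = s + 6.0, h = s + 7.0;

    callee();  // the call under suspicion, e.g. a lambda calling dnnExecute_F32

    // Re-read the seed after the call and rebuild the expected values.
    const double t = seed;
    const bool ok = a == t + 0.0 && b == t + 1.0 && c == t + 2.0 &&
                    d == t + 3.0 && e == t + 4.0 && f == t + 5.0 &&
                    g == t + 6.0 && h == t + 7.0;

    if (!ok)
        std::printf("live floating-point values changed across the call\n");
    return ok;
}
```

In the code above this would be used as LiveValuesSurviveCall([] { RunConvolution(); }) in a release build. A false return would point at the callee, since the compiler is entitled to assume that nonvolatile registers survive the call.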
I am using MKL version 2018.1, 64-bit, statically linked with Intel threading (not TBB).
I have not tested any operations other than those created with dnnConvolutionCreateForwardBias_F32().