I have a Windows application, built with Intel C 18.0.2, that calls MKL extensively. I want to profile it with VTune.
If I build it with the /Zi option (i.e. with the debug information needed for profiling), it seems to run about 5 times slower when started stand-alone from the command line. VTune tells me that much of the time is spent in dgemm. Could /Zi add a fixed overhead per dgemm call?
How come? Any suggestions?
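One way I could test the fixed-overhead idea is to time dgemm at several matrix sizes in both builds and fit a straight line, time = overhead + cost_per_flop * flops. A fixed per-call cost would then show up as a larger intercept in the /Zi build, while a general slowdown would show up as a larger slope. Below is a rough sketch of the fitting step only. The function name `fit_call_overhead` is hypothetical (it is not part of MKL or VTune), and it assumes I supply the timings myself, with roughly 2*m*n*k flops per dgemm call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL or VTune API.
 * Ordinary least-squares fit of
 *     seconds[i] ~= overhead + cost_per_flop * flops[i]
 * over n timed calls. Returns 0 on success, or -1 if the fit is
 * degenerate (fewer than two samples, or all flop counts equal). */
int fit_call_overhead(const double *flops, const double *seconds, size_t n,
                      double *overhead, double *cost_per_flop)
{
    assert(flops != NULL && seconds != NULL);
    assert(overhead != NULL && cost_per_flop != NULL);

    if (n < 2) {
        return -1;
    }

    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sx  += flops[i];
        sy  += seconds[i];
        sxx += flops[i] * flops[i];
        sxy += flops[i] * seconds[i];
    }

    double count = (double)n;
    double denom = count * sxx - sx * sx;
    if (denom == 0.0) {
        return -1; /* all samples have the same flop count */
    }

    *cost_per_flop = (count * sxy - sx * sy) / denom;  /* slope */
    *overhead = (sy - *cost_per_flop * sx) / count;    /* intercept */
    return 0;
}
```

I would fill `flops` and `seconds` from a loop that times dgemm for a range of sizes (averaging many repetitions per size), once for the build with /Zi and once without, and then compare the two intercepts and the two slopes.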
Erling